mirror of https://github.com/go-micro/go-micro.git
Registry Cache
Cache is a library that provides a caching layer for the go-micro registry.
If you're looking for caching in your microservices, use the selector instead.
Features
- Caching: Caches registry lookups with configurable TTL
- Stale Cache Fallback: Returns stale cached data when registry is unavailable
- Singleflight Protection: Deduplicates concurrent requests for the same service
- Adaptive Throttling: Rate limits failed lookups to prevent cache penetration (new in v5)
Interface
```go
// Cache is the registry cache interface
type Cache interface {
	// embed the registry interface
	registry.Registry
	// stop the cache watcher
	Stop()
}
```
Usage
Basic Usage
```go
import (
	"github.com/micro/go-micro/registry"
	"github.com/micro/go-micro/registry/cache"
)

r := registry.NewRegistry()
cache := cache.New(r)

services, _ := cache.GetService("my.service")
```
Advanced Configuration
```go
import (
	"time"

	"github.com/micro/go-micro/registry"
	"github.com/micro/go-micro/registry/cache"
)

r := registry.NewRegistry()

// Configure cache with custom options
cache := cache.New(r,
	cache.WithTTL(2*time.Minute),                   // Cache TTL
	cache.WithMinimumRetryInterval(10*time.Second), // Throttle refresh attempts
)

services, _ := cache.GetService("my.service")
```
Adaptive Throttling
The cache implements rate limiting on ALL cache refresh attempts (not just errors) to prevent overwhelming the registry. This protects against multiple scenarios:
- Registry failures: When etcd is down/overloaded
- Rolling deployments: When all caches expire simultaneously under high QPS
- Cache expiration storms: When many services expire at once
How It Works
- Rate limiting: Refresh attempts are throttled per-service using MinimumRetryInterval (default 5s)
- Stale cache preference: If a stale cache entry exists (even if expired), it is returned instead of calling the registry
- No cache fallback: If no cache exists, ErrNotFound is returned and the caller relies on gRPC retry
- Singleflight deduplication: Concurrent requests for the same service are still deduplicated
- Recovery: Throttling is reset on a successful registry lookup
Example Scenarios
Scenario 1: Registry Failure with Stale Cache
```go
cache := cache.New(etcdRegistry, cache.WithMinimumRetryInterval(10*time.Second))

// Initial lookup populates cache
services, _ := cache.GetService("api") // → Calls etcd, caches result

// Cache expires after TTL
time.Sleep(2 * time.Minute)

// Etcd fails, but we have stale cache
services, err := cache.GetService("api") // → Returns stale cache WITHOUT calling etcd
// err == nil, services contains stale data
```
Scenario 2: Rolling Deployment Cache Storm
```go
// Scenario: all 1000 upstream pods watch the downstream service.
// The downstream does a rolling deployment - last pod updated.
// All 1000 upstream caches expire simultaneously.
// High QPS hits the system at this moment.

// First request after cache expiration
services, _ := cache.GetService("downstream") // → Calls etcd, updates lastRefreshAttempt

// Next 999 requests arrive within MinimumRetryInterval
services, _ = cache.GetService("downstream") // → Returns stale cache, NO etcd call

// Rate limiting prevents a stampede of 999 requests to etcd
```
Scenario 3: No Cache Available
```go
// First lookup when etcd is down (no cache exists yet)
_, err := cache.GetService("new-service") // → Calls etcd, fails, records attempt time
// err != nil

// Immediate retry (< 10s later, still no cache)
_, err = cache.GetService("new-service") // → Throttled, returns ErrNotFound immediately
// err == ErrNotFound

// After MinimumRetryInterval
time.Sleep(10 * time.Second)
_, err = cache.GetService("new-service") // → Allowed to retry, calls etcd again
```
This prevents cache penetration scenarios where thousands of concurrent requests hammer a failing or overloaded registry.
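The other half of the stampede protection, singleflight deduplication, can be sketched with plain sync primitives. This is hypothetical toy code under the assumptions above, not the actual implementation (real code would typically use the golang.org/x/sync/singleflight package); `group`, `do`, and `dedupe` are names invented for the sketch.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// group is a toy singleflight: concurrent callers for the same key share
// one in-flight lookup instead of each hitting the registry.
type group struct {
	mu    sync.Mutex
	calls map[string]*call
}

type call struct {
	done chan struct{}
	val  []string
}

func (g *group) do(key string, fn func() []string) []string {
	g.mu.Lock()
	if c, ok := g.calls[key]; ok {
		// A lookup for this key is already in flight: wait and share its result.
		g.mu.Unlock()
		<-c.done
		return c.val
	}
	c := &call{done: make(chan struct{})}
	g.calls[key] = c
	g.mu.Unlock()

	c.val = fn()
	close(c.done) // publish the result to all waiters

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val
}

// dedupe fires n concurrent lookups and returns how many reached the "registry".
func dedupe(n int) int32 {
	g := group{calls: map[string]*call{}}
	var registryCalls atomic.Int32
	lookup := func() []string {
		registryCalls.Add(1)
		time.Sleep(50 * time.Millisecond) // slow registry, so callers pile up
		return []string{"10.0.0.1:8080"}
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.do("api", lookup)
		}()
	}
	wg.Wait()
	return registryCalls.Load()
}

func main() {
	fmt.Println("registry calls for 100 concurrent requests:", dedupe(100))
}
```

Combined with rate limiting, this means a burst of identical lookups costs at most one registry round trip.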