Make recovery use fine-grained eviction and remove (most) sync forms #1892
Merged
…r/recov-ha-nosync
…tsCallback and LoadObjectsForRecoveryPass2 Improve RecoveryLoadObjectsPass2
…o evict from the snapshot records as well
Pull request overview
This pull request refactors Garnet/Tsavorite recovery to be predominantly asynchronous (moving call chains to RecoverAsync / ValueTask) and updates Tsavorite’s snapshot recovery to support fine-grained, budget-aware eviction with a two-pass “read pages first, load objects later” flow.
Changes:
- Replaced many synchronous recovery entry points (cluster, database manager, AOF, TsavoriteLog, TsavoriteKV) with async counterparts, updating call sites and tests accordingly.
- Implemented/extended recovery-time eviction and deferred object loading for snapshot recovery, including object-log byte copying from snapshot object-log into the main object-log to make pages evictable under memory pressure.
- Added/updated tests to cover async recovery and snapshot recovery + eviction scenarios.
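The two-pass, budget-aware flow described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyPage`, `ToyRecovery`, `EvictToBudget`, and the byte counts are hypothetical names invented for illustration, not Tsavorite types, and the sketch omits the snapshot-to-main object-log copy that the real change uses to make pages evictable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy model; none of these types are Tsavorite APIs.
public sealed class ToyPage
{
    public int Id { get; }
    public long RecordBytes { get; }   // inline record data, read in pass 1
    public long ObjectBytes { get; }   // heap objects, materialized in pass 2
    public bool ObjectsLoaded { get; set; }
    public bool Evicted { get; set; }

    public ToyPage(int id, long recordBytes, long objectBytes)
    {
        Id = id;
        RecordBytes = recordBytes;
        ObjectBytes = objectBytes;
    }

    // Memory currently charged against the budget for this page.
    public long ResidentBytes =>
        Evicted ? 0 : RecordBytes + (ObjectsLoaded ? ObjectBytes : 0);
}

public static class ToyRecovery
{
    // Evict the oldest resident pages until the total fits the budget.
    private static void EvictToBudget(List<ToyPage> pages, long budget)
    {
        foreach (var page in pages.Where(p => !p.Evicted))
        {
            if (pages.Sum(p => p.ResidentBytes) <= budget)
                break;
            page.Evicted = true;
        }
    }

    public static List<ToyPage> Recover(IEnumerable<ToyPage> snapshot, long budget)
    {
        var pages = new List<ToyPage>();

        // Pass 1: read pages only (no objects yet), evicting if over budget.
        foreach (var page in snapshot)
        {
            pages.Add(page);
            EvictToBudget(pages, budget);
        }

        // Pass 2: load objects for pages that are still resident,
        // re-checking the budget after each page.
        foreach (var page in pages.Where(p => !p.Evicted))
        {
            page.ObjectsLoaded = true;
            EvictToBudget(pages, budget);
        }

        return pages;
    }
}
```

Deferring object loading to a second pass means the cheaper page reads finish before the larger object allocations begin, so the budget check in pass 2 only has to trade object memory against the oldest resident pages.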
Reviewed changes
Copilot reviewed 53 out of 53 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/standalone/Garnet.test/RespConfigTests.cs | Updates a size-tracker precondition in a config/eviction-related test. |
| test/standalone/Garnet.test.collections/GarnetObjectTests.cs | Switches store recovery in collection tests to RecoverAsync. |
| libs/storage/Tsavorite/cs/test/TestUtils.cs | Removes CompletionSyncMode enum used by tests. |
| libs/storage/Tsavorite/cs/test/test.session.context/UnsafeContextTests.cs | Removes sync completion mode path; uses async completion APIs in unsafe-context tests. |
| libs/storage/Tsavorite/cs/test/test.session.context/TransactionalUnsafeContextTests.cs | Removes sync completion mode path; uses async completion APIs in transactional unsafe-context tests. |
| libs/storage/Tsavorite/cs/test/test.recovery/SimpleRecoveryTest.cs | Updates recovery/checkpoint tests to async-only completion and recovery. |
| libs/storage/Tsavorite/cs/test/test.recovery/RecoveryTests.cs | Updates recovery tests to async-only recovery/checkpointing paths. |
| libs/storage/Tsavorite/cs/test/test.recovery/RecoveryCheckTests.cs | Updates multiple recovery-check tests to async flows and adds ConfigureAwait(false) in some awaits. |
| libs/storage/Tsavorite/cs/test/test.recovery/ObjectRecoveryTest3.cs | Updates object recovery test to async-only recovery. |
| libs/storage/Tsavorite/cs/test/test.recovery/ObjectRecoveryTest2.cs | Updates object recovery test to async-only recovery/checkpoint completion. |
| libs/storage/Tsavorite/cs/test/test.recovery/ObjectRecoveryTest.cs | Updates object recovery test to async-only recovery and removes unused static import. |
| libs/storage/Tsavorite/cs/test/test.recovery/ObjectRecoverySnapshotEvictionTests.cs | New tests exercising snapshot deferred object load + eviction, plus compact+truncate after recovery. |
| libs/storage/Tsavorite/cs/test/test.recovery/LargeObjectTests.cs | Updates large-object recovery/checkpoint tests to RecoverAsync / async completion. |
| libs/storage/Tsavorite/cs/test/test.recovery/ComponentRecoveryTests.cs | Converts component recovery tests to async (RecoverAsync APIs with cancellation tokens). |
| libs/storage/Tsavorite/cs/test/test.recordops/RecordLifecycleTests.cs | Minor test message punctuation tweak. |
| libs/storage/Tsavorite/cs/test/test.hlog/LogTests.cs | Converts TsavoriteLog manual commit test to async RecoverAsync. |
| libs/storage/Tsavorite/cs/test/test.hlog/LogRecoverReadOnlyTests.cs | Removes sync/async toggle; uses async-only RecoverReadOnlyAsync and async log creation. |
| libs/storage/Tsavorite/cs/test/test.hlog/LogFastCommitTests.cs | Converts fast-commit test to async RecoverAsync. |
| libs/storage/Tsavorite/cs/test/test.hlog/FlakyDeviceTests.cs | Removes trailing whitespace/blank line. |
| libs/storage/Tsavorite/cs/test/SharedDirectoryTests.cs | Removes sync/async toggle; uses async-only store recovery. |
| libs/storage/Tsavorite/cs/test/MiscTests.cs | Converts a recovery test to async RecoverAsync. |
| libs/storage/Tsavorite/cs/src/core/Utilities/PageAsyncResultTypes.cs | Adds RecoveryPhase and extends async page result types with recovery-phase and snapshot-copy metadata. |
| libs/storage/Tsavorite/cs/src/core/TsavoriteLog/TsavoriteLog.cs | Introduces async RecoverAsync, removes sync RecoverReadOnly, refactors restore helpers to async; updates internal state tracking. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Tsavorite.cs | Removes sync Recover overloads; relies on async recovery and sync-bridge (GetResult) in a few legacy/compat paths. |
| libs/storage/Tsavorite/cs/src/core/Index/Recovery/Recovery.cs | Major refactor of recovery driver to async; implements two-pass recovery with page trimming and deferred object loading with eviction support. |
| libs/storage/Tsavorite/cs/src/core/Index/Recovery/IndexRecovery.cs | Removes sync fuzzy-index recovery/wait helpers; leaves async-only recovery APIs. |
| libs/storage/Tsavorite/cs/src/core/Index/Common/LogSizeTracker.cs | Renames/adjusts size-tracker APIs (IsOverBudget, RemainingBudget), updates eviction range logic and page scanning. |
| libs/storage/Tsavorite/cs/src/core/Index/Common/LogSettings.cs | Adds kMinPageCount constant used for min memory sizing. |
| libs/storage/Tsavorite/cs/src/core/Allocator/TsavoriteLogAllocator.cs | Adds new interface methods (e.g., GetPageObjectIdMap) and updated eviction signature. |
| libs/storage/Tsavorite/cs/src/core/Allocator/SpanByteAllocator.cs | Adds new interface methods and updated eviction signature (no-op for record-eviction range). |
| libs/storage/Tsavorite/cs/src/core/Allocator/ObjectSerialization/ObjectLogWriter.cs | Adds recovery-time snapshot object-byte copy helper (CopyRecoveredObjectBytes). |
| libs/storage/Tsavorite/cs/src/core/Allocator/ObjectIdMap.cs | Adds IsEmpty helper used by eviction/recovery code paths. |
| libs/storage/Tsavorite/cs/src/core/Allocator/ObjectAllocatorImpl.cs | Extends per-record eviction API with recovery awareness; implements snapshot object-log copy during recovery flush and demand-loaded reader setup. |
| libs/storage/Tsavorite/cs/src/core/Allocator/ObjectAllocator.cs | Adds GetPageObjectIdMap plumbing and updated eviction signature. |
| libs/storage/Tsavorite/cs/src/core/Allocator/MallocFixedPageSize.cs | Removes sync recovery/wait helpers; keeps async recovery API. |
| libs/storage/Tsavorite/cs/src/core/Allocator/LogRecord.cs | Adds RepointObjectLogPosition to support snapshot->main object-log copy during recovery flush. |
| libs/storage/Tsavorite/cs/src/core/Allocator/IAllocator.cs | Extends allocator interface with GetPageObjectIdMap and EvictRecordsInRange(..., isRecovery). |
| libs/storage/Tsavorite/cs/src/core/Allocator/AllocatorBase.cs | Adds recovery/object-load helpers, refactors page allocation helper, extends recovery read/flush APIs with recovery phase and snapshot copy metadata. |
| libs/storage/Tsavorite/cs/benchmark/YCSB.benchmark/TestLoader.cs | Updates recovery call to RecoverAsync().GetResult() in benchmark loader. |
| libs/server/StoreWrapper.cs | Converts recovery flow to RecoverAsync, and makes checkpoint/AOF recovery async. |
| libs/server/Providers/GarnetProvider.cs | Exposes async recovery API from provider. |
| libs/server/Databases/SingleDatabaseManager.cs | Converts checkpoint and AOF recovery to async and updates internal recovery routines accordingly. |
| libs/server/Databases/MultiDatabaseManager.cs | Converts checkpoint and AOF recovery to async for multi-db mode, updating error messaging. |
| libs/server/Databases/IDatabaseManager.cs | Changes recovery APIs to async (RecoverCheckpointAsync, RecoverAOFAsync). |
| libs/server/Databases/DatabaseManagerBase.cs | Refactors shared recovery helpers to async (RecoverDatabaseCheckpointAsync, RecoverDatabaseAOFAsync). |
| libs/server/Cluster/IClusterProvider.cs | Changes cluster recovery API to RecoverAsync. |
| libs/server/AOF/SingleLog.cs | Converts AOF log recovery to async. |
| libs/server/AOF/ShardedLog.cs | Converts sharded AOF recovery to async (awaits per-sublog recovery). |
| libs/server/AOF/GarnetLog.cs | Converts GarnetLog recovery to async, delegating to single/sharded logs. |
| libs/host/GarnetServer.cs | Bridges server startup to async recovery with .GetAwaiter().GetResult() and warning suppression. |
| libs/cluster/Server/Replication/ReplicationManager.cs | Converts replication recovery driver to async and updates logging. |
| libs/cluster/Server/Replication/ReplicaOps/ReplicaDiskbasedSync.cs | Awaits async AOF recovery; sync-bridges async checkpoint recovery in a synchronous RESP path with warning suppression. |
| libs/cluster/Server/ClusterProvider.cs | Switches cluster recovery to async delegation to replication manager. |
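Several rows in the table describe the same shape of conversion, for example sharded AOF recovery awaiting per-sublog recovery and component recovery taking cancellation tokens. A minimal sketch of that shape follows. `IToyLog`, `ToySingleLog`, and `ToyShardedLog` are hypothetical stand-ins invented for illustration; they are not the Garnet `SingleLog`/`ShardedLog` types, and the actual signatures in the pull request may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins; not the Garnet SingleLog/ShardedLog types.
public interface IToyLog
{
    ValueTask RecoverAsync(CancellationToken token = default);
}

public sealed class ToySingleLog : IToyLog
{
    public bool Recovered { get; private set; }

    public async ValueTask RecoverAsync(CancellationToken token = default)
    {
        token.ThrowIfCancellationRequested();
        await Task.Yield(); // placeholder for real asynchronous device I/O
        Recovered = true;
    }
}

public sealed class ToyShardedLog : IToyLog
{
    private readonly IReadOnlyList<ToySingleLog> sublogs;

    public ToyShardedLog(IReadOnlyList<ToySingleLog> sublogs) => this.sublogs = sublogs;

    public IReadOnlyList<ToySingleLog> Sublogs => sublogs;

    // Recover each sublog in turn, awaiting instead of blocking a thread.
    public async ValueTask RecoverAsync(CancellationToken token = default)
    {
        foreach (var sublog in sublogs)
            await sublog.RecoverAsync(token).ConfigureAwait(false);
    }
}
```

The sketch awaits sublogs sequentially for simplicity; whether the real implementation recovers sublogs one at a time or concurrently is not stated in the table.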
…ete calculation in LogSizeTracker; remove no-longer-needed evictForBudget argument
badrishc approved these changes on Jun 23, 2026
vazois approved these changes on Jun 23, 2026
This pull request refactors the cluster and database recovery logic to be fully asynchronous, replacing synchronous recovery methods with async counterparts throughout the codebase. This change improves scalability and responsiveness during cluster and database startup, ensuring that recovery operations do not block threads unnecessarily. The changes touch core interfaces, implementations, and usage sites, updating method signatures and internal logic to use ValueTask and async/await patterns.
Key changes include:
Asynchronous Recovery Refactor
- Changed recovery methods such as Recover, RecoverCheckpoint, and RecoverAOF in cluster and database manager interfaces and implementations to their asynchronous equivalents (RecoverAsync, RecoverCheckpointAsync, and RecoverAOFAsync), updating signatures and call sites accordingly.
- Updated recovery logic in ReplicationManager and ReplicaDiskbasedSync to call asynchronous recovery methods and use await where appropriate, including transitioning methods like Recover, RecoverCheckpointAndAOF, and related calls to async.
Interface and API Consistency
- Updated IClusterProvider, IDatabaseManager, and related interfaces to use async recovery methods, ensuring consistency across the codebase and enabling async recovery flows from top-level startup routines down to storage engines.
Synchronous Startup Compatibility
- Call sites that must remain synchronous (server startup and a synchronous RESP path in replica disk-based sync) bridge to the async recovery APIs with .AsTask().GetAwaiter().GetResult() and appropriate warning suppression, maintaining compatibility while transitioning to async APIs.
Internal Implementation Updates
- Refactored internal recovery implementations for SingleLog, ShardedLog, and related storage classes to provide async recovery methods, and updated their usage throughout the codebase.
- Modified recovery eviction logic to be two-pass: read pages first, then load objects later.
These changes collectively modernize the recovery path, improve non-blocking behavior, and lay the groundwork for further async enhancements in cluster and database management.
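For call sites that must stay synchronous, such as server startup, the bridging pattern mentioned in this description can be sketched as follows. `ToyStartup`, `RecoverBlocking`, and the returned value are hypothetical and chosen only for illustration; the sketch assumes the async recovery routine returns a `ValueTask<long>`, which is not confirmed by the pull request.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ToyStartup
{
    // Hypothetical async recovery routine; the result stands in for
    // whatever the real recovery call returns.
    public static async ValueTask<long> RecoverAsync(CancellationToken token = default)
    {
        await Task.Delay(10, token).ConfigureAwait(false); // simulated I/O
        return 42;
    }

    // Synchronous entry point that cannot be made async.
    public static long RecoverBlocking()
    {
        // Convert the ValueTask to a Task before blocking: blocking
        // directly on a ValueTask that has not completed is unsupported.
        return RecoverAsync().AsTask().GetAwaiter().GetResult();
    }
}
```

Blocking this way ties up the calling thread for the duration of recovery, so it is best confined to a small number of entry points where no async caller exists.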