Multilog consistent read optimizations #1832
Open
vazois wants to merge 17 commits into
…ough passing the timeout directly through the method signature
Pull request overview
This PR investigates and optimizes the multi-log (virtual sublog) consistent read path, focusing on reducing overhead in replica reads and improving correctness of the per-key replay frontier sketch under concurrency.
Changes:
- Renames and refactors Tsavorite consistent-read session hooks (single-key + batch) and threads key-hash through `ReadOptions` to avoid redundant hash computation.
- Optimizes Garnet replica consistent-read enforcement via cached session switching and improved read-consistency replay state tracking (sketch indexing + waiter signaling).
- Adds a BenchmarkDotNet scenario to measure consistent-read overhead across SingleLog / MultiLog+Primary / MultiLog+Replica modes, and introduces `AofReplayMaxDrift` configuration + replica replay throttling support.
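The key-hash threading in the first bullet can be pictured with a minimal sketch. This is Python used as neutral pseudocode, not the Tsavorite C# API: the `ReadOptions` class here is a hypothetical stand-in with a single optional `key_hash` field, and `compute_hash`, `consistency_check`, `store_read`, and `consistent_read` are names invented for illustration. The idea it shows is only that the hash is computed once at the entry point and reused by every later stage.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

hash_calls: List[bytes] = []  # records every key that was actually hashed


def compute_hash(key: bytes) -> int:
    """Stand-in 64-bit FNV-1a hash; the real store uses its own hash function."""
    hash_calls.append(key)
    h = 0xCBF29CE484222325
    for b in key:
        h ^= b
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


@dataclass
class ReadOptions:
    # Hypothetical stand-in: carries a precomputed hash, or None if unknown.
    key_hash: Optional[int] = None


def resolve_hash(key: bytes, options: ReadOptions) -> int:
    # Reuse the threaded hash when present; hash the key only as a fallback.
    return options.key_hash if options.key_hash is not None else compute_hash(key)


def consistency_check(key: bytes, options: ReadOptions, sublogs: int = 8) -> int:
    # Illustrative first stage: map the key to a sublog index.
    return resolve_hash(key, options) % sublogs


def store_read(key: bytes, options: ReadOptions) -> int:
    # Illustrative second stage: the store also needs the hash for its index lookup.
    return resolve_hash(key, options)


def consistent_read(key: bytes) -> Tuple[int, int]:
    options = ReadOptions(key_hash=compute_hash(key))  # hash exactly once
    return consistency_check(key, options), store_read(key, options)
```

With the hash carried in the options object, one `consistent_read` call hashes the key once; passing an empty `ReadOptions()` to both stages would hash it twice.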
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| libs/storage/Tsavorite/cs/src/core/Index/Interfaces/SessionFunctionsBase.cs | Renames default consistent-read hook methods in the base session functions. |
| libs/storage/Tsavorite/cs/src/core/Index/Interfaces/ISessionFunctions.cs | Renames consistent-read hook APIs (single-key + batch) on the session-functions interface. |
| libs/storage/Tsavorite/cs/src/core/ClientSession/TransactionalConsistentReadContext.cs | Plumbs key-hash via ReadOptions and uses renamed consistent-read hooks in transactional consistent-read paths. |
| libs/storage/Tsavorite/cs/src/core/ClientSession/NoOpSessionFunctions.cs | Updates no-op session-functions implementation to new consistent-read hook names. |
| libs/storage/Tsavorite/cs/src/core/ClientSession/ConsistentReadContext.cs | Uses renamed consistent-read hooks; passes key-hash via ReadOptions; updates pending-completion callbacks. |
| libs/storage/Tsavorite/cs/benchmark/YCSB.benchmark/SessionFixedLenFunctions.cs | Updates benchmark session functions to satisfy the renamed consistent-read interface hooks. |
| libs/server/Storage/Session/Common/ArrayKeyIterationFunctions.cs | Updates unified-store iteration reader to call renamed consistent-read hooks. |
| libs/server/Storage/Functions/VectorStore/VectorSessionFunctions.cs | Renames consistent-read hook forwarding to ReadSessionState. |
| libs/server/Storage/Functions/UnifiedStore/UnifiedSessionFunctions.cs | Renames consistent-read hook forwarding to ReadSessionState. |
| libs/server/Storage/Functions/ObjectStore/ObjectSessionFunctions.cs | Renames consistent-read hook forwarding to ReadSessionState. |
| libs/server/Storage/Functions/MainStore/MainSessionFunctions.cs | Renames consistent-read hook forwarding to ReadSessionState. |
| libs/server/Servers/GarnetServerOptions.cs | Adds AofReplayMaxDrift option to configure replay-driver drift throttling. |
| libs/server/Resp/RespServerSession.cs | Caches the consistent-read session switch to avoid redundant SwitchActiveDatabaseSession work. |
| libs/server/AOF/ReadConsistency/VirtualSublogReplayState.cs | Updates sketch indexing and replaces semaphore-based signaling with a waiter-list + spin/wait scheme. |
| libs/server/AOF/ReadConsistency/ReplicaReadSessionContext.cs | Adds per-sublog max caching, removes per-op timeout CTS reset, and renames consistent-read hook entry points. |
| libs/server/AOF/ReadConsistency/ReadConsistencyManager.cs | Adds physical-sublog max query, introduces cached-sublog-max fast path, and updates consistent-read verification flow. |
| libs/server/AOF/GarnetLog.cs | Minor expression formatting in virtual-sublog index calculation. |
| libs/host/defaults.conf | Adds default config entry for AofReplayMaxDrift. |
| libs/host/Configuration/Options.cs | Adds CLI option + mapping for AofReplayMaxDrift. |
| libs/cluster/Server/Replication/ReplicaOps/AOFReplay/ReplicaReplayDriver.cs | Implements replay throttling using AofReplayMaxDrift and ReadConsistencyManager physical-sublog max. |
| benchmark/BDN.benchmark/Cluster/ConsistentRead/ConsistentReadParams.cs | New: parameter type for consistent-read benchmark modes. |
| benchmark/BDN.benchmark/Cluster/ConsistentRead/ConsistentReadOperations.cs | New: BDN benchmarks for GET + MGET exercising consistent-read paths. |
| benchmark/BDN.benchmark/Cluster/ConsistentRead/ConsistentReadContext.cs | New: embedded server/session harness to run consistent-read benchmarks. |
Summary
Investigate and implement optimizations for the multilog consistent read path.
Optimizations
(B) Use bitshift/mask instead of division/mod for sublog index calculation: not worth it; the gain is marginal and it would restrict sublog counts to powers of two
(D) Drop unnecessary inProgress lock and ResetTimeoutCts contention
Pre-optimization baseline
Post-optimization results
Follow-up items (optional)
Defer sketch-max update to reduce cache invalidation under concurrent load: When writers and replay threads are active alongside readers, updating the per-key sequence number sketch immediately on each replay invalidates cache lines shared with reader threads. Deferring or batching the sketch-max update avoids false sharing. Not a factor in the current BDN benchmark (no active writer), but a best practice for production workloads.
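A rough sketch of what batching the sketch-max update could look like, under stated assumptions: the sketch is modeled as a fixed array of per-slot maximum sequence numbers, and `SequenceSketch`, `BatchedReplayWriter`, and `batch_size` are hypothetical names, not types from this PR. Python cannot show cache-line effects, so the example only counts how many writes reach the shared array.

```python
from typing import Dict, List


class SequenceSketch:
    """Shared array of per-slot max sequence numbers (hypothetical layout)."""

    def __init__(self, slots: int) -> None:
        self.max_seq: List[int] = [0] * slots
        self.shared_writes = 0  # number of writes that touched the shared array

    def slot(self, key_hash: int) -> int:
        return key_hash % len(self.max_seq)

    def update(self, slot: int, seq: int) -> None:
        # Monotonic max: a slot never moves backwards.
        if seq > self.max_seq[slot]:
            self.max_seq[slot] = seq
            self.shared_writes += 1


class BatchedReplayWriter:
    """Buffers per-slot maxima locally and publishes them once per batch."""

    def __init__(self, sketch: SequenceSketch, batch_size: int) -> None:
        self.sketch = sketch
        self.batch_size = batch_size
        self.pending: Dict[int, int] = {}  # slot -> highest seq since last flush
        self.count = 0

    def record(self, key_hash: int, seq: int) -> None:
        slot = self.sketch.slot(key_hash)
        if seq > self.pending.get(slot, 0):
            self.pending[slot] = seq
        self.count += 1
        if self.count >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # One shared write per dirty slot, instead of one per replayed record.
        for slot, seq in self.pending.items():
            self.sketch.update(slot, seq)
        self.pending.clear()
        self.count = 0


sketch = SequenceSketch(slots=16)
writer = BatchedReplayWriter(sketch, batch_size=10)
for seq in range(1, 101):
    writer.record(key_hash=7, seq=seq)  # 100 replays of the same hot key
```

In this run the 100 replayed records produce 10 shared writes rather than 100. The trade-off is that readers observe the new maximum up to one batch late, so a real implementation would also need to flush when the replay thread goes idle.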
Replay-thread pacing / coordination barrier: Without coordination, replay threads for different sublogs can diverge arbitrarily. A lightweight barrier or watermark sync would keep them roughly aligned, reducing worst-case reader wait time and bounding the staleness window without adding per-key cost on the read path.
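One way to picture a watermark-based pacing rule, as a sketch only: each sublog exposes the sequence number of its next pending record, the watermark is the smallest of those, and a sublog may replay only while its next record is within `max_drift` of the watermark. The function name `runnable_sublogs` and the parameter `max_drift` are hypothetical; `max_drift` is loosely inspired by the `AofReplayMaxDrift` option, whose exact semantics in this PR may differ.

```python
from typing import List, Optional


def runnable_sublogs(heads: List[Optional[int]], max_drift: int) -> List[int]:
    """Return the sublog indices allowed to replay their next record.

    heads[i] is the sequence number of sublog i's next pending record,
    or None if that sublog has nothing pending.
    """
    pending = [h for h in heads if h is not None]
    if not pending:
        return []
    watermark = min(pending)  # the slowest sublog defines the watermark
    return [
        i
        for i, h in enumerate(heads)
        if h is not None and h - watermark <= max_drift
    ]


# Sublog 2 is far ahead of the others, so it is held back.
first = runnable_sublogs([100, 105, 250], max_drift=50)
# Once sublogs 0 and 1 catch up, all three may proceed.
second = runnable_sublogs([210, 220, 250], max_drift=50)
```

Here `first` is `[0, 1]` and `second` is `[0, 1, 2]`. Because the sublog holding the watermark is always runnable, this rule cannot stall every sublog at once, but it assumes each sublog's next pending sequence number is known, which an idle sublog would not provide without some form of time-advancement signal.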
Consolidate witness-tail into the replication stream (time-advancement sentinel): Eliminate the separate witness-tail task by emitting a lightweight sentinel on the same connection that ships the log. The sentinel signals "time advanced to sequence N" for a sublog without appending a real AOF record. Key validations: (1) blind sequence-number acquisition is safe under monotonic-max semantics, (2) sentinel does not allocate AOF space, (3) no races with concurrent write sessions acquiring sequence numbers in parallel.
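Validations (1) and (2) can be illustrated with a small single-threaded model, assuming the replica tracks one frontier per sublog and advances it with a monotonic max. `SublogState`, `apply_record`, and `apply_sentinel` are hypothetical names for this sketch; it does not model validation (3), which concerns concurrent sequence-number acquisition.

```python
from dataclasses import dataclass


@dataclass
class SublogState:
    frontier: int = 0   # highest sequence number this sublog is known to cover
    aof_bytes: int = 0  # bytes appended to the (simulated) AOF

    def advance(self, seq: int) -> None:
        # Monotonic max: late or duplicate signals can never move time backwards.
        self.frontier = max(self.frontier, seq)


def apply_record(state: SublogState, seq: int, payload: bytes) -> None:
    """A real log record: consumes AOF space and advances the frontier."""
    state.aof_bytes += len(payload)
    state.advance(seq)


def apply_sentinel(state: SublogState, seq: int) -> None:
    """A sentinel: advances the frontier only, with no AOF allocation."""
    state.advance(seq)


state = SublogState()
apply_record(state, seq=10, payload=b"SET k v")
apply_sentinel(state, seq=8)   # stale sentinel: frontier stays at 10
apply_sentinel(state, seq=15)  # fresh sentinel: frontier moves to 15
```

After these three calls the frontier is 15 and the simulated AOF has grown only by the 7-byte record payload, so the stale sentinel was harmless and neither sentinel consumed log space.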