Multilog consistent read optimizations #1832

Open
vazois wants to merge 17 commits into main from vazois/mlog-opt

@vazois vazois commented May 27, 2026


Summary

Investigate and implement optimizations for the multilog consistent read path.

Optimizations

  • Add BDN benchmark to validate consistent read overhead
  • (A) Cache consistent read session switch (avoid redundant SwitchActiveDatabaseSession)
  • (B) Use bitshift/mask instead of division/mod for sublog idx calculation. Not worth it: the gain is marginal and it would restrict sublog counts to powers of two
  • (C) Use key hash bits after sublog idx shift for sketch indexing (collision fix)
  • (D) Drop unnecessary inProgress lock and ResetTimeoutCts contention
    • (D.1) Fix ResetTimeoutCts contention
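
Items (B) and (C) can be illustrated with a minimal sketch. It assumes a 64-bit key hash and power-of-two sizes; `SublogIndexing`, `SublogCount`, `SketchSize`, and the method names are hypothetical and not taken from the Garnet source. The assumed reading of (C) is that a sketch slot derived from the same low hash bits that already selected the sublog leaves most of that sublog's slots unreachable, so the slot is taken from the bits above them instead.

```csharp
using System;
using System.Numerics;

static class SublogIndexing
{
    // Hypothetical sizes; both must be powers of two for the mask forms to be valid.
    public const int SublogCount = 4;
    public const int SketchSize = 1 << 10;

    // Number of low hash bits consumed by the sublog index (2 for 4 sublogs).
    static readonly int SublogBits = BitOperations.Log2((uint)SublogCount);

    // General form: works for any positive sublog count.
    public static int SublogIdxMod(long hash) =>
        (int)((ulong)hash % (ulong)SublogCount);

    // (B) Mask form: equals the mod form only when SublogCount is a power of two.
    public static int SublogIdxMask(long hash) =>
        (int)((ulong)hash & (ulong)(SublogCount - 1));

    // Naive sketch slot: reuses the low bits that already chose the sublog,
    // so keys within one sublog can only reach SketchSize / SublogCount slots.
    public static int SketchIdxNaive(long hash) =>
        (int)((ulong)hash & (ulong)(SketchSize - 1));

    // (C) Shift the sublog bits out first, then mask, so every slot is reachable.
    public static int SketchIdxShifted(long hash) =>
        (int)(((ulong)hash >> SublogBits) & (ulong)(SketchSize - 1));
}
```

With 4 sublogs, every key routed to sublog 0 has low hash bits `00`, so the naive form can reach only a quarter of the 1024 slots, while the shifted form can reach all of them.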

Pre-optimization baseline

| Method | Runtime | Params | Mean | Error | StdDev |
|---|---|---|---|---|---|
| Get | .NET 10.0 | MultiLog+Primary | 23.96 us | 0.469 us | 0.416 us |
| MGet | .NET 10.0 | MultiLog+Primary | 16.79 us | 0.106 us | 0.083 us |
| Get | .NET 8.0 | MultiLog+Primary | 24.83 us | 0.051 us | 0.045 us |
| MGet | .NET 8.0 | MultiLog+Primary | 16.86 us | 0.117 us | 0.110 us |
| Get | .NET 10.0 | MultiLog+Replica | 39.09 us | 0.113 us | 0.095 us |
| MGet | .NET 10.0 | MultiLog+Replica | 32.88 us | 0.214 us | 0.190 us |
| Get | .NET 8.0 | MultiLog+Replica | 41.55 us | 0.176 us | 0.156 us |
| MGet | .NET 8.0 | MultiLog+Replica | 33.62 us | 0.140 us | 0.124 us |
| Get | .NET 10.0 | SingleLog | 24.07 us | 0.068 us | 0.060 us |
| MGet | .NET 10.0 | SingleLog | 16.36 us | 0.104 us | 0.098 us |
| Get | .NET 8.0 | SingleLog | 25.30 us | 0.173 us | 0.145 us |
| MGet | .NET 8.0 | SingleLog | 16.80 us | 0.092 us | 0.082 us |

Post-optimization results

| Method | Runtime | Params | Mean | Error | StdDev | A | C |
|---|---|---|---|---|---|---|---|
| Get | .NET 10.0 | MultiLog+Primary | 24.34 us | 0.387 us | 0.362 us | | |
| MGet | .NET 10.0 | MultiLog+Primary | 18.32 us | 0.324 us | 0.303 us | | |
| Get | .NET 8.0 | MultiLog+Primary | 26.25 us | 0.513 us | 0.570 us | | |
| MGet | .NET 8.0 | MultiLog+Primary | 19.36 us | 0.552 us | 1.610 us | | |
| Get | .NET 10.0 | MultiLog+Replica | 40.67 us | 0.789 us | 0.700 us | -2.5% | ~0% |
| MGet | .NET 10.0 | MultiLog+Replica | 33.29 us | 0.345 us | 0.306 us | -1.7% | ~0% |
| Get | .NET 8.0 | MultiLog+Replica | 42.54 us | 0.554 us | 0.519 us | -1.0% | ~0% |
| MGet | .NET 8.0 | MultiLog+Replica | 34.25 us | 0.476 us | 0.422 us | -1.8% | ~0% |
| Get | .NET 10.0 | SingleLog | 24.86 us | 0.465 us | 0.435 us | | |
| MGet | .NET 10.0 | SingleLog | 16.82 us | 0.220 us | 0.195 us | | |
| Get | .NET 8.0 | SingleLog | 25.15 us | 0.260 us | 0.230 us | | |
| MGet | .NET 8.0 | SingleLog | 17.24 us | 0.218 us | 0.204 us | | |

Notes:

  • (A) Negligible improvement (~1-2%) — session switch is just 7 field assignments.
  • (C) No throughput change in single-threaded BDN (same computational cost). Its value is correctness: eliminates false sketch collisions that cause unnecessary blocking under real multi-threaded replay workloads.
  • Dominant overhead is in the per-key lock + CTS reset (optimization D).
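
To make the last note concrete, here is a minimal sketch of a read-side wait that pays for a timeout only when it actually has to wait. It is an illustration under stated assumptions, not the PR's implementation: `ReplayFrontier`, `Advance`, and `WaitFor` are hypothetical names, and a single frontier value stands in for the per-key state.

```csharp
using System;
using System.Threading;

sealed class ReplayFrontier
{
    // Highest sequence number replayed so far (hypothetical field).
    long replayed;

    // Called by the replay side after applying a record.
    public void Advance(long sequence) => Volatile.Write(ref replayed, sequence);

    // Called by a reader that needs everything up to `required` to be visible.
    public bool WaitFor(long required, TimeSpan timeout)
    {
        // Fast path: the frontier has already passed the required sequence,
        // so the reader takes no lock and arms no timer or cancellation source.
        if (Volatile.Read(ref replayed) >= required)
            return true;

        // Slow path: spin, then yield, until the frontier catches up or the
        // timeout elapses. The timeout cost is paid only here.
        return SpinWait.SpinUntil(
            () => Volatile.Read(ref replayed) >= required, timeout);
    }
}
```

In a benchmark with no active writer, every read would take the fast path, which is why per-operation work such as a lock acquisition or a timeout reset shows up so clearly in the replica numbers.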

Follow-up items (optional)

  • Defer sketch-max update to reduce cache invalidation under concurrent load: when writers and replay threads are active alongside readers, updating the per-key sequence number sketch immediately on each replay invalidates cache lines shared with reader threads. Deferring or batching the sketch-max update reduces that invalidation traffic. Not a factor in the current BDN (no active writer), but a best practice for production workloads.

  • Replay-thread pacing / coordination barrier — Without coordination, replay threads for different sublogs can diverge arbitrarily. A lightweight barrier or watermark sync would keep them roughly aligned, reducing worst-case reader wait time and bounding the staleness window without adding per-key cost on the read path.

  • Consolidate witness-tail into the replication stream (time-advancement sentinel) — Eliminate the separate witness-tail task by emitting a lightweight sentinel on the same connection that ships the log. The sentinel signals "time advanced to sequence N" for a sublog without appending a real AOF record. Key validations: (1) blind sequence-number acquisition is safe under monotonic-max semantics, (2) sentinel does not allocate AOF space, (3) no races with concurrent write sessions acquiring sequence numbers in parallel.
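
The "monotonic-max semantics" that the sketch-max update and the sentinel validation rely on can be sketched as a lock-free compare-and-swap loop. `MonotonicMax.Update` is a hypothetical helper written for illustration; it is not taken from the Garnet source.

```csharp
using System.Threading;

static class MonotonicMax
{
    // Raises `location` to `candidate` if candidate is larger; never lowers it.
    // Returns the value stored in `location` once this call has taken effect.
    public static long Update(ref long location, long candidate)
    {
        long current = Volatile.Read(ref location);
        while (candidate > current)
        {
            // Try to publish the candidate; `seen` is what was actually stored.
            long seen = Interlocked.CompareExchange(ref location, candidate, current);
            if (seen == current)
                return candidate;   // our write won
            current = seen;         // someone else advanced it; re-check
        }
        return current;             // an equal or larger value is already stored
    }
}
```

Because a smaller or stale sequence number can never overwrite a larger one, concurrent updates arriving in any order converge to the same maximum. That property is what would make a blindly acquired sequence number harmless.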

@vazois vazois force-pushed the vazois/mlog-opt branch from 38c96bf to 423d76c Compare June 8, 2026 23:28
@vazois vazois marked this pull request as ready for review June 8, 2026 23:39
Copilot AI review requested due to automatic review settings June 8, 2026 23:39

Copilot AI left a comment


Pull request overview

This PR investigates and optimizes the multi-log (virtual sublog) consistent read path, focusing on reducing overhead in replica reads and improving correctness of the per-key replay frontier sketch under concurrency.

Changes:

  • Renames and refactors Tsavorite consistent-read session hooks (single-key + batch) and threads key-hash through ReadOptions to avoid redundant hash computation.
  • Optimizes Garnet replica consistent-read enforcement via cached session switching and improved read-consistency replay state tracking (sketch indexing + waiter signaling).
  • Adds a BenchmarkDotNet scenario to measure consistent-read overhead across SingleLog / MultiLog+Primary / MultiLog+Replica modes, and introduces AofReplayMaxDrift configuration + replica replay throttling support.
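
The replay throttling mentioned in the last item can be pictured with a small sketch. The exact semantics of AofReplayMaxDrift are not spelled out here, so this assumes it bounds how far one sublog's replayed sequence number may run ahead of the slowest sublog; `ReplayDrift` and `ShouldThrottle` are hypothetical names.

```csharp
using System;

static class ReplayDrift
{
    // Returns true when the sublog at `sublogIdx` is more than `maxDrift`
    // sequence numbers ahead of the slowest sublog and should pause replay.
    public static bool ShouldThrottle(long[] sublogMax, int sublogIdx, long maxDrift)
    {
        long slowest = long.MaxValue;
        foreach (long s in sublogMax)
            slowest = Math.Min(slowest, s);

        return sublogMax[sublogIdx] - slowest > maxDrift;
    }
}
```

For replayed maxima of 100, 40, and 90 with a drift limit of 50, only the first sublog would pause: it is 60 ahead of the slowest, while the third is exactly at the limit.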

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| libs/storage/Tsavorite/cs/src/core/Index/Interfaces/SessionFunctionsBase.cs | Renames default consistent-read hook methods in the base session functions. |
| libs/storage/Tsavorite/cs/src/core/Index/Interfaces/ISessionFunctions.cs | Renames consistent-read hook APIs (single-key + batch) on the session-functions interface. |
| libs/storage/Tsavorite/cs/src/core/ClientSession/TransactionalConsistentReadContext.cs | Plumbs key-hash via ReadOptions and uses renamed consistent-read hooks in transactional consistent-read paths. |
| libs/storage/Tsavorite/cs/src/core/ClientSession/NoOpSessionFunctions.cs | Updates no-op session-functions implementation to new consistent-read hook names. |
| libs/storage/Tsavorite/cs/src/core/ClientSession/ConsistentReadContext.cs | Uses renamed consistent-read hooks; passes key-hash via ReadOptions; updates pending-completion callbacks. |
| libs/storage/Tsavorite/cs/benchmark/YCSB.benchmark/SessionFixedLenFunctions.cs | Updates benchmark session functions to satisfy the renamed consistent-read interface hooks. |
| libs/server/Storage/Session/Common/ArrayKeyIterationFunctions.cs | Updates unified-store iteration reader to call renamed consistent-read hooks. |
| libs/server/Storage/Functions/VectorStore/VectorSessionFunctions.cs | Renames consistent-read hook forwarding to ReadSessionState. |
| libs/server/Storage/Functions/UnifiedStore/UnifiedSessionFunctions.cs | Renames consistent-read hook forwarding to ReadSessionState. |
| libs/server/Storage/Functions/ObjectStore/ObjectSessionFunctions.cs | Renames consistent-read hook forwarding to ReadSessionState. |
| libs/server/Storage/Functions/MainStore/MainSessionFunctions.cs | Renames consistent-read hook forwarding to ReadSessionState. |
| libs/server/Servers/GarnetServerOptions.cs | Adds AofReplayMaxDrift option to configure replay-driver drift throttling. |
| libs/server/Resp/RespServerSession.cs | Caches the consistent-read session switch to avoid redundant SwitchActiveDatabaseSession work. |
| libs/server/AOF/ReadConsistency/VirtualSublogReplayState.cs | Updates sketch indexing and replaces semaphore-based signaling with a waiter-list + spin/wait scheme. |
| libs/server/AOF/ReadConsistency/ReplicaReadSessionContext.cs | Adds per-sublog max caching, removes per-op timeout CTS reset, and renames consistent-read hook entry points. |
| libs/server/AOF/ReadConsistency/ReadConsistencyManager.cs | Adds physical-sublog max query, introduces cached-sublog-max fast path, and updates consistent-read verification flow. |
| libs/server/AOF/GarnetLog.cs | Minor expression formatting in virtual-sublog index calculation. |
| libs/host/defaults.conf | Adds default config entry for AofReplayMaxDrift. |
| libs/host/Configuration/Options.cs | Adds CLI option + mapping for AofReplayMaxDrift. |
| libs/cluster/Server/Replication/ReplicaOps/AOFReplay/ReplicaReplayDriver.cs | Implements replay throttling using AofReplayMaxDrift and ReadConsistencyManager physical-sublog max. |
| benchmark/BDN.benchmark/Cluster/ConsistentRead/ConsistentReadParams.cs | New: parameter type for consistent-read benchmark modes. |
| benchmark/BDN.benchmark/Cluster/ConsistentRead/ConsistentReadOperations.cs | New: BDN benchmarks for GET + MGET exercising consistent-read paths. |
| benchmark/BDN.benchmark/Cluster/ConsistentRead/ConsistentReadContext.cs | New: embedded server/session harness to run consistent-read benchmarks. |
