HDDS-14187. Use BatchOperation to batch writes to tables of FSORepairTool #10578
if (currentDir == null) {
  if (isVerbose()) {
    info("Directory key " + currentDirKey + " to be processed was not found in the directory table.");
try (BatchedTempWriter writer = new BatchedTempWriter(reachableTable)) {
Most of this diff is re-indentation from wrapping the existing DFS in this try-with-resources. The only behavioral change is opening a single BatchedTempWriter for the bucket and threading it into addReachableEntry/getChildDirectoriesAndMarkAsReachable. The walk itself is unchanged.
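A stdlib-only sketch of the shape described here, under a simplified model: TempWriter, ReachabilityWalk, markReachable, and the in-memory Map standing in for the temp reachable table are all hypothetical, not Ozone's BatchedTempWriter or Table API. One writer is opened per bucket, the DFS itself is unchanged, and the writer is passed into the helper that records each entry. For brevity this model commits only on close(); the writer in this PR also flushes every --batch-size entries.

```java
import java.util.*;

/** Hypothetical stand-in for BatchedTempWriter: buffers keys and commits them once, on close(). */
final class TempWriter implements AutoCloseable {
  private final Map<String, String> table;        // models the temp reachable table
  private final List<String> buffered = new ArrayList<>();
  int commits = 0;                                 // batch commits performed so far

  TempWriter(Map<String, String> table) {
    this.table = table;
  }

  void put(String key) {
    buffered.add(key);                             // no per-entry write to the table
  }

  @Override
  public void close() {
    if (!buffered.isEmpty()) {
      buffered.forEach(k -> table.put(k, ""));     // one "commit" for the whole batch
      commits++;
      buffered.clear();
    }
  }
}

final class ReachabilityWalk {
  /** Hypothetical analog of addReachableEntry: records a key through the shared writer. */
  static void addReachableEntry(TempWriter writer, String key) {
    writer.put(key);
  }

  /** Iterative DFS from bucketRoot; one writer is opened for the bucket and threaded down. */
  static int markReachable(Map<String, List<String>> children, String bucketRoot,
      Map<String, String> reachableTable) {
    TempWriter writer = new TempWriter(reachableTable);
    try (writer) {
      Deque<String> stack = new ArrayDeque<>();
      stack.push(bucketRoot);
      while (!stack.isEmpty()) {
        String dir = stack.pop();
        addReachableEntry(writer, dir);
        for (String child : children.getOrDefault(dir, List.of())) {
          stack.push(child);
        }
      }
    }                                              // close() commits everything buffered
    return writer.commits;
  }
}
```

For a bucket with four reachable directories, markReachable fills the table with all four keys through a single commit, where per-entry puts would have issued four separate writes.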
if (!deletedDirKey.startsWith(bucketPrefix)) {
  break;
}
try (BatchedTempWriter writer = new BatchedTempWriter(pendingToDeleteTable)) {
Same as markReachableObjectsInBucket: the large diff here is re-indentation from the try-with-resources wrap. Logic is unchanged apart from threading the writer.
adoroszlai left a comment
Thanks @chihsuan for the patch.
@VisibleForTesting
static int tempDbBatchSize = 10_000;
I think it would be better to add a CLI option for the batch size:
- it allows the user to adjust it without a rebuild (in case that is needed)
- it avoids the need for @VisibleForTesting
Thanks for the quick feedback, @adoroszlai! Good call; replaced it with a --batch-size CLI option in 4fde712.
if (pending > 0) {
  tempDB.commitBatchOperation(batch);
}
batch.close();
Should we set pending = 0?
Not strictly needed, since the writer isn't reused after close(), but agreed that it's more defensive. Updated in c322eab.
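A minimal model of why the reset is defensive, assuming a hypothetical CountingBatchWriter whose commit just copies buffered entries into a Map and counts commits. It is a stand-in for tempDB.commitBatchOperation and batch.close(), not the real BatchOperation API.

```java
import java.util.*;

/** Hypothetical model of the writer's close(): commit the remainder, then reset pending. */
final class CountingBatchWriter implements AutoCloseable {
  private final Map<String, String> table;
  private final Map<String, String> batch = new LinkedHashMap<>();
  private int pending = 0;
  int commits = 0;

  CountingBatchWriter(Map<String, String> table) {
    this.table = table;
  }

  void put(String key, String value) {
    batch.put(key, value);
    pending++;
  }

  @Override
  public void close() {
    if (pending > 0) {
      table.putAll(batch);   // models tempDB.commitBatchOperation(batch)
      commits++;
    }
    batch.clear();           // models batch.close()
    pending = 0;             // defensive reset: a repeated close() commits nothing
  }
}
```

With the reset, a second close() finds nothing pending, so commits stays at 1. Without the pending = 0 line, the second call would still see pending > 0 and commit again: here an empty batch, in the real writer a batch that was already closed.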
adoroszlai left a comment
Thanks @chihsuan for updating the patch, LGTM.
@sarvekshayr please take a look
sarvekshayr left a comment
Thanks @chihsuan for the improvement.
private String bucketFilter;

@CommandLine.Option(names = {"--batch-size"},
    defaultValue = "10000",
nit: Add showDefaultValue = Visibility.ALWAYS
Thanks @chihsuan for the patch, @sarvekshayr for the review.
What changes were proposed in this pull request?
Problem. When FSORepairTool marks the temporary reachable and pendingToDelete tables in temp.db, it writes each entry with an individual Table.put. Every put is a separate RocksDB write (WAL + fsync). For FSO buckets with thousands or millions of files and directories, this per-entry fsync overhead dominates the run.

Fix. Buffer the temp-table writes in a bounded RocksDB BatchOperation instead of one put per entry. A small BatchedTempWriter helper accumulates putWithBatch calls, flushes every --batch-size entries to cap memory, and commits the remainder on close; each marking phase wraps its directory walk in one writer. The already-batched repair-mode move logic is unchanged.

The batch size is exposed as a --batch-size CLI option (default 10000) so operators can tune it without a rebuild. Values below 1 are rejected.

This is safe because the temp tables are only written during the marking phases and only read back in the later classification phase, so every bucket's writes are committed before it is classified.
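The bounded batching described above can be sketched as follows. This is a hypothetical in-memory model: BoundedBatchWriter, its Map-backed table, and commit() are illustrative stand-ins for BatchedTempWriter, a temp.db table, and a RocksDB batch commit, assuming one commit per full batch plus one for the remainder on close.

```java
import java.util.*;

/** Hypothetical bounded batched writer: commits every batchSize puts, remainder on close(). */
final class BoundedBatchWriter implements AutoCloseable {
  private final Map<String, String> table;             // stands in for a temp.db table
  private final int batchSize;
  private final Map<String, String> batch = new LinkedHashMap<>();
  private int pending = 0;
  int commits = 0;

  BoundedBatchWriter(Map<String, String> table, int batchSize) {
    if (batchSize < 1) {
      // Mirrors "values below 1 are rejected" for --batch-size.
      throw new IllegalArgumentException("batch size must be >= 1, got " + batchSize);
    }
    this.table = table;
    this.batchSize = batchSize;
  }

  void put(String key, String value) {
    batch.put(key, value);
    if (++pending >= batchSize) {
      commit();                                         // periodic flush caps buffered entries
    }
  }

  private void commit() {
    table.putAll(batch);                                // buffered entries become visible here
    commits++;
    batch.clear();
    pending = 0;
  }

  @Override
  public void close() {
    if (pending > 0) {
      commit();                                         // commit the remainder
    }
  }
}
```

With a batch size of 3, seven puts commit after the 3rd and 6th entries, and close() commits the last one: ceil(7 / 3) = 3 commits instead of 7 individual writes. A batch size of 1 commits after every put, which is the path testBatchedTempWrites exercises. Until close(), the seventh entry is buffered but not yet in the table, which is why the safety argument depends on each marking-phase writer being closed before the classification phase reads the temp tables.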
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-14187
How was this patch tested?
- The TestFSORepairTool suite (connected / disconnected / empty / unreachable trees, dry-run, volume and bucket filters, repair mode, and post-repair OM restart validation) passes unchanged, confirming the batched writes produce identical reports.
- testBatchedTempWrites, which runs a full dry-run with --batch-size 1 so the batch commit/reset path is exercised for both the reachable and pendingToDelete tables across all tree shapes, and asserts the report is identical to the default-batch run.
- testInvalidBatchSize, which asserts that --batch-size 0 fails with a non-zero exit and a clear error message.
- checkstyle.sh is clean on the changed modules.

Generated-by: Claude Code (claude-opus-4-8)