Skip to content

HDDS-14187. Use BatchOperation to batch writes to tables of FSORepairTool#10578

Merged
adoroszlai merged 7 commits into
apache:masterfrom
chihsuan:HDDS-14187
Jun 23, 2026
Merged

HDDS-14187. Use BatchOperation to batch writes to tables of FSORepairTool#10578
adoroszlai merged 7 commits into
apache:masterfrom
chihsuan:HDDS-14187

Conversation

@chihsuan

@chihsuan chihsuan commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Problem. When FSORepairTool marks the temporary reachable and pendingToDelete tables in temp.db, it writes each entry with an individual Table.put. Every put is a separate RocksDB write (WAL + fsync). For FSO buckets with thousands or millions of files and directories, this per-entry fsync overhead dominates the run.

Fix. Buffer the temp-table writes in a bounded RocksDB BatchOperation instead of one put per entry. A small BatchedTempWriter helper accumulates putWithBatch calls, flushes every --batch-size entries to cap memory, and commits the remainder on close; each marking phase wraps its directory walk in one writer. The already-batched repair-mode move logic is unchanged.

The batch size is exposed as a --batch-size CLI option (default 10000) so operators can tune it without a rebuild. Values below 1 are rejected.

This is safe because the temp tables are only written during the marking phases and only read back in the later classification phase, so every bucket's writes are committed before it is classified.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14187

How was this patch tested?

  • The existing TestFSORepairTool suite (connected / disconnected / empty / unreachable trees, dry-run, volume and bucket filters, repair mode, and post-repair OM restart validation) passes unchanged, confirming the batched writes produce identical reports.
  • Added testBatchedTempWrites, which runs a full dry-run with --batch-size 1 so the batch commit/reset path is exercised for both the reachable and pendingToDelete tables across all tree shapes, and asserts the report is identical to the default-batch run.
  • Added testInvalidBatchSize, which asserts --batch-size 0 fails with a non-zero exit and a clear error message.
  • checkstyle.sh is clean on the changed modules.

Generated-by: Claude Code (claude-opus-4-8)

if (currentDir == null) {
if (isVerbose()) {
info("Directory key" + currentDirKey + "to be processed was not found in the directory table.");
try (BatchedTempWriter writer = new BatchedTempWriter(reachableTable)) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this diff is re-indentation from wrapping the existing DFS in this try-with-resources. The only behavioral change is opening a single BatchedTempWriter for the bucket and threading it into addReachableEntry/getChildDirectoriesAndMarkAsReachable. The walk itself is unchanged.

if (!deletedDirKey.startsWith(bucketPrefix)) {
break;
}
try (BatchedTempWriter writer = new BatchedTempWriter(pendingToDeleteTable)) {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as markReachableObjectsInBucket: the large diff here is re-indentation from the try-with-resources wrap. Logic is unchanged apart from threading the writer.

@adoroszlai adoroszlai left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chihsuan for the patch.

Comment on lines +92 to +93
@VisibleForTesting
static int tempDbBatchSize = 10_000;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to add a CLI option for batch size:

  • allow user to adjust it without rebuild (in case it is needed)
  • avoids @VisibleForTesting

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick feedback! @adoroszlai Good call, replaced with a --batch-size CLI option in 4fde712

Comment on lines +585 to +588
if (pending > 0) {
tempDB.commitBatchOperation(batch);
}
batch.close();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we set pending = 0?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly needed since the writer isn't reused after close() but agreed it's more defensive. Updated in c322eab

@chihsuan chihsuan marked this pull request as ready for review June 22, 2026 15:35

@adoroszlai adoroszlai left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chihsuan for updating the patch, LGTM.

@adoroszlai

Copy link
Copy Markdown
Contributor

@sarvekshayr please take a look

@sarvekshayr sarvekshayr left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @chihsuan for the improvement.

private String bucketFilter;

@CommandLine.Option(names = {"--batch-size"},
defaultValue = "10000",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Add showDefaultValue = Visibility.ALWAYS

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, done 66ea234

@adoroszlai adoroszlai merged commit 3db369d into apache:master Jun 23, 2026
47 checks passed
@adoroszlai

Copy link
Copy Markdown
Contributor

Thanks @chihsuan for the patch, @sarvekshayr for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants