Skip to content

HBASE-30238 HBase bulkload replication causes duplicate HFile loading and compaction storm when source RPC times out#8380

Open
mini666 wants to merge 5 commits into
apache:masterfrom
mini666:HBASE-30238
Open

HBASE-30238 HBase bulkload replication causes duplicate HFile loading and compaction storm when source RPC times out#8380
mini666 wants to merge 5 commits into
apache:masterfrom
mini666:HBASE-30238

Conversation

@mini666

@mini666 mini666 commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

JIRA: https://issues.apache.org/jira/browse/HBASE-30238

This PR adds:

  • Configurable bandwidth throttling for bulkloaded HFile copy during replication.
  • In-progress bulkload deduplication for concurrent RPC retries of the same bulkload event.

Testing:

  • JAVA_HOME=/Users/juke.mini666/.asdf/installs/java/temurin-17.0.6+10 mvn test -pl hbase-server -am -Dtest=TestHFileReplicatorBandwidth,TestReplicationSinkBulkLoadDedup -DfailIfNoTests=false -Dhadoop-three.version=3.4.3 -Dwarbucks.skip=true -Dcheckstyle.skip=true -Dspotbugs.skip=true -Drat.skip=true

…eplication

Introduce hbase.replication.bulkload.copy.bandwidth.mb to rate-limit HFile copy from source HDFS in HFileReplicator.Copier.

The limit is enforced through a shared RateLimiter across copy threads, and ReplicationSink updates the rate through configuration reload without requiring RegionServer restart.
Track in-progress bulkload events by replication cluster, encoded region, and bulkload sequence number so concurrent RPC retries skip duplicate execution.

The key is removed after the current attempt completes or fails, preserving at-least-once retry semantics while avoiding concurrent duplicate loads.
bulkLoadHFileMap.put(tableName, newFamilyHFilePathsList);
}

private static String buildBulkLoadKey(String replicationClusterId, BulkLoadDescriptor bld) {

@wchevreuil wchevreuil Jun 19, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need to be static.

@wchevreuil wchevreuil left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions:

  1. Do we know if the source always retries the same sink, in the event of rpc timeout? Otherwise, we may still have same bulkload entry submitted more than once.

  2. Isn't the throttling going to increase likelihood for such RPC timeouts? If so, can we mention that on code comments?

Coordinate replicated bulkload WAL events across target region servers using ZooKeeper in-progress and completed markers.

Add master-side cleanup for completed markers and cover cross-RS retry replay with MiniCluster.
@mini666

mini666 commented Jun 21, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review.

I added a follow-up commit to handle the cross-target-RS retry case. The sink now coordinates replicated bulkload WAL events through ZooKeeper using in-progress and completed markers, so a retry sent to a different target region server can observe that the same bulkload event has already completed and skip reloading the HFiles.

I also added a master-side chore to clean expired completed markers, while keeping them if a matching in-progress marker still exists.

For coverage, I added unit tests for the ZooKeeper event tracker and cleanup chore, a sink-level completed-marker skip test, and a MiniCluster test that replays the exact same bulkload WAL batch first through one target RS sink and then through another target RS sink.

I verified the change with the focused bulkload replication tests, including the new cross-target-RS retry MiniCluster test. The run passed with 13 tests, 0 failures, 0 errors, and 0 skipped.

Keep using BulkLoadDescriptor cluster ids when invoking HFileReplicator so replicated bulkload RPCs do not treat the target cluster as already handled. The ZooKeeper event tracking identity remains unchanged.

@wchevreuil wchevreuil left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the wait. I like the tracker idea, but there are some caveats. Please see my comments inline.

}
}

public void choreForTesting() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for this extra method. Tests in the same package can simply call chore() directly. And the super class already defines same method, only difference being it's synchronized there.

Comment on lines +31 to +33
/**
* Cleans completed replicated bulk load event markers after they are old enough.
*/

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add explanation about how the new configuration properties influence this chore behaviour.

Comment on lines +41 to +42
@InterfaceAudience.Private
public class ReplicationBulkLoadEventTracker {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a tracker for this kind of events is a good idea, but we should not make it tight to ZK. There's been an effort to gradually move HBase features out of ZK. Replication queue storage, for example, already has a ZKless alternative (see HBASE-27109).

This tracker, should follow similar approach. We should have it as interface/abstract class and provide both ZK based and table based implementations. However, I don't think this should be implemented in this same PR. My suggestion for you then is to move this tracking logic out of this PR, let's leave this one simply handling the issue on individual RS level. Then once this is merged, you can file a new jira/PR to implement the tracking logic at a global level, covering both ZK and table based registries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants