feat: coalesce contiguous range reads for partial segment downloads by capistrant · Pull Request #19652 · apache/druid

capistrant · 2026-07-02T21:57:27Z

Description

First cut at coalescing range reads for partial segment downloads. The goal is to combine what would have been multiple reads of contiguous (or near-contiguous, see below) ranges into a single read when certain thresholds are met. I say near-contiguous because this PR adds a knob that defines the largest gap, in bytes, between required reads that may be opportunistically read through so that one ranged read covers two required internal files. If unneeded files are downloaded as part of a coalesced range, they are still marked downloaded on the host and become queryable. This technically means never-requested data can become resident on the data server and cause evictions on a host under disk pressure. Operators can lower or raise the new configs to make coalescing less or more aggressive. The hope is that the defaults are generally good for all, but they may need tuning after real-world learnings.

Configs

| Config | Default | Description |
| --- | --- | --- |
| `druid.segmentCache.virtualStorageCoalesceMaxGapBytes` | `1048576` (1 MiB) | Largest unwanted gap, in bytes, read through to merge two adjacent requested files into a single deep-storage range read. Larger values trade over-fetched bytes for fewer requests; `0` merges only truly-adjacent files. Must be `>= 0`. |
| `druid.segmentCache.virtualStorageCoalesceMaxChunkBytes` | `16777216` (16 MiB) | Largest size, in bytes, of a single coalesced range read. Bounds how big one fetch can grow and keeps a wide request split into several reads that can download concurrently rather than collapsing into one serial read. A single file larger than this is still fetched whole (the cap only limits how many files are merged). Must be `>= 1`. |

Both are validated at startup; an invalid value fails service startup with a clear message.
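
To make the interaction between the two knobs concrete, here is a minimal sketch of a greedy coalescing pass. It is not the code in this PR: the names `RangeCoalesceSketch`, `ByteRange`, and `coalesce` are hypothetical, ranges are assumed to be half-open `[start, end)` byte offsets of internal files, and the constructor checks only mirror the `>= 0` and `>= 1` constraints listed in the table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not a class from this PR.
final class RangeCoalesceSketch
{
  /** Half-open byte range [start, end) of one requested internal file. */
  static final class ByteRange
  {
    final long start;
    final long end;

    ByteRange(long start, long end)
    {
      if (start < 0 || end < start) {
        throw new IllegalArgumentException("invalid range [" + start + ", " + end + ")");
      }
      this.start = start;
      this.end = end;
    }
  }

  private final long maxGapBytes;
  private final long maxChunkBytes;

  RangeCoalesceSketch(long maxGapBytes, long maxChunkBytes)
  {
    // Same bounds as the config table: gap >= 0, chunk >= 1.
    if (maxGapBytes < 0) {
      throw new IllegalArgumentException("maxGapBytes must be >= 0, got " + maxGapBytes);
    }
    if (maxChunkBytes < 1) {
      throw new IllegalArgumentException("maxChunkBytes must be >= 1, got " + maxChunkBytes);
    }
    this.maxGapBytes = maxGapBytes;
    this.maxChunkBytes = maxChunkBytes;
  }

  /** Merges requested ranges into the reads that would be issued, in offset order. */
  List<ByteRange> coalesce(List<ByteRange> requested)
  {
    List<ByteRange> sorted = new ArrayList<>(requested);
    sorted.sort(Comparator.comparingLong((ByteRange r) -> r.start));

    List<ByteRange> reads = new ArrayList<>();
    ByteRange current = null;
    for (ByteRange next : sorted) {
      if (current == null) {
        current = next;
        continue;
      }
      long gap = next.start - current.end;
      long mergedEnd = Math.max(current.end, next.end);
      long mergedLength = mergedEnd - current.start;
      if (gap <= maxGapBytes && mergedLength <= maxChunkBytes) {
        // Read through the gap: one request now covers both files.
        current = new ByteRange(current.start, mergedEnd);
      } else {
        // Gap too wide or chunk cap reached: emit the read and start a new one.
        reads.add(current);
        current = next;
      }
    }
    if (current != null) {
      reads.add(current);
    }
    return reads;
  }
}
```

In this sketch a file larger than `maxChunkBytes` is never split: it simply fails to merge with its neighbors and is emitted as its own read, which matches the "still fetched whole" behavior described for the chunk cap.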

New metric

| Metric | Description | Dimensions |
| --- | --- | --- |
| `storage/virtual/read/gapBytes` | Of `storage/virtual/read/bytes`, the bytes read that were not part of a requested file (unrequested files spanned to coalesce, plus inter-file padding). Ratio to `read/bytes` is the over-fetch fraction. | `location` |
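
As a rough illustration of how the over-fetch fraction falls out of this metric, the sketch below computes gap bytes for one coalesced read. The class and method names are hypothetical, and it assumes the requested files are non-overlapping half-open `[start, end)` ranges; it says nothing about how the PR actually emits the metric.

```java
// Hypothetical illustration only; not a class from this PR.
final class GapBytesSketch
{
  /**
   * Bytes of the read [readStart, readEnd) that are not covered by any requested file.
   * Each entry of requestedFiles is {start, end}; files are assumed not to overlap.
   */
  static long gapBytes(long readStart, long readEnd, long[][] requestedFiles)
  {
    long requestedBytes = 0;
    for (long[] file : requestedFiles) {
      long start = Math.max(file[0], readStart);
      long end = Math.min(file[1], readEnd);
      if (end > start) {
        requestedBytes += end - start;
      }
    }
    return (readEnd - readStart) - requestedBytes;
  }

  /** gapBytes / readBytes, or 0 when nothing was read. */
  static double overFetchFraction(long gapBytes, long readBytes)
  {
    return readBytes == 0 ? 0.0 : (double) gapBytes / readBytes;
  }
}
```

For example, a 100-byte read that covers requested files `[0, 40)` and `[60, 100)` has 20 gap bytes, so one fifth of the bytes read were over-fetched.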

Release note

Key changed/added classes in this PR

processing/src/main/java/org/apache/druid/segment/PartialQueryableIndex.java
processing/src/main/java/org/apache/druid/segment/PartialQueryableIndexCursorFactory.java
processing/src/main/java/org/apache/druid/segment/file/PartialSegmentFileMapperV10.java

This PR has:

Embedded test rename fix checkstyle

   * @param jsonMapper        used by the metadata entry's mount path to parse the header
   * @param storagePool       thread pool the async cursor path submits on-demand column downloads to (which bounds
   *                          load concurrency itself); may be null in tests that never invoke the cursor factory
+   * @param coalesceConfig    range-coalescing thresholds applied to on-demand downloads once the entry is mounted


jtuglu1 · 2026-07-03T03:35:35Z

Question 1: Per-historical, do we track in-flight range requests so we avoid duplicate calls? E.g. query 1 calls for range A, not available locally so we initiate fetch from S3, meanwhile query 2 calls for range A. Does query 2 know about the in-flight request to avoid sending a duplicate?

Question 2: how are we balancing coalescing ranged GET calls while maximizing parallelism at the S3 connection level?
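
For reference on Question 1, this is the kind of in-flight de-duplication being asked about, sketched with a `ConcurrentHashMap` of futures. It is purely illustrative and makes no claim about what this PR or Druid currently does: `InFlightRangeFetchSketch`, the string `rangeKey`, and the `fetcher` function are all hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical illustration only; not a class from this PR.
final class InFlightRangeFetchSketch
{
  private final ConcurrentMap<String, CompletableFuture<byte[]>> inFlight = new ConcurrentHashMap<>();
  private final Function<String, CompletableFuture<byte[]>> fetcher;

  InFlightRangeFetchSketch(Function<String, CompletableFuture<byte[]>> fetcher)
  {
    this.fetcher = fetcher;
  }

  /** Returns the shared future for rangeKey, starting a fetch only if none is in flight. */
  CompletableFuture<byte[]> fetch(String rangeKey)
  {
    CompletableFuture<byte[]> created = new CompletableFuture<>();
    CompletableFuture<byte[]> existing = inFlight.putIfAbsent(rangeKey, created);
    if (existing != null) {
      // Another caller already started this range; share its result.
      return existing;
    }
    // This caller owns the fetch. Drop the entry once it finishes, success or failure.
    created.whenComplete((bytes, error) -> inFlight.remove(rangeKey, created));
    try {
      fetcher.apply(rangeKey).whenComplete((bytes, error) -> {
        if (error != null) {
          created.completeExceptionally(error);
        } else {
          created.complete(bytes);
        }
      });
    }
    catch (RuntimeException e) {
      created.completeExceptionally(e);
    }
    return created;
  }

  int inFlightCount()
  {
    return inFlight.size();
  }
}
```

With this shape, query 2 asking for range A while query 1's fetch is pending receives the same future and no second deep-storage call is made; once the fetch completes the entry is removed, so a later request starts a fresh fetch unless something else (such as the on-disk cache) answers it first.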

first cut range read coalesce

97dbe94

Embedded test rename fix checkstyle

capistrant marked this pull request as draft July 2, 2026 21:57

github-actions Bot added the Area - Segment Format and Ser/De label Jul 2, 2026

github-advanced-security AI found potential problems Jul 2, 2026

capistrant wants to merge 1 commit into apache:master from capistrant:range-reading-coalesce
