Cap refresh start when hypertable has tiered data by kpan2034 · Pull Request #9811 · timescale/timescaledb

kpan2034 · 2026-05-13T15:11:47Z

Currently, a refresh also processes the ranges before the earliest chunk in the hypertable. This can leave data missing from the CAgg when tiered data is present but tiered reads are disabled during the refresh.

By capping the refresh range at the start of the earliest chunk, invalidations before the earliest chunk are no longer processed and removed. A subsequent refresh with tiered reads enabled can then materialize that data in the CAgg.

We only need to do this when the hypertable has tiered data. Any new tiered data will either have been processed before tiering (since it exists in some chunk) or will be inserted after this refresh (which will generate invalidations).

We also cap the refresh start when generating batches for an incremental refresh.
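
A minimal sketch of the capping rule described above, not the code in this PR. The type RefreshWindow and the function cap_refresh_start are hypothetical names for illustration, and the window is assumed to be a half-open integer range [start, end). Clamping to an empty window when the cap passes the end is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a refresh window: [start, end). */
typedef struct RefreshWindow
{
	int64_t start;
	int64_t end;
} RefreshWindow;

/*
 * Cap the window start at the start of the earliest chunk, but only when
 * the hypertable has tiered data. Invalidations before the cap are left
 * alone so that a later refresh with tiered reads enabled can still
 * process them.
 */
static RefreshWindow
cap_refresh_start(RefreshWindow window, bool has_tiered_data, int64_t earliest_chunk_start)
{
	if (has_tiered_data && window.start < earliest_chunk_start)
		window.start = earliest_chunk_start;

	/* Assumption: if the cap moves the start past the end, the window is empty. */
	if (window.start > window.end)
		window.start = window.end;

	return window;
}
```

For a window [0, 100) and an earliest chunk start of 40, the capped window is [40, 100) when tiered data exists and stays [0, 100) otherwise.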

codecov · 2026-06-05T20:44:34Z

Codecov Report

❌ Patch coverage is 87.17949% with 5 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
tsl/src/continuous_aggs/invalidation.c	81.25%	1 Missing and 2 partials ⚠️
tsl/src/continuous_aggs/refresh.c	77.77%	0 Missing and 2 partials ⚠️

github-actions · 2026-06-05T21:13:52Z

@melihmutlu, @natalya-aksman: please review this pull request.

Capping is done incorrectly when the OSM chunk is the earliest chunk.

Incremental batch generation would generate batches considering tiered data ranges, even if tiered reads are disabled. This can lead to invalid batches since we cap the refresh start to the earliest chunk in the hypertable during the refresh.

melihmutlu · 2026-06-15T10:09:54Z

+----------------------------------------------------------------------
+-- Test 5: When the OSM chunk's range is updated to precede the
+-- earliest real chunk, the wrong dimension slice is picked up
+-- and the refresh is not capped correctly.
+----------------------------------------------------------------------


The comment needs an update, since the refresh is now capped correctly.

melihmutlu · 2026-06-15T11:16:23Z

+	*slice = dimension_slice_from_slot(ti->slot);
+	MemoryContextSwitchTo(old);
+	Chunk *chunk = ts_chunk_get_by_id((*slice)->fd.chunk_id, true);
+
+	if (IS_OSM_CHUNK(chunk))
+	{
+		return SCAN_CONTINUE;
+	}


*slice is assigned here whether or not the chunk is an OSM chunk. Imagine a case where there is no non-OSM chunk and all data is tiered: we would end up with *slice already assigned. The caller assumes the slice is non-OSM as long as it is not NULL, which does not hold in this case.

We should either check again in the caller, or set *slice to NULL when the IS_OSM_CHUNK check here finds an OSM chunk.
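
A minimal sketch of the second option (resetting the out-parameter when the chunk is an OSM chunk), using mock types rather than the TimescaleDB scanner API. SliceSketch, ScanResultSketch, first_non_osm_slice, and scan_slices are all hypothetical names; the real callback would also have to deal with the memory allocated for the slice, which this sketch ignores.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scan result codes and a dimension slice. */
typedef enum
{
	SCAN_DONE_SKETCH,
	SCAN_CONTINUE_SKETCH
} ScanResultSketch;

typedef struct SliceSketch
{
	int chunk_id;
	bool is_osm; /* stands in for IS_OSM_CHUNK(chunk) */
} SliceSketch;

/*
 * Per-tuple callback: the slice is assigned first, as in the diff, but it
 * is reset to NULL before continuing past an OSM chunk, so the caller can
 * rely on "not NULL" meaning "non-OSM".
 */
static ScanResultSketch
first_non_osm_slice(SliceSketch *candidate, SliceSketch **slice)
{
	*slice = candidate;

	if (candidate->is_osm)
	{
		*slice = NULL;
		return SCAN_CONTINUE_SKETCH;
	}

	return SCAN_DONE_SKETCH;
}

/* Drive the callback over slices ordered by range start. */
static SliceSketch *
scan_slices(SliceSketch *slices, size_t n)
{
	SliceSketch *found = NULL;

	for (size_t i = 0; i < n; i++)
	{
		if (first_non_osm_slice(&slices[i], &found) == SCAN_DONE_SKETCH)
			break;
	}

	return found;
}
```

With only OSM slices, scan_slices returns NULL; with an OSM slice followed by a regular one, it returns the regular slice.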

melihmutlu · 2026-06-15T11:32:12Z

+	int64 earliest_start = invalidation_get_earliest_chunk_start(cagg->data.raw_hypertable_id);
+	if (earliest_start != INVAL_NEG_INFINITY)
+	{
+		Invalidation boundary = { .lowest_modified_value = earliest_start,
+								  .greatest_modified_value = earliest_start };
+		invalidation_expand_to_bucket_boundaries(&boundary,
+												 cagg->partition_type,
+												 cagg->bucket_function);
+		earliest_start = boundary.greatest_modified_value;
+	}


I understand why we want to ignore anything before the value initially returned by invalidation_get_earliest_chunk_start. But why do we move earliest_start to the end of its bucket, which is a later value?

I feel like skipping the whole bucket as if it were not invalidated may trigger a rewrite to use the cagg when that specific bucket is actually stale in the cagg.
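
The concern can be made concrete with fixed-width integer buckets. This is only an arithmetic sketch: bucket_start and bucket_end_inclusive are hypothetical helpers, and it is an assumption that the expanded greatest_modified_value corresponds to an inclusive bucket end. Overflow near the int64 limits is not handled.

```c
#include <stdint.h>

/* Start of the fixed-width bucket containing value; assumes width > 0. */
static int64_t
bucket_start(int64_t value, int64_t width)
{
	int64_t rem = value % width;

	/* C's % can return a negative remainder; normalize it to [0, width). */
	if (rem < 0)
		rem += width;

	return value - rem;
}

/* Last value that still falls in the bucket containing value (assumed inclusive). */
static int64_t
bucket_end_inclusive(int64_t value, int64_t width)
{
	return bucket_start(value, width) + width - 1;
}
```

With a bucket width of 10 and an earliest chunk start of 43, the containing bucket covers 40 through 49. Moving earliest_start to 49 means values 43 through 49, which live in a regular chunk, fall before the cap along with the tiered range 40 through 42.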

kpan2034 self-assigned this May 13, 2026

kpan2034 added the Continuous Aggregate label May 13, 2026

kpan2034 force-pushed the cap-invals-at-chunk-min branch from 0ef8281 to 3e5c07d Compare May 27, 2026 14:28

vineethapai added this to the v2.28.0 milestone Jun 1, 2026

kpan2034 force-pushed the cap-invals-at-chunk-min branch from 3e5c07d to 221041a Compare June 5, 2026 20:33

kpan2034 force-pushed the cap-invals-at-chunk-min branch from 221041a to fbd5fcd Compare June 5, 2026 21:13

kpan2034 marked this pull request as ready for review June 5, 2026 21:13

github-actions Bot requested review from melihmutlu and natalya-aksman June 5, 2026 21:13

kpan2034 force-pushed the cap-invals-at-chunk-min branch from fbd5fcd to 1ed30b3 Compare June 5, 2026 21:20

kpan2034 changed the title ~~Cap refresh range start at dimension minimum~~ Cap refresh start when hypertable has tiered data Jun 5, 2026

kpan2034 force-pushed the cap-invals-at-chunk-min branch from 1ed30b3 to 3ddfbf5 Compare June 5, 2026 21:23

kpan2034 requested review from pnthao and removed request for natalya-aksman June 5, 2026 21:24

melihmutlu reviewed Jun 8, 2026

Comment thread tsl/src/continuous_aggs/invalidation.c Outdated

kpan2034 force-pushed the cap-invals-at-chunk-min branch 2 times, most recently from f4a57f4 to e9c4ce4 Compare June 10, 2026 20:53

kpan2034 added 3 commits June 10, 2026 14:59

Add test to show incorrect capping

c21db67

Capping is done incorrectly when the OSM chunk is the earliest chunk.

Fix capping when OSM chunk is earliest

ddca3ff

kpan2034 force-pushed the cap-invals-at-chunk-min branch from e9c4ce4 to ddca3ff Compare June 10, 2026 20:59

kpan2034 force-pushed the cap-invals-at-chunk-min branch from 6d40793 to ff7856d Compare June 10, 2026 23:43

melihmutlu reviewed Jun 15, 2026

philkra modified the milestones: v2.28.0, v2.28.1, 2.28.2 Jun 15, 2026

Cap refresh start when hypertable has tiered data #9811
kpan2034 wants to merge 4 commits into timescale:main from kpan2034:cap-invals-at-chunk-min

6 participants
