Add compact_chunk function #9957
@svenklemm, @kpan2034: please review this pull request.
This new function compacts a chunk by looking for overlapping batches and combining them to produce a globally ordered chunk. This change is a first step towards supporting direct compress in production workloads.
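To illustrate the idea in the description, here is a minimal standalone sketch of merging overlapping batches. It is not the code from this PR: `BatchRange`, `merge_overlapping`, and the use of a single integer key are assumptions made for illustration, and the strict `>` overlap test simply mirrors the `08:20 > 08:11` example in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batch summary: smallest and largest ordering key in a batch. */
typedef struct BatchRange
{
	long first;
	long last;
} BatchRange;

/*
 * Merge overlapping ranges in place. The input must be sorted by `first`.
 * Two neighbors overlap when the previous `last` is greater than the next
 * `first`. Returns the number of ranges left after merging.
 */
static size_t
merge_overlapping(BatchRange *batches, size_t n)
{
	if (n == 0)
		return 0;

	size_t out = 0; /* index of the range currently being extended */

	for (size_t i = 1; i < n; i++)
	{
		/* Precondition: input is ordered by the first key. */
		assert(batches[i - 1].first <= batches[i].first);

		if (batches[out].last > batches[i].first)
		{
			/* Overlap: extend the current range if the new one ends later. */
			if (batches[i].last > batches[out].last)
				batches[out].last = batches[i].last;
		}
		else
		{
			/* No overlap: start a new output range. */
			batches[++out] = batches[i];
		}
	}

	return out + 1;
}
```

With the ranges `{800, 820}`, `{811, 830}`, `{900, 910}`, the first two are combined into `{800, 830}` and the third is left alone. In the real function, combining would mean recompressing the affected batches together rather than just widening a range.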
| 08:20 > 08:11 → OVERLAP → merge | ||
| ``` | ||
| ### 6. Mixed-null batch overlaps a neighbor |
Do we still have special handling for nulls? I thought the firstlast index just handles this transparently, i.e. the comparison gives total order that includes the nulls, and we just apply the comparison regardless of the values.
The only special handling in place is that nulls get put into a separate batch... I guess that's not necessary anymore and is more of an artifact of the previous version.
Do you think that's fine, or should we handle it without separating the NULL batch?
I'd remove it because it's just extra logic for no particular reason, and it looks confusing.
| * using each column's NULLS FIRST/LAST setting. | ||
| */ | ||
| static bool | ||
| batches_overlap_firstlast(RecompressContext *recompress_ctx, Datum *prev_last, |
This is basically a comparison of tuples on particular columns. Would it be more convenient to reuse the standard Postgres SortSupport and ApplySortComparator functions?
Can take a look if it makes our lives easier.
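For reference, a standalone sketch of what a null-aware, multi-column overlap check could look like when the comparison is a total order that includes nulls. It follows the same shape as Postgres's `ApplySortComparator` (nulls are placed by the NULLS FIRST/LAST setting first, then values are compared), but it does not use the Postgres headers: `NullableInt`, `ColumnOrder`, `compare_column`, `compare_keys`, and `batches_overlap_sketch` are all hypothetical names, and integer keys stand in for `Datum` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a (Datum, isnull) pair. */
typedef struct NullableInt
{
	long value;
	bool isnull;
} NullableInt;

/* Hypothetical per-column sort settings. */
typedef struct ColumnOrder
{
	bool nulls_first;
	bool descending;
} ColumnOrder;

/* Compare one column; nulls are ordered by nulls_first, not by direction. */
static int
compare_column(NullableInt a, NullableInt b, ColumnOrder ord)
{
	if (a.isnull || b.isnull)
	{
		if (a.isnull && b.isnull)
			return 0;
		if (a.isnull)
			return ord.nulls_first ? -1 : 1;
		return ord.nulls_first ? 1 : -1;
	}

	int cmp = (a.value > b.value) - (a.value < b.value);
	return ord.descending ? -cmp : cmp;
}

/* Lexicographic comparison of two keys over ncols columns. */
static int
compare_keys(const NullableInt *a, const NullableInt *b, const ColumnOrder *ord, size_t ncols)
{
	assert(ncols > 0);

	for (size_t i = 0; i < ncols; i++)
	{
		int cmp = compare_column(a[i], b[i], ord[i]);
		if (cmp != 0)
			return cmp;
	}
	return 0;
}

/* Batches overlap when the previous batch's last key sorts after the next batch's first key. */
static bool
batches_overlap_sketch(const NullableInt *prev_last, const NullableInt *next_first,
					   const ColumnOrder *ord, size_t ncols)
{
	return compare_keys(prev_last, next_first, ord, ncols) > 0;
}
```

Because nulls take part in the same ordering as every other value, a batch whose last key is `(1, NULL)` overlaps a neighbor starting at `(1, 5)` under NULLS LAST, and does not under NULLS FIRST, without any separate null batch.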
akuzm
left a comment
Looks much more straightforward with the new sparse index type.