Skip to content

tlog: Add MTC subtree primitives#8808

Open
beautifulentropy wants to merge 5 commits into
mainfrom
tlog-subtree
Open

tlog: Add MTC subtree primitives#8808
beautifulentropy wants to merge 5 commits into
mainfrom
tlog-subtree

Conversation

@beautifulentropy

@beautifulentropy beautifulentropy commented Jun 17, 2026

Copy link
Copy Markdown
Member

Implement the generation and verification of MTC subtree consistency proofs, which prove that an aligned interval of the log (a subtree) belongs to the tree under a given root hash. The MTC draft defines these in section 4.

x/mod's sumdb/tlog provides the hash primitives and the stored-hash reader these functions use. Its proofs cover only two cases: an inclusion proof for a single entry and a consistency proof for a prefix of the log. MTC requires a proof for any subtree; x/mod has no such proof.

Export four functions:

A note to reviewers: this package and its tests were written with assistance from Claude Opus 4.8 and Fable 5. That being said, it has been reviewed and extensively revised by myself before submission.

@beautifulentropy beautifulentropy changed the title tlog: Add MTC subtree generation and verification tlog: Add MTC subtree primitives Jun 17, 2026
@beautifulentropy beautifulentropy marked this pull request as ready for review June 17, 2026 18:58
@beautifulentropy beautifulentropy requested a review from a team as a code owner June 17, 2026 18:58
@beautifulentropy beautifulentropy requested a review from jsha June 22, 2026 17:46

@jsha jsha left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a first look, my high level comments:

There are a number of places where subtree.go panics. I notice that the upstream tlog.go panics in a bunch of places where the match is wrong too, but I'd like to see if we can have fewer panic sites to reason about. For instance, what if ValidSubtree() returns a Subtree() object, which has accessors for Hi(), Lo(), Size(), and Height(). We can guarantee in the contract for Size() and Height(), for instance, that they are positive (non-negative and non-zero). This might also let us consolidate the various //nolint: gosec // G115 correctness comments into a smaller number of places.

We could make this into a constructor:

type Subtree struct {
  start, len, treeSize uint64
}

func New(start, end, treeSize int64) (Subtree, error) {}

I notice that both internal callers of ValidSubtree() check end > treeSize immediately after. That suggests to me that ValidSubtree() should be a function of not just start and end, but also of tree size, so that we can check that property as part of it.

Comment thread trees/subtree/subtree.go
}
// bitCeil is BIT_CEIL(end-start). A multiple of a power of two has its low
// bits zero, so start & (bitCeil-1) == 0 becomes our validity test.
bitCeil := uint64(1) << bits.Len64(uint64(end-start-1)) //nolint:gosec // G115: start < end, so end-start-1 is non-negative.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments and the spec say we should use BIT_CEIL(end-start), but the code says end-start-1. Why's that?

Also, a nit: it's nice for the correctness comment to be easily matched to the code that enforces it. So saying "start >= end, so end-start is positive" would be a little easier to validate. Also note that "positive" is a stronger constraint than "non-negative" since it excludes zero.

@beautifulentropy beautifulentropy Jun 26, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments and the spec say we should use BIT_CEIL(end-start), but the code says end-start-1. Why's that?

Fair, there needs to be a much better comment above this line explaining what's going on here; I'll add that.

How about:

	// start must be a multiple of BIT_CEIL(end-start). bits.Len64(x) is the bit
	// width of x, so 1<<bits.Len64(x) is the smallest power of two strictly
	// above x, an exclusive ceiling. BIT_CEIL(x) is inclusive, the smallest
	// power of two at least x, so we apply it to end-start-1.
	bitCeil := uint64(1) << bits.Len64(uint64(end-start-1)) //nolint:gosec // G115: the start >= end check above leaves end-start positive, so end-start-1 is non-negative.

	// bitCeil-1 masks the bits below bitCeil, so start & (bitCeil-1) is zero
	// exactly when start is a multiple of bitCeil.
	return uint64(start)&(bitCeil-1) == 0
}

@beautifulentropy beautifulentropy Jun 26, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, a nit: it's nice for the correctness comment to be easily matched to the code that enforces it. So saying "start >= end, so end-start is positive" would be a little easier to validate. Also note that "positive" is a stronger constraint than "non-negative" since it excludes zero.

Yes, start > end should read start >= end. But end - start - 1 is non-negative, not positive. A single leaf, for instance [7, 8), is still a valid subtree, and 8 - 7 - 1 = 0.

How about:

//nolint:gosec // G115: the start >= end check above leaves end-start positive (at least 1), so end-start-1 is non-negative.

Comment thread trees/subtree/subtree.go
Comment on lines +49 to +50
// bitCeil is BIT_CEIL(end-start). A multiple of a power of two has its low
// bits zero, so start & (bitCeil-1) == 0 becomes our validity test.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sentence "A multiple of a power of two has its low bits zero ..." should go one line lower, right above where we perform this calculation.

Comment thread tlog/subtree.go Outdated
Comment on lines +31 to +38
}

// Split the list into two subtree roots, the left being a "perfect" subtree
// and the right being the remainder which may or may not be perfect.
k := largestPowerOfTwoSmallerThan(int64(len(leaves)))

// Hash the two subtree roots together as SHA-256(0x01 || left || right).
return xtlog.NodeHash(SubtreeHash(leaves[:k]), SubtreeHash(leaves[k:]))

@jsha jsha Jun 25, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO these lines should be combined into a default: clause of the switch, to emphasize the fact that there are three different possible outcomes, and this case is only possible when len(leaves) is >=2.

Edit: see above, I think we can get rid of this function.

Comment thread tlog/subtree.go Outdated
return int64(1) << (bits.Len64(uint64(n-1)) - 1) //nolint:gosec // G115: n > 1, so n-1 is positive.
}

// SubtreeHash returns the RFC 6962 section 2.1 Merkle Tree Hash over leaves

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since a subtree is a Merkle tree, this calculates a Merkle Tree Hash. I think this function is equivalent to tlog.TreeHash and we can reuse that, right?

I note that this function takes a list of leaves in-memory, while tlog.TreeHash takes a HashReader, which generalizes over hash storage. But if we want to use leaves in-memory, we can pass a HashReaderFunc that accesses them.

So, I think we can do without this function.

Comment thread tlog/subtree.go Outdated
Comment on lines +93 to +98
// rangeHash returns MTH(D[lo:hi)), the RFC 6962 section 2.1 Merkle Tree Hash
// over the leaves in [lo, hi) as an independent list, read through the provided
// reader. It decomposes [lo, hi) into its maximal aligned perfect subtrees and
// reads all of their roots in a single ReadHashes call before folding them
// together.
func rangeHash(lo, hi int64, reader xtlog.HashReader) (xtlog.Hash, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can also be implemented in terms of tlog.TreeHash(hi-lo, someReader).

The distinction is that TreeHash will make storage calls starting from zero, but we need to make sure our storage layer is correctly indexing based on the subtree we want. I think we can achieve that with a tlog.HashReaderFunc:

func subtreeHash(lo, hi int64, reader xtlog.HashReader) (xtlog.Hash, error) {
  return tlog.TreeHash(hi-lo, func(zeroBasedIndexes []int64) []Hash {
    var absoluteIndexes []int64
    for _, i := range zeroBasedIndexes {
      absoluteIndexes = append(absoluteIndexes, i+lo)
    }
    return reader.ReadHashes(absoluteIndexes)
  }
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the HashReaderFunc is very clever, but the way you're calculating the offset is subtly wrong. The indexes TreeHash hands our reader aren't leaf numbers. They're stored hash indexes, a flat numbering of every node in the tree, with leaves and internal nodes interleaved. So i + lo is adding a leaf count to something that isn't a leaf position. The offset has to account for the node's level. A leaf may be lo away from another leaf, but a node spanning 8 leaves moves by only lo / 8.

So instead of offsetting the index, we decode it, shift its position by lo scaled to that level, and re-encode:

level, n := tlog.SplitStoredHashIndex(idx)
absolute := tlog.StoredHashIndex(level, (lo>>level)+n)

Which gives us:

func rangeHash(lo, hi int64, reader tlog.HashReader) (tlog.Hash, error) {
	return tlog.TreeHash(hi-lo, tlog.HashReaderFunc(
		func(zeroBasedIndexes []int64) ([]tlog.Hash, error) {
			absoluteIndexes := make([]int64, len(zeroBasedIndexes))
			for i, idx := range zeroBasedIndexes {
				level, n := tlog.SplitStoredHashIndex(idx)
				absoluteIndexes[i] = tlog.StoredHashIndex(level, (lo>>level)+n)
			}
			return reader.ReadHashes(absoluteIndexes)
		}))
}

But fixing the offset surfaces our next problem. lo>>level is only exact when lo is a multiple of 2^level, which holds for every range we currently pass to rangeHash(). If a later change passes a lo that isn't, the shift drops bits, the index points to a different node, and we silently fold a wrong hash.

So we would need to add a guard:

func rangeHash(lo, hi int64, reader tlog.HashReader) (tlog.Hash, error) {
	return tlog.TreeHash(hi-lo, tlog.HashReaderFunc(
		func(zeroBasedIndexes []int64) ([]tlog.Hash, error) {
			absoluteIndexes := make([]int64, len(zeroBasedIndexes))
			for i, idx := range zeroBasedIndexes {
				level, n := tlog.SplitStoredHashIndex(idx)
				loAtLevel := lo >> level
				if loAtLevel<<level != lo {
					return nil, fmt.Errorf("subtreeHash: lo=%d is not a multiple of 2^%d", lo, level)
				}
				absoluteIndexes[i] = tlog.StoredHashIndex(level, loAtLevel+n)
			}
			return reader.ReadHashes(absoluteIndexes)
		}))
}

For what it's worth, your idea does work. That said, the version we have doesn't require this awkward translation layer that's somewhat difficult to reason about.

Comment thread trees/subtree/subtree.go
// that the leaves are an aligned subtree.
//
// https://datatracker.ietf.org/doc/html/rfc9162#section-2.1.1
func Hash(leaves []tlog.Hash) tlog.Hash {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API shape is surprising to me. Being willing to hash an arbitrary set of leaves a) feels like something to be left for a general package (like xtlog itself), not subtree; and b) feels dangerous! What if the caller doesn't respect the contract that they need to make sure it's a valid subtree first?

I think this package should only expose func SubtreeHash(start, end int64, reader tlog.HashReader) tlog.Hash. It can verify that the interval is a valid subtree, then read all the necessary hashes from the reader itself. This guarantees that we never supply an invalid set of input hashes.

I think the existing API of this function might be acceptable as a private helper, but not as part of the public interface of the package.

Comment thread trees/subtree/subtree.go
Comment on lines +11 to +16
// largestPowerOfTwoSmallerThan returns the largest power of two strictly less
// than n, for n > 1. n <= 1 results in a panic.
func largestPowerOfTwoSmallerThan(n int64) int64 {
if n <= 1 {
panic(fmt.Sprintf("n must be > 1, got %d", n))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. There are multiple callers of this function that don't (at least, not obviously to me as a reader) guarantee that n > 1. Seems like we're risking a panic.
  2. Why is panicking the correct answer? What goes wrong if we return 0? Or have an explicit error return?

These comments may be obviated if you change the function signature entirely as I suggest above.

Comment thread trees/subtree/subtree.go
Comment on lines +57 to +58
// perfectSubtree reports whether [start, end) is an aligned perfect subtree
// (power-of-two size, start aligned to that size), and if so its level.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it the case that a (hypothetical) unaligned perfect subtree would simply not be a valid subtree at all?

If so, can't this be simplified to bits.OnesCount64(end-start) == 1 && valid(start, end)?

Comment thread trees/subtree/subtree.go
return h, nil
}

func appendIntervalHash(start, end int64, reader tlog.HashReader, proof []tlog.Hash) ([]tlog.Hash, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the broken out into a helper function? It has exactly one caller, no documentation, and is just a single stanza. I suggest inlining it into subtreeSubProof.

Comment thread trees/subtree/subtree.go
Comment on lines +87 to +95
func perfectSubtreeIndexes(start, end int64, storedHashIndexes []int64) []int64 {
level, ok := perfectSubtree(start, end)
if ok {
return append(storedHashIndexes, tlog.StoredHashIndex(level, start>>level))
}
k := largestPowerOfTwoSmallerThan(end - start)
storedHashIndexes = perfectSubtreeIndexes(start, start+k, storedHashIndexes)
return perfectSubtreeIndexes(start+k, end, storedHashIndexes)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a very non-idiomatic form of recursion, passing the partial result into each sub-call. I think we can get the exact same behavior much more simply:

Suggested change
func perfectSubtreeIndexes(start, end int64, storedHashIndexes []int64) []int64 {
level, ok := perfectSubtree(start, end)
if ok {
return append(storedHashIndexes, tlog.StoredHashIndex(level, start>>level))
}
k := largestPowerOfTwoSmallerThan(end - start)
storedHashIndexes = perfectSubtreeIndexes(start, start+k, storedHashIndexes)
return perfectSubtreeIndexes(start+k, end, storedHashIndexes)
}
func perfectSubtreeIndexes(start, end int64) []int64 {
level, ok := perfectSubtree(start, end)
if ok {
return []int64{tlog.StoredHashIndex(level, start>>level))}
}
k := largestPowerOfTwoSmallerThan(end - start)
return append(perfectSubtreeIndexes(start, start+k), perfectSubtreeIndexes(start+k, end)...)
}

Comment thread trees/subtree/subtree.go

// largestPowerOfTwoSmallerThan returns the largest power of two strictly less
// than n, for n > 1. n <= 1 results in a panic.
func largestPowerOfTwoSmallerThan(n int64) int64 {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: to make the purpose of this function even more obvious, I'd change the signature to:

// splitPoint returns the index at which the given subtree should be split in order to
// produce a perfect subtree on the left and a (potentially) ragged-right subtree on the right.
func splitPoint(start, end int64) int64 {

This would allow the function's return value to directly correspond to mid as used in draft-ietf-plants-merkle-tree-certs-04. It would also allow you to simplify code like below:

-	k := largestPowerOfTwoSmallerThan(end - start)
-	left, rest := combineIntervalHash(start, start+k, hashes)
-	right, rest := combineIntervalHash(start+k, end, rest)
+	mid := splitPoint(start, end)
+	left, rest := combineIntervalHash(start, mid, hashes)
+	right, rest := combineIntervalHash(mid, end, rest)

Comment thread trees/subtree/subtree.go
// through the provided reader. It splits [start, end) into the largest
// power-of-two subtrees the tree already keeps a single stored hash for, reads
// those hashes in a single ReadHashes call, and combines them.
func intervalHash(start, end int64, reader tlog.HashReader) (tlog.Hash, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is only ever used to hash valid subtrees, not arbitrary intervals. So why are we changing language to talk about intervals instead of subtrees?

Comment thread trees/subtree/subtree.go
Comment on lines +103 to +114
indexes := perfectSubtreeIndexes(start, end, nil)
hashes, err := reader.ReadHashes(indexes)
if err != nil {
return tlog.Hash{}, err
}
if len(hashes) != len(indexes) {
// Reader returned a slice shorter or larger than the requested indexes.
// Avoid panicking in combineIntervalHash.
return tlog.Hash{}, fmt.Errorf("ReadHashes returned %d hashes for %d indexes", len(hashes), len(indexes))
}
h, _ := combineIntervalHash(start, end, hashes)
return h, nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels too clever, as though you're optimizing for a case we don't actually have (and don't want to support). Even if we do want to compute hashes of arbitrary intervals (which as I note above, I don't think we do -- I'm pretty sure all of our inputs to this function are valid subtrees), the MTC draft shows that any arbitrary interval can be decomposed into two subtrees: a perfect left subtree, and a ragged right subtree.

This function's approach is to do a big recursive "find all the indices", followed by a second recursive "combine the hashes". But the whole thing can be done in a single, much-more-simply recursive "look up hash of left perfect tree, combine with recursive computation of hash of right ragged tree".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants