2 changes: 1 addition & 1 deletion go.mod
@@ -38,6 +38,7 @@ require (
go.opentelemetry.io/otel/trace v1.43.0
go.yaml.in/yaml/v3 v3.0.4
golang.org/x/crypto v0.52.0
+golang.org/x/mod v0.35.0
golang.org/x/net v0.55.0
golang.org/x/sync v0.20.0
golang.org/x/term v0.43.0
@@ -85,7 +86,6 @@ require (
go.opentelemetry.io/otel/metric v1.43.0 // indirect
go.opentelemetry.io/proto/otlp v1.7.1 // indirect
go.uber.org/atomic v1.11.0 // indirect
-golang.org/x/mod v0.35.0 // indirect
golang.org/x/sys v0.45.0 // indirect
golang.org/x/tools v0.44.0 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20251202230838-ff82c1b0f217 // indirect
287 changes: 287 additions & 0 deletions trees/subtree/subtree.go
@@ -0,0 +1,287 @@
package subtree

import (
"crypto/sha256"
"fmt"
"math/bits"

"golang.org/x/mod/sumdb/tlog"
)

// largestPowerOfTwoSmallerThan returns the largest power of two strictly less
// than n, for n > 1. n <= 1 results in a panic.
func largestPowerOfTwoSmallerThan(n int64) int64 {

Contributor

nit: to make the purpose of this function even more obvious, I'd change the signature to:

// splitPoint returns the index at which the given subtree should be split in order to
// produce a perfect subtree on the left and a (potentially) ragged-right subtree on the right.
func splitPoint(start, end int64) int64 {

This would allow the function's return value to directly correspond to mid as used in draft-ietf-plants-merkle-tree-certs-04. It would also allow you to simplify code like below:

-	k := largestPowerOfTwoSmallerThan(end - start)
-	left, rest := combineIntervalHash(start, start+k, hashes)
-	right, rest := combineIntervalHash(start+k, end, rest)
+	mid := splitPoint(start, end)
+	left, rest := combineIntervalHash(start, mid, hashes)
+	right, rest := combineIntervalHash(mid, end, rest)

if n <= 1 {
panic(fmt.Sprintf("n must be > 1, got %d", n))
}
Comment on lines +11 to +16

Contributor

  1. There are multiple callers of this function that don't (at least, not obviously to me as a reader) guarantee that n > 1. Seems like we're risking a panic.
  2. Why is panicking the correct answer? What goes wrong if we return 0? Or have an explicit error return?

These comments may be obviated if you change the function signature entirely as I suggest above.

return int64(1) << (bits.Len64(uint64(n-1)) - 1) //nolint:gosec // G115: n > 1, so n-1 is positive.
}
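
The two review comments above suggest a `splitPoint(start, end)` signature and ask whether panicking is the right failure mode. A minimal sketch of how both ideas could fit together is shown here. The error return is an assumption made for illustration (the reviewer's proposed signature returns only `int64`), and nothing here is taken from the PR itself.

```go
package main

import (
	"fmt"
	"math/bits"
)

// splitPoint is a hypothetical helper following the reviewer's suggested
// signature, with an added error return (an assumption, not part of the PR).
// For an interval [start, end) holding at least two leaves it returns
// start + k, where k is the largest power of two strictly less than
// end - start. Shorter or malformed intervals yield an error, not a panic.
func splitPoint(start, end int64) (int64, error) {
	if start < 0 || end <= start || end-start < 2 {
		return 0, fmt.Errorf("interval [%d, %d) has fewer than two leaves", start, end)
	}
	n := end - start
	// bits.Len64(n-1) is the bit width of n-1, so shifting by one less gives
	// the largest power of two strictly below n.
	k := int64(1) << (bits.Len64(uint64(n-1)) - 1)
	return start + k, nil
}

func main() {
	for _, iv := range [][2]int64{{0, 2}, {0, 5}, {4, 8}, {8, 13}, {3, 4}} {
		mid, err := splitPoint(iv[0], iv[1])
		fmt.Println(iv, mid, err)
	}
}
```

With this shape the return value corresponds directly to `mid`, so callers recurse on `[start, mid)` and `[mid, end)` without adding `k` themselves.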

// Hash returns the RFC 9162 section 2.1.1 Merkle Tree Hash over leaves treated
// as an independent list. It combines the list it is given without checking
// that the leaves are an aligned subtree.
//
// https://datatracker.ietf.org/doc/html/rfc9162#section-2.1.1
func Hash(leaves []tlog.Hash) tlog.Hash {

Contributor

This API shape is surprising to me. Being willing to hash an arbitrary set of leaves a) feels like something to be left for a general package (like xtlog itself), not subtree; and b) feels dangerous! What if the caller doesn't respect the contract that they need to make sure it's a valid subtree first?

I think this package should only expose func SubtreeHash(start, end int64, reader tlog.HashReader) tlog.Hash. It can verify that the interval is a valid subtree, then read all the necessary hashes from the reader itself. This guarantees that we never supply an invalid set of input hashes.

I think the existing API of this function might be acceptable as a private helper, but not as part of the public interface of the package.

switch len(leaves) {
case 0:
// The hash of an empty list is the hash of an empty string.
return tlog.Hash(sha256.Sum256(nil))
case 1:
// The hash of a list with one entry is just the leaf hash.
return leaves[0]
}

// Split the list into two subtree roots, the left being a "perfect" subtree
// and the right being the remainder which may or may not be perfect.
k := largestPowerOfTwoSmallerThan(int64(len(leaves)))

// Combine the two parts' roots as SHA-256(0x01 || left || right).
return tlog.NodeHash(Hash(leaves[:k]), Hash(leaves[k:]))
}
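
To make the recursion in `Hash` concrete, here is a standard-library-only sketch of the same RFC 9162 structure. The names `hash`, `nodeHash`, `leafHash`, and `mth` are stand-ins invented for this example so it runs without `golang.org/x/mod`; they are not the PR's API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// hash is a stand-in for tlog.Hash (an assumption for this sketch).
type hash [sha256.Size]byte

// nodeHash computes SHA-256(0x01 || left || right), the interior node hash.
func nodeHash(left, right hash) hash {
	buf := make([]byte, 0, 1+2*sha256.Size)
	buf = append(buf, 0x01)
	buf = append(buf, left[:]...)
	buf = append(buf, right[:]...)
	return sha256.Sum256(buf)
}

// leafHash computes SHA-256(0x00 || data), the leaf hash.
func leafHash(data []byte) hash {
	return sha256.Sum256(append([]byte{0x00}, data...))
}

// mth mirrors the recursive shape of Hash over already-hashed leaves.
func mth(leaves []hash) hash {
	switch len(leaves) {
	case 0:
		// The hash of an empty list is the hash of the empty string.
		return sha256.Sum256(nil)
	case 1:
		return leaves[0]
	}
	// k is the largest power of two strictly less than len(leaves).
	k := 1 << (bits.Len(uint(len(leaves)-1)) - 1)
	return nodeHash(mth(leaves[:k]), mth(leaves[k:]))
}

func main() {
	a, b, c := leafHash([]byte("a")), leafHash([]byte("b")), leafHash([]byte("c"))
	// Three leaves split as 2 + 1: the root is node(node(a, b), c).
	fmt.Println(mth([]hash{a, b, c}) == nodeHash(nodeHash(a, b), c))
	fmt.Printf("%x\n", mth(nil))
}
```

Note that, like `Hash`, this sketch hashes whatever list it is given; it performs no alignment check, which is the concern raised in the review comment above.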

// valid reports whether [start, end) is a valid subtree per the MTC draft
// section 4.1 Definition of a Subtree: 0 <= start < end and start is a multiple
// of BIT_CEIL(end - start).
func valid(start, end int64) bool {
if start < 0 || start >= end {
// A subtree must have 0 <= start < end.
return false
}
// bitCeil is BIT_CEIL(end-start). A multiple of a power of two has its low
// bits zero, so start & (bitCeil-1) == 0 becomes our validity test.
Comment on lines +51 to +52

Contributor

The sentence "A multiple of a power of two has its low bits zero ..." should go one line lower, right above where we perform this calculation.

bitCeil := uint64(1) << bits.Len64(uint64(end-start-1)) //nolint:gosec // G115: start < end, so end-start-1 is non-negative.

Contributor

The comments and the spec say we should use BIT_CEIL(end-start), but the code says end-start-1. Why's that?

Also, a nit: it's nice for the correctness comment to be easily matched to the code that enforces it. So saying "start >= end, so end-start is positive" would be a little easier to validate. Also note that "positive" is a stronger constraint than "non-negative" since it excludes zero.

@beautifulentropy (Member, Author), Jun 26, 2026

> The comments and the spec say we should use BIT_CEIL(end-start), but the code says end-start-1. Why's that?

Fair, there needs to be a much better comment above this line explaining what's going on here; I'll add that.

How about:

	// start must be a multiple of BIT_CEIL(end-start). bits.Len64(x) is the bit
	// width of x, so 1<<bits.Len64(x) is the smallest power of two strictly
	// above x, an exclusive ceiling. BIT_CEIL(x) is inclusive, the smallest
	// power of two at least x, so we apply it to end-start-1.
	bitCeil := uint64(1) << bits.Len64(uint64(end-start-1)) //nolint:gosec // G115: the start >= end check above leaves end-start positive, so end-start-1 is non-negative.

	// bitCeil-1 masks the bits below bitCeil, so start & (bitCeil-1) is zero
	// exactly when start is a multiple of bitCeil.
	return uint64(start)&(bitCeil-1) == 0
}

@beautifulentropy (Member, Author), Jun 26, 2026

> Also, a nit: it's nice for the correctness comment to be easily matched to the code that enforces it. So saying "start >= end, so end-start is positive" would be a little easier to validate. Also note that "positive" is a stronger constraint than "non-negative" since it excludes zero.

Yes, start > end should read start >= end. But end - start - 1 is non-negative, not positive. A single leaf, for instance [7, 8), is still a valid subtree, and 8 - 7 - 1 = 0.

How about:

//nolint:gosec // G115: the start >= end check above leaves end-start positive (at least 1), so end-start-1 is non-negative.

return uint64(start)&(bitCeil-1) == 0
}
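
The thread above turns on why the code applies `bits.Len64` to `end-start-1`, not to `end-start`. A small sketch, using only the standard library, shows the difference between the inclusive and exclusive ceilings. The names `bitCeil` and `validSubtree` are chosen for this example and are not the PR's identifiers.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bitCeil returns the smallest power of two that is >= x. It assumes
// 1 <= x <= 1<<63; larger values would shift out of range.
func bitCeil(x uint64) uint64 {
	return uint64(1) << bits.Len64(x-1)
}

// validSubtree restates the check under review: 0 <= start < end and start is
// a multiple of bitCeil(end - start).
func validSubtree(start, end int64) bool {
	if start < 0 || start >= end {
		return false
	}
	// bitCeil-1 masks the bits below bitCeil, so the AND is zero exactly when
	// start is a multiple of bitCeil.
	return uint64(start)&(bitCeil(uint64(end-start))-1) == 0
}

func main() {
	for _, x := range []uint64{1, 2, 3, 4, 5, 8, 9} {
		// The second value is the exclusive ceiling 1<<Len64(x), which
		// overshoots whenever x is already a power of two.
		fmt.Printf("x=%d inclusive=%d exclusive=%d\n", x, bitCeil(x), uint64(1)<<bits.Len64(x))
	}
	fmt.Println(validSubtree(7, 8), validSubtree(4, 7), validSubtree(2, 5))
}
```

The single-leaf case `[7, 8)` exercises the boundary discussed above: `end-start-1` is 0, `bits.Len64(0)` is 0, and the mask is empty, so every non-negative `start` is accepted.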

// perfectSubtree reports whether [start, end) is an aligned perfect subtree
// (power-of-two size, start aligned to that size), and if so its level.
Comment on lines +57 to +58

Contributor

Isn't it the case that a (hypothetical) unaligned perfect subtree would simply not be a valid subtree at all?

If so, can't this be simplified to bits.OnesCount64(end-start) == 1 && valid(start, end)?

func perfectSubtree(start, end int64) (level int, ok bool) {
if start < 0 || start >= end || end < 0 {
panic(fmt.Sprintf("invalid interval [%d, %d)", start, end))
}
size := end - start
if bits.OnesCount64(uint64(size)) != 1 || start&(size-1) != 0 { //nolint:gosec // G115: callers pass start < end, so size is positive.
return 0, false
}
return bits.TrailingZeros64(uint64(size)), true //nolint:gosec // G115: callers pass start < end, so size is positive.
}
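
The reviewer's proposed simplification can be checked by brute force over small intervals. The sketch below is only an illustration: it restates both forms with hypothetical names (`perfectAsWritten`, `perfectSimplified`, `agree`), drops the level return value, and adds the `uint64` conversion that `bits.OnesCount64` requires.

```go
package main

import (
	"fmt"
	"math/bits"
)

// valid restates the subtree check from the diff: 0 <= start < end and start
// is a multiple of BIT_CEIL(end - start).
func valid(start, end int64) bool {
	if start < 0 || start >= end {
		return false
	}
	bitCeil := uint64(1) << bits.Len64(uint64(end-start-1))
	return uint64(start)&(bitCeil-1) == 0
}

// perfectAsWritten is the power-of-two-and-aligned test as written in the diff.
func perfectAsWritten(start, end int64) bool {
	size := end - start
	return bits.OnesCount64(uint64(size)) == 1 && start&(size-1) == 0
}

// perfectSimplified is the form proposed in the review comment.
func perfectSimplified(start, end int64) bool {
	return bits.OnesCount64(uint64(end-start)) == 1 && valid(start, end)
}

// agree reports whether the two forms match for every 0 <= start < end <= limit.
func agree(limit int64) bool {
	for end := int64(1); end <= limit; end++ {
		for start := int64(0); start < end; start++ {
			if perfectAsWritten(start, end) != perfectSimplified(start, end) {
				return false
			}
		}
	}
	return true
}

func main() {
	fmt.Println("forms agree up to 256:", agree(256))
}
```

The two agree because BIT_CEIL of a power of two is that power of two, so for power-of-two sizes the alignment test inside `valid` is the same mask as `start&(size-1)`.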

// combineIntervalHash combines subtree roots, in the order
// perfectSubtreeIndexes lists them, into MTH(D[start:end]). It returns the hash
// and the unconsumed remainder.
func combineIntervalHash(start, end int64, hashes []tlog.Hash) (tlog.Hash, []tlog.Hash) {
_, ok := perfectSubtree(start, end)
if ok {
return hashes[0], hashes[1:]
}
k := largestPowerOfTwoSmallerThan(end - start)
left, rest := combineIntervalHash(start, start+k, hashes)
right, rest := combineIntervalHash(start+k, end, rest)
return tlog.NodeHash(left, right), rest
}

// perfectSubtreeIndexes splits [start, end) into the largest power-of-two
// subtrees the tree already keeps a single stored hash for, and appends each
// one's stored hash index left to right.
func perfectSubtreeIndexes(start, end int64, storedHashIndexes []int64) []int64 {
level, ok := perfectSubtree(start, end)
if ok {
return append(storedHashIndexes, tlog.StoredHashIndex(level, start>>level))
}
k := largestPowerOfTwoSmallerThan(end - start)
storedHashIndexes = perfectSubtreeIndexes(start, start+k, storedHashIndexes)
return perfectSubtreeIndexes(start+k, end, storedHashIndexes)
}
Comment on lines +87 to +95

Contributor

This feels like a very non-idiomatic form of recursion, passing the partial result into each sub-call. I think we can get the exact same behavior much more simply:

Suggested change
-func perfectSubtreeIndexes(start, end int64, storedHashIndexes []int64) []int64 {
-	level, ok := perfectSubtree(start, end)
-	if ok {
-		return append(storedHashIndexes, tlog.StoredHashIndex(level, start>>level))
-	}
-	k := largestPowerOfTwoSmallerThan(end - start)
-	storedHashIndexes = perfectSubtreeIndexes(start, start+k, storedHashIndexes)
-	return perfectSubtreeIndexes(start+k, end, storedHashIndexes)
-}
+func perfectSubtreeIndexes(start, end int64) []int64 {
+	level, ok := perfectSubtree(start, end)
+	if ok {
+		return []int64{tlog.StoredHashIndex(level, start>>level)}
+	}
+	k := largestPowerOfTwoSmallerThan(end - start)
+	return append(perfectSubtreeIndexes(start, start+k), perfectSubtreeIndexes(start+k, end)...)
+}
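
A runnable sketch of the accumulator-free recursion suggested above, using only the standard library. It returns `(level, index)` pairs through a hypothetical `node` type in place of calling `tlog.StoredHashIndex`, so the decomposition is easy to inspect; in the real code each pair would be passed to `tlog.StoredHashIndex(level, index)`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// node identifies a perfect subtree by its level and its index within that
// level. It is a hypothetical type for this sketch.
type node struct {
	level int
	index int64
}

// perfectSubtrees splits [start, end) into aligned perfect subtrees, left to
// right, returning a fresh slice from each call with no accumulator threaded
// through the recursion. It assumes 0 <= start < end.
func perfectSubtrees(start, end int64) []node {
	size := end - start
	if bits.OnesCount64(uint64(size)) == 1 && start&(size-1) == 0 {
		level := bits.TrailingZeros64(uint64(size))
		return []node{{level: level, index: start >> level}}
	}
	// k is the largest power of two strictly less than size.
	k := int64(1) << (bits.Len64(uint64(size-1)) - 1)
	return append(perfectSubtrees(start, start+k), perfectSubtrees(start+k, end)...)
}

func main() {
	// [0, 13) decomposes into sizes 8, 4, and 1.
	fmt.Println(perfectSubtrees(0, 13))
	// An unaligned power-of-two interval splits into two smaller nodes.
	fmt.Println(perfectSubtrees(2, 6))
}
```

The trade-off is allocation: each level of recursion builds and concatenates a small slice, which is likely negligible at tree depths of a few dozen levels.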


// intervalHash returns MTH(D[start:end]), the RFC 9162 section 2.1.1 Merkle
// Tree Hash over the leaves in [start, end) as an independent list, read
// through the provided reader. It splits [start, end) into the largest
// power-of-two subtrees the tree already keeps a single stored hash for, reads
// those hashes in a single ReadHashes call, and combines them.
func intervalHash(start, end int64, reader tlog.HashReader) (tlog.Hash, error) {

Contributor

This function is only ever used to hash valid subtrees, not arbitrary intervals. So why are we changing language to talk about intervals instead of subtrees?

indexes := perfectSubtreeIndexes(start, end, nil)
hashes, err := reader.ReadHashes(indexes)
if err != nil {
return tlog.Hash{}, err
}
if len(hashes) != len(indexes) {
// Reader returned a slice shorter or longer than the requested indexes.
// Avoid panicking in combineIntervalHash.
return tlog.Hash{}, fmt.Errorf("ReadHashes returned %d hashes for %d indexes", len(hashes), len(indexes))
}
h, _ := combineIntervalHash(start, end, hashes)
return h, nil
Comment on lines +103 to +114

Contributor

This feels too clever, as though you're optimizing for a case we don't actually have (and don't want to support). Even if we do want to compute hashes of arbitrary intervals (which as I note above, I don't think we do -- I'm pretty sure all of our inputs to this function are valid subtrees), the MTC draft shows that any arbitrary interval can be decomposed into two subtrees: a perfect left subtree, and a ragged right subtree.

This function's approach is to do a big recursive "find all the indices", followed by a second recursive "combine the hashes". But the whole thing can be done in a single, much-more-simply recursive "look up hash of left perfect tree, combine with recursive computation of hash of right ragged tree".

}
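
The single-pass recursion described in the comment above (look up the left perfect subtree, recurse on the ragged right) might look roughly like the following. This is a sketch under stated assumptions: `hash`, `lookup`, `subtreeHash`, and `memoryLookup` are hypothetical names, the lookup function stands in for a `tlog.HashReader`, and the input is assumed to be a valid subtree already.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// hash is a stand-in for tlog.Hash (an assumption for this sketch).
type hash [sha256.Size]byte

// nodeHash computes SHA-256(0x01 || left || right).
func nodeHash(left, right hash) hash {
	return sha256.Sum256(append(append([]byte{0x01}, left[:]...), right[:]...))
}

// lookup is a hypothetical stand-in for reading one stored perfect-subtree
// hash, identified by level and index within that level.
type lookup func(level int, index int64) (hash, error)

// subtreeHash hashes [start, end), which the caller must already have checked
// is a valid subtree. The left part is always an aligned perfect subtree with
// a stored hash; only the right part needs further recursion.
func subtreeHash(start, end int64, stored lookup) (hash, error) {
	size := end - start
	if bits.OnesCount64(uint64(size)) == 1 {
		// A valid subtree of power-of-two size is aligned: one stored hash.
		level := bits.TrailingZeros64(uint64(size))
		return stored(level, start>>level)
	}
	k := int64(1) << (bits.Len64(uint64(size-1)) - 1)
	level := bits.TrailingZeros64(uint64(k))
	left, err := stored(level, start>>level)
	if err != nil {
		return hash{}, err
	}
	right, err := subtreeHash(start+k, end, stored)
	if err != nil {
		return hash{}, err
	}
	return nodeHash(left, right), nil
}

// memoryLookup serves perfect-subtree hashes computed from in-memory leaves.
// It exists only to make the example runnable.
func memoryLookup(leaves []hash) lookup {
	var get lookup
	get = func(level int, index int64) (hash, error) {
		if level == 0 {
			if index < 0 || index >= int64(len(leaves)) {
				return hash{}, fmt.Errorf("leaf %d out of range", index)
			}
			return leaves[index], nil
		}
		l, err := get(level-1, 2*index)
		if err != nil {
			return hash{}, err
		}
		r, err := get(level-1, 2*index+1)
		if err != nil {
			return hash{}, err
		}
		return nodeHash(l, r), nil
	}
	return get
}

func main() {
	leaves := make([]hash, 7)
	for i := range leaves {
		leaves[i] = sha256.Sum256([]byte{0x00, byte(i)})
	}
	h, err := subtreeHash(4, 7, memoryLookup(leaves))
	fmt.Printf("%x %v\n", h, err)
}
```

Unlike the two-pass approach in the diff, this version issues one lookup per perfect subtree where the diff batches them into a single `ReadHashes` call, so the simpler control flow would cost extra reader round trips.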

func appendIntervalHash(start, end int64, reader tlog.HashReader, proof []tlog.Hash) ([]tlog.Hash, error) {

Contributor

Why is this broken out into a helper function? It has exactly one caller, no documentation, and is just a single stanza. I suggest inlining it into subtreeSubProof.

h, err := intervalHash(start, end, reader)
if err != nil {
return nil, err
}
return append(proof, h), nil
}

// subtreeSubProof implements SUBTREE_SUBPROOF(start, end, D_n, b) from the MTC
// draft section 4.4.1 Generating a Subtree Consistency Proof, detailed further
// in the draft's Appendix B.4. start and end are relative to the current
// subtree D_n of size n rooted at absolute offset base, and known is the
// draft's b flag. It reads stored hashes through the provided reader and
// returns proof with the hashes it emits appended.
func subtreeSubProof(start, end, base, n int64, known bool, reader tlog.HashReader, proof []tlog.Hash) ([]tlog.Hash, error) {
if start == 0 && end == n {
// [start, end) now covers this whole node D_n, the SUBTREE_SUBPROOF
// base case. known decides whether the proof carries it.
if known {
// The verifier already has this node, so emit nothing.
return proof, nil
}

// The verifier doesn't have it, so emit its hash MTH(D_n).
h, err := intervalHash(base, base+n, reader)
if err != nil {
return nil, err
}
return append(proof, h), nil
}

// [start, end) covers only part of this node, so split at k. The switch
// routes by where the subtree falls (left child, right child, or straddle)
// and names the other child as the sibling the shared tail appends.
k := largestPowerOfTwoSmallerThan(n)
var err error
var siblingStart int64
var siblingEnd int64
switch {
case end <= k:
// The subtree fits in the left child. Recurse there, with the right
// child [k, n) as the sibling.
proof, err = subtreeSubProof(start, end, base, k, known, reader, proof)
siblingStart = base + k
siblingEnd = base + n
case k <= start:
// The subtree fits in the right child. Recurse there (shifting
// coordinates by k), with the left child [0, k) as the sibling.
proof, err = subtreeSubProof(start-k, end-k, base+k, n-k, known, reader, proof)
siblingStart = base
siblingEnd = base + k
default:
// The subtree straddles the split (start < k < end), which a valid
// subtree only does when start == 0. Recurse on the right child's
// prefix [0, end-k), no longer a node the verifier knows (known =
// false), with the left child [0, k) as the sibling.
proof, err = subtreeSubProof(0, end-k, base+k, n-k, false, reader, proof)
siblingStart = base
siblingEnd = base + k
}
if err != nil {
return nil, err
}
return appendIntervalHash(siblingStart, siblingEnd, reader, proof)
}

// ConsistencyProof returns SUBTREE_PROOF(start, end, D_n) for the tree of size
// treeSize, reading stored hashes through the provided reader, per the MTC
// draft section 4.4.1 Generating a Subtree Consistency Proof, detailed further
// in the draft's Appendix B.4. [start, end) must be a valid subtree with end <=
// treeSize.
//
// - https://ietf-plants-wg.github.io/merkle-tree-certs/draft-ietf-plants-merkle-tree-certs.html#section-4.4.1
// - https://ietf-plants-wg.github.io/merkle-tree-certs/draft-ietf-plants-merkle-tree-certs.html#appendix-B.4
func ConsistencyProof(start, end, treeSize int64, reader tlog.HashReader) ([]tlog.Hash, error) {
if !valid(start, end) || end > treeSize {
return nil, fmt.Errorf("[%d, %d) is not a valid subtree of a tree of size %d", start, end, treeSize)
}
return subtreeSubProof(start, end, 0, treeSize, true, reader, nil)
}

// VerifyConsistency reports whether proof shows that the subtree [start, end),
// whose hash is nodeHash, sits at those positions in the tree of size n with
// root rootHash. It follows the procedure in MTC draft section 4.4.3, detailed
// further in the draft's Appendix B.5.
//
// - https://ietf-plants-wg.github.io/merkle-tree-certs/draft-ietf-plants-merkle-tree-certs.html#section-4.4.3
// - https://ietf-plants-wg.github.io/merkle-tree-certs/draft-ietf-plants-merkle-tree-certs.html#appendix-B.5
func VerifyConsistency(start, end, n int64, proof []tlog.Hash, nodeHash, rootHash tlog.Hash) bool {
if !valid(start, end) || end > n {
return false
}

// fn, sn, tn track the subtree's first leaf, its last leaf, and the tree's
// last leaf. Right-shifting a cursor climbs one level.
fn := start
sn := end - 1
tn := n - 1

// Skip the levels that need no proof hash. The branch turns on whether the
// subtree's right edge meets the tree's right edge (sn == tn) or not.
if sn == tn {
// A flush subtree has no outside sibling to combine on the way up to
// nodeHash, so climb every level.
for fn != sn {
fn >>= 1
sn >>= 1
tn >>= 1
}
} else {
// An interior subtree eventually meets an outside sibling, so climb
// only while sn is a right child.
for fn != sn && sn&1 == 1 {
fn >>= 1
sn >>= 1
tn >>= 1
}
}

// fr and sr climb together from a shared seed: fr rebuilds the subtree
// hash, sr the tree root.
var fr tlog.Hash
var sr tlog.Hash
var rest []tlog.Hash
if fn == sn {
// A single node: the seed is its hash, nodeHash.
fr = nodeHash
sr = nodeHash
rest = proof
} else {
// The subtree is larger, so the seed is proof[0], the largest perfect
// subtree flush with its right edge.
if len(proof) == 0 {
return false
}
fr = proof[0]
sr = proof[0]
rest = proof[1:]
}

for _, c := range rest {
if tn == 0 {
// The proof has more hashes than the tree has levels.
return false
}
if sn&1 == 1 || sn == tn {
if fn < sn {
// fr only combines while fn < sn. Freezing it at fn == sn is
// what makes the final fr == nodeHash check meaningful.
fr = tlog.NodeHash(c, fr)
}
sr = tlog.NodeHash(c, sr)
// At the ragged right edge (sn == tn) the just-combined node is
// shorter than its left sibling, so skip its empty levels here,
// consuming no proof hash, until sn is odd again.
for sn&1 == 0 {
fn >>= 1
sn >>= 1
tn >>= 1
}
} else {
// c is the node's right sibling, outside the subtree, so it extends
// sr toward the root.
sr = tlog.NodeHash(sr, c)
}
fn >>= 1
sn >>= 1
tn >>= 1
}
return tn == 0 && fr == nodeHash && sr == rootHash
}
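
The cursor arithmetic at the start of `VerifyConsistency` is easier to follow on concrete numbers. The sketch below isolates only the level-skipping step in a hypothetical helper named `climb`; it is an illustration of that one step, not a substitute for the verifier.

```go
package main

import "fmt"

// climb mirrors the level-skipping step of VerifyConsistency: fn, sn, and tn
// start at the subtree's first leaf, its last leaf, and the tree's last leaf,
// and each right shift moves all three up one level. It returns the cursors
// after skipping, plus the number of levels climbed.
func climb(start, end, n int64) (fn, sn, tn int64, levels int) {
	fn, sn, tn = start, end-1, n-1
	// The branch is decided once, before climbing: a subtree flush with the
	// tree's right edge climbs until fn == sn; an interior subtree climbs
	// only while sn is a right child.
	flush := sn == tn
	for fn != sn && (flush || sn&1 == 1) {
		fn, sn, tn = fn>>1, sn>>1, tn>>1
		levels++
	}
	return fn, sn, tn, levels
}

func main() {
	// Interior perfect subtree [4, 8) in a tree of 13 leaves.
	fmt.Println(climb(4, 8, 13))
	// Subtree [8, 13) flush with the right edge of the same tree.
	fmt.Println(climb(8, 13, 13))
	// Ragged interior subtree [0, 3): sn is a left child, so nothing is skipped.
	fmt.Println(climb(0, 3, 13))
}
```

When the cursors end with `fn == sn`, the verifier seeds both running hashes from `nodeHash`; otherwise, as in the `[0, 3)` case, the seed comes from `proof[0]`.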