Skip to content

AES-GCM-SIV: Add implementation in C and assembly#10807

Open
SparkiDev wants to merge 1 commit into
wolfSSL:masterfrom
SparkiDev:aes_gcm_siv_asm
Open

AES-GCM-SIV: Add implementation in C and assembly#10807
SparkiDev wants to merge 1 commit into
wolfSSL:masterfrom
SparkiDev:aes_gcm_siv_asm

Conversation

@SparkiDev

Copy link
Copy Markdown
Contributor

Description

Added assembly for Intel x64, ARM64, ARM32, Thumb2.

Testing

Tested each platform with AES-GCM-SIV.

@SparkiDev SparkiDev self-assigned this Jun 29, 2026
@SparkiDev SparkiDev force-pushed the aes_gcm_siv_asm branch 3 times, most recently from 79ae040 to 8bf1491 Compare June 30, 2026 04:51
@dgarske dgarske self-requested a review June 30, 2026 15:58

@dgarske dgarske left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skoll Multi-Scan Review

Modes: review-securityOverall recommendation: COMMENT
Findings: 4 total — 4 posted, 0 skipped
2 finding(s) posted as inline comments (see file-level comments below)
2 finding(s) not tied to a diff line (full detail below)

One or more scans did not complete — the results below are partial.

Posted findings

  • [Medium] [review-security] SGX cpuid hard-set now forces CPUID_VAES | CPUID_AVX512, can crash AES-GCM on SGX CPUs lacking those extensionswolfcrypt/src/cpuid.c:56-57
  • [Medium] [review-security] New Thumb-2 AES-GCM-SIV asm and C wrappers are neither built nor run in the added CI.github/workflows/aesgcm-siv.yml:60-148

Findings not tied to a diff line

POLYVAL hash-key stack temporaries not zeroized (t, hrev)

File: wolfcrypt/src/aes.c (AesGcmSivPolyvalInit asm branch; AesGcmSivPolyvalInitSw)
Function: AesGcmSivPolyvalInit / AesGcmSivPolyvalInitSw
Severity: Low
Category: Zeroization

The per-message POLYVAL key is derived from the secret message-authentication-key. The persistent copies in AesGcmSivPolyval (poly->h / poly->m / poly->hHw) are correctly wiped by AesGcmSivPolyvalFinal via ForceZero(poly, sizeof(*poly)). However the intermediate stack buffers that hold transformed copies of that key are not wiped before going out of scope: byte t[WC_AES_BLOCK_SIZE] in the asm key-prep branch of AesGcmSivPolyvalInit (holds ByteReverse(mulX(ByteReverse(h)))), and byte hrev[WC_AES_BLOCK_SIZE] in both AesGcmSivPolyvalInitSw variants (holds ByteReverse(h)). These leave a permutation of the POLYVAL key on the stack after the function returns. This is defense-in-depth only (recovering it would let an attacker forge for that specific key+nonce), and the buffers are typically overwritten by later stack frames.

Recommendation: Add ForceZero(t, sizeof(t)); before the asm key-prep block returns and ForceZero(hrev, sizeof(hrev)); at the end of each AesGcmSivPolyvalInitSw variant, consistent with the zeroization used elsewhere in the SIV code.

Referenced code: wolfcrypt/src/aes.c (AesGcmSivPolyvalInit asm branch; AesGcmSivPolyvalInitSw) (9 lines)


ARM64 GHASH length conversion changed from ubfiz (#3,#32) to lsl #3, dropping upper-32-bit masking

File: wolfcrypt/src/port/arm/armv8-aes-asm.S (multiple *_partial_done labels); wolfcrypt/src/port/arm/armv8-aes-asm_c.c (AES_GCM_encrypt_final/decrypt_final/_AARCH64 paths)
Function: AES_GCM_*_AARCH64 / aes_gcm_*_arm64_crypto_*_partial_done
Severity: Info
Category: Logic

In the existing (non-SIV) AES-GCM AArch64 code regenerated by this PR, byte-length-to-bit-length conversions changed from ubfiz xN, xN, #3, #32 to lsl xN, xN, #3, and standalone ubfiz xN, xN, #0, #32 truncations (e.g. on tagSz in the *_part_tag paths) were removed. ubfiz #3,#32 masks the operand to its low 32 bits before shifting; lsl #3 shifts the full 64-bit register. The two are identical for any operand whose upper 32 bits are zero, which is always the case for the word32 length/tag-size arguments as materialized by wolfSSL callers (32-bit values are zero-extended). So no wrong tag results for valid wolfSSL usage, and these GCM paths are covered by the broader GCM test suite. The note is only that the new sequence is marginally less defensive against a caller leaving non-zero garbage in the upper 32 bits of the argument register.

Recommendation: No action required if these are deliberate asm-generator output; the behavior is equivalent for all in-tree callers. If the generator can emit it cheaply, retaining a 32-bit mask (or documenting that callers must pass zero-extended lengths) preserves the prior defensiveness.

Referenced code: wolfcrypt/src/port/arm/armv8-aes-asm.S (multiple *_partial_done labels); wolfcrypt/src/port/arm/armv8-aes-asm_c.c (AES_GCM_encrypt_final/decrypt_final/_AARCH64 paths) (4 lines)


Review generated by Skoll

Comment thread wolfcrypt/src/cpuid.c
Comment thread .github/workflows/aesgcm-siv.yml

@dgarske dgarske left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

macOS/clang build break:

--enable-intelasm --enable-aesgcm-siv fails to compile on an Intel MacBookPro:

aes.c:885: error: unused function 'AesEcbEncryptBlocks' [-Werror,-Wunused-function]
aes.c:912: error: unused function 'AesEcbDecryptBlocks'
aes.c:997: error: unused function 'AesCtrEncryptBlocks'

These three new static helpers' definitions are compiled in the AES-NI config, but their call sites (lines 7796/15153/15252) are behind narrower #if guards, so without AVX512/AES-ECB they're defined-but-unused. gcc tolerates unused static inline; clang -Werror doesn't, which is why all the Linux x86 hosts passed and only the Mac caught it.

Fix: align the guards (only define each helper when a caller is compiled) or mark them unused.

@SparkiDev

Copy link
Copy Markdown
Contributor Author

Add #ifdefs for the three functions.

@SparkiDev

SparkiDev commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

Jenkins: retest this please

FIPS aborted
fips-ready failed

@dgarske dgarske self-requested a review July 1, 2026 14:53
@SparkiDev SparkiDev force-pushed the aes_gcm_siv_asm branch 2 times, most recently from 16b5ab7 to 37e4187 Compare July 2, 2026 10:56
@SparkiDev SparkiDev assigned dgarske and unassigned SparkiDev Jul 2, 2026

@dgarske dgarske left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skoll Multi-Scan Review

Modes: review + review-securityOverall recommendation: REQUEST_CHANGES
Findings: 5 total — 5 posted, 0 skipped
4 finding(s) posted as inline comments (see file-level comments below)
1 finding(s) not tied to a diff line (full detail below)

Posted findings

  • [High] [review-security] Windows MASM AES_GCMSIV_ctr_ clobber round-count and ctr-pointer registers (wrong ciphertext + wild 16-byte write)* — wolfcrypt/src/aes_gcm_asm.asm
  • [Medium] [review+review-security] New AES-GCM-SIV pure-C and assembly POLYVAL/CTR variants (Windows MASM, Thumb2, AArch32/AArch64 base, GCM_WORD32, GCM_SMALL) not exercised by added CI.github/workflows/aesgcm-siv.yml:1-148
  • [Low] [review-security] Feature-guarding of AES-NI bulk helpers removes WC_MAYBE_UNUSED; verify no unguarded callerswolfcrypt/src/aes.c:908-1062
  • [Low] [review] wc_AesGcmSivEncrypt lacks WARN_UNUSED_RESULT while decrypt has itwolfssl/wolfcrypt/aes.h:838-847

Findings not tied to a diff line

POLYVAL update leaves plaintext/AAD residue on the stack (block/rev not zeroized)

File: wolfcrypt/src/aes.c
Function: AesGcmSivPolyvalUpdate
Severity: Low
Category: Zeroization

AesGcmSivPolyvalUpdate copies the trailing partial block of the hashed data into local buffers block[16] and rev[16] (XMEMCPY(block, data, partial) / AesGcmSivByteReverse(rev, ...)). During tag computation this data is plaintext (and AAD). Neither block nor rev is zeroized before return. AesGcmSivPolyvalFinal does ForceZero(poly, sizeof(*poly)) (covering the running sum and key/table material), but the Update-local buffers are separate stack storage and retain a 16-byte plaintext/AAD residue. Per the project rule to zero secrets with ForceZero(), this is a defense-in-depth gap. Impact is limited: the same plaintext already lives in caller-owned in/out buffers, and this matches wolfSSL's existing GHASH behavior, so it is Low.

Recommendation: Add ForceZero(block, sizeof(block)); ForceZero(rev, sizeof(rev)); before returning from AesGcmSivPolyvalUpdate (both the ASM-partial and C-fallback exits) to avoid leaving plaintext/AAD on the stack.

Referenced code: wolfcrypt/src/aes.c (13 lines)


Review generated by Skoll

ptr_L_aes_gcmsiv_ctr_aesni_one QWORD L_aes_gcmsiv_ctr_aesni_one
_DATA ENDS
_TEXT SEGMENT READONLY PARA
AES_GCMSIV_ctr_aesni PROC

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 [High] Windows MASM AES_GCMSIV_ctr_ clobber round-count and ctr-pointer registers (wrong ciphertext + wild 16-byte write)* · Security

All four new Windows MASM CTR routines mis-allocate registers versus the reference System V .S versions. On Windows x64 the args are in=rcx, out=rdx, length=r8, KS=r9, nr=[rsp+48], ctr=[rsp+56]. The prologue does mov eax,[rsp+48] (nr) and mov r10,[rsp+56] (ctr), then IMMEDIATELY executes xor eax, eax (destroying nr) and later mov r10d, r8d (destroying the ctr pointer with the length). Thereafter eax is used as the byte-offset index (add eax,64/lea [rcx+rax]) yet the AES round-count decisions are cmp eax, 11 / cmp eax, 13 -- comparing the byte offset instead of nr. In the correct SysV .S these are cmpl $11, %r8d (nr in r8) and the ctr write-back is movdqu %xmm7, (%r9) (ctr in r9, never clobbered). Consequences on Windows MASM builds (WOLFSSL_AESNI + WOLFSSL_AESGCM_SIV): (1) Incorrect keystream/ciphertext -- for AES-256 the first 64-byte block runs only 10 rounds while later blocks run 14; even AES-128 breaks past the first 64 bytes because eax>=13 forces 14 rounds. (2) Memory corruption -- at L_..._done_enc the movdqu OWORD PTR [r10], xmm7 writes the 16-byte counter to address r10 = length (masked), a caller-length-derived low address, not the ctr buffer (segfault or heap corruption). The AVX512 variant additionally uses cmp eax, 11/13 to decide round-key caching, leaving zmm23-26 uninitialized for AES-256. The bug is systematic across all four functions (generator register-remap collision) and is not exercised by the added CI (Linux .S + ARM qemu only), so it was not caught.

Fix: Do not ship the MASM .asm as generated. Regenerate aes_gcm_asm.asm with a generator fix so the round count (nr) is kept in a register that survives (e.g. keep it out of eax, which is the byte index) and so the ctr pointer register (r10) is not overwritten by length; the final counter write-back must target the ctr pointer. Cross-check against the SysV .S (nr=r8d, ctr=r9). Add a Windows/MSVC MASM CI job (or at minimum a local Windows AES-GCM-SIV KAT run for AES-128 >64B and AES-256) to validate before merge.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -0,0 +1,148 @@
name: AES-GCM-SIV (RFC 8452) tests

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 [Medium] New AES-GCM-SIV pure-C and assembly POLYVAL/CTR variants (Windows MASM, Thumb2, AArch32/AArch64 base, GCM_WORD32… · Missing Tests

The added workflow provides good RFC 8452 KAT coverage but which POLYVAL multiply / CTR backend it exercises depends on build config. It covers native x86_64 make check (software word64 table + Intel .S asm) and cross-compiled AArch64 (PMULL/NEON/base) and AArch32 with QEMU_CPU=max (crypto/vmull.p64 path). It does NOT exercise these newly written, security-sensitive paths: the Windows MASM .asm variant (which in fact contains a correctness+memory-safety bug -- see the AES_GCMSIV_ctr finding -- that this gap allowed through), the word32 table variant (GCM_WORD32, AesGcmSivGMult/AesGcmSivPolyvalInitSw word32 block in aes.c), the table-free GCM_SMALL variant, the AArch32 base (WC_POLYVAL_ASM_AARCH32_BASE) NO_HW_CRYPTO path (armhf job forces QEMU_CPU=max, selecting the crypto path), the AArch64 base (AES_GCMSIV_polyval_base, only reached with NEON+HW-crypto both disabled), and Thumb2 (AES_GCMSIV_polyval_thumb2 / gcm_siv_thumb2, targeting armv7-m which qemu-user cannot run). A transposition/reduction/register error in any of these would not be caught by CI. Severity views: review flagged this Medium/SUGGEST (test coverage); review-security flagged Medium (missing-tests, and it is the reason the High MASM defect went undetected).

Fix: Add a Windows MSVC/MASM job that builds with AES-NI + AES-GCM-SIV and runs the KATs. Add at least one x86 make check config forcing the word32 and GCM_SMALL software multiplies (e.g. CPPFLAGS=-DGCM_WORD32 and CPPFLAGS=-DGCM_SMALL) so the two untested pure-C POLYVAL variants run the RFC 8452 KATs at near-zero cost. Add an AArch32 NO_HW_CRYPTO base config (no QEMU_CPU=max) so the generated software-table CTR/POLYVAL variants are exercised. Note the Thumb-2 gap explicitly in release notes if it cannot be emulated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

Comment thread wolfcrypt/src/aes.c
@@ -907,7 +908,8 @@ block cipher mechanism that uses n-bit binary string parameter key with 128-bits
/* Pick the widest available implementation at runtime. Callers must

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [Low] Feature-guarding of AES-NI bulk helpers removes WC_MAYBE_UNUSED; verify no unguarded callers · Logic

These x86_64 AES-NI bulk helpers were previously always compiled with WC_MAYBE_UNUSED WC_INLINE. The diff now wraps them in #ifdef HAVE_AES_ECB, #if defined(HAVE_AES_ECB) && defined(HAVE_AES_DECRYPT), and #ifdef WOLFSSL_AES_COUNTER respectively and drops WC_MAYBE_UNUSED. If any caller references one of these helpers in a configuration where the new guard is not satisfied (e.g. an AES-NI + AESGCM build without HAVE_AES_ECB / WOLFSSL_AES_COUNTER), the build would fail with an undefined-symbol error; conversely, removing WC_MAYBE_UNUSED could trigger unused-function warnings under -Werror in a sub-config where the guard is met but the function is not called. This is a portability/build concern rather than a runtime security issue, and the --enable-all-crypto --enable-intelasm CI config satisfies all guards, so it may be benign.

Fix: Confirm that every caller of AesEcbEncryptBlocks/AesEcbDecryptBlocks/AesCtrEncryptBlocks is itself guarded by the same macros (HAVE_AES_ECB / HAVE_AES_DECRYPT / WOLFSSL_AES_COUNTER), and build a narrow config such as AES-NI + AESGCM only (no AES_ECB, no AES_COUNTER) to ensure no regression. Keep WC_MAYBE_UNUSED if any guarded-but-uncalled path remains possible.

@SparkiDev SparkiDev Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guards correct.

Comment thread wolfssl/wolfcrypt/aes.h
* The POLYVAL hash is constant-time wherever the CPU provides carry-less
* multiply (x86 PCLMUL, Arm PMULL/VMULL) - the runtime default on such CPUs.
* Software-only builds fall back to a key-dependent 4-bit table (a cache-timing
* trade-off matching GCM_TABLE GHASH); GCM_SMALL avoids the table entirely. */

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 [Low] wc_AesGcmSivEncrypt lacks WARN_UNUSED_RESULT while decrypt has it · Convention

wc_AesGcmSivDecrypt is annotated WOLFSSL_API WARN_UNUSED_RESULT (correct - the auth result must not be ignored), but wc_AesGcmSivEncrypt is only WOLFSSL_API. An ignored encrypt return could let a caller transmit uninitialized/garbage output or a stale tag after a BAD_FUNC_ARG (e.g. wrong key size). This mirrors the existing wc_AesGcmEncrypt signature, so it is consistent with the codebase, hence a NIT rather than a defect.

Fix: Consider adding WARN_UNUSED_RESULT to wc_AesGcmSivEncrypt for symmetry and to force callers to check for parameter/setup failures; acceptable to leave as-is to match wc_AesGcmEncrypt.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@dgarske dgarske left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tested correctly on my array of ARM and Intel machines, but Skoll review found some interesting things -> #10807 (review)

@dgarske dgarske assigned SparkiDev and unassigned dgarske Jul 2, 2026
Added assembly for Intel x64, ARM64, ARM32, Thumb2.
@SparkiDev SparkiDev assigned dgarske and unassigned SparkiDev Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants