AES-GCM-SIV: Add implementation in C and assembly#10807
Conversation
79ae040 to
8bf1491
Compare
dgarske
left a comment
There was a problem hiding this comment.
Skoll Multi-Scan Review
Modes: review-securityOverall recommendation: COMMENT
Findings: 4 total — 4 posted, 0 skipped
2 finding(s) posted as inline comments (see file-level comments below)
2 finding(s) not tied to a diff line (full detail below)
One or more scans did not complete — the results below are partial.
Posted findings
- [Medium] [review-security] SGX cpuid hard-set now forces CPUID_VAES | CPUID_AVX512, can crash AES-GCM on SGX CPUs lacking those extensions —
wolfcrypt/src/cpuid.c:56-57 - [Medium] [review-security] New Thumb-2 AES-GCM-SIV asm and C wrappers are neither built nor run in the added CI —
.github/workflows/aesgcm-siv.yml:60-148
Findings not tied to a diff line
POLYVAL hash-key stack temporaries not zeroized (t, hrev)
File: wolfcrypt/src/aes.c (AesGcmSivPolyvalInit asm branch; AesGcmSivPolyvalInitSw)
Function: AesGcmSivPolyvalInit / AesGcmSivPolyvalInitSw
Severity: Low
Category: Zeroization
The per-message POLYVAL key is derived from the secret message-authentication-key. The persistent copies in AesGcmSivPolyval (poly->h / poly->m / poly->hHw) are correctly wiped by AesGcmSivPolyvalFinal via ForceZero(poly, sizeof(*poly)). However the intermediate stack buffers that hold transformed copies of that key are not wiped before going out of scope: byte t[WC_AES_BLOCK_SIZE] in the asm key-prep branch of AesGcmSivPolyvalInit (holds ByteReverse(mulX(ByteReverse(h)))), and byte hrev[WC_AES_BLOCK_SIZE] in both AesGcmSivPolyvalInitSw variants (holds ByteReverse(h)). These leave a permutation of the POLYVAL key on the stack after the function returns. This is defense-in-depth only (recovering it would let an attacker forge for that specific key+nonce), and the buffers are typically overwritten by later stack frames.
Recommendation: Add ForceZero(t, sizeof(t)); before the asm key-prep block returns and ForceZero(hrev, sizeof(hrev)); at the end of each AesGcmSivPolyvalInitSw variant, consistent with the zeroization used elsewhere in the SIV code.
Referenced code: wolfcrypt/src/aes.c (AesGcmSivPolyvalInit asm branch; AesGcmSivPolyvalInitSw) (9 lines)
ARM64 GHASH length conversion changed from ubfiz (#3,#32) to lsl #3, dropping upper-32-bit masking
File: wolfcrypt/src/port/arm/armv8-aes-asm.S (multiple *_partial_done labels); wolfcrypt/src/port/arm/armv8-aes-asm_c.c (AES_GCM_encrypt_final/decrypt_final/_AARCH64 paths)
Function: AES_GCM_*_AARCH64 / aes_gcm_*_arm64_crypto_*_partial_done
Severity: Info
Category: Logic
In the existing (non-SIV) AES-GCM AArch64 code regenerated by this PR, byte-length-to-bit-length conversions changed from ubfiz xN, xN, #3, #32 to lsl xN, xN, #3, and standalone ubfiz xN, xN, #0, #32 truncations (e.g. on tagSz in the *_part_tag paths) were removed. ubfiz #3,#32 masks the operand to its low 32 bits before shifting; lsl #3 shifts the full 64-bit register. The two are identical for any operand whose upper 32 bits are zero, which is always the case for the word32 length/tag-size arguments as materialized by wolfSSL callers (32-bit values are zero-extended). So no wrong tag results for valid wolfSSL usage, and these GCM paths are covered by the broader GCM test suite. The note is only that the new sequence is marginally less defensive against a caller leaving non-zero garbage in the upper 32 bits of the argument register.
Recommendation: No action required if these are deliberate asm-generator output; the behavior is equivalent for all in-tree callers. If the generator can emit it cheaply, retaining a 32-bit mask (or documenting that callers must pass zero-extended lengths) preserves the prior defensiveness.
Referenced code: wolfcrypt/src/port/arm/armv8-aes-asm.S (multiple *_partial_done labels); wolfcrypt/src/port/arm/armv8-aes-asm_c.c (AES_GCM_encrypt_final/decrypt_final/_AARCH64 paths) (4 lines)
Review generated by Skoll
dgarske
left a comment
There was a problem hiding this comment.
macOS/clang build break:
--enable-intelasm --enable-aesgcm-siv fails to compile on an Intel MacBookPro:
aes.c:885: error: unused function 'AesEcbEncryptBlocks' [-Werror,-Wunused-function]
aes.c:912: error: unused function 'AesEcbDecryptBlocks'
aes.c:997: error: unused function 'AesCtrEncryptBlocks'
These three new static helpers' definitions are compiled in the AES-NI config, but their call sites (lines 7796/15153/15252) are behind narrower #if guards, so without AVX512/AES-ECB they're defined-but-unused. gcc tolerates unused static inline; clang -Werror doesn't, which is why all the Linux x86 hosts passed and only the Mac caught it.
Fix: align the guards (only define each helper when a caller is compiled) or mark them unused.
8bf1491 to
64646dc
Compare
|
Add #ifdefs for the three functions. |
|
Jenkins: retest this please FIPS aborted |
16b5ab7 to
37e4187
Compare
dgarske
left a comment
There was a problem hiding this comment.
Skoll Multi-Scan Review
Modes: review + review-securityOverall recommendation: REQUEST_CHANGES
Findings: 5 total — 5 posted, 0 skipped
4 finding(s) posted as inline comments (see file-level comments below)
1 finding(s) not tied to a diff line (full detail below)
Posted findings
- [High] [review-security] Windows MASM AES_GCMSIV_ctr_ clobber round-count and ctr-pointer registers (wrong ciphertext + wild 16-byte write)* —
wolfcrypt/src/aes_gcm_asm.asm - [Medium] [review+review-security] New AES-GCM-SIV pure-C and assembly POLYVAL/CTR variants (Windows MASM, Thumb2, AArch32/AArch64 base, GCM_WORD32, GCM_SMALL) not exercised by added CI —
.github/workflows/aesgcm-siv.yml:1-148 - [Low] [review-security] Feature-guarding of AES-NI bulk helpers removes WC_MAYBE_UNUSED; verify no unguarded callers —
wolfcrypt/src/aes.c:908-1062 - [Low] [review] wc_AesGcmSivEncrypt lacks WARN_UNUSED_RESULT while decrypt has it —
wolfssl/wolfcrypt/aes.h:838-847
Findings not tied to a diff line
POLYVAL update leaves plaintext/AAD residue on the stack (block/rev not zeroized)
File: wolfcrypt/src/aes.c
Function: AesGcmSivPolyvalUpdate
Severity: Low
Category: Zeroization
AesGcmSivPolyvalUpdate copies the trailing partial block of the hashed data into local buffers block[16] and rev[16] (XMEMCPY(block, data, partial) / AesGcmSivByteReverse(rev, ...)). During tag computation this data is plaintext (and AAD). Neither block nor rev is zeroized before return. AesGcmSivPolyvalFinal does ForceZero(poly, sizeof(*poly)) (covering the running sum and key/table material), but the Update-local buffers are separate stack storage and retain a 16-byte plaintext/AAD residue. Per the project rule to zero secrets with ForceZero(), this is a defense-in-depth gap. Impact is limited: the same plaintext already lives in caller-owned in/out buffers, and this matches wolfSSL's existing GHASH behavior, so it is Low.
Recommendation: Add ForceZero(block, sizeof(block)); ForceZero(rev, sizeof(rev)); before returning from AesGcmSivPolyvalUpdate (both the ASM-partial and C-fallback exits) to avoid leaving plaintext/AAD on the stack.
Referenced code: wolfcrypt/src/aes.c (13 lines)
Review generated by Skoll
| ptr_L_aes_gcmsiv_ctr_aesni_one QWORD L_aes_gcmsiv_ctr_aesni_one | ||
| _DATA ENDS | ||
| _TEXT SEGMENT READONLY PARA | ||
| AES_GCMSIV_ctr_aesni PROC |
There was a problem hiding this comment.
🔴 [High] Windows MASM AES_GCMSIV_ctr_ clobber round-count and ctr-pointer registers (wrong ciphertext + wild 16-byte write)* · Security
All four new Windows MASM CTR routines mis-allocate registers versus the reference System V .S versions. On Windows x64 the args are in=rcx, out=rdx, length=r8, KS=r9, nr=[rsp+48], ctr=[rsp+56]. The prologue does mov eax,[rsp+48] (nr) and mov r10,[rsp+56] (ctr), then IMMEDIATELY executes xor eax, eax (destroying nr) and later mov r10d, r8d (destroying the ctr pointer with the length). Thereafter eax is used as the byte-offset index (add eax,64/lea [rcx+rax]) yet the AES round-count decisions are cmp eax, 11 / cmp eax, 13 -- comparing the byte offset instead of nr. In the correct SysV .S these are cmpl $11, %r8d (nr in r8) and the ctr write-back is movdqu %xmm7, (%r9) (ctr in r9, never clobbered). Consequences on Windows MASM builds (WOLFSSL_AESNI + WOLFSSL_AESGCM_SIV): (1) Incorrect keystream/ciphertext -- for AES-256 the first 64-byte block runs only 10 rounds while later blocks run 14; even AES-128 breaks past the first 64 bytes because eax>=13 forces 14 rounds. (2) Memory corruption -- at L_..._done_enc the movdqu OWORD PTR [r10], xmm7 writes the 16-byte counter to address r10 = length (masked), a caller-length-derived low address, not the ctr buffer (segfault or heap corruption). The AVX512 variant additionally uses cmp eax, 11/13 to decide round-key caching, leaving zmm23-26 uninitialized for AES-256. The bug is systematic across all four functions (generator register-remap collision) and is not exercised by the added CI (Linux .S + ARM qemu only), so it was not caught.
Fix: Do not ship the MASM .asm as generated. Regenerate aes_gcm_asm.asm with a generator fix so the round count (nr) is kept in a register that survives (e.g. keep it out of eax, which is the byte index) and so the ctr pointer register (r10) is not overwritten by length; the final counter write-back must target the ctr pointer. Cross-check against the SysV .S (nr=r8d, ctr=r9). Add a Windows/MSVC MASM CI job (or at minimum a local Windows AES-GCM-SIV KAT run for AES-128 >64B and AES-256) to validate before merge.
| @@ -0,0 +1,148 @@ | |||
| name: AES-GCM-SIV (RFC 8452) tests | |||
There was a problem hiding this comment.
🟠 [Medium] New AES-GCM-SIV pure-C and assembly POLYVAL/CTR variants (Windows MASM, Thumb2, AArch32/AArch64 base, GCM_WORD32… · Missing Tests
The added workflow provides good RFC 8452 KAT coverage but which POLYVAL multiply / CTR backend it exercises depends on build config. It covers native x86_64 make check (software word64 table + Intel .S asm) and cross-compiled AArch64 (PMULL/NEON/base) and AArch32 with QEMU_CPU=max (crypto/vmull.p64 path). It does NOT exercise these newly written, security-sensitive paths: the Windows MASM .asm variant (which in fact contains a correctness+memory-safety bug -- see the AES_GCMSIV_ctr finding -- that this gap allowed through), the word32 table variant (GCM_WORD32, AesGcmSivGMult/AesGcmSivPolyvalInitSw word32 block in aes.c), the table-free GCM_SMALL variant, the AArch32 base (WC_POLYVAL_ASM_AARCH32_BASE) NO_HW_CRYPTO path (armhf job forces QEMU_CPU=max, selecting the crypto path), the AArch64 base (AES_GCMSIV_polyval_base, only reached with NEON+HW-crypto both disabled), and Thumb2 (AES_GCMSIV_polyval_thumb2 / gcm_siv_thumb2, targeting armv7-m which qemu-user cannot run). A transposition/reduction/register error in any of these would not be caught by CI. Severity views: review flagged this Medium/SUGGEST (test coverage); review-security flagged Medium (missing-tests, and it is the reason the High MASM defect went undetected).
Fix: Add a Windows MSVC/MASM job that builds with AES-NI + AES-GCM-SIV and runs the KATs. Add at least one x86 make check config forcing the word32 and GCM_SMALL software multiplies (e.g. CPPFLAGS=-DGCM_WORD32 and CPPFLAGS=-DGCM_SMALL) so the two untested pure-C POLYVAL variants run the RFC 8452 KATs at near-zero cost. Add an AArch32 NO_HW_CRYPTO base config (no QEMU_CPU=max) so the generated software-table CTR/POLYVAL variants are exercised. Note the Thumb-2 gap explicitly in release notes if it cannot be emulated.
| @@ -907,7 +908,8 @@ block cipher mechanism that uses n-bit binary string parameter key with 128-bits | |||
| /* Pick the widest available implementation at runtime. Callers must | |||
There was a problem hiding this comment.
🔵 [Low] Feature-guarding of AES-NI bulk helpers removes WC_MAYBE_UNUSED; verify no unguarded callers · Logic
These x86_64 AES-NI bulk helpers were previously always compiled with WC_MAYBE_UNUSED WC_INLINE. The diff now wraps them in #ifdef HAVE_AES_ECB, #if defined(HAVE_AES_ECB) && defined(HAVE_AES_DECRYPT), and #ifdef WOLFSSL_AES_COUNTER respectively and drops WC_MAYBE_UNUSED. If any caller references one of these helpers in a configuration where the new guard is not satisfied (e.g. an AES-NI + AESGCM build without HAVE_AES_ECB / WOLFSSL_AES_COUNTER), the build would fail with an undefined-symbol error; conversely, removing WC_MAYBE_UNUSED could trigger unused-function warnings under -Werror in a sub-config where the guard is met but the function is not called. This is a portability/build concern rather than a runtime security issue, and the --enable-all-crypto --enable-intelasm CI config satisfies all guards, so it may be benign.
Fix: Confirm that every caller of AesEcbEncryptBlocks/AesEcbDecryptBlocks/AesCtrEncryptBlocks is itself guarded by the same macros (HAVE_AES_ECB / HAVE_AES_DECRYPT / WOLFSSL_AES_COUNTER), and build a narrow config such as AES-NI + AESGCM only (no AES_ECB, no AES_COUNTER) to ensure no regression. Keep WC_MAYBE_UNUSED if any guarded-but-uncalled path remains possible.
| * The POLYVAL hash is constant-time wherever the CPU provides carry-less | ||
| * multiply (x86 PCLMUL, Arm PMULL/VMULL) - the runtime default on such CPUs. | ||
| * Software-only builds fall back to a key-dependent 4-bit table (a cache-timing | ||
| * trade-off matching GCM_TABLE GHASH); GCM_SMALL avoids the table entirely. */ |
There was a problem hiding this comment.
🔵 [Low] wc_AesGcmSivEncrypt lacks WARN_UNUSED_RESULT while decrypt has it · Convention
wc_AesGcmSivDecrypt is annotated WOLFSSL_API WARN_UNUSED_RESULT (correct - the auth result must not be ignored), but wc_AesGcmSivEncrypt is only WOLFSSL_API. An ignored encrypt return could let a caller transmit uninitialized/garbage output or a stale tag after a BAD_FUNC_ARG (e.g. wrong key size). This mirrors the existing wc_AesGcmEncrypt signature, so it is consistent with the codebase, hence a NIT rather than a defect.
Fix: Consider adding WARN_UNUSED_RESULT to wc_AesGcmSivEncrypt for symmetry and to force callers to check for parameter/setup failures; acceptable to leave as-is to match wc_AesGcmEncrypt.
dgarske
left a comment
There was a problem hiding this comment.
All tested correctly on my array of ARM and Intel machines, but Skoll review found some interesting things -> #10807 (review)
Added assembly for Intel x64, ARM64, ARM32, Thumb2.
37e4187 to
1b53a18
Compare
Description
Added assembly for Intel x64, ARM64, ARM32, Thumb2.
Testing
Tested each platform with AES-GCM-SIV.