Skip to content

Retry IAM calls on throttling in RuntimeSupport integration tests#2458

Open
GarrettBeatty wants to merge 1 commit into
devfrom
gcbeatty/fix-integ-iam-throttle
Open

Retry IAM calls on throttling in RuntimeSupport integration tests#2458
GarrettBeatty wants to merge 1 commit into
devfrom
gcbeatty/fix-integ-iam-throttle

Conversation

@GarrettBeatty

Copy link
Copy Markdown
Contributor

Problem

The Run Tests on AWS CodeBuild step has been intermittently failing (e.g. run 28538095638) with 2 of 24 tests failing in Amazon.Lambda.RuntimeSupport.IntegrationTests:

  • CustomRuntimeNET10Tests.TestAllNET10HandlersAsync
  • CustomRuntimeAspNetCoreMinimalApiCustomSerializerTest.TestThreadingLogging

Both fail identically during test setup, not on any assertion:

Amazon.IdentityManagement.AmazonIdentityManagementServiceException : Rate exceeded
  at BaseCustomRuntimeTest.ValidateAndSetIamRoleArn(...)   BaseCustomRuntimeTest.cs:150
  at BaseCustomRuntimeTest.PrepareTestResources(...)       BaseCustomRuntimeTest.cs:111

Root cause

The integration tests now run in parallel (#2451). Each test class uses a unique function name and sets ExecutionRoleName = FunctionName, so every test provisions its own IAM role — GetRole / CreateRole / AttachRolePolicy at startup and ListAttachedRolePolicies / DetachRolePolicy / DeleteRole at teardown. IAM's control-plane APIs have low request-rate limits, so the concurrent burst intermittently throttles with Rate exceeded, and the SDK's default retries aren't enough under this contention.

Fix

Wrap the IAM calls in BaseCustomRuntimeTest with a throttle-aware retry (exponential backoff, 2s → capped 30s, up to 8 attempts). This smooths the parallel startup/teardown burst instead of failing the test. Throttling is detected via HTTP 429, throttle-family error codes, and the Rate exceeded message.

Follows the test-stabilization theme of #2451. Test-only change — no shipped package affected, so no change file (matching #2451).

The RuntimeSupport integration tests now run in parallel, and each test
class provisions its own IAM role (GetRole/CreateRole/AttachRolePolicy at
startup, ListAttachedRolePolicies/DetachRolePolicy/DeleteRole at teardown).
The IAM control-plane APIs have low request-rate limits, so the parallel
burst intermittently throttles with 'Rate exceeded', failing
TestAllNET10HandlersAsync and TestThreadingLogging in setup rather than on
any assertion.

Wrap the IAM calls in a throttle-aware retry with exponential backoff so
the parallel startup/teardown burst is smoothed instead of failing.
@GarrettBeatty GarrettBeatty requested review from a team as code owners July 2, 2026 19:13
@GarrettBeatty GarrettBeatty requested review from normj and philasmar July 2, 2026 19:13
@GarrettBeatty GarrettBeatty added the Release Not Needed Add this label if a PR does not need to be released. label Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Release Not Needed Add this label if a PR does not need to be released.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants