feat(core): update request retry statistics in real-time #3773

Open
harryautomazione wants to merge 1 commit into apify:master from harryautomazione:fix/retry-stats-realtime

@harryautomazione

Description

This PR fixes an issue where the requestsRetries counter and the requestRetryHistogram were updated only after a request had been fully handled (either succeeded or failed permanently).

Because these statistics were updated only at the end of a request's lifecycle, long-running crawls appeared error-free while retries were still in progress, and the statistics were left incomplete if the crawler crashed or was migrated early.

Changes

  • Core/Statistics:
    • Added registerRetry(retryCount: number) in the Statistics class to dynamically increment the global requestsRetries and update requestRetryHistogram in real-time as soon as a retry is triggered.
    • Added input validation guards inside registerRetry and _saveRetryCountForJob to prevent NaN or negative values from corrupting the statistics.
    • Implemented automatic backfilling of the histogram array with 0 to avoid sparse arrays, whose empty slots serialize to null in JSON.
    • Refactored _saveRetryCountForJob to only record job completions at index 0 if no retries occurred, preventing duplicate counters.
  • Basic Crawler:
    • Hooked this.stats.registerRetry(request.retryCount) within the _requestFunctionErrorHandler right after updating the request's retry count.
  • Tests:
    • Added comprehensive unit tests in statistics.test.ts to cover real-time tracking, negative/NaN input protection, and JSON serialization integrity.
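
For reviewers who want a quick mental model, the behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `RetryStatsSketch`, `RetryState`, `saveRetryCountForJob` (without the underscore) and the private `bump` helper are hypothetical names, and the sketch assumes each retry simply increments the histogram bucket for the request's new retry count, since the exact bucket handling is not spelled out in the description.

```typescript
// Hypothetical, simplified stand-in for the Statistics class (not Crawlee's actual code).
interface RetryState {
    requestsRetries: number;
    requestRetryHistogram: number[];
}

class RetryStatsSketch {
    readonly state: RetryState = { requestsRetries: 0, requestRetryHistogram: [] };

    /** Called as soon as a retry is triggered; retryCount is the request's updated retry count. */
    registerRetry(retryCount: number): void {
        // Guard: ignore NaN, Infinity, fractions, and anything below 1.
        if (!Number.isInteger(retryCount) || retryCount < 1) return;
        this.state.requestsRetries += 1;
        this.bump(retryCount);
    }

    /** Called when a request finishes; only requests that were never retried land in bucket 0. */
    saveRetryCountForJob(retryCount: number): void {
        if (!Number.isInteger(retryCount) || retryCount < 0) return;
        if (retryCount === 0) this.bump(0);
    }

    /** Increments one bucket, backfilling with zeros so the array never has holes. */
    private bump(index: number): void {
        const histogram = this.state.requestRetryHistogram;
        while (histogram.length <= index) histogram.push(0);
        histogram[index] = (histogram[index] ?? 0) + 1;
    }
}

// Usage: invalid inputs are ignored, and the histogram stays dense.
const demo = new RetryStatsSketch();
demo.registerRetry(3); // third retry of some request
demo.registerRetry(NaN); // ignored
demo.registerRetry(-1); // ignored
demo.saveRetryCountForJob(0); // a request that finished without retries
console.log(JSON.stringify(demo.state));
// prints {"requestsRetries":1,"requestRetryHistogram":[1,0,0,1]}

// For contrast, assigning past the end of an array leaves holes that serialize to null.
const sparse: number[] = [];
sparse[3] = 1;
console.log(JSON.stringify(sparse)); // prints [null,null,null,1]
```

The zero backfill is what keeps the persisted state free of null entries; without it, a first retry recorded at a high index would leave the lower buckets empty.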

Fixes #2732

Development

Successfully merging this pull request may close these issues.

Statistics.state.requestsRetries only updates after request is fully handled

3 participants