Skip to content

feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming)#31

Open
janiussyafiq wants to merge 2 commits into
masterfrom
feat/ai-lakera-guard-pr2
Open

feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming)#31
janiussyafiq wants to merge 2 commits into
masterfrom
feat/ai-lakera-guard-pr2

Conversation

@janiussyafiq

Copy link
Copy Markdown
Owner

Description

PR-2 of the ai-lakera-guard plugin (follow-up to the input MVP in apache#13570). Adds output/response scanning for both non-streaming and streaming traffic. Back-compatible — defaults are unchanged (direction still defaults to input).

Changes

  • Schema: direction enum extended input{input, output, both}; new response_failure_message (default "Response blocked by Lakera Guard").
  • Plugin:
    • access is gated by direction (input/both scan the request; output skips request-time work). both short-circuits at the request when the prompt is flagged, so the LLM is never called.
    • New lua_body_filter scans the LLM response:
      • Non-streaming: scans ctx.var.llm_response_text; a flagged response is replaced with a provider-compatible deny carrying response_failure_message.
      • Streaming (SSE): buffers the response (withholding chunks), scans the assembled completion once at end-of-stream, then releases it verbatim when clean or replaces it with a deny SSE (terminated by [DONE]) when flagged. Buffering is required to truly block — partial flagged tokens must never reach the client.
    • A shared moderate() helper backs both the request and response paths.
  • Docs (en + zh): new "Scanning direction" section (input/output/both + streaming behavior and its limitation), response_failure_message, and an output example.
  • Tests: added TEST 20–32 to t/plugin/ai-lakera-guard.t (output non-streaming clean/flagged, input back-compat, both, streaming clean/flagged, alert mode) plus fixtures.

Design notes

  • Mirrors ai-aliyun-content-moderation's response path: lua_body_filter is invoked only through ai-proxy's response dispatch, so the hard ai-proxy dependency check stays in access.
  • Reuses the protocol's build_deny_response({ stream = true }) for the deny SSE — no hand-rolled framing.
  • Scans the assistant text content, consistent with Lakera /guard's text-only screening (verified against Lakera's API docs and Kong's reference plugin).

Known limitation (documented): a streamed block is delivered as a 200 SSE body (the stream's headers are already committed when buffering begins); if the upstream ends a stream abnormally without a terminal event, buffered content is not released.

Testing

  • prove t/plugin/ai-lakera-guard.t — 99/99 subtests pass (PR-1 + PR-2).
  • t/plugin/ai-lakera-guard-secrets.t — 15/15 (no regression).
  • luacheck + lj-releng clean.

Part of apache#13291.

🤖 Generated with Claude Code

…t and both directions; update documentation and tests

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds response/output moderation to the ai-lakera-guard plugin, extending it from request-only scanning to support direction: output|both across both non-streaming and streaming (SSE) LLM traffic, with docs and tests to validate the new behavior.

Changes:

  • Extends plugin schema with direction: input|output|both and adds response_failure_message.
  • Implements response scanning via lua_body_filter for both non-streaming completions and buffered streaming SSE responses.
  • Adds fixtures + expands the test suite to cover output/both directions, streaming allow/block, and alert mode.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
apisix/plugins/ai-lakera-guard.lua Adds shared moderate() helper and a new response scanning lua_body_filter path (non-streaming + buffered SSE).
apisix/plugins/ai-lakera-guard/schema.lua Expands direction enum and introduces response_failure_message.
docs/en/latest/plugins/ai-lakera-guard.md Documents direction semantics and streaming buffering/blocking behavior; adds examples.
docs/zh/latest/plugins/ai-lakera-guard.md Same as EN docs, localized.
t/plugin/ai-lakera-guard.t Adds tests for output/both scanning, streaming buffering/blocking, and alert behavior.
t/fixtures/openai/chat-injection.json Adds a flagged non-streaming fixture.
t/fixtures/openai/chat-streaming-injection.sse Adds a flagged streaming SSE fixture.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread apisix/plugins/ai-lakera-guard.lua
@janiussyafiq janiussyafiq requested a review from Copilot June 25, 2026 03:44

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants