feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming) by janiussyafiq · Pull Request #31 · janiussyafiq/apisix

janiussyafiq · 2026-06-24T09:10:16Z

Description

PR-2 of the ai-lakera-guard plugin (follow-up to the input MVP in apache#13570). Adds output/response scanning for both non-streaming and streaming traffic. Backward compatible: defaults are unchanged (direction still defaults to input).
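
To illustrate the back-compatibility claim, here is a small model of the schema defaults. It is a Python sketch, not the plugin's Lua schema; `normalize_conf` is a hypothetical helper, and only the `direction` values and the `response_failure_message` default are taken from this PR.

```python
# Values taken from the PR description; everything else is illustrative.
ALLOWED_DIRECTIONS = ("input", "output", "both")
DEFAULTS = {
    "direction": "input",
    "response_failure_message": "Response blocked by Lakera Guard",
}


def normalize_conf(conf):
    """Return a copy of conf with defaults applied; reject unknown directions."""
    merged = {**DEFAULTS, **conf}
    if merged["direction"] not in ALLOWED_DIRECTIONS:
        raise ValueError(f"invalid direction: {merged['direction']!r}")
    return merged


# A config written for the input-only MVP keeps behaving the same way.
legacy = normalize_conf({})
print(legacy["direction"])
```

A config that never mentions `direction` still resolves to `input`, so existing routes would not start scanning responses unless they opt in.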

Changes

Schema: direction enum extended input → {input, output, both}; new response_failure_message (default "Response blocked by Lakera Guard").
Plugin:
- access is gated by direction (input/both scan the request; output skips request-time work). both short-circuits at the request when the prompt is flagged, so the LLM is never called.
- New lua_body_filter scans the LLM response:
  - Non-streaming: scans ctx.var.llm_response_text; a flagged response is replaced with a provider-compatible deny carrying response_failure_message.
  - Streaming (SSE): buffers the response (withholding chunks), scans the assembled completion once at end-of-stream, then releases it verbatim when clean or replaces it with a deny SSE (terminated by [DONE]) when flagged. Buffering is required to truly block — partial flagged tokens must never reach the client.
- A shared moderate() helper backs both the request and response paths.
Docs (en + zh): new "Scanning direction" section (input/output/both + streaming behavior and its limitation), response_failure_message, and an output example.
Tests: added TEST 20–32 to t/plugin/ai-lakera-guard.t (output non-streaming clean/flagged, input back-compat, both, streaming clean/flagged, alert mode) plus fixtures.
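
The direction gating and the `both` short-circuit described above can be modeled as follows. This is a hedged Python sketch of the control flow only: `handle`, `scan`, and `call_llm` are hypothetical stand-ins, the outcome labels are illustrative, and alert mode is not modeled.

```python
def scans_request(direction):
    """input and both scan the prompt at request time."""
    return direction in ("input", "both")


def scans_response(direction):
    """output and both scan the LLM response."""
    return direction in ("output", "both")


def moderate(scan, text):
    """Shared helper used by both paths: True when the text is flagged."""
    return bool(scan(text))


def handle(conf, prompt, call_llm, scan):
    """Model one non-streaming exchange and return (outcome, body)."""
    direction = conf.get("direction", "input")
    if scans_request(direction) and moderate(scan, prompt):
        # Short-circuit: a flagged prompt means the LLM is never called.
        return "request_blocked", None
    answer = call_llm(prompt)
    if scans_response(direction) and moderate(scan, answer):
        message = conf.get(
            "response_failure_message", "Response blocked by Lakera Guard"
        )
        return "response_blocked", message
    return "passed", answer
```

With `direction = "output"` the prompt is forwarded unscanned and only the answer is checked; with `direction = "input"` a flagged answer passes through, which matches the pre-existing behavior.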

Design notes

Mirrors ai-aliyun-content-moderation's response path: lua_body_filter is invoked only through ai-proxy's response dispatch, so the hard ai-proxy dependency check stays in access.
Reuses the protocol's build_deny_response({ stream = true }) for the deny SSE — no hand-rolled framing.
Scans the assistant text content, consistent with Lakera /guard's text-only screening (verified against Lakera's API docs and Kong's reference plugin).
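
As an illustration of what "assistant text content" means for a streamed reply, the sketch below joins the `delta.content` pieces of an OpenAI-style chat completion SSE body. It is written in Python for brevity and is not the plugin's code; `assemble_assistant_text` is a hypothetical name, and other providers may frame their events differently.

```python
import json


def assemble_assistant_text(sse_body):
    """Concatenate choices[].delta.content from OpenAI-style SSE data lines."""
    parts = []
    for line in sse_body.splitlines():
        if not line.startswith("data:"):
            continue  # skip comments, event names, and blank separators
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue  # the terminal sentinel carries no text
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore frames that are not JSON
        if not isinstance(event, dict):
            continue
        for choice in event.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if isinstance(content, str):
                parts.append(content)
    return "".join(parts)
```

Only the joined text would be sent for screening; role markers, finish reasons, and usage frames contribute nothing to it.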

Known limitations (documented): a streamed block is delivered as a 200 SSE body, because the stream's headers are already committed when buffering begins. If the upstream ends a stream abnormally without a terminal event, buffered content is not released.
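
The buffering behavior and the abnormal-end limitation can be sketched together. This Python model is an assumption-laden illustration, not the plugin's Lua: `BufferedStreamGuard` and `deny_sse` are hypothetical, the deny frame is a guess at an OpenAI-style shape (the plugin reuses the protocol's own `build_deny_response`), and `scan` receives the raw buffered body here rather than extracted assistant text.

```python
import json


def deny_sse(message):
    """Build an illustrative deny stream: one content frame, then [DONE]."""
    event = {
        "choices": [
            {"index": 0, "delta": {"content": message}, "finish_reason": "stop"}
        ]
    }
    return ["data: " + json.dumps(event) + "\n\n", "data: [DONE]\n\n"]


class BufferedStreamGuard:
    def __init__(self, scan, failure_message):
        self._scan = scan
        self._failure_message = failure_message
        self._buffer = []

    def on_chunk(self, chunk, is_terminal):
        """Return the chunks that may be sent to the client right now."""
        self._buffer.append(chunk)
        if not is_terminal:
            return []  # withhold everything until the stream completes
        buffered, self._buffer = self._buffer, []
        if self._scan("".join(buffered)):
            return deny_sse(self._failure_message)  # replace, never leak
        return buffered  # clean: release the original chunks verbatim
```

If the terminal chunk never arrives, `on_chunk` only ever returns empty lists, which is the fail-closed behavior described above: nothing buffered reaches the client.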

Testing

prove t/plugin/ai-lakera-guard.t — 99/99 subtests pass (PR-1 + PR-2).
t/plugin/ai-lakera-guard-secrets.t — 15/15 (no regression).
luacheck + lj-releng clean.

Part of apache#13291.

🤖 Generated with Claude Code

Copilot

Pull request overview

Adds response/output moderation to the ai-lakera-guard plugin, extending it from request-only scanning to support direction: output|both across both non-streaming and streaming (SSE) LLM traffic, with docs and tests to validate the new behavior.

Changes:

Extends plugin schema with direction: input|output|both and adds response_failure_message.
Implements response scanning via lua_body_filter for both non-streaming completions and buffered streaming SSE responses.
Adds fixtures + expands the test suite to cover output/both directions, streaming allow/block, and alert mode.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `apisix/plugins/ai-lakera-guard.lua` | Adds shared `moderate()` helper and a new response scanning `lua_body_filter` path (non-streaming + buffered SSE). |
| `apisix/plugins/ai-lakera-guard/schema.lua` | Expands `direction` enum and introduces `response_failure_message`. |
| `docs/en/latest/plugins/ai-lakera-guard.md` | Documents direction semantics and streaming buffering/blocking behavior; adds examples. |
| `docs/zh/latest/plugins/ai-lakera-guard.md` | Same as EN docs, localized. |
| `t/plugin/ai-lakera-guard.t` | Adds tests for output/both scanning, streaming buffering/blocking, and alert behavior. |
| `t/fixtures/openai/chat-injection.json` | Adds a flagged non-streaming fixture. |
| `t/fixtures/openai/chat-streaming-injection.sse` | Adds a flagged streaming SSE fixture. |

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

feat(ai-lakera-guard): enhance scanning capabilities to support output and both directions; update documentation and tests (4b535f9)

janiussyafiq requested a review from Copilot June 24, 2026 09:17

Copilot started reviewing on behalf of janiussyafiq June 24, 2026 09:17

Copilot AI reviewed Jun 24, 2026

Comment thread apisix/plugins/ai-lakera-guard.lua

feat(ai-lakera-guard): implement multi-chunk streaming mock and enhance test coverage for output direction (caf8500)

janiussyafiq requested a review from Copilot June 25, 2026 03:44

Copilot AI reviewed Jun 25, 2026

janiussyafiq wants to merge 2 commits into master from feat/ai-lakera-guard-pr2
