feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming)#31
Open
janiussyafiq wants to merge 2 commits into
Open
feat(ai-lakera-guard): scan LLM responses (direction output/both, non-streaming + streaming)#31janiussyafiq wants to merge 2 commits into
janiussyafiq wants to merge 2 commits into
Conversation
…t and both directions; update documentation and tests
There was a problem hiding this comment.
Pull request overview
Adds response/output moderation to the ai-lakera-guard plugin, extending it from request-only scanning to support direction: output|both across both non-streaming and streaming (SSE) LLM traffic, with docs and tests to validate the new behavior.
Changes:
- Extends plugin schema with
direction: input|output|bothand addsresponse_failure_message. - Implements response scanning via
lua_body_filterfor both non-streaming completions and buffered streaming SSE responses. - Adds fixtures + expands the test suite to cover output/both directions, streaming allow/block, and alert mode.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
apisix/plugins/ai-lakera-guard.lua |
Adds shared moderate() helper and a new response scanning lua_body_filter path (non-streaming + buffered SSE). |
apisix/plugins/ai-lakera-guard/schema.lua |
Expands direction enum and introduces response_failure_message. |
docs/en/latest/plugins/ai-lakera-guard.md |
Documents direction semantics and streaming buffering/blocking behavior; adds examples. |
docs/zh/latest/plugins/ai-lakera-guard.md |
Same as EN docs, localized. |
t/plugin/ai-lakera-guard.t |
Adds tests for output/both scanning, streaming buffering/blocking, and alert behavior. |
t/fixtures/openai/chat-injection.json |
Adds a flagged non-streaming fixture. |
t/fixtures/openai/chat-streaming-injection.sse |
Adds a flagged streaming SSE fixture. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ce test coverage for output direction
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
PR-2 of the
ai-lakera-guardplugin (follow-up to the input MVP in apache#13570). Adds output/response scanning for both non-streaming and streaming traffic. Back-compatible — defaults are unchanged (directionstill defaults toinput).Changes
directionenum extendedinput→{input, output, both}; newresponse_failure_message(default"Response blocked by Lakera Guard").accessis gated bydirection(input/both scan the request; output skips request-time work).bothshort-circuits at the request when the prompt is flagged, so the LLM is never called.lua_body_filterscans the LLM response:ctx.var.llm_response_text; a flagged response is replaced with a provider-compatible deny carryingresponse_failure_message.[DONE]) when flagged. Buffering is required to truly block — partial flagged tokens must never reach the client.moderate()helper backs both the request and response paths.response_failure_message, and an output example.TEST 20–32tot/plugin/ai-lakera-guard.t(output non-streaming clean/flagged, input back-compat,both, streaming clean/flagged, alert mode) plus fixtures.Design notes
ai-aliyun-content-moderation's response path:lua_body_filteris invoked only throughai-proxy's response dispatch, so the hardai-proxydependency check stays inaccess.build_deny_response({ stream = true })for the deny SSE — no hand-rolled framing./guard's text-only screening (verified against Lakera's API docs and Kong's reference plugin).Known limitation (documented): a streamed block is delivered as a
200SSE body (the stream's headers are already committed when buffering begins); if the upstream ends a stream abnormally without a terminal event, buffered content is not released.Testing
prove t/plugin/ai-lakera-guard.t— 99/99 subtests pass (PR-1 + PR-2).t/plugin/ai-lakera-guard-secrets.t— 15/15 (no regression).luacheck+lj-relengclean.Part of apache#13291.
🤖 Generated with Claude Code