LanguageScanner

Threat class: Language switching and payload splitting
Purpose: Enforce output language policies or input-output language consistency

The LanguageScanner detects the language of generated text and checks it against a policy. The policy can be either a static list of allowed languages or a correspondence check that requires the output language to match the input language.
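To make the two policy modes concrete, here is a minimal sketch of how such a check could work. It is not the scanner's actual implementation: `toy_check`, `ToyResult`, `detect_language`, the stopword lists, the `"reason"` metadata key, and the `language_not_allowed` reason are all hypothetical names chosen for illustration. Only the `correspondence_mismatch` reason comes from this page. A real detector would use a trained language-identification model rather than stopword counting.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy stopword lists (assumption); real detectors use trained models.
STOPWORDS = {
    "en": {"the", "a", "is", "me", "and", "of", "to", "in", "you", "tell"},
    "es": {"el", "la", "es", "una", "un", "hay", "y", "de", "que", "aquí"},
}


@dataclass
class ToyResult:
    # Mirrors the `safe` flag used on this page; `metadata` layout is assumed.
    safe: bool
    metadata: dict = field(default_factory=dict)


def detect_language(text: str) -> Optional[str]:
    """Return the language whose stopwords occur most often, or None."""
    words = [w.strip(".,!?¿¡\"'").lower() for w in text.split()]
    scores = {
        lang: sum(w in vocab for w in words)
        for lang, vocab in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def toy_check(content, allowed=None, reference_text=None) -> ToyResult:
    """Apply a correspondence check and/or a static allow-list."""
    detected = detect_language(content)
    if reference_text is not None:
        # Correspondence mode: output must match the input's language.
        expected = detect_language(reference_text)
        if detected != expected:
            return ToyResult(False, {
                "reason": "correspondence_mismatch",
                "detected": detected,
                "expected": expected,
            })
    if allowed is not None and detected not in allowed:
        # Static mode: output must be in the allow-list.
        return ToyResult(False, {
            "reason": "language_not_allowed",  # hypothetical reason name
            "detected": detected,
        })
    return ToyResult(True, {"detected": detected})


mismatch = toy_check("Aquí hay una broma...", reference_text="Tell me a joke.")
static_ok = toy_check("Tell me a joke.", allowed={"en"})
```

In this sketch, `mismatch.safe` is false because the two texts are detected as different languages, while `static_ok.safe` is true because English is in the allow-list.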

Input-Output Correspondence Check

This check ensures that the model responds in the same language as the input. It is particularly useful against attacks that trick the model into switching languages, or into using base64 or other encodings, to evade downstream keyword filters. To enable it, pass the user's input as the reference_text argument.

user_input = "Tell me a joke."
model_output = "Aquí hay una broma..."

result = scanner.check(
    content=model_output,
    reference_text=user_input
)

if not result.safe:
    print("Language mismatch detected")
    # The result metadata contains the reason 'correspondence_mismatch'


Asynchronous Execution

The scanner provides an asynchronous method, a_check, that runs the check in a thread pool so that it does not block the event loop. It takes the same arguments as check.

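# Call from inside a coroutine (async def); `await` is not valid at module level.
result = await scanner.a_check(
    content=model_output,
    reference_text=user_input
)

if not result.safe:
    # handle_violation is an application-defined handler
    handle_violation(result)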
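The thread-pool pattern described above can be sketched with the standard library alone. `ToyScanner` and `scan_many` are hypothetical stand-ins, not the real scanner, and the real a_check may be implemented differently. The sketch assumes Python 3.9 or later for `asyncio.to_thread`.

```python
import asyncio
import time
from types import SimpleNamespace


class ToyScanner:
    """Stand-in with a blocking check(); not the real LanguageScanner."""

    def check(self, content, reference_text=None):
        time.sleep(0.05)  # simulate blocking language detection
        return SimpleNamespace(safe=True)

    async def a_check(self, content, reference_text=None):
        # Offload the blocking call to the default thread pool so the
        # event loop stays free for other tasks.
        return await asyncio.to_thread(self.check, content, reference_text)


async def scan_many(scanner, outputs):
    # Run several checks concurrently; results keep the input order.
    return await asyncio.gather(*(scanner.a_check(o) for o in outputs))


results = asyncio.run(scan_many(ToyScanner(), ["one", "two", "three"]))
```

Because each call waits in a worker thread, the checks overlap instead of running back to back, which matters when many model outputs are scanned in one request handler.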

