LanguageScanner
Enforce output language policies and prevent payload splitting attacks.
- Threat class: Language switching and payload splitting
- Purpose: Enforce output language policies or input-output language consistency
The LanguageScanner detects the language of generated text and compares it against a policy. The policy can be a static list of allowed languages, a correspondence check between the input and output languages, or both.
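The two policy modes can be sketched in a few lines. This is a hypothetical illustration, not the library's implementation: `Verdict` and `evaluate_policy` are names invented here, the language codes are passed in directly instead of coming from a real detector, and the order in which the two checks run is an assumption. Only the reason strings `policy_violation` and `correspondence_mismatch` are taken from this page.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Verdict:
    """Hypothetical stand-in for the scanner's result object."""
    safe: bool
    reason: Optional[str] = None


def evaluate_policy(
    output_lang: str,
    allowed_languages: Optional[Sequence[str]] = None,
    reference_lang: Optional[str] = None,
) -> Verdict:
    """Apply a static allow-list and/or an input-output correspondence check.

    Language codes are assumed to be already-detected ISO 639-1 strings.
    """
    # Static policy: the output language must be in the allowed set.
    if allowed_languages is not None and output_lang not in allowed_languages:
        return Verdict(safe=False, reason="policy_violation")

    # Correspondence: the output language must match the input language.
    if reference_lang is not None and output_lang != reference_lang:
        return Verdict(safe=False, reason="correspondence_mismatch")

    return Verdict(safe=True)


print(evaluate_policy("fr", allowed_languages=["en", "es"]))
print(evaluate_policy("es", reference_lang="en"))
print(evaluate_policy("en", allowed_languages=["en"], reference_lang="en"))
```

Passing neither `allowed_languages` nor `reference_lang` leaves nothing to enforce in this sketch, so the verdict is always safe.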
Configuration
You can configure the scanner using two primary arguments:
- allowed_languages: A list of ISO 639-1 codes (like ['en', 'fr']) that enforces a static policy. If omitted, no static policy is enforced.
- languages_to_load: A performance optimization list. By default, the scanner loads all languages into memory, which takes around 100 MB and provides maximum accuracy. Providing a specific list of ISO codes makes the scanner much lighter, but it will only detect languages within that list.
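Because a scanner restricted by languages_to_load only detects the listed languages, it seems sensible to confirm that every allowed language is also loaded; otherwise output in that language could never be recognised. The helper below is a hypothetical pre-flight check (`check_language_config` is not part of the library), and it validates only the shape of each code, not whether it is a registered ISO 639-1 entry.

```python
from typing import Iterable, List, Optional


def check_language_config(
    allowed_languages: Optional[Iterable[str]],
    languages_to_load: Optional[Iterable[str]],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    allowed = list(allowed_languages or [])
    loaded = None if languages_to_load is None else list(languages_to_load)

    # ISO 639-1 codes are two lowercase ASCII letters; this checks shape only.
    # dict.fromkeys removes duplicates while keeping the original order.
    for code in dict.fromkeys(allowed + (loaded or [])):
        if not (len(code) == 2 and code.isascii() and code.isalpha() and code.islower()):
            problems.append(f"{code!r} does not look like an ISO 639-1 code")

    # With languages_to_load omitted, all languages are loaded, so none can be missing.
    if loaded is not None:
        missing = sorted(set(allowed) - set(loaded))
        if missing:
            problems.append(f"allowed but not loaded: {missing}")

    return problems


print(check_language_config(["en", "es"], ["en", "es", "fr", "de"]))
print(check_language_config(["en", "pt"], ["en", "es"]))
```

Running a check like this before constructing the scanner surfaces a mismatched configuration early instead of as unexplained policy violations at runtime.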
from deconvolute import LanguageScanner
# Load specific languages to save memory
scanner = LanguageScanner(
    allowed_languages=["en", "es"],
    languages_to_load=["en", "es", "fr", "de"],
)
Static Policy Check
Verifies that the output language is part of the allowed set.
result = scanner.check("Bonjour le monde")
if not result.safe:
    print("Unexpected language detected")
    # The result metadata contains the reason 'policy_violation'
Input-Output Correspondence Check
Ensures the model responds in the same language as the input. This is particularly useful for preventing attacks where the model is tricked into using base64 or other encodings to evade downstream keyword filters. Pass the user's input as the reference_text argument.
user_input = "Tell me a joke."
model_output = "Aquí hay una broma..."
result = scanner.check(
    content=model_output,
    reference_text=user_input,
)
if not result.safe:
    print("Language mismatch detected")
    # The result metadata contains the reason 'correspondence_mismatch'
Asynchronous Execution
The scanner also provides an asynchronous method, a_check, which takes the same arguments as check and runs the check in a thread pool.
# Call this from inside a coroutine (async def).
result = await scanner.a_check(
    content=model_output,
    reference_text=user_input,
)
if not result.safe:
    # handle_violation is a placeholder for your own handler.
    handle_violation(result)
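To illustrate what running a check in a thread pool means for callers, the sketch below wraps a blocking function with `asyncio.to_thread` (Python 3.9+) and runs several checks concurrently. It is not the scanner's implementation: `blocking_check` is a hypothetical stand-in that only sleeps and tests for ASCII, and this page does not say which executor the library actually uses.

```python
import asyncio
import time
from typing import List


def blocking_check(content: str) -> bool:
    """Hypothetical stand-in for a synchronous, blocking check."""
    time.sleep(0.1)  # simulate work
    return content.isascii()  # placeholder verdict, not real language detection


async def a_check(content: str) -> bool:
    # Offload the blocking call to the default thread pool so the
    # event loop can keep serving other tasks in the meantime.
    return await asyncio.to_thread(blocking_check, content)


async def main() -> List[bool]:
    outputs = ["Hello world", "Aquí hay una broma...", "Tell me a joke."]
    # The three checks overlap instead of running back to back.
    return await asyncio.gather(*(a_check(text) for text in outputs))


results = asyncio.run(main())
print(results)
```

The same pattern applies when several model outputs need scanning within one request: awaiting them together with `asyncio.gather` keeps total latency close to that of the slowest single check.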