garak

garak

garak.ai

3

About this website

garak is an open-source command-line tool designed to systematically probe large language models (LLMs) for various security vulnerabilities and behavioral weaknesses. It functions as a red-teaming and assessment kit, enabling developers, security researchers, and organizations to evaluate the robustness of LLMs before deployment or during continuous monitoring. The tool operates by loading a target model through supported backends—including Hugging Face Transformers, OpenAI API, Anthropic, and local models via frameworks like llama.cpp or vLLM—and then applying a suite of plug-in probes that generate thousands of carefully crafted prompts. These probes target specific vulnerability classes such as prompt injection, jailbreak attempts, data leakage, denial-of-service, toxicity, hallucination, bias, and content policy violations. Each probe can produce multiple test prompts, and garak automatically scores model responses against expected behaviors, flagging failures with detailed logs. garak’s architecture is modular: users can select from dozens of pre-built probes, or create custom ones by writing simple Python classes that define the prompt generation logic and response evaluation criteria. The tool also supports “generators” that manage how the model is invoked, enabling tests under different configurations such as temperature, top-p, or system prompts. Results are output in JSON or HTML reports, showing pass/fail status per probe, response excerpts, and severity levels. This allows teams to track regressions after model updates or fine-tuning. The scanner is actively maintained by NVIDIA and the open-source community, with regular updates that add new probe types, improve detection accuracy, and extend support for emerging model architectures. It includes a built-in

Statistics

3
Views
0
Clicks
0
Like
0
Dislike

Comments

Log In to post a comment

No comments yet. Be the first!