New AI-Security Model: Wolf Defender
Domi
3 min
Threat Research

Over the last weeks, we tried to further improve our model and reduce FPR.
We achieved amazing results by improving the data instead of the model.
I. Data > Model
Instead of chasing new architectures, we focused on:
dataset cleaning
removing mislabeled samples
improving data balance
adding realistic edge cases
This alone pushed our model from ~0.90 → 0.95 F1.
II. Reducing False Positives (the real problem)
Most prompt injection detectors fail in production because of false positives.
We made this our primary optimization target.
Example (Qualifire benchmark):
→ FPR reduced from ~0.11 → ~0.03
And importantly:
This was not achieved by training on benchmark data.
Instead, we introduced:
hard benign examples
injection-looking but safe prompts
real-world developer text (docs, issues, discussions)
This forces the model to actually learn the difference between:
malicious structure
and harmless context
The Result
Top-tier performance across benchmarks
Lowest FPR across benchmarks
Strong generalization across datasets
On average, we now outperform:
existing open-weight detectors
and even fine-tuned LLM-based solutions (e.g. Sentinel)
Small Model, Big Impact
We also trained a 50% smaller variant:
→ minimal performance drop → strong benchmark results
This is critical because it enables: aggressive quantization -> efficient deployment
What’s next
Our next step:
→ Reduce model size from ~500MB → <250MB → while maintaining state-of-the-art performance
This is how real AI security should work:
on-device
real-time
no cloud dependency
no data leaving the system
Model
https://huggingface.co/patronus-studio/wolf-defender-prompt-injection
We believe the future of AI security is:
local, fast, and data-driven.
