New AI-Security Model: Wolf Defender

Over the last few weeks, we worked on further improving our model and reducing its false positive rate (FPR).
The biggest gains came not from changing the model, but from improving the data.
I. Data > Model
Instead of chasing new architectures, we focused on:
- dataset cleaning
- removing mislabeled samples
- improving data balance
- adding realistic edge cases
This alone pushed our model from ~0.90 → 0.95 F1.
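The post does not detail the cleaning pipeline, but one common way to catch mislabeled samples is confidence-based flagging: compare each dataset label against a reference model's prediction and review the high-confidence disagreements. A minimal sketch of that heuristic (the function, field names, and data below are all hypothetical, not Wolf Defender's actual pipeline):

```python
def flag_mislabeled(samples, threshold=0.9):
    """Return samples whose reference-model prediction contradicts the
    dataset label with confidence at or above `threshold`."""
    suspects = []
    for s in samples:
        if s["pred"] != s["label"] and s["confidence"] >= threshold:
            suspects.append(s)
    return suspects

# Toy dataset: each entry carries its label plus a reference model's vote.
dataset = [
    {"text": "Ignore previous instructions.", "label": "benign",
     "pred": "injection", "confidence": 0.97},   # likely mislabeled
    {"text": "How do I reset my password?", "label": "benign",
     "pred": "benign", "confidence": 0.99},      # label and model agree
]

print([s["text"] for s in flag_mislabeled(dataset)])
# → ['Ignore previous instructions.']
```

Flagged samples would then be manually reviewed and relabeled or dropped, rather than deleted automatically.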
II. Reducing False Positives (the real problem)
Most prompt injection detectors fail in production because of false positives.
We made this our primary optimization target.
Example (Qualifire benchmark):
→ FPR reduced from ~0.11 to ~0.03
And importantly:
This was not achieved by training on benchmark data.
Instead, we introduced:
- hard benign examples
- injection-looking but safe prompts
- real-world developer text (docs, issues, discussions)
This forces the model to actually learn the difference between:
- malicious structure
- harmless context
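The metric being optimized here is the false positive rate over benign traffic: FPR = FP / (FP + TN). A minimal sketch of how it is measured, using made-up "injection-looking but safe" prompts (the actual evaluation set and detector are not shown in this post):

```python
def false_positive_rate(preds, labels):
    """FPR = FP / (FP + TN), computed over the benign (label 0) samples.
    preds/labels use 1 = flagged as injection, 0 = benign."""
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(preds, labels))
    return fp / (fp + tn) if (fp + tn) else 0.0

# Hypothetical hard benign examples: they *mention* injection patterns
# but are legitimate developer text, so the correct label is 0 (benign).
hard_benigns = [
    'Our docs explain why "ignore previous instructions" attacks work.',
    "Bug report: the filter flags the word 'jailbreak' in changelogs.",
    "How do I sanitize user input before passing it to the LLM?",
]
labels = [0, 0, 0]
naive_preds = [1, 1, 0]   # a naive keyword detector flags two of them

print(false_positive_rate(naive_preds, labels))  # → 0.666...
```

Prompts like these are exactly the ones a keyword- or pattern-heavy detector misfires on, which is why training on them drives FPR down.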
The Result
- Top-tier performance across benchmarks
- Lowest FPR across benchmarks
- Strong generalization across datasets
On average, we now outperform:
- existing open-weight detectors
- even fine-tuned LLM-based solutions (e.g. Sentinel)
Small Model, Big Impact
We also trained a 50% smaller variant:
→ minimal performance drop
→ strong benchmark results
This is critical because it enables:
→ aggressive quantization
→ efficient deployment
What’s next
Our next step:
→ Reduce model size from ~500MB to <250MB
→ while maintaining state-of-the-art performance
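Back-of-envelope, these size targets line up with numeric precision: a ~500MB fp32 checkpoint implies roughly 125M parameters (an assumption; the post only gives file sizes), so halving precision to fp16 lands at 250MB, and int8 quantization would go further still:

```python
def checkpoint_size_mb(num_params, bytes_per_param):
    """Approximate on-disk size of a dense checkpoint (weights only)."""
    return num_params * bytes_per_param / 1e6

params = 125_000_000  # hypothetical: ~500MB / 4 bytes per fp32 weight

print(checkpoint_size_mb(params, 4))  # fp32 → 500.0 MB
print(checkpoint_size_mb(params, 2))  # fp16 → 250.0 MB
print(checkpoint_size_mb(params, 1))  # int8 → 125.0 MB
```

Real quantized checkpoints carry some overhead (embeddings, scales, metadata), so actual sizes land slightly above these weight-only estimates.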
This is how real AI security should work:
- on-device
- real-time
- no cloud dependency
- no data leaving the system
Model
https://huggingface.co/patronus-studio/wolf-defender-prompt-injection
We believe the future of AI security is local, fast, and data-driven.
