Orca-Sonar: Our Multilingual Document Classifier for AI Security
Dominik Hommer
8 min
AI Research

The Problem
Anyone securing AI systems in the enterprise runs into one deceptively simple question: what is this text actually about? Whether it is a contract, a résumé, a quarterly report, source code, or just a harmless bit of small talk, the answer decides whether content gets logged, redacted, blocked, or simply passed through.
In practice, this is surprisingly hard. The same information shows up in wildly different shapes: as a clean document, an email thread, a Slack message, and increasingly as an instruction to an AI ("Summarize this contract for me: …"). Naïve classifiers latch onto the format instead of the content and get it wrong.
What Orca-Sonar Does
Orca-Sonar assigns every text to exactly one of seven classes:
| Class | Examples |
|---|---|
| legal | contracts, NDAs, ToS, privacy policies, judgments, compliance |
| hr | résumés, job ads, employment contracts, performance reviews |
| finance | balance sheets, reports, invoices, cash flow, filings |
| internal_and_tech | ADRs, RFCs, postmortems, specs, wikis, architecture |
| source_code | raw code & configuration (Python, Go, SQL, Terraform …) |
| marketing | press releases, newsletters, sales & landing-page copy |
| other | conversational / non-business: small talk, recipes, learning |
The guiding principle: the topic determines the class, not the format. Whether a text arrives as a plain PDF or wrapped in "Hey, can you quickly check this: …", the content is what counts. In ambiguous cases, the more sensitive class wins (legal > hr > finance > internal_and_tech > source_code > marketing > other), so borderline cases are protected rather than leaked.
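The post does not say whether this sensitivity order is applied when labeling the training data or at inference time. As one possible reading, here is a minimal Python sketch that applies it as a post-processing step over per-class probabilities. The function name `resolve` and the `margin` that defines "ambiguous" (any class within 0.10 of the top score) are hypothetical choices for illustration.

```python
# Sensitivity order from the post, most sensitive first.
PRIORITY = ["legal", "hr", "finance", "internal_and_tech",
            "source_code", "marketing", "other"]


def resolve(probs: dict[str, float], margin: float = 0.10) -> str:
    """Pick a class, preferring the more sensitive one when the call is close.

    `margin` is a hypothetical threshold: every class whose probability is
    within `margin` of the top score counts as a plausible candidate.
    """
    top = max(probs.values())
    candidates = [label for label, p in probs.items() if top - p <= margin]
    # A lower index in PRIORITY means more sensitive, so take the minimum.
    return min(candidates, key=PRIORITY.index)


# A close call between marketing and legal resolves to legal;
# a clear marketing win stays marketing.
print(resolve({"marketing": 0.48, "legal": 0.42, "other": 0.10}))
print(resolve({"marketing": 0.80, "legal": 0.15, "other": 0.05}))
```

A margin-based rule keeps clear-cut predictions untouched and only escalates when the top two classes are genuinely close; the right margin would have to be tuned on validation data.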
Under the Hood
Orca-Sonar is built on mmBERT (ModernBERT family), a compact, multilingual encoder. That makes the model:
fast & lightweight: small enough for cheap, low-latency inference, including near the edge;
genuinely multilingual: German and English trained as equals;
deployment-friendly: available as safetensors and as an FP16 ONNX variant (half the size, identical predictions).
Performance
We measure performance on our own internal held-out test set (real data only), where Orca-Sonar reaches a macro-F1 of ~0.98. It generalizes especially well on source_code, legal, and hr, holding up even on text from sources quite different from the training data.
One honest caveat on interpreting that score: there is no established, general benchmark for this exact 7-class document-topic task. Nobody has a standard test set covering legal / hr / finance / internal_and_tech / source_code / marketing / other across German and English. We're currently building a general benchmark for this task and evaluating Orca-Sonar against external, model-unseen datasets to get a realistic, cross-distribution picture.
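Macro-F1, the metric quoted above, computes an F1 score per class and then takes the unweighted mean, so each of the seven classes counts equally however many test examples it has. A minimal pure-Python sketch of the computation (the labels in the example are made up, not drawn from our test set):

```python
from typing import Optional, Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Optional[Sequence[str]] = None) -> float:
    """Unweighted mean of per-class F1 scores."""
    if labels is None:
        # Like scikit-learn's default: every label seen in y_true or y_pred.
        labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example: one legal text is misread as hr.
y_true = ["legal", "legal", "hr", "other"]
y_pred = ["legal", "hr", "hr", "other"]
print(round(macro_f1(y_true, y_pred), 4))  # per-class F1: legal 2/3, hr 2/3, other 1
```

Here a single mistake costs both legal (recall) and hr (precision), giving a macro-F1 of 7/9 ≈ 0.78. scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.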
How to Use It
For production, low-latency deployments, the ONNX variant is available under onnx/onnx_fp16/.
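A minimal sketch of running the FP16 ONNX export with `onnxruntime` and a Hugging Face tokenizer, assuming the caller supplies the path to the `.onnx` file and a directory with the tokenizer files (the post does not name either), that the graph's first output holds the logits, and that the label order matches the table above. That label order is an assumption to check against the model's own `id2label` mapping; `classify_onnx` and `logits_to_label` are hypothetical helper names.

```python
import math
from typing import Sequence

# Assumed label order (the order of the class table). Verify it against the
# model's own id2label mapping before relying on it.
LABELS = ["legal", "hr", "finance", "internal_and_tech",
          "source_code", "marketing", "other"]


def logits_to_label(logits: Sequence[float], labels: Sequence[str] = LABELS):
    """Softmax over one row of logits; return (label, probability)."""
    scores = [float(x) for x in logits]  # upcasts FP16 values to Python floats
    if len(scores) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(scores)}")
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by max for numerical stability
    idx = max(range(len(exps)), key=exps.__getitem__)
    return labels[idx], exps[idx] / sum(exps)


def classify_onnx(text: str, onnx_path: str, tokenizer_dir: str):
    """Classify one text with the FP16 ONNX export (paths are caller-supplied)."""
    # Imported lazily so logits_to_label() works without these packages.
    import onnxruntime as ort
    from transformers import AutoTokenizer

    # For real traffic, build the tokenizer and session once and reuse them.
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

    enc = tokenizer(text, return_tensors="np", truncation=True)
    # Feed only the inputs the exported graph actually declares.
    wanted = {inp.name for inp in session.get_inputs()}
    feeds = {name: arr for name, arr in enc.items() if name in wanted}
    # Assumes the first graph output holds the logits, shape (1, num_labels).
    logits = session.run(None, feeds)[0]
    return logits_to_label(logits[0])
```

A call would look like `classify_onnx("Please review this NDA before Friday.", "onnx/onnx_fp16/model.onnx", ".")`, where the file name `model.onnx` is a placeholder for whatever the directory actually contains. Keeping the softmax in a separate function makes it easy to plug in a tie-break rule or a confidence threshold without touching the inference code.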
The Dataset
Orca-Sonar was trained on our own in-house dataset (German + English, 7 classes), curated specifically for real-world security scenarios. We'll publish it shortly.
Part of Patronus Protect
Orca-Sonar complements our security stack around Wolf Defender (prompt-injection detection): where Wolf Defender asks "Is this input an attack?", Orca-Sonar answers "What is it actually about?". Together they form the basis for intelligent, policy-driven routing in AI pipelines.
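To make "policy-driven routing" concrete, here is a minimal Python sketch that combines the two verdicts into one of the actions mentioned at the start of this post (log, redact, block, pass through). The `Action` enum, the `POLICY` table, and `route()` are hypothetical. Both model outputs are passed in as plain values, so no Wolf Defender or Orca-Sonar API is assumed, and the class-to-action mapping is only an example that each deployment would set for itself.

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    REDACT = "redact"
    LOG = "log"
    PASS = "pass"


# Hypothetical example policy; every organization would define its own.
POLICY = {
    "legal": Action.REDACT,
    "hr": Action.REDACT,
    "finance": Action.REDACT,
    "internal_and_tech": Action.LOG,
    "source_code": Action.LOG,
    "marketing": Action.PASS,
    "other": Action.PASS,
}


def route(is_attack: bool, topic: str, policy: dict = POLICY) -> Action:
    """Combine an attack verdict and a topic label into one action."""
    # An input flagged as an attack is blocked regardless of its topic.
    if is_attack:
        return Action.BLOCK
    # Unknown labels fall back to the most conservative non-blocking action.
    return policy.get(topic, Action.REDACT)


print(route(False, "hr"))        # Action.REDACT
print(route(True, "marketing"))  # Action.BLOCK
```

Checking the attack verdict first means a prompt injection is stopped even when its topic would normally pass through; the topic policy only decides what happens to benign traffic.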
What's Next
Publishing the training dataset
Creating a public benchmark
