# Distil-PII: Family of PII Redaction SLMs

distil labs has released a family of specialized small language models (SLMs) for policy-aware PII redaction. The fine-tuned 1B-parameter model scores 0.81 +/- 0.02, effectively matching a frontier 600B+ parameter LLM, while running on a laptop with full data privacy.
## Available Models
| Model | HuggingFace |
|---|---|
| Distil-PII-Llama-3.2-3B-Instruct | gguf |
| Distil-PII-Llama-3.2-1B-Instruct | gguf |
| Distil-PII-gemma-3-270m-it | gguf |
| Distil-PII-SmolLM2-135M-Instruct | gguf |
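The GGUF builds can be run with local runtimes such as llama.cpp. A minimal sketch using the llama-cpp-python bindings; the model filename and the system-prompt wording are assumptions for illustration, not the official usage from the model cards:

```python
import json


def build_messages(text: str) -> list:
    """Build a chat request asking for policy-aware PII redaction.

    The system-prompt wording is illustrative; check the model card
    for the exact prompt format the models were fine-tuned on.
    """
    return [
        {
            "role": "system",
            "content": (
                "Redact all PII in the user's text. Respond with JSON "
                "containing 'redacted_text' and an 'entities' list."
            ),
        },
        {"role": "user", "content": text},
    ]


def redact_with_local_model(model_path: str, text: str) -> dict:
    """Run a GGUF build via llama-cpp-python and parse its JSON reply."""
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=4096, verbose=False)
    out = llm.create_chat_completion(messages=build_messages(text))
    return json.loads(out["choices"][0]["message"]["content"])
```

Because the model runs in-process, the raw text never crosses a network boundary.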
## Performance

### Before Fine-Tuning
| Model | Score |
|---|---|
| DeepSeek 3.1 (685B) — Teacher | 0.84 +/- 0.03 |
| Llama-3.2-3B (base) | 0.03 +/- 0.02 |
| Llama-3.2-1B (base) | 0.00 +/- 0.00 |
### After Fine-Tuning
| Model | Score |
|---|---|
| Llama-3.2-3B (tuned) | 0.82 +/- 0.03 |
| Llama-3.2-1B (tuned) | 0.81 +/- 0.02 |
| Gemma-3-270M (tuned) | 0.73 +/- 0.07 |
The fine-tuned 1B model matches its 685B teacher: a 685x reduction in parameter count with comparable accuracy.
## The Task

The models perform policy-aware PII redaction: given input text, they output JSON containing the redacted text and details of each redacted entity.

### Output Schema
```json
{
  "redacted_text": "...",
  "entities": [
    {
      "value": "original PII value",
      "replacement_token": "[REDACTION_TYPE]",
      "reason": "why this was redacted"
    }
  ]
}
```
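A short sketch of consuming output in this shape, parsing the JSON and sanity-checking it against the schema; the sample values below are invented for illustration:

```python
import json

# Invented example of a model reply following the schema above.
sample_output = """
{
  "redacted_text": "Contact [NAME] at [EMAIL]",
  "entities": [
    {"value": "Jane Doe", "replacement_token": "[NAME]",
     "reason": "personal name"},
    {"value": "jane@example.com", "replacement_token": "[EMAIL]",
     "reason": "email address"}
  ]
}
"""


def parse_redaction(raw: str) -> dict:
    """Parse model output and verify the redaction is consistent."""
    result = json.loads(raw)
    assert "redacted_text" in result and "entities" in result
    for entity in result["entities"]:
        # Each original value must be gone and its token present.
        assert entity["value"] not in result["redacted_text"]
        assert entity["replacement_token"] in result["redacted_text"]
    return result


parsed = parse_redaction(sample_output)
print(len(parsed["entities"]))  # 2
```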
### Supported PII Categories (14 types)
Names, email addresses, phone numbers, physical addresses, Social Security numbers, credit card numbers, dates of birth, IP addresses, URLs, financial account numbers, medical record numbers, passport numbers, driver’s license numbers, and demographic attributes.
## Why Local PII Redaction?
Sending text containing PII to a cloud LLM for redaction defeats the purpose of data protection. Distil-PII runs entirely on your machine — the sensitive data never leaves your infrastructure.
Use cases:
- Pre-processing data before sending to external APIs
- Compliance with GDPR, HIPAA, and data residency requirements
- Edge deployment for real-time PII scrubbing
- Pipeline integration for data anonymization
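The first use case can be sketched as a pipeline step in which only redacted text ever reaches the external API. The `redact` function below is a stub standing in for the local Distil-PII call, and all names here are illustrative:

```python
def redact(text: str) -> str:
    """Stub for the local Distil-PII call; a real pipeline would run
    the GGUF model here and return its 'redacted_text' field."""
    # Hypothetical behaviour, for illustration only.
    return text.replace("jane@example.com", "[EMAIL]")


def safe_external_call(text: str, send) -> str:
    """Scrub PII locally, then hand only the redacted text to `send`."""
    cleaned = redact(text)
    return send(cleaned)


# The external API (here just an echo) never sees the raw address.
reply = safe_external_call("Reach me at jane@example.com", send=lambda t: t)
print(reply)  # Reach me at [EMAIL]
```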