Distil-PII: Family of PII Redaction SLMs

A family of PII redaction SLMs from 135M to 3B parameters. The 1B model matches a 685B teacher and runs on a laptop, so your data never leaves your machine.

distil labs released specialized small language models for policy-aware PII redaction. The 1B-parameter model scores 0.81 +/- 0.02, effectively matching the 685B teacher model (0.84 +/- 0.03), while running on laptops with full data privacy.


Available Models

Model                               HuggingFace
Distil-PII-Llama-3.2-3B-Instruct    gguf
Distil-PII-Llama-3.2-1B-Instruct    gguf
Distil-PII-gemma-3-270m-it          gguf
Distil-PII-SmolLM2-135M-Instruct    gguf

Performance

Before Fine-Tuning

Model                           Score
DeepSeek 3.1 (685B), teacher    0.84 +/- 0.03
Llama-3.2-3B (base)             0.03 +/- 0.02
Llama-3.2-1B (base)             0.00 +/- 0.00

After Fine-Tuning

Model                   Score
Llama-3.2-3B (tuned)    0.82 +/- 0.03
Llama-3.2-1B (tuned)    0.81 +/- 0.02
Gemma-3-270M (tuned)    0.73 +/- 0.07

The fine-tuned 1B model scores within the teacher's margin (0.81 +/- 0.02 vs. 0.84 +/- 0.03): a 685x reduction in parameter count with comparable scores.
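
As a rough check on the "matches" claim, the reported scores can be compared as intervals. This is only a sketch: it assumes each "+/-" is a symmetric margin around the mean, which the source does not specify, and the helper names `interval` and `overlaps` are illustrative.

```python
# Scores copied from the tables above as (mean, margin) pairs.
teacher = (0.84, 0.03)      # DeepSeek 3.1 (685B)
student_1b = (0.81, 0.02)   # Llama-3.2-1B (tuned)
base_3b = (0.03, 0.02)      # Llama-3.2-3B (base), for contrast


def interval(score):
    """Return (low, high), treating +/- as a symmetric margin (an assumption)."""
    mean, margin = score
    return mean - margin, mean + margin


def overlaps(a, b):
    """True when the two intervals share at least one point."""
    a_low, a_high = interval(a)
    b_low, b_high = interval(b)
    return a_low <= b_high and b_low <= a_high


# Parameter-count ratio: 685B teacher vs. 1B student.
size_ratio = 685 / 1

print(overlaps(teacher, student_1b))  # tuned 1B vs. teacher
print(overlaps(teacher, base_3b))     # untuned 3B vs. teacher
print(size_ratio)
```

Under that reading, the tuned 1B interval overlaps the teacher's while the untuned 3B interval does not.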


The Task

Models perform policy-aware PII redaction: given input text, output JSON containing the redacted text and entity details.

Output Schema

{
  "redacted_text": "...",
  "entities": [
    {
      "value": "original PII value",
      "replacement_token": "[REDACTION_TYPE]",
      "reason": "why this was redacted"
    }
  ]
}
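
A minimal sketch of how a caller might parse and sanity-check output in this shape. The class and function names (`Entity`, `RedactionResult`, `parse_redaction`, `leaked_values`) are hypothetical and not part of Distil-PII. The sample string and its `[PERSON]` and `[EMAIL]` tokens are invented for illustration; the actual token names the models emit are not listed here.

```python
import json
from dataclasses import dataclass

REQUIRED_ENTITY_KEYS = {"value", "replacement_token", "reason"}


@dataclass
class Entity:
    value: str
    replacement_token: str
    reason: str


@dataclass
class RedactionResult:
    redacted_text: str
    entities: list


def parse_redaction(raw: str) -> RedactionResult:
    """Parse a JSON string and check it against the schema shown above."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    if not isinstance(data.get("redacted_text"), str):
        raise ValueError("missing or non-string 'redacted_text'")
    if not isinstance(data.get("entities"), list):
        raise ValueError("missing or non-list 'entities'")

    entities = []
    for item in data["entities"]:
        if not isinstance(item, dict):
            raise ValueError("each entity must be an object")
        missing = REQUIRED_ENTITY_KEYS - set(item)
        if missing:
            raise ValueError(f"entity is missing keys: {sorted(missing)}")
        entities.append(
            Entity(item["value"], item["replacement_token"], item["reason"])
        )
    return RedactionResult(data["redacted_text"], entities)


def leaked_values(result: RedactionResult) -> list:
    """Return original PII values that still appear verbatim in the redacted text."""
    return [
        e.value for e in result.entities
        if e.value and e.value in result.redacted_text
    ]


# Invented sample output, for illustration only.
sample = json.dumps({
    "redacted_text": "Contact [PERSON] at [EMAIL].",
    "entities": [
        {"value": "Jane Doe", "replacement_token": "[PERSON]", "reason": "name"},
        {"value": "jane@example.com", "replacement_token": "[EMAIL]",
         "reason": "email address"},
    ],
})

result = parse_redaction(sample)
print(len(result.entities), leaked_values(result))
```

The `leaked_values` check is one possible guard: if a value listed under `entities` still appears in `redacted_text`, the output should probably be rejected rather than forwarded.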

Supported PII Categories (14 types)

Names, email addresses, phone numbers, physical addresses, Social Security numbers, credit card numbers, dates of birth, IP addresses, URLs, financial account numbers, medical record numbers, passport numbers, driver’s license numbers, and demographic attributes.


Why Local PII Redaction?

Sending text containing PII to a cloud LLM for redaction defeats the purpose of data protection. Distil-PII runs entirely on your machine — the sensitive data never leaves your infrastructure.

Use cases:

  • Pre-processing data before sending to external APIs
  • Compliance with GDPR, HIPAA, and data residency requirements
  • Edge deployment for real-time PII scrubbing
  • Pipeline integration for data anonymization
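
The first use case, redacting locally before calling an external API, might be wired up roughly as follows. This is a sketch under stated assumptions: `scrub_then_send`, `placeholder_redact`, and `fake_send` are hypothetical names, and the regex stand-in is not how Distil-PII works. It only marks the place where a call to a locally hosted model would go.

```python
import re
from typing import Callable


def scrub_then_send(
    text: str,
    redact: Callable[[str], str],
    send: Callable[[str], str],
) -> str:
    """Redact locally first, then pass only the redacted text to the external call."""
    safe_text = redact(text)
    return send(safe_text)


def placeholder_redact(text: str) -> str:
    # Stand-in only: a crude email regex so the example runs without a model.
    # In a real pipeline this would call the local model and return its
    # "redacted_text" field.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[EMAIL]", text)


sent = []


def fake_send(payload: str) -> str:
    # Stand-in for an external API client; records what would leave the machine.
    sent.append(payload)
    return "ok"


status = scrub_then_send(
    "Reach me at jane@example.com today.", placeholder_redact, fake_send
)
print(status, sent)
```

Keeping `redact` and `send` as injected callables means the raw text never reaches the sending code path, and either side can be swapped out without touching the other.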
