Auto PII Redaction — 25+ Detection Patterns
Automatically detect and redact 25+ PII types: SSN, Aadhaar, PAN, GSTIN, CPF, IBAN, NHS, RFC, credit cards, emails, phones, addresses. GDPR/HIPAA ready.
Key Features
- Regex detection for structured PII — SSN, credit card (Luhn-validated), phone, email, IP address, account numbers, dates
- On-device NER for unstructured PII: names, organizations, locations (browser-side bert-base-NER, 45 MB cached)
- True content-stream redaction: text is removed from the PDF's underlying data, not just visually covered
- Per-category toggles — redact only credit cards, only names, or any combination
- Confidence-threshold control: lower the NER threshold to catch more entities (higher recall) or raise it to catch fewer (higher precision)
- Manual draw-rectangle mode for any field the auto-detect misses
- Bulk-redact mode — apply same rules to a batch of documents
- Audit log — every redacted entity logged with category and source page for HIPAA / GDPR compliance documentation
- Runs in browser — sensitive documents never leave your device
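As an illustration of the Luhn-validated credit card detection listed above, here is a minimal Python sketch. It is not the tool's actual code (which runs in the browser); the pattern and the names `CARD_CANDIDATE`, `luhn_valid`, and `find_card_numbers` are assumptions chosen for the example. The idea is that a regex finds digit runs that look like card numbers, and the Luhn checksum then discards most random digit strings.

```python
import re

# Assumed candidate pattern: 13 to 19 digits, optionally separated by
# single spaces or hyphens. Real detectors may be stricter.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, match) for candidates that pass the Luhn check."""
    return [
        (m.start(), m.end(), m.group())
        for m in CARD_CANDIDATE.finditer(text)
        if luhn_valid(m.group())
    ]
```

The checksum step is what keeps order numbers and other long digit strings from being flagged as cards as often as a bare regex would.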
About Auto PII Redaction
Auto PII Redaction finds and permanently removes personally identifiable information from PDF documents: names, emails, phone numbers, social security numbers, credit cards, addresses, dates of birth, account numbers, IP addresses, and 20+ other categories. Unlike "redaction" tools that just draw a black rectangle on top (leaving the underlying text in the file, where it can still be selected, copied, or extracted), our pipeline strips the text from the PDF's content stream and replaces it with redaction marks, so the source data is genuinely gone.
Combining regex pattern detection with on-device NER (Xenova/bert-base-NER, a 45 MB cached model) catches PII that regex alone misses. SSNs, credit cards, and emails are easy to match with regex. People's names, company names, and city / country names need named-entity recognition, which runs entirely in your browser, so your sensitive document never leaves your device. The output is a permanently redacted PDF where the underlying text content is rewritten, not just visually obscured.
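One way to picture how the two detectors and the confidence threshold fit together is the Python sketch below. It is an assumption-laden illustration, not the tool's implementation: the `Detection` record, the default threshold of 0.8, and the rule that a regex hit wins over an overlapping NER hit are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    start: int          # character offset, inclusive
    end: int            # character offset, exclusive
    category: str       # e.g. "EMAIL", "PERSON"
    source: str         # "regex" or "ner"
    score: float = 1.0  # regex hits are treated as certain in this sketch


def filter_by_threshold(ner_hits, threshold):
    """Keep only NER detections at or above the confidence threshold."""
    return [d for d in ner_hits if d.score >= threshold]


def merge_detections(regex_hits, ner_hits, threshold=0.8):
    """Combine both detectors, dropping NER spans that overlap a regex span."""
    kept = list(regex_hits)
    for d in filter_by_threshold(ner_hits, threshold):
        overlaps = any(d.start < r.end and r.start < d.end for r in regex_hits)
        if not overlaps:
            kept.append(d)
    return sorted(kept, key=lambda d: d.start)
```

Lowering `threshold` lets more low-confidence NER spans through (higher recall); raising it keeps only the spans the model is most sure about (higher precision).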
Who Uses This Tool
- Legal teams redacting discovery documents before producing to opposing counsel
- HR sharing employment records with auditors — redact SSNs, salaries, addresses while keeping case context
- Medical practices anonymizing patient records for research or billing review
- GDPR data-subject-access-request fulfillment: return the individual's data while redacting other people mentioned
- Journalists publishing leaked documents while protecting source identifiers
- Banks responding to subpoenas — redact non-subject account holder info before producing transaction records
How to Use Auto PII Redaction — 25+ Detection Patterns
- Step 1: Drop your PDF into the drop zone — bank statements, medical records, HR files, anything with PII
- Step 2: Click "Detect & Mark" for regex patterns, then "AI: Auto-detect names, orgs, locations" for NER (the first run downloads the 45 MB model)
- Step 3: Review the marked redactions, which are color-coded by category; click any mark to preview the source text
- Step 4: Add manual rectangle redactions for anything the auto-detect missed (rare on well-formed documents)
- Step 5: Click "Apply Redactions" — content-stream rewrite produces a final PDF with PII permanently removed. Download.
Frequently Asked Questions
Is auto-PII redaction free?
Yes. Regex patterns, NER auto-detect, and content-stream redaction are all free, with no signup. The browser-side NER model (~45 MB) is downloaded once and then cached by your browser for later runs.
Is the redaction permanent?
Yes. Unlike tools that only draw a black box on top of preserved text, our pipeline rewrites the PDF's content stream so the redacted text is genuinely removed: Ctrl+F can't find it, copy/paste can't extract it, and screen readers don't read it. A redaction is only defensible if the underlying text is actually gone from the file.
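The difference between an overlay and a content rewrite can be shown with a deliberately simplified toy model in Python. This is not a PDF parser and not the tool's code: a real content stream holds drawing and text-showing operators rather than plain strings, and the `Page` class and both functions here are hypothetical names used only to show why one approach leaks text and the other does not.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    runs: list[str]  # text runs in reading order (toy stand-in for a content stream)
    marks: list[tuple[int, int, int]] = field(default_factory=list)  # (run, start, end)


def extract_text(page: Page) -> str:
    """What copy/paste or a text extractor would see."""
    return " ".join(page.runs)


def overlay_redact(page: Page, run: int, start: int, end: int) -> None:
    """Draw a mark over the span but leave the text itself in place."""
    page.marks.append((run, start, end))


def rewrite_redact(page: Page, run: int, start: int, end: int) -> None:
    """Remove the characters from the run, then record the mark."""
    text = page.runs[run]
    page.runs[run] = text[:start] + text[end:]
    page.marks.append((run, start, end))
```

With a run such as `"SSN: 123-45-6789"`, calling `overlay_redact(page, 0, 5, 16)` leaves the number visible to `extract_text`, while `rewrite_redact(page, 0, 5, 16)` removes it from the data entirely.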
What types of PII are detected?
Regex catches: SSN, credit cards (Luhn-validated), phone, email, IP address, dates of birth, postal codes, account numbers, URLs. NER catches: people's names, company and organization names, and city / country / location names. You can toggle categories on / off and adjust the NER confidence threshold.
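Category toggles can be sketched as a registry of patterns filtered by an enabled set. The Python below is a hedged illustration only: the three patterns are simplified assumptions (for example, the IPv4 pattern does not range-check octets), and `PATTERNS` and `detect` are hypothetical names rather than the tool's real detector set.

```python
import re

# Simplified example patterns keyed by category name.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect(text: str, enabled: set[str]) -> list[tuple[str, int, int]]:
    """Return (category, start, end) for every enabled category, in text order."""
    hits = []
    for category, pattern in PATTERNS.items():
        if category not in enabled:
            continue  # category is toggled off
        for m in pattern.finditer(text):
            hits.append((category, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[1])
```

Passing `{"SSN"}` as `enabled` marks only social security numbers; passing all three keys marks every category the registry knows about.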
Is this HIPAA / GDPR compliant?
The output (with all detected PII removed via content-stream rewrite) is designed to meet the technical bar for a HIPAA "limited data set" and GDPR "anonymization." Whether your overall workflow is compliant depends on how you handle the source documents and review the results. We provide an audit log of every redacted entity per document, which compliance teams commonly ask for.
Will the redacted PDF still be searchable?
Yes. The non-PII text remains fully searchable and selectable. Only the redacted entities are removed; everything else stays as in the original, and the document layout is unchanged.
What if the auto-detect misses something?
Switch to draw-rectangle mode and manually mark any field that escaped detection. This is rare on well-formed documents but does happen with handwritten annotations, watermark text, or unusually formatted entities. The audit log records both auto and manual redactions.
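To make the audit log concrete, the sketch below shows one possible record shape in Python. The field names, the JSON Lines output, and the decision to store only the category and page (never the redacted text itself) are assumptions for illustration; the tool's actual log format may differ.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class AuditEntry:
    document: str  # file name of the redacted PDF
    page: int      # 1-based source page
    category: str  # e.g. "SSN", "PERSON"
    method: str    # "auto" or "manual"


def to_json_lines(entries: list[AuditEntry]) -> str:
    """Serialize entries as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in entries)


def summarize(entries: list[AuditEntry]) -> dict[str, int]:
    """Count redactions per category for a compliance summary."""
    return dict(Counter(e.category for e in entries))
```

Keeping the redacted value out of the log means the log itself does not become a new copy of the sensitive data.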