Auto PII Redaction — 25+ Detection Patterns

Automatically detect and redact 25+ PII types: SSN, Aadhaar, PAN, GSTIN, CPF, IBAN, NHS, RFC, credit cards, emails, phones, addresses. GDPR/HIPAA ready.

About Auto PII Redaction

Auto PII Redaction finds and permanently removes personally identifiable information from PDF documents: names, emails, phone numbers, Social Security numbers, credit cards, addresses, dates of birth, account numbers, IP addresses, and 20+ other categories. Many "redaction" tools just draw a black rectangle on top, so the underlying text is still in the file and can be selected, copied, or extracted. Our pipeline instead strips the text from the PDF's content stream and replaces it with redaction marks, so the source data is genuinely gone.
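The difference between a cosmetic black box and a content-stream rewrite can be illustrated with a toy example. This is a minimal sketch, not the tool's actual pipeline: it assumes an uncompressed content stream with plain literal strings, and the function names (overlay_box, rewrite_stream, extract_text) are hypothetical. Real PDFs use compressed streams, font encodings, and escaped parentheses that this sketch ignores.

```python
import re

# Toy, uncompressed content stream: two text-showing operators (Tj).
stream = (
    "BT /F1 12 Tf 72 700 Td (Name: Jane Doe) Tj ET\n"
    "BT /F1 12 Tf 72 680 Td (SSN: 123-45-6789) Tj ET\n"
)

# Operators that paint a filled black rectangle over the SSN line.
BLACK_BOX = "0 0 0 rg 72 676 200 16 re f\n"


def overlay_box(content: str) -> str:
    """Cosmetic 'redaction': paint a box, leave the text operators untouched."""
    return content + BLACK_BOX


def rewrite_stream(content: str, secret: str) -> str:
    """Content-level redaction: blank the secret inside every (...) Tj string."""
    def scrub(match: re.Match) -> str:
        text = match.group(1).replace(secret, " " * len(secret))
        return f"({text}) Tj"

    return re.sub(r"\((.*?)\) Tj", scrub, content) + BLACK_BOX


def extract_text(content: str) -> str:
    """Roughly what copy/paste or a text extractor recovers from the stream."""
    return " ".join(re.findall(r"\((.*?)\) Tj", content))


covered = extract_text(overlay_box(stream))
removed = extract_text(rewrite_stream(stream, "123-45-6789"))
print("123-45-6789" in covered)  # the box hides nothing from extraction
print("123-45-6789" in removed)  # the rewritten stream no longer holds it
```

In the overlay case the SSN is still sitting in the stream beneath the rectangle; in the rewrite case only the non-PII text survives extraction.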

Regex pattern detection is combined with on-device NER (Xenova/bert-base-NER, a 45 MB model cached after the first download) to catch PII that regex alone misses. SSNs, credit cards, and emails are easy to match with regex. People's names, company names, and city / country names need named-entity recognition, which runs entirely in your browser, so your sensitive document never leaves your device. The output is a permanently redacted PDF in which the underlying text content is rewritten, not just visually obscured.
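One way the two detectors' results could be combined is sketched below. The Detection type, the merge_detections function, and the rule "regex wins on overlap, then higher score" are assumptions made for illustration; the page does not describe how the tool actually resolves overlapping hits. The NER hits here are hand-written stand-ins for model output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    start: int      # character offset where the entity begins
    end: int        # character offset one past the entity's last character
    category: str   # e.g. "EMAIL", "PER", "ORG", "LOC"
    source: str     # "regex" or "ner"
    score: float    # 1.0 for regex hits, model confidence for NER hits


def merge_detections(regex_hits, ner_hits, min_score=0.8):
    """Combine both detectors, dropping low-confidence and overlapping hits."""
    candidates = list(regex_hits) + [d for d in ner_hits if d.score >= min_score]
    # Priority: regex first, then higher score, then longer span.
    candidates.sort(key=lambda d: (d.source != "regex", -d.score, -(d.end - d.start)))
    kept = []
    for cand in candidates:
        # Keep a candidate only if it overlaps nothing already kept.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda d: d.start)


text = "Jane Doe (jane@acme.io) works at Acme in Paris."
regex_hits = [Detection(10, 22, "EMAIL", "regex", 1.0)]
ner_hits = [
    Detection(0, 8, "PER", "ner", 0.99),
    Detection(15, 19, "ORG", "ner", 0.91),   # "acme" inside the email address
    Detection(33, 37, "ORG", "ner", 0.97),
    Detection(41, 46, "LOC", "ner", 0.62),   # below the confidence threshold
]
for d in merge_detections(regex_hits, ner_hits):
    print(d.category, repr(text[d.start:d.end]))
```

Raising or lowering min_score corresponds to the adjustable NER confidence threshold: at 0.8 the low-confidence "Paris" hit is dropped, while at 0.5 it would be kept.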

How to Use Auto PII Redaction — 25+ Detection Patterns

  1. Drop your PDF into the drop zone: bank statements, medical records, HR files, anything with PII.
  2. Click "Detect & Mark" for regex patterns, then "AI: Auto-detect names, orgs, locations" for NER (the first run downloads the 45 MB model).
  3. Review the marked redactions. They are color-coded by category; click any mark to preview the source text.
  4. Add manual rectangle redactions for anything the auto-detect missed (rare on well-formed documents).
  5. Click "Apply Redactions". The content-stream rewrite produces a final PDF with the PII permanently removed. Download it.

Frequently Asked Questions

Is auto-PII redaction free?

Yes. Regex patterns, NER auto-detect, and content-stream redaction are all free. No signup. The browser-side NER model is downloaded once (~45 MB) and then cached in your browser, so later runs skip the download unless the cache is cleared.

Is the redaction permanent?

Yes. Unlike tools that only draw a black box on top of preserved text, our pipeline rewrites the PDF's content stream so the redacted text is genuinely removed: Ctrl+F can't find it, copy/paste can't extract it, and screen readers don't read it. Redaction that leaves the text in the file is not defensible, because anyone can recover it.

What types of PII are detected?

Regex catches: SSN, credit cards (Luhn-validated), phone, email, IP address, dates of birth, postal codes, account numbers, URLs. NER catches: people's names, company and organization names, and city / country / location names. You can toggle categories on / off and adjust the NER confidence threshold.
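The regex side, including Luhn validation for card numbers and per-category toggles, can be sketched as follows. The patterns are simplified illustrations covering three of the listed categories, not the tool's actual rules, and the names PATTERNS, luhn_valid, and detect are hypothetical.

```python
import re

# Simplified example patterns; production rules would be stricter.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def detect(text, enabled=frozenset(PATTERNS)):
    """Return (category, start, end, match) tuples for the enabled categories."""
    hits = []
    for category, pattern in PATTERNS.items():
        if category not in enabled:
            continue
        for m in pattern.finditer(text):
            # A digit run only counts as a card if its checksum passes.
            if category == "CREDIT_CARD" and not luhn_valid(m.group()):
                continue
            hits.append((category, m.start(), m.end(), m.group()))
    return sorted(hits, key=lambda h: h[1])


sample = (
    "SSN 123-45-6789, card 4111 1111 1111 1111, "
    "order 4111 1111 1111 1112, mail jane@example.com"
)
for hit in detect(sample):
    print(hit)
```

The Luhn check is what keeps a 16-digit order number from being flagged as a card: the second number in the sample matches the digit pattern but fails the checksum, so it is skipped.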

Is this HIPAA / GDPR compliant?

The output, with detected PII removed via content-stream rewrite, is intended to meet the technical bar for a HIPAA "limited data set" and GDPR "anonymization." Whether your overall workflow is compliant depends on how you handle the source documents and on confirming that every identifier was actually caught. We provide an audit log of every redacted entity per document, which most compliance teams require.

Will the redacted PDF still be searchable?

Yes. The non-PII text remains fully searchable and selectable. Only the redacted entities are removed; everything else stays as it was in the original. Document layout is unchanged.

What if the auto-detect misses something?

Switch to draw-rectangle mode and manually mark any field that escaped detection. This is rare on well-formed documents but does happen with handwritten annotations, watermark text, or unusually formatted entities. The audit log records both auto and manual redactions.
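To make the idea of an audit log covering both auto and manual redactions concrete, here is a hypothetical record layout. The tool's real log format is not documented here, so every field name below (page, category, method, bbox, score) and both helper functions are assumptions for illustration only.

```python
import json
from collections import Counter


def audit_entry(page, category, method, bbox, score=None):
    """One record per redaction; the redacted text itself is never stored."""
    return {
        "page": page,
        "category": category,
        "method": method,      # "regex", "ner", or "manual"
        "bbox": list(bbox),    # [x0, y0, x1, y1] in page coordinates
        "score": score,        # NER confidence; None for regex and manual
    }


def build_audit_log(document_name, entries):
    """Summarize auto (regex + NER) versus manual redactions for one document."""
    by_method = Counter(e["method"] for e in entries)
    return {
        "document": document_name,
        "total": len(entries),
        "auto": by_method["regex"] + by_method["ner"],
        "manual": by_method["manual"],
        "redactions": entries,
    }


entries = [
    audit_entry(1, "SSN", "regex", (72, 676, 272, 692)),
    audit_entry(1, "PER", "ner", (72, 696, 160, 712), score=0.99),
    audit_entry(2, "HANDWRITTEN_NOTE", "manual", (300, 120, 480, 160)),
]
print(json.dumps(build_audit_log("statement.pdf", entries), indent=2))
```

Recording only the category, location, and detection method keeps the log useful for review without reintroducing the removed PII.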