SDK Reference

Python SDK

High-performance PII redaction for Python. Sync and async support, < 1 ms per page (~500 words), ~2,000 records per second.

Installation

Terminal

$ pip install euredact

redact()

Main entry point. Detects and redacts PII from a text string. Returns a RedactResult with the cleaned text and a list of detections.

Signature

def redact(
    text: str,
    *,
    countries: list[str] | None = None,
    mode: str = "rules",
    referential_integrity: bool = False,
    detect_dates: bool = False,
    cache: bool = True,
) -> RedactResult

Parameter	Type	Default	Description
`text`	`str`	`—`	Input text to scan.
`countries`	`list[str] | None`	`None`	ISO country codes to restrict detection to. None = all 31 supported countries.
`mode`	`str`	`"rules"`	Detection mode (currently only "rules").
`referential_integrity`	`bool`	`False`	Replace PII with numbered labels (e.g. NAME_1, IBAN_1) that stay consistent for repeated values, instead of generic [TYPE] labels.
`detect_dates`	`bool`	`False`	Include date-of-birth and date-of-death detection. Off by default because whether a date is a DOB or a date of death depends on its surrounding context.
`cache`	`bool`	`True`	Cache results for identical inputs.
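To make the effect of referential_integrity concrete, here is a stdlib-only sketch of consistent labeling: the first distinct value of each type gets TYPE_1, the next TYPE_2, and a repeated value reuses its label. `consistent_labels` is a hypothetical helper written for illustration; it is not euRedact's implementation, and the exact label format the library emits may differ.

```python
from collections import defaultdict


def consistent_labels(spans: list[tuple[str, str]]) -> list[str]:
    """Map each (entity_type, value) pair to a numbered label.

    Hypothetical illustration: a repeated value reuses the label it was
    first given, so NAME_1 always refers to the same person.
    """
    assigned: dict[tuple[str, str], str] = {}
    per_type_count: defaultdict[str, int] = defaultdict(int)
    labels = []
    for entity_type, value in spans:
        key = (entity_type, value)
        if key not in assigned:
            per_type_count[entity_type] += 1
            assigned[key] = f"{entity_type}_{per_type_count[entity_type]}"
        labels.append(assigned[key])
    return labels


print(consistent_labels([
    ("NAME", "Jan Jansen"),
    ("IBAN", "NL91ABNA0417164300"),
    ("NAME", "Piet de Vries"),
    ("NAME", "Jan Jansen"),
]))
# ['NAME_1', 'IBAN_1', 'NAME_2', 'NAME_1']
```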

Example

example.py

import euredact

result = euredact.redact(
    "Mijn BSN is 111222333 en IBAN NL91ABNA0417164300.",
    countries=["NL"],
)

print(result.redacted_text)
# "Mijn BSN is [NATIONAL_ID] en IBAN [IBAN]."

print(result.detections)
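Each Detection carries start and end offsets into the original text. The sketch below shows how spans like these map onto a redacted string, using a minimal stand-in dataclass with only the entity_type, start and end fields. It assumes end is exclusive (Python slice semantics) and that spans do not overlap; both are assumptions, and `Span` and `apply_labels` are hypothetical names, not part of euredact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    # Hypothetical stand-in for euredact's Detection (subset of fields).
    entity_type: str
    start: int
    end: int  # assumed exclusive, like a Python slice


def apply_labels(text: str, spans: list[Span]) -> str:
    # Work from the rightmost span backwards so the offsets of spans
    # not yet processed stay valid after each substitution.
    for sp in sorted(spans, key=lambda sp: sp.start, reverse=True):
        text = text[:sp.start] + f"[{sp.entity_type}]" + text[sp.end:]
    return text


text = "Mijn BSN is 111222333 en IBAN NL91ABNA0417164300."
spans = [Span("NATIONAL_ID", 12, 21), Span("IBAN", 30, 48)]
print(apply_labels(text, spans))
# Mijn BSN is [NATIONAL_ID] en IBAN [IBAN].
```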

redact_batch()

Batch redaction. More efficient than calling redact() in a loop because configurations are loaded once for the whole batch.

Signature

def redact_batch(
    texts: list[str],
    **kwargs,
) -> list[RedactResult]

Parameter	Type	Default	Description
`texts`	`list[str]`	`—`	List of input texts to redact.
`**kwargs`		`—`	Same keyword arguments as redact() (countries, mode, referential_integrity, detect_dates, cache).

Example

batch_example.py

import euredact

texts = [
    "BSN 111222333",
    "IBAN NL91ABNA0417164300",
]

results = euredact.redact_batch(texts, countries=["NL"])
for r in results:
    print(r.redacted_text)
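redact_batch() takes a list, so for very large inputs you might feed it fixed-size chunks rather than one huge list. A stdlib sketch of that; `chunked` is a hypothetical helper, and the chunk size of 1,000 in the comment is arbitrary, not a documented recommendation.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    # islice pulls up to `size` items; an empty list means we are done.
    while batch := list(islice(it, size)):
        yield batch


# With the SDK installed (not executed here):
# for batch in chunked(all_texts, 1000):
#     for r in euredact.redact_batch(batch, countries=["NL"]):
#         print(r.redacted_text)
print(list(chunked(["a", "b", "c", "d", "e"], 2)))
# [['a', 'b'], ['c', 'd'], ['e']]
```

On Python 3.12+, itertools.batched() does the same job but yields tuples.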

aredact()

Async version of redact(). Offloads the CPU-bound detection work to a thread pool so it does not block the event loop. Accepts the same keyword arguments and returns the same RedactResult type.

Signature

async def aredact(
    text: str,
    **kwargs,
) -> RedactResult

Example

async_example.py

import asyncio
import euredact

async def main():
    result = await euredact.aredact("BSN 111222333")
    print(result.redacted_text)

asyncio.run(main())
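The thread-pool offloading described for aredact() corresponds to a standard-library pattern, asyncio.to_thread() (Python 3.9+). In this sketch `slow_redact` is a hypothetical blocking stand-in, not euredact code, and nothing here describes how the library is implemented internally.

```python
import asyncio
import time


def slow_redact(text: str) -> str:
    # Hypothetical stand-in for a blocking, CPU-bound redaction call.
    time.sleep(0.05)
    return text.replace("111222333", "[NATIONAL_ID]")


async def redact_off_loop(text: str) -> str:
    # Run the blocking call in the default thread pool so the event
    # loop can keep serving other tasks in the meantime.
    return await asyncio.to_thread(slow_redact, text)


async def main() -> list[str]:
    return await asyncio.gather(
        redact_off_loop("BSN 111222333"),
        redact_off_loop("no PII here"),
    )


print(asyncio.run(main()))
# ['BSN [NATIONAL_ID]', 'no PII here']
```

Threads keep the event loop responsive; for pure-Python CPU work the GIL still limits how much runs truly in parallel.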

aredact_batch()

Async batch redaction with controlled concurrency.

Signature

async def aredact_batch(
    texts: list[str],
    *,
    max_concurrency: int = 4,
    **kwargs,
) -> list[RedactResult]

Parameter	Type	Default	Description
`texts`	`list[str]`	`—`	List of input texts to redact.
`max_concurrency`	`int`	`4`	Maximum number of concurrent tasks.
`**kwargs`		`—`	Same keyword arguments as redact().
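max_concurrency caps how many texts are in flight at once. A common way to build such a cap is an asyncio.Semaphore around each task, with asyncio.gather() preserving input order. This is an illustrative sketch; `fake_aredact` and `bounded_map` are hypothetical, and whether aredact_batch() works this way internally is an assumption. The peak counter exists only to show the cap holding.

```python
import asyncio


async def fake_aredact(text: str) -> str:
    # Hypothetical stand-in for an awaitable redaction call.
    await asyncio.sleep(0.01)
    return text.upper()


async def bounded_map(texts: list[str], max_concurrency: int = 4) -> tuple[list[str], int]:
    """Process texts with at most max_concurrency in flight.

    Returns the results in input order and the peak number of workers
    observed running at once (tracked for demonstration only).
    """
    sem = asyncio.Semaphore(max_concurrency)
    active = 0
    peak = 0

    async def worker(text: str) -> str:
        nonlocal active, peak
        async with sem:  # waits while max_concurrency workers are inside
            active += 1
            peak = max(peak, active)
            try:
                return await fake_aredact(text)
            finally:
                active -= 1

    # gather returns results in the same order as its arguments
    results = await asyncio.gather(*(worker(t) for t in texts))
    return list(results), peak


results, peak = asyncio.run(bounded_map([f"text {i}" for i in range(10)], max_concurrency=3))
print(results[:2], peak <= 3)
# ['TEXT 0', 'TEXT 1'] True
```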

redact_iter()

Lazy iterator for large datasets. Yields results one at a time without loading everything into memory.

Signature

def redact_iter(
    texts: Iterable[str],
    **kwargs,
) -> Iterator[RedactResult]

Example

iter_example.py

import euredact

texts = ["BSN 111222333", "IBAN DE89370400440532013000"]

for result in euredact.redact_iter(texts):
    print(result.redacted_text)
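Because redact_iter() accepts any Iterable[str], it can consume a generator that streams records from a file instead of a prebuilt list. A small sketch of such a generator; `stream_lines` and the file name in the comment are hypothetical, and io.StringIO stands in for a real file so the snippet runs on its own.

```python
import io
from typing import Iterator, TextIO


def stream_lines(fh: TextIO) -> Iterator[str]:
    """Yield non-empty lines one at a time without reading the whole file."""
    for line in fh:
        line = line.rstrip("\n")
        if line:
            yield line


# In real use, fh might be open("records.txt", encoding="utf-8") and the
# generator would go straight into euredact.redact_iter(stream_lines(fh)).
fh = io.StringIO("BSN 111222333\n\nIBAN DE89370400440532013000\n")
print(list(stream_lines(fh)))
# ['BSN 111222333', 'IBAN DE89370400440532013000']
```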

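add_custom_pattern()

Registers a custom regex pattern for the module-level functions. Matches are reported under the given entity type name and are active regardless of the countries parameter.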

Signature

def add_custom_pattern(
    name: str,
    pattern: str,
) -> None

Parameter	Type	Default	Description
`name`	`str`	`—`	Entity type name for matches (e.g., "EMPLOYEE_ID").
`pattern`	`str`	`—`	Regular expression pattern to match.
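Because pattern is a regular-expression string, a quick local check with Python's re module can catch mistakes before registration. This is only a sketch: `check_pattern` is a hypothetical helper, and it assumes euRedact accepts Python re syntax, which this reference does not state explicitly.

```python
import re


def check_pattern(pattern: str,
                  should_match: list[str],
                  should_not_match: list[str]) -> re.Pattern[str]:
    """Compile a pattern and check it against sample strings.

    Raises re.error for invalid syntax and ValueError when a sample
    behaves unexpectedly.
    """
    compiled = re.compile(pattern)
    for sample in should_match:
        if compiled.search(sample) is None:
            raise ValueError(f"{pattern!r} does not match {sample!r}")
    for sample in should_not_match:
        if compiled.search(sample) is not None:
            raise ValueError(f"{pattern!r} unexpectedly matches {sample!r}")
    return compiled


check_pattern(r"EMP-\d{6}", ["Contact EMP-123456"], ["EMP-12345"])
# If the check passes: euredact.add_custom_pattern("EMPLOYEE_ID", r"EMP-\d{6}")
```

Without word boundaries, EMP-\d{6} also matches the first six digits of EMP-1234567; use r"\bEMP-\d{6}\b" if that matters.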

available_countries()

Returns a sorted list of supported ISO country codes.

example.py

import euredact

print(euredact.available_countries())  # ["AT", "BE", "BG", ...]
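A common use is validating user-supplied country codes before passing them as countries. The sketch hard-codes the 31 codes from the Supported Countries list so it runs without the SDK; with euredact installed you would build the set from available_countries() instead. `normalize_countries` is a hypothetical helper.

```python
# The 31 codes listed under Supported Countries. With the SDK installed,
# prefer: SUPPORTED = set(euredact.available_countries())
SUPPORTED = {
    "AT", "BE", "BG", "CH", "CY", "CZ", "DE", "DK", "EE", "EL", "ES",
    "FI", "FR", "HR", "HU", "IE", "IS", "IT", "LT", "LU", "LV", "MT",
    "NL", "NO", "PL", "PT", "RO", "SE", "SI", "SK", "UK",
}


def normalize_countries(requested: list[str], supported: set[str]) -> list[str]:
    """Trim and upper-case codes, rejecting any that are not supported."""
    codes = [c.strip().upper() for c in requested]
    unknown = sorted(set(codes) - supported)
    if unknown:
        raise ValueError(f"unsupported country codes: {unknown}")
    return codes


print(normalize_countries([" nl", "be"], SUPPORTED))
# ['NL', 'BE']
```

The list uses EL for Greece and UK for the United Kingdom rather than the ISO 3166-1 codes GR and GB, so a check like this rejects GR and GB.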

EuRedact class

Creates an isolated instance with its own cache and custom patterns. Useful when different parts of your application need different configurations.

instance_example.py

from euredact import EuRedact

instance = EuRedact()
instance.add_custom_pattern("CASE_REF", r"CASE-\d{8}")

result = instance.redact(
    "See CASE-20260401",
    countries=["NL", "BE"],
)

print(result.redacted_text)
# "See [CASE_REF]"

The EuRedact instance exposes the same methods: redact(), redact_batch(), aredact(), aredact_batch(), redact_iter(), and add_custom_pattern().

Return Types

RedactResult

A dataclass returned by all redaction functions.

Field	Type	Default	Description
`redacted_text`	`str`	`—`	The text with PII replaced by labels like [NATIONAL_ID], [IBAN], etc.
`detections`	`list[Detection]`	`—`	List of detected PII spans.
`source`	`str`	`"rules"`	Detection source used.
`degraded`	`bool`	`False`	Whether results may be incomplete due to an internal issue.
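Because degraded signals that results may be incomplete, pipelines that must never leak PII may prefer to fail closed. A sketch of that policy with a stand-in dataclass shaped like RedactResult; `Result`, `IncompleteRedactionError` and `require_complete` are hypothetical names, and with the real SDK you would pass the object returned by euredact.redact(). Whether to fail closed, retry, or only log is an application decision.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    # Hypothetical stand-in carrying RedactResult's documented fields.
    redacted_text: str
    detections: list = field(default_factory=list)
    source: str = "rules"
    degraded: bool = False


class IncompleteRedactionError(RuntimeError):
    """Raised instead of returning text that may still contain PII."""


def require_complete(result: Result) -> str:
    # Fail closed: a degraded result may have missed some PII.
    if result.degraded:
        raise IncompleteRedactionError("redaction may be incomplete")
    return result.redacted_text


print(require_complete(Result("BSN [NATIONAL_ID]")))
# BSN [NATIONAL_ID]
```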

Detection

A frozen dataclass (immutable and hashable) representing a single PII detection.

Field	Type	Default	Description
`entity_type`	`EntityType | str`	`—`	The type of PII detected (e.g., NATIONAL_ID, IBAN).
`start`	`int`	`—`	Start character offset in the original text.
`end`	`int`	`—`	End character offset in the original text.
`text`	`str`	`—`	The matched PII text.
`source`	`DetectionSource`	`—`	Detection source ("rules" or "cloud").
`country`	`str | None`	`—`	ISO country code the detection is associated with.
`confidence`	`str`	`"high"`	Confidence level of the detection.
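Because Detection is frozen, instances can be deduplicated in a set or used as dict keys, for example when merging detections from several runs. The sketch uses a stand-in frozen dataclass `Det` with a subset of the documented fields, since the real class's constructor is not documented here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Det:
    # Hypothetical stand-in mirroring some of Detection's documented fields.
    entity_type: str
    start: int
    end: int
    text: str


dets = [
    Det("IBAN", 30, 48, "NL91ABNA0417164300"),
    Det("NATIONAL_ID", 12, 21, "111222333"),
    Det("IBAN", 30, 48, "NL91ABNA0417164300"),  # duplicate, e.g. from a second run
]

unique = set(dets)  # works because frozen dataclasses are hashable
counts = Counter(d.entity_type for d in unique)
print(len(unique), counts["IBAN"], counts["NATIONAL_ID"])
# 2 1 1
```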

Custom Patterns

Register custom regex patterns to detect domain-specific identifiers. Custom patterns are always active regardless of the countries parameter.

custom_patterns.py

import euredact

euredact.add_custom_pattern("EMPLOYEE_ID", r"EMP-\d{6}")

result = euredact.redact("Contact EMP-123456 for details")
print(result.redacted_text)
# "Contact [EMPLOYEE_ID] for details"

Priority order: validated patterns > custom patterns > regex-only patterns.
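One way a priority order like this can be applied is greedy overlap resolution: sort candidate matches by tier (then by length) and keep a candidate only if it does not overlap one already kept. The tier names and ranks are hypothetical labels for the three categories above, and the reference does not describe euRedact's actual overlap-resolution algorithm.

```python
from typing import NamedTuple

# Lower rank wins; a hypothetical encoding of the documented order.
RANK = {"validated": 0, "custom": 1, "regex": 2}


class Match(NamedTuple):
    start: int
    end: int  # exclusive
    label: str
    tier: str


def resolve(matches: list[Match]) -> list[Match]:
    kept: list[Match] = []
    # Highest-priority tier first; among equals, prefer the longer span.
    for m in sorted(matches, key=lambda c: (RANK[c.tier], -(c.end - c.start), c.start)):
        if all(m.end <= k.start or m.start >= k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda c: c.start)


matches = [
    Match(5, 14, "PHONE", "regex"),
    Match(5, 14, "NATIONAL_ID", "validated"),
    Match(20, 30, "EMPLOYEE_ID", "custom"),
]
print([m.label for m in resolve(matches)])
# ['NATIONAL_ID', 'EMPLOYEE_ID']
```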

For isolated pattern registrations, use separate EuRedact instances.

Secret Detection

euRedact automatically detects secrets and API keys using two strategies.

Known-prefix detection

Matches tokens with recognized prefixes from common services:

Service	Prefixes
AWS	`AKIA...`
GitHub	`ghp_`, `gho_`, `ghs_`, `github_pat_`
Stripe	`sk_live_`, `pk_live_`
OpenAI / Anthropic	`sk-`, `sk-ant-`
Slack	`xoxb-`, `xoxp-`
JWT	`eyJ...`
SendGrid	`SG.`
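A simplified sketch of known-prefix matching for the prefixes listed above. The rule used here (a prefix not glued to a preceding word character, followed by at least 16 token characters) is a hypothetical simplification; real token formats are stricter, and this is not euRedact's pattern set.

```python
import re

# Prefixes from the list above; longer ones first so "sk-ant-" is
# tried before "sk-" in the alternation.
PREFIXES = [
    "github_pat_", "ghp_", "gho_", "ghs_", "sk_live_", "pk_live_",
    "sk-ant-", "sk-", "xoxb-", "xoxp-", "AKIA", "eyJ", "SG.",
]

TOKEN = re.compile(
    r"(?<![A-Za-z0-9_])(?:"
    + "|".join(re.escape(p) for p in PREFIXES)
    + r")[A-Za-z0-9_.\-]{16,}"
)


def find_prefixed_tokens(text: str) -> list[str]:
    """Return every substring that looks like a prefixed secret."""
    return TOKEN.findall(text)


# AKIAIOSFODNN7EXAMPLE is AWS's published example key ID, not a real key.
sample = "aws_key=AKIAIOSFODNN7EXAMPLE, gh=ghp_" + "x" * 36
print(find_prefixed_tokens(sample))
```

The lookbehind keeps words such as "mask-..." from matching on the embedded "sk-".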

Entropy-based detection

Flags 32+ character high-entropy strings found near context keywords such as key, token, secret, password, and their translations in 12 EU languages.
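Entropy here usually means Shannon entropy in bits per character, H = Σ p·log2(1/p) over the string's character frequencies. A minimal sketch of the idea; the 4.0-bit threshold and English-only keyword list are hypothetical choices, not euRedact's settings.

```python
import math
from collections import Counter

# English keywords only; the documented detector also covers
# translations in 12 EU languages.
KEYWORDS = ("key", "token", "secret", "password")


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character, from s's own character counts."""
    if not s:
        return 0.0
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def looks_like_secret(candidate: str, context: str,
                      min_len: int = 32, threshold: float = 4.0) -> bool:
    """Hypothetical rule: long, high-entropy, and near a keyword."""
    near_keyword = any(k in context.lower() for k in KEYWORDS)
    return (near_keyword
            and len(candidate) >= min_len
            and shannon_entropy(candidate) >= threshold)


print(shannon_entropy("aaaa"), shannon_entropy("abcd"))
# 0.0 2.0
```

A hex-encoded key tops out at 4 bits per character (16 symbols), so a threshold tuned for base64-like tokens can miss hex secrets.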

Performance

Metric	Value
Latency per page (~500 words)	< 1 ms
Throughput	~2,000 records per second
Memory per country	~50 KB
Optional accelerator	pyahocorasick
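A sketch for checking whether the optional accelerator is importable and for timing throughput on your own data. pyahocorasick is installed with pip install pyahocorasick and imported as ahocorasick; how euRedact enables it is not described here, so treat automatic pickup as an assumption. `records_per_second` and `my_texts` are hypothetical.

```python
import importlib.util
import time
from typing import Callable, Sequence


def accelerator_available() -> bool:
    # The pyahocorasick package installs a module named "ahocorasick".
    return importlib.util.find_spec("ahocorasick") is not None


def records_per_second(fn: Callable[[str], object], texts: Sequence[str]) -> float:
    """Run fn over texts once and return throughput in records per second."""
    start = time.perf_counter()
    for t in texts:
        fn(t)
    elapsed = time.perf_counter() - start
    return len(texts) / elapsed if elapsed > 0 else float("inf")


print("pyahocorasick installed:", accelerator_available())
# With the SDK installed you could measure, for example:
# records_per_second(lambda t: euredact.redact(t, countries=["NL"]), my_texts)
```

Use distinct inputs (or pass cache=False) when benchmarking, since the default cache would otherwise serve repeated texts.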

Supported Countries

31 European countries supported out of the box.

AT, BE, BG, CH, CY, CZ, DE, DK, EE, EL, ES, FI, FR, HR, HU, IE, IS, IT, LT, LU, LV, MT, NL, NO, PL, PT, RO, SE, SI, SK, UK

Entity Types

31 entity types detected across all supported countries.

[NAME], [ADDRESS], [IBAN], [BIC], [CREDIT_CARD], [PHONE], [EMAIL], [DOB], [DATE_OF_DEATH], [NATIONAL_ID], [SSN], [TAX_ID], [PASSPORT], [DRIVERS_LICENSE], [RESIDENCE_PERMIT], [LICENSE_PLATE], [VIN], [VAT], [POSTAL_CODE], [IP_ADDRESS], [IPV6_ADDRESS], [MAC_ADDRESS], [HEALTH_INSURANCE], [HEALTHCARE_PROVIDER], [CHAMBER_OF_COMMERCE], [IMEI], [GPS_COORDINATES], [UUID], [SOCIAL_HANDLE], [SECRET], [OTHER]
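When only the redacted string is at hand (for example in logs), the bracketed labels can be tallied with a regular expression. `label_counts` is a hypothetical helper; when you have the RedactResult, its detections list is the more reliable source, since original text that already contains something like [NOTE] would also be counted here.

```python
import re
from collections import Counter

# Placeholders such as [IBAN] or [NATIONAL_ID]. Digits are allowed in case
# numbered labels also appear in brackets, which is an assumption.
LABEL = re.compile(r"\[([A-Z][A-Z0-9_]*)\]")


def label_counts(redacted_text: str) -> Counter[str]:
    """Count placeholder labels in already-redacted text."""
    return Counter(LABEL.findall(redacted_text))


print(dict(label_counts("Mijn BSN is [NATIONAL_ID] en IBAN [IBAN].")))
# {'NATIONAL_ID': 1, 'IBAN': 1}
```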

View source on GitHub

Browse the code, report issues, or contribute.
