SDK Reference

Python SDK

High-performance PII redaction for Python. Sync and async support, < 1 ms per page, ~2,000 records/second.

Installation

Terminal
$ pip install euredact

redact()

Main entry point. Detects and redacts PII from a text string. Returns a RedactResult with the cleaned text and a list of detections.

Signature
def redact(
    text: str,
    *,
    countries: list[str] | None = None,
    mode: str = "rules",
    referential_integrity: bool = False,
    detect_dates: bool = False,
    cache: bool = True,
) -> RedactResult

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| text | str | - | Input text to scan. |
| countries | list[str] \| None | None | ISO country codes to restrict detection. None = all 31 supported countries. |
| mode | str | "rules" | Detection mode (currently only "rules"). |
| referential_integrity | bool | False | Replace PII with consistent labels (NAME_1, IBAN_1) instead of generic [TYPE] labels. |
| detect_dates | bool | False | Include DOB/date-of-death detection. Off by default because it requires context. |
| cache | bool | True | Cache results for identical inputs. |
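
The effect of referential_integrity can be pictured with a small standalone sketch. It does not call euredact and is not the library's implementation: the IBAN regex is a toy, and label_consistently is a hypothetical helper. It only shows the idea that a repeated value receives the same numbered label each time.

```python
import re

# Toy pattern for illustration only; real IBAN detection also needs checksum validation.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b")


def label_consistently(text: str) -> str:
    """Replace each distinct IBAN with a stable numbered label (IBAN_1, IBAN_2, ...)."""
    labels: dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        value = match.group(0)
        # First sighting of a value allocates the next number; repeats reuse it.
        if value not in labels:
            labels[value] = f"IBAN_{len(labels) + 1}"
        return labels[value]

    return IBAN_RE.sub(substitute, text)


sample = (
    "Pay NL91ABNA0417164300 today. "
    "Refunds go to DE89370400440532013000, not NL91ABNA0417164300."
)
print(label_consistently(sample))
# Pay IBAN_1 today. Refunds go to IBAN_2, not IBAN_1.
```

The IBAN that appears twice is labelled IBAN_1 both times, so a reader can still tell that both mentions refer to the same account.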

Example

example.py
import euredact

result = euredact.redact(
    "Mijn BSN is 111222333 en IBAN NL91ABNA0417164300.",
    countries=["NL"],
)

print(result.redacted_text)
# "Mijn BSN is [NATIONAL_ID] en IBAN [IBAN]."

print(result.detections)

redact_batch()

Batch redaction. More efficient than calling redact() in a loop because it loads configs once.

Signature
def redact_batch(
    texts: list[str],
    **kwargs,
) -> list[RedactResult]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| texts | list[str] | - | List of input texts to redact. |
| **kwargs | - | - | Same keyword arguments as redact() (countries, mode, referential_integrity, detect_dates, cache). |

Example

batch_example.py
import euredact

texts = [
    "BSN 111222333",
    "IBAN NL91ABNA0417164300",
]

results = euredact.redact_batch(texts, countries=["NL"])
for r in results:
    print(r.redacted_text)

aredact()

Async version of redact(). Offloads CPU work to a thread pool. Same keyword arguments and return type.

Signature
async def aredact(
    text: str,
    **kwargs,
) -> RedactResult

Example

async_example.py
import asyncio
import euredact

async def main():
    result = await euredact.aredact("BSN 111222333")
    print(result.redacted_text)

asyncio.run(main())

aredact_batch()

Async batch redaction with controlled concurrency.

Signature
async def aredact_batch(
    texts: list[str],
    *,
    max_concurrency: int = 4,
    **kwargs,
) -> list[RedactResult]

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| texts | list[str] | - | List of input texts to redact. |
| max_concurrency | int | 4 | Maximum number of concurrent tasks. |
| **kwargs | - | - | Same keyword arguments as redact(). |
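
To show what a concurrency cap like max_concurrency means in practice, here is a minimal standalone sketch built on asyncio.Semaphore and asyncio.to_thread. It is an illustration under assumptions, not how aredact_batch() is implemented: fake_redact and bounded_batch are hypothetical names, and the stand-in function only replaces one hard-coded value.

```python
import asyncio


def fake_redact(text: str) -> str:
    """Stand-in for a CPU-bound redaction call; NOT the euredact implementation."""
    return text.replace("111222333", "[NATIONAL_ID]")


async def bounded_batch(texts: list[str], max_concurrency: int = 4) -> list[str]:
    # The semaphore caps how many texts are being processed at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(text: str) -> str:
        async with semaphore:
            # Run the blocking function in a worker thread so the event loop stays free.
            return await asyncio.to_thread(fake_redact, text)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(t) for t in texts))


results = asyncio.run(
    bounded_batch(["BSN 111222333", "no PII here"], max_concurrency=2)
)
print(results)
# ['BSN [NATIONAL_ID]', 'no PII here']
```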

redact_iter()

Lazy iterator for large datasets. Yields results one at a time without loading everything into memory.

Signature
def redact_iter(
    texts: Iterable[str],
    **kwargs,
) -> Iterator[RedactResult]

Example

iter_example.py
import euredact

texts = ["BSN 111222333", "IBAN DE89370400440532013000"]

for result in euredact.redact_iter(texts):
    print(result.redacted_text)
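
The lazy behaviour described for redact_iter() can be pictured with plain generators. The sketch below does not use euredact: lazy_map and numbered_records are hypothetical names, and str.upper stands in for the redaction step. It shows that items are produced only when the consumer asks for them, which is why an arbitrarily large (here, unbounded) source is safe to pass in.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def lazy_map(texts: Iterable[str], fn: Callable[[str], str]) -> Iterator[str]:
    """Yield fn(text) one item at a time; nothing is computed until requested."""
    for text in texts:
        yield fn(text)


def numbered_records() -> Iterator[str]:
    # An unbounded source: it could never be materialised as a list.
    n = 0
    while True:
        n += 1
        yield f"record {n}"


# Only the first three records are ever generated or processed.
first_three = list(islice(lazy_map(numbered_records(), str.upper), 3))
print(first_three)
# ['RECORD 1', 'RECORD 2', 'RECORD 3']
```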

add_custom_pattern()

Register a custom regex pattern. Matches are reported with the given name as the entity type.

Signature
def add_custom_pattern(
    name: str,
    pattern: str,
) -> None

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | - | Entity type name for matches (e.g., "EMPLOYEE_ID"). |
| pattern | str | - | Regular expression pattern to match. |
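
Before registering a pattern, it can help to check it against a few samples. The sketch below uses only Python's re module and assumes the pattern syntax is compatible with it, which this reference does not state explicitly. check_pattern is a hypothetical helper, not part of the SDK.

```python
import re


def check_pattern(
    pattern: str,
    should_match: list[str],
    should_not_match: list[str],
) -> bool:
    """Return True if the pattern compiles and behaves as expected on the samples."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        # The pattern is not a valid regular expression.
        return False
    hits = all(compiled.fullmatch(s) for s in should_match)
    misses = not any(compiled.fullmatch(s) for s in should_not_match)
    return hits and misses


# A six-digit employee ID: accepts exactly six digits, rejects five or seven.
print(check_pattern(r"EMP-\d{6}", ["EMP-123456"], ["EMP-12345", "EMP-1234567"]))
# True

# An unbalanced parenthesis fails to compile.
print(check_pattern(r"EMP-(\d{6}", ["EMP-123456"], []))
# False
```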

available_countries()

Returns a sorted list of supported ISO country codes.

example.py
import euredact

print(euredact.available_countries())  # ["AT", "BE", "BG", ...]

EuRedact class

For isolated instances with separate caches and custom patterns. Useful when different parts of your application need different configurations.

instance_example.py
from euredact import EuRedact

instance = EuRedact()
instance.add_custom_pattern("CASE_REF", r"CASE-\d{8}")

result = instance.redact(
    "See CASE-20260401",
    countries=["NL", "BE"],
)

print(result.redacted_text)
# "See [CASE_REF]"

The EuRedact instance exposes the same methods: redact(), redact_batch(), aredact(), aredact_batch(), redact_iter(), and add_custom_pattern().

Return Types

RedactResult

A dataclass returned by all redaction functions.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| redacted_text | str | - | The text with PII replaced by labels like [NATIONAL_ID], [IBAN], etc. |
| detections | list[Detection] | - | List of detected PII spans. |
| source | str | "rules" | Detection source used. |
| degraded | bool | False | Whether results may be incomplete due to an internal issue. |

Detection

A frozen dataclass (immutable and hashable) representing a single PII detection.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| entity_type | EntityType \| str | - | The type of PII detected (e.g., NATIONAL_ID, IBAN). |
| start | int | - | Start character offset in the original text. |
| end | int | - | End character offset in the original text. |
| text | str | - | The matched PII text. |
| source | DetectionSource | - | Detection source ("rules" or "cloud"). |
| country | str \| None | - | ISO country code the detection is associated with. |
| confidence | str | "high" | Confidence level of the detection. |
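
The sketch below shows how start/end offsets relate to the original text, using a stand-in dataclass rather than the real Detection class. It assumes end is exclusive (Python slice semantics), which this reference does not confirm; SpanStandIn, make_span, and apply_labels are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanStandIn:
    """Illustrative stand-in with a subset of the documented Detection fields."""
    entity_type: str
    start: int
    end: int  # assumed exclusive, as in Python slicing
    text: str


original = "Mijn BSN is 111222333 en IBAN NL91ABNA0417164300."


def make_span(entity_type: str, value: str) -> SpanStandIn:
    # Locate the value in the original text to derive its offsets.
    start = original.index(value)
    return SpanStandIn(entity_type, start, start + len(value), value)


def apply_labels(text: str, spans: list[SpanStandIn]) -> str:
    # Replace from the end of the text so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"[{span.entity_type}]" + text[span.end:]
    return text


spans = [
    make_span("NATIONAL_ID", "111222333"),
    make_span("IBAN", "NL91ABNA0417164300"),
]

# Slicing the original text with the offsets recovers the matched value.
print(all(original[s.start:s.end] == s.text for s in spans))
# True

print(apply_labels(original, spans))
# Mijn BSN is [NATIONAL_ID] en IBAN [IBAN].

# Frozen dataclasses are hashable, so duplicates collapse in a set.
print(len({spans[0], spans[0]}))
# 1
```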

Custom Patterns

Register custom regex patterns to detect domain-specific identifiers. Custom patterns are always active regardless of the countries parameter.

custom_patterns.py
import euredact

euredact.add_custom_pattern("EMPLOYEE_ID", r"EMP-\d{6}")

result = euredact.redact("Contact EMP-123456 for details")
print(result.redacted_text)
# "Contact [EMPLOYEE_ID] for details"

Priority order: validated patterns > custom patterns > regex-only patterns.

For isolated pattern registrations, use separate EuRedact instances.
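
One plausible reading of the priority order above is that, when candidate matches overlap, the higher-priority one wins. The sketch below illustrates that reading only; how euredact actually resolves overlaps is not documented here. Candidate, PRIORITY, and resolve_overlaps are hypothetical names.

```python
from typing import NamedTuple

# Lower number = higher priority, following the order stated above.
PRIORITY = {"validated": 0, "custom": 1, "regex_only": 2}


class Candidate(NamedTuple):
    start: int
    end: int  # exclusive
    entity_type: str
    tier: str


def resolve_overlaps(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the highest-priority candidate wherever spans overlap."""
    kept: list[Candidate] = []
    # Visit candidates from highest to lowest priority.
    for cand in sorted(candidates, key=lambda c: (PRIORITY[c.tier], c.start)):
        # Accept a candidate only if it is disjoint from everything already kept.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c.start)


candidates = [
    Candidate(0, 10, "EMPLOYEE_ID", "custom"),
    Candidate(4, 10, "POSTAL_CODE", "regex_only"),  # overlaps the custom match
    Candidate(15, 33, "IBAN", "validated"),
]
print([c.entity_type for c in resolve_overlaps(candidates)])
# ['EMPLOYEE_ID', 'IBAN']
```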

Secret Detection

euRedact automatically detects secrets and API keys using two strategies.

Known-prefix detection

Matches tokens with recognized prefixes from common services:

| Service | Prefixes |
| --- | --- |
| AWS | AKIA... |
| GitHub | ghp_, gho_, ghs_, github_pat_ |
| Stripe | sk_live_, pk_live_ |
| OpenAI / Anthropic | sk-, sk-ant- |
| Slack | xoxb-, xoxp- |
| JWT | eyJ... |
| SendGrid | SG. |

Entropy-based detection

Flags 32+ character high-entropy strings found near context keywords such as key, token, secret, password, and their translations in 12 EU languages.
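
The idea can be sketched with Shannon entropy over the characters of a token. This is an illustration, not the library's detector: the 4.0 bits-per-character threshold, the whitespace tokenisation, and the names shannon_entropy and looks_like_secret are assumptions, and only the four English keywords named above are checked.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    # Sum of p * log2(1/p) over the distinct characters.
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


CONTEXT_KEYWORDS = ("key", "token", "secret", "password")


def looks_like_secret(line: str, min_len: int = 32, threshold: float = 4.0) -> bool:
    """Flag a line that has a context keyword and a long, high-entropy word."""
    lowered = line.lower()
    if not any(k in lowered for k in CONTEXT_KEYWORDS):
        return False
    return any(
        len(word) >= min_len and shannon_entropy(word) >= threshold
        for word in line.split()
    )


varied = "abcdefghijklmnopqrstuvwxyz012345"  # 32 distinct characters

print(shannon_entropy("abcd"))                    # 2.0
print(shannon_entropy(varied))                    # 5.0
print(looks_like_secret("api_key = " + varied))   # True
print(looks_like_secret("password: " + "a" * 40)) # False (long but low entropy)
print(looks_like_secret("note: " + varied))       # False (no context keyword)
```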

Performance

| Metric | Value |
| --- | --- |
| Per page (~500 words) | < 1 ms |
| Records per second | ~2,000 |
| Memory per country | ~50 KB |
| Optional accelerator | pyahocorasick |
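
Throughput depends on hardware and input, so it can be worth measuring on your own data. A minimal timing sketch follows; records_per_second is a hypothetical helper, and str.upper stands in for whatever function is being measured (for example, a redaction call).

```python
import time
from typing import Callable


def records_per_second(fn: Callable[[str], object], texts: list[str]) -> float:
    """Call fn on every text once and return the observed records per second."""
    start = time.perf_counter()
    for text in texts:
        fn(text)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very fast runs or coarse timers.
    return len(texts) / elapsed if elapsed > 0 else float("inf")


sample_texts = ["BSN 111222333"] * 10_000
rate = records_per_second(str.upper, sample_texts)
print(f"{rate:,.0f} records/second")
```

A single run is noisy; repeating the measurement and taking the median gives a steadier number.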

Supported Countries

31 European countries supported out of the box.

AT BE BG CH CY CZ DE DK EE EL ES FI FR HR HU IE IS IT LT LU LV MT NL NO PL PT RO SE SI SK UK
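
User-supplied country codes can be checked against this list before they are passed on. In the sketch below the codes are hard-coded from the list above so that it runs standalone; in real code the list would more naturally come from available_countries(). validate_countries is a hypothetical helper. Note that the list uses EL and UK, so this check rejects GR and GB; whether the library accepts those aliases is not stated here.

```python
# Codes copied from the list above (31 entries).
SUPPORTED = (
    "AT BE BG CH CY CZ DE DK EE EL ES FI FR HR HU IE "
    "IS IT LT LU LV MT NL NO PL PT RO SE SI SK UK"
).split()


def validate_countries(codes: list[str]) -> list[str]:
    """Normalise codes to upper case and reject anything outside the supported list."""
    normalised = [code.strip().upper() for code in codes]
    unknown = sorted(set(normalised) - set(SUPPORTED))
    if unknown:
        raise ValueError(f"Unsupported country codes: {unknown}")
    return normalised


print(validate_countries(["nl", "BE"]))
# ['NL', 'BE']

try:
    validate_countries(["GR"])
except ValueError as exc:
    print(exc)
    # Unsupported country codes: ['GR']
```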

Entity Types

31 entity types detected across all supported countries.

[NAME] [ADDRESS] [IBAN] [BIC] [CREDIT_CARD] [PHONE] [EMAIL] [DOB] [DATE_OF_DEATH] [NATIONAL_ID] [SSN] [TAX_ID] [PASSPORT] [DRIVERS_LICENSE] [RESIDENCE_PERMIT] [LICENSE_PLATE] [VIN] [VAT] [POSTAL_CODE] [IP_ADDRESS] [IPV6_ADDRESS] [MAC_ADDRESS] [HEALTH_INSURANCE] [HEALTHCARE_PROVIDER] [CHAMBER_OF_COMMERCE] [IMEI] [GPS_COORDINATES] [UUID] [SOCIAL_HANDLE] [SECRET] [OTHER]

View source on GitHub

Browse the code, report issues, or contribute.
