Python SDK
High-performance PII redaction for Python. Sync and async support, < 1 ms per page, ~2,000 records/second.
Installation
$ pip install euredact
redact()
Main entry point. Detects and redacts PII from a text string. Returns a RedactResult with the cleaned text and a list of detections.
def redact(
    text: str,
    *,
    countries: list[str] | None = None,
    mode: str = "rules",
    referential_integrity: bool = False,
    detect_dates: bool = False,
    cache: bool = True,
) -> RedactResult
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | str | — | Input text to scan. |
| countries | list[str] \| None | None | ISO country codes to restrict detection. None = all 31 supported countries. |
| mode | str | "rules" | Detection mode (currently only "rules"). |
| referential_integrity | bool | False | Replace PII with consistent labels (NAME_1, IBAN_1) instead of generic [TYPE] labels. |
| detect_dates | bool | False | Include DOB/date-of-death detection. Off by default because it requires context. |
| cache | bool | True | Cache results for identical inputs. |
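To make the referential_integrity option concrete, the following stdlib-only sketch mimics the labeling behaviour described in the table for a single entity type. It does not call euredact: the `label_consistently` helper and its IBAN-like regex are hypothetical stand-ins, and the exact label format the library emits is an assumption based on the NAME_1 / IBAN_1 examples above.

```python
import re

# Hypothetical, simplified pattern: two letters, two digits, then 10-30 alphanumerics.
# Illustration only; this is not euredact's IBAN validation.
IBAN_LIKE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b")


def label_consistently(text: str, entity: str = "IBAN") -> str:
    """Replace each distinct match with a numbered label, reusing labels for repeats."""
    labels: dict[str, str] = {}

    def substitute(match: re.Match) -> str:
        value = match.group(0)
        if value not in labels:
            # First time this value is seen: assign the next number.
            labels[value] = f"{entity}_{len(labels) + 1}"
        return labels[value]

    return IBAN_LIKE.sub(substitute, text)


sample = (
    "Pay NL91ABNA0417164300, refund DE89370400440532013000, "
    "then pay NL91ABNA0417164300 again."
)
labeled = label_consistently(sample)
print(labeled)  # Pay IBAN_1, refund IBAN_2, then pay IBAN_1 again.
```

The repeated account number receives the same label both times, which is what lets downstream readers follow who paid whom without seeing the underlying value.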
Example
import euredact
result = euredact.redact(
    "Mijn BSN is 111222333 en IBAN NL91ABNA0417164300.",
    countries=["NL"],
)
print(result.redacted_text)
# "Mijn BSN is [NATIONAL_ID] en IBAN [IBAN]."
print(result.detections)
redact_batch()
Batch redaction. More efficient than calling redact() in a loop because it loads configs once.
def redact_batch(
    texts: list[str],
    **kwargs,
) -> list[RedactResult]
| Parameter | Type | Default | Description |
|---|---|---|---|
| texts | list[str] | — | List of input texts to redact. |
| **kwargs | | — | Same keyword arguments as redact() (countries, mode, referential_integrity, detect_dates, cache). |
Example
import euredact
texts = [
    "BSN 111222333",
    "IBAN NL91ABNA0417164300",
]
results = euredact.redact_batch(texts, countries=["NL"])
for r in results:
    print(r.redacted_text)
aredact()
Async version of redact(). Offloads CPU work to a thread pool. Same keyword arguments and return type.
async def aredact(
    text: str,
    **kwargs,
) -> RedactResult
Example
import asyncio
import euredact
async def main():
    result = await euredact.aredact("BSN 111222333")
    print(result.redacted_text)
asyncio.run(main())
aredact_batch()
Async batch redaction with controlled concurrency.
async def aredact_batch(
    texts: list[str],
    *,
    max_concurrency: int = 4,
    **kwargs,
) -> list[RedactResult]
| Parameter | Type | Default | Description |
|---|---|---|---|
| texts | list[str] | — | List of input texts to redact. |
| max_concurrency | int | 4 | Maximum number of concurrent tasks. |
| **kwargs | | — | Same keyword arguments as redact(). |
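The idea of bounded concurrency behind max_concurrency can be sketched with the standard library alone. The example below is one common way to cap in-flight work with a semaphore while offloading blocking calls to threads; it is not taken from euredact's source, and `fake_redact` and `bounded_batch` are hypothetical names used only for illustration.

```python
import asyncio


def fake_redact(text: str) -> str:
    """Hypothetical blocking stand-in for a synchronous redaction call."""
    return text.replace("111222333", "[NATIONAL_ID]")


async def bounded_batch(texts: list[str], max_concurrency: int = 4) -> list[str]:
    """Run fake_redact over texts with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(text: str) -> str:
        async with semaphore:
            # Offload the blocking call to a thread so the event loop stays responsive.
            return await asyncio.to_thread(fake_redact, text)

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(worker(t) for t in texts))


results = asyncio.run(bounded_batch(["BSN 111222333", "no PII here"], max_concurrency=2))
print(results)  # ['BSN [NATIONAL_ID]', 'no PII here']
```

A lower limit keeps thread and memory use predictable under load; a higher one only helps while spare CPU capacity remains.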
redact_iter()
Lazy iterator for large datasets. Yields results one at a time without loading everything into memory.
def redact_iter(
    texts: Iterable[str],
    **kwargs,
) -> Iterator[RedactResult]
Example
import euredact
texts = ["BSN 111222333", "IBAN DE89370400440532013000"]
for result in euredact.redact_iter(texts):
    print(result.redacted_text)
add_custom_pattern()
Register a custom regex pattern. Matches are reported with the given name as the entity type.
def add_custom_pattern(
    name: str,
    pattern: str,
) -> None
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | — | Entity type name for matches (e.g., "EMPLOYEE_ID"). |
| pattern | str | — | Regular expression pattern to match. |
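Because a custom pattern is an ordinary regular expression, it can be checked with Python's `re` module before it is registered. The helper below is a hypothetical pre-flight check and not part of euredact; the `\b` word boundaries are an illustrative addition, since this page does not say whether the library applies any boundary handling of its own.

```python
import re


def check_pattern(pattern: str, should_match: list[str], should_not_match: list[str]) -> bool:
    """Compile the pattern and confirm it hits the wanted samples and no unwanted ones."""
    compiled = re.compile(pattern)  # raises re.error if the syntax is invalid
    hits = all(compiled.search(s) for s in should_match)
    misses = not any(compiled.search(s) for s in should_not_match)
    return hits and misses


ok = check_pattern(
    r"\bEMP-\d{6}\b",
    should_match=["Contact EMP-123456 for details"],
    # Too few and too many digits: neither should be treated as an employee ID.
    should_not_match=["EMP-12345", "EMP-1234567"],
)
print(ok)  # True
```

Running a check like this against a handful of real and near-miss samples catches overly broad patterns before they start redacting unrelated text.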
available_countries()
Returns a sorted list of supported ISO country codes.
import euredact
print(euredact.available_countries())  # ["AT", "BE", "BG", ...]
EuRedact class
Use the EuRedact class to create isolated instances, each with its own cache and custom patterns. This is useful when different parts of your application need different configurations.
from euredact import EuRedact
instance = EuRedact()
instance.add_custom_pattern("CASE_REF", r"CASE-\d{8}")
result = instance.redact(
    "See CASE-20260401",
    countries=["NL", "BE"],
)
print(result.redacted_text)
# "See [CASE_REF]"
The EuRedact instance exposes the same methods: redact(), redact_batch(), aredact(), aredact_batch(), redact_iter(), and add_custom_pattern().
Return Types
RedactResult
A dataclass returned by all redaction functions.
| Field | Type | Default | Description |
|---|---|---|---|
| redacted_text | str | — | The text with PII replaced by labels like [NATIONAL_ID], [IBAN], etc. |
| detections | list[Detection] | — | List of detected PII spans. |
| source | str | "rules" | Detection source used. |
| degraded | bool | False | Whether results may be incomplete due to an internal issue. |
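Since degraded signals that a result may be incomplete, callers may want to check it before forwarding text. The sketch below shows one possible policy using a local stand-in dataclass that mirrors the documented fields; the stand-in, the `text_or_fallback` helper, and the fallback string are hypothetical and not part of euredact.

```python
from dataclasses import dataclass, field


# Local stand-in with the documented fields; not imported from euredact.
@dataclass
class RedactResult:
    redacted_text: str
    detections: list = field(default_factory=list)
    source: str = "rules"
    degraded: bool = False


def text_or_fallback(result: RedactResult, fallback: str = "[REDACTION INCOMPLETE]") -> str:
    """Return the redacted text, or a fallback when the result is flagged as degraded."""
    # Hypothetical policy: never forward output that may still contain PII.
    return fallback if result.degraded else result.redacted_text


clean = RedactResult("Mijn BSN is [NATIONAL_ID].")
partial = RedactResult("Mijn BSN is 111222333.", degraded=True)
print(text_or_fallback(clean))    # Mijn BSN is [NATIONAL_ID].
print(text_or_fallback(partial))  # [REDACTION INCOMPLETE]
```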
Detection
A frozen dataclass (immutable and hashable) representing a single PII detection.
| Field | Type | Default | Description |
|---|---|---|---|
| entity_type | EntityType \| str | — | The type of PII detected (e.g., NATIONAL_ID, IBAN). |
| start | int | — | Start character offset in the original text. |
| end | int | — | End character offset in the original text. |
| text | str | — | The matched PII text. |
| source | DetectionSource | — | Detection source ("rules" or "cloud"). |
| country | str \| None | — | ISO country code the detection is associated with. |
| confidence | str | "high" | Confidence level of the detection. |
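The offset fields and the frozen-dataclass behaviour can be illustrated with a local stand-in that mirrors the fields listed above. This sketch does not import euredact, uses plain strings where the library uses EntityType and DetectionSource, and assumes that end is exclusive (so that slicing the original text with start and end yields text), which this page does not state explicitly.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional


# Local stand-in mirroring the documented fields; not imported from euredact.
@dataclass(frozen=True)
class Detection:
    entity_type: str
    start: int
    end: int
    text: str
    source: str
    country: Optional[str]
    confidence: str = "high"


original = "Mijn BSN is 111222333."
# Assumption: `end` is exclusive, so original[start:end] == text.
det = Detection("NATIONAL_ID", 12, 21, "111222333", "rules", "NL")
assert original[det.start:det.end] == det.text

# Frozen dataclasses are hashable, so equal detections collapse in a set.
unique = {det, Detection("NATIONAL_ID", 12, 21, "111222333", "rules", "NL")}
print(len(unique))  # 1

# Assigning to a field of a frozen instance raises FrozenInstanceError.
try:
    det.start = 0
    mutated = True
except FrozenInstanceError:
    mutated = False
print(mutated)  # False
```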
Custom Patterns
Register custom regex patterns to detect domain-specific identifiers. Custom patterns are always active regardless of the countries parameter.
import euredact
euredact.add_custom_pattern("EMPLOYEE_ID", r"EMP-\d{6}")
result = euredact.redact("Contact EMP-123456 for details")
print(result.redacted_text)
# "Contact [EMPLOYEE_ID] for details"
Priority order: validated patterns > custom patterns > regex-only patterns.
For isolated pattern registrations, use separate EuRedact instances.
Secret Detection
euRedact automatically detects secrets and API keys using two strategies.
Known-prefix detection
Matches tokens with recognized prefixes from common services:
Entropy-based detection
Flags high-entropy strings of 32 or more characters that appear near context keywords such as key, token, secret, and password, including their translations in 12 EU languages.
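As a rough illustration of the entropy-based idea, the sketch below computes Shannon entropy per character and combines it with a length and keyword check. Only the 32-character minimum and the English keywords come from this page; the 4.0-bit threshold, the substring-based keyword check, and the function names are assumptions made for the example, not euredact's actual values or logic.

```python
import math
from collections import Counter


def shannon_entropy(token: str) -> float:
    """Shannon entropy of the token's characters, in bits per character."""
    if not token:
        return 0.0
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


MIN_LENGTH = 32          # from the description above
MIN_ENTROPY_BITS = 4.0   # assumed threshold, for illustration only
CONTEXT_KEYWORDS = ("key", "token", "secret", "password")


def looks_like_secret(candidate: str, context: str) -> bool:
    """Flag long, high-entropy strings that appear near a context keyword."""
    has_keyword = any(word in context.lower() for word in CONTEXT_KEYWORDS)
    return (
        has_keyword
        and len(candidate) >= MIN_LENGTH
        and shannon_entropy(candidate) >= MIN_ENTROPY_BITS
    )


# A repeated character has zero entropy, so it is not flagged despite its length.
print(looks_like_secret("a" * 40, "api key:"))  # False
# 35 distinct characters give log2(35), about 5.13 bits per character.
print(looks_like_secret("q7Zp2Lx9Vb4Nc8Rt1Ym6Kd3Hs5Gw0JfAeUi", "api key:"))  # True
```

Requiring a nearby keyword is what keeps long but harmless strings, such as hashes in log output, from being flagged on entropy alone.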
Performance
Supported Countries
31 European countries supported out of the box.
Entity Types
31 entity types detected across all supported countries.