PII Detection
Govrix Scout scans every request payload sent to an LLM, and every response returned from one, for personally identifiable information (PII). Detection results are recorded on the AgentEvent and are available for policy rules, alerting, and audit queries.
Detected PII types
| PII Type | Example match | Pattern count |
|---|---|---|
| Email address | user@example.com | 3 patterns |
| Phone number | +1-800-555-0100, (415) 555-0100 | 6 patterns |
| US Social Security Number | 123-45-6789, 123 45 6789 | 4 patterns |
| Credit card number | 4111 1111 1111 1111, 4111-1111-1111-1111 | 8 patterns |
| IP address | 192.168.1.1, 10.0.0.1 | 4 patterns |
In total the scanner uses 25+ regex patterns, covering international formats, common separators, and whitespace variants. All patterns are compiled once at startup into a shared `RegexSet`.
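The Python sketch below illustrates the detection step only. The real scanner is Rust code in `pii.rs`. The patterns here are simplified, hypothetical stand-ins (one or two per type, not the production 25+), and the type names other than `email` and `phone` are assumptions. As in the real design, the patterns are compiled once, at module load time, rather than per request.

```python
import re

# Hypothetical, simplified patterns (one or two per PII type).
# These are NOT the production patterns used by Govrix Scout.
PII_PATTERNS = {
    "email": [re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")],
    "phone": [
        re.compile(r"\+1-\d{3}-\d{3}-\d{4}\b"),  # +1-800-555-0100
        re.compile(r"\(\d{3}\) \d{3}-\d{4}\b"),  # (415) 555-0100
    ],
    "ssn": [re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")],  # dash or space separators
    "credit_card": [re.compile(r"\b\d{4}(?:[- ]\d{4}){3}\b")],
    "ip_address": [re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")],
}


def detect_pii_types(text: str) -> list[str]:
    """Return the sorted list of PII types with at least one matching pattern."""
    return sorted(
        pii_type
        for pii_type, patterns in PII_PATTERNS.items()
        if any(p.search(text) for p in patterns)
    )
```

For example, `detect_pii_types("Contact user@example.com or (415) 555-0100")` reports both `email` and `phone`.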
Detection-only architecture
Govrix Scout detects PII but does not mask or redact it. The payload is forwarded to the LLM unchanged.
This is an architectural choice, not an oversight. Masking creates several problems:
- Semantic drift: replacing `john@acme.com` with `[EMAIL]` changes the meaning of the prompt and can corrupt the model's response.
- False-positive risk: an over-eager masker can redact non-sensitive content that happens to match a pattern (e.g., a version string like `1.2.3.4` matching an IP pattern).
- Liability shift: once you mask, you are making an assertion that the data is now safe. Detection-only lets the operator decide what to do (block via policy rule, alert, log, or pass through).
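A minimal sketch of the detection-only contract, assuming a deliberately naive IPv4 pattern (hypothetical, not one of Govrix Scout's patterns). The payload is returned unchanged, and the detection result is returned alongside it for policy to act on. The example also reproduces the version-string false positive from the list above.

```python
import re

# Hypothetical, deliberately naive IPv4 pattern, for illustration only.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def inspect_payload(payload: str) -> tuple[str, bool]:
    """Detect PII without modifying the payload.

    Returns the payload exactly as received plus a detection flag;
    deciding what to do with the flag is left to policy.
    """
    return payload, IPV4.search(payload) is not None


# A version string trips the naive IP pattern: a false positive.
forwarded, flagged = inspect_payload("Upgrade the client to 1.2.3.4 before retrying")
```

Because nothing is rewritten, the false positive only raises a flag. A masker in the same situation would have stripped the version number out of the prompt before the model ever saw it.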
If you want to block requests that contain PII, configure a policy rule with `pii_detected: true` and action `block`. See the YAML Policy Engine for details.
Implementation
| File | Role |
|---|---|
| `crates/govrix-scout-proxy/src/policy/pii.rs` | Scanner implementation: 5 types, 25+ patterns, 25+ unit tests |
The scanner runs in under 1 ms on typical LLM prompt sizes (under 8 KB). It is called synchronously on the hot path before the upstream request is forwarded, so `pii_detected` is available to the policy engine in the same request cycle.
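The ordering in the paragraph above (scan, then policy decision, then forward) can be sketched in Python. `handle_request`, `block_on_pii`, and `forward_upstream` are hypothetical names, and the scanner is reduced to a single email pattern. This sketch shows only the ordering, not the proxy's actual control flow.

```python
import re
from typing import Callable

# Hypothetical stand-in for the scanner: detects email addresses only.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def handle_request(
    payload: str,
    block_on_pii: bool,
    forward_upstream: Callable[[str], str],
) -> dict:
    # 1. Scan synchronously, before anything is sent upstream.
    pii_detected = EMAIL.search(payload) is not None
    # 2. The policy decision can use pii_detected in the same request cycle.
    if block_on_pii and pii_detected:
        return {"status": "blocked", "pii_detected": True}
    # 3. Only now is the (unchanged) payload forwarded.
    return {
        "status": "forwarded",
        "pii_detected": pii_detected,
        "response": forward_upstream(payload),
    }
```

When the request is blocked, `forward_upstream` is never called, so the upstream LLM never receives the payload.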
The RegexSet is compiled once at process startup and shared across all request handlers via an Arc. There is no per-request compilation cost.
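In Rust, wrapping one `RegexSet` in an `Arc` lets every handler read the same compiled set. The Python sketch below shows the same compile-once, share-everywhere idea, with a thread pool standing in for request handlers. Python needs no `Arc` equivalent here, because module-level objects are visible to all threads and compiled `re.Pattern` objects can be used from several threads at once. The two patterns are hypothetical simplifications.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Compiled once, when the module loads (analogous to process startup).
# Hypothetical patterns, for illustration only.
SHARED_PATTERNS = (
    ("email", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("ip_address", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
)


def scan(payload: str) -> list[str]:
    # Every handler reads the same compiled objects; nothing is recompiled.
    return [name for name, pattern in SHARED_PATTERNS if pattern.search(payload)]


def scan_concurrently(payloads: list[str], workers: int = 4) -> list[list[str]]:
    # Simulates several request handlers sharing one pattern set.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(scan, payloads))


results = scan_concurrently(["ping 10.0.0.1", "hi user@example.com", "no pii"])
```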
Event field
When PII is detected, the pii_detected field on the event is set to true. The specific types detected are recorded in pii_types (an array of strings).
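The sketch below shows one way these two fields could be derived from a scan result. It also includes a small audit-style filter over stored events. It is written in Python for illustration. Two details are assumptions: that `pii_types` is sorted, and that it is an empty list when nothing is found. `pii_event_fields` and `events_with_type` are hypothetical helpers.

```python
def pii_event_fields(detected_types: set[str]) -> dict:
    """Build the PII-related event fields from the set of detected types."""
    return {
        "pii_detected": bool(detected_types),  # true if any type was found
        "pii_types": sorted(detected_types),   # sorted for deterministic output
    }


def events_with_type(events: list[dict], pii_type: str) -> list[str]:
    """Return the event_id of every event whose pii_types include pii_type."""
    return [e["event_id"] for e in events if pii_type in e.get("pii_types", [])]
```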
Example API response for an event with PII:
```json
{
  "event_id": "018e4b7a-0003-7abc-8def-000000000003",
  "session_id": "018e4b7a-1234-7abc-8def-000000000001",
  "agent_id": "support-agent",
  "timestamp": "2026-03-05T09:12:44.871Z",
  "direction": "request",
  "model": "gpt-4o",
  "token_count": 312,
  "pii_detected": true,
  "pii_types": ["email", "phone"],
  "compliance_tag": "internal",
  "lineage_hash": "c7d3a19f..."
}
```

Writing a policy rule to block PII
```yaml
rules:
  - name: block-pii-in-requests
    conditions:
      - field: pii_detected
        operator: eq
        value: true
    action: block
    message: "Request blocked: PII detected in prompt payload."
```

Add this rule to your `govrix.toml` under the `policy` key. See YAML Policy Engine for the full configuration reference.
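To make the rule's semantics concrete, here is a hypothetical Python evaluator for the rule above. It works on the dict a YAML parser would produce from that rule. It rests on three assumptions, none of them taken from the YAML Policy Engine reference: all conditions in a rule must match (logical AND), the first matching rule wins, and `eq` is the only operator.

```python
from typing import Any, Optional

# The rule above, written as the dict a YAML parser would produce.
RULES = [
    {
        "name": "block-pii-in-requests",
        "conditions": [{"field": "pii_detected", "operator": "eq", "value": True}],
        "action": "block",
        "message": "Request blocked: PII detected in prompt payload.",
    }
]


def condition_matches(event: dict[str, Any], cond: dict[str, Any]) -> bool:
    # Only the "eq" operator from the example is implemented in this sketch.
    if cond["operator"] != "eq":
        raise ValueError(f"unsupported operator: {cond['operator']}")
    return event.get(cond["field"]) == cond["value"]


def evaluate(event: dict[str, Any], rules: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the first rule whose conditions all match, or None."""
    for rule in rules:
        if all(condition_matches(event, c) for c in rule["conditions"]):
            return rule
    return None
```

A caller would check the returned rule's `action`. For `block`, the request stops before it is forwarded upstream, and `message` explains why.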