
PII Detection

Govrix Scout scans every request payload sent to an LLM, and every response returned from one, for personally identifiable information (PII). Detection results are recorded on the AgentEvent and are available for policy rules, alerting, and audit queries.

Detected PII types

| PII Type | Example match | Pattern count |
| --- | --- | --- |
| Email address | user@example.com | 3 patterns |
| Phone number | +1-800-555-0100, (415) 555-0100 | 6 patterns |
| US Social Security Number | 123-45-6789, 123 45 6789 | 4 patterns |
| Credit card number | 4111 1111 1111 1111, 4111-1111-1111-1111 | 8 patterns |
| IP address | 192.168.1.1, 10.0.0.1 | 4 patterns |

In total, the scanner uses 25+ regex patterns covering international formats, common separators, and whitespace variants. All patterns are compiled once at startup into a shared RegexSet.
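To illustrate the compile-once, scan-many idea, here is a minimal Python sketch. It is not the Rust scanner: the patterns are simplified stand-ins (one per type rather than 25+), the `scan` helper is hypothetical, and the type names other than `email` and `phone` are assumptions. A Rust `RegexSet` tests all patterns in a single pass over the text; Python's `re` module has no direct equivalent, so the sketch loops over precompiled patterns instead.

```python
import re

# Hypothetical, simplified patterns: one per PII type, for illustration only.
PATTERNS = {
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "phone": r"(?:\+?1[-. ]?)?\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}",
    "ssn": r"\b\d{3}[- ]\d{2}[- ]\d{4}\b",
    "credit_card": r"\b(?:\d{4}[- ]?){3}\d{4}\b",
    "ip_address": r"\b(?:\d{1,3}\.){3}\d{1,3}\b",
}

# Compile once at import time, analogous to building the set at startup.
COMPILED = {name: re.compile(pattern) for name, pattern in PATTERNS.items()}


def scan(text: str) -> list[str]:
    """Return the name of every PII type with at least one match in text."""
    return [name for name, rx in COMPILED.items() if rx.search(text)]
```

Calling `scan` on a prompt returns the list of matching type names, which is the kind of information a detection result needs to carry; an empty list means nothing was detected.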

Detection-only architecture

Govrix Scout detects PII but does not mask or redact it. The payload is forwarded to the LLM unchanged.

This is an architectural choice, not an oversight. Masking creates several problems:

  • Semantic drift: replacing john@acme.com with [EMAIL] changes the meaning of the prompt and can corrupt the model’s response.
  • False-positive risk: an over-eager masker can redact non-sensitive content that happens to match a pattern (e.g., a version string like 1.2.3.4 matching an IP pattern).
  • Liability shift: once you mask, you are making an assertion that the data is now safe. Detection-only lets the operator decide what to do (block via policy rule, alert, log, or pass through).

If you want to block requests that contain PII, configure a policy rule with pii_detected: true and action block. See the YAML Policy Engine for details.
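The detection-only flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Govrix Scout's code: `handle`, `on_pii`, and the single simplified IP pattern are all hypothetical. The point is that the scanner only reports a finding; the operator-chosen action decides what happens, and the payload itself is never rewritten.

```python
import re

# Hypothetical, simplified IP pattern for illustration only.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def handle(payload: str, on_pii: str = "pass") -> tuple[str, str]:
    """Detect PII, then return (action, payload).

    on_pii is the operator's choice (for example "block", "alert", "log",
    or "pass"). The payload is returned exactly as it was received.
    """
    detected = IP_PATTERN.search(payload) is not None
    action = on_pii if detected else "pass"
    return action, payload
```

With a payload such as `"Upgrade to release 1.2.3.4 today"`, the version string matches the IP pattern. Under detection-only handling that false positive costs at most an alert or a blocked request the operator can review; a masker would have silently altered the prompt.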

Implementation

| File | Role |
| --- | --- |
| crates/govrix-scout-proxy/src/policy/pii.rs | Scanner implementation: 5 types, 25+ patterns, 25+ unit tests |

The scanner runs in under 1 ms on typical LLM prompt sizes (under 8 KB). It is called synchronously on the hot path before the upstream request is forwarded, so pii_detected is available to the policy engine in the same request cycle.

The RegexSet is compiled once at process startup and shared across all request handlers via an Arc. There is no per-request compilation cost.
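As a rough analogue of sharing one compiled set across handlers, the Python sketch below builds a pattern once at import time and reuses it from several worker threads. The names `EMAIL`, `handle_request`, and `scan_concurrently` are hypothetical, and the single email pattern is a simplified stand-in; in the Rust implementation the shared object is the RegexSet behind an Arc.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Built once at import time (the analogue of process startup).
# Hypothetical, simplified pattern for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def handle_request(payload: str) -> bool:
    """Per-request work is a search only; nothing is compiled here."""
    return EMAIL.search(payload) is not None


def scan_concurrently(payloads: list[str]) -> list[bool]:
    """Scan payloads on a small thread pool, reusing the one compiled pattern."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(handle_request, payloads))
```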

Event field

When PII is detected, the pii_detected field on the event is set to true. The specific types detected are recorded in pii_types (an array of strings).

Example API response for an event with PII:

```json
{
  "event_id": "018e4b7a-0003-7abc-8def-000000000003",
  "session_id": "018e4b7a-1234-7abc-8def-000000000001",
  "agent_id": "support-agent",
  "timestamp": "2026-03-05T09:12:44.871Z",
  "direction": "request",
  "model": "gpt-4o",
  "token_count": 312,
  "pii_detected": true,
  "pii_types": ["email", "phone"],
  "compliance_tag": "internal",
  "lineage_hash": "c7d3a19f..."
}
```
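For audit queries, these two fields are enough to filter events on the client side. The Python sketch below is a hypothetical helper, not part of any Govrix Scout SDK; it assumes events arrive as a JSON array of objects shaped like the example, and it treats a missing `pii_detected` or `pii_types` field as "no PII", which is an assumption rather than documented behavior. The event IDs in the sample data are made up.

```python
import json
from typing import Optional


def events_with_pii(events: list[dict], pii_type: Optional[str] = None) -> list[str]:
    """Return event_ids where PII was detected, optionally limited to one type."""
    hits = []
    for event in events:
        if not event.get("pii_detected", False):
            continue
        if pii_type is None or pii_type in event.get("pii_types", []):
            hits.append(event["event_id"])
    return hits


# Made-up sample data using only the fields the filter reads.
raw = """[
  {"event_id": "evt-1", "pii_detected": true, "pii_types": ["email", "phone"]},
  {"event_id": "evt-2", "pii_detected": false, "pii_types": []}
]"""
events = json.loads(raw)
```

Calling `events_with_pii(events, "phone")` on this sample selects only the first event.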

Writing a policy rule to block PII

```yaml
rules:
  - name: block-pii-in-requests
    conditions:
      - field: pii_detected
        operator: eq
        value: true
    action: block
    message: "Request blocked: PII detected in prompt payload."
```

Add this rule to your govrix.toml under the policy key. See YAML Policy Engine for the full configuration reference.
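To make the rule's semantics concrete, here is a minimal Python sketch of how a rule of this shape could be evaluated against an event. It is not the policy engine's code: the `evaluate` helper is hypothetical, it supports only the `eq` operator used in the rule, and it assumes that every condition must hold for the rule to fire.

```python
from typing import Optional

# The rule from this section, written as a plain dict.
RULE = {
    "name": "block-pii-in-requests",
    "conditions": [{"field": "pii_detected", "operator": "eq", "value": True}],
    "action": "block",
    "message": "Request blocked: PII detected in prompt payload.",
}


def evaluate(rule: dict, event: dict) -> Optional[str]:
    """Return the rule's action if every condition matches the event, else None."""
    for cond in rule["conditions"]:
        if cond["operator"] != "eq":
            # Only "eq" is sketched here; other operators are out of scope.
            raise ValueError(f"unsupported operator: {cond['operator']}")
        if event.get(cond["field"]) != cond["value"]:
            return None
    return rule["action"]
```

An event with `pii_detected` set to true yields the `block` action; any other event yields `None`, meaning this rule does not apply.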
