The "Dual-RAG" Architecture
Standard AI models don't know your internal policies or database schemas. To ensure legal accuracy and prevent "hallucinations," we use Retrieval-Augmented Generation (RAG). We split retrieval into two distinct streams: one for unstructured policies and one for structured database schemas.
Stream A: Policy Retriever
Method: Vector Search (Embeddings)
Source: Legal PDFs, Handbooks
Stream B: Schema Retriever
Method: Knowledge Graph / Keyword Search
Source: Data Catalog, DDL
LLM / AI Agent
Combines Query + Policy Context + Schema Definition
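A minimal sketch of how the two streams come together into one prompt. All names here (the stores, the scoring logic) are illustrative assumptions: Stream A is shown as naive keyword overlap standing in for real vector search, and Stream B as a simple name match standing in for a data-catalog lookup.

```python
# Hypothetical Dual-RAG prompt assembly. retrieve_policies() stands in
# for Stream A (vector search over policy docs); retrieve_schemas()
# stands in for Stream B (catalog/keyword lookup over table DDL).

def retrieve_policies(query, policy_store, k=2):
    """Stream A stand-in: rank policy snippets by keyword overlap."""
    words = set(query.lower().split())
    scored = sorted(policy_store,
                    key=lambda doc: len(words & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def retrieve_schemas(query, schema_catalog, k=3):
    """Stream B stand-in: return DDL for tables named in the query."""
    words = set(query.lower().split())
    hits = [ddl for name, ddl in schema_catalog.items()
            if words & set(name.lower().split("_"))]
    return hits[:k]

def build_prompt(query, policy_store, schema_catalog):
    """Combine Query + Policy Context + Schema Definition for the LLM."""
    policies = retrieve_policies(query, policy_store)
    schemas = retrieve_schemas(query, schema_catalog)
    return "\n\n".join([
        "## Relevant policies", *policies,
        "## Relevant schemas", *schemas,
        "## User request", query,
    ])

policy_store = [
    "Retention: user emails must be deleted within 30 days of a request.",
    "Access requests must be fulfilled within one calendar month.",
]
schema_catalog = {
    "app_users_v2": "CREATE TABLE app_users_v2 (usr_id INT, usr_email TEXT);",
    "orders": "CREATE TABLE orders (order_id INT, usr_id INT);",
}
prompt = build_prompt("delete the email for users id 123",
                      policy_store, schema_catalog)
print(prompt)
```

Only the `app_users_v2` DDL lands in the prompt here, because only that table matches the request; that is the whole point of retrieval over pasting the full catalog.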
Why Use RAG for Policies?
1. Accuracy is Non-Negotiable: LLMs hallucinate, and you cannot guess a legal retention period. RAG cites the exact source document.
2. Volatility: Laws change. With RAG, you just upload the new policy PDF; you don't need to retrain the model.
Why Use RAG for Schemas?
1. Context Limits: You can't paste 500 table definitions into one prompt. RAG fetches only the 3-5 tables relevant to the specific user request.
2. Security: RAG ensures the LLM only "sees" the metadata it is authorized to access, preventing schema leakage.
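The security point above can be sketched as an authorization filter applied to the schema catalog before retrieval, so unauthorized DDL never enters the LLM's context. The roles, tables, and mapping below are invented for illustration.

```python
# Hypothetical role-based filter over a schema catalog: the retriever
# only ever searches tables the calling role is allowed to see.

AUTHORIZED = {
    "support_agent": {"app_users_v2"},
    "dba": {"app_users_v2", "payment_cards"},
}

CATALOG = {
    "app_users_v2": "CREATE TABLE app_users_v2 (usr_id INT, usr_email TEXT);",
    "payment_cards": "CREATE TABLE payment_cards (card_id INT, pan TEXT);",
}

def visible_schemas(role):
    """Return only the DDL entries the given role may see."""
    allowed = AUTHORIZED.get(role, set())
    return {name: ddl for name, ddl in CATALOG.items() if name in allowed}

print(sorted(visible_schemas("support_agent")))  # → ['app_users_v2']
```

An unknown role gets an empty catalog, which fails closed rather than open.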
Optimization: Fine-Tuning + RAG
"Should I fine-tune?" Yes, but only for behavior, not for facts. We use RAG to fetch the data (facts), and we fine-tune the model to learn your specific code syntax and output style (behavior).
Why add Fine-Tuning to RAG?
Generic LLM (RAG Only)
A standard model (like GPT-4 or Gemini) writes "standard" SQL.
SELECT email FROM users WHERE id = 123;
Error: Table 'users' does not exist. It's called 'app_users_v2'.
Fine-Tuned LLM + RAG
A model trained on 1,000 examples of your company's actual code.
SELECT usr_email FROM app_users_v2 WHERE usr_id = 123;
Success: Matches internal naming conventions.
How to Implement Fine-Tuning
1. Gather 500+ pairs of Prompt (e.g., "Find user email") and Ideal Completion (e.g., your perfect SQL query). Use historical logs for this.
2. Upload this dataset to your LLM provider (OpenAI, Vertex AI, Hugging Face) to create a LoRA adapter or fine-tuned version.
3. In your RAG architecture, point the API call to your new fine-tuned model (e.g., model-ft-2025) instead of the generic base model.
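Step 1 can be sketched as converting historical (prompt, ideal completion) pairs into one-record-per-line JSONL in the chat format most fine-tuning APIs accept. The exact schema varies by provider; this mirrors the OpenAI-style chat record, and the table and column names are invented examples.

```python
import json

# Illustrative (prompt, ideal completion) pairs drawn from historical
# logs; the SQL and table names are made-up examples.
pairs = [
    ("Find user email for id 123",
     "SELECT usr_email FROM app_users_v2 WHERE usr_id = 123;"),
    ("Count active orders",
     "SELECT COUNT(*) FROM ord_main_v3 WHERE ord_status = 'ACTIVE';"),
]

def to_jsonl(pairs):
    """Serialize pairs as one chat-format JSON record per line."""
    lines = []
    for prompt, completion in pairs:
        record = {"messages": [
            {"role": "system",
             "content": "You write SQL using our internal naming conventions."},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]}
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_jsonl(pairs))
```

Write the result to a `.jsonl` file and upload it to your provider's fine-tuning endpoint.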
Advanced Automation: AI Agents & LLMs
Large Language Models (LLMs) and AI Agents represent the next frontier. They move beyond simple orchestration to handle complex, unstructured tasks, enabling near-total autonomy for the DSR process.
How LLMs Supercharge Automation
✓ Classify Unstructured Requests: An LLM can read a free-text email ("Hi, can you plz delete my stuff?") and automatically classify it as a "Deletion Request," extracting the user's name and email.
✓ Discover PII in Unstructured Data: LLMs can scan documents, support tickets, and call transcripts to find and redact personal information that data maps might miss.
✓ Summarize Access Reports: After data is collated from 20 systems, an LLM can generate a clean, easy-to-understand summary for the end-user, improving the customer experience.
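The classification step above can be sketched as follows. In production an LLM call returns the label and extracted fields; here simple keyword rules and a regex stand in for it so the example is self-contained, and the labels and keywords are assumptions.

```python
import re

# Stand-in for the LLM classifier: keyword rules map free text to a
# request type, and a regex extracts the contact email.
LABELS = {
    "Deletion Request": ("delete", "erase", "remove"),
    "Access Request": ("access", "copy", "download"),
}

def classify_request(email_body):
    """Return the request type and any email address found in the text."""
    text = email_body.lower()
    label = next((lab for lab, kws in LABELS.items()
                  if any(kw in text for kw in kws)), "Unclassified")
    match = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", email_body)
    return {"type": label, "email": match.group(0) if match else None}

result = classify_request("Hi, can you plz delete my stuff? - jo@example.com")
print(result)  # → {'type': 'Deletion Request', 'email': 'jo@example.com'}
```

Anything that matches no rule falls through to "Unclassified" and can be routed to a human, which is the same escalation path an LLM classifier would need for low-confidence outputs.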
The Autonomous AI Agent Flow
Input: Unstructured email or form submission.
Autonomous AI Agent:
Parse: Understands the request via the LLM.
Verify: Triggers automated identity verification (IDV).
Act: Connects to all systems to delete or access data.
Draft: Generates the fulfillment report.
Output: A human performs a 1-click review, then the agent sends the response to the customer.
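The flow above can be sketched as a pipeline of steps with a human gate at the end. Every function here is a stub standing in for an LLM call or a system connector; the ticket fields, system names, and approval callback are all assumptions for illustration.

```python
# Hypothetical agent loop: Parse -> Verify -> Act -> Draft, then a
# 1-click human review decides whether the report goes out.

def parse(request):
    """Parse: LLM stand-in that classifies the free-text request."""
    return {"type": "deletion", "email": "jo@example.com", "raw": request}

def verify(ticket):
    """Verify: stand-in for automated identity verification (IDV)."""
    ticket["idv_passed"] = ticket["email"].endswith("@example.com")
    return ticket

def act(ticket):
    """Act: fan out to connected systems only if IDV passed."""
    ticket["systems_updated"] = (
        ["crm", "billing", "analytics"] if ticket["idv_passed"] else [])
    return ticket

def draft(ticket):
    """Draft: generate the fulfillment report for human review."""
    n = len(ticket["systems_updated"])
    ticket["report"] = f"Deleted data for {ticket['email']} in {n} systems."
    return ticket

def run_agent(request, human_approves):
    ticket = draft(act(verify(parse(request))))
    return ticket["report"] if human_approves(ticket) else \
        "Escalated for manual review."

print(run_agent("Hi, can you plz delete my stuff? - jo@example.com",
                human_approves=lambda t: t["idv_passed"]))
# → Deleted data for jo@example.com in 3 systems.
```

The `human_approves` callback is the 1-click review: returning `False` diverts the ticket to manual handling instead of sending the response.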