Loocero — Privacy Architecture

This file is the privacy contract: what data we hold, what we never hold, and how the boundary is enforced. For the column-level inventory see DATA_MAP.md; for retention see DATA_RETENTION.md; for the OWASP/RLS audit see SECURITY_AUDIT.md.

Status as of M12P.1. This document codifies the privacy contract Loocero ships under. It is the authoritative reference for what data the hosted SaaS holds, what it never holds, and how the boundary is enforced.

Companion docs: DATA_RETENTION.md, DATA_MAP.md, production-hardening.md §8.

1. The seven non-negotiable rules

These rules are the contract. No feature, marketing claim, or operational change may violate them.

1. Raw uploaded import files are deleted automatically after successful import finalization.
2. Abandoned uploads and parser staging artifacts are auto-deleted on a short retention window.
3. Hosted SaaS stores only normalized financial records and minimal import metadata.
4. Hosted SaaS stores no saved notes.
5. Hosted SaaS stores no AI chat history, summaries, or memory.
6. AI chat is real-time only. Users may export chats locally to their own device, but nothing is persisted in Loocero.
7. Internal logging, analytics, and error reporting do not capture financial payloads or chat content.

Note (2026-05-02 licensing pivot): an earlier revision of this doc included an eighth rule — "Self-hosting remains the privacy-max path for users who want full infrastructure control." — dating from a planned open-source release. Loocero now ships as proprietary, hosted SaaS (see NOTICE at the repo root) and does not offer a default self-hostable build. The eighth rule was dropped to keep the contract truthful: every rule above is enforced today by the hosted product code. Private self-hosted deployments may be offered in the future under a commercial agreement, but that is a sales arrangement, not a default product guarantee, and does not belong in this contract.

2. Enforcement gate (M12P)

M12P is the gate milestone for privacy claims. No public marketing language may state, suggest, or imply any of the seven rules until M12P.4 is complete and merged. That includes the Loocero landing page, README hero copy, social posts, and pricing-page bullets.

Sequence:

Sub-milestone	Scope	Enforcement status and marketing gate
M12P.1	Codify the contract (this doc + DATA_RETENTION + DATA_MAP + production-hardening §8 + memory rebrand)	Docs only — claims still embargoed
M12P.2	Remove `/api/chat` server-side persistence + add local Markdown/JSON export	Rule 5/6 partially enforced (writes stop, tables remain)
M12P.3	Drop `conversations` + `messages` tables via migration	Rule 5/6 fully enforced
M12P.4	Sentry `beforeSend` scrubber + tightened `lib/observability/log.ts` allow-list	Rule 7 enforced — marketing embargo lifts here
M12P.5	Drop `transactions.raw_row` jsonb column	Rule 3 fully enforced

Until M12P.4 ships to production, the public story is "Loocero is being built privacy-first", a statement of work in progress, not a completed-state claim such as "stores no chat history". The truthful claim is gated on enforcement, not documentation.

3. Trust boundaries

Loocero is a Next.js + Supabase app. Three concentric trust zones:

┌─────────────────────────────────────────────────────────┐
│  Browser (untrusted)                                    │
│   - User input, Plaid Link iframe, AI chat input        │
│   - Holds session cookie (httpOnly, Secure, SameSite)   │
│   - In-memory chat state (cleared on tab close)         │
└──────────────────┬──────────────────────────────────────┘
                   │ TLS 1.3
┌──────────────────▼──────────────────────────────────────┐
│  Vercel runtime (semi-trusted, ephemeral)               │
│   - Server Actions + Route Handlers                     │
│   - In-memory CSV/PDF parsing — never persists files    │
│   - OPENAI_API_KEY, PLAID_*, SUPABASE_* in env only     │
│   - No filesystem writes (read-only Lambda)             │
└──────────────────┬──────────────────────────────────────┘
                   │ TLS 1.3 + Supabase JWT (RLS-scoped)
┌──────────────────▼──────────────────────────────────────┐
│  Supabase Postgres (tenant-isolated, at-rest encrypted) │
│   - Normalized financial rows, RLS by user_id           │
│   - Plaid access tokens via pgcrypto (M12 onward)       │
│   - No raw upload files, no chat history (post-M12P.3)  │
└─────────────────────────────────────────────────────────┘

The service-role key is never used by the app runtime. Only the integration test harness instantiates an admin client, and only for the auth.admin.createUser/deleteUser lifecycle. See docs/reference/production-hardening.md §2.
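
One way to back this claim with a fail-fast check is a startup guard that refuses to run when a service-role key is visible to the runtime. The sketch below is illustrative only: `assertNoServiceRoleKey` and the variable name `SUPABASE_SERVICE_ROLE_KEY` are assumptions, not names taken from the Loocero codebase.

```typescript
// Hypothetical startup guard: throws if a service-role key is visible to the
// app runtime. The env var name below is an assumption for illustration.
type RuntimeEnv = { [key: string]: string | undefined };

const FORBIDDEN_RUNTIME_KEYS = ["SUPABASE_SERVICE_ROLE_KEY"];

function assertNoServiceRoleKey(env: RuntimeEnv): void {
  for (const key of FORBIDDEN_RUNTIME_KEYS) {
    const value = env[key];
    if (value !== undefined && value !== "") {
      // Report only the variable name, never its value.
      throw new Error(`${key} must not be set in the app runtime`);
    }
  }
}
```

In a deployment this would be called once at server start with the process environment; the integration test harness, which legitimately holds an admin key, would simply not call it.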

4. Data flows

4.1 CSV / PDF import (rules 1, 2, 3)

File picked in browser
   │ multipart/form-data POST to Server Action (parseCsvFile / pdf route)
   ▼
Server Action receives `File` blob
   │ reads via file.text() / unpdf into memory
   │ parses → headers + rows array
   │ enriches with merchant matches (read-only DB lookups)
   ▼
Returns ParseResult to browser   ◄── file.text() result is eligible for GC once
                                     the action returns. No tmp/, no Storage,
                                     no disk write at any stage.
   │ user reviews preview, clicks Import
   ▼
Server Action importTransactions(payload)
   │ classifies rows against existing transactions (RLS-scoped)
   │ inserts imports row (status = PROCESSING)
   │ bulk-inserts transactions
   │ updates imports row (status = DONE, outcome = counts only)
   ▼
Done. Browser shows summary.

Why this satisfies rule 1: the source File object is a request-scoped Web API value. After the action returns its ParseResult, no reference to the original bytes survives. Vercel's Lambda filesystem is read-only outside of /tmp, and Loocero never writes to /tmp. Confirmed by inspection of src/lib/actions/import.ts and src/app/api/import/pdf/route.ts.
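
The in-memory parse step can be sketched as a pure function over the string returned by `file.text()`. This is a minimal illustration, not the code in src/lib/actions/import.ts: `parseCsvText` and the `ParseResultSketch` shape are hypothetical, and the splitter deliberately ignores quoted fields to stay short.

```typescript
// Hypothetical shape; the real ParseResult may carry more fields.
interface ParseResultSketch {
  headers: string[];
  rows: string[][];
}

// Parses CSV text entirely in memory. Nothing is written to disk or storage;
// the only output is the returned value. Quoted fields are NOT handled here.
function parseCsvText(text: string): ParseResultSketch {
  const lines = text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "");
  if (lines.length === 0) {
    return { headers: [], rows: [] };
  }
  const splitLine = (line: string): string[] =>
    line.split(",").map((cell) => cell.trim());
  return {
    headers: splitLine(lines[0]),
    rows: lines.slice(1).map(splitLine),
  };
}
```

A server action following this pattern would call something like `parseCsvText(await file.text())` and return the result directly, so the uploaded bytes exist only as local variables for the duration of the request.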

Why this satisfies rule 2: there is no staging table, no Supabase Storage bucket, no scheduled job that purges anything — because nothing is ever written. The "abandoned upload" case collapses to "the user closed their tab," which leaves zero artifacts behind.

Why this satisfies rule 3: post-M12P.5, the transactions table holds only typed columns (date, description, amount, type, currency, category_id, ...). The raw_row jsonb column was dropped in migration 20260502010000_drop_transactions_raw_row.sql. The imports row holds only filename (user-supplied label), status, row_count, and outcome (typed integer counts).
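
To illustrate what "typed integer counts" means for the imports row, the sketch below reduces classified rows to counters so that no row content reaches the outcome field. The classification labels and the `ImportOutcome` field names are assumptions made for this example.

```typescript
// Hypothetical classification labels; the real classifier may differ.
type RowClass = "new" | "duplicate" | "invalid";

// Counts only: no descriptions, amounts, or dates are carried over.
interface ImportOutcome {
  inserted: number;
  duplicates: number;
  invalid: number;
}

function summarizeOutcome(classes: RowClass[]): ImportOutcome {
  const outcome: ImportOutcome = { inserted: 0, duplicates: 0, invalid: 0 };
  for (const c of classes) {
    if (c === "new") outcome.inserted += 1;
    else if (c === "duplicate") outcome.duplicates += 1;
    else outcome.invalid += 1;
  }
  return outcome;
}
```

Because the function accepts labels rather than rows, a caller cannot accidentally pass financial payloads into the stored outcome.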

4.2 AI chat (rules 5, 6, 7)

Post-M12P.3:

User types a message in /chat
   │ in-memory message[] state in ChatShell
   ▼
POST /api/chat { messages: [...last 10] }   ◄── nothing in the browser
   │ auth check (Supabase JWT)                  state is persisted to
   │ buildFinancialContext(userId)              localStorage or IndexedDB
   │ → reads from transactions, accounts,
   │   budgets, etc. via RLS
   │ system = context + GUARDRAILS
   ▼
streamText(model, system, messages)
   │ tokens stream back to browser
   ▼
Stream closes. onFinish runs no DB writes.    ◄── post-M12P.2
                                                  No conversations row.
                                                  No messages row.
                                                  No "memory."
   │
   ▼
Browser holds the conversation in React state.
User can hit Reset (clears state) or Export (downloads .md or .json).
Closing the tab discards the conversation forever.

Why this satisfies rules 5/6: after M12P.3, the conversations and messages tables do not exist. There is no place for chat to persist server-side. The onFinish hook in /api/chat does not write. Browser state is useState only — no localStorage, no IndexedDB.
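
The client-side half of this flow (trimming to the last 10 messages and exporting locally) can be sketched with pure functions. The `ChatMessage` shape and the function names are hypothetical, and the Markdown layout is one plausible format rather than the one ChatShell actually emits.

```typescript
// Hypothetical message shape for illustration.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Keeps only the most recent messages for the request body
// (10 in the flow described above).
function lastMessages(messages: ChatMessage[], limit: number = 10): ChatMessage[] {
  return limit > 0 ? messages.slice(-limit) : [];
}

// Renders the in-memory conversation as Markdown for a local download.
function toMarkdown(messages: ChatMessage[]): string {
  return messages
    .map((m) => `**${m.role === "user" ? "You" : "Assistant"}**\n\n${m.content}`)
    .join("\n\n---\n\n");
}

// JSON variant of the same export.
function toJson(messages: ChatMessage[]): string {
  return JSON.stringify({ messages }, null, 2);
}
```

In the browser, the resulting string can be wrapped in a `Blob` and offered through an object URL on a download link, which involves no network request and leaves nothing in localStorage or IndexedDB.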

Why this satisfies rule 7 (post-M12P.4): Sentry beforeSend scrubs financial-payload-shaped strings from event payloads before they leave the runtime. lib/observability/log.ts uses an allow-list (caller passes typed context; everything else is dropped). Stack-trace strings from Postgres errors that may contain row data are scrubbed in beforeSend by regex match on description / amount / date patterns.
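
The two mechanisms named above, an allow-list for log context and a regex scrubber applied before events leave the runtime, can be sketched as follows. Everything here is an assumption for illustration: the allowed keys, the patterns, and the minimal event shape are not taken from lib/observability/log.ts or the real Sentry event type. Note that these simple patterns catch ISO dates and decimal amounts only; free-text descriptions would need a different strategy.

```typescript
// Hypothetical allow-list; keys not listed here are dropped before logging.
const ALLOWED_LOG_KEYS = ["route", "userId", "importId", "rowCount", "durationMs"];

type LogContext = { [key: string]: unknown };

function pickAllowed(context: LogContext): LogContext {
  const safe: LogContext = {};
  for (const key of ALLOWED_LOG_KEYS) {
    if (Object.prototype.hasOwnProperty.call(context, key)) {
      safe[key] = context[key];
    }
  }
  return safe;
}

// Illustrative patterns only: ISO dates and decimal amounts.
const DATE_PATTERN = /\b\d{4}-\d{2}-\d{2}\b/g;
const AMOUNT_PATTERN = /-?\$?\b\d[\d,]*\.\d{2}\b/g;

function scrubString(input: string): string {
  return input.replace(DATE_PATTERN, "[date]").replace(AMOUNT_PATTERN, "[amount]");
}

// Minimal event shape (an assumption); a real Sentry event has many more fields.
interface ScrubbableEvent {
  message?: string;
  extra?: LogContext;
}

// Intended to be called from a beforeSend hook before the event is sent.
function scrubEvent(event: ScrubbableEvent): ScrubbableEvent {
  const out: ScrubbableEvent = {};
  if (event.message !== undefined) out.message = scrubString(event.message);
  if (event.extra !== undefined) out.extra = pickAllowed(event.extra);
  return out;
}
```

The allow-list is the stronger of the two controls, since it fails closed: a new field is dropped until someone explicitly permits it, whereas a regex scrubber only removes what its patterns anticipate.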

4.3 Plaid bank sync (rule 3, plus encryption-at-rest commitment)

User clicks "Connect bank" on /accounts
   │ POST /api/plaid/link-token  → returns Plaid link_token
   ▼
react-plaid-link iframe (Plaid-hosted, isolated origin)
   │ user enters bank credentials INSIDE Plaid's UI — Loocero
   │ never sees the username or password
   ▼
Plaid issues public_token to the browser
   │ POST /api/plaid/exchange { public_token }
   ▼
Server: itemPublicTokenExchange(public_token) → access_token
   │ pgp_sym_encrypt(access_token, env.PLAID_TOKEN_ENC_KEY)
   │ insert into institutions (plaid_access_token_enc bytea)
   ▼
Done. access_token is never returned to the browser and never logged.

Subsequent sync: /api/plaid/sync decrypts the access token in memory, calls transactionsSync, maps the results through the existing M11 import pipeline (one imports row per sync run, source = 'plaid'), and writes typed transactions rows. The decrypted token lives only in request-scoped memory and becomes eligible for garbage collection when the request ends.

The PLAID_TOKEN_ENC_KEY is a Vercel-side env secret. Rotation procedure documented in docs/reference/production-hardening.md §9 (added in M12.B.1).
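
A minimal sketch of the encrypt-on-insert step, assuming the statement is issued as a parameterized query so that neither the token nor the key is ever interpolated into SQL text. `buildTokenInsert`, the `user_id` column on institutions, and the `{ text, values }` shape (which mirrors the node-postgres query config) are assumptions; how Loocero actually sends this statement to Supabase is not specified here.

```typescript
// Hypothetical helper: builds a parameterized statement so neither the token
// nor the key appears in the SQL text (which may end up in query logs).
interface ParameterizedQuery {
  text: string;
  values: string[];
}

function buildTokenInsert(
  userId: string,
  accessToken: string,
  encKey: string
): ParameterizedQuery {
  if (encKey.length === 0) {
    // Fail closed rather than store a token encrypted with an empty key.
    throw new Error("PLAID_TOKEN_ENC_KEY is not configured");
  }
  return {
    // pgcrypto: pgp_sym_encrypt(data text, psw text) returns bytea.
    text:
      "insert into institutions (user_id, plaid_access_token_enc) " +
      "values ($1, pgp_sym_encrypt($2, $3))",
    values: [userId, accessToken, encKey],
  };
}
```

Decryption on sync would follow the same pattern with `pgp_sym_decrypt`, again passing the key as a bound parameter.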

5. Encryption commitments

Layer	What	How
In transit	All client ↔ server traffic	TLS 1.3 (Vercel + Supabase enforced)
At rest, disk-level	All Postgres data, all Supabase Storage (we use none)	AWS-managed, AES-256, Supabase platform default
At rest, column-level	Plaid `access_token` + future BYOK `openai_api_key`	`pgcrypto` symmetric (`pgp_sym_encrypt`/`decrypt`) with env-held key
Backups	Daily logical backup (Free tier = 7d)	Encrypted at rest by Supabase platform

Not in scope: end-to-end encryption. It would require client-side key management and is incompatible with the current AI features, which build context from financial rows on the server.

6. User rights

Right	Mechanism
Export financial data	`/settings` → Download CSV (planned, M14 or earlier)
Export chat	In-chat Export button — Markdown by default, JSON via secondary link. Client-side only, no network round-trip
Delete account	`/settings` → Delete account → cascades through `users` FK to wipe all tenant data atomically
Access disclosure (GDPR Art. 15)	Same export mechanisms as above; each tenant has full read access to their own data via RLS
Rectification	Inline edit on any record (transactions, budgets, etc.)
Restrict processing	Disconnect Plaid institutions; revoke OpenAI API key (when BYOK ships in M16)

All rights above apply to every Loocero customer regardless of tier or pricing plan.

7. Change control

This document and its companions (DATA_RETENTION.md, DATA_MAP.md) are part of the privacy contract. Changes require:

PR with rationale in the description, linking the user-visible feature that motivates the change.
Update to production-hardening.md §8 if the enforcement surface changes.
Memory entry update in project_loocero_privacy.md.
If a rule is materially weakened: explicit changelog entry in CHANGELOG.md (file added in M14 or earlier).

Adding stricter rules is a regular PR. Loosening any of the seven rules is a project-instructions-level decision and lives outside this doc.