Data Retention

Retention periods, purge schedules, and batch sizes for all data categories.

Retention schedule

The following table lists all data categories, their retention periods, and whether automated purging is currently configured.

| Data category | Table / model | Retention period | Auto-purge |
|---|---|---|---|
| Events (analytics) | Event | 90 days | Yes |
| Marketing events | MarketingEvent | 730 days (2 years) | Yes |
| Marketing consents | MarketingConsent | 730 days (2 years) | Yes |
| Data subject requests | DataSubjectRequest | 1,095 days (3 years) | Yes |
| Sessions | Session | 14 days | Yes |
| CSRF tokens | CsrfToken | 2 days | Yes |
| Rate limit buckets | RateLimitBucket | 7 days | Yes |
| Login locks | LoginLock | 7 days | Yes |
| Webhook deliveries | WebhookDelivery | No purge configured | No |
| Reconciliation runs | ReconciliationRun | No purge configured | No |
| Recurring charges | RecurringCharge | No purge configured | No |
Three data categories currently have no automated purge: WebhookDelivery, ReconciliationRun, and RecurringCharge. These tables will grow indefinitely until purge logic is implemented. Monitor their row counts during routine maintenance.

Purge cron schedule

The data retention purge job runs as a Vercel cron function. The schedule and configuration are defined in vercel.json.

| Property | Value |
|---|---|
| Schedule | Daily (once per day) |
| Endpoint | /api/cron/purge |
| Runtime | Vercel serverless function |
| Timeout | Constrained by Vercel function execution limits |
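
A minimal vercel.json cron entry matching the table above might look like the following sketch. The schedule expression is an assumption (the doc only specifies "daily"; "0 3 * * *" is 03:00 UTC) and should be confirmed against the repository's actual vercel.json:

```json
{
  "crons": [
    {
      "path": "/api/cron/purge",
      "schedule": "0 3 * * *"
    }
  ]
}
```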

Batch sizes

The purge job deletes records in batches to avoid long-running database transactions that could lock tables or exhaust connection pools. Each table is purged in its own batch operation.

| Data category | Batch size | Notes |
|---|---|---|
| Events | 1,000 rows per batch | High-volume table. May need multiple runs if the backlog is large. |
| Marketing events | 500 rows per batch | Lower volume. A single batch is usually sufficient. |
| Marketing consents | 500 rows per batch | Very low volume. Rarely exceeds a single batch. |
| Data subject requests | 100 rows per batch | Low volume. The 3-year retention means deletions are rare. |
| Sessions | 500 rows per batch | Moderate volume. The 14-day retention keeps the table small. |
| CSRF tokens | 1,000 rows per batch | High volume (one per form render). The 2-day retention means large daily deletes. |
| Rate limit buckets | 1,000 rows per batch | Can spike during abuse. Monitor the row count weekly. |
| Login locks | 200 rows per batch | Low volume unless there is a credential-stuffing attack. |

How purging works

For each data category, the purge job executes the following sequence:

  1. Query the table for records where the relevant timestamp column (e.g., created_at, expires_at) is older than the retention period.
  2. Select up to batch_size rows matching the retention criteria.
  3. Delete the selected rows in a single DELETE statement.
  4. If the number of deleted rows equals the batch size, there may be more rows to purge; the next daily cron run will catch them.

The purge job is designed to be idempotent: running it multiple times in a day is safe and has no side effects beyond deleting more expired records.
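
The per-table logic above can be sketched as follows. This is an illustrative sketch, not the actual implementation: `deleteBatch` is a hypothetical stand-in for the real per-table DELETE (for example, a query that removes up to `batchSize` expired rows and reports how many it deleted).

```typescript
// Illustrative sketch only; "deleteBatch" is a hypothetical stand-in for the
// real per-table DELETE limited to batchSize rows.
type DeleteBatch = (limit: number) => Promise<number>; // resolves to rows deleted

interface PurgeResult {
  deleted: number;    // rows removed this run
  maybeMore: boolean; // true when the batch came back full, so a backlog may remain
}

// One purge pass for one table, mirroring steps 1-4 above. Idempotent:
// re-running only removes rows that have expired since the previous pass.
async function purgeOnce(deleteBatch: DeleteBatch, batchSize: number): Promise<PurgeResult> {
  const deleted = await deleteBatch(batchSize);
  // A full batch signals that more expired rows may remain; the next
  // daily cron run (or a manual re-run) picks them up.
  return { deleted, maybeMore: deleted === batchSize };
}
```

The `maybeMore` flag is what makes the "run again until done" manual-purge procedure below possible: a caller keeps invoking the purge until every table reports a short batch.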

Monitoring

Routinely check the following to ensure data retention is working correctly:

| Check | Frequency | What to look for |
|---|---|---|
| Purge cron execution | Daily | Verify the cron ran successfully in the Vercel function logs. Look for errors or timeouts. |
| Table row counts | Weekly | Check row counts for the Event, CsrfToken, and RateLimitBucket tables. These are the highest-volume tables and the most likely to grow unexpectedly. |
| Unpurged table growth | Monthly | Check row counts for WebhookDelivery, ReconciliationRun, and RecurringCharge. These have no automated purge. Plan manual cleanup or implement purge logic if growth is concerning. |
| Database storage usage | Monthly | Monitor total database storage in the provider dashboard. Data retention issues will manifest as unexpected storage growth. |
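
For the weekly and monthly row-count checks, PostgreSQL's statistics views give a fast approximation without scanning the tables (this assumes a PostgreSQL database; n_live_tup is an estimate maintained by the statistics collector, not an exact count):

```sql
-- Approximate live-row counts per table, largest first.
SELECT relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```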

Manual purge

If the automated purge falls behind (e.g., after an outage or if a table has accumulated a large backlog), you can run a manual purge:

  1. Trigger the purge endpoint manually: POST /api/cron/purge with the appropriate authorization header.
  2. Check the response for the count of deleted rows per table.
  3. If the response indicates the batch limit was hit for any table, run the endpoint again. Repeat until all tables report zero remaining expired rows.
  4. For very large backlogs (100,000+ rows), consider running a direct database query with a larger batch size to avoid hitting the function timeout. Note that PostgreSQL (which the quoted identifiers suggest) does not support LIMIT on DELETE, so batch via a subquery:

```sql
DELETE FROM "Event"
WHERE ctid IN (
  SELECT ctid
  FROM "Event"
  WHERE "created_at" < NOW() - INTERVAL '90 days'
  LIMIT 10000
);
```
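
Step 1 can be done with curl. This sketch assumes the endpoint follows Vercel's cron authentication convention, where Vercel sends an `Authorization: Bearer` header carrying the project's CRON_SECRET environment variable; substitute the real deployment URL and whatever auth scheme the endpoint actually checks:

```shell
# Hypothetical invocation; the URL and the CRON_SECRET bearer-token
# scheme are assumptions to verify against the endpoint's auth check.
curl -s -X POST "https://your-deployment.example.com/api/cron/purge" \
  -H "Authorization: Bearer $CRON_SECRET"
```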

Future work

The following items are planned but not yet implemented:

  • WebhookDelivery purge: Retain successful deliveries for 90 days and failed deliveries for 180 days.
  • ReconciliationRun purge: Retain runs for 365 days. Flagged discrepancies should be retained indefinitely.
  • RecurringCharge purge: Retain charges for 365 days after the plan is terminated.
  • Purge metrics: Emit metrics for rows deleted per table per run, enabling dashboard monitoring and alerting on purge failures.
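
When the WebhookDelivery purge is implemented, the split retention rule above could translate to something like the following sketch. The "status" and "created_at" column names and status values are assumptions and must be checked against the actual schema:

```sql
-- Hypothetical sketch; column names and status values are assumptions.
DELETE FROM "WebhookDelivery"
WHERE ("status" = 'succeeded' AND "created_at" < NOW() - INTERVAL '90 days')
   OR ("status" = 'failed'    AND "created_at" < NOW() - INTERVAL '180 days');
```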