Data Retention

Retention periods, purge schedules, and batch sizes for all data categories.

Retention schedule

The following table lists all data categories, their retention periods, and whether automated purging is currently configured.

| Data category | Table / model | Retention period | Auto-purge |
|---|---|---|---|
| Events (analytics) | Event | 90 days | Yes |
| Marketing events | MarketingEvent | 730 days (2 years) | Yes |
| Marketing consents | MarketingConsent | 730 days (2 years) | Yes |
| Data subject requests | DataSubjectRequest | 1,095 days (3 years) | Yes |
| Sessions | Session | 14 days | Yes |
| CSRF tokens | CsrfToken | 2 days | Yes |
| Rate limit buckets | RateLimitBucket | 7 days | Yes |
| Login locks | LoginLock | 7 days | Yes |
| Webhook deliveries | WebhookDelivery | No purge configured | No |
| Reconciliation runs | ReconciliationRun | No purge configured | No |
| Recurring charges | RecurringCharge | No purge configured | No |
Three data categories currently have no automated purge: WebhookDelivery, ReconciliationRun, and RecurringCharge. These tables will grow indefinitely until purge logic is implemented. Monitor their row counts during routine maintenance.

Purge cron schedule

The data retention purge job runs as a Vercel cron function. The schedule and configuration are defined in vercel.json.

| Property | Value |
|---|---|
| Schedule | Daily (once per day) |
| Endpoint | /api/cron/purge |
| Runtime | Vercel serverless function |
| Timeout | Constrained by Vercel function execution limits |
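
A minimal vercel.json cron entry matching the table above might look like the following sketch. The schedule expression is an assumption (the doc only specifies "daily"; "0 3 * * *" is 03:00 UTC) and should be confirmed against the repository's actual vercel.json:

```json
{
  "crons": [
    {
      "path": "/api/cron/purge",
      "schedule": "0 3 * * *"
    }
  ]
}
```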

Batch sizes

The purge job deletes records in batches to avoid long-running database transactions that could lock tables or exhaust connection pools. Each table is purged in its own batch operation.

| Data category | Batch size | Notes |
|---|---|---|
| Events | 1,000 rows per batch | High-volume table. May need multiple runs if the backlog is large. |
| Marketing events | 500 rows per batch | Lower volume. A single batch is usually sufficient. |
| Marketing consents | 500 rows per batch | Very low volume. Rarely exceeds a single batch. |
| Data subject requests | 100 rows per batch | Low volume. The 3-year retention means deletions are rare. |
| Sessions | 500 rows per batch | Moderate volume. The 14-day retention keeps the table small. |
| CSRF tokens | 1,000 rows per batch | High volume (one per form render). The 2-day retention means large daily deletes. |
| Rate limit buckets | 1,000 rows per batch | Can spike during abuse. Monitor the row count weekly. |
| Login locks | 200 rows per batch | Low volume unless there is a credential-stuffing attack. |

How purging works

For each data category, the purge job executes the following sequence:

  1. Query the table for records where the relevant timestamp column (e.g., created_at, expires_at) is older than the retention period.
  2. Select up to batch_size rows matching the retention criteria.
  3. Delete the selected rows in a single DELETE statement.
  4. If the number of deleted rows equals the batch size, there may be more rows to purge; the next daily cron run will catch them.

The purge job is designed to be idempotent: running it multiple times in a day is safe and has no side effects beyond deleting more expired records.
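
The per-table logic above can be sketched as follows. This is an illustrative sketch, not the actual implementation: `deleteBatch` is a hypothetical stand-in for the real per-table DELETE (for example, a query that removes up to `batchSize` expired rows and reports how many it deleted).

```typescript
// Illustrative sketch only; "deleteBatch" is a hypothetical stand-in for the
// real per-table DELETE limited to batchSize rows.
type DeleteBatch = (limit: number) => Promise<number>; // resolves to rows deleted

interface PurgeResult {
  deleted: number;    // rows removed this run
  maybeMore: boolean; // true when the batch came back full, so a backlog may remain
}

// One purge pass for one table, mirroring steps 1-4 above. Idempotent:
// re-running only removes rows that have expired since the previous pass.
async function purgeOnce(deleteBatch: DeleteBatch, batchSize: number): Promise<PurgeResult> {
  const deleted = await deleteBatch(batchSize);
  // A full batch signals that more expired rows may remain; the next
  // daily cron run (or a manual re-run) picks them up.
  return { deleted, maybeMore: deleted === batchSize };
}
```

The `maybeMore` flag is what makes the "run again until done" manual-purge procedure below possible: a caller keeps invoking the purge until every table reports a short batch.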

Monitoring

Routinely check the following to ensure data retention is working correctly:

| Check | Frequency | What to look for |
|---|---|---|
| Purge cron execution | Daily | Verify the cron ran successfully in the Vercel function logs. Look for errors or timeouts. |
| Table row counts | Weekly | Check row counts for the Event, CsrfToken, and RateLimitBucket tables. These are the highest-volume tables and the most likely to grow unexpectedly. |
| Unpurged table growth | Monthly | Check row counts for WebhookDelivery, ReconciliationRun, and RecurringCharge. These have no automated purge. Plan manual cleanup or implement purge logic if growth is concerning. |
| Database storage usage | Monthly | Monitor total database storage in the provider dashboard. Data retention issues will manifest as unexpected storage growth. |
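
For the weekly and monthly row-count checks, PostgreSQL's statistics views give a fast approximation without scanning the tables (this assumes a PostgreSQL database; n_live_tup is an estimate maintained by the statistics collector, not an exact count):

```sql
-- Approximate live-row counts per table, largest first.
SELECT relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```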

Manual purge

If the automated purge falls behind (e.g., after an outage or if a table has accumulated a large backlog), you can run a manual purge:

  1. Trigger the purge endpoint manually: POST /api/cron/purge with the appropriate authorization header.
  2. Check the response for the count of deleted rows per table.
  3. If the response indicates the batch limit was hit for any table, run the endpoint again. Repeat until all tables report zero remaining expired rows.
  4. For very large backlogs (100,000+ rows), consider running a direct database query with a larger batch size to avoid hitting the function timeout. Note that PostgreSQL (which the quoted identifiers suggest) does not support LIMIT on DELETE, so batch via a subquery:

```sql
DELETE FROM "Event"
WHERE ctid IN (
  SELECT ctid
  FROM "Event"
  WHERE "created_at" < NOW() - INTERVAL '90 days'
  LIMIT 10000
);
```
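
Step 1 can be done with curl. This sketch assumes the endpoint follows Vercel's cron authentication convention, where Vercel sends an `Authorization: Bearer` header carrying the project's CRON_SECRET environment variable; substitute the real deployment URL and whatever auth scheme the endpoint actually checks:

```shell
# Hypothetical invocation; the URL and the CRON_SECRET bearer-token
# scheme are assumptions to verify against the endpoint's auth check.
curl -s -X POST "https://your-deployment.example.com/api/cron/purge" \
  -H "Authorization: Bearer $CRON_SECRET"
```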

Future work

The following items are planned but not yet implemented:

  • WebhookDelivery purge: Retain successful deliveries for 90 days and failed deliveries for 180 days.
  • ReconciliationRun purge: Retain runs for 365 days. Flagged discrepancies should be retained indefinitely.
  • RecurringCharge purge: Retain charges for 365 days after the plan is terminated.
  • Purge metrics: Emit metrics for rows deleted per table per run, enabling dashboard monitoring and alerting on purge failures.
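
When the WebhookDelivery purge is implemented, the split retention rule above could translate to something like the following sketch. The "status" and "created_at" column names and status values are assumptions and must be checked against the actual schema:

```sql
-- Hypothetical sketch; column names and status values are assumptions.
DELETE FROM "WebhookDelivery"
WHERE ("status" = 'succeeded' AND "created_at" < NOW() - INTERVAL '90 days')
   OR ("status" = 'failed'    AND "created_at" < NOW() - INTERVAL '180 days');
```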