← Back to Documentation Home

The Growth Problem

Church management databases accumulate attendance data at a steady, predictable rate. A mid-size church with 500 weekly check-ins generates approximately 26,000 check-in records per year. Add registrations, group memberships, and multiple services per week, and the numbers multiply quickly.

In practice, many database systems grow unchecked because extracting and archiving old data is prohibitively complex. Foreign key constraints, cascading dependencies, and tightly coupled relational schemas make it risky to delete historical rows — breaking a single FK reference can cascade into data integrity failures across the system. As a result, tables accumulate years of rarely queried data that slows down everything around it:

  • Backups — larger tables mean longer backup times and bigger backup files
  • Indexing — more rows in B-tree indexes mean slower INSERTs, while queries on non-indexed columns face ever-larger sequential scans
  • Maintenance — VACUUM, REINDEX, and ANALYZE take longer on large tables
  • Day-to-day queries — even indexed queries slow down as the table grows beyond cache capacity

The Common Trap: Most systems add archival as an afterthought, if at all. By the time the database is large enough to cause problems, the relational dependencies are so intertwined that nobody dares delete old data. The solution becomes "buy a bigger server" or "add read replicas" rather than addressing the root cause.

Problem Solved

With ~5,800 people, 252 events, and 6 months of attendance data (112,140 check-in records), the archival script was run against the full database (46 MB including indexes). Here is what happened:

  • 78,571 checkins archived
  • 2,109 registrations cleaned
  • 171 summary rows created
  • 460:1 row compression ratio

Before Archival

| Table | Rows | Size (data + indexes) |
| --- | --- | --- |
| checkins | 112,140 | 17 MB |
| registrations | 3,118 | 728 KB |
| event_attendance | 3 | 1,760 KB |
| All other tables | | 26.5 MB |
| Total database | | 46 MB |

After Archival

| Table | Rows | Size (data + indexes) |
| --- | --- | --- |
| checkins | 33,569 | 2.8 MB (-84%) |
| registrations | 1,009 | 128 KB (-82%) |
| event_attendance | 171 | 2,928 KB |
| All other tables | | 26.5 MB |
| Total database | | 29 MB (-37%) |

The real-time tables stay small and fast (only the most recent 2 months), while every historical attendance record remains fully queryable in compact long-term storage. The savings compound over time — after 10 years, the system stores attendance data in roughly a tenth of the space a purely relational approach would require.

Full details, methodology, and long-term projections are covered in the Results & Metrics and Long-Term Projections sections at the bottom of this document.

Design Philosophy

This system takes a fundamentally different approach: archival is designed into the schema from day one. The relational tables are intentionally treated as a working buffer holding only the data needed for real-time operations. The JSONB archival tables are the planned, permanent destination.

Because both tiers are part of the same schema with clear ownership boundaries (one JSONB row per event, one per group meeting), the archival script can safely migrate and delete relational rows in a single atomic transaction without risking referential integrity violations. The FK relationships are structured so that archived data flows into parent records rather than being orphaned:

  • checkins → event_attendance.attendee_ids (JSONB array on the event)
  • group_membership → church_group.archived_member_ids (JSONB array on the group)
  • registrations → deleted (intent data has no archival value after the event)

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Event-date-based eligibility | Archival is based on the linked event's start date, not the record's creation date. This prevents premature archival of registrations for future events (e.g., summer camp registered in March for a July event). |
| 2-month rolling window | Relational tables keep ~2 months of hot data. This covers the active check-in window, recent activity views, and admin dashboards without storing years of historical data. |
| Single-transaction archival | All three archival operations (checkins, registrations, memberships) run in one transaction. If any step fails, the entire archival rolls back. No partial state. |
| Registrations: delete without archival | A registration is a declaration of intent to attend, which has no historical value after the event has occurred. Actual attendance (the meaningful data) is captured in check-in records. |
| VACUUM FULL post-commit | Runs outside the transaction to physically reclaim disk space. Takes exclusive lock, acceptable at 3 AM Sunday when no users are active. |

Two-Tier Storage Model

Tier 1 — Relational: checkins, registrations (rolling 2-month window)
        ↓
Weekly archival: SQL script via cron, Sunday 3:00 AM
        ↓
Tier 2 — JSONB: event_attendance, small_group_attendance (permanent compact storage)

Storage Comparison

200 check-ins for one event:

  • Relational (checkins table): 200 rows × ~40 bytes each = ~8,000 bytes — event_id (8B) + person_id (8B) + security_code (4B) + timestamp (8B) + overhead
  • JSONB (event_attendance table): 1 row with attendee_ids = [101, 102, 103, ..., 300] = ~1,200 bytes (array of 200 integers + JSONB overhead)
  • Compression ratio: ~6.7x smaller per event
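The arithmetic above can be sanity-checked in a few lines (the byte sizes are the rough per-row estimates from the comparison, not measured values):

```python
# Back-of-envelope check of the per-event storage comparison.
# Assumed sizes come from the text above: ~40 bytes per relational
# check-in row and ~1,200 bytes for one JSONB row of 200 attendee IDs.
attendees = 200
relational_bytes_per_row = 40   # event_id + person_id + code + timestamp + overhead
jsonb_row_bytes = 1_200         # array of 200 integers + JSONB overhead

relational_total = attendees * relational_bytes_per_row  # 8,000 bytes
ratio = relational_total / jsonb_row_bytes               # about 6.7x

print(f"relational: {relational_total} B, jsonb: {jsonb_row_bytes} B, ratio: {ratio:.1f}x")
```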

Tier 1: Hot Tables (Relational)

The relational tables hold only the most recent ~2 months of records, optimized for real-time operations:

checkins (Hot Data)

| Column | Type | Purpose |
| --- | --- | --- |
| event_id | INTEGER (PK) | FK to events |
| person_id | INTEGER (PK) | FK to people |
| security_code | TEXT | 4-char pickup code (e.g., A3K7) |
| checked_in_at | TIMESTAMPTZ | Exact check-in time |

Used by: kiosk duplicate detection, security code lookup, recent activity view, admin dashboard.

registrations (Hot Data)

| Column | Type | Purpose |
| --- | --- | --- |
| event_id | INTEGER (PK) | FK to events |
| person_id | INTEGER (PK) | FK to people |
| registered_at | TIMESTAMPTZ | Registration time |

Used by: registration management, form registration flow, event roster display.

Tier 2: Cold JSONB Tables

JSONB tables store attendance as a flat array of person IDs per event or meeting date. One JSONB row replaces potentially hundreds of relational rows.

event_attendance (JSONB)

| Column | Type | Purpose |
| --- | --- | --- |
| id | INTEGER (PK) | Auto-generated |
| event_id | INTEGER (UNIQUE) | FK to events — one row per event |
| attendee_ids | JSONB | Flat array of person IDs, e.g., [123, 456, 789] |
| notes | TEXT | Optional notes |
| recorded_by | INTEGER | FK to people (nullable) |
| created_at | TIMESTAMPTZ | When archived or manually recorded |

small_group_attendance (JSONB)

| Column | Type | Purpose |
| --- | --- | --- |
| id | INTEGER (PK) | Auto-generated |
| small_group_id | INTEGER | FK to small_group |
| meeting_date | DATE | UNIQUE with small_group_id |
| attendee_ids | JSONB | Flat array of person IDs |
| notes | TEXT | Optional meeting notes |
| recorded_by | INTEGER | FK to people (nullable) |

JSONB tables are used for:

  • Per-person attendance history — "Show me every event John attended in 2025"
  • Head count trends — array length gives instant head count without COUNT(*)
  • Reporting & analytics — attendance trends over months and years
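
The first two patterns fall straight out of the array representation. A sketch against the schema above (assuming the events.starts_at column used later by the archival script):

```sql
-- Instant head count per event: array length, no COUNT(*) over check-in rows
SELECT event_id, jsonb_array_length(attendee_ids) AS head_count
FROM event_attendance;

-- Every event person #42 attended in 2025 (containment + date filter)
SELECT ea.event_id, ea.created_at
FROM event_attendance ea
JOIN events e ON e.id = ea.event_id
WHERE ea.attendee_ids @> '42'::jsonb
  AND e.starts_at >= DATE '2025-01-01'
  AND e.starts_at <  DATE '2026-01-01';
```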

GIN Indexes

JSONB attendance arrays are queried using PostgreSQL's @> containment operator, backed by GIN (Generalized Inverted Index) indexes. This allows efficient per-person lookups across all attendance records without sequential scans.

CREATE INDEX idx_event_attendance_attendee_ids
  ON event_attendance USING GIN (attendee_ids);

CREATE INDEX idx_small_group_attendance_attendee_ids
  ON small_group_attendance USING GIN (attendee_ids);

Query Example

-- Find all events person #42 attended
SELECT event_id, attendee_ids, created_at
FROM event_attendance
WHERE attendee_ids @> '42'::jsonb;

-- EF Core equivalent (via Npgsql)
.Where(a => EF.Functions.JsonContains(a.AttendeeIds, personId))

Permanent Groups

The church_group.is_permanent flag (boolean, default false) prevents a group's memberships from ever being archived. This is used for groups where historical membership is operationally important:

  • Safety & security teams (must always know who had access)
  • Staff rosters (HR/compliance requirement)
  • Leadership teams (historical accountability)

Archival Script Overview

The archival engine is a single SQL script (scripts/archive_old_data.sql) that performs three operations inside one transaction. Script file: ~138 lines of annotated SQL.

Execution Flow

Cron fires (Sunday 3 AM)
  → BEGIN
  → 1. Upsert checkins → event_attendance
  → 2. Delete old registrations
  → 3. Archive inactive memberships
  → COMMIT
  → VACUUM FULL

| Operation | Source | Destination | Action |
| --- | --- | --- | --- |
| 1. Checkins | checkins | event_attendance.attendee_ids | Upsert (merge person IDs into JSONB array), then delete |
| 2. Registrations | registrations | none | Delete only — transient intent data |
| 3. Memberships | group_membership | church_group.archived_member_ids | Merge into group's JSONB array, then delete |

Operation 1: Checkin Archival (Upsert)

The most complex operation. Uses PostgreSQL INSERT ... ON CONFLICT DO UPDATE to safely merge person IDs into the event_attendance.attendee_ids array, even when a row already exists (e.g., from a previous partial archival or manual attendance save).

INSERT INTO event_attendance (event_id, attendee_ids, notes, created_at)
SELECT
    c.event_id,
    jsonb_agg(DISTINCT c.person_id ORDER BY c.person_id),
    NULL,
    now()
FROM checkins c
JOIN events e ON e.id = c.event_id
WHERE e.starts_at < cutoff_date
GROUP BY c.event_id
ON CONFLICT (event_id) DO UPDATE SET
    attendee_ids = (
        SELECT jsonb_agg(DISTINCT val ORDER BY val)
        FROM (
            SELECT jsonb_array_elements(event_attendance.attendee_ids)::integer AS val
            UNION
            SELECT jsonb_array_elements(EXCLUDED.attendee_ids)::integer AS val
        ) merged
    );

How the Merge Works

  1. GROUP BY c.event_id — aggregates all checkins for each event into one row
  2. jsonb_agg(DISTINCT ...) — creates a deduplicated, sorted JSONB array of person IDs
  3. ON CONFLICT — if the event already has an attendance row, merge arrays via UNION
  4. After upsert, delete the source checkin rows

Idempotent by Design: The UNION-based merge is idempotent. Running the archival script twice produces the same result — duplicate person IDs are deduplicated by the DISTINCT clause. This is important for operational safety.

Operation 2: Registration Cleanup

Registrations are deleted without archival. A registration represents intent to attend an event. Once the event has passed, the meaningful data is the check-in record (actual attendance), not the registration. Historical questions like "who attended VBS 2025?" are answered by the event_attendance JSONB data, not by registrations.

DELETE FROM registrations
USING events e
WHERE registrations.event_id = e.id
  AND e.starts_at < cutoff_date;

Operation 3: Membership Archival

Inactive group memberships are archived by merging person IDs into the group's archived_member_ids JSONB column. Only memberships meeting all criteria are eligible:

  • membership_status = 'INACTIVE'
  • church_group.is_permanent = false
  • end_date IS NOT NULL
  • end_date < cutoff_date

This preserves a historical record of who was once in each group (queryable via JSONB containment) while removing rows from the relational table.
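The membership step follows the same merge-then-delete shape as Operation 1. A simplified sketch of the idea — not the script verbatim: the group_id column name and :cutoff_date parameter are illustrative, and the real script deduplicates the merged array the way Operation 1 does:

```sql
-- Merge eligible inactive members into the group's JSONB archive...
UPDATE church_group g
SET archived_member_ids = COALESCE(g.archived_member_ids, '[]'::jsonb) || COALESCE(
        (SELECT jsonb_agg(DISTINCT m.person_id)
         FROM group_membership m
         WHERE m.group_id = g.id
           AND m.membership_status = 'INACTIVE'
           AND m.end_date IS NOT NULL
           AND m.end_date < :cutoff_date),
        '[]'::jsonb)
WHERE g.is_permanent = false;

-- ...then remove the archived rows from the relational table
DELETE FROM group_membership m
USING church_group g
WHERE g.id = m.group_id
  AND g.is_permanent = false
  AND m.membership_status = 'INACTIVE'
  AND m.end_date IS NOT NULL
  AND m.end_date < :cutoff_date;
```

A production version would also restrict the UPDATE to groups that actually have eligible rows (e.g., with an EXISTS clause) to avoid rewriting untouched groups.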

VACUUM & Disk Reclaim

After the transaction commits, VACUUM FULL runs on the three affected tables:

VACUUM FULL checkins;
VACUUM FULL registrations;
VACUUM FULL group_membership;

PostgreSQL's MVCC (Multi-Version Concurrency Control) model doesn't immediately reclaim disk space when rows are deleted — it marks them as "dead tuples" for later cleanup. VACUUM FULL rewrites the table file entirely, physically reclaiming the space. This takes an exclusive lock on each table, which is acceptable at 3 AM on a Sunday when no users are active.
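The dead-tuple buildup is visible in PostgreSQL's built-in statistics view, for example:

```sql
-- Live vs. dead tuples for the tables touched by the archival script
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('checkins', 'registrations', 'group_membership');
```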

VACUUM FULL runs outside the transaction: it cannot run inside a transaction block. The script issues COMMIT first, then runs VACUUM FULL as separate statements. If the VACUUM fails (e.g., server crash), the data is still safely archived — only the disk space reclamation would need to be retried.

Scheduling

# Production cron (Ubuntu, runs as postgres user)
0 3 * * 0 sudo -u postgres psql -d myapp \
  -f /var/www/church-app/scripts/archive_old_data.sql \
  >> /var/log/church-archive.log 2>&1

| Parameter | Value | Rationale |
| --- | --- | --- |
| Schedule | Sunday 3:00 AM | Lowest traffic window; church services don't start until 9 AM |
| Cutoff | 2 months | Keeps recent data for admin dashboards and activity views |
| Logging | /var/log/church-archive.log | Append mode; RAISE NOTICE messages show row counts |
| User | postgres | Table ownership or superuser privileges required for VACUUM FULL |

EF Core & Npgsql Integration

The JSONB columns are mapped to List<long> in C# entities. Npgsql 8+ requires explicit opt-in for dynamic JSON serialization:

// Program.cs — required for List<long> JSONB columns
var dataSourceBuilder = new NpgsqlDataSourceBuilder(connectionString);
dataSourceBuilder.EnableDynamicJson();
var dataSource = dataSourceBuilder.Build();

builder.Services.AddDbContext<ChurchCheckInDbContext>(options =>
    options.UseNpgsql(dataSource));

Entity Mapping

// SmallGroupAttendance entity
public class SmallGroupAttendance
{
    public long Id { get; set; }
    public long SmallGroupId { get; set; }
    public DateOnly MeetingDate { get; set; }
    public List<long> AttendeeIds { get; set; } = [];  // ← JSONB
    public string? Notes { get; set; }
    public long? RecordedBy { get; set; }
}

Query Patterns

Per-Person Attendance History (JSONB Containment)

// Repository: Find all events a person attended
public async Task<IEnumerable<EventAttendance>> GetByPersonIdAsync(
    long personId, DateOnly? from, DateOnly? to)
{
    var query = _db.EventAttendances
        .Include(a => a.Event).ThenInclude(e => e.Location)
        .Where(a => EF.Functions.JsonContains(a.AttendeeIds, personId));

    if (from.HasValue) query = query.Where(...);
    if (to.HasValue) query = query.Where(...);

    return await query.OrderByDescending(a => a.CreatedAt).ToListAsync();
}

Upsert Attendance (Save or Update)

// Repository: Save/update attendance for a meeting date
var existing = await GetByDateAsync(smallGroupId, date);
if (existing != null)
{
    existing.AttendeeIds = dto.AttendeeIds;
    existing.Notes = dto.Notes;
    await UpdateAsync(existing);
}
else
{
    await AddAsync(new SmallGroupAttendance { ... });
}

Head Count (No COUNT(*) Needed)

// DTO includes computed HeadCount from array length
HeadCount = attendance.AttendeeIds.Count  // List<long>.Count in C#

API Endpoints

The attendance data is exposed through dedicated REST endpoints:

| Endpoint | Auth | Description |
| --- | --- | --- |
| GET /api/attendance/event/{eventId} | Admin | Get event attendance (JSONB) |
| POST /api/attendance/event/{eventId} | Admin | Save/upsert event attendance |
| GET /api/attendance/person/{personId} | Auth | Unified person attendance history |
| GET /api/smallgroup/{id}/attendance | Admin | Small group attendance list |
| POST /api/smallgroup/{id}/attendance | Admin | Save/upsert small group attendance |
| GET /api/smallgroup/{id}/attendance/person/{personId} | Admin | Per-person stats in a group |

Proof of Concept: Demo Setup

A comprehensive data generation script (scripts/generate_checkin_demo_data.sql) was created to demonstrate the archival system at scale. The script generates 4 months of realistic church activity data across multiple event types, groups, and attendance patterns.

  • 158 events generated
  • 61,927 checkins generated
  • 9,584 registrations
  • 19 small groups active

For full details on the data generation plan, event categories, attendance rates, and group configurations, see the Data Generation Plan document.

Results & Metrics

The archival script was run with a 2-month cutoff against a database with ~5,800 people and 6 months of attendance history (Aug 2025 – Feb 2026). Events older than 2 months were archived.

Archival Operations

  • 78,571 checkins archived
  • 2,109 registrations deleted
  • 171 JSONB rows created

Table Size: Before vs. After

Before Archival

| Table | Rows | Data | Indexes | Total |
| --- | --- | --- | --- | --- |
| checkins | 112,140 | 6,672 KB | 11 MB | 17 MB |
| registrations | 3,118 | 160 KB | 568 KB | 728 KB |
| event_attendance | 3 | 120 KB | 1,640 KB | 1,760 KB |
| All other tables | | | | 26.5 MB |
| Total database | | | | 46 MB |

After Archival + VACUUM FULL

| Table | Rows | Data | Indexes | Total |
| --- | --- | --- | --- | --- |
| checkins | 33,569 | 1,712 KB | 1,080 KB | 2.8 MB (-84%) |
| registrations | 1,009 | 48 KB | 80 KB | 128 KB (-82%) |
| event_attendance | 171 | 168 KB | 2,760 KB | 2,928 KB |
| All other tables | | | | 26.5 MB |
| Total database | | | | 29 MB (-37%) |

Key Achievements

| Metric | Value |
| --- | --- |
| Total database size reduction | -37% (46 MB → 29 MB) |
| Checkins table reduction | -84% (17 MB → 2.8 MB) |
| Checkins index reduction | -90% (11 MB → 1,080 KB) |
| Registrations table reduction | -82% (728 KB → 128 KB) |
| JSONB archive growth | +1,168 KB (1,760 KB → 2,928 KB) — absorbs 78K rows with only 66% size increase |
| Row compression ratio (checkins) | 460:1 (78,571 rows → 171 JSONB rows) |
| Net disk space reclaimed | 17 MB (checkins + registrations freed, minus event_attendance growth) |
| Remaining hot data | 33,569 checkins + 1,009 registrations (most recent 2 months) |

460:1 Row Reduction — Index Savings Are the Real Win: 78,571 individual check-in rows were compressed into just 171 JSONB event_attendance rows. While the data itself shrank 74% (6,672 KB → 1,712 KB), the indexes shrank 90% (11 MB → 1,080 KB). In production databases, indexes often consume more space than the data itself — every row needs primary key, foreign key, and query-optimization index entries. The JSONB format eliminates all per-row index overhead, storing only the essential attendance data.

Long-Term Projections

Based on measured results (~225,000 check-ins/year across ~500 events for a mid-size church with ~5,800 people), here are projected impacts over time:

| Timeframe | Without Archival (checkins table) | With Archival (hot rows + JSONB) | JSONB Archive Size | Savings |
| --- | --- | --- | --- | --- |
| 1 year | ~225K rows (~34 MB with indexes) | ~37K hot rows (~5 MB) | ~500 JSONB rows (~3 MB) | ~76% |
| 3 years | ~675K rows (~102 MB with indexes) | ~37K hot rows (~5 MB) | ~1,500 JSONB rows (~8 MB) | ~87% |
| 5 years | ~1.1M rows (~170 MB with indexes) | ~37K hot rows (~5 MB) | ~2,500 JSONB rows (~13 MB) | ~89% |
| 10 years | ~2.25M rows (~340 MB with indexes) | ~37K hot rows (~5 MB) | ~5,000 JSONB rows (~25 MB) | ~91% |

Asymmetric Growth — The Key Insight: Without archival, the checkins table grows at ~225K rows/year (data plus 2–3 indexes per row). With archival, the hot table stays constant (~2 months of data), and the JSONB archive adds only ~500 rows/year (one per event, not per person). After 10 years, a 2.25M-row table that would consume ~340 MB is replaced by a fixed 5 MB hot table plus a compact 25 MB JSONB archive — 91% less storage with zero loss of queryable data.
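
The 10-year row can be reproduced from the stated assumptions (~34 MB/year of relational growth, a constant ~5 MB hot tier, and ~2.5 MB of JSONB archive per year — all approximations taken from the projection table):

```python
# Sketch of the 10-year projection using the document's assumptions.
years = 10
relational_mb_per_year = 34    # ~225K rows/year including indexes
hot_tier_mb = 5                # constant ~2-month window
archive_mb_per_year = 2.5      # ~500 JSONB rows/year

without_archival = years * relational_mb_per_year            # ~340 MB
with_archival = hot_tier_mb + years * archive_mb_per_year    # ~30 MB
savings_pct = round(100 * (1 - with_archival / without_archival))

print(f"{without_archival} MB -> {with_archival} MB ({savings_pct}% savings)")
```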

Additional benefits beyond raw storage:

  • Faster backups — smaller tables mean faster pg_dump and smaller backup files
  • Faster VACUUM — routine maintenance operates on a smaller working set
  • Faster kiosk check-in — the checkins table stays small, keeping duplicate detection fast
  • Consistent query performance — the hot tables never grow beyond ~2 months of data