The Growth Problem
Church management databases accumulate attendance data at a steady, predictable rate. A mid-size church with 500 weekly check-ins generates approximately 26,000 check-in records per year. Add registrations, group memberships, and multiple services per week, and the numbers multiply quickly.
In practice, many database systems grow unchecked because extracting and archiving old data is prohibitively complex. Foreign key constraints, cascading dependencies, and tightly coupled relational schemas make it risky to delete historical rows — breaking a single FK reference can cascade into data integrity failures across the system. As a result, tables accumulate years of data that is rarely queried but constantly slows down:
- Backups — larger tables mean longer backup times and bigger backup files
- Indexing — more rows mean deeper B-tree indexes (slower INSERTs) and slower sequential scans for queries on non-indexed columns
- Maintenance — VACUUM, REINDEX, and ANALYZE take longer on large tables
- Day-to-day queries — even indexed queries slow down as the table grows beyond cache capacity
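These costs are directly measurable. A sketch of how the per-table footprint could be checked in PostgreSQL (the table names are this system's; the size functions are standard catalog functions):

```sql
-- Data + index footprint per table, using standard PostgreSQL size functions
SELECT relname AS table_name,
       pg_size_pretty(pg_table_size(oid))          AS data_size,
       pg_size_pretty(pg_indexes_size(oid))        AS index_size,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('checkins', 'registrations', 'event_attendance');
```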
Problem Solved
With ~5,800 people, 252 events, and 6 months of attendance data (112,140 check-in records), the archival script was run against the full database (46 MB including indexes). Here is what happened:
Before Archival
| Table | Rows | Size (data + indexes) |
|---|---|---|
| `checkins` | 112,140 | 17 MB |
| `registrations` | 3,118 | 728 KB |
| `event_attendance` | 3 | 1,760 KB |
| All other tables | — | 26.5 MB |
| Total database | — | 46 MB |
After Archival
| Table | Rows | Size (data + indexes) |
|---|---|---|
| `checkins` | 33,569 | 2.8 MB (-84%) |
| `registrations` | 1,009 | 128 KB (-82%) |
| `event_attendance` | 171 | 2,928 KB |
| All other tables | — | 26.5 MB |
| Total database | — | 29 MB (-37%) |
The real-time tables stay small and fast (only the most recent 2 months), while every historical attendance record remains fully queryable in compact long-term storage. The savings compound over time: after 10 years, the system stores attendance data in roughly 9% of the space a purely relational approach would require.
Full details, methodology, and long-term projections are covered in the Results & Metrics and Long-Term Projections sections at the bottom of this document.
Design Philosophy
This system takes a fundamentally different approach: archival is designed into the schema from day one. The relational tables are intentionally treated as a working buffer holding only the data needed for real-time operations. The JSONB archival tables are the planned, permanent destination.
Because both tiers are part of the same schema with clear ownership boundaries (one JSONB row per event, one per group meeting), the archival script can safely migrate and delete relational rows in a single atomic transaction without risking referential integrity violations. The FK relationships are structured so that archived data flows into parent records rather than being orphaned:
- `checkins` → `event_attendance.attendee_ids` (JSONB array on the event)
- `group_membership` → `church_group.archived_member_ids` (JSONB array on the group)
- `registrations` → deleted (intent data has no archival value after the event)
Key Design Decisions
| Decision | Rationale |
|---|---|
| Event-date-based eligibility | Archival is based on the linked event's start date, not the record's creation date. This prevents premature archival of registrations for future events (e.g., summer camp registered in March for a July event). |
| 2-month rolling window | Relational tables keep ~2 months of hot data. This covers the active check-in window, recent activity views, and admin dashboards without storing years of historical data. |
| Single-transaction archival | All three archival operations (checkins, registrations, memberships) run in one transaction. If any step fails, the entire archival rolls back. No partial state. |
| Registrations: delete without archival | A registration is a declaration of intent to attend, which has no historical value after the event has occurred. Actual attendance (the meaningful data) is captured in check-in records. |
| VACUUM FULL post-commit | Runs outside the transaction to physically reclaim disk space. Takes exclusive lock, acceptable at 3 AM Sunday when no users are active. |
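The event-date-based eligibility rule reduces to a single predicate; a sketch, with the 2-month cutoff inlined where the script would use its `cutoff_date` variable:

```sql
-- Eligibility follows the linked event's start date, not the row's creation date,
-- so a March registration for a July event is never archived prematurely.
SELECT c.event_id, c.person_id
FROM checkins c
JOIN events e ON e.id = c.event_id
WHERE e.starts_at < now() - interval '2 months';
```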
Two-Tier Storage Model
- Tier 1 (hot): `checkins`, `registrations` — rolling 2-month window
- Archival: SQL script via cron, Sunday 3:00 AM
- Tier 2 (cold): `event_attendance`, `small_group_attendance` — permanent compact storage
Tier 1: Hot Tables (Relational)
The relational tables hold only the most recent ~2 months of records, optimized for real-time operations:
checkins — Hot Data
| Column | Type | Purpose |
|---|---|---|
| `event_id` | INTEGER (PK) | FK to `events` |
| `person_id` | INTEGER (PK) | FK to `people` |
| `security_code` | TEXT | 4-char pickup code (e.g., `A3K7`) |
| `checked_in_at` | TIMESTAMPTZ | Exact check-in time |
Used by: kiosk duplicate detection, security code lookup, recent activity view, admin dashboard.
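As an illustration of why this table must stay small, kiosk duplicate detection amounts to a point lookup on the composite key. A hypothetical sketch (the application's actual query may differ, and the IDs are placeholders):

```sql
-- Has this person already checked in to this event? (composite-PK point lookup)
SELECT EXISTS (
    SELECT 1
    FROM checkins
    WHERE event_id = 42 AND person_id = 123   -- placeholder IDs
) AS already_checked_in;
```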
registrations — Hot Data
| Column | Type | Purpose |
|---|---|---|
| `event_id` | INTEGER (PK) | FK to `events` |
| `person_id` | INTEGER (PK) | FK to `people` |
| `registered_at` | TIMESTAMPTZ | Registration time |
Used by: registration management, form registration flow, event roster display.
Tier 2: Cold JSONB Tables
JSONB tables store attendance as a flat array of person IDs per event or meeting date. One JSONB row replaces potentially hundreds of relational rows.
event_attendance — JSONB
| Column | Type | Purpose |
|---|---|---|
| `id` | INTEGER (PK) | Auto-generated |
| `event_id` | INTEGER (UNIQUE) | FK to `events` — one row per event |
| `attendee_ids` | JSONB | Flat array of person IDs, e.g., `[123, 456, 789]` |
| `notes` | TEXT | Optional notes |
| `recorded_by` | INTEGER | FK to `people` (nullable) |
| `created_at` | TIMESTAMPTZ | When archived or manually recorded |
small_group_attendance — JSONB
| Column | Type | Purpose |
|---|---|---|
| `id` | INTEGER (PK) | Auto-generated |
| `small_group_id` | INTEGER | FK to `small_group` |
| `meeting_date` | DATE | UNIQUE with `small_group_id` |
| `attendee_ids` | JSONB | Flat array of person IDs |
| `notes` | TEXT | Optional meeting notes |
| `recorded_by` | INTEGER | FK to `people` (nullable) |
JSONB tables are used for:
- Per-person attendance history — "Show me every event John attended in 2025"
- Head count trends — array length gives instant head count without `COUNT(*)`
- Reporting & analytics — attendance trends over months and years
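The head-count case is worth showing concretely: `jsonb_array_length` reads the count straight off the stored array, with no join back to relational rows. A sketch against the documented schema:

```sql
-- Instant head counts: one row per event, array length = attendance
SELECT event_id,
       jsonb_array_length(attendee_ids) AS head_count
FROM event_attendance
ORDER BY head_count DESC;
```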
GIN Indexes
JSONB attendance arrays are queried using PostgreSQL's `@>` containment operator, backed by GIN (Generalized Inverted Index) indexes. This allows efficient per-person lookups across all attendance records without sequential scans.

```sql
CREATE INDEX idx_event_attendance_attendee_ids
    ON event_attendance USING GIN (attendee_ids);

CREATE INDEX idx_small_group_attendance_attendee_ids
    ON small_group_attendance USING GIN (attendee_ids);
```
Query Example
```sql
-- Find all events person #42 attended
SELECT event_id, attendee_ids, created_at
FROM event_attendance
WHERE attendee_ids @> '42'::jsonb;
```

```csharp
// EF Core equivalent (via Npgsql)
.Where(a => EF.Functions.JsonContains(a.AttendeeIds, personId))
```
Permanent Groups
The church_group.is_permanent flag (boolean, default false) prevents a
group's memberships from ever being archived. This is used for groups where historical membership
is operationally important:
- Safety & security teams (must always know who had access)
- Staff rosters (HR/compliance requirement)
- Leadership teams (historical accountability)
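In the archival script this flag presumably shows up as a simple exclusion predicate. A hedged sketch of membership selection (the FK column name `church_group_id` is an assumption):

```sql
-- Memberships in permanent groups are never candidates for archival
SELECT gm.*
FROM group_membership gm
JOIN church_group cg ON cg.id = gm.church_group_id  -- assumed FK column name
WHERE NOT cg.is_permanent
  AND gm.membership_status = 'INACTIVE';
```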
Archival Script Overview
The archival engine is a single SQL script (scripts/archive_old_data.sql) that performs
three operations inside one transaction. Script file: ~138 lines of annotated SQL.
Execution Flow
Every Sunday at 3 AM: `BEGIN` → archive checkins into `event_attendance` → delete expired registrations → archive memberships → `COMMIT` → `VACUUM FULL`.

| Operation | Source | Destination | Action |
|---|---|---|---|
| 1. Checkins | `checkins` | `event_attendance.attendee_ids` | Upsert (merge person IDs into JSONB array), then delete |
| 2. Registrations | `registrations` | none | Delete only — transient intent data |
| 3. Memberships | `group_membership` | `church_group.archived_member_ids` | Merge into group's JSONB array, then delete |
Operation 1: Checkin Archival (Upsert)
The most complex operation. Uses PostgreSQL INSERT ... ON CONFLICT DO UPDATE to safely
merge person IDs into the event_attendance.attendee_ids array, even when a row already
exists (e.g., from a previous partial archival or manual attendance save).
```sql
INSERT INTO event_attendance (event_id, attendee_ids, notes, created_at)
SELECT
    c.event_id,
    jsonb_agg(DISTINCT c.person_id ORDER BY c.person_id),
    NULL,
    now()
FROM checkins c
JOIN events e ON e.id = c.event_id
WHERE e.starts_at < cutoff_date
GROUP BY c.event_id
ON CONFLICT (event_id) DO UPDATE SET
    attendee_ids = (
        SELECT jsonb_agg(DISTINCT val ORDER BY val)
        FROM (
            SELECT jsonb_array_elements(event_attendance.attendee_ids)::integer AS val
            UNION
            SELECT jsonb_array_elements(EXCLUDED.attendee_ids)::integer AS val
        ) merged
    );
```
How the Merge Works
- `GROUP BY c.event_id` — aggregates all checkins for each event into one row
- `jsonb_agg(DISTINCT ...)` — creates a deduplicated, sorted JSONB array of person IDs
- `ON CONFLICT` — if the event already has an attendance row, merge arrays via `UNION`
- After the upsert, delete the source checkin rows
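The merge step can be tried in isolation with literal arrays, no schema required:

```sql
-- Merging two attendee arrays with dedup + sort, as the ON CONFLICT branch does
SELECT jsonb_agg(DISTINCT val ORDER BY val) AS merged
FROM (
    SELECT jsonb_array_elements('[123, 456]'::jsonb)::integer AS val
    UNION
    SELECT jsonb_array_elements('[456, 789]'::jsonb)::integer AS val
) m;
-- → [123, 456, 789]
```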
Operation 2: Registration Cleanup
Registrations are deleted without archival. A registration represents intent to attend an event.
Once the event has passed, the meaningful data is the check-in record (actual attendance), not the
registration. Historical questions like "who attended VBS 2025?" are answered by the
event_attendance JSONB data, not by registrations.
```sql
DELETE FROM registrations
USING events e
WHERE registrations.event_id = e.id
  AND e.starts_at < cutoff_date;
```
Operation 3: Membership Archival
Inactive group memberships are archived by merging person IDs into the group's
archived_member_ids JSONB column. Only memberships meeting all criteria are eligible:
- `membership_status = 'INACTIVE'`
- `church_group.is_permanent = false`
- `end_date IS NOT NULL`
- `end_date < cutoff_date`
This preserves a historical record of who was once in each group (queryable via JSONB containment) while removing rows from the relational table.
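Given those criteria, the membership merge would look much like the checkin upsert. A sketch under stated assumptions: the FK column name `church_group_id`, the `COALESCE` guard for a NULL array, and the inlined 2-month cutoff are all assumptions, not the script's exact text:

```sql
-- Merge eligible inactive members into the group's archived_member_ids JSONB array
UPDATE church_group cg
SET archived_member_ids = (
    SELECT jsonb_agg(DISTINCT val ORDER BY val)
    FROM (
        SELECT jsonb_array_elements(COALESCE(cg.archived_member_ids, '[]'::jsonb))::integer AS val
        UNION
        SELECT gm.person_id
        FROM group_membership gm
        WHERE gm.church_group_id = cg.id              -- assumed FK column name
          AND gm.membership_status = 'INACTIVE'
          AND gm.end_date IS NOT NULL
          AND gm.end_date < now() - interval '2 months'
    ) merged
)
WHERE NOT cg.is_permanent
  AND EXISTS (                                        -- skip groups with nothing to archive
      SELECT 1
      FROM group_membership gm
      WHERE gm.church_group_id = cg.id
        AND gm.membership_status = 'INACTIVE'
        AND gm.end_date IS NOT NULL
        AND gm.end_date < now() - interval '2 months'
  );
```

The eligible `group_membership` rows would then be deleted in the same transaction.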
VACUUM & Disk Reclaim
After the transaction commits, VACUUM FULL runs on the three affected tables:
```sql
VACUUM FULL checkins;
VACUUM FULL registrations;
VACUUM FULL group_membership;
```
PostgreSQL's MVCC (Multi-Version Concurrency Control) model doesn't immediately reclaim disk space
when rows are deleted — it marks them as "dead tuples" for later cleanup. VACUUM FULL
rewrites the table file entirely, physically reclaiming the space. This takes an exclusive lock
on each table, which is acceptable at 3 AM on a Sunday when no users are active.
The script issues `COMMIT` first, then runs `VACUUM FULL` as separate statements. If the VACUUM fails (e.g., server crash), the data is still safely archived; only the disk space reclamation would need to be retried.
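Whether a retried VACUUM is actually needed can be checked from PostgreSQL's standard statistics views (table names from this schema):

```sql
-- Dead tuples left behind by the archival DELETEs, before/after VACUUM FULL
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('checkins', 'registrations', 'group_membership');
```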
Scheduling
```shell
# Production cron (Ubuntu, runs as postgres user)
0 3 * * 0 sudo -u postgres psql -d myapp \
    -f /var/www/church-app/scripts/archive_old_data.sql \
    >> /var/log/church-archive.log 2>&1
```
| Parameter | Value | Rationale |
|---|---|---|
| Schedule | Sunday 3:00 AM | Lowest traffic window; church services don't start until 9 AM |
| Cutoff | 2 months | Keeps recent data for admin dashboards and activity views |
| Logging | /var/log/church-archive.log | Append mode; RAISE NOTICE messages show row counts |
| User | postgres | Superuser needed for VACUUM FULL |
EF Core & Npgsql Integration
The JSONB columns are mapped to List<long> in C# entities. Npgsql 8+ requires
explicit opt-in for dynamic JSON serialization:
```csharp
// Program.cs — required for List<long> JSONB columns
var dataSourceBuilder = new NpgsqlDataSourceBuilder(connectionString);
dataSourceBuilder.EnableDynamicJson();
var dataSource = dataSourceBuilder.Build();

builder.Services.AddDbContext<ChurchCheckInDbContext>(options =>
    options.UseNpgsql(dataSource));
```
Entity Mapping
```csharp
// SmallGroupAttendance entity
public class SmallGroupAttendance
{
    public long Id { get; set; }
    public long SmallGroupId { get; set; }
    public DateOnly MeetingDate { get; set; }
    public List<long> AttendeeIds { get; set; } = []; // ← JSONB
    public string? Notes { get; set; }
    public long? RecordedBy { get; set; }
}
```
Query Patterns
Per-Person Attendance History (JSONB Containment)
```csharp
// Repository: Find all events a person attended
public async Task<IEnumerable<EventAttendance>> GetByPersonIdAsync(
    long personId, DateOnly? from, DateOnly? to)
{
    var query = _db.EventAttendances
        .Include(a => a.Event).ThenInclude(e => e.Location)
        .Where(a => EF.Functions.JsonContains(a.AttendeeIds, personId));

    if (from.HasValue) query = query.Where(...);
    if (to.HasValue) query = query.Where(...);

    return await query.OrderByDescending(a => a.CreatedAt).ToListAsync();
}
```
Upsert Attendance (Save or Update)
```csharp
// Repository: Save/update attendance for a meeting date
var existing = await GetByDateAsync(smallGroupId, date);
if (existing != null)
{
    existing.AttendeeIds = dto.AttendeeIds;
    existing.Notes = dto.Notes;
    await UpdateAsync(existing);
}
else
{
    await AddAsync(new SmallGroupAttendance { ... });
}
```
Head Count (No COUNT(*) Needed)
```csharp
// DTO includes computed HeadCount from array length
HeadCount = attendance.AttendeeIds.Count // List<long>.Count in C#
```
API Endpoints
The attendance data is exposed through dedicated REST endpoints:
| Endpoint | Auth | Description |
|---|---|---|
| `GET /api/attendance/event/{eventId}` | Admin | Get event attendance (JSONB) |
| `POST /api/attendance/event/{eventId}` | Admin | Save/upsert event attendance |
| `GET /api/attendance/person/{personId}` | Auth | Unified person attendance history |
| `GET /api/smallgroup/{id}/attendance` | Admin | Small group attendance list |
| `POST /api/smallgroup/{id}/attendance` | Admin | Save/upsert small group attendance |
| `GET /api/smallgroup/{id}/attendance/person/{personId}` | Admin | Per-person stats in a group |
Proof of Concept: Demo Setup
A comprehensive data generation script (scripts/generate_checkin_demo_data.sql) was
created to demonstrate the archival system at scale. The script generates 4 months of realistic
church activity data across multiple event types, groups, and attendance patterns.
For full details on the data generation plan, event categories, attendance rates, and group configurations, see the Data Generation Plan document.
Results & Metrics
The archival script was run with a 2-month cutoff against a database with ~5,800 people and 6 months of attendance history (Aug 2025 – Feb 2026). Events older than 2 months were archived.
Archival Operations
Table Size: Before vs. After
Before Archival
| Table | Rows | Data | Indexes | Total |
|---|---|---|---|---|
| `checkins` | 112,140 | 6,672 KB | 11 MB | 17 MB |
| `registrations` | 3,118 | 160 KB | 568 KB | 728 KB |
| `event_attendance` | 3 | 120 KB | 1,640 KB | 1,760 KB |
| All other tables | — | — | — | 26.5 MB |
| Total database | — | — | — | 46 MB |
After Archival + VACUUM FULL
| Table | Rows | Data | Indexes | Total |
|---|---|---|---|---|
| `checkins` | 33,569 | 1,712 KB | 1,080 KB | 2.8 MB (-84%) |
| `registrations` | 1,009 | 48 KB | 80 KB | 128 KB (-82%) |
| `event_attendance` | 171 | 168 KB | 2,760 KB | 2,928 KB |
| All other tables | — | — | — | 26.5 MB |
| Total database | — | — | — | 29 MB (-37%) |
Key Achievements
| Metric | Value |
|---|---|
| Total database size reduction | -37% (46 MB → 29 MB) |
| Checkins table reduction | -84% (17 MB → 2.8 MB) |
| Checkins index reduction | -90% (11 MB → 1,080 KB) |
| Registrations table reduction | -82% (728 KB → 128 KB) |
| JSONB archive growth | +1,168 KB (1,760 KB → 2,928 KB) — absorbs 78K rows with only 66% size increase |
| Row compression ratio (checkins) | 460:1 (78,571 rows → 171 JSONB rows) |
| Net disk space reclaimed | 17 MB (checkins + registrations freed, minus event_attendance growth) |
| Remaining hot data | 33,569 checkins + 1,009 registrations (most recent 2 months) |
Long-Term Projections
Based on measured results (~225,000 check-ins/year across ~500 events for a mid-size church with ~5,800 people), here are projected impacts over time:
| Timeframe | Without Archival (checkins table) | With Archival (hot rows + JSONB) | JSONB Archive Size | Savings |
|---|---|---|---|---|
| 1 year | ~225K rows (~34 MB with indexes) | ~37K hot rows (~5 MB) | ~500 JSONB rows (~3 MB) | ~76% |
| 3 years | ~675K rows (~102 MB with indexes) | ~37K hot rows (~5 MB) | ~1,500 JSONB rows (~8 MB) | ~87% |
| 5 years | ~1.1M rows (~170 MB with indexes) | ~37K hot rows (~5 MB) | ~2,500 JSONB rows (~13 MB) | ~89% |
| 10 years | ~2.25M rows (~340 MB with indexes) | ~37K hot rows (~5 MB) | ~5,000 JSONB rows (~25 MB) | ~91% |
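The savings column is simple arithmetic over the two totals; for example, the 10-year figure can be reproduced in any SQL client, no schema needed:

```sql
-- 10-year projection: (5 MB hot + 25 MB JSONB) vs. 340 MB without archival
SELECT round(100 * (1 - (5 + 25) / 340.0)) AS pct_saved;  -- ≈ 91
```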
Additional benefits beyond raw storage:
- Faster backups — smaller tables mean faster `pg_dump` and smaller backup files
- Faster VACUUM — routine maintenance operates on a smaller working set
- Faster kiosk check-in — the `checkins` table stays small, keeping duplicate detection fast
- Consistent query performance — the hot tables never grow beyond ~2 months of data