← Back to Documentation Home

The Growth Problem

Church management databases accumulate attendance data at a steady, predictable rate. A mid-size church with 500 weekly check-ins generates approximately 26,000 check-in records per year. Add registrations, group memberships, and multiple services per week, and the numbers multiply quickly.

In practice, many database systems grow unchecked because extracting and archiving old data is prohibitively complex. Foreign key constraints, cascading dependencies, and tightly coupled relational schemas make it risky to delete historical rows — breaking a single FK reference can cascade into data integrity failures across the system. As a result, tables accumulate years of rarely queried data that slows down everything around it:

  • Backups — larger tables mean longer backup times and bigger backup files
  • Indexing — more rows in B-tree indexes mean slower INSERTs, while queries on non-indexed columns face ever-larger sequential scans
  • Maintenance — VACUUM, REINDEX, and ANALYZE take longer on large tables
  • Day-to-day queries — even indexed queries slow down as the table grows beyond cache capacity

The Common Trap: Most systems add archival as an afterthought, if at all. By the time the database is large enough to cause problems, the relational dependencies are so intertwined that nobody dares delete old data. The solution becomes "buy a bigger server" or "add read replicas" rather than addressing the root cause.

Problem Solved

With ~5,800 people, 252 events, and 6 months of attendance data (112,140 check-in records), the archival script was run against the full database (46 MB including indexes). Here is what happened:

  • 78,571 checkins archived
  • 2,109 registrations cleaned
  • 171 summary rows created
  • 460:1 row compression ratio

Before Archival

| Table | Rows | Size (data + indexes) |
| --- | --- | --- |
| checkins | 112,140 | 17 MB |
| registrations | 3,118 | 728 KB |
| event_attendance | 3 | 1,760 KB |
| All other tables | | 26.5 MB |
| Total database | | 46 MB |

After Archival

| Table | Rows | Size (data + indexes) |
| --- | --- | --- |
| checkins | 33,569 | 2.8 MB (-84%) |
| registrations | 1,009 | 128 KB (-82%) |
| event_attendance | 171 | 2,928 KB |
| All other tables | | 26.5 MB |
| Total database | | 29 MB (-37%) |

The real-time tables stay small and fast (only the most recent 2 months), while every historical attendance record remains fully queryable in compact long-term storage. The savings compound over time — after 10 years, the system stores attendance data in roughly a tenth of the space a purely relational approach would require.

Full details, methodology, and long-term projections are covered in the Results & Metrics and Long-Term Projections sections at the bottom of this document.

Design Philosophy

This system takes a fundamentally different approach: archival is designed into the schema from day one. The relational tables are intentionally treated as a working buffer holding only the data needed for real-time operations. The JSONB archival tables are the planned, permanent destination.

Because both tiers are part of the same schema with clear ownership boundaries (one JSONB row per event, one per group meeting), the archival script can safely migrate and delete relational rows in a single atomic transaction without risking referential integrity violations. The FK relationships are structured so that archived data flows into parent records rather than being orphaned:

  • checkins → event_attendance.attendee_ids (JSONB array on the event)
  • group_membership → church_group.archived_member_ids (JSONB array on the group)
  • registrations → deleted (intent data has no archival value after the event)

Key Design Decisions

| Decision | Rationale |
| --- | --- |
| Event-date-based eligibility | Archival is based on the linked event's start date, not the record's creation date. This prevents premature archival of registrations for future events (e.g., summer camp registered in March for a July event). |
| 2-month rolling window | Relational tables keep ~2 months of hot data. This covers the active check-in window, recent activity views, and admin dashboards without storing years of historical data. |
| Single-transaction archival | All three archival operations (checkins, registrations, memberships) run in one transaction. If any step fails, the entire archival rolls back. No partial state. |
| Registrations: delete without archival | A registration is a declaration of intent to attend, which has no historical value after the event has occurred. Actual attendance (the meaningful data) is captured in check-in records. |
| VACUUM FULL post-commit | Runs outside the transaction to physically reclaim disk space. Takes exclusive lock, acceptable at 3 AM Sunday when no users are active. |

Two-Tier Storage Model

Tier 1 — Relational: checkins, registrations (rolling 2-month window)
        ↓
Weekly archival: SQL script via cron, Sunday 3:00 AM
        ↓
Tier 2 — JSONB: event_attendance, small_group_attendance (permanent compact storage)

Storage Comparison

200 check-ins for one event:

  • Relational (checkins table): 200 rows × ~40 bytes each = ~8,000 bytes — event_id (8B) + person_id (8B) + security_code (4B) + timestamp (8B) + overhead
  • JSONB (event_attendance table): 1 row with attendee_ids = [101, 102, 103, ..., 300] = ~1,200 bytes (array of 200 integers + JSONB overhead)
  • Compression ratio: ~6.7x smaller per event
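The arithmetic above can be sanity-checked in a few lines (the byte sizes are the rough per-row estimates from the comparison, not measured values):

```python
# Back-of-envelope check of the per-event storage comparison.
# Assumed sizes come from the text above: ~40 bytes per relational
# check-in row and ~1,200 bytes for one JSONB row of 200 attendee IDs.
attendees = 200
relational_bytes_per_row = 40   # event_id + person_id + code + timestamp + overhead
jsonb_row_bytes = 1_200         # array of 200 integers + JSONB overhead

relational_total = attendees * relational_bytes_per_row  # 8,000 bytes
ratio = relational_total / jsonb_row_bytes               # about 6.7x

print(f"relational: {relational_total} B, jsonb: {jsonb_row_bytes} B, ratio: {ratio:.1f}x")
```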

Tier 1: Hot Tables (Relational)

The relational tables hold only the most recent ~2 months of records, optimized for real-time operations:

checkins (Hot Data)

| Column | Type | Purpose |
| --- | --- | --- |
| event_id | INTEGER (PK) | FK to events |
| person_id | INTEGER (PK) | FK to people |
| security_code | TEXT | 4-char pickup code (e.g., A3K7) |
| checked_in_at | TIMESTAMPTZ | Exact check-in time |

Used by: kiosk duplicate detection, security code lookup, recent activity view, admin dashboard.

registrations (Hot Data)

| Column | Type | Purpose |
| --- | --- | --- |
| event_id | INTEGER (PK) | FK to events |
| person_id | INTEGER (PK) | FK to people |
| registered_at | TIMESTAMPTZ | Registration time |

Used by: registration management, form registration flow, event roster display.

Tier 2: Cold JSONB Tables

JSONB tables store attendance as a flat array of person IDs per event or meeting date. One JSONB row replaces potentially hundreds of relational rows.

event_attendance (JSONB)

| Column | Type | Purpose |
| --- | --- | --- |
| id | INTEGER (PK) | Auto-generated |
| event_id | INTEGER (UNIQUE) | FK to events — one row per event |
| attendee_ids | JSONB | Flat array of person IDs, e.g., [123, 456, 789] |
| notes | TEXT | Optional notes |
| recorded_by | INTEGER | FK to people (nullable) |
| created_at | TIMESTAMPTZ | When archived or manually recorded |

small_group_attendance (JSONB)

| Column | Type | Purpose |
| --- | --- | --- |
| id | INTEGER (PK) | Auto-generated |
| small_group_id | INTEGER | FK to small_group |
| meeting_date | DATE | UNIQUE with small_group_id |
| attendee_ids | JSONB | Flat array of person IDs |
| notes | TEXT | Optional meeting notes |
| recorded_by | INTEGER | FK to people (nullable) |

JSONB tables are used for:

  • Per-person attendance history — "Show me every event John attended in 2025"
  • Head count trends — array length gives instant head count without COUNT(*)
  • Reporting & analytics — attendance trends over months and years
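
The first two patterns fall straight out of the array representation. A sketch against the schema above (assuming the events.starts_at column used later by the archival script):

```sql
-- Instant head count per event: array length, no COUNT(*) over check-in rows
SELECT event_id, jsonb_array_length(attendee_ids) AS head_count
FROM event_attendance;

-- Every event person #42 attended in 2025 (containment + date filter)
SELECT ea.event_id, ea.created_at
FROM event_attendance ea
JOIN events e ON e.id = ea.event_id
WHERE ea.attendee_ids @> '42'::jsonb
  AND e.starts_at >= DATE '2025-01-01'
  AND e.starts_at <  DATE '2026-01-01';
```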

GIN Indexes

JSONB attendance arrays are queried using PostgreSQL's @> containment operator, backed by GIN (Generalized Inverted Index) indexes. This allows efficient per-person lookups across all attendance records without sequential scans.

CREATE INDEX idx_event_attendance_attendee_ids
  ON event_attendance USING GIN (attendee_ids);

CREATE INDEX idx_small_group_attendance_attendee_ids
  ON small_group_attendance USING GIN (attendee_ids);

Query Example

-- Find all events person #42 attended
SELECT event_id, attendee_ids, created_at
FROM event_attendance
WHERE attendee_ids @> '42'::jsonb;

-- EF Core equivalent (via Npgsql)
.Where(a => EF.Functions.JsonContains(a.AttendeeIds, personId))

Permanent Groups

The church_group.is_permanent flag (boolean, default false) prevents a group's memberships from ever being archived. This is used for groups where historical membership is operationally important:

  • Safety & security teams (must always know who had access)
  • Staff rosters (HR/compliance requirement)
  • Leadership teams (historical accountability)

Archival Script Overview

The archival engine is a single SQL script (scripts/archive_old_data.sql) that performs three operations inside one transaction. Script file: ~138 lines of annotated SQL.

Execution Flow

Cron fires (Sunday 3 AM)
  → BEGIN
  → 1. Upsert checkins → event_attendance
  → 2. Delete old registrations
  → 3. Archive inactive memberships
  → COMMIT
  → VACUUM FULL

| Operation | Source | Destination | Action |
| --- | --- | --- | --- |
| 1. Checkins | checkins | event_attendance.attendee_ids | Upsert (merge person IDs into JSONB array), then delete |
| 2. Registrations | registrations | none | Delete only — transient intent data |
| 3. Memberships | group_membership | church_group.archived_member_ids | Merge into group's JSONB array, then delete |

Operation 1: Checkin Archival (Upsert)

The most complex operation. Uses PostgreSQL INSERT ... ON CONFLICT DO UPDATE to safely merge person IDs into the event_attendance.attendee_ids array, even when a row already exists (e.g., from a previous partial archival or manual attendance save).

INSERT INTO event_attendance (event_id, attendee_ids, notes, created_at)
SELECT
    c.event_id,
    jsonb_agg(DISTINCT c.person_id ORDER BY c.person_id),
    NULL,
    now()
FROM checkins c
JOIN events e ON e.id = c.event_id
WHERE e.starts_at < cutoff_date
GROUP BY c.event_id
ON CONFLICT (event_id) DO UPDATE SET
    attendee_ids = (
        SELECT jsonb_agg(DISTINCT val ORDER BY val)
        FROM (
            SELECT jsonb_array_elements(event_attendance.attendee_ids)::integer AS val
            UNION
            SELECT jsonb_array_elements(EXCLUDED.attendee_ids)::integer AS val
        ) merged
    );

How the Merge Works

  1. GROUP BY c.event_id — aggregates all checkins for each event into one row
  2. jsonb_agg(DISTINCT ...) — creates a deduplicated, sorted JSONB array of person IDs
  3. ON CONFLICT — if the event already has an attendance row, merge arrays via UNION
  4. After upsert, delete the source checkin rows

Idempotent by Design: The UNION-based merge is idempotent. Running the archival script twice produces the same result — duplicate person IDs are deduplicated by the DISTINCT clause. This is important for operational safety.

Operation 2: Registration Cleanup

Registrations are deleted without archival. A registration represents intent to attend an event. Once the event has passed, the meaningful data is the check-in record (actual attendance), not the registration. Historical questions like "who attended VBS 2025?" are answered by the event_attendance JSONB data, not by registrations.

DELETE FROM registrations
USING events e
WHERE registrations.event_id = e.id
  AND e.starts_at < cutoff_date;

Operation 3: Membership Archival

Inactive group memberships are archived by merging person IDs into the group's archived_member_ids JSONB column. Only memberships meeting all criteria are eligible:

  • membership_status = 'INACTIVE'
  • church_group.is_permanent = false
  • end_date IS NOT NULL
  • end_date < cutoff_date

This preserves a historical record of who was once in each group (queryable via JSONB containment) while removing rows from the relational table.
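The membership step follows the same merge-then-delete shape as Operation 1. A simplified sketch of the idea — not the script verbatim: the group_id column name and :cutoff_date parameter are illustrative, and the real script deduplicates the merged array the way Operation 1 does:

```sql
-- Merge eligible inactive members into the group's JSONB archive...
UPDATE church_group g
SET archived_member_ids = COALESCE(g.archived_member_ids, '[]'::jsonb) || COALESCE(
        (SELECT jsonb_agg(DISTINCT m.person_id)
         FROM group_membership m
         WHERE m.group_id = g.id
           AND m.membership_status = 'INACTIVE'
           AND m.end_date IS NOT NULL
           AND m.end_date < :cutoff_date),
        '[]'::jsonb)
WHERE g.is_permanent = false;

-- ...then remove the archived rows from the relational table
DELETE FROM group_membership m
USING church_group g
WHERE g.id = m.group_id
  AND g.is_permanent = false
  AND m.membership_status = 'INACTIVE'
  AND m.end_date IS NOT NULL
  AND m.end_date < :cutoff_date;
```

A production version would also restrict the UPDATE to groups that actually have eligible rows (e.g., with an EXISTS clause) to avoid rewriting untouched groups.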

VACUUM & Disk Reclaim

After the transaction commits, VACUUM FULL runs on the three affected tables:

VACUUM FULL checkins;
VACUUM FULL registrations;
VACUUM FULL group_membership;

PostgreSQL's MVCC (Multi-Version Concurrency Control) model doesn't immediately reclaim disk space when rows are deleted — it marks them as "dead tuples" for later cleanup. VACUUM FULL rewrites the table file entirely, physically reclaiming the space. This takes an exclusive lock on each table, which is acceptable at 3 AM on a Sunday when no users are active.
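The dead-tuple buildup is visible in PostgreSQL's built-in statistics view, for example:

```sql
-- Live vs. dead tuples for the tables touched by the archival script
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('checkins', 'registrations', 'group_membership');
```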

VACUUM FULL runs outside the transaction: it cannot run inside a transaction block. The script issues COMMIT first, then runs VACUUM FULL as separate statements. If the VACUUM fails (e.g., server crash), the data is still safely archived — only the disk space reclamation would need to be retried.

Scheduling

# Production cron (Ubuntu, runs as postgres user)
0 3 * * 0 sudo -u postgres psql -d myapp \
  -f /var/www/church-app/scripts/archive_old_data.sql \
  >> /var/log/church-archive.log 2>&1

| Parameter | Value | Rationale |
| --- | --- | --- |
| Schedule | Sunday 3:00 AM | Lowest traffic window; church services don't start until 9 AM |
| Cutoff | 2 months | Keeps recent data for admin dashboards and activity views |
| Logging | /var/log/church-archive.log | Append mode; RAISE NOTICE messages show row counts |
| User | postgres | Table ownership or superuser privileges required for VACUUM FULL |

EF Core & Npgsql Integration

The JSONB columns are mapped to List<long> in C# entities. Npgsql 8+ requires explicit opt-in for dynamic JSON serialization:

// Program.cs — required for List<long> JSONB columns
var dataSourceBuilder = new NpgsqlDataSourceBuilder(connectionString);
dataSourceBuilder.EnableDynamicJson();
var dataSource = dataSourceBuilder.Build();

builder.Services.AddDbContext<ChurchCheckInDbContext>(options =>
    options.UseNpgsql(dataSource));

Entity Mapping

// SmallGroupAttendance entity
public class SmallGroupAttendance
{
    public long Id { get; set; }
    public long SmallGroupId { get; set; }
    public DateOnly MeetingDate { get; set; }
    public List<long> AttendeeIds { get; set; } = [];  // ← JSONB
    public string? Notes { get; set; }
    public long? RecordedBy { get; set; }
}

Query Patterns

Per-Person Attendance History (JSONB Containment)

// Repository: Find all events a person attended
public async Task<IEnumerable<EventAttendance>> GetByPersonIdAsync(
    long personId, DateOnly? from, DateOnly? to)
{
    var query = _db.EventAttendances
        .Include(a => a.Event).ThenInclude(e => e.Location)
        .Where(a => EF.Functions.JsonContains(a.AttendeeIds, personId));

    if (from.HasValue) query = query.Where(...);
    if (to.HasValue) query = query.Where(...);

    return await query.OrderByDescending(a => a.CreatedAt).ToListAsync();
}

Upsert Attendance (Save or Update)

// Repository: Save/update attendance for a meeting date
var existing = await GetByDateAsync(smallGroupId, date);
if (existing != null)
{
    existing.AttendeeIds = dto.AttendeeIds;
    existing.Notes = dto.Notes;
    await UpdateAsync(existing);
}
else
{
    await AddAsync(new SmallGroupAttendance { ... });
}

Head Count (No COUNT(*) Needed)

// DTO includes computed HeadCount from array length
HeadCount = attendance.AttendeeIds.Count  // List<long>.Count in C#

API Endpoints

The attendance data is exposed through dedicated REST endpoints:

| Endpoint | Auth | Description |
| --- | --- | --- |
| GET /api/attendance/event/{eventId} | Admin | Get event attendance (JSONB) |
| POST /api/attendance/event/{eventId} | Admin | Save/upsert event attendance |
| GET /api/attendance/person/{personId} | Auth | Unified person attendance history |
| GET /api/smallgroup/{id}/attendance | Admin | Small group attendance list |
| POST /api/smallgroup/{id}/attendance | Admin | Save/upsert small group attendance |
| GET /api/smallgroup/{id}/attendance/person/{personId} | Admin | Per-person stats in a group |

Proof of Concept: Demo Setup

A comprehensive data generation script (scripts/generate_checkin_demo_data.sql) was created to demonstrate the archival system at scale. The script generates 4 months of realistic church activity data across multiple event types, groups, and attendance patterns.

  • 158 events generated
  • 61,927 checkins generated
  • 9,584 registrations
  • 19 small groups active

For full details on the data generation plan, event categories, attendance rates, and group configurations, see the Data Generation Plan document.

Results & Metrics

The archival script was run with a 2-month cutoff against a database with ~5,800 people and 6 months of attendance history (Aug 2025 – Feb 2026). Events older than 2 months were archived.

Archival Operations

  • 78,571 checkins archived
  • 2,109 registrations deleted
  • 171 JSONB rows created

Table Size: Before vs. After

Before Archival

| Table | Rows | Data | Indexes | Total |
| --- | --- | --- | --- | --- |
| checkins | 112,140 | 6,672 KB | 11 MB | 17 MB |
| registrations | 3,118 | 160 KB | 568 KB | 728 KB |
| event_attendance | 3 | 120 KB | 1,640 KB | 1,760 KB |
| All other tables | | | | 26.5 MB |
| Total database | | | | 46 MB |

After Archival + VACUUM FULL

| Table | Rows | Data | Indexes | Total |
| --- | --- | --- | --- | --- |
| checkins | 33,569 | 1,712 KB | 1,080 KB | 2.8 MB (-84%) |
| registrations | 1,009 | 48 KB | 80 KB | 128 KB (-82%) |
| event_attendance | 171 | 168 KB | 2,760 KB | 2,928 KB |
| All other tables | | | | 26.5 MB |
| Total database | | | | 29 MB (-37%) |

Key Achievements

| Metric | Value |
| --- | --- |
| Total database size reduction | -37% (46 MB → 29 MB) |
| Checkins table reduction | -84% (17 MB → 2.8 MB) |
| Checkins index reduction | -90% (11 MB → 1,080 KB) |
| Registrations table reduction | -82% (728 KB → 128 KB) |
| JSONB archive growth | +1,168 KB (1,760 KB → 2,928 KB) — absorbs 78K rows with only 66% size increase |
| Row compression ratio (checkins) | 460:1 (78,571 rows → 171 JSONB rows) |
| Net disk space reclaimed | 17 MB (checkins + registrations freed, minus event_attendance growth) |
| Remaining hot data | 33,569 checkins + 1,009 registrations (most recent 2 months) |

460:1 Row Reduction — Index Savings Are the Real Win: 78,571 individual check-in rows were compressed into just 171 JSONB event_attendance rows. While the data itself shrank 74% (6,672 KB → 1,712 KB), the indexes shrank 90% (11 MB → 1,080 KB). In production databases, indexes often consume more space than the data itself — every row needs primary key, foreign key, and query-optimization index entries. The JSONB format eliminates all per-row index overhead, storing only the essential attendance data.

Long-Term Projections

Based on measured results (~225,000 check-ins/year across ~500 events for a mid-size church with ~5,800 people), here are projected impacts over time:

| Timeframe | Without Archival (checkins table) | With Archival (hot rows + JSONB) | JSONB Archive Size | Savings |
| --- | --- | --- | --- | --- |
| 1 year | ~225K rows (~34 MB with indexes) | ~37K hot rows (~5 MB) | ~500 JSONB rows (~3 MB) | ~76% |
| 3 years | ~675K rows (~102 MB with indexes) | ~37K hot rows (~5 MB) | ~1,500 JSONB rows (~8 MB) | ~87% |
| 5 years | ~1.1M rows (~170 MB with indexes) | ~37K hot rows (~5 MB) | ~2,500 JSONB rows (~13 MB) | ~89% |
| 10 years | ~2.25M rows (~340 MB with indexes) | ~37K hot rows (~5 MB) | ~5,000 JSONB rows (~25 MB) | ~91% |

Asymmetric Growth — The Key Insight: Without archival, the checkins table grows at ~225K rows/year (data plus 2–3 indexes per row). With archival, the hot table stays constant (~2 months of data), and the JSONB archive adds only ~500 rows/year (one per event, not per person). After 10 years, a 2.25M-row table that would consume ~340 MB is replaced by a fixed 5 MB hot table plus a compact 25 MB JSONB archive — 91% less storage with zero loss of queryable data.
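
The 10-year row can be reproduced from the stated assumptions (~34 MB/year of relational growth, a constant ~5 MB hot tier, and ~2.5 MB of JSONB archive per year — all approximations taken from the projection table):

```python
# Sketch of the 10-year projection using the document's assumptions.
years = 10
relational_mb_per_year = 34    # ~225K rows/year including indexes
hot_tier_mb = 5                # constant ~2-month window
archive_mb_per_year = 2.5      # ~500 JSONB rows/year

without_archival = years * relational_mb_per_year            # ~340 MB
with_archival = hot_tier_mb + years * archive_mb_per_year    # ~30 MB
savings_pct = round(100 * (1 - with_archival / without_archival))

print(f"{without_archival} MB -> {with_archival} MB ({savings_pct}% savings)")
```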

Additional benefits beyond raw storage:

  • Faster backups — smaller tables mean faster pg_dump and smaller backup files
  • Faster VACUUM — routine maintenance operates on a smaller working set
  • Faster kiosk check-in — the checkins table stays small, keeping duplicate detection fast
  • Consistent query performance — the hot tables never grow beyond ~2 months of data