Zero-Downtime Swap Resync
The swap resync strategy enables table resyncs without read downtime on the destination. This is critical for production systems where table availability must be maintained during data refresh operations.
The Problem with Truncate Resync
The default truncate strategy has a significant downside:
-- During truncate resync
TRUNCATE TABLE public.users;
-- Table is now EMPTY
-- Copy data (may take minutes or hours for large tables)
COPY public.users FROM ...;
-- Queries return no results during this entire period
Impact:
- Read queries fail or return empty results
- Applications show missing data
- Downtime scales with table size
For a 10GB table taking 5 minutes to copy, your application has 5 minutes of data unavailability.
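To make the gap concrete, here is a sketch of what a reader sees mid-way through a truncate resync, once the TRUNCATE has committed but before the COPY finishes (same example table as above):
-- Reader session during the copy window
SELECT COUNT(*) FROM public.users;
-- count
-- -------
--     0    <- the table is simply empty until the copy completes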
The Swap Strategy Solution
The swap strategy eliminates this downtime by using a shadow table and atomic rename:
-- Create shadow table
CREATE TABLE public.users_resync (LIKE public.users INCLUDING ALL);
-- Copy to shadow (original table still serves reads)
COPY public.users_resync FROM ...;
-- Build indexes on shadow
CREATE INDEX idx_users_email ON public.users_resync(email);
-- Atomic swap (< 100ms downtime)
BEGIN;
ALTER TABLE public.users RENAME TO users_old;
ALTER TABLE public.users_resync RENAME TO users;
DROP TABLE public.users_old;
COMMIT;
Result:
- Original table remains available during copy
- Swap takes milliseconds
- Queries never see empty data
How Swap Resync Works
Create Shadow Table
BunnyDB creates a temporary table with the suffix _resync:
CREATE TABLE public.users_resync (
LIKE public.users INCLUDING DEFAULTS INCLUDING CONSTRAINTS
);
This table has the same schema but is initially empty.
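As an optional sanity check (plain PostgreSQL, not something BunnyDB requires), you can confirm the shadow table lists the same columns as the original:
-- Compare column definitions of the original and the shadow table
SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name IN ('users', 'users_resync')
ORDER BY table_name, ordinal_position;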
Copy Data to Shadow Table
Data is copied from source to the shadow table using PostgreSQL’s COPY:
-- Parallel workers copy partitions
COPY public.users_resync FROM PROGRAM '...' WITH (FORMAT binary);
During this phase, which may take minutes or hours, the original public.users table continues to serve read queries with existing data.
For large tables, BunnyDB partitions data based on partition_key and uses snapshot_max_parallel_workers to parallelize the copy.
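The exact chunking logic is internal to BunnyDB, but a rough sketch of range-splitting a numeric partition_key looks like the following (the bucket count of 20 is arbitrary here):
-- Illustrative only: split the id range into 20 chunks that parallel
-- COPY workers could scan independently.
SELECT width_bucket(id, bounds.min_id, bounds.max_id + 1, 20) AS chunk,
       MIN(id) AS from_id,
       MAX(id) AS to_id,
       COUNT(*) AS rows_in_chunk
FROM public.users,
     (SELECT MIN(id) AS min_id, MAX(id) AS max_id FROM public.users) AS bounds
GROUP BY chunk
ORDER BY chunk;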
Build Indexes on Shadow Table
If replicate_indexes: true, indexes are created on the shadow table:
CREATE INDEX idx_users_email ON public.users_resync(email);
CREATE INDEX idx_users_created_at ON public.users_resync(created_at);
CREATE UNIQUE INDEX idx_users_username ON public.users_resync(username);
This can take significant time for large tables but doesn’t affect the original table.
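If you want to watch these builds while they run, standard PostgreSQL progress reporting works on the destination (PostgreSQL 12+; not BunnyDB-specific):
-- Check index build progress on the shadow table
SELECT relid::regclass       AS table_name,
       index_relid::regclass AS index_name,
       phase,
       blocks_done,
       blocks_total
FROM pg_stat_progress_create_index;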
Pause CDC
Before swapping, CDC replication is paused to prevent conflicting writes:
-- Last LSN position is recorded
SELECT pg_current_wal_lsn();
Changes on the source accumulate in the replication slot while CDC is paused. For high-write tables, the swap should complete quickly to avoid excessive WAL buildup.
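To see how much WAL accumulates while CDC is paused, you can note the LSN at pause time and diff against it later (the LSN literal below is just an example value):
-- At pause time, record the current position
SELECT pg_current_wal_lsn();   -- e.g. 1A/2B3C4D50
-- Later, measure how much WAL the source has generated since then
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), '1A/2B3C4D50'::pg_lsn)
       ) AS wal_since_pause;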
Atomic Table Swap
The critical step: rename tables atomically within a single transaction:
BEGIN;
-- Rename old table to _old
ALTER TABLE public.users RENAME TO users_old;
-- Rename shadow table to production name
ALTER TABLE public.users_resync RENAME TO users;
-- Drop old table
DROP TABLE public.users_old;
COMMIT;
Key properties:
- The entire swap is atomic (all-or-nothing)
- Other transactions are briefly blocked during the rename (<100ms typically)
- Queries never see an empty or missing table
- No data loss occurs
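If you ever perform a swap like this by hand, a defensive variant (a general PostgreSQL technique, not necessarily what BunnyDB executes internally) is to cap how long the renames may wait for their ACCESS EXCLUSIVE lock, so a long-running reader makes the swap fail fast rather than queueing new queries behind it:
BEGIN;
-- Abort the swap after 2 seconds instead of blocking new queries behind the rename
SET LOCAL lock_timeout = '2s';
ALTER TABLE public.users RENAME TO users_old;
ALTER TABLE public.users_resync RENAME TO users;
DROP TABLE public.users_old;
COMMIT;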
Recreate Foreign Keys
If replicate_foreign_keys: true, foreign keys are recreated to reference the new table:
-- Drop FK on referencing table
ALTER TABLE orders DROP CONSTRAINT fk_orders_user_id;
-- Recreate FK pointing to new table
ALTER TABLE orders ADD CONSTRAINT fk_orders_user_id
FOREIGN KEY (user_id) REFERENCES public.users(id);
Resume CDC
CDC restarts from the preserved LSN, replaying changes that occurred during swap:
-- Replication slot reconnects
SELECT * FROM pg_logical_slot_get_changes('bunny_prod_to_staging', ...);
Configuring Swap Resync
Set the resync_strategy parameter when creating a mirror:
curl -X POST http://localhost:8112/v1/mirrors \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "prod_to_staging",
"source_peer": "source_db",
"destination_peer": "dest_db",
"resync_strategy": "swap",
"do_initial_snapshot": true,
"replicate_indexes": true,
"replicate_foreign_keys": true,
"table_mappings": [
{
"source_table": "public.users",
"destination_table": "public.users",
"partition_key": "id"
}
]
}'
The resync_strategy applies to both table-level resyncs and full mirror resyncs. It cannot be changed after mirror creation.
Triggering Swap Resync
Once configured, all resync operations use the swap strategy:
Table-Level Resync
curl -X POST http://localhost:8112/v1/mirrors/prod_to_staging/resync/public.users \
-H "Authorization: Bearer YOUR_TOKEN"Full Mirror Resync
curl -X POST http://localhost:8112/v1/mirrors/prod_to_staging/resync \
-H "Authorization: Bearer YOUR_TOKEN"Trade-offs
Advantages
| Benefit | Description |
|---|---|
| Zero read downtime | Original table remains queryable during copy |
| No empty result sets | Queries always return data (old or new) |
| Production-safe | Minimal impact on running applications |
| Atomic cutover | Clean transition with no partial states |
Disadvantages
| Trade-off | Description |
|---|---|
| 2x disk space | Shadow table doubles storage requirement temporarily |
| Slower execution | Creating shadow table + indexes + swap overhead |
| Brief lock during swap | Table briefly locked during rename (typically <100ms) |
| WAL accumulation | CDC paused during swap, WAL grows on source |
Performance Considerations
Disk Space Requirements
Ensure sufficient disk space before swap resync:
-- Check table size
SELECT pg_size_pretty(pg_total_relation_size('public.users'));
You need at least 2x this size available:
- Original table: 10GB
- Shadow table during copy: 10GB
- Total: 20GB
After swap completes and old table is dropped, space returns to 10GB.
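The same headroom estimate can be computed directly from the catalog:
-- Current size plus the temporary shadow copy you need room for
SELECT pg_size_pretty(pg_total_relation_size('public.users'))     AS current_size,
       pg_size_pretty(2 * pg_total_relation_size('public.users')) AS peak_during_resync;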
Optimizing Swap Speed
Partition large tables:
{
"source_table": "public.users",
"destination_table": "public.users",
"partition_key": "id"
}
BunnyDB divides the table by partition_key and copies partitions in parallel.
Increase parallel workers:
{
"snapshot_max_parallel_workers": 16,
"snapshot_num_rows_per_partition": 500000
}
Disable indexes temporarily (advanced):
For extremely large tables, consider setting replicate_indexes: false, performing the swap, then manually creating indexes afterward.
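A common follow-up in that case (a standard PostgreSQL technique, not a BunnyDB feature) is to build the indexes with CONCURRENTLY once the swap is done, so the now-live table stays fully readable and writable; the index definitions below are the earlier examples:
-- Run outside a transaction block; CONCURRENTLY does not allow one
CREATE INDEX CONCURRENTLY idx_users_email ON public.users (email);
CREATE INDEX CONCURRENTLY idx_users_created_at ON public.users (created_at);
CREATE UNIQUE INDEX CONCURRENTLY idx_users_username ON public.users (username);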
Minimizing Lock Duration
The table rename lock is typically very brief, but you can minimize it further:
- Run resync during low-traffic periods
- Ensure no long-running queries hold locks on the table
- Monitor active queries before swap:
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE query ILIKE '%users%' AND state = 'active';
Example: Production Resync
Suppose you have a 50GB users table that needs resync:
# 1. Check disk space
psql -h dest_db -d production -c "
SELECT pg_size_pretty(pg_total_relation_size('public.users'));
"
# Output: 50 GB
# Ensure at least 100 GB free on destination
# 2. Trigger swap resync
curl -X POST http://localhost:8112/v1/mirrors/prod_to_staging/resync/public.users \
-H "Authorization: Bearer YOUR_TOKEN"
# 3. Monitor progress
while true; do
STATUS=$(curl -s http://localhost:8112/v1/mirrors/prod_to_staging \
-H "Authorization: Bearer YOUR_TOKEN" | jq -r '.status')
echo "$(date): Status = $STATUS"
if [ "$STATUS" == "RUNNING" ]; then
echo "Resync complete!"
break
fi
sleep 30
done
# 4. Verify row count
psql -h source_db -d production -c "SELECT COUNT(*) FROM public.users;"
psql -h dest_db -d production -c "SELECT COUNT(*) FROM public.users;"
Timeline:
- T+0m: Resync triggered, shadow table created
- T+5m: Copying data (original table still serving reads)
- T+40m: Building indexes on shadow table
- T+45m: Atomic swap (table locked for ~50ms)
- T+45m: CDC resumes, replaying buffered changes
- T+46m: Fully caught up
Downtime: <100ms during atomic swap. Application users experience no data unavailability.
Monitoring Swap Resync
Check Mirror Status
curl -X GET http://localhost:8112/v1/mirrors/prod_to_staging \
-H "Authorization: Bearer YOUR_TOKEN"During shadow table copy:
{
"status": "RESYNCING",
"current_phase": "SNAPSHOT"
}
During atomic swap:
{
"status": "RESYNCING",
"current_phase": "SWAP"
}
Monitor Destination Disk Usage
# Check disk space during resync
psql -h dest_db -d production -c "
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename))
FROM pg_tables
WHERE tablename IN ('users', 'users_resync')
ORDER BY tablename;
"Output during resync:
schemaname | tablename | pg_size_pretty
------------+----------------+----------------
public | users | 50 GB
public | users_resync | 48 GB (growing)
Monitor WAL Accumulation on Source
-- On source database
SELECT slot_name,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS wal_lag
FROM pg_replication_slots
WHERE slot_name = 'bunny_prod_to_staging';
If WAL lag grows too large, the source disk may fill. Ensure adequate space on the source.
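On PostgreSQL 13 and newer, the slot itself reports how close it is to forcing WAL removal:
-- wal_status degrades to 'unreserved' or 'lost' and safe_wal_size shrinks as the
-- slot approaches max_slot_wal_keep_size
SELECT slot_name, wal_status, pg_size_pretty(safe_wal_size) AS safe_wal_size
FROM pg_replication_slots
WHERE slot_name = 'bunny_prod_to_staging';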
Comparison: Truncate vs Swap
| Aspect | Truncate | Swap |
|---|---|---|
| Read downtime | Minutes to hours | <100ms |
| Disk space | 1x table size | 2x table size (temporary) |
| Execution speed | Faster | Slower |
| Complexity | Simple | Complex (atomic swap) |
| Production safety | ❌ Not safe | ✅ Production-safe |
| Use case | Staging, development | Production, critical systems |
Best Practices
Always Use Swap for Production
For any system serving user traffic:
{
"resync_strategy": "swap"
}
Monitor Disk Space Alerts
Set up alerts for low disk space on the destination:
# Alert if free space < 20%
df -h /var/lib/postgresql/data
Test Swap Timing
Before production resync, test on staging to estimate duration:
# Staging test
time curl -X POST http://localhost:8112/v1/mirrors/staging_mirror/resync/public.users \
-H "Authorization: Bearer YOUR_TOKEN"This gives you expected swap duration for production planning.
Schedule During Low Traffic
Even though swap has minimal downtime, schedule resyncs during low-traffic windows:
- Reduced WAL accumulation
- Faster swap (fewer active queries)
- Lower risk of lock contention
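A quick pre-flight check of how busy the destination is before triggering the resync (the five-minute threshold is arbitrary):
-- Active queries and long-running transactions right now
SELECT COUNT(*) FILTER (WHERE state = 'active')                          AS active_queries,
       COUNT(*) FILTER (WHERE now() - xact_start > interval '5 minutes') AS long_transactions
FROM pg_stat_activity
WHERE datname = current_database();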
Troubleshooting
Swap Fails with “Insufficient Disk Space”
Cause: Not enough space for shadow table.
Solution: Free up space on the destination, or fall back to the truncate strategy (which requires recreating the mirror):
# Cannot change strategy for existing mirror
# Must drop and recreate with truncate, or free space
Swap Hangs During Rename
Cause: Long-running query holds lock on table.
Solution: Identify and terminate blocking query:
-- Find blocking queries
SELECT pid, query, state, query_start
FROM pg_stat_activity
WHERE query ILIKE '%users%' AND state = 'active';
-- Terminate if safe
SELECT pg_terminate_backend(12345);
Then retry resync.
Shadow Table Exists After Failed Swap
Cause: Swap failed partway through, leaving users_resync table.
Solution: Manually drop shadow table:
DROP TABLE IF EXISTS public.users_resync;
Then retry resync.
Next Steps
- Table-Level Resync - Understand resync operations
- Creating Your First Mirror - Configure resync strategy
- Pause & Resume - Control mirror execution