SYNAPTIC BASTION
HA MISSION CONTROL — CHAPTER TACTICAL NETWORK
HIGH AVAILABILITY MISSION CONTROL · MUHAMMAD AMIN BIN ABD RAHMAN
CHAPTER COMMAND INTERFACE v6.0 // CLEARANCE REQUIRED
-INITIALIZING CLUSTER INTERFACE-
AWAITING CHAPTER RELAY ADDRESS...
SYNAPTIC-BASTION.MUHDAMINRAHMAN.COM · HTTPS · HA ENTRY POINT
Cluster health
of — nodes online
Avg latency
last poll
DB replicas
streaming WAL
Patroni leader
timeline —
Poll requests
— success rate
Next poll
never polled
Connecting to cluster...
NODE LATENCY — 5 MIN WINDOW
EVENT LOG
>>SYSTEM READY. AWAITING PROXY CONNECTION.
Nginx access log: last 30 app requests through LB
RELIABILITY TARGETS
METRIC | TARGET | RESULT | TIME
System Uptime | ≥99% | |
RTO | <5s | |
RPO | 0s | |
DB Failover | <30s (Patroni) | |
Failure Detection | ≤5s | |
Auto-Recovery | 100% | |
DB Replicas | ≥2 | |
PERFORMANCE TARGETS
METRIC | TARGET | RESULT | TIME
Latency p50 | ≤180ms | |
Latency p95 | ≤250ms | |
Throughput | ≥1000 req/s | |
Replication Lag | <50ms | |
Lag (peak) | <100ms | |
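The p50 and p95 latency targets can be checked against a set of measured request times. The following is a minimal sketch, assuming the nearest-rank percentile definition; the dashboard may compute percentiles differently, and the names `TARGETS_MS`, `percentile`, and `check_latency` are hypothetical.

```python
import math

# Latency targets from the performance table, in milliseconds.
TARGETS_MS = {"p50": 180.0, "p95": 250.0}


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency(samples_ms):
    """Return {metric: (measured_value, meets_target)} for p50 and p95."""
    results = {}
    for name, limit in TARGETS_MS.items():
        value = percentile(samples_ms, float(name[1:]))  # "p95" -> 95.0
        results[name] = (value, value <= limit)
    return results
```

With the illustrative samples `[100, 120, 150, 170, 300]`, the p50 value is 150 ms (within target) and the p95 value is 300 ms (over target), which shows how a single slow request can fail the p95 check on a small sample.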
PHASE 6 — FAILOVER & VALIDATION
TESTS PENDING
EVIDENCE LOG
NO EVIDENCE — RUN TESTS TO POPULATE
MANUAL PROBE
AWAITING INPUT
Configuration
Runs server-side via proxy — accurate results.
Results
Total
Success
Failed
Rate
Avg ms
Req/sec
Benchmark history
Time | n | c | Path | Success% | Avg ms | P50 | P95 | P99 | Req/sec
No runs yet.
WAL STREAMING TOPOLOGY
Connect to load topology
STREAMING REPLICAS
No data — click VERIFY
REPLICATION LAG
target <50ms
RPO
0s
synchronous WAL
REPLICAS ONLINE
of configured
PATRONI CLUSTER
Click REFRESH to load Patroni status
FAILURE INJECTION
⚠ APP KILL — SIGKILL ha-app process
Kills the C++ HTTP server; systemd restarts it within ~2s. Tests Nginx failover and service recovery. Does NOT affect PostgreSQL.
Connect to load nodes
⚠ DB KILL — Stop Patroni on leader (triggers election)
Stops Patroni on the selected node. If it is the leader, Patroni elects a new primary within ~15s. A RESTORE button appears next to any unreachable node — click it to bring Patroni back and rejoin the cluster.
Connect to load nodes
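Recovery time after either kill can be estimated by polling a health probe and timing the gap between the first failed poll and the first successful poll after it. The following is a minimal sketch, not the dashboard's own measurement code: `measure_recovery` and its parameters are hypothetical, and `probe` is any callable that returns True when the service answers.

```python
import time


def measure_recovery(probe, timeout_s=60.0, interval_s=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it fails and then succeeds again.

    Returns the observed outage duration in seconds, or None if no
    failure followed by a recovery was seen within `timeout_s`.
    The result is only accurate to within one polling interval.
    """
    start = clock()
    down_since = None
    while clock() - start < timeout_s:
        healthy = probe()
        now = clock()
        if not healthy and down_since is None:
            down_since = now              # first failed poll
        elif healthy and down_since is not None:
            return now - down_since       # first healthy poll after the outage
        sleep(interval_s)
    return None
```

The returned duration can then be compared against the RTO target (<5s) for an app kill or the DB failover target (<30s) for a Patroni leader stop. The `clock` and `sleep` arguments exist so the function can be exercised without real waiting.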
RECOVERY LOG
No chaos events yet.
MANUAL SERVICE COMMANDS
Node details
Connect to the cluster, then open this page to load node details.
ENLISTMENT GUIDE
HOW TO ADD A NEW NODE
01
CREATE VPS
Create a new VPS on Hetzner (or any provider).

Recommended: CX22 — Ubuntu 22.04, 2 vCPU, 4GB RAM.

Note the public IP and root password from the provider panel.
02
FILL THE FORM
Click ENLIST NODE below and fill in:

Node ID — unique name (e.g. hetzner4)
IP — public IP from step 01
SSH user — root
SSH password — root password from provider
DB role — Master or Replica
DB host — master's Tailscale IP (replicas only)
03
PROVISION & AUTHORISE
Click PROVISION & ENLIST and watch the progress.

When the Tailscale step appears, a URL will be shown. Open it in your browser to authorise the node on your Tailscale network.

Provisioning resumes automatically. All 11 steps complete in ~3 minutes.

After enlisting, configure etcd + Patroni for automatic DB failover via the Replication page.
WHAT IS AUTOMATED
✓ SSH key deployment
✓ Build dependencies (g++, libpqxx)
✓ Tailscale installation
✓ C++ app compilation
✓ systemd service setup
✓ PostgreSQL install & configuration
✓ Replication setup (master or replica)
✓ Nginx upstream update
✓ Keepalived LB failover (Hetzner 1 ↔ 2)
✓ Health check 503 on DB failure
✓ Health check verification
✓ Node registration
DB ROLE: MASTER
Sets up PostgreSQL with WAL replication enabled.
Creates appuser and appdb.
Opens port 5432 on Tailscale interface.
DB host: leave blank
DB ROLE: REPLICA
Takes a base backup from master.
Streams WAL automatically.
Read-only — serves /data reads.
DB host: master's Tailscale IP
DB ROLE: NONE
App-only node — no PostgreSQL.
Only runs the C++ HTTP server.
Useful for pure compute/LB nodes.
DB host: not required
NODE ROSTER
DOUBLE CHECK METHOD — FAILURE DETECTION
Outpost monitors external websites with a two-stage verification pipeline before declaring a failure — based on Naim, M.H. et al. (2025), "Double Check Method: An Enhancement of Heartbeat Failure Detection by Fog Devices Through Socket and Port Engagement" (SSRN 5099955). This reduces false-positive failure detection compared to single-shot heartbeat monitoring.
01
HEARTBEAT
Periodic HTTP request to the target URL. If it succeeds — healthy, no further action. If it fails, verification begins.
02
TIME CHECK
Wait a debounce window, then retry the heartbeat. If it recovers within the threshold — transient blip, false positive filtered. No alert raised.
03
QUORUM SOCKET CHECK
If still failing, every cluster node independently opens a raw TCP socket to host:port — not just one device.
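The three stages above can be sketched as a single function. This is an illustration under stated assumptions, not Outpost's implementation: the function names, the 3-second timeouts, the 5-second debounce, and the status strings are all hypothetical, and the per-node socket checks are modelled as local callables standing in for checks that would run on each cluster node.

```python
import http.client
import socket
import time
import urllib.request


def http_heartbeat(url, timeout_s=3.0):
    """Stage 1: True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors, timeouts, HTTP 4xx/5xx, and malformed URLs
        # all count as a failed heartbeat.
        return False


def tcp_check(host, port, timeout_s=3.0):
    """Stage 3 primitive: True if a raw TCP connection to host:port opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def double_check(heartbeat, socket_checks, debounce_s=5.0, sleep=time.sleep):
    """Run heartbeat, time check, then socket checks.

    Returns (status, votes): status is "HEALTHY", "RECOVERED", or "FAILING";
    votes holds one reachability result per socket check and is empty
    unless status is "FAILING".
    """
    if heartbeat():
        return "HEALTHY", []
    sleep(debounce_s)                      # stage 2: wait out a transient blip
    if heartbeat():
        return "RECOVERED", []             # false positive filtered, no alert
    return "FAILING", [check() for check in socket_checks]
```

A caller would pass something like `lambda: http_heartbeat(url)` as `heartbeat` and one `tcp_check` closure per node as `socket_checks`; in a real deployment each socket check would be dispatched to a different node rather than run locally.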
ENHANCEMENT — DISTRIBUTED QUORUM VERIFICATION
The original method performs the socket check from a single fog device — leaving it vulnerable to a false DOWN caused by that device's own localised network path, not the target. This implementation extends the method: the socket check runs from every cluster node in parallel (each acting as an independent fog device), and a verdict requires quorum agreement.

All nodes agree reachable → DEGRADED (confirmed app-layer issue, network fine everywhere)
All nodes agree unreachable → DOWN (confirmed outage, high confidence)
Split result → ambiguous, debounce window doubles and the quorum re-checks once before a majority-vote decision is made.
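The verdict rules can be sketched as follows. This is a minimal illustration, not the production logic: `quorum_verdict`, `majority_verdict`, and `decide` are hypothetical names, `run_quorum` stands in for whatever collects one reachability result per node, and resolving an exact tie to DOWN is an assumption because the text does not say how ties are handled.

```python
import time


def quorum_verdict(votes):
    """Map per-node socket results (True = reachable) to a verdict."""
    if not votes:
        raise ValueError("no votes")
    if all(votes):
        return "DEGRADED"   # network fine everywhere, app-layer issue
    if not any(votes):
        return "DOWN"       # unreachable from every node
    return "SPLIT"          # nodes disagree


def majority_verdict(votes):
    """Majority vote; an exact tie resolves to DOWN (assumption)."""
    reachable = sum(1 for v in votes if v)
    return "DEGRADED" if reachable > len(votes) - reachable else "DOWN"


def decide(run_quorum, debounce_s=5.0, sleep=time.sleep):
    """Apply the quorum rules, re-checking once after a split result."""
    verdict = quorum_verdict(run_quorum())
    if verdict != "SPLIT":
        return verdict
    sleep(debounce_s * 2)                   # debounce window doubles
    return majority_verdict(run_quorum())   # single re-check, then majority
```

A unanimous re-check yields the same answer under majority voting as under the unanimous rules, so the second round does not need a separate branch.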
ADD OUTPOST
0 MONITORED
MONITORED TARGETS
No outposts yet — add one above.