Synaptic Bastion — HA Mission Control

SYNAPTIC BASTION

HIGH AVAILABILITY MISSION CONTROL

--:--:-- UTC

■OFFLINE

HA MISSION CONTROL — CHAPTER TACTICAL NETWORK

HIGH AVAILABILITY MISSION CONTROL · MUHAMMAD AMIN BIN ABD RAHMAN

CHAPTER COMMAND INTERFACE v6.0 // CLEARANCE REQUIRED

-INITIALIZING CLUSTER INTERFACE-

AWAITING CHAPTER RELAY ADDRESS...

Hetzner Public IP

SYNAPTIC-BASTION.MUHDAMINRAHMAN.COM · HTTPS · HA ENTRY POINT

Cluster health

—

of — nodes online

Avg latency

—

last poll

DB replicas

—

streaming WAL

Patroni leader

—

timeline —

Poll requests

—

— success rate

Next poll

—

never polled

Connecting to cluster...

NODE LATENCY — 5 MIN WINDOW

EVENT LOG

—

>>SYSTEM READY. AWAITING PROXY CONNECTION.

Nginx access log: last 30 app requests through the LB

RELIABILITY TARGETS

METRIC	TARGET	RESULT	TIME
System Uptime	≥99%	—	—
RTO	<5s	—	—
RPO	0s	—	—
DB Failover	<30s (Patroni)	—	—
Failure Detection	≤5s	—	—
Auto-Recovery	100%	—	—
DB Replicas	≥2	—	—

PERFORMANCE TARGETS

METRIC	TARGET	RESULT	TIME
Latency p50	≤180ms	—	—
Latency p95	≤250ms	—	—
Throughput	≥1000 req/s	—	—
Replication Lag	<50ms	—	—
Lag (peak)	<100ms	—	—

PHASE 6 — FAILOVER & VALIDATION

TESTS PENDING

EVIDENCE LOG

NO EVIDENCE — RUN TESTS TO POPULATE

MANUAL PROBE

Node

Path

AWAITING INPUT

Configuration

Requests (n)

Concurrency (c)

Endpoint

Runs server-side via proxy — accurate results.

Results

Total

—

Success

—

Failed

—

Rate

—

Avg ms

—

Req/sec

—

Benchmark history

Time	n	c	Path	Success%	Avg ms	P50	P95	P99	Req/sec
No runs yet.
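
The columns in the benchmark history (Success%, Avg ms, P50, P95, P99, Req/sec) can be derived from the raw per-request latencies of one run. The sketch below is one possible way to compute them and is not taken from the dashboard's own code: the function names, the nearest-rank percentile definition, and the choice to count failed requests towards Req/sec are all assumptions made for illustration.

```python
import math


def percentile(sorted_ms, p):
    """Nearest-rank percentile of an ascending list of latencies (p in 1..100)."""
    if not sorted_ms:
        raise ValueError("no samples")
    rank = math.ceil(p * len(sorted_ms) / 100)  # 1-based rank
    return sorted_ms[max(rank, 1) - 1]


def summarize(latencies_ms, failed, wall_seconds):
    """Summarize one benchmark run.

    latencies_ms: latencies of the successful requests, in milliseconds
    failed:       number of requests that did not succeed
    wall_seconds: wall-clock duration of the whole run
    """
    ok = sorted(latencies_ms)
    total = len(ok) + failed
    return {
        "total": total,
        "success": len(ok),
        "failed": failed,
        "success_pct": 100.0 * len(ok) / total if total else 0.0,
        "avg_ms": sum(ok) / len(ok) if ok else None,
        "p50": percentile(ok, 50) if ok else None,
        "p95": percentile(ok, 95) if ok else None,
        "p99": percentile(ok, 99) if ok else None,
        # Assumption: throughput counts every request, failed or not.
        "req_per_sec": total / wall_seconds if wall_seconds > 0 else None,
    }
```

Nearest-rank always returns a latency that was actually measured, which keeps small runs easy to check by hand. Interpolating percentile definitions give slightly different values for the same data.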

WAL STREAMING TOPOLOGY

—

Connect to load topology

STREAMING REPLICAS

No data — click VERIFY

REPLICATION LAG

—

target <50ms

RPO

0s

synchronous WAL

REPLICAS ONLINE

—

of — configured

PATRONI CLUSTER

—

Click REFRESH to load Patroni status

FAILURE INJECTION

⚠ APP KILL — SIGKILL ha-app process

Kills the C++ HTTP server; systemd restarts it automatically within ~2s. Tests Nginx failover and service recovery. Does NOT affect PostgreSQL.

Connect to load nodes

⚠ DB KILL — Stop Patroni on leader (triggers election)

Stops Patroni on the selected node. If it is the leader, Patroni elects a new primary within ~15s. A RESTORE button appears next to any unreachable node — click it to bring Patroni back and rejoin the cluster.

Connect to load nodes

RECOVERY LOG

No chaos events yet.

MANUAL SERVICE COMMANDS

Node details

Connect to the cluster, then open this page to load node details.

ENLISTMENT GUIDE

HOW TO ADD A NEW NODE

01

CREATE VPS

Create a new VPS on Hetzner (or any provider).

Recommended: CX22 — Ubuntu 22.04, 2 vCPU, 4GB RAM.

Note the public IP and root password from the provider panel.

02

FILL THE FORM

Click ENLIST NODE below and fill in:

• Node ID — unique name (e.g. hetzner4)
• IP — public IP from step 01
• SSH user — root
• SSH password — root password from provider
• DB role — Master or Replica
• DB host — master's Tailscale IP (replicas only)

03

PROVISION & AUTHORISE

Click PROVISION & ENLIST and watch the progress.

When the Tailscale step appears, a URL will be shown. Open it in your browser to authorise the node on your Tailscale network.

Provisioning resumes automatically. All 11 steps complete in ~3 minutes.

After enlisting, configure etcd + Patroni for automatic DB failover via the Replication page.

WHAT IS AUTOMATED

✓ SSH key deployment

✓ Build dependencies (g++, libpqxx)

✓ Tailscale installation

✓ C++ app compilation

✓ systemd service setup

✓ PostgreSQL install & configuration

✓ Replication setup (master or replica)

✓ Nginx upstream update

✓ Keepalived LB failover (Hetzner 1 ↔ 2)

✓ Health check 503 on DB failure

✓ Health check verification

✓ Node registration

DB ROLE: MASTER

Sets up PostgreSQL with WAL replication enabled.
Creates appuser and appdb.
Opens port 5432 on Tailscale interface.
DB host: leave blank

DB ROLE: REPLICA

Takes a base backup from master.
Streams WAL automatically.
Read-only — serves /data reads.
DB host: master's Tailscale IP

DB ROLE: NONE

App-only node — no PostgreSQL.
Only runs the C++ HTTP server.
Useful for pure compute/LB nodes.
DB host: not required
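
The DB host rule differs per role: blank for a master, the master's Tailscale IP for a replica, and not required for an app-only node. The sketch below shows one way a form could check this before provisioning starts. It is an illustration only: `EnlistForm`, `validate`, the lowercase role strings, and the error messages are hypothetical names, not the dashboard's actual code.

```python
import ipaddress
from dataclasses import dataclass

ROLES = ("master", "replica", "none")  # assumed spellings of the three DB roles


@dataclass
class EnlistForm:
    node_id: str
    ip: str
    ssh_user: str
    db_role: str
    db_host: str = ""


def _is_ip(value):
    """Return True if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def validate(form):
    """Return a list of problems; an empty list means the form is acceptable."""
    errors = []
    if not form.node_id.strip():
        errors.append("node_id is required")
    if not _is_ip(form.ip):
        errors.append("ip must be a valid IP address")
    if form.db_role not in ROLES:
        errors.append("db_role must be one of: " + ", ".join(ROLES))
    elif form.db_role == "replica" and not _is_ip(form.db_host):
        errors.append("db_host must be the master's Tailscale IP for a replica")
    elif form.db_role == "master" and form.db_host:
        errors.append("db_host must be left blank for a master")
    # For role "none" db_host is not required, so any value is ignored.
    return errors
```

For example, `validate(EnlistForm("hetzner4", "203.0.113.10", "root", "replica"))` reports the missing DB host. The addresses used here are documentation placeholders, not real nodes.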

NODE ROSTER

—

DOUBLE CHECK METHOD — FAILURE DETECTION

Outpost monitors external websites and passes every failed heartbeat through a two-stage verification pipeline before declaring a failure. The approach is based on Naim, M.H. et al. (2025), "Double Check Method: An Enhancement of Heartbeat Failure Detection by Fog Devices Through Socket and Port Engagement" (SSRN 5099955), and reduces false-positive failure detection compared to single-shot heartbeat monitoring.

01

HEARTBEAT

Periodic HTTP request to the target URL. If it succeeds — healthy, no further action. If it fails, verification begins.

02

TIME CHECK

Wait a debounce window, then retry the heartbeat. If it recovers within the threshold — transient blip, false positive filtered. No alert raised.

03

QUORUM SOCKET CHECK

If still failing, every cluster node independently opens a raw TCP socket to host:port — not just one device.
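
The socket check in step 03 amounts to attempting a plain TCP connection to host:port and recording whether it succeeds. A minimal sketch using Python's standard `socket` module is shown below. The function name `tcp_reachable` and the 3-second default timeout are assumptions for illustration, not values taken from Outpost.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    Only the transport layer is tested: no HTTP request is sent, so a
    target whose web app is broken but whose port still accepts
    connections counts as reachable.
    """
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Each cluster node would run this check independently and report its boolean result for the quorum decision.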

ENHANCEMENT — DISTRIBUTED QUORUM VERIFICATION

The original method performs the socket check from a single fog device — leaving it vulnerable to a false DOWN caused by that device's own localised network path, not the target. This implementation extends the method: the socket check runs from every cluster node in parallel (each acting as an independent fog device), and a verdict requires quorum agreement.

• All nodes agree reachable → DEGRADED (confirmed app-layer issue, network fine everywhere)
• All nodes agree unreachable → DOWN (confirmed outage, high confidence)
• Split result → ambiguous: the debounce window doubles and the quorum re-checks once before a majority-vote decision is made.
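
The verdict rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not Outpost's implementation: `probe_all` is a hypothetical callable that returns one boolean per cluster node (True meaning the TCP socket check succeeded), and the tie-break towards DOWN on an even split is an assumption, since the description does not say how ties are resolved.

```python
import time
from enum import Enum


class Verdict(Enum):
    DEGRADED = "DEGRADED"    # port reachable everywhere: app-layer issue
    DOWN = "DOWN"            # port unreachable everywhere: confirmed outage
    AMBIGUOUS = "AMBIGUOUS"  # nodes disagree


def classify(votes):
    """Map one round of per-node reachability votes to a verdict."""
    if not votes:
        raise ValueError("at least one node vote is required")
    if all(votes):
        return Verdict.DEGRADED
    if not any(votes):
        return Verdict.DOWN
    return Verdict.AMBIGUOUS


def decide(probe_all, debounce_s, sleep=time.sleep):
    """Run the quorum check, re-checking once if the first round is split."""
    verdict = classify(probe_all())
    if verdict is not Verdict.AMBIGUOUS:
        return verdict
    sleep(2 * debounce_s)  # the debounce window doubles before the re-check
    votes = probe_all()
    verdict = classify(votes)
    if verdict is not Verdict.AMBIGUOUS:
        return verdict
    reachable = sum(1 for v in votes if v)
    # Majority vote; an exact tie falls back to DOWN (assumption).
    return Verdict.DEGRADED if 2 * reachable > len(votes) else Verdict.DOWN
```

Passing `sleep` as a parameter keeps the function testable without real delays. In production the default `time.sleep` applies.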

ADD OUTPOST

0 MONITORED

Name

URL

Port override (optional)

Check interval (sec)

MONITORED TARGETS

No outposts yet — add one above.