-
Monitoring & Status
-
Public Status Dashboard
- URL: https://cybernode.ai
- The status page shows the real-time health of all public services, along with a 90-day uptime history.
-
What's Monitored
-
Endpoint Health
-
| Endpoint | Check |
|---|---|
| RPC | HTTP 200 + valid response |
| LCD | HTTP 200 + valid JSON |
| GraphQL | Query execution success |
| IPFS Gateway | Content retrieval |
| cyb.ai | Page load + content check |
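
As an illustration of the "HTTP 200 + valid JSON" check in the table above, here is a minimal Python sketch. It is not the probe the stack actually runs (that role belongs to the Blackbox Exporter listed under Metrics Stack); the function names `check_http_json` and `probe` are hypothetical, and the URL passed to `probe` is whatever endpoint you want to test.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def check_http_json(status: int, body: bytes) -> Optional[str]:
    """Return None if the response passes, otherwise a short failure reason."""
    if status != 200:
        return f"unexpected status {status}"
    try:
        json.loads(body)
    except ValueError:
        # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return "body is not valid JSON"
    return None


def probe(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch url and apply the check; network errors count as failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return check_http_json(resp.status, resp.read())
    except urllib.error.HTTPError as exc:
        # urllib raises HTTPError for 4xx/5xx responses.
        return check_http_json(exc.code, b"")
    except OSError as exc:
        # URLError and socket timeouts are OSError subclasses.
        return f"request failed: {exc}"
```

Keeping the pass/fail decision in `check_http_json` separate from the network call makes the rule easy to test without a live endpoint.
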
-
SSL Certificates
- All endpoints are monitored for SSL certificate expiry; alerts fire 30 days before a certificate expires.
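
The 30-day expiry rule can be sketched in Python using only the standard library. This is an illustrative sketch, not the production check; the helper names (`fetch_not_after`, `days_until_expiry`, `should_alert`) are hypothetical, and only the 30-day window comes from the text above.

```python
import socket
import ssl
from datetime import datetime, timezone

ALERT_WINDOW_DAYS = 30  # alert window stated in this document


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the certificate's notAfter field, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between now and the certificate's notAfter timestamp."""
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


def should_alert(not_after: str, now: datetime) -> bool:
    """True once the certificate is within the alert window (or already expired)."""
    return days_until_expiry(not_after, now) <= ALERT_WINDOW_DAYS
```

A caller would typically pass `datetime.now(timezone.utc)` as `now`; taking it as a parameter keeps the rule deterministic in tests.
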
-
Blockchain Sync
- Block height is monitored to detect if nodes fall behind the network.
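
One way to express "falling behind the network" is to compare a node's block height with the highest height reported by a set of reference nodes. The sketch below assumes that approach; the threshold `MAX_LAG_BLOCKS` and both function names are hypothetical and are not taken from the actual alert rules.

```python
from typing import Sequence

MAX_LAG_BLOCKS = 10  # hypothetical threshold, tune per chain block time


def blocks_behind(node_height: int, reference_heights: Sequence[int]) -> int:
    """How many blocks the node trails the highest reference height (never negative)."""
    if not reference_heights:
        raise ValueError("at least one reference height is required")
    return max(0, max(reference_heights) - node_height)


def is_behind(node_height: int,
              reference_heights: Sequence[int],
              max_lag: int = MAX_LAG_BLOCKS) -> bool:
    """True if the node lags the network by more than max_lag blocks."""
    return blocks_behind(node_height, reference_heights) > max_lag
```

Allowing a small lag avoids alerting on the one or two blocks of difference that normal propagation delay produces.
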
-
IBC Relayer
- Wallet balances and packet relay success rate are monitored.
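
The two relayer signals can be combined into a single health evaluation. The following Python sketch assumes the relayer exposes a wallet balance and counts of relayed and failed packets; the `RelayerSample` type, the function names, and any thresholds passed in are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RelayerSample:
    balance: int          # wallet balance in the chain's smallest denomination
    packets_relayed: int  # packets relayed successfully in the window
    packets_failed: int   # packets that failed to relay in the window


def success_rate(sample: RelayerSample) -> float:
    """Fraction of packets relayed successfully; 1.0 when there was no traffic."""
    total = sample.packets_relayed + sample.packets_failed
    return 1.0 if total == 0 else sample.packets_relayed / total


def relayer_problems(sample: RelayerSample,
                     min_balance: int,
                     min_success_rate: float) -> List[str]:
    """Return a list of human-readable problems; empty means healthy."""
    problems = []
    if sample.balance < min_balance:
        problems.append("wallet balance below minimum")
    if success_rate(sample) < min_success_rate:
        problems.append("packet relay success rate below minimum")
    return problems
```

Treating an idle window as a 100% success rate is a design choice that avoids false alerts when no packets are pending; a stricter setup might alert on prolonged inactivity instead.
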
-
Dashboards
- Public Grafana dashboards (no login required):
-
Metrics Stack
-
| Component | Purpose |
|---|---|
| Prometheus | Time-series metrics collection |
| Grafana | Visualization and alerting |
| Blackbox Exporter | HTTP/SSL endpoint probing |
| Node Exporter | Server hardware metrics |
-
Alert Categories
-
Infrastructure Alerts
- Disk space, RAM, CPU, system load
- Block counter stalls (node not producing blocks)
- ZFS pool health
- GPU status (required for consensus)
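
A block counter stall, as listed above, can be detected by measuring how long the latest block height has gone unchanged. This Python sketch assumes periodic `(unix_time, height)` samples; the function names and the stall threshold are hypothetical, not the values used in the real alert rules.

```python
from typing import Sequence, Tuple


def seconds_at_latest_height(samples: Sequence[Tuple[float, int]]) -> float:
    """Given (unix_time, height) samples ordered oldest to newest, return how
    long the most recent height has been observed without changing."""
    if not samples:
        raise ValueError("at least one sample is required")
    latest_time, latest_height = samples[-1]
    first_seen = latest_time
    # Walk backwards while the height still equals the latest one.
    for sample_time, height in reversed(samples):
        if height != latest_height:
            break
        first_seen = sample_time
    return latest_time - first_seen


def is_stalled(samples: Sequence[Tuple[float, int]], max_idle_seconds: float) -> bool:
    """True if the height has not advanced for longer than max_idle_seconds."""
    return seconds_at_latest_height(samples) > max_idle_seconds
```

The threshold should be several times the chain's expected block time so that a single slow block does not raise an alert.
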
-
Service Alerts
- API endpoint availability
- SSL certificate expiry
- IPFS gateway responsiveness
- IBC relayer wallet balance
-
Uptime Targets
-
| Service | Target |
|---|---|
| RPC/LCD | 99.9% |
| GraphQL | 99.5% |
| IPFS Gateway | 99% |
| cyb.ai | 99.9% |
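
To make the targets above concrete, each percentage can be converted into an allowed-downtime budget. The sketch below assumes a 30-day measurement window, which this document does not specify; the function name is hypothetical.

```python
def allowed_downtime_minutes(target_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime permitted in the window while still meeting the target."""
    window_minutes = window_days * 24 * 60
    return (100 - target_percent) * window_minutes / 100


# Targets from the table above, over an assumed 30-day window.
for service, target in [("RPC/LCD", 99.9), ("GraphQL", 99.5),
                        ("IPFS Gateway", 99.0), ("cyb.ai", 99.9)]:
    print(f"{service}: {allowed_downtime_minutes(target):.1f} min")
```

Under that assumption, 99.9% allows roughly 43 minutes of downtime per 30 days, 99.5% allows about 216 minutes, and 99% allows about 432 minutes.
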
-
Incident Response
- Alerts are routed to the infrastructure team via Telegram.
- Critical services auto-restart on failure.
- ZFS snapshots enable quick rollback if needed.
-
Historical Data
- Prometheus retains 90 days of metrics history, enabling:
- Trend analysis
- Capacity planning
- Post-incident investigation
-