Monitoring & Status
Public Status Dashboard
- URL: https://cybernode.ai
- The status page shows real-time health of all public services with 90-day uptime history.
What's Monitored
| Endpoint | Check |
|---|---|
| RPC | HTTP 200 + valid response |
| LCD | HTTP 200 + valid JSON |
| GraphQL | Query execution success |
| IPFS Gateway | Content retrieval |
| cyb.ai | Page load + content check |
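
The LCD row above ("HTTP 200 + valid JSON") can be illustrated with a small pass/fail function. This is only a sketch of the check logic, not the actual probe configuration: the names `ProbeResult` and `check_json_endpoint` are hypothetical, and the function takes an already-fetched status code and body so it can be tried without network access.

```python
import json
from dataclasses import dataclass


@dataclass
class ProbeResult:
    ok: bool
    reason: str


def check_json_endpoint(status_code: int, body: str) -> ProbeResult:
    """Pass only if the response is HTTP 200 and the body parses as JSON."""
    if status_code != 200:
        return ProbeResult(False, f"unexpected status {status_code}")
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return ProbeResult(False, f"invalid JSON: {exc.msg}")
    return ProbeResult(True, "ok")
```

A 200 response carrying an HTML error page fails this check, which is the case a plain status-code probe would miss.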
SSL Certificates
- SSL certificate expiry is monitored on all endpoints, and an alert fires 30 days before a certificate expires.
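
As a rough illustration of the 30-day rule, the sketch below decides whether a certificate is inside the alert window. It assumes the expiry is available as a `notAfter` string in the format Python's `ssl` module returns from `getpeercert()`. The helper names `parse_not_after` and `expiry_alert` are hypothetical, and the production check is done by the monitoring stack, not by this code.

```python
import ssl
from datetime import datetime, timedelta, timezone

ALERT_WINDOW = timedelta(days=30)  # from the text: alert 30 days before expiry


def parse_not_after(not_after: str) -> datetime:
    """Convert a certificate 'notAfter' string into an aware UTC datetime."""
    seconds = ssl.cert_time_to_seconds(not_after)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def expiry_alert(not_after: str, now: datetime) -> bool:
    """True when the certificate expires within the alert window."""
    return parse_not_after(not_after) - now <= ALERT_WINDOW
```

Passing `now` explicitly keeps the function deterministic. A live check would pass `datetime.now(timezone.utc)`.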
Blockchain Sync
- Each node's block height is monitored to detect when it falls behind the network.
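
One way to picture the lag check: take the highest block height seen across the monitored nodes as the network tip and flag any node that trails it by more than a threshold. The function name `lagging_nodes`, the node names, and the `max_lag` default of 20 blocks are all assumptions for illustration. The real alert threshold is not stated here.

```python
def lagging_nodes(heights: dict[str, int], max_lag: int = 20) -> list[str]:
    """Return the names of nodes more than max_lag blocks behind the tip.

    The tip is approximated by the highest height among the given nodes
    (an assumption; an external reference node could be used instead).
    """
    if not heights:
        return []
    tip = max(heights.values())
    return sorted(name for name, h in heights.items() if tip - h > max_lag)


# Example with made-up heights: only "node-c" trails the tip by more than 20.
print(lagging_nodes({"node-a": 1000, "node-b": 995, "node-c": 900}))
```

Because the tip comes from the same set of nodes, this comparison cannot detect the case where every node stops at once. That case needs a separate stall check.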
IBC Relayer
- Wallet balances and packet relay success rate are monitored.
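
The two relayer signals can be combined into a simple evaluation, sketched below. The `RelayerSnapshot` type, the function names, and both thresholds are hypothetical. The actual metric names and alert levels used for the relayer are not given here.

```python
from dataclasses import dataclass


@dataclass
class RelayerSnapshot:
    wallet_balance: int   # in the chain's base denomination
    packets_relayed: int  # packets relayed successfully in the window
    packets_failed: int   # packets that failed in the same window


def relay_success_rate(s: RelayerSnapshot) -> float:
    """Fraction of attempted packets that succeeded (1.0 if none attempted)."""
    total = s.packets_relayed + s.packets_failed
    return 1.0 if total == 0 else s.packets_relayed / total


def relayer_alerts(s: RelayerSnapshot, min_balance: int,
                   min_success_rate: float = 0.95) -> list[str]:
    """List the alert conditions this snapshot triggers (assumed thresholds)."""
    alerts = []
    if s.wallet_balance < min_balance:
        alerts.append("low wallet balance")
    if relay_success_rate(s) < min_success_rate:
        alerts.append("low relay success rate")
    return alerts
```

Watching the balance matters because a relayer wallet that cannot pay fees stops relaying entirely, which the success rate alone would only show after packets start failing.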
Dashboards
- Public Grafana dashboards (no login required):
- HTTPS Endpoints Status: https://cybernode.ai/grafana/public-dashboards/48ffa0bb018e424bb6aa71c2bcab42c9
- Real-time HTTP probe results for all public endpoints
Metrics Stack
| Component | Purpose |
|---|---|
| Prometheus | Time-series metrics collection |
| Grafana | Visualization and alerting |
| Blackbox Exporter | HTTP/SSL endpoint probing |
| Node Exporter | Server hardware metrics |
Alert Categories
Infrastructure Alerts
- Disk space, RAM, CPU, system load
- Block counter stalls (node not producing blocks)
- ZFS pool health
- GPU status (required for consensus)
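
The block counter stall alert from the list above can be sketched as follows: given periodic height samples, a node counts as stalled when its latest height has not changed for longer than some limit. The function name `is_stalled` and the 60-second default are assumptions, since the real alert window is not documented here.

```python
def is_stalled(samples: list[tuple[float, int]],
               max_idle_seconds: float = 60.0) -> bool:
    """samples: (unix_time, block_height) pairs ordered oldest to newest."""
    if len(samples) < 2:
        return False  # not enough data to judge
    latest_time, latest_height = samples[-1]
    # Walk backwards to find when the current height was first observed.
    first_seen = latest_time
    for ts, height in reversed(samples):
        if height != latest_height:
            break
        first_seen = ts
    return latest_time - first_seen > max_idle_seconds
```

In a Prometheus setup the same idea is usually expressed as a rule on the change of a height metric over a time range. The Python version only shows the logic.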
Service Alerts
- API endpoint availability
- SSL certificate expiry
- IPFS gateway responsiveness
- IBC relayer wallet balance
Uptime Targets
| Service | Target |
|---|---|
| RPC/LCD | 99.9% |
| GraphQL | 99.5% |
| IPFS Gateway | 99% |
| cyb.ai | 99.9% |
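
To make the targets concrete, each percentage can be converted into a downtime budget. The measurement window is not stated in the table, so the 30-day default below is an assumption, and `allowed_downtime_minutes` is a hypothetical helper name.

```python
def allowed_downtime_minutes(target_percent: float, window_days: int = 30) -> float:
    """Downtime budget in minutes for an uptime target over a window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - target_percent / 100)


for service, target in [("RPC/LCD", 99.9), ("GraphQL", 99.5),
                        ("IPFS Gateway", 99.0), ("cyb.ai", 99.9)]:
    print(f"{service}: {allowed_downtime_minutes(target):.1f} min per 30 days")
```

Over 30 days this comes to about 43.2 minutes at 99.9%, 216 minutes at 99.5%, and 432 minutes at 99%.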
Incident Response
- Alerts are routed to the infrastructure team via Telegram.
- Critical services auto-restart on failure.
- ZFS snapshots enable quick rollback if needed.
Historical Data
- Prometheus retains 90 days of metrics history, enabling:
- Trend analysis
- Capacity planning
- Post-incident investigation