Monitoring & Status

Back to bostrom infrastructure

Public Status Dashboard

URL: https://cybernode.ai
The status page shows real-time health of all public services with 90-day uptime history.

What's Monitored

Endpoint Health

Endpoint	Check
RPC	HTTP 200 + valid response
LCD	HTTP 200 + valid JSON
GraphQL	Query execution success
IPFS Gateway	Content retrieval
cyb.ai	Page load + content check

SSL Certificates
- All endpoints are monitored for SSL certificate expiry with alerts 30 days before expiration.
Blockchain Sync
- Block height is monitored to detect if nodes fall behind the network.
IBC Relayer
- Wallet balances and packet relay success rate are monitored.

Dashboards

Public Grafana dashboards (no login required):
- HTTPS Endpoints Status: https://cybernode.ai/grafana/public-dashboards/48ffa0bb018e424bb6aa71c2bcab42c9
- Real-time HTTP probe results for all public endpoints

Metrics Stack

Component	Purpose
Prometheus	Time-series metrics collection
Grafana	Visualization and alerting
Blackbox Exporter	HTTP/SSL endpoint probing
Node Exporter	Server hardware metrics

Alert Categories

Infrastructure Alerts
- Disk space, RAM, CPU, system load
- Block counter stalls (node not producing blocks)
- ZFS pool health
- GPU status (required for consensus)
Service Alerts
- API endpoint availability
- SSL certificate expiry
- IPFS gateway responsiveness
- IBC relayer wallet balance

Uptime Targets

Service	Target
RPC/LCD	99.9%
GraphQL	99.5%
IPFS Gateway	99%
cyb.ai	99.9%

Incident Response

Alerts are routed to the infrastructure team via Telegram.
Critical services auto-restart on failure.
ZFS snapshots enable quick rollback if needed.

Historical Data

Prometheus retains 90 days of metrics history, enabling:
- Trend analysis
- Capacity planning
- Post-incident investigation

Related

Linked References

Local Graph