My Philosophy on Alerting, based on my observations while I was a Site Reliability Engineer at Google
Author: Rob Ewaschuk
Link : Google Docs
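This is the recommended reading on alerting practices from Prometheus, the open-source monitoring system that has been getting a lot of attention lately; see http://prometheus.io/docs/practices/alerting/
Key idea: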
Keep alerting simple, alert on symptoms, have good consoles to allow pinpointing causes, and avoid having pages where there is nothing to do.
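To make "alert on symptoms" a bit more concrete, here is a minimal Python sketch of my own. It is not taken from the Google doc or from Prometheus; the names `Window` and `should_page` and the thresholds are hypothetical. The idea is to page only when users actually see failures (error ratio), and not on a cause such as high CPU, which belongs on a console for diagnosis.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts for one evaluation window (hypothetical structure)."""
    requests: int  # total requests seen in the window
    errors: int    # requests that failed from the user's point of view


def should_page(window: Window,
                max_error_ratio: float = 0.01,
                min_requests: int = 100) -> bool:
    """Return True only for a user-visible symptom: a high error ratio.

    The thresholds are illustrative assumptions, not recommended values.
    """
    # Too little traffic to judge: avoid a page where there is nothing to do.
    if window.requests < min_requests:
        return False
    return window.errors / window.requests > max_error_ratio


if __name__ == "__main__":
    print(should_page(Window(requests=1000, errors=50)))  # 5% of requests fail
    print(should_page(Window(requests=1000, errors=5)))   # 0.5% of requests fail
```

Causes such as CPU load, queue depth, or a single dead replica would go on dashboards in this scheme, so whoever gets paged can pinpoint the problem after the symptom fires.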
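My takeaway:
All learning moves from knowledge to skill and finally to methodology, and ops skills are no exception. Fellow ops folks, get your alerting right and have a good New Year.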
1
dcoder 2015-02-20 02:14:10 +08:00
Took a look at Prometheus.
The visualization is Rails + SQL, which feels less capable than the popular ElasticSearch+Kibana: http://prometheus.io/docs/visualization/promdash/ By the way, its storage is built on LevelDB; is it easy to scale out horizontally? http://prometheus.io/docs/operating/storage/