Date: Sun, 11 Dec 2016 20:35:32 +0100
From: Ed Schouten <ed@nuxi.nl>
To: hackers@freebsd.org
Subject: Sysctl as a Service, or: making sysctl(3) more friendly for monitoring systems
Message-ID: <CABh_MKk87hJTsu1ETX8Ffq9E8gqRPELeSEKzf1jKk_wwUROgAw@mail.gmail.com>
Hi there,

The last couple of months I've been playing around with a monitoring
system called Prometheus (https://prometheus.io/). In short, Prometheus
works like this:

----- If you already know Prometheus, skip this -----

1. For the thing you want to monitor, you either integrate the
   Prometheus client library into your codebase or you write a separate
   exporter process. The client library or the exporter process then
   exposes key metrics of your application over HTTP. Simplified example:

   $ curl http://localhost:12345/metrics
   # HELP open_file_descriptors The number of files opened by the process
   # TYPE open_file_descriptors gauge
   open_file_descriptors 12
   # HELP http_requests The number of HTTP requests received.
   # TYPE http_requests counter
   http_requests{result="2xx"} 100
   http_requests{result="4xx"} 14
   http_requests{result="5xx"} 0

2. You fire up Prometheus and configure it to scrape and store all of
   the things you want to monitor. Prometheus can then add more labels
   to the metrics it scrapes, so the example above may get transformed
   by Prometheus to look like this:

   open_file_descriptors{job="nginx",instance="web1.mycompany.com"} 12
   http_requests{job="nginx",instance="web1.mycompany.com",result="2xx"} 100
   http_requests{job="nginx",instance="web1.mycompany.com",result="4xx"} 14
   http_requests{job="nginx",instance="web1.mycompany.com",result="5xx"} 0

   Fun fact: Prometheus can also scrape Prometheus, so if you operate
   multiple datacenters, you can let a global instance scrape a per-DC
   instance and add a dc="..." label to all metrics.

3. After scraping data for some time, you can do fancy queries like these:

   - Compute the 5-minute rate of HTTP requests per server and per HTTP
     error code:

         rate(http_requests[5m])

   - Compute the 5-minute rate of all HTTP requests on the entire cluster:

         sum(rate(http_requests[5m]))

   - Same as the above, but aggregated by HTTP error code:

         sum(rate(http_requests[5m])) by (result)

   Prometheus can do alerting as well, by using these expressions as
   matchers.

4. Set up Grafana and voila: you can create fancy dashboards!

----- If you skipped the introduction, start reading here -----

The Prometheus folks have developed a tool called the node_exporter
(https://github.com/prometheus/node_exporter). Basically it extracts a
whole bunch of interesting system-related metrics (disk usage, network
I/O, etc.) through sysctl(3), invoking ioctl(2), parsing /proc files,
etc. and exposes that information using Prometheus' syntax.

The other day I was thinking: in a certain way, the node exporter is a
bit of a redundant tool on the BSDs. Instead of needing to write custom
collectors for every kernel subsystem, we could write a generic exporter
that converts the entire sysctl(3) tree to Prometheus metrics, which is
exactly what I'm experimenting with here:

https://github.com/EdSchouten/prometheus_sysctl_exporter

An example of what this tool's output looks like:

$ ./prometheus_sysctl_exporter
...
# HELP sysctl_kern_maxfiles Maximum number of files
sysctl_kern_maxfiles 1043382
# HELP sysctl_kern_openfiles System-wide number of open files
sysctl_kern_openfiles 316
...

You could use this to write alerting rules like this:

ALERT FileDescriptorUsageHigh
  IF sysctl_kern_openfiles / sysctl_kern_maxfiles > 0.5
  FOR 15m
  ANNOTATIONS {
    description = "More than half of all FDs are in use!",
  }

There you go: access to a very large number of metrics without too much
effort. (A simplified sketch of what the conversion boils down to
follows below.)

My main question here is: are there any people in here interested in
seeing something like this being developed into something usable? If so,
let me know and I'll pursue this further.
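To give a rough feel for what the conversion boils down to, here is a
simplified sketch. It is not the actual tool (which enumerates the whole
MIB tree); it just reads two well-known integer sysctls with
sysctlbyname(3) and prints them in Prometheus' text exposition format:

/*
 * Simplified illustration only: the real exporter walks the entire
 * sysctl(3) tree; this hard-codes two integer OIDs.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <err.h>
#include <stdio.h>

static int
get_int(const char *name)
{
	int value;
	size_t len = sizeof(value);

	if (sysctlbyname(name, &value, &len, NULL, 0) == -1)
		err(1, "sysctlbyname(%s)", name);
	return (value);
}

int
main(void)
{
	/* Dots in the MIB name become underscores in the metric name. */
	printf("# HELP sysctl_kern_maxfiles Maximum number of files\n");
	printf("# TYPE sysctl_kern_maxfiles gauge\n");
	printf("sysctl_kern_maxfiles %d\n", get_int("kern.maxfiles"));
	printf("# HELP sysctl_kern_openfiles System-wide number of open files\n");
	printf("# TYPE sysctl_kern_openfiles gauge\n");
	printf("sysctl_kern_openfiles %d\n", get_int("kern.openfiles"));
	return (0);
}

Hooked up to inetd(8) or any small HTTP server, plain-text output like
this is all Prometheus needs in order to scrape a host.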
I also have a couple of technical questions related to sysctl(3)'s
in-kernel design:

- Prometheus differentiates between gauges (memory usage), counters
  (number of HTTP requests), histograms (per-RPC latency stats), etc.,
  while sysctl(3) does not. It would be nice if we could have that info
  on a per-sysctl basis. Mind if I add a CTLFLAG_GAUGE, CTLFLAG_COUNTER,
  etc.?

- Semantically, sysctl(3) and Prometheus are slightly different.
  Consider this sysctl:

      hw.acpi.thermal.tz0.temperature: 27.8C

  My tool currently converts this metric's name to
  sysctl_hw_acpi_thermal_tz0_temperature. This is suboptimal, as it
  would ideally be called sysctl_hw_acpi_thermal_temperature{sensor="tz0"}.
  Otherwise you wouldn't be able to write generic alerting rules, use
  aggregation in queries, etc.

  I was thinking: we could quite easily do such a translation by
  attaching labels to SYSCTL_NODE objects. That is, the hw.acpi.thermal
  node would get a label "sensor", and any OID placed underneath this
  node would then not become a midfix of the sysctl name, but the value
  of that label instead. Thoughts? (A rough sketch of what both ideas
  might look like at a declaration site follows below.)

A final remark I want to make: a concern might be that changes like
these would not be generic, but would only apply to Prometheus. I tend
to disagree. First of all, an advantage of Prometheus is that the
coupling is very loose: it's just a GET request returning key-value
pairs, and anyone is free to add his/her own implementation. Second,
emaste@ also pointed me to another monitoring framework being developed
by Intel right now:

https://github.com/intelsdi-x/snap

The changes I'm proposing would seem to make exporting sysctl data to
that system easier as well.

Anyway, thanks for reading this huge wall of text.

Best regards,
--
Ed Schouten <ed@nuxi.nl>
Nuxi, 's-Hertogenbosch, the Netherlands
KvK-nr.: 62051717
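PS: To make the two kernel-side ideas a bit more concrete, here is a
rough sketch of what declaration sites might look like if both were
adopted. To be clear, CTLFLAG_GAUGE, CTLFLAG_COUNTER and
SYSCTL_NODE_WITH_LABEL do not exist in sys/sysctl.h today; the names,
arguments and semantics below are purely illustrative of the proposal.

/*
 * Purely illustrative -- none of these flags or macros exist yet.
 *
 * 1. Type hints: an OID could be tagged as a gauge or a counter, so an
 *    exporter can emit the matching "# TYPE" line.
 */
SYSCTL_INT(_kern, OID_AUTO, openfiles,
    CTLFLAG_RD | CTLFLAG_GAUGE,		/* CTLFLAG_GAUGE: hypothetical */
    &openfiles, 0, "System-wide number of open files");

/*
 * 2. Node labels: a hypothetical SYSCTL_NODE_WITH_LABEL() variant could
 *    record that direct children of hw.acpi.thermal name a "sensor",
 *    so that
 *
 *        hw.acpi.thermal.tz0.temperature
 *
 *    gets exported as
 *
 *        sysctl_hw_acpi_thermal_temperature{sensor="tz0"}
 */
SYSCTL_NODE_WITH_LABEL(_hw_acpi, OID_AUTO, thermal,
    CTLFLAG_RD, 0, "ACPI thermal zones", "sensor");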