From owner-freebsd-hackers@freebsd.org Sun Dec 11 19:36:04 2016
From: Ed Schouten <ed@nuxi.nl>
Date: Sun, 11 Dec 2016 20:35:32 +0100
Subject: Sysctl as a Service, or: making sysctl(3) more friendly for monitoring systems
To: hackers@freebsd.org
List-Id: Technical Discussions relating to FreeBSD

Hi there,

The last couple of months I've been playing around with a monitoring
system called Prometheus (https://prometheus.io/). In short, Prometheus
works like this:

----- If you already know Prometheus, skip this -----

1. For the thing you want to monitor, you either integrate the
Prometheus client library into your codebase or you write a separate
exporter process. The client library or the exporter process then
exposes key metrics of your application over HTTP.
Simplified example:

  $ curl http://localhost:12345/metrics
  # HELP open_file_descriptors The number of files opened by the process
  # TYPE open_file_descriptors gauge
  open_file_descriptors 12
  # HELP http_requests The number of HTTP requests received.
  # TYPE http_requests counter
  http_requests{result="2xx"} 100
  http_requests{result="4xx"} 14
  http_requests{result="5xx"} 0

2. You fire up Prometheus and configure it to scrape and store all of
the things you want to monitor. Prometheus can then add more labels to
the metrics it scrapes, so the example above may get transformed by
Prometheus to look like this:

  open_file_descriptors{job="nginx",instance="web1.mycompany.com"} 12
  http_requests{job="nginx",instance="web1.mycompany.com",result="2xx"} 100
  http_requests{job="nginx",instance="web1.mycompany.com",result="4xx"} 14
  http_requests{job="nginx",instance="web1.mycompany.com",result="5xx"} 0

Fun fact: Prometheus can also scrape Prometheus, so if you operate
multiple datacenters, you can let a global instance scrape a per-DC
instance and add a dc="..." label to all metrics.

3. After scraping data for some time, you can do fancy queries like
these:

- Compute the 5-minute rate of HTTP requests per server and per HTTP
  error code:

    rate(http_requests[5m])

- Compute the 5-minute rate of all HTTP requests on the entire cluster:

    sum(rate(http_requests[5m]))

- Same as the above, but aggregated by HTTP error code:

    sum(rate(http_requests[5m])) by (result)

Prometheus can do alerting as well by using these expressions as
matchers.

4. Set up Grafana and voila: you can create fancy dashboards!

----- If you skipped the introduction, start reading here -----

The Prometheus folks have developed a tool called the node_exporter
(https://github.com/prometheus/node_exporter). Basically, it extracts a
whole bunch of interesting system-related metrics (disk usage, network
I/O, etc.) by calling sysctl(3), invoking ioctl(2), parsing /proc
files, etc., and exposes that information using Prometheus' syntax.
The other day I was thinking: in a certain way, the node exporter is a
bit of a redundant tool on the BSDs. Instead of needing to write custom
collectors for every kernel subsystem, we could write a generic
exporter that converts the entire sysctl(3) tree to Prometheus metrics,
which is exactly what I'm experimenting with here:

https://github.com/EdSchouten/prometheus_sysctl_exporter

An example of what this tool's output looks like:

  $ ./prometheus_sysctl_exporter
  ...
  # HELP sysctl_kern_maxfiles Maximum number of files
  sysctl_kern_maxfiles 1043382
  # HELP sysctl_kern_openfiles System-wide number of open files
  sysctl_kern_openfiles 316
  ...

You could use this to write alerting rules like this:

  ALERT FileDescriptorUsageHigh
    IF sysctl_kern_openfiles / sysctl_kern_maxfiles > 0.5
    FOR 15m
    ANNOTATIONS {
      description = "More than half of all FDs are in use!",
    }

There you go: access to a very large number of metrics without too much
effort.

My main question here is: are there any people in here interested in
seeing something like this being developed into something usable? If
so, let me know and I'll pursue this further.

I also have a couple of technical questions related to sysctl(3)'s
in-kernel design:

- Prometheus differentiates between gauges (memory usage), counters
  (number of HTTP requests), histograms (per-RPC latency stats), etc.,
  while sysctl(3) does not. It would be nice if we could have that info
  on a per-sysctl basis. Mind if I add a CTLFLAG_GAUGE,
  CTLFLAG_COUNTER, etc.?

- Semantically, sysctl(3) and Prometheus are slightly different.
  Consider this sysctl:

    hw.acpi.thermal.tz0.temperature: 27.8C

  My tool currently converts this metric's name to
  sysctl_hw_acpi_thermal_tz0_temperature. This is suboptimal, as it
  would ideally be called sysctl_hw_acpi_thermal_temperature{sensor="tz0"}.
  Otherwise you wouldn't be able to write generic alerting rules, use
  aggregation in queries, etc. I was thinking: we could quite easily do
  such a translation by attaching labels to SYSCTL_NODE objects.
As in, the hw.acpi.thermal node would get a label "sensor". Any OID
placed underneath this node would then not become an infix of the
metric name, but the value of that label instead. Thoughts?

A final remark I want to make: a concern might be that changes like
these would not be generic, but only apply to Prometheus. I tend to
disagree. First of all, an advantage of Prometheus is that the coupling
is very loose: it's just a GET request returning key-value pairs.
Anyone is free to add their own implementation. Second, emaste@ also
pointed me to another monitoring framework being developed by Intel
right now:

https://github.com/intelsdi-x/snap

The changes I'm proposing would also make exporting sysctl data to that
system easier.

Anyway, thanks for reading this huge wall of text.

Best regards,
-- 
Ed Schouten <ed@nuxi.nl>
Nuxi, 's-Hertogenbosch, the Netherlands
KvK-nr.: 62051717