From: Larry Maloney
Date: Mon, 12 Dec 2016 06:16:26 -0800
To: Ed Schouten
Cc: hackers@freebsd.org
Subject: Re: Sysctl as a Service, or: making sysctl(3) more friendly for monitoring systems
Message-Id: <7E6EAE02-F059-4C88-8D43-21F0811BC07B@hackerdojo.com>
List-Id: Technical Discussions relating to FreeBSD

Cool! I like it

Sent from my iPhone

> On Dec 11, 2016, at 11:35 AM, Ed Schouten wrote:
>
> Hi there,
>
> The last couple of months I've been playing around with a monitoring
> system called Prometheus (https://prometheus.io/).
> In short, Prometheus works like this:
>
> ----- If you already know Prometheus, skip this -----
>
> 1. For the thing you want to monitor, you either integrate the
> Prometheus client library into your codebase or you write a separate
> exporter process. The client library or the exporter process then
> exposes key metrics of your application over HTTP. Simplified example:
>
> $ curl http://localhost:12345/metrics
> # HELP open_file_descriptors The number of files opened by the process
> # TYPE open_file_descriptors gauge
> open_file_descriptors 12
> # HELP http_requests The number of HTTP requests received.
> # TYPE http_requests counter
> http_requests{result="2xx"} 100
> http_requests{result="4xx"} 14
> http_requests{result="5xx"} 0
>
> 2. You fire up Prometheus and configure it to scrape and store all of
> the things you want to monitor. Prometheus can then add more labels to
> the metrics it scrapes. So the example above may get transformed by
> Prometheus to look like this:
>
> open_file_descriptors{job="nginx",instance="web1.mycompany.com"} 12
> http_requests{job="nginx",instance="web1.mycompany.com",result="2xx"} 100
> http_requests{job="nginx",instance="web1.mycompany.com",result="4xx"} 14
> http_requests{job="nginx",instance="web1.mycompany.com",result="5xx"} 0
>
> Fun fact: Prometheus can also scrape Prometheus, so if you operate
> multiple datacenters, you can let a global instance scrape a per-DC
> instance and add a dc="..." label to all metrics.
>
> 3. After scraping data for some time, you can do fancy queries like these:
>
> - Compute the 5-minute rate of HTTP requests per server and per HTTP error code:
> rate(http_requests[5m])
>
> - Compute the 5-minute rate of all HTTP requests on the entire cluster:
> sum(rate(http_requests[5m]))
>
> - Same as the above, but aggregate by HTTP error code:
> sum(rate(http_requests[5m])) by (result)
>
> Prometheus can do alerting as well by using these expressions as matchers.
>
> 4. Set up Grafana and voila: you can create fancy dashboards!
>
> ----- If you skipped the introduction, start reading here -----
>
> The Prometheus folks have developed a tool called the node_exporter
> (https://github.com/prometheus/node_exporter). Basically it extracts a
> whole bunch of interesting system-related metrics (disk usage, network
> I/O, etc.) by calling sysctl(3), invoking ioctl(2), parsing /proc files,
> etc., and exposes that information using Prometheus' syntax.
>
> The other day I was thinking: in a certain way, the node exporter is a
> bit of a redundant tool on the BSDs. Instead of needing to write
> custom collectors for every kernel subsystem, we could write a generic
> exporter for converting the entire sysctl(3) tree to Prometheus
> metrics, which is exactly what I'm experimenting with here:
>
> https://github.com/EdSchouten/prometheus_sysctl_exporter
>
> An example of what this tool's output looks like:
>
> $ ./prometheus_sysctl_exporter
> ...
> # HELP kern_maxfiles Maximum number of files
> sysctl_kern_maxfiles 1043382
> # HELP kern_openfiles System-wide number of open files
> sysctl_kern_openfiles 316
> ...
>
> You could use this to write alerting rules like this:
>
> ALERT FileDescriptorUsageHigh
>   IF sysctl_kern_openfiles / sysctl_kern_maxfiles > 0.5
>   FOR 15m
>   ANNOTATIONS {
>     description = "More than half of all FDs are in use!",
>   }
>
> There you go. Access to a very large number of metrics without too much effort.
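
For anyone following along, the name mapping described above can be sketched in a few lines of Python. This is only an illustration, not code taken from prometheus_sysctl_exporter: the helper names metric_name and render are made up, and the "sysctl" prefix is simply copied from the sample output quoted above.

```python
import re


def metric_name(sysctl_name: str, prefix: str = "sysctl") -> str:
    """Map a dotted sysctl name to a flat, Prometheus-style metric name.

    Hypothetical helper: every character outside [a-zA-Z0-9_] (the dots
    included) is replaced with an underscore, then the prefix is prepended.
    """
    flat = re.sub(r"[^a-zA-Z0-9_]", "_", sysctl_name)
    return f"{prefix}_{flat}"


def render(sysctl_name: str, description: str, value) -> str:
    """Render one sysctl as a HELP comment line plus a sample line."""
    name = metric_name(sysctl_name)
    # Assumption: the HELP line carries the full metric name so that it
    # matches the sample line below it.
    return f"# HELP {name} {description}\n{name} {value}\n"


if __name__ == "__main__":
    print(render("kern.maxfiles", "Maximum number of files", 1043382), end="")
    print(render("kern.openfiles", "System-wide number of open files", 316), end="")
```

Note that this sketch puts the prefixed name on the HELP line, whereas the quoted output shows it without the sysctl_ prefix; that difference is a choice made here for consistency within the sketch, not a statement about how the tool behaves.
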
>
> My main question here is: are there any people in here interested in
> seeing something like this being developed into something usable? If
> so, let me know and I'll pursue this further.
>
> I also have a couple of technical questions related to sysctl(3)'s
> in-kernel design:
>
> - Prometheus differentiates between gauges (memory usage), counters
> (number of HTTP requests), histograms (per-RPC latency stats), etc.,
> while sysctl(3) does not. It would be nice if we could have that info
> on a per-sysctl basis. Mind if I add a CTLFLAG_GAUGE, CTLFLAG_COUNTER,
> etc?
>
> - Semantically sysctl(3) and Prometheus are slightly different.
> Consider this sysctl:
>
> hw.acpi.thermal.tz0.temperature: 27.8C
>
> My tool currently converts this metric's name to
> sysctl_hw_acpi_thermal_tz0_temperature. This is suboptimal, as it
> would ideally be called
> sysctl_hw_acpi_thermal_temperature{sensor="tz0"}. Otherwise you
> wouldn't be able to write generic alerting rules, use aggregation in
> queries, etc.
>
> I was thinking: we could quite easily do such a translation by
> attaching labels to SYSCTL_NODE objects. As in, the hw.acpi.thermal
> node would get a label "sensor". Any OID placed underneath this node
> will not become a midfix of the sysctl name, but the value of that
> label instead. Thoughts?
>
> A final remark I want to make: a concern might be that changes like
> these would not be generic, but only apply to Prometheus. I tend to
> disagree. First of all, an advantage of Prometheus is that the
> coupling is very loose: it's just a GET request with key-value pairs.
> Anyone is free to add his/her own implementation.
>
> Second, emaste@ also pointed me to another monitoring framework being
> developed by Intel right now:
>
> https://github.com/intelsdi-x/snap
>
> The changes I'm proposing would also seem to make exporting sysctl
> data to that system easier.
>
> Anyway, thanks for reading this huge wall of text.
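
To make the label idea concrete, here is a small userland sketch in Python of the translation being proposed. It is only a guess at the semantics: the function name to_labeled_metric and the label_nodes table are hypothetical, and nothing here reflects an existing kernel or exporter interface. One assumption worth flagging: the component under a labelled node is only turned into a label value when it is an intermediate component (a "midfix"), never when it is the leaf itself.

```python
def to_labeled_metric(sysctl_name, label_nodes, prefix="sysctl"):
    """Translate a dotted sysctl name into a metric name with labels.

    label_nodes is a hypothetical table mapping a node's full dotted path
    to a label name, e.g. {"hw.acpi.thermal": "sensor"}.
    """
    parts = sysctl_name.split(".")
    name_parts = []
    labels = {}
    i = 0
    while i < len(parts):
        name_parts.append(parts[i])
        node_path = ".".join(parts[: i + 1])
        label = label_nodes.get(node_path)
        # Assumption: only consume the next component as a label value if
        # it is not the final (leaf) component of the name.
        if label is not None and i + 1 < len(parts) - 1:
            labels[label] = parts[i + 1]
            i += 1  # skip the component that became a label value
        i += 1
    name = "_".join([prefix] + name_parts)
    if not labels:
        return name
    body = ",".join(f'{key}="{value}"' for key, value in labels.items())
    return f"{name}{{{body}}}"


if __name__ == "__main__":
    table = {"hw.acpi.thermal": "sensor"}
    print(to_labeled_metric("hw.acpi.thermal.tz0.temperature", table))
    print(to_labeled_metric("kern.maxfiles", table))
```

With a mapping like this, every thermal zone ends up under the same metric name, which is what makes generic alerting rules and aggregation possible. The sketch does not escape label values or convert a value such as 27.8C into a plain number; both would still need handling.
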
>
> Best regards,
> --
> Ed Schouten
> Nuxi, 's-Hertogenbosch, the Netherlands
> KvK-nr.: 62051717