From: Andriy Gapon
To: Alfred Perlstein
Cc: arch@FreeBSD.org
Date: Fri, 05 Apr 2013 11:42:22 +0300
Subject: collecting statistics / metrics
Message-ID: <515E8E6E.4030706@FreeBSD.org>
In-Reply-To: <515C68B5.2010006@ixsystems.com>

On 03/04/2013 20:36 Alfred Perlstein said the following:
> Hey folks, sorry for the top post here, but I just came into this thread.
>
> Here at iXsystems we've just developed a set of scripts to scrape the various
> FreeBSD user land utilities (sysctl, netstat, nfsstat, vmstat, etc, etc) and put
> them into graphs based on time.
>
> The goal is to be able to line up all these metrics with whatever benchmark we
> are currently running and be able to see what may be causing issues.
>
> Potentially you should be able to scroll through the graphs and see things like
> "ran out of mbufs @time", "vm system began paging at @time", "buffer daemon
> went nuts @time"
>
> Then we can take the information back and leverage it to make tuning decisions,
> or potentially change kernel algorithms.

This is very, very useful!

> The only problem we have is that every user land tool has its own format, so
> along with my team we have written some shell to coerce the output from the
> various programs into pseudo-CSV (key/value pair) which can then be post
> processed by tools to convert to CSV which can then be put into something like
> open office, or put through an R program to graph it.
>
> I'm hoping to have something shortly.
>
> What I was hoping to do over the next few days was discuss with people how we
> can (or should we even) fix the user land statistics tools to output machine
> readable output that can be easily parsed.
>
> Example: netstat -m (hard to parse) versus 'vmstat -z | grep mbuf' easy to parse.
>
> The idea of outputting xml is good, CSV is OK, however CSV is problematic as in
> the case of sysctl, if new nodes appear, then we can't begin to emit them, we
> must either ignore them, or abort, or log them to auxiliary files. Anything
> that makes life easier is good.
>
> I should be able to share our scripts within the next couple of days.

Just an alternative idea: I think gathering all this information via plugins
for e.g. collectd could be more flexible and less processing- and
parsing-intensive.
That would avoid unnecessary formatting and re-parsing, and would allow storing
the data in a convenient format.

Ideally, it would be great to have an umbrella library on top of sysctl,
devstat, etc. that would expose the various stats in a convenient form.

Another convenience would be the ability to know which sysctls are actually
stats. I think that you have already done some work towards this goal. There
are certain heuristics that may help to distinguish stats from knobs,
constants, etc., but an explicit "this is a metric" marking should be used
instead. Of course, it would take a lot of work to properly mark all the
sysctls.

Just thinking out loud.

-- 
Andriy Gapon
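The CSV problem raised in the quoted text (sysctl nodes that appear mid-run have no column to go into) largely goes away if each sample is kept as long-format key/value records and pivoted into CSV only after the run, when the full set of keys is known. This is a minimal Python sketch of that idea, assuming input in the `name: value` shape that `sysctl` prints for simple nodes. The helper names and the sample values are made up for illustration, and multi-line values (structured nodes) are not handled.

```python
import csv
import io


def parse_sysctl_lines(text):
    """Parse 'name: value' lines into a dict.

    Assumes the simple one-line `sysctl name` output shape; lines without
    ':' are skipped and multi-line values are not handled.
    """
    sample = {}
    for line in text.splitlines():
        # Split on the first ':' only, so values that contain ':' survive.
        name, sep, value = line.partition(":")
        if sep:
            sample[name.strip()] = value.strip()
    return sample


def to_long_rows(timestamp, sample):
    """Long format: one (timestamp, key, value) row per metric, so a node
    that first appears mid-run is just another row, not a new column."""
    return [(timestamp, key, value) for key, value in sorted(sample.items())]


def long_to_wide_csv(rows):
    """Pivot long rows into CSV after the run. Columns are the union of
    every key seen; a sample that lacks a key gets an empty cell."""
    keys = sorted({key for _, key, _ in rows})
    by_time = {}
    for timestamp, key, value in rows:
        by_time.setdefault(timestamp, {})[key] = value
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time"] + keys)
    for timestamp in sorted(by_time):
        writer.writerow([timestamp] + [by_time[timestamp].get(k, "") for k in keys])
    return out.getvalue()


# Illustrative, made-up samples; "new.node" is a hypothetical sysctl that
# only shows up in the second sample.
t0 = parse_sysctl_lines("kern.ipc.nmbclusters: 1000\nvm.stats.vm.v_swappgsin: 0")
t1 = parse_sysctl_lines(
    "kern.ipc.nmbclusters: 1000\nvm.stats.vm.v_swappgsin: 12\nnew.node: 7"
)
print(long_to_wide_csv(to_long_rows(0, t0) + to_long_rows(1, t1)), end="")
# time,kern.ipc.nmbclusters,new.node,vm.stats.vm.v_swappgsin
# 0,1000,,0
# 1,1000,7,12
```

The design choice is simply to defer the schema: the collector never needs to know the column set up front, so a new node neither gets dropped nor aborts the run, and the pivot into CSV for a spreadsheet or R happens once at the end.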