Date:        Mon, 15 Apr 2013 20:43:06 +0000
From:        "Poul-Henning Kamp" <phk@phk.freebsd.dk>
To:          Alexander Motin <mav@FreeBSD.org>
Cc:          freebsd-hackers@FreeBSD.org, freebsd-geom@FreeBSD.org
Subject:     Re: devstat overhead VS precision
Message-ID:  <38496.1366058586@critter.freebsd.dk>
In-Reply-To: <516C515A.9090602@FreeBSD.org>
References:  <51692C95.3010901@FreeBSD.org> <20130415184203.GA1839@garage.freebsd.pl> <516C515A.9090602@FreeBSD.org>
In message <516C515A.9090602@FreeBSD.org>, Alexander Motin writes:

>>> I propose to switch those statistics from using binuptime() to
>>> getbinuptime() to solve the problem globally.
>>
>> No objections here, but I wonder if you were able to compare the
>> results somehow before and after the change so we have some hard
>> numbers to show that we don't lose much by applying the change.
>
> I haven't tested it statistically, but I haven't noticed any visual
> difference in gstat output with its 0.1ms displayed resolution.

I have tested it statistically, back when I wrote GEOM: it leads to
very significant statistical bias.

Just about the only thing in devstat that has any predictive power
with respect to filesystem performance is the latency, which measures
how long it takes to satisfy each I/O request.  If you run gstat(8),
these are the "ms/*" numbers: milliseconds per this or that.

The rest of what's in devstat, with the exception of the queue length
("L(q)"), has almost no predictive power and is, IMO, practically
pointless.  In particular, the %busy is totally misleading, and I
deeply regret that I didn't fight to kill it back then.

If you switch to getbinuptime(), the latency measurements will only
be precise if the I/O operations take much longer than the
timecounter update period, which, by the way, is not guaranteed to be
1000 Hz.

For measuring how much USB sticks suck, that will work fine.  For
tuning anything on a non-ridiculous SSD device or a modern hard disk,
it will be useless, because the bias you introduce is *not* one which
averages out over many operations.  The fundamental problem is that
on a busy system, getbinuptime() does not get called at random times:
it is heavily affected by the I/O traffic, because of the interrupts,
the bus traffic itself, the cache effects of I/O transfers, and the
context switches by the processes causing the I/O.

So yes, you can switch to getbinuptime(), but the only statistically
responsible way to do so would be to suppress latency measurements on
all I/O operations which complete in less than 5-10 timecounter
interrupts.  Apart from some practical issues implementing that, the
numbers that came out would be pretty useless.

The right idea is probably to bucketize the latencies, so that rather
than having to keep track of devstat in real time, you could get a
histogram at any time showing past performance, something like:

    Latency distribution:

    <5msec:     92.12 %
    <10msec:     0.17 %
    <20msec:     1.34 %
    <50msec:     6.37 %
    >50msec:     0.00 %

Doing that with getbinuptime() would be statistically defensible
provided the top bucket is "<5msec", and it would very clearly tell
people whether they have I/O trouble or not, which IMO is what people
want to know.

The cost is 20 64-bit counters in struct devstat, (N|R|W|E)*5*8 = 160
bytes, but since struct devstat is already 288 bytes, that isn't a
major catastrophe.

The ability to measure latency precisely should be retained, but it
could be made a sysctl-enabled debugging facility.  The %busy crap
should be killed; all it does is confuse people.

-- 
Poul-Henning Kamp         | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG           | TCP/IP since RFC 956
FreeBSD committer         | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
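
The granularity problem described above can be demonstrated from
userland on a FreeBSD box.  The sketch below assumes that
CLOCK_UPTIME_PRECISE behaves like the kernel's binuptime(), reading
the hardware timecounter on every call, while CLOCK_UPTIME_FAST
behaves like getbinuptime(), returning the value cached at the last
timecounter update; back-to-back FAST reads therefore usually land in
the same tick, and the measured interval collapses to zero, which is
exactly what happens to a sub-tick I/O operation timed with
getbinuptime().

#include <stdio.h>
#include <time.h>

static long
delta_ns(const struct timespec *a, const struct timespec *b)
{

	return ((b->tv_sec - a->tv_sec) * 1000000000L +
	    (b->tv_nsec - a->tv_nsec));
}

int
main(void)
{
	struct timespec a, b;

	/*
	 * Two back-to-back reads of the "precise" clock: every call
	 * goes to the hardware timecounter, so the delta is small
	 * but non-zero.
	 */
	clock_gettime(CLOCK_UPTIME_PRECISE, &a);
	clock_gettime(CLOCK_UPTIME_PRECISE, &b);
	printf("precise delta: %ld ns\n", delta_ns(&a, &b));

	/*
	 * Two back-to-back reads of the "fast" clock: both usually
	 * return the value cached at the last timecounter update, so
	 * the delta is almost always exactly 0.
	 */
	clock_gettime(CLOCK_UPTIME_FAST, &a);
	clock_gettime(CLOCK_UPTIME_FAST, &b);
	printf("fast delta:    %ld ns\n", delta_ns(&a, &b));

	return (0);
}

On an idle machine the precise delta is typically tens of
nanoseconds, while the fast delta is almost always 0; and, as argued
above, on a busy machine the error does not average out because the
cached timestamp's staleness is correlated with the I/O itself.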
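The bucketizing proposal might look something like the following
sketch.  The devstat_hist structure, the transaction index, and the
devstat_hist_record() helper are hypothetical names for illustration,
not the actual struct devstat layout; the bucket bounds match the
histogram shown above.

#include <stdint.h>

#define	DS_NTRANS	4	/* the (N|R|W|E) transaction types */
#define	DS_NBUCKETS	5	/* <5, <10, <20, <50, >=50 msec */

/*
 * Hypothetical per-device histogram: 4 * 5 * 8 = 160 bytes of
 * counters, matching the cost estimate above.
 */
struct devstat_hist {
	uint64_t	buckets[DS_NTRANS][DS_NBUCKETS];
};

/* Upper bounds in msec; anything slower falls into the last bucket. */
static const uint64_t ds_bound_ms[DS_NBUCKETS - 1] = { 5, 10, 20, 50 };

static void
devstat_hist_record(struct devstat_hist *h, int trans, uint64_t lat_ms)
{
	int i;

	/* Find the first bucket whose upper bound exceeds the latency. */
	for (i = 0; i < DS_NBUCKETS - 1; i++)
		if (lat_ms < ds_bound_ms[i])
			break;
	h->buckets[trans][i]++;
}

A reporting tool would derive the percentage table above by dividing
each bucket by its row's total.  With "<5msec" as the lowest bucket,
the tick granularity of getbinuptime() can only misclassify
operations that finish near a bucket boundary, which is why the
histogram stays statistically defensible where per-operation latency
figures would not.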