Date: Tue, 16 Apr 2013 06:24:13 +0000
From: "Poul-Henning Kamp" <phk@freebsd.org>
To: Alexander Motin <mav@FreeBSD.org>
Cc: freebsd-hackers@FreeBSD.org, freebsd-geom@FreeBSD.org
Subject: Re: devstat overhead VS precision
Message-ID: <40152.1366093453@critter.freebsd.dk>
In-Reply-To: <516C71BC.4000902@FreeBSD.org>
References: <51692C95.3010901@FreeBSD.org> <20130415184203.GA1839@garage.freebsd.pl> <516C515A.9090602@FreeBSD.org> <38496.1366058586@critter.freebsd.dk> <516C71BC.4000902@FreeBSD.org>
In message <516C71BC.4000902@FreeBSD.org>, Alexander Motin writes:

>On 15.04.2013 23:43, Poul-Henning Kamp wrote:
>> In message <516C515A.9090602@FreeBSD.org>, Alexander Motin writes:
>>
>> For tuning anything on a non-ridiculous SSD device or modern
>> harddisks, it will be useless, because the bias you introduce is
>> *not* one which averages out over many operations.
>
>Could you please explain why?
>
>> The fundamental problem is that on a busy system, getbinuptime()
>> does not get called at random times; it will be heavily affected
>> by the I/O traffic, because of the interrupts, the bus-traffic
>> itself, the cache-effects of I/O transfers and the context-switches
>> by the processes causing the I/O.
>
>I'm sorry, but I am not sure I understand above paragraphs.

That was the exact explanation you asked for, and I'm not sure I can
find a better way to explain it, but I'll try:

Your assumption that the error will cancel out implicitly assumes
that the timestamp returned from getbinuptime() is updated at times
which are totally independent of the I/O traffic whose latency you
are trying to measure.

That is not the case.  The interrupt which updates getbinuptime()'s
cached timestamp is affected a lot by the I/O traffic, for the
various reasons I mention above.

>Sure, getbinuptime() won't let you answer how many requests completed
>within 0.5ms, but the present API doesn't allow calculating that
>anyway, providing only total/average times. And why "_5-10_
>timecounter interrupts"?

A: Yes, it actually does: a userland application running on a
dedicated CPU core can poll the shared-memory devstat structure at
a very high rate and get very useful information about short
latencies.  Most people don't do that, because they don't care about
the difference between 0.5 and 0.45 milliseconds.

B: To get the systematic bias down to 10-20% of the measured interval.
>> Latency distribution:
>>
>>  <5msec: 92.12 %
>> <10msec:  0.17 %
>> <20msec:  1.34 %
>> <50msec:  6.37 %
>> >50msec:  0.00 %
>
>I agree that such functionality could be interesting. The only worry
>is which buckets should be there. For modern HDDs the above buckets
>could be fine. For high-end SSDs it may go from microseconds up to
>milliseconds. I have doubts that 5 buckets will be universal enough,
>unless separated by a factor of 5-10.

Remember what people use this for:  Answering the question "Does my
disk subsystem suck, and if so, how much?"

Buckets like the ones proposed will tell you that.

>> The %busy crap should be killed, all it does is confuse people.
>
>I agree that it heavily lies, especially for cached writes, but at
>least it allows making some very basic estimates.

For rotating disks:  It always lies.

For SSD:  It almost always lies.

Kill it.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.