Date:      Tue, 16 Apr 2013 06:24:13 +0000
From:      "Poul-Henning Kamp" <phk@freebsd.org>
To:        Alexander Motin <mav@FreeBSD.org>
Cc:        freebsd-hackers@FreeBSD.org, freebsd-geom@FreeBSD.org
Subject:   Re: devstat overhead VS precision
Message-ID:  <40152.1366093453@critter.freebsd.dk>
In-Reply-To: <516C71BC.4000902@FreeBSD.org>
References:  <51692C95.3010901@FreeBSD.org> <20130415184203.GA1839@garage.freebsd.pl> <516C515A.9090602@FreeBSD.org> <38496.1366058586@critter.freebsd.dk> <516C71BC.4000902@FreeBSD.org>

In message <516C71BC.4000902@FreeBSD.org>, Alexander Motin writes:
>On 15.04.2013 23:43, Poul-Henning Kamp wrote:
>> In message <516C515A.9090602@FreeBSD.org>, Alexander Motin writes:
>>

>> For tuning anything on a non-ridiculous SSD device or modern
>> hard disks, it will be useless because the bias you introduce is
>> *not* one which averages out over many operations.
>
>Could you please explain why?
>
>> The fundamental problem is that on a busy system, getbinuptime()
>> does not get called at random times, it will be heavily affected
>> by the I/O traffic, because of the interrupts, the bus-traffic
>> itself, the cache-effects of I/O transfers and the context-switches
>> by the processes causing the I/O.
>
>I'm sorry, but I am not sure I understand the above paragraphs.

That was the exact explanation you asked for, and I'm not sure I can
find a better way to explain it, but I'll try:

Your assumption that the error will cancel out implicitly assumes
that the timestamp returned from getbinuptime() is updated at
times which are totally independent of the I/O traffic whose
latency you are trying to measure.

That is not the case.  The interrupt which updates getbinuptime()'s
cached timestamp is affected a lot by the I/O traffic, for the various
reasons I mentioned above.
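
A toy simulation may make the argument concrete. It is only a sketch under
stated assumptions, not a model of the real timecounter code: the 1 ms tick,
the 450 us true latency, and the idea that "correlated" means every request
starts within 100 us after a tick are all hypothetical values chosen for
illustration.

```python
import random

TICK_US = 1000  # hypothetical update period of the cached timestamp, in microseconds


def cached_time(t_us):
    """Coarse clock: the last tick boundary at or before t_us."""
    return (t_us // TICK_US) * TICK_US


def mean_measured(true_latency_us, max_phase_us, samples=100_000, seed=1):
    """Average latency as seen through the coarse clock.

    max_phase_us bounds how long after a tick a request may start.
    TICK_US models start times independent of the tick; a small value
    models start times that are phase-locked to the tick (an assumed,
    simplified stand-in for "affected by the I/O traffic").
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        start = rng.randrange(max_phase_us)  # offset from the previous tick
        total += cached_time(start + true_latency_us) - cached_time(start)
    return total / samples


independent = mean_measured(450, TICK_US)        # starts uniform over the tick
correlated = mean_measured(450, TICK_US // 10)   # starts within 100 us of a tick
print(independent, correlated)
```

With independent start times the individual errors are large but the mean
lands close to the true 450 us. With phase-locked start times every request
is measured as 0 us, and no amount of averaging recovers the true value.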

>Sure, getbinuptime() won't let us answer how many requests completed
>within 0.5ms, but the present API doesn't allow calculating that anyway,
>providing only total/average times. And why "_5-10_ timecounter interrupts"?

A: Yes it actually does, a userland application running on a dedicated
CPU core can poll the shared memory devstat structure at a very high
rate and get very useful information about short latencies.

Most people don't do that, because they don't care about the difference
between 0.5 and 0.45 milliseconds.
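
As a rough illustration of what such a poller could compute, here is a
minimal sketch. It assumes each poll yields a pair (operations completed,
total duration in microseconds); the snapshot layout and the function name
are hypothetical, and reading the real shared-memory structure consistently
is not shown.

```python
def interval_latencies(snapshots):
    """Yield the mean latency of the operations completed between two polls.

    snapshots: iterable of (operations_completed, total_duration_us) pairs,
    a hypothetical simplification of the per-device counters.
    """
    prev = None
    for ops, duration_us in snapshots:
        if prev is not None and ops > prev[0]:
            yield (duration_us - prev[1]) / (ops - prev[0])
        prev = (ops, duration_us)


# Synthetic polls: nothing completes, then one request, then two more.
polls = [(100, 50_000), (100, 50_000), (101, 50_450), (103, 51_450)]
latencies = list(interval_latencies(polls))
print(latencies)
```

The faster the polling relative to the request rate, the more intervals
contain exactly one completion, so the per-interval mean approaches a
per-request latency.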

B: To get the systematic bias down to 10-20% of the measured interval.
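
One way to read the arithmetic, as a sketch: assume the systematic error of
a measurement is on the order of one timecounter tick. Then an interval that
spans N ticks carries a relative bias of roughly 1/N. The one-tick error
bound and the function name are assumptions made for illustration.

```python
def relative_bias(ticks_in_interval, error_ticks=1.0):
    """Relative bias when an error of error_ticks is spread over the interval."""
    return error_ticks / ticks_in_interval


for n in (5, 10):
    print(n, f"{relative_bias(n):.0%}")
```

Under that assumption, intervals of 5 and 10 ticks correspond to the 20% and
10% ends of the quoted range.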

>> 	Latency distribution:
>>
>> 		<5msec:		92.12 %
>> 		<10msec:	 0.17 %
>> 		<20msec:	 1.34 %
>> 		<50msec:	 6.37 %
>> 		>50msec:	 0.00 %
>>
>I agree that such functionality could be interesting. The only worry is
>which buckets should be there. For modern HDDs the above buckets could be
>fine. For high-end SSDs it may be more about microseconds than milliseconds.
>I doubt that 5 buckets will be universal enough, unless separated by
>a factor of 5-10.

Remember what people use this for:  Answering the question "Does my
disk subsystem suck, and if so, how much?"

Buckets like the ones proposed will tell you that.
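
For illustration, a minimal sketch of how per-request latencies could be
sorted into the buckets quoted above. The function name and the sample
latencies are made up, and a latency of exactly 50 msec is counted in the
last bucket here, which is an assumption the original listing does not
settle.

```python
from bisect import bisect_right

EDGES_MS = [5, 10, 20, 50]  # upper bounds taken from the quoted listing
LABELS = ["<5msec", "<10msec", "<20msec", "<50msec", ">50msec"]


def latency_distribution(latencies_ms):
    """Return the percentage of requests falling into each bucket."""
    counts = [0] * len(LABELS)
    for latency in latencies_ms:
        # bisect_right puts a latency equal to an edge into the next bucket,
        # so "<5msec" really means strictly below 5.
        counts[bisect_right(EDGES_MS, latency)] += 1
    total = len(latencies_ms) or 1  # avoid dividing by zero on empty input
    return {label: 100.0 * c / total for label, c in zip(LABELS, counts)}


sample = [1, 2, 3, 4.9, 7, 15, 30, 45, 49, 60]  # made-up latencies in msec
dist = latency_distribution(sample)
for label in LABELS:
    print(f"{label:>8}: {dist[label]:6.2f} %")
```

Changing EDGES_MS and LABELS is enough to try other spacings, such as
buckets separated by a constant factor.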

>> The %busy crap should be killed, all it does is confuse people.
>
>I agree that it lies heavily, especially for cached writes, but at least
>it allows making some very basic estimates.

For rotating disks:  It always lies.

For SSD: It almost always lies.

Kill it.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.


