From owner-freebsd-hackers@FreeBSD.ORG Tue Apr 16 06:24:15 2013
Return-Path:
Delivered-To: freebsd-hackers@FreeBSD.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
	by hub.freebsd.org (Postfix) with ESMTP id 5B9E13BE;
	Tue, 16 Apr 2013 06:24:15 +0000 (UTC)
	(envelope-from phk@freebsd.org)
Received: from phk.freebsd.dk (phk.freebsd.dk [130.225.244.222])
	by mx1.freebsd.org (Postfix) with ESMTP id 244FCEF8;
	Tue, 16 Apr 2013 06:24:14 +0000 (UTC)
Received: from critter.freebsd.dk (critter.freebsd.dk [192.168.61.3])
	by phk.freebsd.dk (Postfix) with ESMTP id F205389FBE;
	Tue, 16 Apr 2013 06:24:13 +0000 (UTC)
Received: from critter.freebsd.dk (localhost [127.0.0.1])
	by critter.freebsd.dk (8.14.6/8.14.6) with ESMTP id r3G6OD8N040153;
	Tue, 16 Apr 2013 06:24:13 GMT (envelope-from phk@freebsd.org)
To: Alexander Motin
Subject: Re: devstat overhead VS precision
In-reply-to: <516C71BC.4000902@FreeBSD.org>
From: "Poul-Henning Kamp"
References: <51692C95.3010901@FreeBSD.org>
	<20130415184203.GA1839@garage.freebsd.pl>
	<516C515A.9090602@FreeBSD.org>
	<38496.1366058586@critter.freebsd.dk>
	<516C71BC.4000902@FreeBSD.org>
Content-Type: text/plain; charset=ISO-8859-1
Date: Tue, 16 Apr 2013 06:24:13 +0000
Message-ID: <40152.1366093453@critter.freebsd.dk>
Cc: freebsd-hackers@FreeBSD.org, freebsd-geom@FreeBSD.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
X-List-Received-Date: Tue, 16 Apr 2013 06:24:15 -0000

In message <516C71BC.4000902@FreeBSD.org>, Alexander Motin writes:
>On 15.04.2013 23:43, Poul-Henning Kamp wrote:
>> In message <516C515A.9090602@FreeBSD.org>, Alexander Motin writes:
>>
>> For tuning anything on a non-ridiculous SSD device or modern
>> harddisks, it will be useless, because the bias you introduce is
>> *not* one which averages out over many operations.
>
>Could you please explain why?
>
>> The fundamental problem is that on a busy system, getbinuptime()
>> does not get called at random times; it will be heavily affected
>> by the I/O traffic, because of the interrupts, the bus traffic
>> itself, the cache effects of I/O transfers and the context switches
>> by the processes causing the I/O.
>
>I'm sorry, but I am not sure I understand the above paragraphs.

That was the exact explanation you asked for, and I'm not sure I can
find a better way to explain it, but I'll try:

Your assumption that the error will cancel out implicitly assumes
that the timestamp returned from getbinuptime() is updated at times
which are totally independent of the I/O traffic whose latency you
are trying to measure.

That is not the case.

The interrupt which updates getbinuptime()'s cached timestamp is
affected a lot by the I/O traffic, for the various reasons I mention
above.

>Sure, getbinuptime() won't allow to answer how many requests completed
>within 0.5ms, but the present API doesn't allow to calculate that
>anyway, providing only total/average times.  And why "_5-10_
>timecounter interrupts"?

A: Yes, it actually does:  A userland application running on a
dedicated CPU core can poll the shared-memory devstat structure at
a very high rate and get very useful information about short
latencies.  Most people don't do that, because they don't care about
the difference between 0.5 and 0.45 milliseconds.

B: To get the systematic bias down to 10-20% of the measured
interval:  Each measurement is quantized to whole timecounter ticks,
so an interval spanning 5-10 ticks carries roughly one tick, i.e.
10-20%, of error.

>> Latency distribution:
>>
>>	 <5msec:  92.12 %
>>	<10msec:   0.17 %
>>	<20msec:   1.34 %
>>	<50msec:   6.37 %
>>	>50msec:   0.00 %
>
>I agree that such functionality could be interesting.  The only worry
>is which buckets should be there.  For modern HDDs the buckets above
>could be fine.  For high-end SSDs it may be microseconds first, then
>milliseconds.  I doubt that 5 buckets will be universal enough, unless
>they are separated by a factor of 5-10.
Remember what people use this for:  Answering the question "Does my
disk subsystem suck, and if so, how much?"

Buckets like the ones proposed will tell you that.

>> The %busy crap should be killed, all it does is confuse people.
>
>I agree that it heavily lies, especially for cached writes, but at
>least it allows making some very basic estimates.

For rotating disks:  It always lies.

For SSD:  It almost always lies.

Kill it.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.