From owner-freebsd-fs@FreeBSD.ORG Fri Sep 28 20:16:25 2012 Return-Path: Delivered-To: freebsd-fs@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 45E2E106566B for ; Fri, 28 Sep 2012 20:16:25 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id F16558FC0C for ; Fri, 28 Sep 2012 20:16:24 +0000 (UTC) Received: from [209.249.190.124] (port=63938 helo=gnnmac.hudson-trading.com) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.77) (envelope-from ) id 1THgz2-0004m4-83; Fri, 28 Sep 2012 16:16:24 -0400 Content-Type: text/plain; charset=iso-8859-1 Mime-Version: 1.0 (Mac OS X Mail 6.1 \(1498\)) From: George Neville-Neil In-Reply-To: <5056D896.8060607@icritical.com> Date: Fri, 28 Sep 2012 16:16:24 -0400 Content-Transfer-Encoding: quoted-printable Message-Id: <49E027FB-2FA7-4839-84C8-3112A039DEE7@neville-neil.com> References: <5056D896.8060607@icritical.com> To: Matt Burke X-Mailer: Apple Mail (2.1498) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com Cc: freebsd-fs@FreeBSD.org Subject: Re: [patch] DTrace disk IO X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Sep 2012 20:16:25 -0000 On Sep 17, 2012, at 04:00 , Matt Burke wrote: > I've recently been trying to get a grip on measuring disk IO latency = (per transaction), and have found it to be rather difficult given the = asynchronous nature of the beast, and also I can't find a way of = translating the bio start of transaction timestamps to anything I can = use in DTrace when pulling them out. >=20 Interesting. Have you looked at all at the I/O module I added in HEAD? Best, George > So I knocked up this little patch against releng/9.1 to put a couple = of DTrace probes in the right places to pick up crucial data like the = now+then timestamps while they're present. >=20 > The predicate on the probe is needed to pick up the right firing - for = reasons I've not been able to fathom because gstat et al give correct = data, devstat_end_transaction is called multiple times for a given = operation - from g_disk_done(), then g_io_deliver() - without anything = useful in the bio struct (device name, number, etc). There also seem to = be a lot of firings coming from the following path which I don't = understand, again without anything useful in the bio: > kernel`devstat_end_transaction+0x13b > kernel`g_io_deliver+0x1b0 > kernel`g_io_schedule_up+0xa6 > kernel`g_up_procbody+0x5c > kernel`fork_exit+0x11f > kernel`0xffffffff80c1c3fe >=20 > Catching flushes is also proving problematic. It would seem that = devstat_end_transaction_bio() is called, but the bio and devstat structs = are virtually empty. bp->bio_dev, bp->bio_disk, ds->device_name, = ds->device_number, ds_unit_number are all null/empty, so I know that one = disk has flushed, and I know how long it took, but I can't find out = which disk it was. >=20 >=20 > Thoughts? >=20 >=20 >=20 > Index: sys/kern/subr_devstat.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- sys/kern/subr_devstat.c (revision 240481) > +++ sys/kern/subr_devstat.c (working copy) > @@ -29,6 +29,7 @@ > #include > __FBSDID("$FreeBSD$"); >=20 > +#include "opt_kdtrace.h" > #include > #include > #include > @@ -41,9 +42,22 @@ > #include > #include > #include > +#include >=20 > #include >=20 > +SDT_PROVIDER_DEFINE(devstat); > +SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction, = stat, stat); > +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, = stat, 0, "struct devstat *"); > +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, = stat, 1, "uint32_t"); > +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, = stat, 2, "struct bintime *"); > +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, = stat, 3, "struct bintime *"); > +SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction_bio, = stat, stat); > +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, = stat, 0, "struct devstat *"); > +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, = stat, 1, "struct bio *"); > + > + > + > static int devstat_num_devs; > static long devstat_generation =3D 1; > static int devstat_version =3D DEVSTAT_VERSION; > @@ -312,6 +326,8 @@ >=20 > ds->end_count++; > atomic_add_rel_int(&ds->sequence0, 1); > + > + SDT_PROBE(devstat, subr_devstat, devstat_end_transaction, stat, = ds, bytes, now, then, 0); > } >=20 > void > @@ -332,6 +348,8 @@ > else=20 > flg =3D DEVSTAT_NO_DATA; >=20 > + SDT_PROBE(devstat, subr_devstat, devstat_end_transaction_bio, = stat, ds, bp, 0, 0, 0); > + > devstat_end_transaction(ds, bp->bio_bcount - bp->bio_resid, > DEVSTAT_TAG_SIMPLE, flg, NULL, = &bp->bio_t0); > } >=20 >=20 >=20 >=20 > Sample dtrace script: > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D >=20 > BEGIN > { > bio_cmds[1] =3D "READ"; > bio_cmds[2] =3D "WRITE"; > bio_cmds[4] =3D "DELETE"; > bio_cmds[8] =3D "GETATTR"; > bio_cmds[16] =3D "FLUSH"; > } >=20 > devstat::devstat_end_transaction_bio: > { > self->bio =3D args[1]; > } >=20 > devstat::devstat_end_transaction: > /self->bio && args[0]->device_number/ > { > diff_frac =3D args[2]->frac - args[3]->frac; > diff_ufrac =3D (diff_frac < 0) ? (args[3]->frac - = args[2]->frac) : diff_frac; > diff =3D (1000000000*(diff_ufrac>>32))>>32; >=20 > printf("%d\t%s%d\t%s\t%d\t0.%09d\n", timestamp, > args[0]->device_name, args[0]->unit_number, > bio_cmds[self->bio->bio_cmd], > args[1], > diff > ); > } >=20 >=20 > --=20 > The information contained in this message is confidential and intended = for the addressee only. If you have received this message in error, or = there are any problems with its content, please contact the sender.=20 >=20 > iCritical is a trading name of Critical Software Ltd. Registered in = England: 04909220. > Registered Office: IC2, Keele Science Park, Keele, Staffordshire, ST5 = 5NH. >=20 > This message has been scanned for security threats by iCritical. = www.icritical.com >=20 > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"