Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 28 Sep 2012 16:16:24 -0400
From:      George Neville-Neil <gnn@neville-neil.com>
To:        Matt Burke <mattblists@icritical.com>
Cc:        freebsd-fs@FreeBSD.org
Subject:   Re: [patch] DTrace disk IO
Message-ID:  <49E027FB-2FA7-4839-84C8-3112A039DEE7@neville-neil.com>
In-Reply-To: <5056D896.8060607@icritical.com>
References:  <5056D896.8060607@icritical.com>

next in thread | previous in thread | raw e-mail | index | archive | help

On Sep 17, 2012, at 04:00 , Matt Burke <mattblists@icritical.com> wrote:

> I've recently been trying to get a grip on measuring disk IO latency =
(per transaction), and have found it to be rather difficult given the =
asynchronous nature of the beast, and also I can't find a way of =
translating the bio start of transaction timestamps to anything I can =
use in DTrace when pulling them out.
>=20

Interesting.  Have you looked at all at the I/O module I added in HEAD?

Best,
George

> So I knocked up this little patch against releng/9.1 to put a couple =
of DTrace probes in the right places to pick up crucial data like the =
now+then timestamps while they're present.
>=20
> The predicate on the probe is needed to pick up the right firing - for =
reasons I've not been able to fathom because gstat et al give correct =
data, devstat_end_transaction is called multiple times for a given =
operation - from g_disk_done(), then g_io_deliver() - without anything =
useful in the bio struct (device name, number, etc). There also seem to =
be a lot of firings coming from the following path which I don't =
understand, again without anything useful in the bio:
>              kernel`devstat_end_transaction+0x13b
>              kernel`g_io_deliver+0x1b0
>              kernel`g_io_schedule_up+0xa6
>              kernel`g_up_procbody+0x5c
>              kernel`fork_exit+0x11f
>              kernel`0xffffffff80c1c3fe
>=20
> Catching flushes is also proving problematic. It would seem that =
devstat_end_transaction_bio() is called, but the bio and devstat structs =
are virtually empty. bp->bio_dev, bp->bio_disk, ds->device_name, =
ds->device_number, ds_unit_number are all null/empty, so I know that one =
disk has flushed, and I know how long it took, but I can't find out =
which disk it was.
>=20
>=20
> Thoughts?
>=20
>=20
>=20
> Index: sys/kern/subr_devstat.c
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
> --- sys/kern/subr_devstat.c	(revision 240481)
> +++ sys/kern/subr_devstat.c	(working copy)
> @@ -29,6 +29,7 @@
> #include <sys/cdefs.h>
> __FBSDID("$FreeBSD$");
>=20
> +#include "opt_kdtrace.h"
> #include <sys/param.h>
> #include <sys/kernel.h>
> #include <sys/systm.h>
> @@ -41,9 +42,22 @@
> #include <sys/conf.h>
> #include <vm/vm.h>
> #include <vm/pmap.h>
> +#include <sys/sdt.h>
>=20
> #include <machine/atomic.h>
>=20
> +SDT_PROVIDER_DEFINE(devstat);
> +SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction, =
stat, stat);
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, =
stat, 0, "struct devstat *");
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, =
stat, 1, "uint32_t");
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, =
stat, 2, "struct bintime *");
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction, =
stat, 3, "struct bintime *");
> +SDT_PROBE_DEFINE(devstat, subr_devstat, devstat_end_transaction_bio, =
stat, stat);
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, =
stat, 0, "struct devstat *");
> +SDT_PROBE_ARGTYPE(devstat, subr_devstat, devstat_end_transaction_bio, =
stat, 1, "struct bio *");
> +
> +
> +
> static int devstat_num_devs;
> static long devstat_generation =3D 1;
> static int devstat_version =3D DEVSTAT_VERSION;
> @@ -312,6 +326,8 @@
>=20
> 	ds->end_count++;
> 	atomic_add_rel_int(&ds->sequence0, 1);
> +
> +	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction, stat, =
ds, bytes, now, then, 0);
> }
>=20
> void
> @@ -332,6 +348,8 @@
> 	else=20
> 		flg =3D DEVSTAT_NO_DATA;
>=20
> +	SDT_PROBE(devstat, subr_devstat, devstat_end_transaction_bio, =
stat, ds, bp, 0, 0, 0);
> +
> 	devstat_end_transaction(ds, bp->bio_bcount - bp->bio_resid,
> 				DEVSTAT_TAG_SIMPLE, flg, NULL, =
&bp->bio_t0);
> }
>=20
>=20
>=20
>=20
> Sample dtrace script:
> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
>=20
> BEGIN
> {
>        bio_cmds[1] =3D  "READ";
>        bio_cmds[2] =3D  "WRITE";
>        bio_cmds[4] =3D  "DELETE";
>        bio_cmds[8] =3D  "GETATTR";
>        bio_cmds[16] =3D "FLUSH";
> }
>=20
> devstat::devstat_end_transaction_bio:
> {
>        self->bio =3D args[1];
> }
>=20
> devstat::devstat_end_transaction:
> /self->bio && args[0]->device_number/
> {
>        diff_frac =3D args[2]->frac - args[3]->frac;
>        diff_ufrac =3D (diff_frac < 0) ? (args[3]->frac - =
args[2]->frac) : diff_frac;
>        diff =3D (1000000000*(diff_ufrac>>32))>>32;
>=20
>        printf("%d\t%s%d\t%s\t%d\t0.%09d\n", timestamp,
>                args[0]->device_name, args[0]->unit_number,
>                bio_cmds[self->bio->bio_cmd],
>                args[1],
>                diff
>                );
> }
>=20
>=20
> --=20
> The information contained in this message is confidential and intended =
for the addressee only. If you have received this message in error, or =
there are any problems with its content, please contact the sender.=20
>=20
> iCritical is a trading name of Critical Software Ltd. Registered in =
England: 04909220.
> Registered Office: IC2, Keele Science Park, Keele, Staffordshire, ST5 =
5NH.
>=20
> This message has been scanned for security threats by iCritical. =
www.icritical.com
>=20
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?49E027FB-2FA7-4839-84C8-3112A039DEE7>