FreeBSD Mail Archives

Date:      Fri, 01 Feb 2008 12:30:02 +0100
From:      Ivan Voras <ivoras@freebsd.org>
To:        freebsd-performance@freebsd.org
Subject:   Re: newfs locks entire machine for 20seconds
Message-ID:  <fnuvqe$5in$1@ger.gmane.org>
In-Reply-To: <002201c86499$7861ac20$b6db87d4@multiplay.co.uk>
References:  <200801310147.BAA04522@sopwith.solgatos.com>	<47A2A606.9080702@freebsd.org> <002201c86499$7861ac20$b6db87d4@multiplay.co.uk>

index | next in thread | previous in thread | raw e-mail


[-- Attachment #1 --]
Steven Hartland wrote:

> Yep thats where I've traced it to its requesting: kern.geom.confxml
> 
> Which does:-
> static int
> sysctl_kern_geom_confxml(SYSCTL_HANDLER_ARGS)
> {
>    int error;
>    struct sbuf *sb;
> 
>    sb = sbuf_new(NULL, NULL, 0, SBUF_AUTOEXTEND);
>    g_waitfor_event(g_confxml, sb, M_WAITOK, NULL);
>    error = SYSCTL_OUT(req, sbuf_data(sb), sbuf_len(sb) + 1);
>    sbuf_delete(sb);
>    return error;
> }
> 
> What I dont understand is why this would lock the entire machine.
> 
> I've enabled LOCK_PROFILING and reran and I get the following which
> seems to indicate the culpret is: SYSCTL_LOCK()
> 
>> From what I can tell g_waitfor_event is returning EAGAIN for a large
> amount of time which means we get stuck in:-
> userland_sysctl
> ...
>    SYSCTL_LOCK();
> 
>    do {
>        req.oldidx = 0;
>        req.newidx = 0;
>        error = sysctl_root(0, name, namelen, &req);
>    } while (error == EAGAIN);
> 
>    if (req.lock == REQ_WIRED && req.validlen > 0)
>        vsunlock(req.oldptr, req.validlen);
> 
>    SYSCTL_UNLOCK();
> ...
> 
> The only reason I can see for returning EAGAIN is g_destroy_geom
> calling g_cancel_event
> 
> To confirm this wasnt related to anything that top is doing and
> we are seeing a total lockup I used:
> sh -c 'while [ 1 ]; do sleep 1; date; done'
> Fri Feb  1 06:04:47 GMT 2008
> Fri Feb  1 06:04:48 GMT 2008
> Fri Feb  1 06:04:49 GMT 2008 <= Lockup
> Fri Feb  1 06:05:16 GMT 2008 <= Recover
> Fri Feb  1 06:05:17 GMT 2008
> Fri Feb  1 06:05:18 GMT 2008
> 
> 
> Enabling geom debugging I get:-
> Feb  1 06:04:45 geomtest kernel: g_post_event_x(0xffffffff802394c0,
> 0xffffff00010e6100, 2, 0)
> Feb  1 06:04:45 geomtest kernel: ref 0xffffff00010e6100
> Feb  1 06:04:45 geomtest kernel: g_post_event_x(0xffffffff802394c0,
> 0xffffff00014e6700, 2, 0)
> Feb  1 06:04:45 geomtest kernel: ref 0xffffff00014e6700
> Feb  1 06:04:49 geomtest kernel: g_post_event_x(0xffffffff80239270,
> 0xffffff00010e6100, 2, 0)
> Feb  1 06:04:49 geomtest kernel: ref 0xffffff00010e6100
> Feb  1 06:04:49 geomtest kernel: g_post_event_x(0xffffffff80239270,
> 0xffffff00014e6700, 2, 0)
> Feb  1 06:04:49 geomtest kernel: ref 0xffffff00014e6700
> Feb  1 06:04:49 geomtest kernel: mbr_taste(MBR,da0s3)
> Feb  1 06:04:49 geomtest kernel: g_mbrext_taste(MBREXT,da0s3)
> Feb  1 06:04:49 geomtest kernel: g_slice_spoiled(0xffffff0001b27180/da0s3)
> Feb  1 06:04:49 geomtest kernel: g_wither_geom(0xffffff0001a33c00(da0s3))
> Feb  1 06:04:49 geomtest kernel: g_part_taste(PART,da0s3)
> Feb  1 06:04:56 geomtest kernel: g_post_event_x(0xffffffff80235b10,
> 0xffffff000144a9c0, 2, 262144)
> Feb  1 06:05:00 geomtest kernel: g_wither_geom(0xffffff000158b300(da0s3))
> Feb  1 06:05:00 geomtest kernel:
> Feb  1 06:05:00 geomtest kernel: g_label_taste(LABEL, da0s3)
> Feb  1 06:05:00 geomtest kernel:
> Feb  1 06:05:16 geomtest kernel: GEOM_LABEL[1]: MSDOSFS: da0s3: FAT32
> volume not valid.
> Feb  1 06:05:16 geomtest kernel: g_detach(0xffffff0001b23980)
> Feb  1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b23980)
> Feb  1 06:05:16 geomtest kernel:

> So after all that I can see why the sysctl call is taking
> so long to complete but the burning question is why does

Can you explain - I don't see it :) Do you mean to say there's a
contention for sysctl lock between geom_confxml and g_waitfor_event or
that geom_label tasting has something to do with it?

> the entire system lock because of this? What else is
> waiting on the sysctl lock which is so critical?

What I do know is that sysctl is GIANT-locked, which is also used by
some parts of device handling infrastructure (dead_cdevsw), the USB
stack, and can creep itself in the timer via swi_sched in
subr_taskqueue.c:303. I cannot say for sure that's what happening here,
but they are possibilities.

If you can provoke this reliably, I think there is a (old!) patch for
removing sysctl from under the Giant lock that you could try.


[-- Attachment #2 --]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHowLAldnAQVacBcgRAkutAJ0Y3W1r/pUDkumTH7lwOQGEPJAVwgCfUQ+j
GrktnsKT+fVbSWCNowj0Y0E=
=uomb
-----END PGP SIGNATURE-----

home | help

Want to link to this message? Use this
URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?fnuvqe$5in$1>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation