Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 1 Feb 2008 06:12:48 -0000
From:      "Steven Hartland" <killing@multiplay.co.uk>
To:        "Eric Anderson" <anderson@freebsd.org>, "Dieter" <freebsd@sopwith.solgatos.com>
Cc:        freebsd-performance@freebsd.org
Subject:   Re: newfs locks entire machine for 20seconds
Message-ID:  <002201c86499$7861ac20$b6db87d4@multiplay.co.uk>
References:  <200801310147.BAA04522@sopwith.solgatos.com> <47A2A606.9080702@freebsd.org>

next in thread | previous in thread | raw e-mail | index | archive | help

----- Original Message ----- 
From: "Eric Anderson" <anderson@freebsd.org>

> I saw this once before, a long time back, and every time I went through a debugging session, it came to some kind of lock on the 
> sysctl tree with regards to the geom info (maybe the XML kind of tree dump or something).  I don't recall all the details, but 
> it was something like that.

Yep thats where I've traced it to its requesting: kern.geom.confxml

Which does:-
static int
sysctl_kern_geom_confxml(SYSCTL_HANDLER_ARGS)
{
    int error;
    struct sbuf *sb;

    sb = sbuf_new(NULL, NULL, 0, SBUF_AUTOEXTEND);
    g_waitfor_event(g_confxml, sb, M_WAITOK, NULL);
    error = SYSCTL_OUT(req, sbuf_data(sb), sbuf_len(sb) + 1);
    sbuf_delete(sb);
    return error;
}

What I dont understand is why this would lock the entire machine.

I've enabled LOCK_PROFILING and reran and I get the following which
seems to indicate the culpret is: SYSCTL_LOCK()

>From what I can tell g_waitfor_event is returning EAGAIN for a large
amount of time which means we get stuck in:-
userland_sysctl
...
    SYSCTL_LOCK();

    do {
        req.oldidx = 0;
        req.newidx = 0;
        error = sysctl_root(0, name, namelen, &req);
    } while (error == EAGAIN);

    if (req.lock == REQ_WIRED && req.validlen > 0)
        vsunlock(req.oldptr, req.validlen);

    SYSCTL_UNLOCK();
...

The only reason I can see for returning EAGAIN is g_destroy_geom
calling g_cancel_event

To confirm this wasnt related to anything that top is doing and
we are seeing a total lockup I used:
sh -c 'while [ 1 ]; do sleep 1; date; done'
Fri Feb  1 06:04:47 GMT 2008
Fri Feb  1 06:04:48 GMT 2008
Fri Feb  1 06:04:49 GMT 2008 <= Lockup
Fri Feb  1 06:05:16 GMT 2008 <= Recover
Fri Feb  1 06:05:17 GMT 2008
Fri Feb  1 06:05:18 GMT 2008


Enabling geom debugging I get:-
Feb  1 06:04:45 geomtest kernel: g_post_event_x(0xffffffff802394c0, 0xffffff00010e6100, 2, 0)
Feb  1 06:04:45 geomtest kernel: ref 0xffffff00010e6100
Feb  1 06:04:45 geomtest kernel: g_post_event_x(0xffffffff802394c0, 0xffffff00014e6700, 2, 0)
Feb  1 06:04:45 geomtest kernel: ref 0xffffff00014e6700
Feb  1 06:04:49 geomtest kernel: g_post_event_x(0xffffffff80239270, 0xffffff00010e6100, 2, 0)
Feb  1 06:04:49 geomtest kernel: ref 0xffffff00010e6100
Feb  1 06:04:49 geomtest kernel: g_post_event_x(0xffffffff80239270, 0xffffff00014e6700, 2, 0)
Feb  1 06:04:49 geomtest kernel: ref 0xffffff00014e6700
Feb  1 06:04:49 geomtest kernel: mbr_taste(MBR,da0s3)
Feb  1 06:04:49 geomtest kernel: g_mbrext_taste(MBREXT,da0s3)
Feb  1 06:04:49 geomtest kernel: g_slice_spoiled(0xffffff0001b27180/da0s3)
Feb  1 06:04:49 geomtest kernel: g_wither_geom(0xffffff0001a33c00(da0s3))
Feb  1 06:04:49 geomtest kernel: g_part_taste(PART,da0s3)
Feb  1 06:04:56 geomtest kernel: g_post_event_x(0xffffffff80235b10, 0xffffff000144a9c0, 2, 262144)
Feb  1 06:05:00 geomtest kernel: g_wither_geom(0xffffff000158b300(da0s3))
Feb  1 06:05:00 geomtest kernel:
Feb  1 06:05:00 geomtest kernel: g_label_taste(LABEL, da0s3)
Feb  1 06:05:00 geomtest kernel:
Feb  1 06:05:16 geomtest kernel: GEOM_LABEL[1]: MSDOSFS: da0s3: FAT32 volume not valid.
Feb  1 06:05:16 geomtest kernel: g_detach(0xffffff0001b23980)
Feb  1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b23980)
Feb  1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001b4be00(label:taste))
Feb  1 06:05:16 geomtest kernel: mbr_taste(MBR,da0s3g)
Feb  1 06:05:16 geomtest kernel: g_slice_spoiled(0xffffff0001a3c280/da0s3g)
Feb  1 06:05:16 geomtest kernel: g_wither_geom(0xffffff0001b4b600(da0s3g))
Feb  1 06:05:16 geomtest kernel: g_mbrext_taste(MBREXT,da0s3g)
Feb  1 06:05:16 geomtest kernel: bsd_taste(BSD,da0s3g)
Feb  1 06:05:16 geomtest kernel: g_slice_spoiled(0xffffff0001b27200/da0s3g)
Feb  1 06:05:16 geomtest kernel: g_wither_geom(0xffffff0001504c00(da0s3g))
Feb  1 06:05:16 geomtest kernel: g_part_taste(PART,da0s3g)
Feb  1 06:05:16 geomtest kernel: g_wither_geom(0xffffff0001527100(da0s3g))
Feb  1 06:05:16 geomtest kernel: g_label_taste(LABEL, da0s3g)
Feb  1 06:05:16 geomtest kernel: GEOM_LABEL[1]: UFS2 file system detected on da0s3g.
Feb  1 06:05:16 geomtest kernel: GEOM_LABEL[1]: MSDOSFS: da0s3g: no FAT signature found.
Feb  1 06:05:16 geomtest kernel: g_detach(0xffffff0001b23a80)
Feb  1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b23a80)
Feb  1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001b4be00(label:taste))
Feb  1 06:05:16 geomtest kernel: g_detach(0xffffff0001a3c280)
Feb  1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001a3c280)
Feb  1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001b4b600(da0s3g))
Feb  1 06:05:16 geomtest kernel: g_detach(0xffffff0001b27180)
Feb  1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b27180)
Feb  1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001a33c00(da0s3))
Feb  1 06:05:16 geomtest kernel: g_detach(0xffffff0001b27200)
Feb  1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b27200)
Feb  1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001504c00(da0s3g))
Feb  1 06:05:16 geomtest kernel: g_detach(0xffffff0001adc580)
Feb  1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001adc580)
Feb  1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001527100(da0s3g))
Feb  1 06:05:16 geomtest kernel: g_detach(0xffffff0001b23b00)
Feb  1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b23b00)
Feb  1 06:05:16 geomtest kernel:
Feb  1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff000158b300(da0s3))

So after all that I can see why the sysctl call is taking
so long to complete but the burning question is why does
the entire system lock because of this? What else is
waiting on the sysctl lock which is so critical?


debug.lock.prof.stats:
   max        total   wait_total       count   avg wait_avg     cnt_hold     cnt_lock name
153585139    181478925            0         112 1620347     0            0            0 kern/vfs_subr.c:2062 (lockmgr:ufs)
26298436     26298513     23195820          24 1095771 966492            1            1 kern/kern_sysctl.c:1396 (sx:sysctl lock)
2396114      5604484            0         596  9403     0            0            0 sys/buf.h:280 (lockmgr:bufwait)
   214        76577         1837       12490     6     0          185          280 kern/subr_sleepqueue.c:232 (spin mutex:sleepq 
chain)
     1        34126            2       30415     1     0            2            1 kern/kern_clock.c:224 (spin mutex:sched lock 1)
     2        34069            5       30415     1     0            3            2 kern/kern_clock.c:224 (spin mutex:sched lock 0)
     1        34067            0       30415     1     0            0            0 kern/kern_clock.c:264 (spin mutex:callout)
    54        23629            1        4122     5     0            6            1 kern/uipc_socket.c:2468 (sleep mutex:so_snd)
    34        22599            0        9232     2     0            0            0 amd64/amd64/pmap.c:935 (sleep mutex:vm page 
queue mutex)
    34        22573            0        9232     2     0            0            0 amd64/amd64/pmap.c:936 (sleep mutex:pmap)
   121        20165            0         586    34     0            0            0 kern/uipc_sockbuf.c:145 (sx:so_snd_sx)
    50        19229           38        9401     2     0            3            3 geom/geom_io.c:68 (sleep mutex:bio queue)
   117        16281            0         586    27     0           17            0 netinet/tcp_usrreq.c:781 (sleep mutex:inp)
    50        14954          245       11968     1     0           15           62 kern/kern_conf.c:69 (sleep mutex:cdev)
    52        13599            0        4122     3     0            0            0 kern/uipc_socket.c:2469 (sleep mutex:so_rcv)
   227        11779            0         625    18     0            1            0 cam/cam_xpt.c:7153 (sleep mutex:arcmsr Q buffer 
lock)
    40        11779            0         639    18     0           50            0 kern/kern_conf.c:372 (sleep mutex:Giant)
    52        11201          219        3536     3     0            3           47 kern/kern_conf.c:384 (sleep mutex:Giant)
    11        10888            0        1770     6     0            0            0 kern/subr_taskqueue.c:71 (spin 
mutex:fast_taskqueue)
    41        10866            0        3041     3     0            0            0 netinet/tcp_subr.c:1426 (sleep mutex:isn_mtx)
    50         9858            5        8243     1     0            1            3 kern/sys_generic.c:1079 (sleep mutex:sellck)
     5         9545            7        3416     2     0            3            2 kern/sched_ule.c:1872 (spin mutex:sched lock 0)
     2         9029          677        7356     1     0          103          319 kern/kern_timeout.c:419 (spin mutex:callout)
    52         8916            0        1813     4     0            1            0 kern/sys_generic.c:759 (sleep mutex:sellck)
   222         8318            0         625    13     0            0            0 geom/geom_disk.c:198 (sleep mutex:g_disk_done)
     1         7843            3        6407     1     0            3            2 kern/kern_timeout.c:296 (spin mutex:callout)
     5         7240           17        5090     1     0            5            7 kern/kern_switch.c:182 (spin mutex:sched lock 
0)
    64         6940           58        1276     5     0            0            3 kern/kern_timeout.c:241 (sleep mutex:Giant)
     5         6924         2880        1730     4     1            2         1715 amd64/amd64/mp_machdep.c:966 (spin mutex:sched 
lock 0)
    13         6845            0         623    10     0            0            0 amd64/amd64/mp_machdep.c:845 (spin mutex:smp 
rendezvous)
    53         6279            0        1766     3     0            0            0 kern/sys_pipe.c:1331 (sleep mutex:pipe mutex)
     5         6063           93        4189     1     0            2           27 kern/kern_intr.c:1131 (spin mutex:sched lock 0)
    15         5910           25         660     8     0            7           10 kern/sys_generic.c:1127 (sleep mutex:sellck)
    56         5290          727         357    14     2            0           17 netinet/tcp_input.c:479 (sleep mutex:inp)
   144         5264            0         116    45     0            0            0 vm/vm_map.c:3111 (sx:user map)
     7         4971           25        1552     3     0            9           11 kern/sched_ule.c:1872 (spin mutex:sched lock 1)
    11         4959            0         625     7     0            0            0 dev/arcmsr/arcmsr.c:1129 (sleep mutex:arcmsr Q 
buffer lock)
   210         4901           56        3436     1     0            9           27 kern/sched_ule.c:979 (spin mutex:sched lock 0)
    69         4899            0         357    13     0            0            0 netinet/tcp_input.c:400 (sleep mutex:tcp)
     1         4698            0        4045     1     0            0            0 kern/kern_clock.c:412 (spin mutex:sched lock 1)
     1         4697            0        4189     1     0            1            0 kern/kern_intr.c:706 (spin mutex:sched lock 0)
     1         4696            0        4184     1     0            0            0 kern/kern_timeout.c:191 (spin mutex:callout)
    45         4604            0         591     7     0            6            0 kern/vfs_bio.c:3790 (sleep mutex:bdone lock)
     2         4550            3        4045     1     0            0            1 kern/kern_clock.c:412 (spin mutex:sched lock 0)
    53         4384            0        1228     3     0            0            0 kern/sys_generic.c:750 (sleep mutex:sellck)
     2         4189            3        3392     1     0            1            2 kern/subr_sleepqueue.c:331 (spin mutex:sched 
lock 0)
  3695         4042            0           4  1010     0            0            0 kern/vfs_vnops.c:515 (lockmgr:ufs)
     8         3891            0         645     6     0            0            0 kern/subr_sleepqueue.c:409 (spin mutex:sleepq 
chain)
    10         3773            0        3103     1     0            1            0 kern/vfs_bio.c:3002 (sleep mutex:bdone lock)
    55         3323           22         634     5     0            4            7 kern/kern_mutex.c:141 (sleep mutex:sellck)
    50         3188            0        2469     1     0            0            0 kern/kern_descrip.c:2139 (sleep mutex:sleep 
mtxpool)
  1853         3083            0           4   770     0            0            0 kern/vfs_subr.c:1665 (lockmgr:syncer)
     6         3074            0        1310     2     0            0            0 kern/kern_lock.c:200 (sleep mutex:lockbuilder 
mtxpool)
    10         3003            0         645     4     0            0            0 kern/subr_sleepqueue.c:390 (sleep mutex:process 
lock)
    32         2991            0        2448     1     0            0            0 kern/kern_descrip.c:2007 (sleep mutex:sleep 
mtxpool)
    50         2852            0        2401     1     0            0            0 kern/kern_sig.c:996 (sleep mutex:process lock)
    94         2458            0         589     4     0            0            0 dev/em/if_em.c:980 (sleep mutex:em0)
    54         2358            6         592     3     0            0            3 kern/kern_conf.c:360 (sleep mutex:Giant)
     1         2343           35        1547     1     0           20           20 kern/subr_sleepqueue.c:331 (spin mutex:sched 
lock 1)
     2         2272           98        1540     1     0          238           39 kern/sched_ule.c:979 (spin mutex:sched lock 1)
     1         2214            0        1897     1     0            0            0 kern/subr_sleepqueue.c:595 (spin mutex:sched 
lock 0)
     1         2194            4        1329     1     0            2            2 kern/kern_timeout.c:500 (spin mutex:callout)
     1         2189           33        1526     1     0            0            4 kern/subr_sleepqueue.c:800 (spin mutex:sleepq 
chain)
     5         2160         1959        1173     1     1            0         1168 amd64/amd64/mp_machdep.c:966 (spin mutex:sched 
lock 1)
     1         2158            1        1660     1     0            0            1 kern/sys_generic.c:753 (spin mutex:sched lock 
0)
     8         1972            0         248     7     0            0            0 kern/kern_timeout.c:241 (sleep mutex:process 
lock)
     6         1968           11        1245     1     0            2            5 kern/kern_intr.c:1131 (spin mutex:sched lock 1) 


================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337
or return the E.mail to postmaster@multiplay.co.uk.




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?002201c86499$7861ac20$b6db87d4>