Date: Fri, 1 Feb 2008 06:12:48 -0000 From: "Steven Hartland" <killing@multiplay.co.uk> To: "Eric Anderson" <anderson@freebsd.org>, "Dieter" <freebsd@sopwith.solgatos.com> Cc: freebsd-performance@freebsd.org Subject: Re: newfs locks entire machine for 20seconds Message-ID: <002201c86499$7861ac20$b6db87d4@multiplay.co.uk> References: <200801310147.BAA04522@sopwith.solgatos.com> <47A2A606.9080702@freebsd.org>
next in thread | previous in thread | raw e-mail | index | archive | help
----- Original Message ----- From: "Eric Anderson" <anderson@freebsd.org> > I saw this once before, a long time back, and every time I went through a debugging session, it came to some kind of lock on the > sysctl tree with regards to the geom info (maybe the XML kind of tree dump or something). I don't recall all the details, but > it was something like that. Yep thats where I've traced it to its requesting: kern.geom.confxml Which does:- static int sysctl_kern_geom_confxml(SYSCTL_HANDLER_ARGS) { int error; struct sbuf *sb; sb = sbuf_new(NULL, NULL, 0, SBUF_AUTOEXTEND); g_waitfor_event(g_confxml, sb, M_WAITOK, NULL); error = SYSCTL_OUT(req, sbuf_data(sb), sbuf_len(sb) + 1); sbuf_delete(sb); return error; } What I dont understand is why this would lock the entire machine. I've enabled LOCK_PROFILING and reran and I get the following which seems to indicate the culpret is: SYSCTL_LOCK() >From what I can tell g_waitfor_event is returning EAGAIN for a large amount of time which means we get stuck in:- userland_sysctl ... SYSCTL_LOCK(); do { req.oldidx = 0; req.newidx = 0; error = sysctl_root(0, name, namelen, &req); } while (error == EAGAIN); if (req.lock == REQ_WIRED && req.validlen > 0) vsunlock(req.oldptr, req.validlen); SYSCTL_UNLOCK(); ... The only reason I can see for returning EAGAIN is g_destroy_geom calling g_cancel_event To confirm this wasnt related to anything that top is doing and we are seeing a total lockup I used: sh -c 'while [ 1 ]; do sleep 1; date; done' Fri Feb 1 06:04:47 GMT 2008 Fri Feb 1 06:04:48 GMT 2008 Fri Feb 1 06:04:49 GMT 2008 <= Lockup Fri Feb 1 06:05:16 GMT 2008 <= Recover Fri Feb 1 06:05:17 GMT 2008 Fri Feb 1 06:05:18 GMT 2008 Enabling geom debugging I get:- Feb 1 06:04:45 geomtest kernel: g_post_event_x(0xffffffff802394c0, 0xffffff00010e6100, 2, 0) Feb 1 06:04:45 geomtest kernel: ref 0xffffff00010e6100 Feb 1 06:04:45 geomtest kernel: g_post_event_x(0xffffffff802394c0, 0xffffff00014e6700, 2, 0) Feb 1 06:04:45 geomtest kernel: ref 0xffffff00014e6700 Feb 1 06:04:49 geomtest kernel: g_post_event_x(0xffffffff80239270, 0xffffff00010e6100, 2, 0) Feb 1 06:04:49 geomtest kernel: ref 0xffffff00010e6100 Feb 1 06:04:49 geomtest kernel: g_post_event_x(0xffffffff80239270, 0xffffff00014e6700, 2, 0) Feb 1 06:04:49 geomtest kernel: ref 0xffffff00014e6700 Feb 1 06:04:49 geomtest kernel: mbr_taste(MBR,da0s3) Feb 1 06:04:49 geomtest kernel: g_mbrext_taste(MBREXT,da0s3) Feb 1 06:04:49 geomtest kernel: g_slice_spoiled(0xffffff0001b27180/da0s3) Feb 1 06:04:49 geomtest kernel: g_wither_geom(0xffffff0001a33c00(da0s3)) Feb 1 06:04:49 geomtest kernel: g_part_taste(PART,da0s3) Feb 1 06:04:56 geomtest kernel: g_post_event_x(0xffffffff80235b10, 0xffffff000144a9c0, 2, 262144) Feb 1 06:05:00 geomtest kernel: g_wither_geom(0xffffff000158b300(da0s3)) Feb 1 06:05:00 geomtest kernel: Feb 1 06:05:00 geomtest kernel: g_label_taste(LABEL, da0s3) Feb 1 06:05:00 geomtest kernel: Feb 1 06:05:16 geomtest kernel: GEOM_LABEL[1]: MSDOSFS: da0s3: FAT32 volume not valid. Feb 1 06:05:16 geomtest kernel: g_detach(0xffffff0001b23980) Feb 1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b23980) Feb 1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001b4be00(label:taste)) Feb 1 06:05:16 geomtest kernel: mbr_taste(MBR,da0s3g) Feb 1 06:05:16 geomtest kernel: g_slice_spoiled(0xffffff0001a3c280/da0s3g) Feb 1 06:05:16 geomtest kernel: g_wither_geom(0xffffff0001b4b600(da0s3g)) Feb 1 06:05:16 geomtest kernel: g_mbrext_taste(MBREXT,da0s3g) Feb 1 06:05:16 geomtest kernel: bsd_taste(BSD,da0s3g) Feb 1 06:05:16 geomtest kernel: g_slice_spoiled(0xffffff0001b27200/da0s3g) Feb 1 06:05:16 geomtest kernel: g_wither_geom(0xffffff0001504c00(da0s3g)) Feb 1 06:05:16 geomtest kernel: g_part_taste(PART,da0s3g) Feb 1 06:05:16 geomtest kernel: g_wither_geom(0xffffff0001527100(da0s3g)) Feb 1 06:05:16 geomtest kernel: g_label_taste(LABEL, da0s3g) Feb 1 06:05:16 geomtest kernel: GEOM_LABEL[1]: UFS2 file system detected on da0s3g. Feb 1 06:05:16 geomtest kernel: GEOM_LABEL[1]: MSDOSFS: da0s3g: no FAT signature found. Feb 1 06:05:16 geomtest kernel: g_detach(0xffffff0001b23a80) Feb 1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b23a80) Feb 1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001b4be00(label:taste)) Feb 1 06:05:16 geomtest kernel: g_detach(0xffffff0001a3c280) Feb 1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001a3c280) Feb 1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001b4b600(da0s3g)) Feb 1 06:05:16 geomtest kernel: g_detach(0xffffff0001b27180) Feb 1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b27180) Feb 1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001a33c00(da0s3)) Feb 1 06:05:16 geomtest kernel: g_detach(0xffffff0001b27200) Feb 1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b27200) Feb 1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001504c00(da0s3g)) Feb 1 06:05:16 geomtest kernel: g_detach(0xffffff0001adc580) Feb 1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001adc580) Feb 1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff0001527100(da0s3g)) Feb 1 06:05:16 geomtest kernel: g_detach(0xffffff0001b23b00) Feb 1 06:05:16 geomtest kernel: g_destroy_consumer(0xffffff0001b23b00) Feb 1 06:05:16 geomtest kernel: Feb 1 06:05:16 geomtest kernel: g_destroy_geom(0xffffff000158b300(da0s3)) So after all that I can see why the sysctl call is taking so long to complete but the burning question is why does the entire system lock because of this? What else is waiting on the sysctl lock which is so critical? debug.lock.prof.stats: max total wait_total count avg wait_avg cnt_hold cnt_lock name 153585139 181478925 0 112 1620347 0 0 0 kern/vfs_subr.c:2062 (lockmgr:ufs) 26298436 26298513 23195820 24 1095771 966492 1 1 kern/kern_sysctl.c:1396 (sx:sysctl lock) 2396114 5604484 0 596 9403 0 0 0 sys/buf.h:280 (lockmgr:bufwait) 214 76577 1837 12490 6 0 185 280 kern/subr_sleepqueue.c:232 (spin mutex:sleepq chain) 1 34126 2 30415 1 0 2 1 kern/kern_clock.c:224 (spin mutex:sched lock 1) 2 34069 5 30415 1 0 3 2 kern/kern_clock.c:224 (spin mutex:sched lock 0) 1 34067 0 30415 1 0 0 0 kern/kern_clock.c:264 (spin mutex:callout) 54 23629 1 4122 5 0 6 1 kern/uipc_socket.c:2468 (sleep mutex:so_snd) 34 22599 0 9232 2 0 0 0 amd64/amd64/pmap.c:935 (sleep mutex:vm page queue mutex) 34 22573 0 9232 2 0 0 0 amd64/amd64/pmap.c:936 (sleep mutex:pmap) 121 20165 0 586 34 0 0 0 kern/uipc_sockbuf.c:145 (sx:so_snd_sx) 50 19229 38 9401 2 0 3 3 geom/geom_io.c:68 (sleep mutex:bio queue) 117 16281 0 586 27 0 17 0 netinet/tcp_usrreq.c:781 (sleep mutex:inp) 50 14954 245 11968 1 0 15 62 kern/kern_conf.c:69 (sleep mutex:cdev) 52 13599 0 4122 3 0 0 0 kern/uipc_socket.c:2469 (sleep mutex:so_rcv) 227 11779 0 625 18 0 1 0 cam/cam_xpt.c:7153 (sleep mutex:arcmsr Q buffer lock) 40 11779 0 639 18 0 50 0 kern/kern_conf.c:372 (sleep mutex:Giant) 52 11201 219 3536 3 0 3 47 kern/kern_conf.c:384 (sleep mutex:Giant) 11 10888 0 1770 6 0 0 0 kern/subr_taskqueue.c:71 (spin mutex:fast_taskqueue) 41 10866 0 3041 3 0 0 0 netinet/tcp_subr.c:1426 (sleep mutex:isn_mtx) 50 9858 5 8243 1 0 1 3 kern/sys_generic.c:1079 (sleep mutex:sellck) 5 9545 7 3416 2 0 3 2 kern/sched_ule.c:1872 (spin mutex:sched lock 0) 2 9029 677 7356 1 0 103 319 kern/kern_timeout.c:419 (spin mutex:callout) 52 8916 0 1813 4 0 1 0 kern/sys_generic.c:759 (sleep mutex:sellck) 222 8318 0 625 13 0 0 0 geom/geom_disk.c:198 (sleep mutex:g_disk_done) 1 7843 3 6407 1 0 3 2 kern/kern_timeout.c:296 (spin mutex:callout) 5 7240 17 5090 1 0 5 7 kern/kern_switch.c:182 (spin mutex:sched lock 0) 64 6940 58 1276 5 0 0 3 kern/kern_timeout.c:241 (sleep mutex:Giant) 5 6924 2880 1730 4 1 2 1715 amd64/amd64/mp_machdep.c:966 (spin mutex:sched lock 0) 13 6845 0 623 10 0 0 0 amd64/amd64/mp_machdep.c:845 (spin mutex:smp rendezvous) 53 6279 0 1766 3 0 0 0 kern/sys_pipe.c:1331 (sleep mutex:pipe mutex) 5 6063 93 4189 1 0 2 27 kern/kern_intr.c:1131 (spin mutex:sched lock 0) 15 5910 25 660 8 0 7 10 kern/sys_generic.c:1127 (sleep mutex:sellck) 56 5290 727 357 14 2 0 17 netinet/tcp_input.c:479 (sleep mutex:inp) 144 5264 0 116 45 0 0 0 vm/vm_map.c:3111 (sx:user map) 7 4971 25 1552 3 0 9 11 kern/sched_ule.c:1872 (spin mutex:sched lock 1) 11 4959 0 625 7 0 0 0 dev/arcmsr/arcmsr.c:1129 (sleep mutex:arcmsr Q buffer lock) 210 4901 56 3436 1 0 9 27 kern/sched_ule.c:979 (spin mutex:sched lock 0) 69 4899 0 357 13 0 0 0 netinet/tcp_input.c:400 (sleep mutex:tcp) 1 4698 0 4045 1 0 0 0 kern/kern_clock.c:412 (spin mutex:sched lock 1) 1 4697 0 4189 1 0 1 0 kern/kern_intr.c:706 (spin mutex:sched lock 0) 1 4696 0 4184 1 0 0 0 kern/kern_timeout.c:191 (spin mutex:callout) 45 4604 0 591 7 0 6 0 kern/vfs_bio.c:3790 (sleep mutex:bdone lock) 2 4550 3 4045 1 0 0 1 kern/kern_clock.c:412 (spin mutex:sched lock 0) 53 4384 0 1228 3 0 0 0 kern/sys_generic.c:750 (sleep mutex:sellck) 2 4189 3 3392 1 0 1 2 kern/subr_sleepqueue.c:331 (spin mutex:sched lock 0) 3695 4042 0 4 1010 0 0 0 kern/vfs_vnops.c:515 (lockmgr:ufs) 8 3891 0 645 6 0 0 0 kern/subr_sleepqueue.c:409 (spin mutex:sleepq chain) 10 3773 0 3103 1 0 1 0 kern/vfs_bio.c:3002 (sleep mutex:bdone lock) 55 3323 22 634 5 0 4 7 kern/kern_mutex.c:141 (sleep mutex:sellck) 50 3188 0 2469 1 0 0 0 kern/kern_descrip.c:2139 (sleep mutex:sleep mtxpool) 1853 3083 0 4 770 0 0 0 kern/vfs_subr.c:1665 (lockmgr:syncer) 6 3074 0 1310 2 0 0 0 kern/kern_lock.c:200 (sleep mutex:lockbuilder mtxpool) 10 3003 0 645 4 0 0 0 kern/subr_sleepqueue.c:390 (sleep mutex:process lock) 32 2991 0 2448 1 0 0 0 kern/kern_descrip.c:2007 (sleep mutex:sleep mtxpool) 50 2852 0 2401 1 0 0 0 kern/kern_sig.c:996 (sleep mutex:process lock) 94 2458 0 589 4 0 0 0 dev/em/if_em.c:980 (sleep mutex:em0) 54 2358 6 592 3 0 0 3 kern/kern_conf.c:360 (sleep mutex:Giant) 1 2343 35 1547 1 0 20 20 kern/subr_sleepqueue.c:331 (spin mutex:sched lock 1) 2 2272 98 1540 1 0 238 39 kern/sched_ule.c:979 (spin mutex:sched lock 1) 1 2214 0 1897 1 0 0 0 kern/subr_sleepqueue.c:595 (spin mutex:sched lock 0) 1 2194 4 1329 1 0 2 2 kern/kern_timeout.c:500 (spin mutex:callout) 1 2189 33 1526 1 0 0 4 kern/subr_sleepqueue.c:800 (spin mutex:sleepq chain) 5 2160 1959 1173 1 1 0 1168 amd64/amd64/mp_machdep.c:966 (spin mutex:sched lock 1) 1 2158 1 1660 1 0 0 1 kern/sys_generic.c:753 (spin mutex:sched lock 0) 8 1972 0 248 7 0 0 0 kern/kern_timeout.c:241 (sleep mutex:process lock) 6 1968 11 1245 1 0 2 5 kern/kern_intr.c:1131 (spin mutex:sched lock 1) ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?002201c86499$7861ac20$b6db87d4>