Date: Wed, 20 Feb 2002 00:13:18 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Kirk McKusick <mckusick@mckusick.com>
Cc: Mike Silbersack <silby@silby.com>, Valentin Nechayev <netch@iv.nn.kiev.ua>, "David W. Chapman Jr." <dwcjr@inethouston.net>, <stable@FreeBSD.ORG>
Subject: Softupdates failure during buffer syncing at shutdown (was Re: cvs commit: src/sys/ufs/ffs ffs_softdep.c)
Message-ID: <200202200813.g1K8DIl85685@apollo.backplane.com>
References: <20020211010801.K8897-100000@patrocles.silby.com>

Ok, I finally tracked down the buffers that 'syncing disks...' could
not sync. They appear to be indirect blocks. Syncing disks... is
counting them because they are exclusively locked by softupdates.

All the buffers in question are locked by setup_allocindir_phase2()
in ffs_softdep.c line 1698 (in stable). The buffers themselves are
marked clean. For some reason, softupdates never releases its lock on
these buffers, though it appears that it ought to have (ir_deplisthd
is empty). I don't know why, so I am adding Kirk to the list. Kirk,
I've included a gdb dump of one of the buffers and the item on its
worklist.

The problem occurs when you 'shutdown -r now' a machine immediately
after doing something major to the filesystem. I was able to
reproduce it on test1 by installing the kernel to /usr/fubar twice and
doing a shutdown -r now immediately. It sometimes took two or three
reboots before 'Syncing disks...' would fail on a number of buffers.
i.e. it would say:

syncing disks... 86 18 15 13 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12 12

The buffers it is unable to sync are clean and exclusively locked by
softupdates (-stable ffs_softdep.c line 1698). The lock is never
released. I think there may be some kind of cleanup that is not
getting executed by the 'syncing disks...' code's attempt to flush the
buffers.
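
A toy model may help show why the counter in the 'syncing disks...' output stops falling and then repeats. This is not the kernel's shutdown code: the class `ToyBuf`, the functions `count_unsynced`, `sync_pass` and `simulate`, and the rule that a pass only writes dirty, unlocked buffers are all assumptions made for illustration. The only thing taken from the report above is that buffers which are clean but still exclusively locked keep being counted.

```python
from dataclasses import dataclass


@dataclass
class ToyBuf:
    """Hypothetical stand-in for a buffer; not the kernel's struct buf."""
    dirty: bool = False   # still has data waiting to be written
    locked: bool = False  # exclusively locked by some holder


def count_unsynced(bufs):
    """Count buffers a pass still treats as outstanding (dirty or locked)."""
    return sum(1 for b in bufs if b.dirty or b.locked)


def sync_pass(bufs):
    """One pass: write out dirty buffers, but never touch locked ones."""
    for b in bufs:
        if b.dirty and not b.locked:
            b.dirty = False


def simulate(bufs, passes):
    """Return the count reported before each of `passes` sync passes."""
    counts = []
    for _ in range(passes):
        counts.append(count_unsynced(bufs))
        sync_pass(bufs)
    return counts


if __name__ == "__main__":
    # 6 ordinary dirty buffers plus 12 clean buffers whose lock is never dropped.
    bufs = [ToyBuf(dirty=True) for _ in range(6)]
    bufs += [ToyBuf(locked=True) for _ in range(12)]
    print("syncing disks...", *simulate(bufs, 5))
    # prints: syncing disks... 18 12 12 12 12
```

In this model no number of extra passes brings the count below 12, because nothing in the pass releases the locks; only the lock holder can do that.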
-Matt
Matthew Dillon <dillon@backplane.com>

(kgdb) print &$8
$19 = (struct buf *) 0xcf456388
(kgdb) print $8
$15 = {b_hash = {le_next = 0x0, le_prev = 0xcf3abfc0}, b_vnbufs = {
    tqe_next = 0xcf467700, tqe_prev = 0xcf451448}, b_freelist = {
    tqe_next = 0xcf4564e0, tqe_prev = 0xc02f75d0}, b_act = {tqe_next = 0x0,
    tqe_prev = 0x0}, b_flags = 536870912, b_qindex = 0, b_xflags = 2 '\002',
  b_lock = {lk_interlock = {lock_data = 0}, lk_flags = 1024,
    lk_sharecount = 0, lk_waitcount = 0, lk_exclusivecount = 1, lk_prio = 20,
    lk_wmesg = 0xc02bcc50 "bufwait", lk_timo = 0, lk_lockholder = -2},
  b_error = 0, b_bufsize = 8192, b_runningbufspace = 0, b_bcount = 8192,
  b_resid = 0, b_dev = 0xc2bfb300, b_data = 0xd2b25000 "\020O\037",
  b_kvabase = 0xd2b25000 "\020O\037", b_kvasize = 16384, b_lblkno = 3988624,
  b_blkno = 3988624, b_offset = 2042175488, b_iodone = 0,
  b_iodone_chain = 0x0, b_vp = 0xdc8fcb40, b_dirtyoff = 0, b_dirtyend = 0,
  b_rcred = 0x0, b_wcred = 0x0, b_pblkno = 0, b_saveaddr = 0x0,
  b_driver1 = 0x0, b_driver2 = 0x0, b_caller1 = 0x0, b_caller2 = 0x0,
  b_pager = {pg_spc = 0x0, pg_reqpage = 0}, b_cluster = {cluster_head = {
      tqh_first = 0xcf4564e0, tqh_last = 0xcf4562e4}, cluster_entry = {
      tqe_next = 0xcf4564e0, tqe_prev = 0xcf4562e4}}, b_pages = {0xc093d220,
    0xc093785c, 0x0 <repeats 30 times>}, b_npages = 2, b_dep = {
    lh_first = 0xc2cf17a0}, b_chain = {parent = 0x0, count = 0},
  b_olockholder = 519, b_ofile = 0xc02ca4ff "../../ufs/ffs/ffs_softdep.c",
  b_oline = 1698}
(kgdb) print *$8.b_dep.lh_first
$16 = {wk_list = {le_next = 0x0, le_prev = 0xcf4564c8}, wk_type = 5,
  wk_state = 33025}
(kgdb) print (struct indirdep)$16
$18 = {ir_list = {wk_list = {le_next = 0x0, le_prev = 0xcf4564c8},
    wk_type = 5, wk_state = 33025}, ir_saveddata = 0x0,
  ir_savebp = 0xcf456388, ir_donehd = {lh_first = 0x0}, ir_deplisthd = {
    lh_first = 0x0}}
(kgdb)

-Matt

:> :Matt, can you reproduce the problem over by you? It seems that doing
:> :anything disk intensive and then shutting down immediately will trigger
:> :it.
:> :
:> :Mike "Silby" Silbersack
:>
:> Hmm. I will attempt to reproduce the problem. How much activity is
:> 'significant' ? e.g. equivalent of an rm -rf /usr/ports or something
:> smaller? Do the directories have to be deeply nested for the problem
:> to occur?
:>
:> -Matt
:> Matthew Dillon
:
:I was seeing the problem by just making a kernel (just a few files
:changed with no config or clean steps), installing the kernel, and doing a
:shutdown -r now. So, only a few files were active at most. The system in
:question only has a /, /usr, and /var partition, if that matters. Only
:/usr was mounted softupdates.
:
:Mike "Silby" Silbersack

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
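
The kgdb dump above prints its flag words in decimal, which makes them hard to compare against header definitions. The sketch below only converts the three values copied from the dump into hex and splits them into single-bit masks; the helper name `set_bits` is hypothetical. Which flag name each bit corresponds to is deliberately not stated here, since that depends on the headers of the exact source tree in use. The final check notes that the dump's `b_offset` equals `b_lblkno` times 512; treating 512 as the device block size is an assumption.

```python
def set_bits(value):
    """Return the single-bit masks that are set in a non-negative integer."""
    return [1 << i for i in range(value.bit_length()) if (value >> i) & 1]


# Values copied verbatim from the kgdb dump.
fields = {
    "b_flags": 536870912,
    "lk_flags": 1024,
    "wk_state": 33025,
}

for name, value in fields.items():
    masks = ", ".join(f"{m:#x}" for m in set_bits(value))
    print(f"{name:8} = {value:#010x}  bits: {masks}")

# Assumption: 512-byte device blocks. With that, b_lblkno and b_offset
# from the dump describe the same location.
BLOCK_SIZE = 512
print(3988624 * BLOCK_SIZE == 2042175488)
```

Run as is, this reports `b_flags` as a single bit (0x20000000), `lk_flags` as 0x400, and `wk_state` as three bits (0x1, 0x100, 0x8000), and the last line prints `True`.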