Date: Tue, 22 Jul 2008 20:05:40 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Attilio Rao <attilio@freebsd.org>
Cc: freebsd-current@freebsd.org, Andrew Gallatin <gallatin@cs.duke.edu>
Subject: Re: reproducible "panic: share->excl"
Message-ID: <20080722170540.GA17123@deviant.kiev.zoral.com.ua>
In-Reply-To: <3bbf2fe10807220954q60ee6747x40076e39884daf19@mail.gmail.com>
References: <4884F992.7090008@cs.duke.edu> <20080722154825.GZ17123@deviant.kiev.zoral.com.ua> <3bbf2fe10807220954q60ee6747x40076e39884daf19@mail.gmail.com>

On Tue, Jul 22, 2008 at 06:54:04PM +0200, Attilio Rao wrote:
> 2008/7/22, Kostik Belousov <kostikbel@gmail.com>:
> > On Mon, Jul 21, 2008 at 05:03:14PM -0400, Andrew Gallatin wrote:
> > > I can panic today's -current reliably (or hang it with
> > > WITNESS/INVARIANTS disabled). When it crashes, I see
> > > the appended panic messages.
> > >
> > > It seems to be 100% reproducible on my box (AMD64 x2,
> > > 512MB ram, UFS2). If anybody savvy in this area would
> > > like to reproduce it, I've left the program at ~gallatin/ahunt.c
> > > on freefall. Compile it, and run it as:
> > > ./a.out -mmbfileinit -madvise=/var/tmp/zot -random -size=95536
> > > -touch=4096 -rewrite=2
> > >
> > >
> > > Cheers,
> > >
> > > Drew
> > >
> > > PS: Here is a serial console log from the panic:
> >
> > ...
> >
> > >
> > > login: shared lock of (lockmgr) ufs @ kern/vfs_subr.c:2044
> > > while exclusively locked from kern/vfs_vnops.c:593
> > > panic: share->excl
> > > cpuid = 1
> > > KDB: enter: panic
> > > [thread pid 1702 tid 100149 ]
> > > Stopped at kdb_enter+0x3d: movq $0,0x639958(%rip)
> > > db> tr
> > > Tracing pid 1702 tid 100149 td 0xffffff000d08f000
> > > kdb_enter() at kdb_enter+0x3d
> > > panic() at panic+0x176
> > > witness_checkorder() at witness_checkorder+0x137
> > > __lockmgr_args() at __lockmgr_args+0xc74
> > > ffs_lock() at ffs_lock+0x8c
> > > VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
> > > _vn_lock() at _vn_lock+0x47
> > > vget() at vget+0x7b
> > > vnode_pager_lock() at vnode_pager_lock+0x146
> > > vm_fault() at vm_fault+0x1e2
> > > trap_pfault() at trap_pfault+0x128
> > > trap() at trap+0x395
> > > calltrap() at calltrap+0x8
> > > --- trap 0xc, rip = 0xffffffff8079f2bd, rsp = 0xfffffffe58c2f7b0, rbp =
> > > 0xfffffffe58c2f830 ---
> > > copyin() at copyin+0x3d
> > > ffs_write() at ffs_write+0x2f8
> > > VOP_WRITE_APV() at VOP_WRITE_APV+0x10b
> > > vn_write() at vn_write+0x23f
> > > dofilewrite() at dofilewrite+0x85
> > > --More--
> > >
> > > kern_writev() at kern_writev+0x60
> > > write() at write+0x54
> > > syscall() at syscall+0x1dd
> > > Xfast_syscall() at Xfast_syscall+0xab
> > > --- syscall (4, FreeBSD ELF64, write), rip = 0x8007296ec, rsp =
> > > 0x7fffffffe158, rbp = 0x7fffffffe210 ---
> > > db> show locks
> > > exclusive sleep mutex vnode interlock r = 0 (0xffffff000d0dc0c0) locked
> > > @ vm/vnode_pager.c:1199
> > > exclusive sx user map r = 0 (0xffffff000d054360) locked @ vm/vm_map.c:3115
> > > exclusive lockmgr bufwait r = 0 (0xfffffffe5047f278) locked @
> > > kern/vfs_bio.c:1783
> > > exclusive lockmgr ufs r = 0 (0xffffff000d0dc098) locked @
> > > kern/vfs_vnops.c:593
> > > db>
> > >
> >
> > Essentially, you tried to do the write of the part of the region mmaped
> > from the file, to the file. The VOP_WRITE() is called with exclusively
> > locked vnode, while fault handler tried to lock the vnode in shared mode
> > to page in.
> >
> > The following change fixed it for me.
> > Attilio, would it make sense to consider LK_CANRECURSE | LK_SHARED as
> > a request for the exclusive lock when the current thread already holds the
> > exclusive lock instead? I think this would be a proper solution.
>
> I don't like this kind of magic and encoding in lockmgr.
> I think that the better thing to do here is to recurse the exclusive
> lock as you pass to vget().

It could be argued that lockmgr is black magic as a whole. On the other
hand, I had to use VOP_ISLOCKED() and manually construct the lock request,
while all the needed information is at hand inside lockmgr. Moreover,
I believe that doing an implicit shared->exclusive request upgrade in this
situation (excl locked by curthread, LK_CANRECURSE present) is right.
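To make the proposed rule concrete, here is a minimal user-space sketch of it. It is only a toy model: the TOY_LK_* constants and toy_effective_request() are hypothetical names invented for illustration, they are not the kernel's LK_* values, and this is not how lockmgr is actually implemented.

```c
#include <stdbool.h>

/* Toy stand-ins invented for this sketch; NOT the kernel's LK_* values. */
enum {
	TOY_LK_SHARED     = 0x1,
	TOY_LK_EXCLUSIVE  = 0x2,
	TOY_LK_CANRECURSE = 0x4
};

/*
 * Hypothetical helper: given the requested flags and whether the calling
 * thread already owns the lock exclusively, return the flags to act on.
 */
static int
toy_effective_request(int flags, bool excl_owned_by_caller)
{
	if (excl_owned_by_caller &&
	    (flags & TOY_LK_SHARED) != 0 &&
	    (flags & TOY_LK_CANRECURSE) != 0) {
		/* Recurse on the exclusive lock instead of asking for shared. */
		flags &= ~TOY_LK_SHARED;
		flags |= TOY_LK_EXCLUSIVE;
	}
	return (flags);
}
```

In this model a shared request is rewritten only when both conditions hold (the caller owns the lock exclusively and recursion was allowed); every other request passes through unchanged.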
>
> Also note that without WITNESS the code will return EDEADLK in this
> case while traditionally what would have happened is that the lockmgr
> would have to be downgraded silently, but as you can expect this is a
> very dangerous practice.

Fully agree.
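
For readers without access to ahunt.c, the access pattern described above (writing part of a file's own mmaped region back to that file) can be sketched roughly as follows. This is an illustration under assumptions, not Drew's program: write_from_own_mapping() is a hypothetical helper, and the sketch makes no attempt to evict the mapped pages, so it may well not reach the faulting path at all.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from ahunt.c): map `len` bytes of the
 * file at `path`, then write(2) those mapped bytes back into the same
 * file. Returns the number of bytes written, or -1 on error.
 */
static ssize_t
write_from_own_mapping(const char *path, size_t len)
{
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return (-1);
	/* Size the file so the whole mapping is backed by it. */
	if (ftruncate(fd, (off_t)len) == -1) {
		close(fd);
		return (-1);
	}
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return (-1);
	}
	/*
	 * The source buffer is the file's own mapping. If its pages are not
	 * resident, the copy from user space faults on them while the write
	 * path already holds the vnode lock for this file.
	 */
	ssize_t n = write(fd, p, len);
	munmap(p, len);
	close(fd);
	return (n);
}
```

Calling write_from_own_mapping("/var/tmp/zot", 8192) exercises the pattern once; the path and size here are arbitrary. On a kernel without the problem it simply returns the byte count.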