From owner-freebsd-current Wed Apr 21 13:23:25 1999
Delivered-To: freebsd-current@freebsd.org
Received: from granite.sentex.net (granite.sentex.ca [199.212.134.1])
    by hub.freebsd.org (Postfix) with ESMTP id A893D1588B
    for ; Wed, 21 Apr 1999 13:22:02 -0700 (PDT)
    (envelope-from mike@sentex.net)
Received: from simoeon (simeon.sentex.ca [209.112.4.47])
    by granite.sentex.net (8.8.8/8.6.9) with SMTP id QAA26690;
    Wed, 21 Apr 1999 16:19:23 -0400 (EDT)
Message-Id: <3.0.5.32.19990421161837.01beabf0@staff.sentex.ca>
X-Sender: mdtpop@staff.sentex.ca
X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.5 (32)
Date: Wed, 21 Apr 1999 16:18:37 -0400
To: Matthew Dillon
From: Mike Tancsa
Subject: Re: solid NFS patch #6 avail for -current - need testers files)
Cc: current@FreeBSD.ORG
In-Reply-To: <199904212009.NAA08036@apollo.backplane.com>
References: <199904211751.NAA31922@misha.cisco.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Sender: owner-freebsd-current@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

At 01:09 PM 4/21/99 -0700, Matthew Dillon wrote:
>:Speaking of, when can we expect to see this wonderful _stability_
>:improvement in -stable? I'm setting up a server here, and would
>:rather have fixed NFS code in it... Yet, jumping to -current is
>:officially wrong... Thanks!
>:
>: -mi
>
> Well, you already see a lot of the pure bug fixes being backported.
> What you don't see in -stable are the bug fixes that also depend on the
> rewritten portions of the system, nor do you see the rewritten portions
> of the system themselves. The latest NFS patch is borderline -- it
> would be possible to backport in time enough for the 3.2 deadline, but
> it wouldn't be fun.

Hi,
Just wondering if these changes also have the side effect of fixing the
mmap problem that crashes 3.x boxes? I.e., as you wrote back on 3/4/99:

> The problem is a deadlock caused by the fgrep.
> The fgrep is mmap()ing
> the file, but then it does some really weird crap when dealing with
> larger files.
>
> It's the most idiotic code I've ever seen. The code uses a PRIVATE+RW
> mmap() until it gets to an odd point in the file, at which point it read()'s
> additional information from the file into the mapped space ( that might
> contain a previous mmap'd portion of the file ).
>
> So what happens is this:
>
>     * read() call
>     * shared lock obtained on vnode
>
>       ( some other process attempts to get shared lock on vnode and
>       succeeds... for example, a namei operation is attempted by
>       another grep )
>
>     * access MMAP'd area
>     * exclusive lock attempted on same vnode. This blocks because
>       some other process has a shared lock on the vnode.
>
>       ( the other process then attempts to get an exclusive lock on the
>       vnode; this blocks. )
>
>     Deadlock.
>
> Even worse, the gnu grep does not bother munmap()'ing the space so, in
> fact, the deadlock can occur between two unrelated files as well as with
> the same file. This is the more likely deadlock scenario.
>
> The solution is more difficult. We could hack an exception for PRIVATE
> mmap's... there really is no need for the vm_fault code to lock the vnode.
> However, other situations can occur where this hack would not work.
>
> This is 'kinda a known problem' in FreeBSD. We really need to find a
> solution to it. Other similar deadlocks can occur if you mmap() one file
> and read() or write() data from it to another file, and vice versa at
> the same time.
>
> Personally, I think the only real solution is to make vn_read() and
> vn_write() lock the uio space as well as the vnode being read or
> written. It would have to do it in the right order, and it would have
> to deal with the situation where the uio space covers multiple vnodes.
>
> Alternately, vnodes need to be redesigned without these fraggin
> all-encompassing locks for data R+W ops.
>
> -Matt
>
>(kgdb) back
>#0 mi_switch () at ../../kern/kern_synch.c:827
>#1 0xf0151919 in tsleep (ident=0xf0a2e500, priority=0x8, wmesg=0xf0263a9c "inode", timo=0x0)
>    at ../../kern/kern_synch.c:443
>#2 0xf014b774 in acquire (lkp=0xf0a2e500, extflags=0x1000040, wanted=0x700) at ../../kern/kern_lock.c:145
>#3 0xf014b835 in lockmgr (lkp=0xf0a2e500, flags=0x1030041, interlkp=0xf51719b0, p=0xf5151200)
>    at ../../kern/kern_lock.c:209
>#4 0xf0171df0 in vop_stdlock (ap=0xf5176b94) at ../../kern/vfs_default.c:210
>#5 0xf0204b39 in ufs_vnoperate (ap=0xf5176b94) at ../../ufs/ufs/ufs_vnops.c:2309
>#6 0xf017aba4 in vn_lock (vp=0xf5171940, flags=0x1030041, p=0xf5151200) at vnode_if.h:811
>#7 0xf01747b0 in vget (vp=0xf5171940, flags=0x1020041, p=0xf5151200) at ../../kern/vfs_subr.c:1348
>#8 0xf0212f1e in vnode_pager_lock (object=0xf02af2b4) at ../../vm/vnode_pager.c:960
>#9 0xf0206a56 in vm_fault (map=0xf51497c0, vaddr=0x805d000, fault_type=0x3, fault_flags=0x8)
>    at ../../vm/vm_fault.c:243
>#10 0xf022aebe in trap_pfault (frame=0xf5176d14, usermode=0x0, eva=0x805d038) at ../../i386/i386/trap.c:816
>#11 0xf022ab92 in trap (frame={tf_es = 0x10, tf_ds = 0x10, tf_edi = 0x805d000, tf_esi = 0xf223e000,
>    tf_ebp = 0xf5176e0c, tf_isp = 0xf5176d3c, tf_ebx = 0x1060, tf_edx = 0x0, tf_ecx = 0x700, tf_eax = 0x0,
>    tf_trapno = 0xc, tf_err = 0x3, tf_eip = 0xf0229ce4, tf_cs = 0x8, tf_eflags = 0x10287, tf_esp = 0xffff1272,
>    tf_ss = 0xffff0000}) at ../../i386/i386/trap.c:437
>#12 0xf0229ce4 in fastmove_loop ()
>#13 0xf0229b2b in i586_copyout ()
>#14 0xf01fd707 in ffs_read (ap=0xf5176ef0) at ../../ufs/ufs/ufs_readwrite.c:289
>#15 0xf017a689 in vn_read (fp=0xf09dda80, uio=0xf5176f38, cred=0xf09ef480) at vnode_if.h:303
>#16 0xf015a757 in read (p=0xf5151200, uap=0xf5176f94) at ../../kern/sys_generic.c:121
>#17 0xf022b4a0 in syscall (frame={tf_es = 0x2f, tf_ds = 0x2f, tf_edi = 0x0, tf_esi = 0x8000, tf_ebp = 0xefbf85c8,
>    tf_isp = 0xf5176fe4, tf_ebx = 0xffffffff, tf_edx = 0x8059000, tf_ecx = 0xa0000,
>    tf_eax = 0x3,
>    tf_trapno = 0xc, tf_err = 0x2, tf_eip = 0x280b303c, tf_cs = 0x1f, tf_eflags = 0x206, tf_esp = 0xefbf85a0,
>    tf_ss = 0x2f}) at ../../i386/i386/trap.c:1100
>#18 0xf0220dcc in Xint0x80_syscall ()
>#19 0x804dc46 in ?? ()
>#20 0x804e85c in ?? ()
>#21 0x8048f7d in ?? ()
>(kgdb)
>

------------------------------------------------------------------------
Mike Tancsa,                          tel 01.519.651.3400
Network Administrator,                mike@sentex.net
Sentex Communications                 www.sentex.net
Cambridge, Ontario Canada

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message
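
For readers following along, the access pattern described in the quoted
message (a PRIVATE+RW mmap() of a file, followed by a read() from the same
file into the mapped space) can be sketched in userland C roughly as below.
This is only an illustration under stated assumptions, not GNU grep's actual
code: the function name read_into_private_mapping() and all of its parameters
are hypothetical, and map_len is assumed not to exceed the file's length. In
the backtrace above, the corresponding kernel path is read -> vn_read ->
ffs_read -> i586_copyout, which faults on the mapped page and ends up in
vm_fault -> vnode_pager_lock -> vget, sleeping on "inode".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Maps the first map_len bytes of `path` MAP_PRIVATE with read/write
 * access, then read()s `count` bytes starting at file offset src_off
 * into the mapping at offset dst_off. The resulting contents of the
 * mapping are copied to `out` (which must hold map_len bytes).
 *
 * Assumes map_len does not exceed the file's length.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_into_private_mapping(const char *path, size_t map_len,
                                  size_t dst_off, off_t src_off,
                                  size_t count, char *out)
{
    /* The read() destination must lie entirely inside the mapping. */
    assert(map_len > 0 && dst_off <= map_len && count <= map_len - dst_off);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* PRIVATE + RW: writes go to copy-on-write pages, never to the file. */
    char *map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    ssize_t n = -1;
    if (lseek(fd, src_off, SEEK_SET) != (off_t)-1) {
        /*
         * The kernel copies file data out to a user buffer that is itself
         * backed by the same file, so the copy can page-fault on that
         * file's own pages while the read is in progress.
         */
        n = read(fd, map + dst_off, count);
    }

    if (n >= 0)
        memcpy(out, map, map_len);

    munmap(map, map_len);
    close(fd);
    return n;
}
```

On a kernel without the locking problem discussed here, the call simply
returns the byte count; for an 8-byte file containing "AAAABBBB",
read_into_private_mapping(path, 8, 0, 4, 4, out) should leave "BBBBBBBB" in
out while the file on disk is unchanged. Unlike the grep behaviour described
above, the sketch does munmap() the space before returning.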