From owner-freebsd-current@FreeBSD.ORG Tue Dec 12 20:06:41 2006 Return-Path: X-Original-To: freebsd-current@freebsd.org Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4DBBB16A510; Tue, 12 Dec 2006 20:06:41 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162]) by mx1.FreeBSD.org (Postfix) with ESMTP id 70A2A4466D; Tue, 12 Dec 2006 19:33:50 +0000 (GMT) (envelope-from jhb@freebsd.org) Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.13.6/8.13.6) with ESMTP id kBCJZBKe020922; Tue, 12 Dec 2006 14:35:11 -0500 (EST) (envelope-from jhb@freebsd.org) From: John Baldwin To: Suleiman Souhlal Date: Tue, 12 Dec 2006 14:12:12 -0500 User-Agent: KMail/1.9.1 References: <456950AF.3090308@sh.cvut.cz> <3bbf2fe10612120643n6b4ad850oc857be46c48505cc@mail.gmail.com> <457EF835.2020705@FreeBSD.org> In-Reply-To: <457EF835.2020705@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200612121412.13551.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Tue, 12 Dec 2006 14:35:11 -0500 (EST) X-Virus-Scanned: ClamAV 0.88.3/2318/Tue Dec 12 11:58:25 2006 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx X-Mailman-Approved-At: Tue, 12 Dec 2006 22:31:50 +0000 Cc: freebsd-stable@freebsd.org, Attilio Rao , freebsd-current@freebsd.org, bde@freebsd.org, Kostik Belousov , tegge@freebsd.org Subject: Re: kqueue LOR X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Dec 2006 20:06:41 -0000 On Tuesday 12 December 2006 13:43, Suleiman Souhlal wrote: > Attilio Rao wrote: > > 2006/12/12, Kostik Belousov : > > > >> On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: > >> > Kostik Belousov wrote: > >> > >On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: > >> > > > >> > >>Hi, > >> > >>the attached lor.txt contains LOR I got this yesterday. It is > >> FreeBSD 6.1 > >> > >>with relatively recent kernel, from last week or so. > >> > >> > >> > >>-- > >> > >>VH > >> > > > >> > > > >> > >>+lock order reversal: > >> > >>+ 1st 0xc537f300 kqueue (kqueue) @ > >> /usr/src/sys/kern/kern_event.c:1547 > >> > >>+ 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ > >> > >>/usr/src/sys/ufs/ufs/ufs_vnops.c:138 > >> > >>+KDB: stack backtrace: > >> > >>+kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at > >> > >>kdb_backtrace+0x2f > >> > >>+witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at > >> > >>witness_checkorder+0x5fe > >> > >>+_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at > >> > >>_mtx_lock_flags+0x32 > >> > >>+ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at > >> > >>ufs_itimes+0x6c > >> > >>+ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at > >> > >>ufs_getattr+0x20 > >> > >>+VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at > >> > >>VOP_GETATTR_APV+0x3a > >> > >>+filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 > >> > >>+knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 > >> > >>+VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at > >> > >>VOP_WRITE_APV+0x148 > >> > >>+vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at > >> vn_write+0x201 > >> > >>+dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,ffffffff,...) at > >> > >>dofilewrite+0x84 > >> > >>+kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 > >> > >>+write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f > >> > >>+syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 > >> > >>+Xint0x80_syscall() at Xint0x80_syscall+0x1f > >> > >>+--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = > >> > >>0xbfbfea1c, ebp = 0xbfbfea48 --- > >> > > > >> > > > >> > >Thank you for the report. The LOR is caused by my commit into > >> > >sys/ufs/ufs/ufs_vnops.c, rev. 1.280. > >> > > >> > Is the mount lock really required, if all we're doing is a single > >> read of a > >> > single word (mnt_kern_flags) (v_mount should be read-only for the whole > >> > lifetime of the vnode, I believe)? After all, reads of a single word > >> are > >> > atomic on all our supported architectures. > >> > The only situation I see where there MIGHT be problems are forced > >> unmounts, > >> > but I think there are bigger issues with those. > >> > Sorry for noticing this email only now. > >> > >> The problem is real with snapshotting. Ignoring > >> MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of > >> mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode > >> inactivation time. This was the big trouble with nfsd and snapshots. As > >> such, I think that precise value of mmnt_kern_flag is critical there, > >> and mount interlock is needed. > > > > > > This can be avoided using a memory barrier when setting flags. > > Even if memory barriers usage is not encouraged, some critical code > > should really use them replacing a mutex semantic (if that worths it). > > Why is memory barrier usage not encouraged? As you said, they can be used to reduce the number of atomic (LOCKed) operations, in some cases. > > FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier), mb() (load/store mem barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers only needed on SMP), and barrier() (GCC barrier (__asm __volatile (:::"memory")) macros that I've personally found very useful. > Admittedly, they are harder to use than atomic operations, but it might still worth having something similar. Memory barriers just specify ordering, they don't ensure a cache flush so another CPU reads up to date values. You can use memory barriers in conjunction with atomic operations on a variable to ensure that you can safely read other variables (which is what locks do). For example, in this case IIUC, you have a race that is because there is shared state between two fields, one in the mount structure, and one in the ufs i-node. Memory barriers alone won't prevent you from operating on those flags non-consistently. That is, you have two memory locations in play here, and atomic ops only work on a single one. There isn't an atomic op to do "read from memory location A, check flag B, and if it's true write C to memory location D". Where would you put the membar in this case to ensure that the action always results in consistent behavior? -- John Baldwin