From owner-freebsd-stable@FreeBSD.ORG Tue Dec 12 19:16:55 2006 Return-Path: X-Original-To: freebsd-stable@freebsd.org Delivered-To: freebsd-stable@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E0F5C16A4A0 for ; Tue, 12 Dec 2006 19:16:55 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from ug-out-1314.google.com (ug-out-1314.google.com [66.249.92.171]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8A11544040 for ; Tue, 12 Dec 2006 19:05:03 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by ug-out-1314.google.com with SMTP id o2so1586075uge for ; Tue, 12 Dec 2006 11:06:13 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=qYtUiqK/k4tdf4aA2RMPta6D1hdpByUpQ7IQb+liq7vRyedMjGMA3R5E1GnsyZit6WNU29u4AJCIobLqXuA4CNdZAtNA07F9+J7btk/DEtGwiD1DK8f9u3Zy6USb1U16WPnDpTyNkZrQUfF/1tX91dazZdHiUF9ldnjb0sbw4F8= Received: by 10.82.153.5 with SMTP id a5mr795689bue.1165950372377; Tue, 12 Dec 2006 11:06:12 -0800 (PST) Received: by 10.82.178.4 with HTTP; Tue, 12 Dec 2006 11:06:12 -0800 (PST) Message-ID: <3bbf2fe10612121106n76244f60k3b6ed4c031e45057@mail.gmail.com> Date: Tue, 12 Dec 2006 20:06:12 +0100 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "Suleiman Souhlal" In-Reply-To: <457EF835.2020705@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <456950AF.3090308@sh.cvut.cz> <20061127092146.GA69556@deviant.kiev.zoral.com.ua> <457E6C06.1020405@FreeBSD.org> <20061212101903.GF311@deviant.kiev.zoral.com.ua> <3bbf2fe10612120643n6b4ad850oc857be46c48505cc@mail.gmail.com> <457EF835.2020705@FreeBSD.org> X-Google-Sender-Auth: 33cab782b2599903 Cc: freebsd-stable@freebsd.org, John Baldwin , freebsd-current@freebsd.org, bde@freebsd.org, Kostik Belousov , tegge@freebsd.org Subject: Re: kqueue LOR X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Dec 2006 19:16:56 -0000 2006/12/12, Suleiman Souhlal : > Attilio Rao wrote: > > 2006/12/12, Kostik Belousov : > > > >> On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: > >> > Kostik Belousov wrote: > >> > >On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman wrote: > >> > > > >> > >>Hi, > >> > >>the attached lor.txt contains LOR I got this yesterday. It is > >> FreeBSD 6.1 > >> > >>with relatively recent kernel, from last week or so. > >> > >> > >> > >>-- > >> > >>VH > >> > > > >> > > > >> > >>+lock order reversal: > >> > >>+ 1st 0xc537f300 kqueue (kqueue) @ > >> /usr/src/sys/kern/kern_event.c:1547 > >> > >>+ 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ > >> > >>/usr/src/sys/ufs/ufs/ufs_vnops.c:138 > >> > >>+KDB: stack backtrace: > >> > >>+kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) at > >> > >>kdb_backtrace+0x2f > >> > >>+witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at > >> > >>witness_checkorder+0x5fe > >> > >>+_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at > >> > >>_mtx_lock_flags+0x32 > >> > >>+ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) at > >> > >>ufs_itimes+0x6c > >> > >>+ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at > >> > >>ufs_getattr+0x20 > >> > >>+VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at > >> > >>VOP_GETATTR_APV+0x3a > >> > >>+filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at filt_vfsread+0x75 > >> > >>+knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 > >> > >>+VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) at > >> > >>VOP_WRITE_APV+0x148 > >> > >>+vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at > >> vn_write+0x201 > >> > >>+dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,ffffffff,...) at > >> > >>dofilewrite+0x84 > >> > >>+kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at kern_writev+0x65 > >> > >>+write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f > >> > >>+syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 > >> > >>+Xint0x80_syscall() at Xint0x80_syscall+0x1f > >> > >>+--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp = > >> > >>0xbfbfea1c, ebp = 0xbfbfea48 --- > >> > > > >> > > > >> > >Thank you for the report. The LOR is caused by my commit into > >> > >sys/ufs/ufs/ufs_vnops.c, rev. 1.280. > >> > > >> > Is the mount lock really required, if all we're doing is a single > >> read of a > >> > single word (mnt_kern_flags) (v_mount should be read-only for the whole > >> > lifetime of the vnode, I believe)? After all, reads of a single word > >> are > >> > atomic on all our supported architectures. > >> > The only situation I see where there MIGHT be problems are forced > >> unmounts, > >> > but I think there are bigger issues with those. > >> > Sorry for noticing this email only now. > >> > >> The problem is real with snapshotting. Ignoring > >> MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale value of > >> mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs vnode > >> inactivation time. This was the big trouble with nfsd and snapshots. As > >> such, I think that precise value of mmnt_kern_flag is critical there, > >> and mount interlock is needed. > > > > > > This can be avoided using a memory barrier when setting flags. > > Even if memory barriers usage is not encouraged, some critical code > > should really use them replacing a mutex semantic (if that worths it). > > Why is memory barrier usage not encouraged? As you said, they can be used to reduce the number of atomic (LOCKed) operations, in some cases. Beacause they can lead to "errors" as it is not so straightforward to understand when a memory barrier is needed more than an atomic instruction and so on (even if it doesn't value, for example, for ia32, for other architectures memory barriers could be more expensive than the atomic instruction, without counting a possible error). > FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier), mb() (load/store mem barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers only needed on SMP), and barrier() (GCC barrier (__asm __volatile (:::"memory")) macros that I've personally found very useful. I think that our memory barriers reflect the usage we do into the kernel as the base for building syncronizing primitives. From this point of view our atomic operations (meant into the wider possible sense, man 9 atomic) are more suitable than having something like Linux's smp_*(). Attilio -- Peace can only be achieved by understanding - A. Einstein