From owner-freebsd-current@FreeBSD.ORG Tue Dec 12 23:34:10 2006 Return-Path: X-Original-To: freebsd-current@freebsd.org Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D269D16A503 for ; Tue, 12 Dec 2006 23:34:10 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from nf-out-0910.google.com (nf-out-0910.google.com [64.233.182.189]) by mx1.FreeBSD.org (Postfix) with ESMTP id 6979D43FBB for ; Tue, 12 Dec 2006 23:15:29 +0000 (GMT) (envelope-from asmrookie@gmail.com) Received: by nf-out-0910.google.com with SMTP id x37so343083nfc for ; Tue, 12 Dec 2006 15:16:53 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references:x-google-sender-auth; b=m4nvQEs0BMuK3xuODQWFZPig/0NVy7yITm0HKy4aiir416wNJ2Oe0qaCKDwZKCHTp5CI5VmvRLlky9VW4ZqspNree1BB6vH6LjD4heNYsis4s3JdMqywUP5D8aivS1uPAwVG/NTWDgT4qVxPJ1z8fSdlo7IKepy5ODNHtdF90/g= Received: by 10.82.120.14 with SMTP id s14mr25580buc.1165965413250; Tue, 12 Dec 2006 15:16:53 -0800 (PST) Received: by 10.82.178.4 with HTTP; Tue, 12 Dec 2006 15:16:53 -0800 (PST) Message-ID: <3bbf2fe10612121516r78426f9bp95401e33c8871fea@mail.gmail.com> Date: Wed, 13 Dec 2006 00:16:53 +0100 From: "Attilio Rao" Sender: asmrookie@gmail.com To: "John Baldwin" In-Reply-To: <3bbf2fe10612121515pacc3c43u501efd5d5fd1a2bd@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <456950AF.3090308@sh.cvut.cz> <200612121614.18502.jhb@freebsd.org> <3bbf2fe10612121334n12459019sb657a226615fc4b6@mail.gmail.com> <200612121745.20915.jhb@freebsd.org> <3bbf2fe10612121515pacc3c43u501efd5d5fd1a2bd@mail.gmail.com> X-Google-Sender-Auth: fcfcbd8465ea8cf0 Cc: Kostik Belousov , Suleiman Souhlal , freebsd-current@freebsd.org, tegge@freebsd.org, bde@freebsd.org Subject: Re: kqueue LOR X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Dec 2006 23:34:11 -0000 2006/12/13, Attilio Rao : > 2006/12/12, John Baldwin : > > On Tuesday 12 December 2006 16:34, Attilio Rao wrote: > > > 2006/12/12, John Baldwin : > > > > On Tuesday 12 December 2006 15:13, Attilio Rao wrote: > > > > > 2006/12/12, John Baldwin : > > > > > > On Tuesday 12 December 2006 13:43, Suleiman Souhlal wrote: > > > > > > > Attilio Rao wrote: > > > > > > > > 2006/12/12, Kostik Belousov : > > > > > > > > > > > > > > > >> On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote: > > > > > > > >> > Kostik Belousov wrote: > > > > > > > >> > >On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman > > wrote: > > > > > > > >> > > > > > > > > > >> > >>Hi, > > > > > > > >> > >>the attached lor.txt contains LOR I got this yesterday. It is > > > > > > > >> FreeBSD 6.1 > > > > > > > >> > >>with relatively recent kernel, from last week or so. > > > > > > > >> > >> > > > > > > > >> > >>-- > > > > > > > >> > >>VH > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > >>+lock order reversal: > > > > > > > >> > >>+ 1st 0xc537f300 kqueue (kqueue) @ > > > > > > > >> /usr/src/sys/kern/kern_event.c:1547 > > > > > > > >> > >>+ 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @ > > > > > > > >> > >>/usr/src/sys/ufs/ufs/ufs_vnops.c:138 > > > > > > > >> > >>+KDB: stack backtrace: > > > > > > > >> > > > >>+kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...) > > > > at > > > > > > > >> > >>kdb_backtrace+0x2f > > > > > > > >> > >>+witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at > > > > > > > >> > >>witness_checkorder+0x5fe > > > > > > > >> > >>+_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at > > > > > > > >> > >>_mtx_lock_flags+0x32 > > > > > > > >> > >>+ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...) > > at > > > > > > > >> > >>ufs_itimes+0x6c > > > > > > > >> > > > >>+ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at > > > > > > > >> > >>ufs_getattr+0x20 > > > > > > > >> > > > > > >>+VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at > > > > > > > >> > >>VOP_GETATTR_APV+0x3a > > > > > > > >> > >>+filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at > > filt_vfsread+0x75 > > > > > > > >> > >>+knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75 > > > > > > > >> > >>+VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...) > > at > > > > > > > >> > >>VOP_WRITE_APV+0x148 > > > > > > > >> > >>+vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at > > > > > > > >> vn_write+0x201 > > > > > > > >> > >>+dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,ffffffff,...) at > > > > > > > >> > >>dofilewrite+0x84 > > > > > > > >> > >>+kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at > > > > kern_writev+0x65 > > > > > > > >> > >>+write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f > > > > > > > >> > >>+syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295 > > > > > > > >> > >>+Xint0x80_syscall() at Xint0x80_syscall+0x1f > > > > > > > >> > >>+--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp > > = > > > > > > > >> > >>0xbfbfea1c, ebp = 0xbfbfea48 --- > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> > >Thank you for the report. The LOR is caused by my commit into > > > > > > > >> > >sys/ufs/ufs/ufs_vnops.c, rev. 1.280. > > > > > > > >> > > > > > > > > >> > Is the mount lock really required, if all we're doing is a > > single > > > > > > > >> read of a > > > > > > > >> > single word (mnt_kern_flags) (v_mount should be read-only for > > the > > > > whole > > > > > > > >> > lifetime of the vnode, I believe)? After all, reads of a single > > > > word > > > > > > > >> are > > > > > > > >> > atomic on all our supported architectures. > > > > > > > >> > The only situation I see where there MIGHT be problems are > > forced > > > > > > > >> unmounts, > > > > > > > >> > but I think there are bigger issues with those. > > > > > > > >> > Sorry for noticing this email only now. > > > > > > > >> > > > > > > > >> The problem is real with snapshotting. Ignoring > > > > > > > >> MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale > > value > > > > of > > > > > > > >> mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs > > vnode > > > > > > > >> inactivation time. This was the big trouble with nfsd and > > snapshots. > > > > As > > > > > > > >> such, I think that precise value of mmnt_kern_flag is critical > > there, > > > > > > > >> and mount interlock is needed. > > > > > > > > > > > > > > > > > > > > > > > > This can be avoided using a memory barrier when setting flags. > > > > > > > > Even if memory barriers usage is not encouraged, some critical > > code > > > > > > > > should really use them replacing a mutex semantic (if that worths > > it). > > > > > > > > > > > > > > Why is memory barrier usage not encouraged? As you said, they can be > > > > used to > > > > > > reduce the number of atomic (LOCKed) operations, in some cases. > > > > > > > > > > > > > > FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier), > > > > mb() > > > > > > (load/store mem barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers > > > > only > > > > > > needed on SMP), and barrier() (GCC barrier (__asm __volatile > > > > (:::"memory")) > > > > > > macros that I've personally found very useful. > > > > > > > Admittedly, they are harder to use than atomic operations, but it > > might > > > > > > still worth having something similar. > > > > > > > > > > > > Memory barriers just specify ordering, they don't ensure a cache flush > > so > > > > > > another CPU reads up to date values. You can use memory barriers in > > > > > > conjunction with atomic operations on a variable to ensure that you > > can > > > > > > safely read other variables (which is what locks do). For example, in > > > > this > > > > > > case IIUC, you have a race that is because there is shared state > > between > > > > two > > > > > > fields, one in the mount structure, and one in the ufs i-node. Memory > > > > > > barriers alone won't prevent you from operating on those flags > > > > > > non-consistently. That is, you have two memory locations in play > > here, > > > > and > > > > > > atomic ops only work on a single one. There isn't an atomic op to > > > > do "read > > > > > > from memory location A, check flag B, and if it's true write C to > > memory > > > > > > location D". Where would you put the membar in this case to ensure > > that > > > > the > > > > > > action always results in consistent behavior? > > > > > > > > > > Ah no, I misunderstood the problem. > > > > > I was thinking it was about setting the flag A into location B if C is > > > > > not present. > > > > > > > > That's the same problem, setting a flag in one word if a flag is set (or > > not > > > > set) in another word. Either way though you are dealing with two > > different > > > > memory locations that need to be consistent with respect to one another. > > > > > > No, I meant something like: > > > > > > if (!(flag & FLAG1)) > > > flag |= FLAG2; > > > > > > which can be done nicely with a memory barrier. > > > I didn't catch the problem was about 2 different locations...sorry. > > > > Ah. Even that can't be solved by membars alone. What if you have: > > > > if (!(flag & FLAG1)) > > flag |= FLAG2; > > > > in one thread and in another: > > > > if (!(flag & FLAG2)) > > flag |= FLAG1; > > > > even if you put membar's in place, you can still get breakage (imagine flag > > being 0 initially and getting preempted in between test and set). For this > > case you could use atomic_cmpset() in a loop without membars: > > > > do { > > x = flag; > > if (x & FLAG1) > > break; > > } while (!atomic_cmpset_int(&flag, x, x | FLAG2)); > > > > but often you have several variables linked together that need to be in a > > consistent state relative to each other, so you need to use a lock instead > > anyways. > > yes, this is exactly what I had in mind (and that we do with > mutex/rwlocks coding code). s/coding/locking Attilio -- Peace can only be achieved by understanding - A. Einstein