Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 13 Dec 2006 00:16:53 +0100
From:      "Attilio Rao" <attilio@freebsd.org>
To:        "John Baldwin" <jhb@freebsd.org>
Cc:        Kostik Belousov <kostikbel@gmail.com>, Suleiman Souhlal <ssouhlal@freebsd.org>, freebsd-current@freebsd.org, tegge@freebsd.org, bde@freebsd.org
Subject:   Re: kqueue LOR
Message-ID:  <3bbf2fe10612121516r78426f9bp95401e33c8871fea@mail.gmail.com>
In-Reply-To: <3bbf2fe10612121515pacc3c43u501efd5d5fd1a2bd@mail.gmail.com>
References:  <456950AF.3090308@sh.cvut.cz> <200612121614.18502.jhb@freebsd.org> <3bbf2fe10612121334n12459019sb657a226615fc4b6@mail.gmail.com> <200612121745.20915.jhb@freebsd.org> <3bbf2fe10612121515pacc3c43u501efd5d5fd1a2bd@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
2006/12/13, Attilio Rao <attilio@freebsd.org>:
> 2006/12/12, John Baldwin <jhb@freebsd.org>:
> > On Tuesday 12 December 2006 16:34, Attilio Rao wrote:
> > > 2006/12/12, John Baldwin <jhb@freebsd.org>:
> > > > On Tuesday 12 December 2006 15:13, Attilio Rao wrote:
> > > > > 2006/12/12, John Baldwin <jhb@freebsd.org>:
> > > > > > On Tuesday 12 December 2006 13:43, Suleiman Souhlal wrote:
> > > > > > > Attilio Rao wrote:
> > > > > > > > 2006/12/12, Kostik Belousov <kostikbel@gmail.com>:
> > > > > > > >
> > > > > > > >> On Tue, Dec 12, 2006 at 12:44:54AM -0800, Suleiman Souhlal wrote:
> > > > > > > >> > Kostik Belousov wrote:
> > > > > > > >> > >On Sun, Nov 26, 2006 at 09:30:39AM +0100, V??clav Haisman
> > wrote:
> > > > > > > >> > >
> > > > > > > >> > >>Hi,
> > > > > > > >> > >>the attached lor.txt contains LOR I got this yesterday. It is
> > > > > > > >> FreeBSD 6.1
> > > > > > > >> > >>with relatively recent kernel, from last week or so.
> > > > > > > >> > >>
> > > > > > > >> > >>--
> > > > > > > >> > >>VH
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >>+lock order reversal:
> > > > > > > >> > >>+ 1st 0xc537f300 kqueue (kqueue) @
> > > > > > > >> /usr/src/sys/kern/kern_event.c:1547
> > > > > > > >> > >>+ 2nd 0xc45c22dc struct mount mtx (struct mount mtx) @
> > > > > > > >> > >>/usr/src/sys/ufs/ufs/ufs_vnops.c:138
> > > > > > > >> > >>+KDB: stack backtrace:
> > > > > > > >> >
> > >>+kdb_backtrace(c07f9879,c45c22dc,c07fd31c,c07fd31c,c080c7b2,...)
> > > > at
> > > > > > > >> > >>kdb_backtrace+0x2f
> > > > > > > >> > >>+witness_checkorder(c45c22dc,9,c080c7b2,8a,c07fc6bd,...) at
> > > > > > > >> > >>witness_checkorder+0x5fe
> > > > > > > >> > >>+_mtx_lock_flags(c45c22dc,0,c080c7b2,8a,e790ba20,...) at
> > > > > > > >> > >>_mtx_lock_flags+0x32
> > > > > > > >> > >>+ufs_itimes(c47a0dd0,c47a0e90,e790ba78,c060e1cc,c47a0dd0,...)
> > at
> > > > > > > >> > >>ufs_itimes+0x6c
> > > > > > > >> >
> > >>+ufs_getattr(e790ba54,e790baec,c0622af6,c0896f40,e790ba54,...) at
> > > > > > > >> > >>ufs_getattr+0x20
> > > > > > > >> >
> > > > >>+VOP_GETATTR_APV(c0896f40,e790ba54,c08a5760,c47a0dd0,e790ba74,...) at
> > > > > > > >> > >>VOP_GETATTR_APV+0x3a
> > > > > > > >> > >>+filt_vfsread(c4cf261c,6,c07f445e,60b,0,...) at
> > filt_vfsread+0x75
> > > > > > > >> > >>+knote(c4f57114,6,1,1f30c2af,1f30c2af,...) at knote+0x75
> > > > > > > >> > >>+VOP_WRITE_APV(c0896f40,e790bbec,c47a0dd0,227,e790bcb4,...)
> > at
> > > > > > > >> > >>VOP_WRITE_APV+0x148
> > > > > > > >> > >>+vn_write(c45d5120,e790bcb4,c5802a00,0,c4b73a80,...) at
> > > > > > > >> vn_write+0x201
> > > > > > > >> > >>+dofilewrite(c4b73a80,1b,c45d5120,e790bcb4,ffffffff,...) at
> > > > > > > >> > >>dofilewrite+0x84
> > > > > > > >> > >>+kern_writev(c4b73a80,1b,e790bcb4,8220c71,0,...) at
> > > > kern_writev+0x65
> > > > > > > >> > >>+write(c4b73a80,e790bd04,c,c07d899c,3,...) at write+0x4f
> > > > > > > >> > >>+syscall(3b,3b,bfbf003b,0,bfbfeae4,...) at syscall+0x295
> > > > > > > >> > >>+Xint0x80_syscall() at Xint0x80_syscall+0x1f
> > > > > > > >> > >>+--- syscall (4, FreeBSD ELF32, write), eip = 0x2831d727, esp
> > =
> > > > > > > >> > >>0xbfbfea1c, ebp = 0xbfbfea48 ---
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >Thank you for the report. The LOR is caused by my commit into
> > > > > > > >> > >sys/ufs/ufs/ufs_vnops.c, rev. 1.280.
> > > > > > > >> >
> > > > > > > >> > Is the mount lock really required, if all we're doing is a
> > single
> > > > > > > >> read of a
> > > > > > > >> > single word (mnt_kern_flags) (v_mount should be read-only for
> > the
> > > > whole
> > > > > > > >> > lifetime of the vnode, I believe)? After all, reads of a single
> > > > word
> > > > > > > >> are
> > > > > > > >> > atomic on all our supported architectures.
> > > > > > > >> > The only situation I see where there MIGHT be problems are
> > forced
> > > > > > > >> unmounts,
> > > > > > > >> > but I think there are bigger issues with those.
> > > > > > > >> > Sorry for noticing this email only now.
> > > > > > > >>
> > > > > > > >> The problem is real with snapshotting. Ignoring
> > > > > > > >> MNTK_SUSPEND/MNTK_SUSPENDED flags (in particular, reading stale
> > value
> > > > of
> > > > > > > >> mnt_kern_flag) while setting IN_MODIFIED caused deadlock at ufs
> > vnode
> > > > > > > >> inactivation time. This was the big trouble with nfsd and
> > snapshots.
> > > > As
> > > > > > > >> such, I think that precise value of mmnt_kern_flag is critical
> > there,
> > > > > > > >> and mount interlock is needed.
> > > > > > > >
> > > > > > > >
> > > > > > > > This can be avoided using a memory barrier when setting flags.
> > > > > > > > Even if memory barriers usage is not encouraged, some critical
> > code
> > > > > > > > should really use them replacing a mutex semantic (if that worths
> > it).
> > > > > > >
> > > > > > > Why is memory barrier usage not encouraged? As you said, they can be
> > > > used to
> > > > > > reduce the number of atomic (LOCKed) operations, in some cases.
> > > > > > >
> > > > > > > FWIW, Linux has rmb() (load mem barrier), wmb() (store mem barrier),
> > > > mb()
> > > > > > (load/store mem barrier), smp_rmb(), smp_wmb(), smp_mb() (mem barriers
> > > > only
> > > > > > needed on SMP), and barrier() (GCC barrier (__asm __volatile
> > > > (:::"memory"))
> > > > > > macros that I've personally found very useful.
> > > > > > > Admittedly, they are harder to use than atomic operations, but it
> > might
> > > > > > still worth having something similar.
> > > > > >
> > > > > > Memory barriers just specify ordering, they don't ensure a cache flush
> > so
> > > > > > another CPU reads up to date values.  You can use memory barriers in
> > > > > > conjunction with atomic operations on a variable to ensure that you
> > can
> > > > > > safely read other variables (which is what locks do).  For example, in
> > > > this
> > > > > > case IIUC, you have a race that is because there is shared state
> > between
> > > > two
> > > > > > fields, one in the mount structure, and one in the ufs i-node.  Memory
> > > > > > barriers alone won't prevent you from operating on those flags
> > > > > > non-consistently.  That is, you have two memory locations in play
> > here,
> > > > and
> > > > > > atomic ops only work on a single one.  There isn't an atomic op to
> > > > do "read
> > > > > > from memory location A, check flag B, and if it's true write C to
> > memory
> > > > > > location D".  Where would you put the membar in this case to ensure
> > that
> > > > the
> > > > > > action always results in consistent behavior?
> > > > >
> > > > > Ah no, I misunderstood the problem.
> > > > > I was thinking it was about setting the flag A into location B if C is
> > > > > not present.
> > > >
> > > > That's the same problem, setting a flag in one word if a flag is set (or
> > not
> > > > set) in another word.  Either way though you are dealing with two
> > different
> > > > memory locations that need to be consistent with respect to one another.
> > >
> > > No, I meant something like:
> > >
> > > if (!(flag & FLAG1))
> > >         flag |= FLAG2;
> > >
> > > which can be done nicely with a memory barrier.
> > > I didn't catch the problem was about 2 different locations...sorry.
> >
> > Ah.  Even that can't be solved by membars alone.  What if you have:
> >
> >         if (!(flag & FLAG1))
> >                 flag |= FLAG2;
> >
> > in one thread and in another:
> >
> >         if (!(flag & FLAG2))
> >                 flag |= FLAG1;
> >
> > even if you put membar's in place, you can still get breakage (imagine flag
> > being 0 initially and getting preempted in between test and set).  For this
> > case you could use atomic_cmpset() in a loop without membars:
> >
> >         do {
> >                 x = flag;
> >                 if (x & FLAG1)
> >                         break;
> >         } while (!atomic_cmpset_int(&flag, x, x | FLAG2));
> >
> > but often you have several variables linked together that need to be in a
> > consistent state relative to each other, so you need to use a lock instead
> > anyways.
>
> yes, this is exactly what I had in mind (and that we do with
> mutex/rwlocks coding code).

s/coding/locking

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?3bbf2fe10612121516r78426f9bp95401e33c8871fea>