Date: Mon, 15 Feb 2016 20:49:56 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Giuseppe Lettieri <g.lettieri@iet.unipi.it>
Cc: Luigi Rizzo <rizzo@iet.unipi.it>, Adrian Chadd <adrian.chadd@gmail.com>, "stable@freebsd.org" <stable@freebsd.org>
Subject: Re: 82576 + NETMAP + VLAN
Message-ID: <20160215174956.GD68298@zxy.spb.ru>
In-Reply-To: <56C1F69C.5010004@iet.unipi.it>
References: <CA%2BhQ2%2BiD3X9wR8exw2p-9G8pPNHCQtLdMdJJXU78PDrQaWBH7w@mail.gmail.com>
 <56B9E398.1060105@iet.unipi.it> <20160210115937.GA37895@zxy.spb.ru>
 <56BB3C20.600@iet.unipi.it> <20160210135318.GL68298@zxy.spb.ru>
 <56BC505F.7080309@iet.unipi.it> <20160211133428.GM68298@zxy.spb.ru>
 <56C1EA66.807@iet.unipi.it> <20160215151318.GQ68298@zxy.spb.ru>
 <56C1F69C.5010004@iet.unipi.it>
On Mon, Feb 15, 2016 at 05:02:36PM +0100, Giuseppe Lettieri wrote:

> On 15/02/2016 16:13, Slawa Olhovchenkov wrote:
> > On Mon, Feb 15, 2016 at 04:10:30PM +0100, Giuseppe Lettieri wrote:
> >
> >> Hi Slawa,
> >>
> >> I think WITNESS is seeing a false positive, since those two are always
> >> different mutexes.
> >>
> >> The actual deadlock you experience should be caused by something else. I
> >
> > Are you sure? When the deadlock occurs I see threads waiting on nm_kn_lock.
>
> The deadlock I mentioned still involves nm_kn_locks, sorry if I was not
> clear about that. I am just saying that we never try to take the same
> lock that we are already holding.
>
> Nonetheless, there are indeed problems in the path that WITNESS has
> seen. The problem is that pipes have to notify the other end while
> called by kevent. kevent holds the nm_kn_lock on the TX src ring and the
> notification takes the nm_kn_lock on the RX dst ring.

Thanks for the clarification.

> >
> >> have not been able to reproduce it locally (I have not tried that hard,
> >> to be honest). I am pretty sure that there is a lock inversion - one
> >> that may cause real deadlocks - when you use netmap pipes+kqueue and you
> >> don't pass NETMAP_NO_TX_POLL at NIOCREGIF time. The attached patch
> >> should solve this particular problem, but there may be others. Could you
> >> please try it?
> >
> > Try it with or w/o WITNESS?
>
> I am trying to see if the actual deadlock disappears, so disable WITNESS
> if it slows down the system and masks the real deadlock. Otherwise,
> leave it on.

OK. With and without WITNESS, I currently don't see the deadlock.

Just for the record, two LORs, maybe already well known:

lock order reversal:
 1st 0xfffffe0172c6fa78 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:3130
 2nd 0xfffff8005ca81000 dirhash (dirhash) @ /usr/src/sys/ufs/ufs/ufs_dirhash.c:280
KDB: stack backtrace:
#0 0xffffffff809702b0 at kdb_backtrace+0x60
#1 0xffffffff8098825e at witness_checkorder+0xc7e
#2 0xffffffff8093e137 at _sx_xlock+0x47
#3 0xffffffff80b75d6a at ufsdirhash_add+0x3a
#4 0xffffffff80b78b40 at ufs_direnter+0x6a0
#5 0xffffffff80b815ab at ufs_makeinode+0x56b
#6 0xffffffff80b7d5dd at ufs_create+0x2d
#7 0xffffffff80e33311 at VOP_CREATE_APV+0xa1
#8 0xffffffff809e2009 at vn_open_cred+0x3b9
#9 0xffffffff809db30f at kern_openat+0x26f
#10 0xffffffff80d0e8a4 at amd64_syscall+0x2d4
#11 0xffffffff80cf4f5b at Xfast_syscall+0xfb

lock order reversal:
 1st 0xfffff80049138d50 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2415
 2nd 0xfffffe0172cb1b80 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_vnops.c:262
 3rd 0xfffff800a6832d50 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2415
KDB: stack backtrace:
#0 0xffffffff809702b0 at kdb_backtrace+0x60
#1 0xffffffff8098825e at witness_checkorder+0xc7e
#2 0xffffffff80918dd8 at __lockmgr_args+0x738
#3 0xffffffff80b71594 at ffs_lock+0x84
#4 0xffffffff80e3512b at VOP_LOCK1_APV+0xab
#5 0xffffffff809e28f3 at _vn_lock+0x43
#6 0xffffffff809d42fb at vget+0x5b
#7 0xffffffff809c8c51 at vfs_hash_get+0xe1
#8 0xffffffff80b6d0a0 at ffs_vgetf+0x40
#9 0xffffffff80b64c50 at softdep_sync_buf+0x300
#10 0xffffffff80b72296 at ffs_syncvnode+0x226
#11 0xffffffff80b4b6b3 at ffs_truncate+0x683
#12 0xffffffff80b78c99 at ufs_direnter+0x7f9
#13 0xffffffff80b808eb at ufs_mkdir+0x86b
#14 0xffffffff80e34987 at VOP_MKDIR_APV+0xa7
#15 0xffffffff809dfca9 at kern_mkdirat+0x209
#16 0xffffffff80d0e8a4 at amd64_syscall+0x2d4
#17 0xffffffff80cf4f5b at Xfast_syscall+0xfb

> >
> >> Cheers,
> >> Giuseppe
> >>
> >> On 11/02/2016 14:34, Slawa Olhovchenkov wrote:
> >>> On Thu, Feb 11, 2016 at 10:11:59AM +0100, Giuseppe Lettieri wrote:
> >>>
> >>>> On 10/02/2016 14:53, Slawa Olhovchenkov wrote:
> >>>>> On Wed, Feb 10, 2016 at 02:33:20PM +0100, Giuseppe Lettieri wrote:
> >>>>>
> >>>>>> On 10/02/2016 12:59, Slawa Olhovchenkov wrote:
> >>>>>>> Can you also look at the second issue?
> >>>>>>>
> >>>>>>> PS: What do you need from me? Maybe open a PR?
> >>>>>>
> >>>>>> Could you provide some example code that triggers the issue?
> >>>>>
> >>>>> This is about 700 lines of code (not very clear), maybe I can describe it?
> >>>>
> >>>> I just need some code to trigger the problem locally. Don't worry about
> >>>> the clarity and the line count, unless you cannot share the code for
> >>>> other reasons.
> >>>
> >>> I am attaching the source.
> >>> Run as "prog if1 if2".
> >>> I got `acquiring duplicate lock of same type: "nm_kn_lock"` immediately
> >>> after start.
> >>> The deadlock may occur immediately after start, or it may need
> >>> traffic flooding.
> >>>
> >>
> >>
> >> --
> >> Dr. Ing. Giuseppe Lettieri
> >> Dipartimento di Ingegneria della Informazione
> >> Universita' di Pisa
> >> Largo Lucio Lazzarino 1, 56122 Pisa - Italy
> >> Ph. : (+39) 050-2217.649 (direct) .599 (switch)
> >> Fax : (+39) 050-2217.600
> >> e-mail: g.lettieri@iet.unipi.it
> >
> >> Index: dev/netmap/netmap.c
> >> ===================================================================
> >> --- dev/netmap/netmap.c (revision 287671)
> >> +++ dev/netmap/netmap.c (working copy)
> >> @@ -2378,7 +2378,7 @@
> >>      * XXX should also check cur != hwcur on the tx rings.
> >>      * Fortunately, normal tx mode has np_txpoll set.
> >>      */
> >> -   if (priv->np_txpoll || want_tx) {
> >> +   if ((priv->np_txpoll && !is_kevent) || want_tx) {
> >>         /*
> >>          * The first round checks if anyone is ready, if not
> >>          * do a selrecord and another round to handle races.
> >
> >
>
> --
> Dr. Ing. Giuseppe Lettieri
> Dipartimento di Ingegneria della Informazione
> Universita' di Pisa
> Largo Lucio Lazzarino 1, 56122 Pisa - Italy
> Ph. : (+39) 050-2217.649 (direct) .599 (switch)
> Fax : (+39) 050-2217.600
> e-mail: g.lettieri@iet.unipi.it
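
For readers following the thread, the one-line change in the quoted patch can be checked in isolation. The sketch below is not netmap code: it only copies the old and new boolean conditions from the diff into two hypothetical helper functions (tx_pass_before and tx_pass_after are names made up here) and enumerates the inputs. It assumes, from the discussion above, that is_kevent is true when the poll path is entered from kevent and that the guarded block is the TX pass that ends up notifying the other end of a pipe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: the condition on the "-" line of the patch. */
bool tx_pass_before(bool np_txpoll, bool want_tx)
{
	return np_txpoll || want_tx;
}

/* Hypothetical helper: the condition on the "+" line of the patch. */
bool tx_pass_after(bool np_txpoll, bool is_kevent, bool want_tx)
{
	return (np_txpoll && !is_kevent) || want_tx;
}

/* Count the input combinations for which the two conditions disagree. */
int count_changed_cases(void)
{
	int changed = 0;

	for (int txpoll = 0; txpoll <= 1; txpoll++)
		for (int kev = 0; kev <= 1; kev++)
			for (int want = 0; want <= 1; want++)
				if (tx_pass_before(txpoll, want) !=
				    tx_pass_after(txpoll, kev, want))
					changed++;
	return changed;
}

/* Self-check: the patch only changes the implicit-TX-poll-from-kevent case. */
void check_patch_effect(void)
{
	assert(tx_pass_before(true, false));
	assert(!tx_pass_after(true, true, false));
	assert(count_changed_cases() == 1);
}
```

Under these assumptions the two conditions differ in exactly one case: np_txpoll set, is_kevent set, and want_tx clear. That matches the description in the thread, where the problematic path is the implicit TX poll performed on behalf of kevent when NETMAP_NO_TX_POLL was not passed; an explicit TX request (want_tx) still goes through as before.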