Date:      Tue, 9 Feb 2016 14:03:20 +0100
From:      Giuseppe Lettieri <g.lettieri@iet.unipi.it>
To:        Luigi Rizzo <rizzo@iet.unipi.it>, Slawa Olhovchenkov <slw@zxy.spb.ru>
Cc:        Adrian Chadd <adrian.chadd@gmail.com>, "stable@freebsd.org" <stable@freebsd.org>
Subject:   Re: 82576 + NETMAP + VLAN
Message-ID:  <56B9E398.1060105@iet.unipi.it>
In-Reply-To: <CA+hQ2+iD3X9wR8exw2p-9G8pPNHCQtLdMdJJXU78PDrQaWBH7w@mail.gmail.com>
References:  <CAJ-VmonO8ok=twgBGVMBiAs=AyRs4LUoDX6pGBtWStvndGKGzg@mail.gmail.com> <20151018210049.GT6469@zxy.spb.ru> <CAJ-Vmonfxz5vjVHqp6gS97mhnU10SLgohRA35O8MQLUzHvcsrw@mail.gmail.com> <20151022163519.GF6469@zxy.spb.ru> <CAJ-Vmok56uBJgJh4Bwr7yjNwsigU=ySBJ08H26caODAAxXNLRA@mail.gmail.com> <CA+hQ2+g0ggpS+E5nOpON66efs7cwsed=NvaKa=mzsg6FycGhiQ@mail.gmail.com> <20160202204446.GQ88527@zxy.spb.ru> <20160204130029.GC88527@zxy.spb.ru> <CAJ-Vmok+7Vt4ww4iWkQY505eapxVQF4MBtnb+wGg-TNSmJTLGw@mail.gmail.com> <CAJ-VmomzvoZZZPUveTZUJ5zAhHQkJ5M9+7gfN8gGSGp05JpOWw@mail.gmail.com> <20160208173935.GK68298@zxy.spb.ru> <CA+hQ2+iD3X9wR8exw2p-9G8pPNHCQtLdMdJJXU78PDrQaWBH7w@mail.gmail.com>

Hi all,

I have only looked into the first LOR, which has actually been there for
a long time.

It should be triggered by the following paths:
1) application does an ioctl(fd, NIOCREGIF) (or NIOCGINFO)
	->netmap_mem_finalize() [locks the netmap allocator]
		->contigmalloc() [locks things related to vm]
2) application mmap()s the netmap fd and then accesses the area
	-> page fault [locks things related to vm]
		->netmap_mem_ofstophys() [locks the netmap allocator]
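
To make the ordering concrete, this is roughly the sequence a netmap
application goes through; the sketch below is not code from this thread,
"em0" is just a placeholder interface and error handling is omitted:

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <net/netmap.h>
#include <net/netmap_user.h>

int
main(void)
{
	struct nmreq req;
	char *mem;
	int fd = open("/dev/netmap", O_RDWR);

	memset(&req, 0, sizeof(req));
	req.nr_version = NETMAP_API;
	strlcpy(req.nr_name, "em0", sizeof(req.nr_name));	/* placeholder NIC */

	/* path 1: netmap_mem_finalize() -> contigmalloc() runs inside this ioctl */
	ioctl(fd, NIOCREGIF, &req);

	/*
	 * path 2: the first access to the mapping faults and reaches
	 * netmap_mem_ofstophys(), necessarily after the ioctl has returned.
	 */
	mem = mmap(NULL, req.nr_memsize, PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);
	if (mem != MAP_FAILED)
		(void)*(volatile char *)mem;	/* first touch -> page fault */
	return (0);
}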

As a quick check, the LOR disappears if I replace the contigmalloc() 
with a dummy operation returning a static buffer.
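
For the record, the check was nothing more than a substitution along the
lines of the sketch below (names such as NM_LOR_CHECK and nm_contig_alloc()
are made up, this is not the actual netmap_mem2.c code): the dummy branch
never enters the VM system, so the vm lock in path 1 is simply never taken.

#include <sys/param.h>
#include <sys/malloc.h>

MALLOC_DECLARE(M_NETMAP);	/* malloc type already defined by netmap */

#ifdef NM_LOR_CHECK
static char nm_dummy_cluster[PAGE_SIZE] __aligned(PAGE_SIZE);

static void *
nm_contig_alloc(size_t sz)
{
	/* dummy: no call into the VM system, hence no vm locks */
	return (sz <= sizeof(nm_dummy_cluster) ? nm_dummy_cluster : NULL);
}
#else
static void *
nm_contig_alloc(size_t sz)
{
	/* real path: contigmalloc() is where the vm-related locks come from */
	return (contigmalloc(sz, M_NETMAP, M_NOWAIT | M_ZERO,
	    0, ~(vm_paddr_t)0, PAGE_SIZE, 0));
}
#endif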

If this is correct, there cannot be any concurrency between the two 
paths, since the 1st one must be completed before the first mmap(). I 
also think that the vm objects locked in the two paths are not the same, 
but I don't know whether WITNESS keeps track of (some?) lock instances, 
or just lock types.

Cheers,
Giuseppe

On 09/02/2016 13:31, Luigi Rizzo wrote:
> I am Cc-ing Giuseppe Lettieri who has looked at the problem and may
> have some comments to share
>
> cheers
> luigi
>
> On Mon, Feb 8, 2016 at 9:39 AM, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
>> On Thu, Feb 04, 2016 at 10:47:34AM -0800, Adrian Chadd wrote:
>>
>>> .. but if it does, can you enable witness and see what it reports as
>>> lock order violations?
>>
>> last STABLE:
>>
>> 1. first LOR (seen with poll; it does not cause direct problems right now):
>>
>> lock order reversal:
>>   1st 0xfffff800946e6700 vm object (vm object) @ /usr/src/sys/vm/vm_fault.c:363
>>   2nd 0xffffffff813e14d8 netmap memory allocator lock (netmap memory allocator lock) @ /usr/src/sys/dev/netmap/netmap_mem2.c:393
>> KDB: stack backtrace:
>> #0 0xffffffff80970320 at kdb_backtrace+0x60
>> #1 0xffffffff809882ce at witness_checkorder+0xc7e
>> #2 0xffffffff8091fcbc at __mtx_lock_flags+0x4c
>> #3 0xffffffff806784f6 at netmap_mem_ofstophys+0x36
>> #4 0xffffffff80676834 at netmap_dev_pager_fault+0x34
>> #5 0xffffffff80b81a0f at dev_pager_getpages+0x3f
>> #6 0xffffffff80b8cc1e at vm_fault_hold+0x86e
>> #7 0xffffffff80b8c367 at vm_fault+0x77
>> #8 0xffffffff80d0e2c9 at trap_pfault+0x199
>> #9 0xffffffff80d0db47 at trap+0x527
>> #10 0xffffffff80cf4ce2 at calltrap+0x8
>>
>> 2. kqueue issue (not a LOR!)
>>
>> acquiring duplicate lock of same type: "nm_kn_lock"
>>   1st nm_kn_lock @ /usr/src/sys/kern/kern_event.c:2003
>>   2nd nm_kn_lock @ /usr/src/sys/kern/kern_event.c:2003
>> KDB: stack backtrace:
>> #0 0xffffffff80970320 at kdb_backtrace+0x60
>> #1 0xffffffff809882ce at witness_checkorder+0xc7e
>> #2 0xffffffff8091fcbc at __mtx_lock_flags+0x4c
>> #3 0xffffffff808fd899 at knote+0x39
>> #4 0xffffffff8067636b at freebsd_selwakeup+0x8b
>> #5 0xffffffff80674eb5 at netmap_notify+0x55
>> #6 0xffffffff8067ccb6 at netmap_pipe_txsync+0x156
>> #7 0xffffffff80674740 at netmap_poll+0x400
>> #8 0xffffffff80676b8e at netmap_knrw+0x6e
>> #9 0xffffffff808fc57a at kqueue_register+0x64a
>> #10 0xffffffff808fcdd4 at kern_kevent_fp+0x144
>> #11 0xffffffff808fcc4f at kern_kevent+0x9f
>> #12 0xffffffff808fcaea at sys_kevent+0x12a
>> #13 0xffffffff80d0e914 at amd64_syscall+0x2d4
>> #14 0xffffffff80cf4fcb at Xfast_syscall+0xfb
>>
>> Do you need anything?
>>
>>> On 4 February 2016 at 10:47, Adrian Chadd <adrian.chadd@gmail.com> wrote:
>>>> I've no time to help with this, I'm sorry :(
>>>>
>>>>
>>>> -a
>>>>
>>>>
>>>> On 4 February 2016 at 05:00, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
>>>>> On Tue, Feb 02, 2016 at 11:44:47PM +0300, Slawa Olhovchenkov wrote:
>>>>>
>>>>>> On Thu, Oct 22, 2015 at 11:24:53AM -0700, Luigi Rizzo wrote:
>>>>>>
>>>>>>> On Thu, Oct 22, 2015 at 11:12 AM, Adrian Chadd <adrian.chadd@gmail.com> wrote:
>>>>>>>> On 22 October 2015 at 09:35, Slawa Olhovchenkov <slw@zxy.spb.ru> wrote:
>>>>>>>>> On Sun, Oct 18, 2015 at 07:45:52PM -0700, Adrian Chadd wrote:
>>>>>>>>>
>>>>>>>>>> Heh, file a bug with luigi; it should be defined better inside netmap itself.
>>>>>>>>>
>>>>>>>>> I am CC'ing luigi.
>>>>>>>>>
>>>>>>>>> Next question: does kevent do an RX/TX sync?
>>>>>>>>> In my setup I need to do manual NIOCTXSYNC/NIOCRXSYNC.
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Nope. kqueue() doesn't do the implicit sync like poll() does; it's
>>>>>>>> just the notification path.
>>>>>>>
>>>>>>> actually not. When the file descriptor is registered there
>>>>>>> is an implicit sync, and there is another one when an event
>>>>>>> is posted for the file descriptor.
>>>>>>>
>>>>>>> unless there are bugs, of course.
>>>>>>
>>>>>> I found some strange behavior:
>>>>>>
>>>>>> 1. open netmap and register the interface in the main thread
>>>>>> 2. register with kevent in a different thread
>>>>>> 3. result: kevent delivers an event, but the rings are not synced
>>>>>> (head, tail and cur are all still 0).
>>>>>>
>>>>>> Is this normal, or is it a bug?
>>>>>>
>>>>>> Opening and registering netmap in the same thread as kevent resolves this.
>>>>>
>>>>> Also, kevent+netmap deadlocked for me:
>>>>>
>>>>>    PID    TID COMM             TDNAME           KSTACK
>>>>>   1095 100207 addos            -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_timedwait_sig+0x10 _sleep+0x238 kern_nanosleep+0x10e sys_nanosleep+0x51 amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>>   1095 100208 addos            worker#0         mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent+0x401 sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>>   1095 100209 addos            worker#1         mi_switch+0xe1 turnstile_wait+0x42a __mtx_lock_sleep+0x26b knote+0x38 freebsd_selwakeup+0x8b netmap_notify+0x55 netmap_pipe_txsync+0x156 netmap_poll+0x400 netmap_knrw+0x6e kqueue_register+0x799 kern_kevent+0x158 sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>>   1095 100210 addos            worker#2         mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent+0x401 sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>>   1095 100211 addos            worker#NOIP      mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent+0x401 sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>>   1095 100212 addos            balancer         mi_switch+0xe1 turnstile_wait+0x42a __mtx_lock_sleep+0x26b knote+0x38 freebsd_selwakeup+0x8b netmap_notify+0x2a netmap_pipe_rxsync+0x54 netmap_poll+0x774 netmap_knrw+0x6e kern_kevent+0x5cc sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>
>
>


-- 
Dr. Ing. Giuseppe Lettieri
Dipartimento di Ingegneria della Informazione
Universita' di Pisa
Largo Lucio Lazzarino 1, 56122 Pisa - Italy
Ph. : (+39) 050-2217.649 (direct) .599 (switch)
Fax : (+39) 050-2217.600
e-mail: g.lettieri@iet.unipi.it


