Subject: Re: 82576 + NETMAP + VLAN
To: Luigi Rizzo, Slawa Olhovchenkov
References: <20151018210049.GT6469@zxy.spb.ru> <20151022163519.GF6469@zxy.spb.ru> <20160202204446.GQ88527@zxy.spb.ru> <20160204130029.GC88527@zxy.spb.ru> <20160208173935.GK68298@zxy.spb.ru>
Cc: Adrian Chadd, "stable@freebsd.org"
From: Giuseppe Lettieri
Message-ID: <56B9E398.1060105@iet.unipi.it>
Date: Tue, 9 Feb 2016 14:03:20 +0100
List-Id: Production branch of FreeBSD source code

Hi all,

I have only looked into the first LOR, which has actually been there for
a long time. It should be triggered by the following two paths:

1) the application does an ioctl(, NIOCREGIF) (or GETINFO)
   -> netmap_mem_finalize() [locks the netmap allocator]
      -> contigmalloc() [locks things related to vm]

2) the application mmap()s the netmap fd and then accesses the area
   -> page fault [locks things related to vm]
      -> netmap_mem_ofstophys() [locks the netmap allocator]

As a quick check, the LOR disappears if I replace the contigmalloc()
with a dummy operation returning a static buffer.

If this is correct, the two paths can never run concurrently, since the
first one must be completed before the first mmap(). I also think that
the vm objects locked in the two paths are not the same, but I don't
know whether WITNESS keeps track of (some?) lock instances, or just
lock types.

Cheers,
Giuseppe

On 09/02/2016 13:31, Luigi Rizzo wrote:
> I am Cc-ing Giuseppe Lettieri who has looked at the problem and may
> have some comments to share
>
> cheers
> luigi
>
> On Mon, Feb 8, 2016 at 9:39 AM, Slawa Olhovchenkov wrote:
>> On Thu, Feb 04, 2016 at 10:47:34AM -0800, Adrian Chadd wrote:
>>
>>> .. but if it does, can you enable witness and see what it reports as
>>> lock order violations?
>>
>> last STABLE:
>>
>> 1. first LOR (with poll, doesn't cause direct problems now):
>>
>> lock order reversal:
>>  1st 0xfffff800946e6700 vm object (vm object) @ /usr/src/sys/vm/vm_fault.c:363
>>  2nd 0xffffffff813e14d8 netmap memory allocator lock (netmap memory allocator lock) @ /usr/src/sys/dev/netmap/netmap_mem2.c:393
>> KDB: stack backtrace:
>> #0 0xffffffff80970320 at kdb_backtrace+0x60
>> #1 0xffffffff809882ce at witness_checkorder+0xc7e
>> #2 0xffffffff8091fcbc at __mtx_lock_flags+0x4c
>> #3 0xffffffff806784f6 at netmap_mem_ofstophys+0x36
>> #4 0xffffffff80676834 at netmap_dev_pager_fault+0x34
>> #5 0xffffffff80b81a0f at dev_pager_getpages+0x3f
>> #6 0xffffffff80b8cc1e at vm_fault_hold+0x86e
>> #7 0xffffffff80b8c367 at vm_fault+0x77
>> #8 0xffffffff80d0e2c9 at trap_pfault+0x199
>> #9 0xffffffff80d0db47 at trap+0x527
>> #10 0xffffffff80cf4ce2 at calltrap+0x8
>>
>> 2. kqueue issue (not LOR!)
>>
>> acquiring duplicate lock of same type: "nm_kn_lock"
>>  1st nm_kn_lock @ /usr/src/sys/kern/kern_event.c:2003
>>  2nd nm_kn_lock @ /usr/src/sys/kern/kern_event.c:2003
>> KDB: stack backtrace:
>> #0 0xffffffff80970320 at kdb_backtrace+0x60
>> #1 0xffffffff809882ce at witness_checkorder+0xc7e
>> #2 0xffffffff8091fcbc at __mtx_lock_flags+0x4c
>> #3 0xffffffff808fd899 at knote+0x39
>> #4 0xffffffff8067636b at freebsd_selwakeup+0x8b
>> #5 0xffffffff80674eb5 at netmap_notify+0x55
>> #6 0xffffffff8067ccb6 at netmap_pipe_txsync+0x156
>> #7 0xffffffff80674740 at netmap_poll+0x400
>> #8 0xffffffff80676b8e at netmap_knrw+0x6e
>> #9 0xffffffff808fc57a at kqueue_register+0x64a
>> #10 0xffffffff808fcdd4 at kern_kevent_fp+0x144
>> #11 0xffffffff808fcc4f at kern_kevent+0x9f
>> #12 0xffffffff808fcaea at sys_kevent+0x12a
>> #13 0xffffffff80d0e914 at amd64_syscall+0x2d4
>> #14 0xffffffff80cf4fcb at Xfast_syscall+0xfb
>>
>> Do you need anything?
>>
>>> On 4 February 2016 at 10:47, Adrian Chadd wrote:
>>>> I've no time to help with this, I'm sorry :(
>>>>
>>>>
>>>> -a
>>>>
>>>>
>>>> On 4 February 2016 at 05:00, Slawa Olhovchenkov wrote:
>>>>> On Tue, Feb 02, 2016 at 11:44:47PM +0300, Slawa Olhovchenkov wrote:
>>>>>
>>>>>> On Thu, Oct 22, 2015 at 11:24:53AM -0700, Luigi Rizzo wrote:
>>>>>>
>>>>>>> On Thu, Oct 22, 2015 at 11:12 AM, Adrian Chadd wrote:
>>>>>>>> On 22 October 2015 at 09:35, Slawa Olhovchenkov wrote:
>>>>>>>>> On Sun, Oct 18, 2015 at 07:45:52PM -0700, Adrian Chadd wrote:
>>>>>>>>>
>>>>>>>>>> Heh, file a bug with luigi; it should be defined better inside netmap itself.
>>>>>>>>>
>>>>>>>>> I am CC: luigi.
>>>>>>>>>
>>>>>>>>> Next question: does kevent do the RX/TX sync?
>>>>>>>>> In my setup I need to issue NIOCTXSYNC/NIOCRXSYNC manually.
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Nope. kqueue() doesn't do the implicit sync like poll() does; it's
>>>>>>>> just the notification path.
>>>>>>>
>>>>>>> actually not. When the file descriptor is registered there
>>>>>>> is an implicit sync, and there is another one when an event
>>>>>>> is posted for the file descriptor.
>>>>>>>
>>>>>>> unless there are bugs, of course.
>>>>>>
>>>>>> I found strange behavior:
>>>>>>
>>>>>> 1. open netmap and register in main thread
>>>>>> 2. kevent register in different thread
>>>>>> 3. result: got event by kevent but no ring sync (all head,tail,cur
>>>>>> still 0).
>>>>>>
>>>>>> Is this normal? Or is this a bug?
>>>>>>
>>>>>> Opening and registering netmap in the same thread as kevent resolves this.
>>>>>
>>>>> Also, kevent+netmap deadlocked for me:
>>>>>
>>>>> PID TID COMM TDNAME KSTACK
>>>>> 1095 100207 addos - mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_timedwait_sig+0x10 _sleep+0x238 kern_nanosleep+0x10e sys_nanosleep+0x51 amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>> 1095 100208 addos worker#0 mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent+0x401 sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>> 1095 100209 addos worker#1 mi_switch+0xe1 turnstile_wait+0x42a __mtx_lock_sleep+0x26b knote+0x38 freebsd_selwakeup+0x8b netmap_notify+0x55 netmap_pipe_txsync+0x156 netmap_poll+0x400 netmap_knrw+0x6e kqueue_register+0x799 kern_kevent+0x158 sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>> 1095 100210 addos worker#2 mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent+0x401 sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>> 1095 100211 addos worker#NOIP mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_kevent+0x401 sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>>>>> 1095 100212 addos balancer mi_switch+0xe1 turnstile_wait+0x42a __mtx_lock_sleep+0x26b knote+0x38 freebsd_selwakeup+0x8b netmap_notify+0x2a netmap_pipe_rxsync+0x54 netmap_poll+0x774 netmap_knrw+0x6e kern_kevent+0x5cc sys_kevent+0x12a amd64_syscall+0x40f Xfast_syscall+0xfb
>
>
>

--
Dr. Ing. Giuseppe Lettieri
Dipartimento di Ingegneria della Informazione
Universita' di Pisa
Largo Lucio Lazzarino 1, 56122 Pisa - Italy
Ph. : (+39) 050-2217.649 (direct)
                    .599 (switch)
Fax : (+39) 050-2217.600
e-mail: g.lettieri@iet.unipi.it
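
P.S. The question raised at the top of this message, whether WITNESS tracks lock instances or only lock types, can be illustrated with a toy model. The Python sketch below is not WITNESS code. It assumes a checker that records acquisition order between lock type names only, and it replays the two paths (NIOCREGIF and page fault) one after the other. The names OrderChecker and run_path are hypothetical, and using "vm object" as the type name for the vm-side lock in both paths is an assumption made for illustration.

```python
from collections import defaultdict

# Lock type names as they appear in the WITNESS report quoted above.
ALLOCATOR = "netmap memory allocator lock"
VM = "vm object"  # assumption: both paths take a lock of this same type


class OrderChecker:
    """Toy order checker keyed by lock type name, never by lock instance."""

    def __init__(self):
        # seen_after[a] holds every type that was acquired while a was held.
        self.seen_after = defaultdict(set)
        self.reversals = []

    def acquire(self, held, new_type):
        for held_type in held:
            # A reversal: new_type was previously held while held_type was taken.
            if held_type in self.seen_after[new_type]:
                self.reversals.append((held_type, new_type))
            self.seen_after[held_type].add(new_type)
        held.append(new_type)


def run_path(checker, lock_types):
    """Replay one code path as a nested sequence of lock acquisitions."""
    held = []
    for lock_type in lock_types:
        checker.acquire(held, lock_type)


# Path 1: ioctl NIOCREGIF -> netmap_mem_finalize() -> contigmalloc()
REGIF_PATH = [ALLOCATOR, VM]
# Path 2: page fault -> netmap_mem_ofstophys()
FAULT_PATH = [VM, ALLOCATOR]

checker = OrderChecker()
run_path(checker, REGIF_PATH)  # completes before the fault path starts
run_path(checker, FAULT_PATH)
for first, second in checker.reversals:
    print("lock order reversal: 1st %s, 2nd %s" % (first, second))
```

Under that assumption the reversal is reported on the second path even though the two paths are replayed strictly in sequence, so a report of this kind does not by itself show that the paths can race.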