Message-ID: <518988F0.2080902@delphij.net>
Date: Tue, 07 May 2013 16:06:24 -0700
From: Xin Li
Organization: The FreeBSD Project
Reply-To: d@delphij.net
To: Garrett Cooper
Cc: jfv@FreeBSD.org, freebsd-net@freebsd.org, haven.hash@isilon.com, jeff@freebsd.org
Subject: Re: LOR: "taskqueue_drain with the following non-sleepable locks held" with if_em

On 05/07/13 15:03, Garrett Cooper wrote:
> Saw the following LOR on a CURRENT build as of yesterday with an
> almost idle machine processing ARP requests:
>
> root@wf220:/mnt # taskqueue_drain with the following non-sleepable locks held:
> exclusive rw lle (lle) r = 0 (0xfffffe001450b410) locked @ /usr/src/sys/netinet/in.c:1484
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffff848d4f7690
> kdb_backtrace() at kdb_backtrace+0x39/frame 0xffffff848d4f7740
> witness_warn() at witness_warn+0x4a8/frame 0xffffff848d4f7800
> taskqueue_drain() at taskqueue_drain+0x3a/frame 0xffffff848d4f7840
> set_timeout() at set_timeout+0x4a/frame 0xffffff848d4f7860
> netevent_callback() at netevent_callback+0x16/frame 0xffffff848d4f7870
> arpintr() at arpintr+0x9b5/frame 0xffffff848d4f7930
> netisr_dispatch_src() at netisr_dispatch_src+0x60/frame 0xffffff848d4f79a0
> ether_demux() at ether_demux+0x130/frame 0xffffff848d4f79d0
> ether_nh_input() at ether_nh_input+0x369/frame 0xffffff848d4f7a30
> netisr_dispatch_src() at netisr_dispatch_src+0x60/frame 0xffffff848d4f7aa0
> em_rxeof() at em_rxeof+0x30e/frame 0xffffff848d4f7b10
> em_msix_rx() at em_msix_rx+0x33/frame 0xffffff848d4f7b40
> intr_event_execute_handlers() at intr_event_execute_handlers+0x80/frame 0xffffff848d4f7b70
> ithread_loop() at ithread_loop+0x128/frame 0xffffff848d4f7bb0
> fork_exit() at fork_exit+0x71/frame 0xffffff848d4f7bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xffffff848d4f7bf0
> --- trap 0, rip = 0, rsp = 0xffffff848d4f7cb0, rbp = 0 ---
> root@wf220:/mnt # uname -a
> FreeBSD wf220.west.isilon.com 10.0-CURRENT FreeBSD 10.0-CURRENT #1:
> Tue May 7 08:04:59 PDT 2013
> root@wf220.west.isilon.com:/usr/obj/usr/src/sys/ISI-GENERIC amd64
>
> I've seen this issue before for a few weeks/months, so it's nothing
> new (but probably should be fixed...). Thanks!

This has nothing to do with em(4); it looks like a bug in our Linux
compatibility wrapper.
In the InfiniBand code, _handle_arp_update_event() calls
netevent_callback() with NETEVENT_NEIGH_UPDATE, where a
cancel_delayed_work() causes the drain.

Looking at the Linux code, it seems that we simply shouldn't drain in
the cancel_delayed_work() wrapper (sys/ofed/include/linux/workqueue.h),
so I think we need something like this:

Index: sys/ofed/include/linux/workqueue.h
===================================================================
--- sys/ofed/include/linux/workqueue.h	(revision 250337)
+++ sys/ofed/include/linux/workqueue.h	(working copy)
@@ -184,9 +184,9 @@
 {
 	callout_stop(&work->timer);
-	if (work->work.taskqueue &&
-	    taskqueue_cancel(work->work.taskqueue, &work->work.work_task, NULL))
-		taskqueue_drain(work->work.taskqueue, &work->work.work_task);
+	if (work->work.taskqueue)
+		return (taskqueue_cancel(work->work.taskqueue,
+		    &work->work.work_task, NULL) != 0);
 	return 0;
 }

I've added Jeff to Cc.

Cheers,
-- 
Xin LI https://www.delphij.net/
FreeBSD - The Power to Serve! Live free or die