From owner-svn-src-all@freebsd.org Wed Mar 16 12:46:44 2016 Return-Path: Delivered-To: svn-src-all@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6FD1BAD252A; Wed, 16 Mar 2016 12:46:44 +0000 (UTC) (envelope-from hps@selasky.org) Received: from mail.turbocat.net (mail.turbocat.net [IPv6:2a01:4f8:d16:4514::2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 087E981B; Wed, 16 Mar 2016 12:46:44 +0000 (UTC) (envelope-from hps@selasky.org) Received: from laptop015.home.selasky.org (unknown [62.141.129.119]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id ADA6C1FE024; Wed, 16 Mar 2016 13:46:41 +0100 (CET) Subject: Re: svn commit: r296909 - head/sys/ofed/drivers/infiniband/ulp/ipoib To: John Baldwin References: <201603151547.u2FFlQKN078643@repo.freebsd.org> <2420759.o4YiE5Za4X@ralph.baldwin.cx> Cc: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org From: Hans Petter Selasky Message-ID: <56E95658.2010703@selasky.org> Date: Wed, 16 Mar 2016 13:49:28 +0100 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 In-Reply-To: <2420759.o4YiE5Za4X@ralph.baldwin.cx> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 16 Mar 2016 12:46:44 -0000 On 03/15/16 18:43, John Baldwin wrote: > On Tuesday, March 15, 2016 03:47:26 PM Hans Petter Selasky wrote: >> Author: hselasky >> Date: Tue Mar 15 15:47:26 2016 >> New Revision: 296909 >> URL: https://svnweb.freebsd.org/changeset/base/296909 >> >> Log: >> Fix witness panic in the ipoib_ioctl() function when unloading the >> ipoib module. >> >> The bpfdetach() function is trying to turn off promiscious mode on the >> network interface it is attached to while holding a mutex. The fix >> consists of ignoring any further calls to the ipoib_ioctl() function >> when the network interface is going to be detached. The ipoib_ioctl() >> function might sleep. >> >> Sponsored by: Mellanox Technologies >> MFC after: 1 week > > In all of the other NIC drivers I have worked on the fix has been to call > if_detach (or ether_ifdetach) as the first step in your detach method before > you try to stop the interface. The idea is to remove external connections > from your driver first, and the only after those have drained do you actually > shut down the hardware. Given you are calling if_detach() just before > if_free() below, ipoib just seems to be broken here. Hi John, It is a problem that if_ioctl() callbacks come from various places without proper refcounting and unknown context properties, like sleeping allowed or not. The WITNESS assert I'm trying to solve is this one: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe04661876c0 > ipoib_ioctl() at ipoib_ioctl+0x29/frame 0xfffffe0466187700 > if_setflag() at if_setflag+0xd5/frame 0xfffffe0466187760 > ifpromisc() at ifpromisc+0x2c/frame 0xfffffe0466187790 > bpf_detachd_locked() at bpf_detachd_locked+0x1e8/frame 0xfffffe04661877d0 > bpfdetach() at bpfdetach+0x152/frame 0xfffffe0466187810 > ipoib_detach() at ipoib_detach+0x42/frame 0xfffffe0466187830 > ipoib_remove_one() at ipoib_remove_one+0x19e/frame 0xfffffe04661878a0 > ib_unregister_client() at ib_unregister_client+0x5e/frame 0xfffffe04661878e0 > ipoib_cleanup_module() at ipoib_cleanup_module+0x52/frame 0xfffffe04661878f0 > _module_run() at _module_run+0x7b/frame 0xfffffe0466187920 > linker_file_unload() at linker_file_unload+0x45f/frame 0xfffffe0466187970 > kern_kldunload() at kern_kldunload+0xbc/frame 0xfffffe04661879a0 > amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe0466187ab0 > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0466187ab0 As you can see bpfdetach() is calling ifpromisc() locked, which basically means we cannot do anything there with regard to ipoib. ipoib is not a NIC driver, but an IP-bearer only. Comparing with PCI devices is not valid here, because with a PCI device you usually only need to write a PCI register to switch promiscious mode on/off, but with IPOIB it involves commands towards the infiniband stack which might sleep. > For example, detach in a typical NIC driver does this: > > struct foo_softc *sc; > > sc = device_get_softc(dev); > ether_ifdetach(sc->sc_ifp); > FOO_LOCK(sc); > foo_stop(sc); > FOO_UNLOCK(sc); > callout_drain(&sc->timer); > bus_teardown_intr(...); > bus_release_resource(...); > if_free(sc->sc_ifp); > > Similarly, devices with a character device in /dev should be calling > destroy_dev() first before shutting down hardware, etc. Right. The problem is not really the detach sequence, but that bpfdetach() calls ifpromisc() locked. I'm not sure if we can change that. --HPS