From: Jim Harris
To: Mike Tancsa, delphij@freebsd.org
Cc: FreeBSD-STABLE Mailing List
Date: Fri, 21 Sep 2012 13:59:12 -0700
Subject: Re: tws bug ? (LSI SAS 9750)
In-Reply-To: <505CC8EC.4030608@sentex.net>

On Fri, Sep 21, 2012 at 1:07 PM, Mike Tancsa wrote:
> Hi,
> I have been trying out a nice new tws controller and decided to enable
> debugging in the kernel and run some stress tests. With a regular
> GENERIC kernel, it boots up fine. But with debugging, it panics on
> boot. Anyone know what's up? Is this something that should be sent
> directly to LSI?

From code inspection, this mutex is recursed whether or not debugging
is enabled. There is no code path here specific to INVARIANTS; the only
difference is that the check that catches the recursion in
_mtx_lock_sleep() is a KASSERT, which is compiled in only with
INVARIANTS, so a kernel without it lets the recursion pass silently.
The main I/O path in this driver also always recurses on this lock; it
is not specific to the initialization call stack you listed below.

The best course of action seems to be initializing io_lock with
MTX_RECURSE, since the driver appears to expect to be able to recurse
on it. Can you try the following patch?
diff --git a/sys/dev/tws/tws.c b/sys/dev/tws/tws.c
index b1615db..d156d40 100644
--- a/sys/dev/tws/tws.c
+++ b/sys/dev/tws/tws.c
@@ -197,7 +197,7 @@ tws_attach(device_t dev)
     mtx_init( &sc->q_lock, "tws_q_lock", NULL, MTX_DEF);
     mtx_init( &sc->sim_lock, "tws_sim_lock", NULL, MTX_DEF);
     mtx_init( &sc->gen_lock, "tws_gen_lock", NULL, MTX_DEF);
-    mtx_init( &sc->io_lock, "tws_io_lock", NULL, MTX_DEF);
+    mtx_init( &sc->io_lock, "tws_io_lock", NULL, MTX_DEF | MTX_RECURSE);
 
     if ( tws_init_trace_q(sc) == FAILURE )
         printf("trace init failure\n");

>
> pcib0: port 0xcf8-0xcff on acpi0
> pci0: on pcib0
> pcib1: irq 16 at device 1.0 on pci0
> pci1: on pcib1
> pcib2: irq 17 at device 1.1 on pci0
> pci2: on pcib2
> LSI 3ware device driver for SAS/SATA storage controllers, version:
> 10.80.00.003
> tws0: port 0x4000-0x40ff mem
> 0xc2460000-0xc2463fff,0xc2400000-0xc243ffff irq 17 at device 0.0 on pci2
> tws0: Using legacy INTx
> panic: _mtx_lock_sleep: recursed on non-recursive mutex tws_io_lock @
> /usr/HEAD/src/sys/dev/tws/tws_hdm.c:287
>
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> kdb_backtrace() at kdb_backtrace+0x37
> panic() at panic+0x1d8
> _mtx_lock_sleep() at _mtx_lock_sleep+0x27f
> _mtx_lock_flags() at _mtx_lock_flags+0xf1
> tws_submit_command() at tws_submit_command+0x3f
> tws_dmamap_data_load_cbfn() at tws_dmamap_data_load_cbfn+0xb7
> bus_dmamap_load() at bus_dmamap_load+0x16c
> tws_map_request() at tws_map_request+0x78
> tws_get_param() at tws_get_param+0xe1
> tws_display_ctlr_info() at tws_display_ctlr_info+0x4c
> tws_init_ctlr() at tws_init_ctlr+0x6d
> tws_attach() at tws_attach+0x68c
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_pci_attach() at acpi_pci_attach+0x164
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_pcib_attach() at acpi_pcib_attach+0x1a7
> acpi_pcib_pci_attach() at acpi_pcib_pci_attach+0x9b
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_pci_attach() at acpi_pci_attach+0x164
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_pcib_attach() at acpi_pcib_attach+0x1a7
> acpi_pcib_acpi_attach() at acpi_pcib_acpi_attach+0x1f6
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> acpi_attach() at acpi_attach+0xbc1
> device_attach() at device_attach+0x72
> bus_generic_attach() at bus_generic_attach+0x1a
> nexus_acpi_attach() at nexus_acpi_attach+0x69
> device_attach() at device_attach+0x72
> bus_generic_new_pass() at bus_generic_new_pass+0xd6
> bus_set_pass() at bus_set_pass+0x7a
> configure() at configure+0xa
> mi_startup() at mi_startup+0x77
> btext() at btext+0x2c
> KDB: enter: panic
> [ thread pid 0 tid 100000 ]
> Stopped at kdb_enter+0x3b: movq $0,0x993262(%rip)
> db>
>
>
> int
> tws_submit_command(struct tws_softc *sc, struct tws_request *req)
> {
>     u_int32_t regl, regh;
>     u_int64_t mfa=0;
>
>     /*
>      * mfa register read and write must be in order.
>      * Get the io_lock to protect against simultaneous
>      * passthru calls
>      */
>     mtx_lock(&sc->io_lock);
>
>     if ( sc->obfl_q_overrun ) {
>         tws_init_obfl_q(sc);
>     }
>
>
> With no debugging in the kernel, it boots up fine:
>
> pcib2: irq 17 at device 1.1 on pci0
> pci2: on pcib2
> LSI 3ware device driver for SAS/SATA storage controllers, version:
> 10.80.00.003
> tws0: port 0x4000-0x40ff mem
> 0xc2460000-0xc2463fff,0xc2400000-0xc243ffff irq 17 at device 0.0 on pci2
> tws0: Using legacy INTx
> tws0: Controller details: Model 9750-4i, 8 Phys, Firmware FH9X
> 5.12.00.007, BIOS BE9X 5.11.00.006
> em0: port 0x5040-0x505f mem
> 0xc2500000-0xc251ffff,0xc2570000-0xc2570fff irq 19 at device 25.0 on pci0
> em0: Using an MSI interrupt
> em0: Ethernet address: 00:1e:67:45:b6:29
> ehci0: mem 0xc2560000-0xc25603ff irq
> 22 at device 26.0 on pci0
> usbus0: EHCI version 1.0
> usbus0 on ehci0
>
>
> tws0@pci0:2:0:0: class=0x010400 card=0x000113c1 chip=0x101013c1
> rev=0x05 hdr=0x00
>     vendor = '3ware Inc'
>     device = '9750 SAS2/SATA-II RAID PCIe'
>     class = mass storage
>     subclass = RAID
>     bar [10] = type I/O Port, range 32, base 0x4000, size 256, enabled
>     bar [14] = type Memory, range 64, base 0xc2460000, size 16384, enabled
>     bar [1c] = type Memory, range 64, base 0xc2400000, size 262144,
> enabled
>     cap 01[50] = powerspec 3 supports D0 D1 D2 D3 current D0
>     cap 10[68] = PCI-Express 2 endpoint max data 128(4096) link x4(x8)
>     cap 03[d0] = VPD
>     cap 05[a8] = MSI supports 1 message, 64 bit
>     ecap 0001[100] = AER 1 1 fatal 0 non-fatal 0 corrected
>     ecap 0004[138] = unknown 1
>     PCI-e errors = Fatal Error Detected
>                    Unsupported Request Detected
>            Fatal = Unsupported Request
>
>
> Also, any reason NOT to set hw.tws.enable_msi=1 in /boot/loader.conf?
>
> ---Mike
>
> --
> -------------------
> Mike Tancsa, tel +1 519 651 3400
> Sentex Communications, mike@sentex.net
> Providing Internet services since 1994 www.sentex.net
> Cambridge, Ontario Canada http://www.tancsa.com/
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"