From owner-freebsd-hackers@FreeBSD.ORG Sat Mar 14 13:47:40 2015
Return-Path:
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by hub.freebsd.org (Postfix) with ESMTPS id 400C04D9
	for ; Sat, 14 Mar 2015 13:47:40 +0000 (UTC)
Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client did not present a certificate)
	by mx1.freebsd.org (Postfix) with ESMTPS id BB2CB2FC
	for ; Sat, 14 Mar 2015 13:47:39 +0000 (UTC)
Received: from tom.home (kostik@localhost [127.0.0.1])
	by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t2EDlXpY005381
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
	Sat, 14 Mar 2015 15:47:33 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t2EDlXpY005381
Received: (from kostik@localhost)
	by tom.home (8.14.9/8.14.9/Submit) id t2EDlXYp005380;
	Sat, 14 Mar 2015 15:47:33 +0200 (EET)
	(envelope-from kostikbel@gmail.com)
X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f
Date: Sat, 14 Mar 2015 15:47:33 +0200
From: Konstantin Belousov
To: Michael Fuckner
Subject: Re: Server with 3TB Crashing at boot
Message-ID: <20150314134733.GI2379@kib.kiev.ua>
References: <5501DD57.7030305@freebsd.org> <5503234A.4060103@fuckner.net>
	<55032C1B.5030004@multiplay.co.uk>
	<275339388.395931.1426279324879.JavaMail.open-xchange@ptangptang.store>
	<5503DC66.40409@fuckner.net> <55041EF5.9080200@multiplay.co.uk>
	<5504362C.9000904@fuckner.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5504362C.9000904@fuckner.net>
User-Agent: Mutt/1.5.23 (2014-03-12)
X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00,
	DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no
	autolearn_force=no version=3.4.0
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on tom.home
Cc: Ryan Stone , Steven Hartland , "freebsd-hackers@freebsd.org"
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 14 Mar 2015 13:47:40 -0000

On Sat, Mar 14, 2015 at 02:22:52PM +0100, Michael Fuckner wrote:
> On 3/14/2015 12:43 PM, Steven Hartland wrote:
> >
> >
> > On 14/03/2015 06:59, Michael Fuckner wrote:
> >> On 3/13/2015 10:17 PM, Ryan Stone wrote:
> >>> On Fri, Mar 13, 2015 at 4:42 PM, Michael Fuckner
> >>> wrote:
> >>>
> >>>> Now I can kldload zfs without exploding kernel. I'll do some more
> >>>> tests
> >>>> tomorrow, but this looks fine!
> >>>>
> >>>
> >>> Excellent news! I'd be interested to know whether this fixes the panics
> >>> that you saw when zfs.ko was loaded by the bootloader. It's definitely
> >>> possible, as the symptoms of this bug are likely to be random memory
> >>> corruption after zfs initializes, but your crash happened pretty
> >>> early on
> >>> and I'm not sure whether zfs would have had a chance to do anything that
> >>> early.
> >>>
> >>> Thanks for all of the work that you did to debug this.
> >>
> >>
> >> Currently there is another issue that prevents me from testing ZFS:
> >> only one HBA gets initialized.
> >>
> >> mpr0: 9300-8i with 8x Intel S3700
> >> mpr1: 9300-4i4e with 4x Intel S3700 and an external JBOD with 24HDD.
> >>
> >> mpr0 initializes fine, mpr1 fails
> >>
> >> root@s4l:~ # dmesg |grep mpr
> >> mpr1: port 0xf000-0xf0ff mem 0xfb100000-0xfb10ffff irq
> >> 112 at device 0.0 on pci195
> >> mpr1: IOCFacts :
> >> mpr1: Firmware: 07.00.01.00, Driver: 05.255.05.00-fbsd
> >> mpr1: IOCCapabilities:
> >> 7a85c
> >>
> >> mpr1: Cannot allocate queues memory
> >> mpr1: mpr_iocfacts_allocate failed to alloc queues with error 12
> >> mpr1: mpr_attach IOC Facts based allocation failed with error 12
> >> device_attach: mpr1 attach returned 12
>
> did your patch for queue size and sense size.
>
> I just saw:
>
> http://dedi3.fuckner.net/~molli123/temp/mpr2.cap (just the second half
> of the boot, but nothing changed but the two debug echos)
>
>
> root@s4l:~ # dmesg |grep mpr
> mpr0: port 0x2000-0x20ff mem 0xaba00000-0xaba0ffff irq 42
> at device 0.0 on pci4
> mpr0: IOCFacts :
> mpr0: Firmware: 07.00.01.00, Driver: 05.255.05.00-fbsd
> mpr0: IOCCapabilities:
> 7a85c
> mpr0: attempting to allocate 1 MSI-X vectors (96 supported)
> mpr0: using IRQ 300 for MSI-X
> mpr1: port 0xf000-0xf0ff mem 0xfb100000-0xfb10ffff irq 112
> at device 0.0 on pci195
> mpr1: IOCFacts :
> mpr1: Firmware: 07.00.01.00, Driver: 05.255.05.00-fbsd
> mpr1: IOCCapabilities:
> 7a85c
> mpr1: Cannot allocate sense size 258048 memory
> mpr1: mpr_iocfacts_allocate failed to alloc queues with error 12
> mpr1: mpr_attach IOC Facts based allocation failed with error 12
> device_attach: mpr1 attach returned 12
> (probe0:mpr0:0:0:0): Down reving Protocol Version from 4 to 0?
> da0 at mpr0 bus 0 scbus0 target 0 lun 0
> pass0 at mpr0 bus 0 scbus0 target 0 lun 0
>
> In Bios I have VT-d enabled. In the VT-d Menu there are two other options:
> ATS (Non-Iscoh VT-D Engine ATS Support), default: enabled
ATS is not used by FreeBSD right now, and your device probably does not
support it as well.
> Coherency Support(Non Iscoh VT_D Engine Coherency Support), default:
> Disabled
There is a bug in 10.1 which incorrectly invalidates IOMMU pages cache.
Enabling the coherency works around the bug.

>
> In the Processor Configuration tab there also was
> extended apic support (default is disabled)
On 10.1 this option should not result in any behaviour change.

>
> Should I give this option a try?
Up to you. I am somewhat curious whether it boots with DMAR enabled and
what happens if it does not.