From owner-freebsd-stable@freebsd.org Tue Mar 22 20:37:54 2016
Date: Tue, 22 Mar 2016 13:37:52 -0700
From: Doug Ambrisko
To: Garrett Wollman
Cc: freebsd-stable@freebsd.org
Subject: Re: Hangs with mrsas?
Message-ID: <20160322203752.GA73172@ambrisko.com>
References: <22237.53738.967189.432979@khavrinen.csail.mit.edu>
 <20160322184238.GA58487@ambrisko.com>
 <22257.42636.358484.165317@khavrinen.csail.mit.edu>
In-Reply-To: <22257.42636.358484.165317@khavrinen.csail.mit.edu>

On Tue, Mar 22, 2016 at 04:09:48PM -0400, Garrett Wollman wrote:
| < said:
|
| > You could try:
| > 	https://people.freebsd.org/~ambrisko/mrsas.patch
|
| I take it that the important part of this patch is changing the DMA
| tag and scatter/gather setup to allow 64-bit addresses?  (Why would
| the original driver have been limited to 32-bit addresses?  It's quite
| new hardware!)

Yes, primarily.  There are some other changes, such as letting the OS
set things up, especially in the ioctl path, since user-land probably
won't set up a proper SG list for the kernel.

The DMA address space for the card was limited to 256K in 32-bit
address space, so it didn't take much to fragment it, and allocations
could then fail or have to wait for memory.  On initial boot things
worked "okay", but after some run time with our appliance (we run
64-bit) memory allocations would have issues.

We found this was made worse with RAID cards that didn't have cache.
I assume that without cache, I/O operations take longer and so tie up
memory longer.  With the same SW running on cards with cache we didn't
see these issues, so I assume the operations completed fast enough not
to hold onto memory for very long.

With these changes our appliances without RAID cache run faster and no
longer run into "strange" issues.  We run in RAID 10 mode.
It also adds RAID card event messages to dmesg.  On the plus side, this
code exposed a VM bug in 9.2 for us!

There is still a bug: on a card without cache, if I send lots of
management commands quickly to reconfigure the RAID, the driver reports
that the firmware had an OCR issue and never recovers.  If I put a
"sleep 1" after each command then it is okay.  I need to try this again
and dump the term log to see if the firmware will give me a clue.

With the cards that we are currently using, the RAID cache is an option,
so the only thing I'm changing is the HW and not the firmware.  However,
the firmware seems to flip itself into a different device when I add or
remove cache.

Thanks,

Doug A.
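
For reference, the "sleep 1 after each command" workaround described
above could be wrapped in a small shell function along these lines.
This is only a sketch: run_paced is a hypothetical helper name, and the
echo commands in the usage line are placeholders, since the message does
not say which management tool or commands were used.

```shell
#!/bin/sh
# run_paced DELAY CMD...
# Hypothetical helper: runs each CMD string in turn via "sh -c" and
# sleeps DELAY seconds after each one, so that the next command is not
# sent immediately.  Stops at the first command that fails and returns
# that command's exit status.
run_paced() {
    delay=$1
    shift
    for cmd in "$@"; do
        sh -c "$cmd" || return $?
        sleep "$delay"
    done
}
```

Usage would be something like
run_paced 1 'echo "first management command"' 'echo "second management command"'
with the placeholders replaced by the real commands.  Whether one second
is the right delay for a given card is not something the message
establishes; it is simply the value that was reported to work.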