From owner-freebsd-current@FreeBSD.ORG Sat May 3 18:04:42 2014
From: "Steven Hartland"
To: "FreeBSD-Current" <freebsd-current@freebsd.org>
Subject: Re: Fatal double fault in ZFS with yesterday's CURRENT
Date: Sat, 3 May 2014 19:04:40 +0100
Message-ID: <229058B87F604A469C70F634AA1C793D@multiplay.co.uk>
References: <20140503102923.6fadd904@fabiankeil.de> <20140503191424.16f9744b@fabiankeil.de>

> "Steven Hartland" wrote:
>
> > From: "Fabian Keil"
> >
> > > After updating my laptop to yesterday's CURRENT (r265216),
> > > I got the following fatal double fault on boot:
> > > http://www.fabiankeil.de/bilder/freebsd/kernel-panic-r265216/
> > >
> > > My previous kernel was based on r264721.
> > >
> > > I'm using a couple of custom patches, some of them are ZFS-related
> > > and thus may be part of the problem (but worked fine for months).
> > > I'll try to reproduce the panic without the patches tomorrow.
> >
> > You're seeing a stack overflow in the new ZFS queuing code, which I
> > believe is being triggered by lack of support for TRIM in one of
> > your devices, something Xin reported to me yesterday.
> >
> > I committed a fix for failing TRIM requests processing slowly last
> > night, so you could try updating to after r265253 and see if that
> > helps.
>
> Thanks. The hard disk is indeed unlikely to support TRIM requests,
> but I can still reproduce the problem with a kernel based on r265255.

Thanks for testing. I suspect it's still a numbers game with how many
items are outstanding in the queue, and now that free / TRIM requests
are also queued, that's triggering the failure.

If you're just on an HDD, try setting the following in /boot/loader.conf
as a temporary workaround:

vfs.zfs.trim.enabled=0
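That's a boot-time tunable, so it only takes effect after a reboot. If
I remember rightly the value is also exposed read-only via sysctl, so
you should be able to confirm the workaround stuck with something like:

    # should report 0 once the loader.conf workaround is active
    # (name from memory - check "sysctl -a | grep trim" if it differs)
    sysctl vfs.zfs.trim.enabled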
> > I still need to investigate the stack overflow more directly, which
> > appears to be caused by the new ZFS queuing code when things are
> > running slowly and there's a large backlog of IOs.
> >
> > I would be interested to know your config there, so zpool layout and
> > hardware, in the meantime.
>
> The system is a Lenovo ThinkPad R500:
> http://www.nycbug.org/index.cgi?action=dmesgd&do=view&dmesgid=2449
>
> I'm booting from UFS; the panic occurs while the pool is being imported.
>
> The pool is located on a single geli-encrypted slice:
>
> fk@r500 ~ $ zpool status tank
>   pool: tank
>  state: ONLINE
>   scan: scrub repaired 0 in 4h11m with 0 errors on Sat Mar 22 18:25:01 2014
> config:
>
>         NAME           STATE     READ WRITE CKSUM
>         tank           ONLINE       0     0     0
>           ada0s1d.eli  ONLINE       0     0     0
>
> errors: No known data errors
>
> Maybe geli fails TRIM requests differently.

That helps. Xin also reported the issue with geli, and that's what I'm
testing with. I believe this is a factor because it significantly slows
things down, again meaning more items in the queues, but I've only
managed to trigger it once here as the machine I'm using is pretty
quick.

I'll continue looking at this ASAP.

    Regards
    Steve
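PS: if you want to see whether the geli provider is actually rejecting
the TRIMs that ZFS issues, the TRIM kstats should tell you. I'm writing
this from memory, so double-check the exact names on your box first
with "sysctl -a | grep zio_trim":

    # TRIM counters kept by ZFS (bytes issued, successes, failures,
    # and requests treated as unsupported by the underlying provider)
    sysctl kstat.zfs.misc.zio_trim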