From: Martin Matuska
Date: Mon, 17 Dec 2012 13:37:22 +0100
To: Andriy Gapon
Cc: freebsd-fs@FreeBSD.org, freebsd-stable@FreeBSD.org, olivier
Subject: Re: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE
Message-ID: <50CF1202.9070805@FreeBSD.org>
In-Reply-To: <50CA1639.1010409@FreeBSD.org>

On 13.12.2012 18:54, Andriy Gapon wrote:
> on 13/12/2012 19:46 olivier said the following:
>> Thanks. I'll be sure to follow your suggestions next time this happens.
>>
>> I have a naive question/suggestion though.
>> I see from browsing past discussions on
>> ZFS problems that it has been suggested a number of times that problems that
>> appear to originate in ZFS in fact come from lower layers; in particular because
>> of driver bugs or disks in the process of failing. It seems that it can take a lot
>> of time to troubleshoot such problems. I accept that ZFS behavior correctly leaves
>> dealing with timeouts to lower layers, but it seems to me that the ZFS layer would
>> be a great place to warn the user about issues and provide some information to
>> troubleshoot them.
>>
>> For example, if some I/O requests get lost because of a buggy driver, the driver
>> itself might not be the best place to identify those lost requests. But perhaps we
>> could have a compile time option in ZFS code that spits out a warning if it gets
>> stuck waiting for a particular request to come back for more than say 10 seconds,
>> and identifies the problematic disk? I'm sure there would be cases where these
>> warnings would be unwarranted, and I imagine that changes in the code to provide
>> such warnings would impact performance; so one certainly would not want that code
>> active by default. But someone in my position could certainly recompile the kernel
>> with a ZFS debugging option turned on to figure out the problem.
>>
>> I understand that ZFS code comes from upstream, and that you guys probably want to
>> keep FreeBSD-specific changes minimal. If that's a big problem, even just a patch
>> provided "as such" that does not make it into the FreeBSD code base might be
>> extremely useful. I wish I could help write something like that, but I know very
>> little about the kernel or ZFS. I would certainly be willing to help with testing.
>
> Google for "zfs deadman". This is already committed upstream and I think that it
> is imported into FreeBSD, but I am not sure... Maybe it's imported just into the
> vendor area and is not merged yet.
> So, when enabled, this logic would panic a system as a way of letting you know that
> something is wrong. You can read in the links why panic was selected for this job.
>
> And speaking FreeBSD-centric - I think that our CAM layer would be a perfect place
> to detect such issues in a non-ZFS-specific way.
>

I can try to merge the ZFS deadman stuff (r242732) to HEAD, but I guess this
will be something for a 1-month MFC period. Afterwards, a 9-STABLE patch can
be easily created.

https://www.illumos.org/issues/3246
https://hg.openindiana.org/upstream/illumos/illumos-gate/rev/921a99998bb4

Cheers,
mm

-- 
Martin Matuska
FreeBSD committer
http://blog.vx.sk
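
Editorial illustration (not part of the original message): the idea raised in this thread, flagging an I/O request that has been outstanding longer than some threshold (such as the 10 seconds suggested above) and naming the disk involved, can be sketched in user space as follows. This is not the illumos or FreeBSD deadman code and makes no claim about how that code works. The class `DeadmanTracker`, its methods, and the device names `da0`/`da1` are hypothetical, chosen only for the example. Unlike the deadman logic described above, which panics the system, this sketch only reports.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DeadmanTracker:
    """Hypothetical tracker for outstanding I/O requests (illustration only)."""

    threshold_s: float = 10.0  # the "say 10 seconds" suggested in the thread
    # request id -> (device name, submission time in seconds)
    _pending: Dict[int, Tuple[str, float]] = field(default_factory=dict)

    def submit(self, req_id: int, device: str, now: Optional[float] = None) -> None:
        """Record that a request was issued to a device."""
        t0 = time.monotonic() if now is None else now
        self._pending[req_id] = (device, t0)

    def complete(self, req_id: int) -> None:
        """Forget a request once the lower layer has returned it."""
        self._pending.pop(req_id, None)

    def overdue(self, now: Optional[float] = None) -> List[Tuple[int, str, float]]:
        """Return (request id, device, age) for requests older than the threshold."""
        t = time.monotonic() if now is None else now
        return sorted(
            (req_id, dev, t - t0)
            for req_id, (dev, t0) in self._pending.items()
            if t - t0 > self.threshold_s
        )


# Usage with explicit timestamps so the example is deterministic.
tracker = DeadmanTracker(threshold_s=10.0)
tracker.submit(1, "da0", now=0.0)
tracker.submit(2, "da1", now=5.0)
tracker.complete(1)  # request 1 came back; request 2 never does
for req_id, dev, age in tracker.overdue(now=20.0):
    print(f"WARNING: request {req_id} on {dev} outstanding for {age:.0f}s")
```

A periodic check like `overdue()` keeps the per-request cost to one dictionary insert and one removal, which is in line with the concern above that such diagnostics should not be expensive when enabled.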