Subject: Re: zio_done panic in 10.3
To: Shiva Bhanujan, "cem@freebsd.org"
Cc: "freebsd-fs@freebsd.org"
From: Andriy Gapon
Message-ID: <41e2465d-e1b5-33ce-57b5-49bea6087d9a@FreeBSD.org>
Date: Tue, 21 Nov 2017 22:46:22 +0200

On 21/11/2017 21:30, Shiva Bhanujan wrote:
> it did get compressed to 0.5G - still too big to send via email. I did send
> some more debug information by running kgdb on the core file to Andriy, and
> I'm waiting for any analysis that he might provide.

Yes, kgdb-over-email turned out to be a far more efficient compression :-)

I already have an analysis based on the information provided by Shiva and by another user who has the same problem and contacted me privately.
I am discussing possible ways to fix the problem with George Wilson who was very kind to double-check the analysis, complete it and suggest possible fixes.

A short version is that dbuf_prefetch and dbuf_prefetch_indirect_done functions chain new zio-s under the same parent zio (a completion of one child zio may create another child zio).
They do it using arc_read, which in most cases creates a logical zio but can create a vdev zio for a read from a cache device (L2ARC). zio_done() has a check for the completion of a parent zio's children, but that check is not completely safe and can be broken by the pattern that dbuf_prefetch can create. So, under some specific circumstances the parent zio may complete and get destroyed while there is still a child zio.

I believe this problem to be rather rare, but there could be configurations and workloads where it's triggered more often.
The problem does not happen if there are no cache devices.

> From: Conrad Meyer [cem@freebsd.org]
> Sent: Tuesday, November 21, 2017 9:04 AM
> To: Shiva Bhanujan
> Cc: Andriy Gapon; freebsd-fs@freebsd.org
> Subject: Re: zio_done panic in 10.3
>
> Have you tried compressing it with e.g. xz or zstd?

-- 
Andriy Gapon
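
The pattern described in the message can be illustrated with a toy model. This is only a sketch and not ZFS code: ToyParent, unsafe_done, safe_done and the between_checks hook are all hypothetical names, and the sketch assumes, purely for illustration, that the unsafe check looks at one kind of child at a time. The message itself does not say how the real check in zio_done() is structured. The model shows how a completing logical child that chains a new vdev child (as a read from a cache device might) can slip past such a check.

```python
from dataclasses import dataclass, field

# Hypothetical child kinds, checked in this fixed order by the unsafe variant.
CHILD_KINDS = ("vdev", "logical")


@dataclass
class ToyParent:
    """Toy stand-in for a parent zio: it only counts outstanding children."""
    pending: dict = field(default_factory=lambda: {k: 0 for k in CHILD_KINDS})
    destroyed: bool = False

    def add_child(self, kind):
        self.pending[kind] += 1

    def child_done(self, kind, spawn=None):
        # A completing child may first chain a new child under the same parent.
        if spawn is not None:
            self.add_child(spawn)
        self.pending[kind] -= 1


def unsafe_done(parent, between_checks=None):
    """Check one kind at a time (an assumption of this sketch).

    The between_checks hook models another thread running after the
    first kind has been checked and before the second one is.
    """
    for i, kind in enumerate(CHILD_KINDS):
        if parent.pending[kind] > 0:
            return False  # the parent has to wait
        if i == 0 and between_checks is not None:
            between_checks()
    parent.destroyed = True
    return True


def safe_done(parent):
    """Check all kinds in a single step, so there is no window in between."""
    if any(parent.pending.values()):
        return False
    parent.destroyed = True
    return True


if __name__ == "__main__":
    p = ToyParent()
    p.add_child("logical")
    # The logical child completes between the two checks and chains a vdev child.
    unsafe_done(p, lambda: p.child_done("logical", spawn="vdev"))
    print("destroyed:", p.destroyed, "pending:", p.pending)
```

In this toy run the parent ends up destroyed while one vdev child is still pending, which is the situation the message describes. With safe_done the same sequence of events leaves the parent waiting, because the newly chained child is seen whenever the single check runs.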