Date: Mon, 27 Jun 2016 08:57:14 -0700
Subject: Re: Deadlock in zpool import with degraded pool
From: Freddie Cash
To: Holger Freyther
Cc: Karli Sjöberg, FreeBSD Filesystems

On Jun 27, 2016 8:21 AM, "Holger Freyther" wrote:
>
> > On 26 Jun 2016, at 19:28, Karli Sjöberg wrote:
> >
> > Hi,
> >
> > That's your problem right there; dedup! You need to throw more RAM into it until the destroy can complete. If the mobo is 'full', you need new/other hw to cram more RAM into or you can kiss your data goodbye. I've been in the exact same situation as you are now so I sympathize:(
>
> did you look at it further?
>
> * Why does it only start after I zfs destroyed something? The dedup hash/table/??? grows by that?

Because every reference to every deleted block needs to be updated (decremented) in the DDT (dedupe table), which means the DDT needs to be pulled into ARC first. It's the pathological case for RAM use with dedupe enabled. :(

> * Why a plain dead-lock and no panic?

It's stuck trying to free RAM for ARC to load the DDT.
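
To make the refcount bookkeeping above a bit more concrete, here is a toy sketch in Python. It is only an illustration of the idea (one table entry touched per freed block reference), not how ZFS actually implements the DDT; the class and function names are made up for this example.

```python
from collections import Counter


class ToyDedupTable:
    """Toy stand-in for a dedup table: block checksum -> reference count.

    Hypothetical model for illustration only; the real DDT is an on-disk
    structure that has to be cached in ARC to be updated efficiently.
    """

    def __init__(self):
        self.refcounts = Counter()
        self.lookups = 0  # each lookup stands in for touching one DDT entry

    def write_block(self, checksum):
        """Record one more reference to the block with this checksum."""
        self.lookups += 1
        self.refcounts[checksum] += 1

    def free_block(self, checksum):
        """Drop one reference; return True when the block itself can be freed."""
        self.lookups += 1
        if checksum not in self.refcounts:
            raise KeyError(f"no DDT entry for {checksum!r}")
        self.refcounts[checksum] -= 1
        if self.refcounts[checksum] == 0:
            del self.refcounts[checksum]
            return True
        return False


def destroy_dataset(ddt, block_checksums):
    """Free every block reference of a dataset.

    Returns how many blocks were actually released, i.e. had no other
    references left. Every reference costs one table lookup either way.
    """
    return sum(ddt.free_block(c) for c in block_checksums)
```

The point of the toy: destroying a dataset with N block references means N table updates, even when most of those blocks stay allocated because something else still references them.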
> * Is there an easy way to see how much RAM is needed? (In the end I can use Linux/KVM with RAM backed in a file/disk and just wait...)

There's a zdb command (-S or something like that) that will show the block distribution in the DDT, along with how many unique data blocks there are. You need approx 1 GB of ARC per TB of unique data, over and above any other RAM requirements for normal operation. And then double that for deleting snapshots. :(

> * Would you know if zpool import -o readonly avoids loading/building that big table? From common sense this block table would only be needed on write to map from checksum to block?

If you are in the "hang on import due to out-of-memory" situation, the only solution is to add more RAM (if possible) and just keep rebooting the server. Every import process will delete a little more data from the pool, update a little more of the DDT, and eventually the destroy process will complete, and the pool will be imported. The longest one for me took a little over a week of rebooting the server multiple times per day. :(

We've since moved away from using dedupe. It was a great feature to have when we could only afford 400 GB drives and could get 3-5x combined compress + dedupe ratios. Now that we can get 4-8 TB drives, it's not worth it.

Cheers,
Freddie
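
PS: to put rough numbers on the sizing rule of thumb above (about 1 GB of ARC per TB of unique data, doubled when deleting snapshots), a minimal Python sketch. The function name and the constant are just this example's assumptions; the result is a ballpark on top of normal RAM needs, not a measurement of any real pool.

```python
# Rule-of-thumb constant taken from the text above: approx 1 GB of ARC
# per TB of unique (deduplicated) data. This is an estimate, not a spec.
GB_ARC_PER_TB_UNIQUE = 1.0


def estimate_ddt_arc_gb(unique_data_tb, deleting_snapshots=False):
    """Return a rough extra-ARC estimate in GB for holding the DDT.

    unique_data_tb: amount of unique data in the pool, in TB.
    deleting_snapshots: if True, double the estimate, per the rule of thumb.
    """
    estimate = unique_data_tb * GB_ARC_PER_TB_UNIQUE
    if deleting_snapshots:
        estimate *= 2
    return estimate
```

So a pool with 10 TB of unique data would want roughly 10 GB of ARC for the DDT in normal use, and roughly 20 GB while a snapshot destroy is running, over and above everything else the box needs.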