From: Wiktor Niesiobedzki
Date: Mon, 4 Sep 2017 19:12:50 +0200
Subject: Re: Resolving errors with ZVOL-s
To: freebsd-fs

Hi,

I can follow up on my issue - the same problem just happened on the second ZVOL that I created:

# zpool status -v
  pool: tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 5h27m with 0 errors on Sat Sep  2 15:30:59 2017
config:

        NAME               STATE     READ WRITE CKSUM
        tank               ONLINE       0     0    14
          mirror-0         ONLINE       0     0    28
            gpt/tank1.eli  ONLINE       0     0    28
            gpt/tank2.eli  ONLINE       0     0    28

errors: Permanent errors have been detected in the following files:

        tank/docker-big:<0x1>
        <0x5095>:<0x1>

I suspect that these errors might be related to my recent upgrade to 11.1. Until 19 August I was running 11.0. I am considering rolling back to 11.0 right now.

For reference:

# zfs get all tank/docker-big
NAME             PROPERTY               VALUE                  SOURCE
tank/docker-big  type                   volume                 -
tank/docker-big  creation               Sat Sep  2 10:09 2017  -
tank/docker-big  used                   100G                   -
tank/docker-big  available              747G                   -
tank/docker-big  referenced             10.5G                  -
tank/docker-big  compressratio          4.58x                  -
tank/docker-big  reservation            none                   default
tank/docker-big  volsize                100G                   local
tank/docker-big  volblocksize           128K                   -
tank/docker-big  checksum               skein                  inherited from tank
tank/docker-big  compression            lz4                    inherited from tank
tank/docker-big  readonly               off                    default
tank/docker-big  copies                 1                      default
tank/docker-big  refreservation         100G                   local
tank/docker-big  primarycache           all                    default
tank/docker-big  secondarycache         all                    default
tank/docker-big  usedbysnapshots        0                      -
tank/docker-big  usedbydataset          10.5G                  -
tank/docker-big  usedbychildren         0                      -
tank/docker-big  usedbyrefreservation   89.7G                  -
tank/docker-big  logbias                latency                default
tank/docker-big  dedup                  off                    default
tank/docker-big  mlslabel                                      -
tank/docker-big  sync                   standard               default
tank/docker-big  refcompressratio       4.58x                  -
tank/docker-big  written                10.5G                  -
tank/docker-big  logicalused            47.8G                  -
tank/docker-big  logicalreferenced      47.8G                  -
tank/docker-big  volmode                dev                    local
tank/docker-big  snapshot_limit         none                   default
tank/docker-big  snapshot_count         none                   default
tank/docker-big  redundant_metadata     all                    default
tank/docker-big  com.sun:auto-snapshot  false                  local

Any ideas on what I should try before rolling back?
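What I will probably try first, before rolling back, is sketched below. This is only my understanding of how the persistent error log works (it seems to keep entries from both the last completed scrub and the one in progress, so an entry for a destroyed object may need up to two clean scrubs before it disappears), and it assumes I can recreate the contents of tank/docker-big afterwards.

First, translate the anonymous <0x5095> entry to decimal and double-check that it really is gone from zdb:

# printf '%d\n' 0x5095
20629
# zdb -d tank | grep 20629

Then drop the affected zvol, reset the error counters and scrub twice, so that both halves of the persistent error log get rotated out:

# zfs destroy tank/docker-big
# zpool clear tank
# zpool scrub tank
(wait for the scrub to finish, then repeat)
# zpool scrub tank
# zpool status -v tank

If the entries are gone after that, the damage was probably confined to those blocks. If new CKSUM errors keep showing up on freshly written zvols, I will treat it as either a hardware problem or a regression since the 11.1 upgrade, and roll back.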
Cheers,

Wiktor

2017-09-02 19:17 GMT+02:00 Wiktor Niesiobedzki:
> Hi,
>
> I have recently encountered errors on my ZFS pool on 11.1-R:
>
> $ uname -a
> FreeBSD kadlubek 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9
> 11:55:48 UTC 2017
> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>
> # zpool status -v tank
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://illumos.org/msg/ZFS-8000-8A
>   scan: scrub repaired 0 in 5h27m with 0 errors on Sat Sep  2 15:30:59 2017
> config:
>
>         NAME               STATE     READ WRITE CKSUM
>         tank               ONLINE       0     0    98
>           mirror-0         ONLINE       0     0   196
>             gpt/tank1.eli  ONLINE       0     0   196
>             gpt/tank2.eli  ONLINE       0     0   196
>
> errors: Permanent errors have been detected in the following files:
>
>         dkr-test:<0x1>
>
> dkr-test is a ZVOL that I use within bhyve, and indeed, within bhyve I have
> noticed I/O errors on this volume. This ZVOL did not have any snapshots.
>
> Following the advice mentioned in "action", I tried to remove the ZVOL:
>
> # zfs destroy tank/dkr-test
>
> But errors are still reported in zpool status:
>
> errors: Permanent errors have been detected in the following files:
>
>         <0x5095>:<0x1>
>
> I can't find any reference to this dataset in zdb:
>
> # zdb -d tank | grep 5095
> # zdb -d tank | grep 20629
>
> I also tried getting statistics about metadata in this pool:
>
> # zdb -b tank
>
> Traversing all blocks to verify nothing leaked ...
>
> loading space map for vdev 0 of 1, metaslab 159 of 174 ...
> No leaks (block sum matches space maps exactly)
>
>         bp count:              24426601
>         ganged count:                 0
>         bp logical:       1983127334912    avg:  81187
>         bp physical:      1817897247232    avg:  74422    compression: 1.09
>         bp allocated:     1820446928896    avg:  74527    compression: 1.09
>         bp deduped:                   0    ref>1:    0    deduplication: 1.00
>         SPA allocated:    1820446928896    used: 60.90%
>
>         additional, non-pointer bps of type 0:  57981
>         Dittoed blocks on same vdev: 296490
>
> After that zdb got stuck using 100% CPU.
>
> And now to my questions:
>
> 1. Do I interpret this correctly, that the situation is probably due to an
> error during a write, and both copies of the block ended up with checksums
> that do not match their data? And if it is a hardware problem, it is
> probably something other than the disks? (No, I do not use ECC RAM.)
>
> 2. Is there any way to remove the offending dataset and clean the pool of
> the errors?
>
> 3. Is my metadata OK? Or should I restore the entire pool from backup?
>
> 4. I also tried running zdb -bc tank, but this resulted in a kernel panic.
> I might try to get the stack trace once I get physical access to the
> machine next week. Also, checksum verification slows the process down from
> 1000 MB/s to less than 1 MB/s. Is this expected?
>
> 5. When I work with zdb (as above), should I try to limit writes to the
> pool (e.g. by unmounting the datasets)?
>
> Cheers,
>
> Wiktor Niesiobedzki
>
> PS. Sorry for previous partial message.
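PS. Regarding question 5 above: next time I will try to run zdb against a quiesced pool instead of the live one. This is just a rough sketch of what I have in mind, assuming I can afford to take tank offline for the duration (zdb -e opens an exported pool directly from the devices):

# zpool export tank
# zdb -e -bc tank
(when zdb is done, bring the pool back read-only for further inspection)
# zpool import -o readonly=on tank

If exporting is not practical, I will at least shut down the bhyve guests that use the zvols, so nothing writes to the pool while zdb is traversing it.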