From: Ultima
Date: Thu, 4 Aug 2016 01:22:51 -0400
Subject: Possible zpool online, resilvering issue
To: freebsd-current@freebsd.org

Hello,

I recently had an issue with a PSU and ran several scrubs on a pool of around 35T. Random drives would drop and require a zpool online, which turned up checksum errors (as expected). However, after all the scrubs I ran, I think I may have found a bug in the zpool online resilvering process.

The pool has 24 disks total, in 4 raidz2 vdevs (6 drives each).

Some background before the next part: I had a backup PSU, but it was also going bad and was waiting on an RMA. The current one seemed to be dying but ran fine with fewer drives, so I decided to run the server 4 drives short.

I started by offlining 4 drives from different vdevs (or they were already removed from the PSU), then ran a scrub to verify everything. Many checksum errors were present on some of the drives, but this was expected due to the faulty PSU. I then offlined 4 different drives, onlined the other 4, and scrubbed once again. After the resilver there were again many checksum errors on these drives, as expected.

After that scrub completed, I decided to offline 4 different drives and then online the ones that had been out of the pool for a while. During the resilver, checksum errors were once again found.
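
For anyone trying to reproduce this, one round of the rotation described above could be sketched roughly as follows. This is only an illustration: the pool name "tank" and the device names are placeholders, not the ones from this system, and the helper rotation_commands is made up for this sketch. It only builds the zpool offline / zpool online / zpool scrub command lines and does not run them.

```python
from typing import List, Sequence


def rotation_commands(pool: str,
                      to_offline: Sequence[str],
                      to_online: Sequence[str]) -> List[List[str]]:
    """Build the zpool commands for one rotation round, in order.

    Hypothetical helper: it returns the command lines instead of
    executing them, so nothing here touches a real pool.
    """
    cmds: List[List[str]] = []
    # Take the next set of drives out of the pool.
    for dev in to_offline:
        cmds.append(["zpool", "offline", pool, dev])
    # Bring back the drives that were out; each online starts a resilver.
    for dev in to_online:
        cmds.append(["zpool", "online", pool, dev])
    # The scrub should only be issued after the resilver has finished;
    # this sketch does not wait for that.
    cmds.append(["zpool", "scrub", pool])
    return cmds


if __name__ == "__main__":
    # Placeholder pool and device names, one drive per vdev.
    for cmd in rotation_commands("tank", ["da4", "da10"], ["da0", "da6"]):
        print(" ".join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try on any machine; in practice each step was done by hand, waiting for the resilver to complete before scrubbing.
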
I was surprised, given the recent scrub, so I ran another scrub, and it found even more checksum errors on these recently onlined drives. I didn't think much of it. However, after the replacement PSU arrived, I onlined all the drives that were out of the pool, and again the resilver found checksum errors, and another scrub afterwards found still more.

Is this issue known? Is it common for a scrub to be required after onlining a disk that was out of the pool for some time? The drives are ST4000NM0033, and until recently they have never had a single checksum error in their lifetime (at least with ZFS).

FreeBSD S1 12.0-CURRENT FreeBSD 12.0-CURRENT #19 r303224: Sat Jul 23 10:41:12 EDT 2016 root@S1:/usr/src/head/obj/usr/src/head/src/sys/MYKERNEL-NODEBUG amd64

Sorry for the wall of text, but I hope this helps in tracking down this possible bug.

Ultima
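
To compare checksum errors between a resilver and the following scrub, the CKSUM column of zpool status could be collected after each step with something like the sketch below. The sample text is invented for illustration (pool "tank", devices da0 to da2); it only follows the usual NAME/STATE/READ/WRITE/CKSUM layout and is not output from this system. The function name nonzero_cksum is likewise an assumption of this sketch.

```python
from typing import Dict

# Made-up sample in the usual zpool status layout, NOT real output.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            da0     ONLINE       0     0    12
            da1     OFFLINE      0     0     0
            da2     ONLINE       0     0     3

errors: No known data errors
"""


def nonzero_cksum(status_text: str) -> Dict[str, str]:
    """Return {name: cksum} for config rows whose CKSUM field is not "0".

    Counts are kept as strings, since large values may be shown in an
    abbreviated form rather than as plain integers.
    """
    result: Dict[str, str] = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not in_config:
            # The config table starts after the NAME ... CKSUM header row.
            if fields[:1] == ["NAME"] and fields[-1:] == ["CKSUM"]:
                in_config = True
            continue
        if not fields:
            break  # a blank line ends the config table
        # Columns are NAME STATE READ WRITE CKSUM, optionally more text.
        if len(fields) >= 5 and fields[4] != "0":
            result[fields[0]] = fields[4]
    return result


if __name__ == "__main__":
    print(nonzero_cksum(SAMPLE))
```

Saving the result after the resilver and again after the scrub would make it easy to see whether the scrub really reports errors on the same recently onlined drives.
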