From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 223085] ZFS Resilver not completing - stuck at 99%
Date: Wed, 18 Oct 2017 10:48:07 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=223085

            Bug ID: 223085
           Summary: ZFS Resilver not completing - stuck at 99%
           Product: Base System
           Version: 10.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: paul@vsl-net.com

I have a number of FreeBSD systems with large (30TB) ZFS pools. I have had
several disks fail over time and have seen problems with resilvers either not
completing at all or getting to 99% within a week but then taking a further
month to complete. I have been seeking advice in the forums:

https://forums.freebsd.org/threads/61643/#post-355088

A system that had a disk replaced some time ago is in this state:

  pool: s11d34
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Sep 14 15:08:15 2017
        49.4T scanned out of 49.8T at 17.7M/s, 6h13m to go
        4.93T resilvered, 99.24% done
config:

        NAME                             STATE     READ WRITE CKSUM
        s11d34                           DEGRADED     0     0     0
          raidz2-0                       ONLINE       0     0     0
            multipath/J11F18-1EJB8KUJ    ONLINE       0     0     0
            multipath/J11R01-1EJ2XT4F    ONLINE       0     0     0
            multipath/J11R02-1EHZE2GF    ONLINE       0     0     0
            multipath/J11R03-1EJ2XTMF    ONLINE       0     0     0
            multipath/J11R04-1EJ3NK4J    ONLINE       0     0     0
          raidz2-1                       DEGRADED     0     0     0
            multipath/J11R05-1EJ2Z8AF    ONLINE       0     0     0
            multipath/J11R06-1EJ2Z8NF    ONLINE       0     0     0
            replacing-2                  OFFLINE      0     0     0
              7444569586532474759        OFFLINE      0     0     0  was /dev/multipath/J11R07-1EJ03GXJ
              multipath/J11F23-1EJ3AJBJ  ONLINE       0     0     0  (resilvering)
            multipath/J11R08-1EJ3A0HJ    ONLINE       0     0     0
            multipath/J11R09-1EJ32UPJ    ONLINE       0     0     0

It got to 99.24% within a week but has been stuck there since.

I have stopped ALL access to the pool and run zpool iostat, and there is still
activity (although low, e.g. 1.2M read, 1.78M write, etc.), so it does appear
to be doing something.

The disks (6TB or 8TB HGST SAS) are attached via an LSI 9207-8e HBA, which is
connected to an LSI 6160 SAS switch, which in turn is connected to a Supermicro
JBOD. Each HBA has 2 connectors, and each connector is attached to a different
SAS switch. The system therefore sees each disk twice, as expected. I use
gmultipath to label the disks and set them to Active/Passive mode, and I then
use the multipath name during zpool create, e.g.:

root@freebsd04:~ # gmultipath status
                     Name    Status  Components
multipath/J11R00-1EJ2XR5F   OPTIMAL  da0 (ACTIVE)
                                     da11 (PASSIVE)
multipath/J11R01-1EJ2XT4F   OPTIMAL  da1 (ACTIVE)
                                     da12 (PASSIVE)
multipath/J11R02-1EHZE2GF   OPTIMAL  da2 (ACTIVE)
                                     da13 (PASSIVE)

zpool create -f store43 raidz2 multipath/J11R00-1EJ2XR5F
multipath/J11R01-1EJ2XT4F etc.......

Any advice on whether this is a bug or something wrong with my setup?

Thanks

Paul
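
For reference, the sequence described above looks roughly like the following.
The label and device names are just examples following the store43 output
above, not the exact commands run on s11d34, and I believe Active/Passive is
the gmultipath default, so the explicit configure step may be redundant:

gmultipath label J11R00-1EJ2XR5F da0    (the second path, e.g. da11, is picked up automatically)
gmultipath label J11R01-1EJ2XT4F da1
gmultipath configure -P J11R00-1EJ2XR5F    (explicitly set Active/Passive)
zpool create -f store43 raidz2 multipath/J11R00-1EJ2XR5F multipath/J11R01-1EJ2XT4F ...

and to watch the resilver while the pool is otherwise idle:

zpool status -v s11d34
zpool iostat -v s11d34 5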