From: Tim Gustafson <tjg@ucsc.edu>
Date: Thu, 8 Jun 2017 14:13:01 -0700
Subject: ZFS Commands In "D" State
To: freebsd-fs@freebsd.org

We have a ZFS server that we've been running for a few months now. The server is a backup server that receives ZFS sends from its primary daily. This mechanism has worked for us across several pairs of servers for years, and on this particular piece of hardware for several months. (A sketch of the nightly transfer is below.)

A few days ago, our nightly ZFS send failed. When I looked at the server, I saw that the "zfs receive" command was in a "D" wait state:

  1425  -  D  0:02.75  /sbin/zfs receive -v -F backup/export

I rebooted the system, checked that "zpool status" and "zfs list" both came back correctly (which they did), and then re-started the "zfs send" on the master server. At first, the "zfs receive" command did not enter the "D" state, but once the master server started sending actual data (which I could see because I was running "zfs send" with the -v option), the receiving process entered the "D" state again, and another reboot was required. Only about 2MB of data got sent before this happened. I've rebooted several times, always with the same result.
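For reference, the nightly job is essentially the following pipe. The primary's pool name, the snapshot names, and the host name here are placeholders, not our exact ones:

  # on the primary: send the day's stream to the backup box
  # (tank/export, the snapshot names, and "backup-host" are made up;
  #  the receive side matches the ps output above)
  zfs send -v -i tank/export@yesterday tank/export@today | \
      ssh backup-host /sbin/zfs receive -v -F backup/export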
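If it would help, the next time the receive wedges I can capture where it is stuck in the kernel before rebooting, along these lines (1425 being the PID from the ps output above):

  # show the wait channel the stuck process is sleeping on
  ps -l -p 1425
  # dump its kernel stack trace(s)
  procstat -kk 1425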
I did a "zpool scrub os" (there's a separate zpool for the OS to live on) and that completed in a few minutes, but when I did a "zpool scrub backup", that process immediately went into the "D+" state:

  895  0  D+  0:00.04  zpool scrub backup

We run smartd on this machine, and it is showing no disk errors. The devd process is logging some stuff, but it doesn't appear to be very helpful:

Jun 8 13:52:49 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=11754027336427262018
Jun 8 13:52:49 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=11367786800631979308
Jun 8 13:52:49 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=18407069648425063426
Jun 8 13:52:49 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=9496839124651172990
Jun 8 13:52:49 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=332784898986906736
Jun 8 13:52:50 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=16384086680948393578
Jun 8 13:52:50 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=10762348983543761591
Jun 8 13:52:50 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=8585274278710252761
Jun 8 13:52:50 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=17456777842286400332
Jun 8 13:52:50 backup ZFS: vdev state changed, pool_guid=2176924632732322522 vdev_guid=10533897485373019500

No word on which state each vdev changed "from" or "to". Also, the system only has three top-level vdevs (the OS one, and two raidz2 vdevs that make up the "backup" pool), so I'm not sure how it's coming up with more than three vdev GUIDs.

What's my next step in diagnosing this?

--
Tim Gustafson
BSOE Computing Director
tjg@ucsc.edu
831-459-5354
Baskin Engineering, Room 313A

To request BSOE IT support, please visit https://support.soe.ucsc.edu/ or send e-mail to help@soe.ucsc.edu.
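P.S. If it would help, I can try to map those vdev_guid values back to actual vdevs and disks; as I understand it, something like this should work (the device name is just an example, not one of ours):

  # dump the cached config for the pool, which lists each vdev's guid
  zdb -C backup
  # or read the on-disk label (including guid) from one member disk
  zdb -l /dev/da0p3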