From: Thomas Johnson
To: freebsd-fs@freebsd.org
Date: Fri, 22 May 2015 14:10:15 -0500
Subject: zpool on Dell MD3000 causes frequent hangs

Hello,

I am trying to track down an ongoing issue, and I am looking for suggestions on a possible cause or on how I might troubleshoot further.

The issue seems to be related to a Dell MD3000 storage array, which contains a zpool. The host attached to the array will occasionally hang, usually during periods of high disk activity (annoyingly, usually about 0300). When the system hangs, I can ping the host and switch between virtual consoles (but not interact with them). The system is otherwise unresponsive, with no errors reported on the console or in the logs. The only remedy I have found is to hard-reset the host.

I believe this issue is tied to the MD3000. I have tried swapping out SAS cables, HBAs, the controller on the MD3000, and the host itself. I have updated all the firmware I can find.

Before I upgraded the host OS to FreeBSD 10.1 (from 10.0) last month, I experienced hangs about once a month. Since the upgrade, I have seen several events per week.

In addition to the MD3000, I have a set of USB drives that are used in rotation as offsite backups for the zpool. I have seen a number of hang events during zfs send/receive transfers to the USB disk.

After the most recent hang, I removed two [consumer] SSDs from the pool that were being used as cache devices. It is too early to tell if this change had any impact.

Here is some of the pertinent output from the host. I can provide any other information that would be helpful.

root@leopard:/home/tom-> uname -a
FreeBSD leopard 10.1-RELEASE-p9 FreeBSD 10.1-RELEASE-p9 #0 r281232: Tue Apr 7 17:38:04 CDT 2015 root@cheshire-b:/pkg/base/obj_10.1-RELEASE-p9/pkg/base/src_10.1-RELEASE-p9/sys/GENERIC amd64

root@leopard:/home/tom-> zpool list
NAME          SIZE  ALLOC   FREE   FRAG  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
backup       5.31T  3.61T  1.70T    22%         -    68%  1.00x  ONLINE  -
jumpdrive_f  2.72T  2.04T   693G    30%         -    75%  1.00x  ONLINE  -

root@leopard:/home/tom-> zpool status backup
  pool: backup
 state: ONLINE
  scan: scrub repaired 0 in 13h15m with 0 errors on Wed May 13 16:17:29 2015
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE       0     0     0
          da0       ONLINE       0     0     0

errors: No known data errors

root@leopard:/home/tom-> zpool get all backup
NAME    PROPERTY                       VALUE                 SOURCE
backup  size                           5.31T                 -
backup  capacity                       68%                   -
backup  altroot                        -                     default
backup  health                         ONLINE                -
backup  guid                           12638712474922952450  default
backup  version                        -                     default
backup  bootfs                         -                     default
backup  delegation                     on                    default
backup  autoreplace                    off                   default
backup  cachefile                      -                     default
backup  failmode                       wait                  default
backup  listsnapshots                  off                   default
backup  autoexpand                     off                   default
backup  dedupditto                     0                     default
backup  dedupratio                     1.00x                 -
backup  free                           1.70T                 -
backup  allocated                      3.61T                 -
backup  readonly                       off                   -
backup  comment                        -                     default
backup  expandsize                     0                     -
backup  freeing                        0                     default
backup  fragmentation                  22%                   -
backup  leaked                         0                     default
backup  feature@async_destroy          enabled               local
backup  feature@empty_bpobj            active                local
backup  feature@lz4_compress           active                local
backup  feature@multi_vdev_crash_dump  enabled               local
backup  feature@spacemap_histogram     active                local
backup  feature@enabled_txg            active                local
backup  feature@hole_birth             active                local
backup  feature@extensible_dataset     enabled               local
backup  feature@embedded_data          active                local
backup  feature@bookmarks              enabled               local
backup  feature@filesystem_limits      enabled               local