From: David Ehrmann
Date: Mon, 19 Apr 2010 23:57:13 -0700
To: freebsd-current@freebsd.org
Subject: Strange disk problem
Message-ID: <4BCD5049.8030408@gmail.com>

Initially, I noticed a problem where reading a file on this machine seemed to stop--something like a video would just stop playing. At first, I thought it was the machine, but a new motherboard, CPU, and RAM later, the problem persists. The network card uses a different chipset, too.

The files are on zfs, but scrubs are fine, and zpool status lists no errors of any kind.

Trying to reproduce the problem, I set up a script that reads a random 1M block every 60 seconds from the drives backing zfs. That's when I noticed something: one disk seems to be causing the problems. I logged the dd times, and some of them were huge--more than a minute.
The times on the other disk in the mirrored vdev were low.

I've only seen the problem when I have a VM's disk image hosted on the machine. That said, the network interface is configured at 100 Mbps (at most about 12.5 MB/s), so there's no reason for that to saturate the disk's throughput. Top reports that almost 20% of the CPU is going towards interrupts. I can read a file off the zfs pool at over 50 MB/s, so that shouldn't be a problem.

One thing I'm wondering is why the disk read doesn't time out quickly. At least that way zfs could try to use the other drive in the mirrored vdev.

Any ideas? One thing I should try is swapping the drives to see if the problem follows the disk or stays with the lowest /dev/adX device. I'm using geli, but the read problems happen with both /dev/adX and /dev/adX.eli, so I don't think that's it. I've seen the problem with Samba, NFS, and dd.

Thanks in advance.
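
A minimal sketch of the kind of probe described above: one random 1M read per interval, with each read timed and logged. It is written in Python rather than dd and is not the original script. The function names, the log format, and the requirement that the caller pass the device size in bytes are all assumptions made for illustration.

```python
import os
import random
import time

BLOCK_SIZE = 1024 * 1024  # 1 MiB, matching the 1M reads described in the post


def timed_random_read(path, size_bytes, block_size=BLOCK_SIZE):
    """Read one block at a random block-aligned offset.

    Returns (offset, seconds). size_bytes is supplied by the caller
    (an assumption of this sketch; it is not detected automatically).
    """
    blocks = size_bytes // block_size
    if blocks < 1:
        raise ValueError("device or file is smaller than one block")
    offset = random.randrange(blocks) * block_size
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.monotonic()
        os.pread(fd, block_size, offset)  # positional read, no seek needed
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return offset, elapsed


def probe(path, size_bytes, interval=60, count=None, log=print):
    """Log one timed random read every `interval` seconds.

    count=None runs until interrupted; a number stops after that many reads.
    """
    done = 0
    while count is None or done < count:
        offset, elapsed = timed_random_read(path, size_bytes)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log(f"{stamp} {path} offset={offset} {elapsed:.3f}s")
        done += 1
        if count is None or done < count:
            time.sleep(interval)
```

Running one `probe("/dev/adX", size_bytes)` per disk in the mirror (with adX replaced by the real device and size_bytes by its size) and comparing the logged times would show whether the long reads stay with one drive.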