From: Alexander Motin
Date: Wed, 12 Jan 2011 15:19:01 +0200
Message-ID: <4D2DAA45.30602@FreeBSD.org>
To: Joachim Tingvold
Cc: freebsd-scsi@freebsd.org
Subject: Re: mps0-troubles

Joachim Tingvold wrote:
> I'm not sure if this is the proper place to ask for help regarding this,
> but here it goes;
>
> I've got 17 disks connected to a HP SAS expander, which again is
> connected to a LSI SAS 9211-8i HBA. I also have 1 system-disk that's
> connected directly to the SATA-controller on the motherboard. This is
> running on FreeBSD 9.0-CURRENT-201012.
>
> I'm running ZFS on root (referred to as "zroot"), and also on the 17
> disks connected to the LSI-controller (6x2TB raid-z2 + 10x1TB raid-z + 1
> hot-spare, referred to as "storage").
>
> This setup has been running fine since around christmas, but today, when
> I was moving some files from the zroot to storage, it failed. First, the
> moving went just fine (I was looking at gstat while it was copying), but
> then no activity (even though I knew it wasn't done -- there was a lot
> of large files). Trying to list any files on the storage-volume didn't
> work (CTRL+C didn't work either, I had to quit the terminal). The
> mv-process was still running, even though there was no disk-activity;
>
> [jocke@filserver ~]$ ps aux | grep mv
> root 33698 0,0 0,1 10048 2132 0- D+ 11:35am
> 0:01,66 mv -PRp -- JAG /storage/series/JAG (cp)
>
> I've extracted the relevant lines from dmesg since the machine booted on
> sunday; .
>
> After a while (couple of minutes), I could list files on the
> storage-volume, and ZFS reported no problems. Then, after a few new
> minutes, I could not list anything on the storage-volume, and it's been
> like that since (ZFS and dmesg reports no further errors, though).
>
> I mentioned that the mv-process is still running; it won't die, but I
> guess that's because it has the D-flag (disk wait).
>
> [root@filserver ~]# kill -9 33698
> [root@filserver ~]# ps aux | grep mv
> root 33698 0,0 0,1 10048 2132 0- D+ 11:35am 0:01,66 mv -PRp -- JAG
> /storage/series/JAG (cp)
>
> This isn't really my field of expertise, so I'm hoping that someone here
> on the list might enlighten me. (-:

The dmesg you've shown contains many command timeouts on multiple
devices. Since the default ATA timeout is about 30 seconds, each timeout
can cause a significant delay before the recovery sequence handles it.
That could explain the delays you observed.

What's more suspicious is that the timeouts happened at the same time on
the AHCI-attached disk and on several disks on the mps controller. I can
hardly believe that two completely different controllers and drivers hit
unrelated problems simultaneously. I would suggest checking your power
supplies, cables, backplanes and other mechanical things.

-- 
Alexander Motin