Date: Wed, 12 Jan 2011 15:19:01 +0200
From: Alexander Motin <mav@FreeBSD.org>
To: Joachim Tingvold <joachim@tingvold.com>
Cc: freebsd-scsi@freebsd.org
Subject: Re: mps0-troubles
Message-ID: <4D2DAA45.30602@FreeBSD.org>
In-Reply-To: <mailpost.1294832739.2809102.16331.mailing.freebsd.scsi@FreeBSD.cs.nctu.edu.tw>
References: <mailpost.1294832739.2809102.16331.mailing.freebsd.scsi@FreeBSD.cs.nctu.edu.tw>

Joachim Tingvold wrote:
> I'm not sure if this is the proper place to ask for help regarding this,
> but here it goes;
>
> I've got 17 disks connected to a HP SAS expander, which again is
> connected to a LSI SAS 9211-8i HBA. I also have 1 system-disk that's
> connected directly to the SATA-controller on the motherboard. This is
> running on FreeBSD 9.0-CURRENT-201012.
>
> I'm running ZFS on root (referred to as "zroot"), and also on the 17
> disks connected to the LSI-controller (6x2TB raid-z2 + 10x1TB raid-z + 1
> hot-spare, referred to as "storage").
>
> This setup has been running fine since around christmas, but today, when
> I was moving some files from the zroot to storage, it failed. First, the
> moving went just fine (I was looking at gstat while it was copying), but
> then no activity (even though I knew it wasn't done -- there was a lot
> of large files). Trying to list any files on the storage-volume didn't
> work (CTRL+C didn't work either, I had to quit the terminal). The
> mv-process was still running, even though there was no disk-activity;
>
> [jocke@filserver ~]$ ps aux | grep mv
> root 33698 0,0 0,1 10048 2132 0- D+ 11:35am
> 0:01,66 mv -PRp -- JAG /storage/series/JAG (cp)
>
> I've extracted the relevant lines from dmesg since the machine booted on
> sunday; <http://home.komsys.org/~jocke/dmesg_mps0_freebsd-scsi.txt>.
>
> After a while (couple of minutes), I could list files on the
> storage-volume, and ZFS reported no problems. Then, after a few new
> minutes, I could not list anything on the storage-volume, and it's been
> like that since (ZFS and dmesg reports no further errors, though).
>
> I mentioned that the mv-process is still running; it won't die, but I
> guess that's because it has the D-flag (disk wait).
>
> [root@filserver ~]# kill -9 33698
> [root@filserver ~]# ps aux | grep mv
> root 33698 0,0 0,1 10048 2132 0- D+ 11:35am 0:01,66 mv -PRp -- JAG
> /storage/series/JAG (cp)
>
> This isn't really my field of expertise, so I'm hoping that someone here
> on the list might enlighten me. (-:

The dmesg you've shown shows many command timeouts on multiple devices.
Since the default ATA timeout is about 30 seconds, each timed-out command
can stall I/O for a significant time before the recovery sequence deals
with it. That could explain the delays you observed.

What's more suspicious is that the timeouts happened at the same time on
the AHCI-attached disk and on several disks on the mps controller. I can
hardly assume that two completely different controllers and drivers
triggered unrelated problems simultaneously. I would suggest checking
your power supplies, cables, backplanes and other mechanical things.

-- 
Alexander Motin