Date: Sun, 18 Jul 2010 19:58:39 -0700
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Mike Tancsa <mike@sentex.net>
Cc: freebsd-stable@freebsd.org
Subject: Re: deadlock or bad disk ? RELENG_8
Message-ID: <20100719025839.GA91809@icarus.home.lan>
In-Reply-To: <20100719023419.GA91006@icarus.home.lan>
References: <201007182108.o6IL88eG043887@lava.sentex.ca> <20100718211415.GA84127@icarus.home.lan> <201007182142.o6ILgDQW044046@lava.sentex.ca> <20100719023419.GA91006@icarus.home.lan>

On Sun, Jul 18, 2010 at 07:34:19PM -0700, Jeremy Chadwick wrote:
> Now I'm confused -- this indicates twa(4) is involved, not arcmsr(4).
>
> Can you please provide a verbose explanation of the configuration of the
> disks and controllers in this machine, including device and disk names
> and what they're associated with, plus if they're RAIDed in any way?
>
> Thanks.

I re-worked this out myself based on the OP's dmesg.  It's confusing
because there's literally 6 different storage controllers on a single
machine:

* arcmsr0  <--> irq 18 <--> Areca SATA Host Adapter RAID Controller
  siis0    <--> irq 17 <--> SiI3132 SATA controller
* twa0     <--> irq 18 <--> 3ware 9000 series Storage Controller
  ahci0    <--> irq 16 <--> JMicron JMB361 AHCI SATA controller
  atapci0  <--> irq 17 <--> JMicron JMB361 ATA controller
* ahci1    <--> irq 19 <--> Intel ICH10 AHCI SATA controller

Controllers marked with asterisk (*) are in use/involved.  Others don't
appear to have anything connected to them.

Channels and what above controllers they're connected to.  Again, same
with the asterisk:

  ahcich0  <--> ahci0
  ahcich1  <--> ahci0
  ata2     <--> atapci0
* ahcich2  <--> ahci1
* ahcich3  <--> ahci1
* ahcich4  <--> ahci1
* ahcich5  <--> ahci1
  ahcich6  <--> ahci1
  ahcich7  <--> ahci1

The dmesg output also shows this.  I have no idea what it means:

(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step

Now we get into the disks.  The kernel interspersed output within
drivers so I had to work this out myself.

da0   <--> arcmsr0  <--> Areca usrvar (RAID volume)
da1   <--> arcmsr0  <--> Areca backup1 (RAID volume)
da2   <--> twa0     <--> No idea, but looks like a RAID volume
ada0  <--> ahcich2  <--> ST31000340AS (disk)
ada1  <--> ahcich3  <--> ST31000340AS (disk)
ada2  <--> ahcich4  <--> ST31000333AS (disk)
ada3  <--> ahcich5  <--> ST31000528AS (disk)

So one thing of interest is that the Areca and 3ware controllers are
sharing an IRQ.

If you do extensive bidirectional I/O between disks on the arcmsr0 and
twa0 controllers at the same time (e.g. read from arcmsr0 which writes
to twa0, and read from twa0 which writes to arcmsr0), do you see this
problem?

vmstat -i output would help here, except that it's going to show the
rate as a total (for both controllers).  I don't know of a way to get
more granular output.

pciconf -lvc output might also help (to see if the controllers are using
MSI or not); only interested in the arcmsr0, twa0, and ahci1 entries.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20100719025839.GA91809>