Date:      Sun, 18 Jul 2010 19:58:39 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        Mike Tancsa <mike@sentex.net>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: deadlock or bad disk ?  RELENG_8
Message-ID:  <20100719025839.GA91809@icarus.home.lan>
In-Reply-To: <20100719023419.GA91006@icarus.home.lan>
References:  <201007182108.o6IL88eG043887@lava.sentex.ca> <20100718211415.GA84127@icarus.home.lan> <201007182142.o6ILgDQW044046@lava.sentex.ca> <20100719023419.GA91006@icarus.home.lan>

On Sun, Jul 18, 2010 at 07:34:19PM -0700, Jeremy Chadwick wrote:
> Now I'm confused -- this indicates twa(4) is involved, not arcmsr(4).
> 
> Can you please provide a verbose explanation of the configuration of the
> disks and controllers in this machine, including device and disk names
> and what they're associated with, plus if they're RAIDed in any way?
> 
> Thanks.

I worked this out myself from the OP's dmesg.  It's confusing because
there are literally six different storage controllers in a single
machine:

* arcmsr0 <--> irq 18 <--> Areca SATA Host Adapter RAID Controller
  siis0   <--> irq 17 <--> SiI3132 SATA controller
* twa0    <--> irq 18 <--> 3ware 9000 series Storage Controller
  ahci0   <--> irq 16 <--> JMicron JMB361 AHCI SATA controller
  atapci0 <--> irq 17 <--> JMicron JMB361 ATA controller
* ahci1   <--> irq 19 <--> Intel ICH10 AHCI SATA controller

Controllers marked with an asterisk (*) are in use.  The others don't
appear to have anything connected to them.
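Spotting the shared IRQs by eye is error-prone with this many controllers,
so here is a minimal sketch that does it from the boot dmesg.  The function
name shared_irqs is hypothetical, and it assumes FreeBSD probe lines of the
form "twa0: <...> ... irq 18 at device 0.0 on pciN".  The irq printed on a
probe line may be the legacy interrupt even if the driver later enables
MSI, so treat a hit as "worth checking", not proof of sharing.

```shell
#!/bin/sh
# Hypothetical helper: list IRQs claimed by more than one device, based on
# FreeBSD probe lines such as
#   twa0: <3ware 9000 series Storage Controller> ... irq 18 at device 0.0 on pci4
# Reads /var/run/dmesg.boot by default, or the file given as $1.
shared_irqs() {
    awk '
        /^[a-z]+[0-9]+: <.*>.* irq [0-9]+/ {
            dev = $1
            sub(/:$/, "", dev)          # "twa0:" -> "twa0"
            irq = ""
            for (i = 1; i < NF; i++)    # the value is the token after "irq"
                if ($i == "irq") { irq = $(i + 1); break }
            if (irq == "" || ((irq, dev) in seen))
                next                    # skip repeats of the same device
            seen[irq, dev] = 1
            if (irq in devs)
                devs[irq] = devs[irq] " " dev
            else
                devs[irq] = dev
            count[irq]++
        }
        END {
            for (irq in devs)           # order is unspecified; pipe to sort
                if (count[irq] > 1)
                    printf "irq %s: %s\n", irq, devs[irq]
        }
    ' "${1:-/var/run/dmesg.boot}"
}
```

Run it as "shared_irqs | sort", or pass the path of a saved dmesg as the
first argument when looking at someone else's machine.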

Channels, and which of the above controllers they're connected to
(again, an asterisk marks the ones in use):

  ahcich0 <--> ahci0
  ahcich1 <--> ahci0
  ata2    <--> atapci0
* ahcich2 <--> ahci1
* ahcich3 <--> ahci1
* ahcich4 <--> ahci1
* ahcich5 <--> ahci1
  ahcich6 <--> ahci1
  ahcich7 <--> ahci1

The dmesg output also shows the following, and I have no idea what it means:

(probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step

Now for the disks.  The kernel interleaved the output from the various
drivers, so I had to piece this together myself.

da0  <--> arcmsr0 <--> Areca usrvar (RAID volume)
da1  <--> arcmsr0 <--> Areca backup1 (RAID volume)
da2  <--> twa0    <--> No idea, but looks like a RAID volume
ada0 <--> ahcich2 <--> ST31000340AS (disk)
ada1 <--> ahcich3 <--> ST31000340AS (disk)
ada2 <--> ahcich4 <--> ST31000333AS (disk)
ada3 <--> ahcich5 <--> ST31000528AS (disk)

So one thing of interest is that the Areca and 3ware controllers are
sharing an IRQ (irq 18).  If you do heavy bidirectional I/O between
disks on the arcmsr0 and twa0 controllers at the same time (e.g. copy
data from arcmsr0 to twa0 while simultaneously copying from twa0 to
arcmsr0), do you see this problem?  vmstat -i output would help here,
except that it will show a single combined rate for both controllers,
since they share the interrupt.  I don't know of a way to get more
granular output.
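One way to generate that kind of load is sketched below.  It is only an
illustration under stated assumptions: cross_io is a hypothetical name, and
the two directories must be on filesystems backed by the arcmsr0 and twa0
volumes respectively (the mount points are placeholders, substitute your
own).  It writes regular files only, so it won't clobber the volumes
themselves.

```shell
#!/bin/sh
# Hypothetical stress test: copy a file from a filesystem on one controller's
# volume to a filesystem on the other, while copying in the opposite
# direction at the same time, so arcmsr0 and twa0 (both on irq 18) are busy
# together.  Only regular files inside the two given directories are
# written; never point this at raw devices such as /dev/da0.
cross_io() {
    a=${1:?usage: cross_io dir_on_arcmsr dir_on_twa [size_mb]}
    b=${2:?usage: cross_io dir_on_arcmsr dir_on_twa [size_mb]}
    mb=${3:-1024}

    # Create a source file of random data on each side if one isn't there yet.
    for d in "$a" "$b"; do
        if [ ! -f "$d/xio.src" ]; then
            dd if=/dev/urandom of="$d/xio.src" bs=1048576 count="$mb" 2>/dev/null || return 1
        fi
    done

    # Run both directions concurrently, then collect each exit status.
    dd if="$a/xio.src" of="$b/xio.copy" bs=1048576 2>/dev/null &
    p1=$!
    dd if="$b/xio.src" of="$a/xio.copy" bs=1048576 2>/dev/null &
    p2=$!
    wait "$p1"; s1=$?
    wait "$p2"; s2=$?
    [ "$s1" -eq 0 ] && [ "$s2" -eq 0 ]
}
```

For example, "cross_io /mnt/areca /mnt/3ware 4096" run in a loop while
watching the console would show whether the hang tracks this load.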

pciconf -lvc output might also help, to see whether or not the
controllers are using MSI; I'm only interested in the arcmsr0, twa0, and
ahci1 entries.
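To cut pciconf -lvc down to just those three entries, a small filter like
this could be used.  pci_entries is a hypothetical name, and the sketch
assumes each entry starts with an unindented "name@pci..." line followed by
indented vendor/class/cap lines, which is how pciconf lays out its output.
The MSI and MSI-X capability lines, where present, should indicate whether
the capability is enabled.

```shell
#!/bin/sh
# Hypothetical filter: print only the pciconf -lvc entries for the device
# names given as arguments.  An unindented line starts a new entry; its
# device name is everything before the "@".
pci_entries() {
    awk -v want="$*" '
        BEGIN {
            n = split(want, w, " ")
            for (i = 1; i <= n; i++)
                keep[w[i]] = 1
        }
        /^[^ \t]/ {
            name = $0
            sub(/@.*/, "", name)
            show = (name in keep)
        }
        show { print }
    '
}
# Usage on the machine in question:
#   pciconf -lvc | pci_entries arcmsr0 twa0 ahci1
#   vmstat -i | grep irq18
```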

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
