Date:      Mon, 22 Jul 2013 16:17:14 -0700
From:      Dieter BSD <dieterbsd@gmail.com>
To:        freebsd-hardware@freebsd.org
Subject:   Re: Reset Problem with SATA Port Multiplier
Message-ID:  <CAA3ZYrCrz-%2BJWFDnYU5ueBeuawZ9QpMNFYJ=rNG-%2BBj9LYrHmQ@mail.gmail.com>

> Drives: 45 * Seagate Altos ST3000NC002
> Port Multipliers: 9 * SiI3826
> SATA Controller: 3 * Marvell 88SX7042
>
> After a few hours of a database-like workload over ZFS (NCQ enable, disk
> write caches disabled), a disk becomes unresponsive (we think due to a
> drive firmware problem):

I have an 8.2 machine with Sil3132 controllers with Sil3726 pm with a variety
of drives.  I have been getting the "Timeout on slot <small integer>"
followed by "lost device".  Sometimes the device reappears. (Although
the /dev/ufs/label does *not* reappear. :-(  )  I have not seen the other
drives on the pm get removed, or had to power cycle to recover.  Seagate
ST3000DM001 with CC4B firmware seems especially bad. ST3000DM001 with CC24
firmware have been ok.  So your theory that the drive firmware has a problem
seems promising.

Sounds like FreeBSD is doing something bad to the pm, which Linux
isn't doing. Perhaps log the commands the OS sends to the
controller (over the network to a 2nd machine, or to a local
disk not on a pm) and compare BSD to Linux?  Perhaps start
logging when you get the first timeout, to save hours of commands
to wade through.
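
A rough sketch of the "start at the first timeout" idea, assuming the
command log is plain text with one entry per line and that the timeout
shows up in it as "Timeout on slot <n>". The log format, the function
names and the sample lines are all made up for illustration; adjust the
pattern to whatever your logging actually emits.

```python
import re
from typing import Iterable, Iterator, List

# Assumed message shape, taken from the "Timeout on slot <small integer>"
# text above; the surrounding log line format is hypothetical.
TIMEOUT_RE = re.compile(r"Timeout on slot (\d+)")


def from_first_timeout(lines: Iterable[str]) -> Iterator[str]:
    """Yield log lines starting at the first one that reports a slot timeout."""
    seen = False
    for line in lines:
        if not seen and TIMEOUT_RE.search(line):
            seen = True
        if seen:
            yield line


def timeout_slots(lines: Iterable[str]) -> List[int]:
    """Return the slot number of every timeout line, in order of appearance."""
    slots = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            slots.append(int(match.group(1)))
    return slots


if __name__ == "__main__":
    # Hypothetical log excerpt, not real driver output.
    sample = [
        "cmd write tag 3",
        "siisch0: Timeout on slot 3",
        "ada4: lost device",
        "cmd identify",
    ]
    for entry in from_first_timeout(sample):
        print(entry)
```

Run the same filter over the BSD log and the Linux log and you only have
to compare what each OS sends from the first timeout onward.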

Alternately you could stare at the driver sources until enlightenment
occurs.

AFAIK FreeBSD has never gotten a proper workaround for the quirk in
the 1st generation Sil sata controllers, while they run fine on NetBSD.
There might be a bug/quirk in the pm's firmware that FreeBSD triggers
but Linux doesn't.


