Date:      Sat, 30 Jul 2011 19:37:23 +0300
From:      Kostik Belousov <kostikbel@gmail.com>
To:        Alexander Motin <mav@freebsd.org>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r224496 - head/sys/cam
Message-ID:  <20110730163723.GZ17489@deviant.kiev.zoral.com.ua>
In-Reply-To: <201107292030.p6TKUSaf064895@svn.freebsd.org>
References:  <201107292030.p6TKUSaf064895@svn.freebsd.org>


On Fri, Jul 29, 2011 at 08:30:28PM +0000, Alexander Motin wrote:
> Author: mav
> Date: Fri Jul 29 20:30:28 2011
> New Revision: 224496
> URL: http://svn.freebsd.org/changeset/base/224496
>
> Log:
>   In some cases failed SATA disks may report their presence, but don't
>   respond to any commands. I've found that because of multiple command
>   retries, each of which cause 30s timeout, bus reset and another retry or
>   requeue for many commands, it may take ages to eventually drop the
>   failed device. The odd thing is that those retries continue even after
>   XPT considered device as dead and invalidated it.
>
>   This patch makes cam_periph_error() to block any command retries after
>   periph was marked as invalid. With that patch all activity completes in
>   1-2 minutes, just after several timeouts, required to consider device
>   death. This should make ZFS, gmirror, graid, etc. operation more robust.
>
>   Reviewed by:	mjacob@ on scsi@
>
>   Approved by:	re (kib)
>
> Modified:
>   head/sys/cam/cam_periph.c

Amusingly, this commit makes my test machine fail to boot.
This is an Ibex Peak PCH, with two SATA disks on channels 0 and 1.

It seems that geom thread 100012 owns the GEOM topology lock while sleeping
in adaclose->cam_periph_getccb():

db> bt 100012
Tracing pid 12 tid 100012 td 0xfffffe00028a2000
sched_switch() at 0xffffffff8034a0c7 = sched_switch+0x157
mi_switch() at 0xffffffff803291fb = mi_switch+0x2eb
sleepq_switch() at 0xffffffff803631f3 = sleepq_switch+0x123
sleepq_wait() at 0xffffffff80363eed = sleepq_wait+0x4d
_sleep() at 0xffffffff80329b59 = _sleep+0x3b9
cam_periph_getccb() at 0xffffffff817ffc50 = cam_periph_getccb+0xa0
adaclose() at 0xffffffff8182c484 = adaclose+0xc4
g_disk_access() at 0xffffffff802bea74 = g_disk_access+0x1e4
g_access() at 0xffffffff802c519a = g_access+0x1ba
g_dev_attrchanged() at 0xffffffff802bd1f6 = g_dev_attrchanged+0x96
g_dev_taste() at 0xffffffff802bd574 = g_dev_taste+0x284
g_new_provider_event() at 0xffffffff802c4ecd = g_new_provider_event+0xad
g_run_events() at 0xffffffff802c0750 = g_run_events+0x250
fork_exit() at 0xffffffff802f0d99 = fork_exit+0x189
fork_trampoline() at 0xffffffff804ee3be = fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800025fd00, rbp = 0 ---

(gdb) list *cam_periph_getccb+0xa0
0x1c50 is in cam_periph_getccb (/usr/home/kostik/work/build/bsd/DEV/src/sys/modules/cam/../../cam/cam_periph.c:883).
882
883             while (SLIST_FIRST(&periph->ccb_list) == NULL) {
884                     if (periph->immediate_priority > priority)

Reverting the revision or not loading ahci.ko allows the machine to boot.
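
For illustration only, here is a minimal user-space sketch of the wait pattern visible in the listing above: a loop that exits only once something places a CCB on the periph's list, entered by a caller that holds a lock. Everything prefixed with toy_ is a hypothetical stand-in and not the real CAM or GEOM code. The max_wakeups bound exists only so the sketch terminates, and the sketch makes no claim about why no CCB arrives on the affected machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-ins; these are not the real CAM structures. */
struct toy_ccb {
	SLIST_ENTRY(toy_ccb) link;
};
SLIST_HEAD(toy_ccb_list, toy_ccb);

struct toy_periph {
	struct toy_ccb_list ccb_list;	/* CCBs handed to this periph */
	bool topology_locked;		/* models a lock held by the caller */
};

/* Hypothetical hook run on each simulated wakeup; it may queue a CCB. */
typedef void (*toy_wakeup_fn)(struct toy_periph *);

/*
 * Same loop shape as the listing: keep waiting while the list is empty.
 * max_wakeups is an assumption added so the sketch can give up; the
 * loop in the listing has no such bound.
 */
static struct toy_ccb *
toy_getccb(struct toy_periph *p, toy_wakeup_fn on_wakeup, int max_wakeups)
{
	struct toy_ccb *ccb;
	int i;

	assert(p != NULL);
	for (i = 0; SLIST_FIRST(&p->ccb_list) == NULL; i++) {
		if (i >= max_wakeups)
			return (NULL);	/* the real thread would still sleep */
		if (on_wakeup != NULL)
			on_wakeup(p);
	}
	ccb = SLIST_FIRST(&p->ccb_list);
	SLIST_REMOVE_HEAD(&p->ccb_list, link);
	return (ccb);
}

static struct toy_ccb toy_spare;

/* A wakeup hook that does deliver a CCB. */
static void
toy_deliver(struct toy_periph *p)
{
	SLIST_INSERT_HEAD(&p->ccb_list, &toy_spare, link);
}

/*
 * Models the whole path from the event thread down to the close routine:
 * the lock is taken first and is released only if a CCB was obtained.
 */
static bool
toy_access_path(struct toy_periph *p, toy_wakeup_fn on_wakeup)
{
	p->topology_locked = true;
	if (toy_getccb(p, on_wakeup, 1000) == NULL)
		return (false);		/* stuck: the lock is still held */
	p->topology_locked = false;
	return (true);
}
```

Calling toy_access_path() with a NULL hook models the case where nothing ever queues a CCB: the call gives up with the lock still marked as held. Passing toy_deliver lets the loop exit on the first wakeup and the lock is released.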



