Date: Sat, 30 Jul 2011 19:37:23 +0300
From: Kostik Belousov <kostikbel@gmail.com>
To: Alexander Motin <mav@freebsd.org>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r224496 - head/sys/cam
Message-ID: <20110730163723.GZ17489@deviant.kiev.zoral.com.ua>
In-Reply-To: <201107292030.p6TKUSaf064895@svn.freebsd.org>
References: <201107292030.p6TKUSaf064895@svn.freebsd.org>
On Fri, Jul 29, 2011 at 08:30:28PM +0000, Alexander Motin wrote:
> Author: mav
> Date: Fri Jul 29 20:30:28 2011
> New Revision: 224496
> URL: http://svn.freebsd.org/changeset/base/224496
>
> Log:
>   In some cases failed SATA disks may report their presence, but don't
>   respond to any commands. I've found that because of multiple command
>   retries, each of which cause 30s timeout, bus reset and another retry or
>   requeue for many commands, it may take ages to eventually drop the
>   failed device. The odd thing is that those retries continue even after
>   XPT considered device as dead and invalidated it.
>
>   This patch makes cam_periph_error() to block any command retries after
>   periph was marked as invalid. With that patch all activity completes in
>   1-2 minutes, just after several timeouts, required to consider device
>   death. This should make ZFS, gmirror, graid, etc. operation more robust.
>
>   Reviewed by:	mjacob@ on scsi@
>
>   Approved by:	re (kib)
>
> Modified:
>   head/sys/cam/cam_periph.c

Amusingly, this commit prevents my test machine from booting. It is an
Ibex Peak PCH with two SATA disks on channels 0 and 1.
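The retry gating described in the commit log can be modeled roughly as
below. This is a hypothetical userland sketch, not the code in
cam_periph.c: struct periph_model, its invalid flag and should_retry()
are invented names standing in for the periph's invalidation state and
the retry decision that the log says cam_periph_error() now makes.

```c
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified stand-in for a CAM peripheral.
 * The real struct cam_periph is far richer; only the "invalidated"
 * state matters for this sketch.
 */
struct periph_model {
    bool invalid;   /* set once XPT has declared the device dead */
};

/*
 * Hypothetical retry decision. As described in the log, retries used to
 * continue while the retry budget lasted, even for an invalidated periph;
 * with the change, an invalidated periph gets no retries at all.
 */
static bool should_retry(const struct periph_model *p, int retries_left)
{
    if (p->invalid)
        return false;           /* device is gone: fail the command now */
    return retries_left > 0;    /* otherwise spend the remaining budget */
}
```

With this gating, commands against a device that XPT has already
invalidated fail immediately instead of cycling through further 30s
timeouts, bus resets and requeues.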
It seems that the geom thread 100012 owns the GEOM topology lock while
sleeping in adaclose()->cam_periph_getccb():

db> bt 100012
Tracing pid 12 tid 100012 td 0xfffffe00028a2000
sched_switch() at 0xffffffff8034a0c7 = sched_switch+0x157
mi_switch() at 0xffffffff803291fb = mi_switch+0x2eb
sleepq_switch() at 0xffffffff803631f3 = sleepq_switch+0x123
sleepq_wait() at 0xffffffff80363eed = sleepq_wait+0x4d
_sleep() at 0xffffffff80329b59 = _sleep+0x3b9
cam_periph_getccb() at 0xffffffff817ffc50 = cam_periph_getccb+0xa0
adaclose() at 0xffffffff8182c484 = adaclose+0xc4
g_disk_access() at 0xffffffff802bea74 = g_disk_access+0x1e4
g_access() at 0xffffffff802c519a = g_access+0x1ba
g_dev_attrchanged() at 0xffffffff802bd1f6 = g_dev_attrchanged+0x96
g_dev_taste() at 0xffffffff802bd574 = g_dev_taste+0x284
g_new_provider_event() at 0xffffffff802c4ecd = g_new_provider_event+0xad
g_run_events() at 0xffffffff802c0750 = g_run_events+0x250
fork_exit() at 0xffffffff802f0d99 = fork_exit+0x189
fork_trampoline() at 0xffffffff804ee3be = fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff800025fd00, rbp = 0 ---

(gdb) list *cam_periph_getccb+0xa0
0x1c50 is in cam_periph_getccb (/usr/home/kostik/work/build/bsd/DEV/src/sys/modules/cam/../../cam/cam_periph.c:883).
882
883		while (SLIST_FIRST(&periph->ccb_list) == NULL) {
884			if (periph->immediate_priority > priority)

Reverting the revision or not loading ahci.ko allows the machine to boot.
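The loop at cam_periph.c:883 follows the usual "sleep until a resource
appears on a list" pattern. The model below is a hypothetical userland
sketch using POSIX threads: struct ccb_pool, pool_get() and pool_put()
are invented names, and the model omits the immediate_priority and
scheduling logic visible in the real function. It only illustrates the
consequence seen in the trace, not its root cause: pool_get() returns
only after some other thread puts a CCB back, so if that never happens,
the caller sleeps forever while still holding any lock it took
beforehand, as the GEOM thread above holds the topology lock.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a CCB; only the list linkage is kept. */
struct ccb_model {
    struct ccb_model *next;
};

/* Hypothetical per-periph CCB free list plus its wakeup channel. */
struct ccb_pool {
    pthread_mutex_t   lock;
    pthread_cond_t    avail;
    struct ccb_model *free_list;   /* plays the role of periph->ccb_list */
};

/*
 * Block until a CCB is on the free list, mirroring
 * while (SLIST_FIRST(&periph->ccb_list) == NULL) { ... sleep ... }.
 * pthread_cond_wait() drops pool->lock while sleeping, but any other
 * lock the caller already holds stays held for the whole wait.
 */
static struct ccb_model *pool_get(struct ccb_pool *pool)
{
    struct ccb_model *c;

    pthread_mutex_lock(&pool->lock);
    while (pool->free_list == NULL)
        pthread_cond_wait(&pool->avail, &pool->lock);
    c = pool->free_list;
    pool->free_list = c->next;
    pthread_mutex_unlock(&pool->lock);
    return c;
}

/* Return a CCB to the free list and wake one sleeper. */
static void pool_put(struct ccb_pool *pool, struct ccb_model *c)
{
    pthread_mutex_lock(&pool->lock);
    c->next = pool->free_list;
    pool->free_list = c;
    pthread_cond_signal(&pool->avail);
    pthread_mutex_unlock(&pool->lock);
}
```

If nothing ever calls pool_put() for this pool, a pool_get() caller
never wakes up, which matches the shape of the hang in the backtrace.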