Date: Wed, 13 Dec 2006 03:21:24 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: Jan Mikkelsen <janm@transactionware.com>
Cc: freebsd-stable@freebsd.org
Subject: Re: g_vfs_done() failures on 6.2-RC1
Message-ID: <20061213082124.GA29523@xor.obsecurity.org>
In-Reply-To: <00d401c71e8d$fb60de00$3301a8c0@janmxp>
References: <00a601c71e7f$ed63f7a0$3301a8c0@janmxp> <457FAAFD.1080707@samsco.org> <00d401c71e8d$fb60de00$3301a8c0@janmxp>

On Wed, Dec 13, 2006 at 07:09:15PM +1100, Jan Mikkelsen wrote:
> Scott Long wrote:
> >Jan Mikkelsen wrote:
> >
> >>- Daichi Goto's unionfs-p16 has been applied.
> >>- The Areca driver is 1.20.00.12 from the Areca website.
> >>- sym(4) patch (see PR/89550), but no sym controller present.
> >>- SMP + FAST_IPSEC + SUIDDIR + device crypto.
> >>
> >>So: I've seen this problem on a few machines under heavy I/O load, with
> >>ataraid and with arcmsr.  I've seen others report similar problems, but
> >>I've seen no resolution.  Does anyone have any idea what the problem is?
> >>Has anyone else seen similar problems?  Where to from here?
> >>
> >>Thanks,
> >>
> >
> >You mention that you are using a driver from the Areca website.  Have
> >you tried using the stock driver that comes with FreeBSD?  I don't know
> >if it will be better or not, but I was planning on doing a refresh of
> >the stock driver, and I'd hate to introduce instability that wasn't there
> >before.
>
> I haven't run it recently.  I can roll back to the stock driver and see
> whether I see it again.  However, I can't always reproduce the problem, so
> I probably can't prove the absence of the problem.
>
> I mentioned that I have seen similar problems on machines with ataraid,
> like this:
>
> DOH! ata_alloc_composite failed! (x5)
> FAILURE - out of memory in ata_raid_init_request (x6)
> g_vfs_done():ar0s3f[WRITE(offset=113324673024, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325062144, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325127680, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325242368, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325256704, length=2048)]error = 5
> g_vfs_done():ar0s3f[WRITE(offset=113325275136, length=2048)]error = 5
>
> However, looking at this again, I'm not sure that the problem is identical
> anymore because the offset seems to be within the partition rather than
> just plain wrong (assuming the units of the offset message are bytes).
> These messages are from an HP DL145G1 with two SATA drives and ataraid.
>
> The workload that caused these messages is very similar:  Heavy I/O during
> multiple concurrent removes of deep trees on a filesystem with softupdates,
> system needs a reboot to get back on track.

Yes, it looks like a different problem:

a) It's a different driver (ataraid vs areca).  The g_vfs_done message
is a generic error, it means "the driver I was writing to returned EIO
in response to this attempted write".  The reasons why the error
occurred will depend on the driver and hardware.

b) As you say, the error messages are sensible in the ataraid case but
not in the areca case.

c) There is a previous error message which causes the g_vfs_done
errors as secondary effects.  Your bug here is whatever causes the
"DOH!" in ataraid, so that's what you should follow up (separately).

Kris
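
For readers who want to apply the two checks discussed in this message to their own logs (what "error = 5" means, and whether the reported offset falls inside the partition), here is a minimal Python sketch. The regular expression is inferred only from the g_vfs_done lines quoted in this message, so other FreeBSD versions may need a different pattern. The names parse_g_vfs_done, errno_name, within_partition and HYPOTHETICAL_PARTITION_BYTES are all invented for this illustration, and the partition size is a placeholder, not the real size of ar0s3f. The offset is assumed to be in bytes, as in the quoted message.

```python
import errno
import re
from typing import NamedTuple, Optional

# Pattern inferred from the log lines quoted in this thread only.
G_VFS_DONE_RE = re.compile(
    r"g_vfs_done\(\):(?P<device>[^\[]+)"
    r"\[(?P<op>[A-Z]+)\(offset=(?P<offset>-?\d+),\s*length=(?P<length>\d+)\)\]"
    r"\s*error\s*=\s*(?P<error>\d+)"
)

# Placeholder size (150 GiB); substitute the real partition size in bytes.
HYPOTHETICAL_PARTITION_BYTES = 150 * 1024**3


class VfsError(NamedTuple):
    device: str
    op: str
    offset: int
    length: int
    error: int


def parse_g_vfs_done(line: str) -> Optional[VfsError]:
    """Return the fields of a g_vfs_done line, or None if it does not match."""
    m = G_VFS_DONE_RE.search(line)
    if m is None:
        return None
    return VfsError(m["device"], m["op"], int(m["offset"]),
                    int(m["length"]), int(m["error"]))


def errno_name(code: int) -> str:
    """Map a numeric errno to its symbolic name (5 maps to EIO)."""
    return errno.errorcode.get(code, "unknown")


def within_partition(rec: VfsError, partition_bytes: int) -> bool:
    """True if the whole request lies inside [0, partition_bytes)."""
    return rec.offset >= 0 and rec.offset + rec.length <= partition_bytes


if __name__ == "__main__":
    sample = ("g_vfs_done():ar0s3f[WRITE(offset=113324673024, "
              "length=2048)]error = 5")
    rec = parse_g_vfs_done(sample)
    if rec is not None:
        print(rec.device, rec.op, errno_name(rec.error),
              within_partition(rec, HYPOTHETICAL_PARTITION_BYTES))
```

A request that parses cleanly, maps to EIO and lies inside the partition matches the "sensible" ataraid pattern described above; a negative or out-of-range offset would match the "just plain wrong" pattern instead.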