Date: Sun, 31 Jul 2011 13:06:18 -0700
From: David P Discher <dpd@bitgravity.com>
To: "Steven Hartland" <killing@multiplay.co.uk>
Cc: freebsd-fs@FreeBSD.org, Andriy Gapon <avg@freebsd.org>
Subject: Re: zfs process hang on pool access
Message-ID: <3D893A9B-2CD9-40EB-B4A2-5DBCBB72C62E@bitgravity.com>
In-Reply-To: <04C305AE5F184C6AAC2A67CE23184013@multiplay.co.uk>
References: <A14F1C768A41483C876AD77502A864D6@multiplay.co.uk>
    <0D449EC916264947AB31AA17F870EA7A@multiplay.co.uk>
    <4E3013DF.10803@FreeBSD.org>
    <3D6CEB50BEDD4ACE96FD35C4D085618A@multiplay.co.uk>
    <4E301C55.7090105@FreeBSD.org>
    <5C84E7C8452E489C8CA738294F5EBB78@multiplay.co.uk>
    <4E301F10.6060708@FreeBSD.org>
    <63705B5AEEAD4BB88ADB9EF770AB6C76@multiplay.co.uk>
    <4E302204.2030009@FreeBSD.org>
    <6703F0BB-D4FC-4417-B519-CAFC62E5BC39@bitgravity.com>
    <04C305AE5F184C6AAC2A67CE23184013@multiplay.co.uk>
I've actually found a second issue. My working theory is that it is related to the *fix* of LBOLT, in zio_wait()/txg_delay() when calling _cv_wait()/_cv_timedwait(). This may be aggravated by setting vfs.zfs.txg.timeout=1. In fact, these functions are using LBOLT with signed 32-bit ints.

I got some cores and some ideas, and will dig into the debugging this week. And of course I will post my findings (and pleas for help) here on freebsd-fs@.

Rolling back the two patches I posted earlier for the 26+ day and 106+ day bugs seemed to avoid the new issue.

---
David P. Discher
dpd@bitgravity.com * AIM: bgDavidDPD
BITGRAVITY * http://www.bitgravity.com

On Jul 31, 2011, at 12:50 PM, Steven Hartland wrote:

> Is there a PR related to this so we can track progress. Having to reboot machines
> every 100+ days to ensure they don't break is a bit of a PITA when you've got hundreds
> of machines :(
>
> ----- Original Message ----- From: "David P Discher" <dpd@bitgravity.com>
> To: "Steven Hartland" <killing@multiplay.co.uk>
> Cc: <freebsd-fs@FreeBSD.org>; "Andriy Gapon" <avg@freebsd.org>
> Sent: Wednesday, July 27, 2011 9:41 PM
> Subject: Re: zfs process hang on pool access
>
>
> The way I found this was breaking into the debugger, do some back traces, continue, break in again, do some more back traces on the hung processes ... see what is going on, then walk through the code.
>
> Then what I had specific loops and code locations, asking the higher powers of the freebsd kernel world.
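
For readers following the signed 32-bit concern raised in the message above, here is a rough arithmetic sketch of how long a tick-based clock can run before it overflows. It is not taken from the ZFS sources. The tick rate (`ASSUMED_HZ`, set to 1000), the function names, and the `(nanoseconds * hz) / NANOSEC` conversion shape are all assumptions made for illustration, since the message does not give the actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 1000 ticks per second. The message does not state the value. */
#define ASSUMED_HZ 1000

#define SECONDS_PER_DAY 86400.0
#define NANOSEC 1000000000.0

/*
 * Days of uptime before a tick count no longer fits in a signed
 * 32-bit int (that is, before it would exceed INT32_MAX).
 */
double
days_until_int32_tick_wrap(int hz)
{
	assert(hz > 0);
	return (((double)INT32_MAX / hz) / SECONDS_PER_DAY);
}

/*
 * Days of uptime before the intermediate product (nanoseconds * hz)
 * exceeds INT64_MAX in a hypothetical conversion of the form
 * (nanoseconds * hz) / NANOSEC.
 */
double
days_until_int64_product_overflow(int hz)
{
	assert(hz > 0);
	return (((double)INT64_MAX / hz) / NANOSEC / SECONDS_PER_DAY);
}
```

Under these assumptions, `days_until_int32_tick_wrap(ASSUMED_HZ)` comes to roughly 24.9 days and `days_until_int64_product_overflow(ASSUMED_HZ)` to roughly 106.8 days. The second figure lines up with the 106+ day bug mentioned above. The first is only in the neighbourhood of the 26+ day figure, so it should not be read as an explanation of that bug.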
