Date: Tue, 16 Jan 2007 16:20:48 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: Doug Ambrisko <ambrisko@ambrisko.com>
Cc: Scott Oertel <freebsd@scottevil.com>, Willem Jan Withagen <wjw@digiware.nl>, freebsd-stable@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject: Re: running mksnap_ffs
Message-ID: <20070116212048.GA1041@xor.obsecurity.org>
In-Reply-To: <200701162117.l0GLHXOS062816@ambrisko.com>
References: <20070116203739.GA343@xor.obsecurity.org> <200701162117.l0GLHXOS062816@ambrisko.com>
On Tue, Jan 16, 2007 at 01:17:33PM -0800, Doug Ambrisko wrote:
> Kris Kennaway writes:
> | On Tue, Jan 16, 2007 at 09:26:47PM +0100, Willem Jan Withagen wrote:
> | > Doug Ambrisko wrote:
> | > >| > or things can get wedged. We have some other patches as well that might
> | > >| > be required. As a hack on a local server we have been using snap shots
> | > >| > to do a "hot" back-up of a data base each morning. This is based on
> | > >| > 6.x.
> | > >| >
> | > >| What do you mean by "get wedged"? Are you seeing a deadlock, and if
> | > >| so then what are the details? When you say 6.x, do you mean
> | > >| up-to-date RELENG_6? There were various snapshot deadlock fixes
> | > >| committed over the past year including some in the past few months.
> | > >
> | > >The file-system would come to a stop, processes stuck on bio, snap-shots
> | > >not finishing etc. This was caused by the system running out of usable
> | > >buffers. The change forces them to be flushed every so often. This is
> | > >independent of locking. 10 might be too aggressive. Some scaling of
> | > >nbuf would probably be better.
> | >
> | > When I run mksnap_ffs it runs to the point where ANY access to the
> | > filesystem gives that process a lockup.
> |
> | Yes, that is expected. Actually it begins when something accesses the
> | directory in which the snapshot is being made, since that causes the
> | parent directory to be locked...then something tries to access the
> | parent directory, which eventually cascades back to the root.
> |
> | > Getting the file system back is only thru "hard reboot". Trying to do it
> | > the gentle way locks the whole system.
> |
> | Or waiting until the snapshot operation finishes. You (still) haven't
> | determined that it's actually hanging as opposed to just waiting for
> | the snapshot operation to finish.
>
> In my case it was easy to see that all the buffers were exhausted and
> the system was churning waiting for some to become available. Since they
> were all used up it never recovered. By sync'ing the buffers they got
> cleaned up and then the system never ran out. The snap shot was then
> able to finish. Via the debugger you can see this happen. I traced
> this problem in the debugger. There are other issues with the buffer
> daemon as well. We hit these since we run with a relatively low
> nbuf. The buffers can get frag'ed so bad that it can't flush
> things since it can't get a full-size buffer. Another problem is that
> it can end up waiting on itself since the current code can't use
> its emergency space to flush stuff. You can see this via ps etc.
> It's not a good thing if the buffer daemon is waiting on itself :-(
>
> We have patches to this as well but they need some more work. I was
> working with Tor on this but then I got swamped at work with our 4.X -> 6.X
> and platform transition. All I can say is that we don't suffer from
> these problems now :-) I have printf's that log this stuff when some of
> these bugs are hit. Now the system survives those lock-up points.

Thanks for clarifying. Hopefully you and Tor can get something committed
soon!

Kris
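For readers following the thread, the buffer-exhaustion argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is NOT FreeBSD kernel code and not Doug's patch, and every name in it (struct pool, snapshot_run, flush_interval_for, the nbuf / 4 scaling) is hypothetical. It assumes that nothing else cleans buffers while the snapshot runs, so once every buffer is dirty the operation stalls, and it shows why a flush interval that scales with nbuf could be preferable to a fixed constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT FreeBSD kernel code. All names are hypothetical. */
struct pool {
    size_t nbuf;   /* total buffers in the pool */
    size_t dirty;  /* buffers currently holding unwritten data */
};

/* Pretend to write every dirty buffer out, making all of them reusable. */
void pool_flush(struct pool *p)
{
    p->dirty = 0;
}

/*
 * Simulate a snapshot that dirties `writes` buffers one at a time.
 * If flush_interval is nonzero, flush after every flush_interval writes.
 * Returns true if the snapshot finishes, false if it stalls because no
 * clean buffer is left (in this model nobody else will ever flush).
 */
bool snapshot_run(struct pool *p, size_t writes, size_t flush_interval)
{
    assert(p != NULL && p->nbuf > 0);
    for (size_t i = 0; i < writes; i++) {
        if (p->dirty == p->nbuf)
            return false;            /* out of usable buffers */
        p->dirty++;
        if (flush_interval != 0 && (i + 1) % flush_interval == 0)
            pool_flush(p);
    }
    return true;
}

/*
 * One assumed way to scale the interval with nbuf instead of using a
 * fixed constant: flush after a quarter of the pool, but at least 1.
 */
size_t flush_interval_for(size_t nbuf)
{
    size_t interval = nbuf / 4;
    return interval > 0 ? interval : 1;
}
```

In this model a pool with nbuf = 16 stalls when it is never flushed, completes with an interval of 10, and stalls again with an interval of 32, because the pool fills before the first flush. That last case is the reason a value derived from nbuf is sketched here rather than a constant.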