Date: Tue, 23 Feb 1999 12:43:22 -0800 (PST)
From: Matthew Dillon
Message-Id: <199902232043.MAA53144@apollo.backplane.com>
To: Luoqi Chen
Cc: dfr@nlsystems.com, freebsd-hackers@FreeBSD.ORG, mjacob@feral.com
Subject: Re: Panic in FFS/4.0 as of yesterday - update
References: <199902231930.OAA21444@lor.watermarkgroup.com>

:>    getnewbuf() is starting vfs_bio_awrite()'s on essentially random
:>    buffers - not necessarily just buffers related to the VFS recursion.
:>    This means that it is possible for it to recurse through unrelated
:>    bp's and overflow the stack.
:>
:Hmm, vfs_bio_awrite() only tries to write bufs already in core, no buffer
:reconstitution is needed. I fail to see why it would recurse back into
:getnewbuf().

    If there is no VFS layering, then true.  If there *IS* VFS layering,
    then the VOP_BWRITE() may try to constitute another buffer.  For
    example, if you are going through a VN device which is file-backed,
    it must do a BMAP operation which may constitute a new buffer (or
    several).  This will loop back to the getnewbuf() code, which then
    may decide to do the same thing all over again with a different
    random bp and cause another recursion.  And so on.  Until the
    supervisor stack goes poof.

:>    getnewbuf() appears to have the same problem that the ufs fsync code
:>    has -- it's assuming that when it converts a DELWRI bp to async, that
:>    the I/O operation will either be in-progress or completely resolved
:>    after the call.
:>    But there are cases, such as with softupdates, where
:>    this isn't true.. where the bp may be requeued synchronously due to
:>    there being unresolved dependencies.  In this case, both getnewbuf()
:>    and the ufs fsync code will potentially loop on the same bp over
:>    and over again.
:>
:Is this what caused the "bmsafemap" panic? I'll take a look.

    It shouldn't be the cause... the bmsafemap panic wasn't supposed to
    occur at all, no matter what the dirty buffer situation.  I currently
    have a case added to report the BMSAFEMAP condition and ignore it,
    and all my problems went away.  I've added debugging code supplied by
    Kirk (just a vprintf(), really) so when it happens again I'll be able
    to give him some more meaningful information in regards to the type
    of vnode the condition occurred on.

					-Matt

:-lq

					Matthew Dillon
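
The recursion described above (getnewbuf() flushing a dirty buffer, the write going through a layered VFS, and the lower layer calling back into getnewbuf()) can be sketched with a toy user-space model. This is only an illustration under stated assumptions, not the kernel code: every toy_* name, the three-state buffer, and the fixed pool of 8 buffers are made up here, and the "layered" flag simply stands in for a file-backed VN device whose write needs a BMAP that may constitute another buffer.

```c
#include <assert.h>

/*
 * Toy user-space model of the recursion described above.  This is NOT
 * FreeBSD kernel code: every toy_* name and the three-state buffer are
 * assumptions invented for illustration.
 */
#define TOY_NBUF 8

enum toy_state { TOY_CLEAN, TOY_DIRTY, TOY_BUSY };

struct toy_buf {
    enum toy_state state;
    int layered;    /* nonzero: writing this buffer needs another buffer */
};

static struct toy_buf toy_pool[TOY_NBUF];
static int toy_deepest;     /* deepest nesting level seen so far */

/* Stand-in for getnewbuf(): returns a pool index, or -1 if all are busy. */
static int
toy_getnewbuf(int depth)
{
    int i;

    if (depth > toy_deepest)
        toy_deepest = depth;

    for (i = 0; i < TOY_NBUF; i++) {
        struct toy_buf *bp = &toy_pool[i];

        if (bp->state == TOY_BUSY)
            continue;       /* already being written further up the stack */
        if (bp->state == TOY_DIRTY) {
            /* Stand-in for vfs_bio_awrite(): flush before reuse. */
            bp->state = TOY_BUSY;
            if (bp->layered)
                (void)toy_getnewbuf(depth + 1); /* lower layer wants a buffer */
            bp->state = TOY_CLEAN;
        }
        return i;
    }
    return -1;
}

/* Mark the first ndirty buffers dirty, request one buffer, report nesting. */
static int
toy_max_depth(int ndirty, int layered)
{
    int i;

    assert(ndirty >= 0 && ndirty <= TOY_NBUF);
    for (i = 0; i < TOY_NBUF; i++) {
        toy_pool[i].state = (i < ndirty) ? TOY_DIRTY : TOY_CLEAN;
        toy_pool[i].layered = layered;
    }
    toy_deepest = 0;
    (void)toy_getnewbuf(0);
    return toy_deepest;
}
```

In this model toy_max_depth(8, 0) returns 0, since without layering each flush completes without needing another buffer, while toy_max_depth(8, 1) returns 8: each nested call picks a different dirty buffer and nests one level deeper, so the depth is bounded only by the number of dirty buffers. The model stops when the pool runs out; a real kernel stack would presumably run out first.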