From owner-freebsd-stable@FreeBSD.ORG Thu Nov 11 16:38:24 2004
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id A918716A4E6
	for ; Thu, 11 Nov 2004 16:38:24 +0000 (GMT)
Received: from park.rambler.ru (park.rambler.ru [81.19.64.101])
	by mx1.FreeBSD.org (Postfix) with ESMTP id DF05C43D45
	for ; Thu, 11 Nov 2004 16:38:23 +0000 (GMT)
	(envelope-from is@rambler-co.ru)
Received: from is.park.rambler.ru (is.park.rambler.ru [81.19.64.102])
	by park.rambler.ru (8.12.6/8.12.6) with ESMTP id iABGcM3g024319;
	Thu, 11 Nov 2004 19:38:22 +0300 (MSK)
	(envelope-from is@rambler-co.ru)
Date: Thu, 11 Nov 2004 19:38:22 +0300 (MSK)
From: Igor Sysoev
X-X-Sender: is@is.park.rambler.ru
To: Uwe Doering
In-Reply-To: <20041111190413.F41088@is.park.rambler.ru>
Message-ID: <20041111192947.A41088@is.park.rambler.ru>
References: <4168578F.7060706@geminix.org>
	<20041103191641.K63546@is.park.rambler.ru>
	<4189666A.9020500@geminix.org>
	<20041104124616.S92154@is.park.rambler.ru>
	<418BEBC2.3020304@geminix.org>
	<20041111190413.F41088@is.park.rambler.ru>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
cc: stable@freebsd.org
Subject: Re: vnode_pager_putpages errors and DOS?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Thu, 11 Nov 2004 16:38:24 -0000

On Thu, 11 Nov 2004, Igor Sysoev wrote:

> On Fri, 5 Nov 2004, Uwe Doering wrote:
>
> > Igor Sysoev wrote:
> > > [...]
> > > I've tried your patch from second email (it requires to include
> > > for devsw and D_DISK): the system also became unresponsive.
> > >
> > > The main problem is that I could not kill the offending process - it
> > > was stuck in biowr state.
> >
> > In the meantime I've investigated this further.
> > The two patches I
> > provided so far certainly have their merits, since they deal with some
> > unwanted side effects. However, I found that the root cause for the
> > eventual system lock-up lies elsewhere.
> >
> > In an earlier email I already pointed out that function
> > vnode_pager_generic_putpages() actually doesn't care whether the write
> > operation failed or not. It always returns VM_PAGER_OK.
> >
> > Now, in case the write operation succeeds the file system code takes
> > care that the formerly dirty pages associated with the i/o buffer get
> > marked clean. On the other hand, if the write attempt fails, for
> > instance in an out-of-disk-space situation, the pages are left dirty.
> > At this point the syncer enters an infinite loop, trying to flush the
> > same dirty pages to disk over and over again.
> >
> > The fix is actually quite simple. In case of a write error we have to
> > make sure ourselves that the associated pages get marked clean. We do
> > this by returning VM_PAGER_BAD instead of VM_PAGER_OK. These two result
> > codes are functionally identical, with the exception that VM_PAGER_BAD
> > additionally marks the respective page clean. For the details, please
> > have a look at the caller function vm_pageout_flush() in 'vm_pageout.c'.
> >
> > What this modification means is that in case of a write error the
> > affected pages remain intact in memory until they get recycled, but we
> > lose their contents as far as the copy on disk is concerned. I believe
> > this is acceptable (and possibly even originally intended) because
> > giving up on syncing is about the best thing we can do in this
> > situation, anyway. And it is certainly a much better choice than
> > halting the whole system due to an infinite loop.
> >
> > I've attached an updated version of the patch for 'vnode_pager.c'. On
> > my test system it resolved the issue. Please let us know whether it
> > works for you as well.
>
> Sorry for the late response: I was ill and had no access to the test machine.
> I applied the patch to a clean 4.10. The result is the same: the process
> could not be killed, file system access is very limited, and the system
> became unresponsive.

Sorry, I applied the patch but forgot to rebuild the kernel :).
It seems that the patch resolves the problem: the program exits and the
system keeps working. I ran it several times. I will also run buildworld
on this system to make sure that the program did not affect the VM.


Igor Sysoev
http://sysoev.ru/en/