Date: Fri, 05 Nov 2004 22:08:18 +0100
From: Uwe Doering <gemini@geminix.org>
To: Igor Sysoev <is@rambler-co.ru>
Cc: stable@freebsd.org
Subject: Re: vnode_pager_putpages errors and DOS?
Message-ID: <418BEBC2.3020304@geminix.org>
In-Reply-To: <20041104124616.S92154@is.park.rambler.ru>
References: <Pine.NEB.3.96L.1041009150440.93055O-100000@fledge.watson.org> <4168578F.7060706@geminix.org> <20041103191641.K63546@is.park.rambler.ru> <4189666A.9020500@geminix.org> <20041104124616.S92154@is.park.rambler.ru>
Igor Sysoev wrote:
> [...]
> I've tried your patch from second email (it requires to include
> <sys/conf.h> for devsw and D_DISK): the system also became unresponsible.
>
> The main problem is that I could not kill the offending process - it
> stuck in biowr state.

In the meantime I've investigated this further. The two patches I have provided so far certainly have their merits, since they deal with some unwanted side effects. However, I found that the root cause of the eventual system lock-up lies elsewhere.

In an earlier email I pointed out that vnode_pager_generic_putpages() doesn't care whether the write operation failed or not: it always returns VM_PAGER_OK. If the write succeeds, the file system code makes sure that the formerly dirty pages associated with the I/O buffer get marked clean. If the write fails, however, for instance because the disk is full, the pages are left dirty, and since VM_PAGER_OK signals success, the caller does not touch them either. At this point the syncer enters an infinite loop, trying to flush the same dirty pages to disk over and over again.

The fix is actually quite simple. In case of a write error we have to make sure ourselves that the associated pages get marked clean. We do this by returning VM_PAGER_BAD instead of VM_PAGER_OK. The caller handles these two result codes identically, except that for VM_PAGER_BAD it additionally marks the respective page clean. For the details, please have a look at the caller, vm_pageout_flush() in 'vm_pageout.c'.

In other words, after a write error the affected pages remain intact in memory until they get recycled, but their modified contents never reach the copy on disk.
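A minimal user-space model of the failure mode described above may help. It is not kernel code: toy_page, toy_putpages(), toy_flush(), toy_syncer() and the TOY_PAGER_* constants are hypothetical stand-ins for the page dirty bit, vnode_pager_generic_putpages(), vm_pageout_flush() and the syncer. The model assumes only what is stated above, namely that the file system cleans the pages after a successful write and that VM_PAGER_BAD differs from VM_PAGER_OK solely by additionally undirtying the page.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for VM_PAGER_OK and VM_PAGER_BAD; the real
 * constants live in the kernel's <vm/vm_pager.h>.
 */
enum toy_pager_result { TOY_PAGER_OK, TOY_PAGER_BAD };

/* A toy page: only the dirty bit matters for this model. */
struct toy_page {
	bool dirty;
};

/*
 * Models the return value of vnode_pager_generic_putpages().
 * 'write_error' stands in for the error returned by the file system
 * write.  The unpatched code reports OK no matter what; the patched
 * code reports BAD after a failed write.
 */
static enum toy_pager_result
toy_putpages(int write_error, bool patched)
{
	if (patched && write_error != 0)
		return TOY_PAGER_BAD;
	return TOY_PAGER_OK;
}

/*
 * Models one flush of one page, following the description of
 * vm_pageout_flush() above: a successful write leaves the page clean
 * (the file system takes care of that), TOY_PAGER_BAD makes the
 * flusher clean the page itself, and TOY_PAGER_OK after a failed
 * write leaves the page dirty.
 */
static void
toy_flush(struct toy_page *pg, int write_error, bool patched)
{
	enum toy_pager_result r = toy_putpages(write_error, patched);

	if (write_error == 0)
		pg->dirty = false;	/* cleaned by the file system */
	else if (r == TOY_PAGER_BAD)
		pg->dirty = false;	/* give up: drop the on-disk update */
	/* else: failure reported as OK, the page stays dirty */
}

/*
 * A syncer-like loop: keep flushing while the page is dirty and
 * return the number of passes.  The 'limit' cap exists only so the
 * model terminates; the real syncer has no such cap, hence the lock-up.
 */
static int
toy_syncer(bool disk_full, bool patched, int limit)
{
	struct toy_page pg = { .dirty = true };
	int passes = 0;

	assert(limit > 0);
	while (pg.dirty && passes < limit) {
		toy_flush(&pg, disk_full ? ENOSPC : 0, patched);
		passes++;
	}
	return passes;
}
```

With every write failing, toy_syncer(true, false, 1000) only stops at the artificial cap of 1000 passes, while toy_syncer(true, true, 1000) returns after a single pass.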
I believe losing the on-disk update is acceptable (and possibly even what was originally intended), because giving up on syncing is about the best we can do in this situation anyway. It is certainly a much better choice than halting the whole system in an infinite loop.

I've attached an updated version of the patch for 'vnode_pager.c'. It resolved the issue on my test system. Please let us know whether it works for you as well.

Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net

Attachment: vnode_pager.c.diff

--- src/sys/vm/vnode_pager.c.orig	Tue Dec 31 10:34:51 2002
+++ src/sys/vm/vnode_pager.c	Fri Nov 5 20:41:15 2004
@@ -954,7 +954,9 @@
 	struct uio auio;
 	struct iovec aiov;
 	int error;
+	int status;
 	int ioflags;
+	static int last_elog, last_rlog;
 
 	object = vp->v_object;
 	count = bytecount / PAGE_SIZE;
@@ -1035,15 +1037,18 @@
 	cnt.v_vnodeout++;
 	cnt.v_vnodepgsout += ncount;
 
-	if (error) {
+	if (error && last_elog != time_second) {
+		last_elog = time_second;
 		printf("vnode_pager_putpages: I/O error %d\n", error);
 	}
-	if (auio.uio_resid) {
+	if (auio.uio_resid && last_rlog != time_second) {
+		last_rlog = time_second;
 		printf("vnode_pager_putpages: residual I/O %d at %lu\n",
 		    auio.uio_resid, (u_long)m[0]->pindex);
 	}
+	status = error ? VM_PAGER_BAD : VM_PAGER_OK;
 	for (i = 0; i < ncount; i++) {
-		rtvals[i] = VM_PAGER_OK;
+		rtvals[i] = status;
 	}
 	return rtvals[0];
 }
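Besides the return-value change, the patch limits each of the two diagnostic printf() calls to at most one message per second by remembering the value of time_second at the last message, presumably so that a syncer retrying a failing write cannot flood the console. The user-space sketch below isolates that idiom. The names log_throttle, throttle_allow() and count_printed() are hypothetical, and the caller passes the current second explicitly instead of reading the kernel's time_second.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper isolating the idiom from the patch:
 *
 *	if (error && last_elog != time_second) {
 *		last_elog = time_second;
 *		printf(...);
 *	}
 *
 * 'now' plays the role of the kernel's time_second.  As in the patch,
 * the state starts at 0, so a message stamped with second 0 would be
 * suppressed; that does not matter for wall-clock seconds.
 */
struct log_throttle {
	time_t last;	/* second in which the last message was emitted */
};

/* Returns true if a message may be printed during second 'now'. */
static bool
throttle_allow(struct log_throttle *t, time_t now)
{
	if (t->last == now)
		return false;	/* already printed one this second */
	t->last = now;
	return true;
}

/* Counts how many of 'n' events, stamped with 'stamps', get printed. */
static int
count_printed(const time_t *stamps, int n)
{
	struct log_throttle t = { .last = 0 };
	int printed = 0;

	assert(stamps != NULL || n == 0);
	for (int i = 0; i < n; i++) {
		if (throttle_allow(&t, stamps[i]))
			printed++;
	}
	return printed;
}
```

Six error events stamped 100, 100, 100, 101, 101 and 105 yield three printed messages, one per distinct second. In the patch only the console output is thinned out; the result code is still computed from the error on every call.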