From owner-freebsd-stable@FreeBSD.ORG Sat Oct 9 21:26:44 2004
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 2D8C216A4CE
	for ; Sat, 9 Oct 2004 21:26:44 +0000 (GMT)
Received: from gen129.n001.c02.escapebox.net (gen129.n001.c02.escapebox.net [213.73.91.129])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 9E02B43D39
	for ; Sat, 9 Oct 2004 21:26:43 +0000 (GMT)
	(envelope-from gemini@geminix.org)
Message-ID: <4168578F.7060706@geminix.org>
Date: Sat, 09 Oct 2004 23:26:39 +0200
From: Uwe Doering
Organization: Private UNIX Site
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.2) Gecko/20041002
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: stable@freebsd.org
References:
In-Reply-To:
Content-Type: multipart/mixed; boundary="------------060704090007080900030007"
Received: from gemini by geminix.org with asmtp (TLSv1:AES256-SHA:256)
	(Exim 3.36 #1) id 1CGOjh-000GGr-00; Sat, 09 Oct 2004 23:26:42 +0200
Subject: Re: vnode_pager_putpages errors and DOS?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 09 Oct 2004 21:26:44 -0000

This is a multi-part message in MIME format.
--------------060704090007080900030007
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Robert Watson wrote:
> On Fri, 8 Oct 2004, Steve Shorter wrote:
>
>>> I have some machines that run customers cgi stuff.
>>>These machines have started to hang and become unresponsive.
>>>At first I thought it was a hardware issue, but I discovered in
>>>a cyclades log the following stuff that got logged to the
>>>console which explains the cause of the system hangs/failures.
>>>
>>>vnode_pager_putpages: residual I/O 65536 at 347
>>>vnode_pager_putpages: I/O error 28]
>>>vnode_pager_putpages: residual I/O 65536 at 285]
>>
>> Aha! also at the same time I get in syslog
>>
>> /kernel: pid 6 (syncer), uid 0 on /chroot/tmp: file system full
>>
>> Whats happening? Can a full filesystem bring the thing down?
>>Ideas? Fixes?
>
> Ideally not, but many UNIX programs respond poorly to being out of memory
> and disk space ("No space, wot?"). Are you using a swap file, and if so,
> how did you create the swapfile? Are you using sparse files much?

I wonder whether the unresponsiveness is actually just the result of the
kernel spending most of the time in printf(), generating warning
messages. vnode_pager_generic_putpages() doesn't return any error in
case of a write failure, so the caller (syncer in this case) isn't aware
that the paging out failed, that is, it is supposed to carry on as if
nothing happened.

So how about limiting the number of warnings to one per second? UFS has
similar code in order to curb "file system full" and the like.

Please consider trying the attached patch, which applies cleanly to
4-STABLE. It won't make the actual application causing these errors any
happier, but it may eliminate the DoS aspect of the issue.
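
The once-per-second throttle can also be tried outside the kernel. The
following is only a userland sketch under stated assumptions, not the
patch itself: ratelimit_ok() and log_write_error() are hypothetical
names, and the caller passes the current time in seconds explicitly,
standing in for the kernel's time_second.

```c
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not part of the patch): return 1 if a message
 * may be printed at time `now`, 0 if one was already printed during
 * that same second.  `*last` remembers the second of the last message.
 */
static int
ratelimit_ok(time_t *last, time_t now)
{
	if (*last == now)
		return (0);
	*last = now;
	return (1);
}

/*
 * Report a write error at most once per second.
 * Returns 1 if a message was printed, 0 if it was skipped.
 */
static int
log_write_error(int error, time_t now)
{
	static time_t last_elog;	/* second of the last error message */

	if (error == 0 || !ratelimit_ok(&last_elog, now))
		return (0);
	printf("putpages: I/O error %d\n", error);
	return (1);
}
```

Calling log_write_error(28, now) repeatedly within the same second
prints a single line; the next message can appear only once the
seconds counter has advanced. Each kind of warning needs its own
timestamp so that one message type does not suppress the other.
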

    Uwe
-- 
Uwe Doering         |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net

--------------060704090007080900030007
Content-Type: text/plain; name="vnode_pager.c.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="vnode_pager.c.diff"

--- src/sys/vm/vnode_pager.c.orig	Fri Oct 31 11:39:38 2003
+++ src/sys/vm/vnode_pager.c	Sun Feb 15 02:38:21 2004
@@ -955,6 +955,7 @@
 	struct iovec aiov;
 	int error;
 	int ioflags;
+	static int last_elog, last_rlog;
 
 	object = vp->v_object;
 	count = bytecount / PAGE_SIZE;
@@ -1035,10 +1036,12 @@
 	cnt.v_vnodeout++;
 	cnt.v_vnodepgsout += ncount;
 
-	if (error) {
+	if (error && last_elog != time_second) {
+		last_elog = time_second;
 		printf("vnode_pager_putpages: I/O error %d\n", error);
 	}
-	if (auio.uio_resid) {
+	if (auio.uio_resid && last_rlog != time_second) {
+		last_rlog = time_second;
 		printf("vnode_pager_putpages: residual I/O %d at %lu\n",
 		    auio.uio_resid, (u_long)m[0]->pindex);
 	}

--------------060704090007080900030007--