From owner-freebsd-bugs@FreeBSD.ORG Sun Aug 14 09:10:13 2005
Message-Id:
	<200508140906.j7E96bPI018881@freefall.freebsd.org>
Date: Sun, 14 Aug 2005 09:06:37 GMT
From: Ade Lovett
Reply-To: Ade Lovett
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: kern/84903: Incorrect initialization of nswbuf
X-Send-Pr-Version: 3.113

>Number:         84903
>Category:       kern
>Synopsis:       Incorrect initialization of nswbuf
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Aug 14 09:10:11 GMT 2005
>Closed-Date:
>Last-Modified:
>Originator:     Ade Lovett
>Release:        All FreeBSD > 5.0
>Organization:
Supernews
>Environment:
Any FreeBSD system (RELENG_5, RELENG_6, and HEAD) after revision 1.132 of sys/vm/vnode_pager.c (4 years, 1 month ago)
>Description:
Whilst attempting to nail down some serious performance issues (compared with 4.x) in preparation for a 6.x rollout here, we've come across something of a fundamental bug.

In this particular environment (a Usenet transit server, so very high network and disk I/O), we observed that processes were spending a considerable amount of time in state 'wswbuf', which we traced back to getpbuf() in vm/vm_pager.c.

To cut a long story short, the order in which nswbuf is being initialized is completely, totally, and utterly wrong -- this was introduced by revision 1.132 of vm/vnode_pager.c just over 4 years ago.

In vnode_pager.c we find:

	static void
	vnode_pager_init(void)
	{
		vnode_pbuf_freecnt = nswbuf / 2 + 1;
	}

Unfortunately, nswbuf hasn't been assigned yet at this point and is still zero (in all cases). The result is vnode_pbuf_freecnt = 0 / 2 + 1 = 1, so the kernel believes that there is only ever *one* swap buffer available.
kern_vfs_bio_buffer_alloc() in kern/vfs_bio.c, which actually does the calculation and assignment, is called much later in the boot process, by which time the damage has been done.

The net result is that *any* calls involving getpbuf() will be unconditionally serialized, completely destroying any kind of concurrency (and performance).

Given the memory footprint of our machines, we've hacked a simple

	nswbuf = 0x100;

into vnode_pager_init(), since the calculation ends up giving us the maximum value, 0x100 (256), anyway. There are a number of possible 'correct' fixes in terms of re-ordering the startup sequence.

With the aforementioned hack, we're now seeing considerably better machine operation, certainly as good as similar 4.10-STABLE boxes.

As per $SUBJECT, this affects all of RELENG_5, RELENG_6, and HEAD, and should, IMO, be considered an absolutely required fix for 6.0-RELEASE.
>How-To-Repeat:
N/A
>Fix:
We have implemented a local hack as above, given that the memory footprint of the machines would result in the maximal value of nswbuf being assigned in any case. This is not a real fix, however.

Alexander Kabaev has offered the following solution, which appears to do the right thing, at least on RELENG_6/i386, the only type of machine I have easy access to for testing.

In my opinion, it would be a fatal error to release 6.0 in any shape or form without addressing this issue.
Index: vm_init.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vm_init.c,v
retrieving revision 1.46
diff -u -r1.46 vm_init.c
--- vm_init.c	25 Apr 2005 19:22:05 -0000	1.46
+++ vm_init.c	9 Aug 2005 01:59:12 -0000
@@ -124,7 +124,7 @@
 	vm_map_startup();
 	kmem_init(virtual_avail, virtual_end);
 	pmap_init();
-	vm_pager_init();
+	/* vm_pager_init(); */
 }
 
 void
Index: vm_pager.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vm_pager.c,v
retrieving revision 1.105
diff -u -r1.105 vm_pager.c
--- vm_pager.c	18 May 2005 20:45:33 -0000	1.105
+++ vm_pager.c	9 Aug 2005 01:59:55 -0000
@@ -202,6 +202,8 @@
 	struct buf *bp;
 	int i;
 
+	vm_pager_init();
+
 	mtx_init(&pbuf_mtx, "pbuf mutex", NULL, MTX_DEF);
 	bp = swbuf;
 	/*
>Release-Note:
>Audit-Trail:
>Unformatted: