From: Doug Ambrisko
Message-Id: <200508120554.j7C5sZcx021634@ambrisko.com>
In-Reply-To: <20050810144125.GA32594@nevermind.kiev.ua>
To: Alexandr Kovalenko
Date: Thu, 11 Aug 2005 22:54:35 -0700 (PDT)
Cc: Stephan Uphoff, cvs-src@FreeBSD.org, src-committers@FreeBSD.org,
    cvs-all@FreeBSD.org
Subject: Re: cvs commit: src/sys/ufs/ffs ffs_softdep.c softdep.h

Alexandr Kovalenko writes:
| Hello, Stephan Uphoff!
|
| On Wed, Aug 10, 2005 at 02:09:26PM +0000, you wrote:
|
| > ups         2005-08-10 14:09:26 UTC
| >
| >   FreeBSD src repository
| >
| >   Modified files:        (Branch: RELENG_6)
| >     sys/ufs/ffs          ffs_softdep.c softdep.h
| >   Log:
| >   MFC ffs_softdep.c 1.182, softdep.h 1.18
| >
| >   Delay freeing disk space for file system blocks until all
| >   dirty buffers are safely released. This fixes softdep
| >   problems on truncation (deletion) of files with dirty
| >   buffers.
|
| Could this be the fix for the problem when unpacking large archives on
| soft-updates-enabled volumes? (I experience complete lockup of
| filesystem operations at some point of time during extracting files, for
| example - cd /usr/ports/editors/openoffice-1.1 && make extract)

Don't think so.  Different bug that I found:

Index: sys/kern/vfs_bio.c
===================================================================
RCS file: /usr/local/cvsroot/freebsd/src/sys/kern/vfs_bio.c,v
retrieving revision 1.493
diff -u -p -r1.493 vfs_bio.c
--- sys/kern/vfs_bio.c	3 Aug 2005 05:02:08 -0000	1.493
+++ sys/kern/vfs_bio.c	12 Aug 2005 05:35:16 -0000
@@ -1646,6 +1646,9 @@ getnewbuf(int slpflag, int slptimeo, int
 	 * async I/O rather then sync I/O.
 	 */
 
+	/* XXX DJA hack prevent fragmentation problems for now */
+	maxsize = MAXBSIZE;
+
 	atomic_add_int(&getnewbufcalls, 1);
 	atomic_subtract_int(&getnewbufrestarts, 1);
 restart:
@@ -1690,6 +1693,14 @@ restart:
 			nqindex = QUEUE_EMPTY;
 			nbp = TAILQ_FIRST(&bufqueues[QUEUE_EMPTY]);
 		}
+		if (nbp == NULL && curthread->td_proc == bufdaemonproc) {
+			if (defrag == 0 && bufspace + maxsize < maxbufspace){
+				printf("buf daemon has some potential space %d\n",maxbufspace-(bufspace + maxsize));
+				nqindex = QUEUE_EMPTY;
+				nbp = TAILQ_FIRST(&bufqueues[QUEUE_EMPTY]);
+			}
+		}
+
 	}
 
 	/*
@@ -1882,6 +1893,8 @@ restart:
 		mtx_lock(&nblock);
 		needsbuffer |= flags;
 		while (needsbuffer & flags) {
+			if (curthread->td_proc == bufdaemonproc)
+				panic("Buffer Daemon lockup %s\n", waitmsg);
 			if (msleep(&needsbuffer, &nblock,
 			    (PRIBIO + 4) | slpflag, waitmsg, slptimeo)) {
 				mtx_unlock(&nblock);

The problem is that the buffer pool memory space has extra space set
aside for the buffer daemon to use in case of emergency.  It may need
to use that space in order to clear some space.  If all of the space is
used up, the buffer daemon ends up waiting on itself, which never
completes.  That is the easy thing to fix: if the caller is the buffer
daemon, let it use the memory that has been reserved for it for
emergencies.

Now the tougher problem is that later on, when the buffer memory gets
fragmented and the buffer daemon needs to go into the emergency area,
it fails due to fragmentation and waits on itself again :-(  Since the
buffer pool points to blocks of memory, it is difficult to defrag the
space via garbage collection.  Remember we can't flush it, which is
what the buffer daemon is trying to do, since it needs space to do
that.  So we'd have to walk the buffer pool, find adjacent free slots
and combine them into a bigger one, or trickle things down to make a
large contiguous free space again.  Since the buffer daemon tends to
make larger than average size requests, things get fragmented easily.
So I just force everything to MAXBSIZE, so effectively there can be no
fragmentation, with the trade-off of fewer than the maximum number of
active buffer pools :-(  But at least the buffer daemon stops waiting on
itself and doesn't lock up.  You can see the lockup via DDB and ps.
Over time it gets worse.  Most people start to reboot before that
happens.

Advanced readers can figure out the defrag problem!  It doesn't appear
easy to me.

Yes, I have some debug printf's in it to confirm it is surviving versus
locking up.

We found this problem untar'ing large archives containing lots of
little files, with a smaller nbuf.  A good example was an untar of a
tar'ed up image of FreeBSD including src, ports and all CVS metadata.

Please try this patch and let me know if it helps.  Watch for:
	buf daemon has some potential space
in dmesg etc.

I haven't tested this version much in -current, but the same basic code
saves our 4.X systems.  Also, I'll be gone until Mon.

Thanks,

Doug A.