From: "Arno J. Klaassen"
To: freebsd-current@freebsd.org
Date: 27 May 2006 02:25:26 +0200
Subject: indefinite wait buffer

Hello,

we use FreeBSD, among other things, for scientific calculations, and ran
into the 'indefinite wait buffer' problem on ordinary swap/dump devices.
For us the swap overhead is justified because it lets us treat larger
data sets while the processes remain, roughly speaking, CPU-bound rather
than I/O(swap)-bound.

On recent RELENG_6, however, this fails: for sure on scrappy ATA devices,
and rather easily on SCSI devices as well, though those seem to survive
'a couple of' 'indefinite wait buffer' warnings.

I tested today on an amd64 notebook with 1G physmem and 4G swap on an
off-the-shelf ATA disk. I wrote the following code:

    #include <stdio.h>
    #include <stdlib.h>

    /* Not shown in the original mail; assumed to be 1 MB, since the
       argument is given in megabytes. */
    #define M_SIZE (1024UL * 1024UL)

    int
    main(int argc, char **argv)
    {
        unsigned long maxpage, i;
        int *base, *ptr;
        int iter = 0;               /* loop counter, printed below */

        if (argc < 2) {
            fprintf(stderr, "usage: %s megabytes\n", argv[0]);
            exit(1);
        }

        /* FreeBSD malloc(3) flags: abort on warnings, junk-fill memory */
        _malloc_options = "AJ";

        maxpage = strtoul(argv[1], NULL, 10) * M_SIZE;
        fprintf(stderr, "Allocing %lu Bytes\n", maxpage);
        base = malloc(maxpage);
        if (base == NULL) {
            fprintf(stderr, "Jammer\n");
            exit(1);                /* do not touch a NULL buffer */
        }

        /* Touch every int of the buffer, over and over. */
        for (;;) {
            ptr = base;
            for (i = 0; i < maxpage / sizeof(int); i++)
                *(ptr++) += 1;
            fprintf(stderr, "Loop <%d> done\n", iter);
            iter++;
        }
    }

Calling this (on RELENG_6) with an argument between 1024 and 1500
deadlocks the notebook within a few minutes; after reboot there is an
'indefinite wait buffer' message in /var/log/messages.

After some fiddling I came up with the attached amateurish patch:
swap_pager.c has a (heuristically chosen, I suppose) timeout of 20
seconds for an msleep call; I replaced it with a timeout based on a
presupposed pessimistic minimal throughput of the swap device multiplied
by the minimum of swap size and physmem. With this patch I can run the
above code without deadlock even with 4096 (Meg) as argument.
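The timeout calculation described above can be sketched in userland C as follows. This is only an illustration under stated assumptions, not the code from the attached patch: the name swap_timeout_secs and its parameters are hypothetical, the throughput is applied as a divisor (bytes divided by bytes per second, which gives the same number as the patch for its default of 1 MB/s), and 64-bit arithmetic is used so that page count times page size does not overflow for multi-gigabyte sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lower bound, mirroring the 60 second floor in the patch. */
enum { MIN_TIMEOUT_SECS = 60 };

/*
 * Hypothetical helper (not part of the patch): seconds to wait for a
 * swap read, assuming the device sustains at least mb_per_sec MB/s and
 * that at most min(swap, physmem) bytes have to be moved.
 */
static uint64_t
swap_timeout_secs(uint64_t swap_pages, uint64_t phys_pages,
    uint64_t page_size, uint64_t mb_per_sec)
{
	const uint64_t mb = 1024 * 1024;
	uint64_t pages, secs;

	assert(mb_per_sec > 0 && page_size > 0);

	/* The smaller of swap size and physical memory bounds the work. */
	pages = swap_pages < phys_pages ? swap_pages : phys_pages;

	/* bytes / (bytes per second) = seconds */
	secs = (pages * page_size) / (mb * mb_per_sec);

	return (secs < MIN_TIMEOUT_SECS ? MIN_TIMEOUT_SECS : secs);
}
```

For the notebook above (1G physmem, 4G swap), assuming 4096-byte pages and 1 MB/s, swap_timeout_secs(1048576, 262144, 4096, 1) gives 1024 seconds.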
Two remarks:

1. I abuse linux.ko/linprocfs to correctly initialise my code.
2. This is no solution to the real 'indefinite wait buffer' problem,
   since at shutdown it still panics with 'swap_pager_force_pagein: read
   from swap failed', but at least it keeps the system functional while
   working.

I hope someone can comment on this idea.

Best regards,

Arno

Attachment: swap.patch

Index: sys/vm/swap_pager.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/swap_pager.c,v
retrieving revision 1.273.2.2
diff -r1.273.2.2 swap_pager.c
285a286,288
> static int MB_per_sec = 1; /* be pessimist, might be sysctl'ed */
> static int timo_secs = 60; /* for msleep() in swap_pager_getpages() */
>
401a405
>
604a609
>
1104c1109
< if (msleep(mreq, &vm_page_queue_mtx, PSWP, "swread", hz*20)) {
---
> if (msleep(mreq, &vm_page_queue_mtx, PSWP, "swread", hz*timo_secs)) {
1106,1107c1111,1114
< "swap_pager: indefinite wait buffer: bufobj: %p, blkno: %jd, size: %ld\n",
< bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
---
> "swap_pager: wait buffer timeout (%d secs): bufobj: %p, blkno: %jd, size: %ld\n",
> timo_secs, bp->b_bufobj, (intmax_t)bp->b_blkno, bp->b_bcount);
> /* wait & pray ... respect mutex */
> msleep(mreq, &vm_page_queue_mtx, PSWP, "swread", 0);
2240a2248
> int timo_secs_swap, timo_secs_physmem;
2248a2257,2265
>
> timo_secs_swap = MB_per_sec * (*total * PAGE_SIZE) / (1024*1024);
> timo_secs_physmem = MB_per_sec * (physmem * PAGE_SIZE) / (1024*1024);
> timo_secs = min(timo_secs_swap, timo_secs_physmem);
>
> if (timo_secs < 60) timo_secs=60;
> printf("ARNO timo_secs = <%d>.\n", timo_secs);
> printf("ARNO timo_secs_swap = <%d> timo_secs_physmem <%d>.\n",
> timo_secs_swap, timo_secs_physmem);