From: "Jim C. Nasby"
To: Uwe Doering
Date: Sun, 18 Apr 2004 21:20:43 -0500
Subject: Re: How does disk caching work?

On Sat, Apr 17, 2004 at 09:41:19AM +0200, Uwe Doering wrote:
> The disk i/o buffers you refer to (the 'Buf' column in 'top') are the
> actual interface between the VM system and the disk device drivers.
> For file and directory data, sets of VM pages get referred by and
> assigned to disk i/o buffers. There they are dealt with by a kernel
> daemon process that does the actual synchronization between VM and
> disks. That's where the soft updates algorithm is implemented, for
> instance.
>
> In the case of file and directory data, once the data has been
> written out to disk (if the memory pages were "dirty") the respective
> disk i/o buffer gets released immediately and can be recycled for
> other purposes, since it just referred to memory pages that continue
> to exist within the VM system.
>
> Metadata (inodes etc.) is a different matter, though. There is no VM
> representation for it, so for disk i/o it has to be cached in extra
> memory allocated for this purpose. A disk i/o buffer then refers to
> this memory range and tries to keep it around for as long as
> possible. A classical cache algorithm like LRU recycles these buffers
> and memory allocations eventually.
>
> As usual, the actual implementation is even more complex, but I think
> you've got a picture of how it works.

Yes, much clearer now, thanks! A few questions if I may...

What's a good way to tune the amount of space dedicated to I/O buffers?

What impact will vm_min|max_cache have on system performance? Is there
any advantage to setting it fairly high?

The machine I'm tuning is a dual Opteron box with 4 GB of RAM, a mirror
and a 6-disk RAID10. It's running PostgreSQL.

--
Jim C. Nasby, Database Consultant                    jim@nasby.net
Give your computer some brain candy!  www.distributed.net  Team #1828

Windows: "Where do you want to go today?"
Linux:   "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
From: "Jim C. Nasby"
To: Aaron Seelye
Date: Sun, 18 Apr 2004 21:22:39 -0500
Subject: Re: command piped into bzip not using all available CPU

Perhaps I didn't make it clear, but this is on a dual-CPU machine. I
would expect either bzip2 or pgsql to hit 100% CPU, using one entire
CPU. The 47% idle indicates to me that it's not.

On Fri, Apr 16, 2004 at 03:48:19PM -0700, Aaron Seelye wrote:
> I would venture a guess that bzip is not multi-threaded and therefore
> isn't spreading the load around.
>
> -Aaron Seelye
>
> ----- Original Message -----
> From: "Jim C. Nasby"
> Sent: Friday, April 16, 2004 3:05 PM
> Subject: command piped into bzip not using all available CPU
>
> As you can see below, a command piped into bzip2 is only effectively
> using one CPU. It's not disk bound; both systat and gstat report less
> than 10% disk utilization. Why is this?
>
> The command I'm running is:
>
>     pg_dump -vZ0 ogr | bzip2 > ogr-20040416.sql.bz2
>
> last pid: 18345;  load averages: 1.17, 1.09, 0.81   up 8+22:12:27  17:00:56
> 66 processes:  2 running, 64 sleeping
> CPU states: 49.4% user, 0.0% nice, 3.7% system, 0.2% interrupt, 46.7% idle
> Mem: 67M Active, 2935M Inact, 359M Wired, 331M Cache, 255M Buf, 5576K Free
> Swap: 8192M Total, 64M Used, 8127M Free, 48K Out
>
>   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> 17334 decibel  109    0 10856K  7164K CPU0   0  11:05 65.77% 65.77% bzip2
> 17335 pgsql      4    0   154M   124M sbwait 0   5:54 34.03% 34.03% postgres
> 17333 decibel   -8    0 20128K  3236K pipdwt 0   0:46  2.88%  2.88% pg_dump
From: Uwe Doering
To: freebsd-performance@freebsd.org
Date: Mon, 19 Apr 2004 08:37:52 +0200
Subject: Re: How does disk caching work?

Jim C. Nasby wrote:
> On Sat, Apr 17, 2004 at 09:41:19AM +0200, Uwe Doering wrote:
> [...]
> A few questions if I may...
>
> What's a good way to tune the amount of space dedicated to I/O
> buffers?

You can tune the number of i/o buffers, and therefore indirectly the
amount of memory they may allocate, by using the variable 'kern.nbuf'
in '/boot/loader.conf'. Note that this number gets multiplied by 16384
(the default filesystem block size) to arrive at the amount of memory
it results in.

My experience is that with large amounts of RAM this area becomes
unduly big, though. It's not that you have to skimp on RAM in this
environment, but the disk i/o buffers eat away at the KVM region
(kernel virtual memory), which happens to be just 1 GB by default and
doesn't grow with the RAM size. So it can be a good idea to actually
reduce the number of disk i/o buffers (compared to the auto-scaled
default) on systems with plenty of RAM, since you don't need that many
buffers anyway due to the VM interaction I just described, and save
the available KVM for other purposes (kernel resources). Systems that
run out of KVM are prone to kernel panics, given the right combination
of circumstances.
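To make the arithmetic concrete, a minimal '/boot/loader.conf' sketch
(the value is illustrative only; 4096 is the figure mentioned later in
this thread for a 2 GB machine):

    # /boot/loader.conf
    # 4096 buffers * 16384 bytes (default filesystem block size)
    # = 64 MB of KVM dedicated to disk i/o buffers
    kern.nbuf="4096"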
> What impact will vm_min|max_cache have on system performance? Is
> there any advantage to setting it fairly high?

I'm not quite sure which variables you are referring to. In FreeBSD
there are 'vm.v_cache_min' and 'vm.v_cache_max'. I don't recommend
tuning them, though, without having a very deep and thorough look at
the kernel sources. Many of these variables don't really do what their
name suggests, and there are interdependencies between some of them.
You can lock up your server by tuning them improperly.

> The machine I'm tuning is a dual Opteron box with 4 GB of RAM, a
> mirror and a 6-disk RAID10. It's running PostgreSQL.

I'm not a PostgreSQL expert, but there have been discussions on this
mailing list and elsewhere about tuning PostgreSQL. I suggest taking a
look at the archives.

     Uwe
--
Uwe Doering  |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net

From: "Aaron Seelye"
To: "Jim C. Nasby"
Date: Mon, 19 Apr 2004 00:08:32 -0700
Subject: Re: command piped into bzip not using all available CPU

I'm not sure of the exact technical reason, but as I understand it,
that's 47% idle of the total CPU power of the machine, which would
indicate that one CPU was 100% busy and the other was at 3%, due to
system usage, i/o, or whatever else was running. This is quite normal
in my experience, and what you should expect to see.

-Aaron
From: "J.D. Bronson"
To: freebsd-performance@freebsd.org
Cc: freebsd-questions@freebsd.org
Date: Mon, 19 Apr 2004 08:45:50 -0500
Subject: etherchannel on 5.2.1 - possible?

I am looking for performance, not fail-over.

Does anyone have this working with either Intel or Broadcom NICs?

Does anyone have a good site that talks about what is needed to make
this work, as well? I do have a Cisco switch and it fully supports
this.

I need a little advice on setting this up...

Thanks in advance!
-JBD

From: "Jim C. Nasby"
To: Aaron Seelye
Date: Mon, 19 Apr 2004 09:09:21 -0500
Subject: Re: command piped into bzip not using all available CPU

Why would I expect to see it use only one CPU? It was CPU bound, not
disk bound. There were two CPU-intensive processes running; why
wouldn't they each use a different CPU?

On Mon, Apr 19, 2004 at 12:08:32AM -0700, Aaron Seelye wrote:
> I'm not sure of the exact technical reason, but as I understand it,
> that's 47% idle of the total CPU power of the machine, which would
> indicate that one CPU was 100% busy and the other was at 3%, due to
> system usage, i/o, or whatever else was running. This is quite normal
> in my experience, and what you should expect to see.
From: "Steven Hartland"
To: freebsd-performance@freebsd.org, "J.D. Bronson"
Date: Mon, 19 Apr 2004 15:12:35 +0100
Subject: Re: etherchannel on 5.2.1 - possible?
Bronson" References: <6.1.0.6.2.20040419084442.0244d618@localhost> Date: Mon, 19 Apr 2004 15:12:35 +0100 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1409 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409 X-Spam-Processed: multiplay.co.uk, Mon, 19 Apr 2004 15:11:55 +0100 (not processed: message from valid local sender) X-MDRemoteIP: 212.135.219.179 X-Return-Path: killing@multiplay.co.uk X-MDAV-Processed: multiplay.co.uk, Mon, 19 Apr 2004 15:11:58 +0100 cc: freebsd-questions@freebsd.org Subject: Re: etherchannel on 5.2.1 - possible? X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Apr 2004 14:12:51 -0000 When I checked the various methods a while back all resulted in performance drop not increase on a dual port intel etherxpress Pro 100. If anyone has different experiences I also would be interested. Also noted was that Gb performance on nge was also lower than that of fxp on 100Mb especially when using link0 on the fxp. Notes: Results using ftp ( proftpd ). Performance increase using nge was obtained when using single processor kernel and polling enabled without this interrupt rates appeared to be overloading the system. Steve ----- Original Message ----- From: "J.D. Bronson" To: Cc: Sent: Monday, April 19, 2004 2:45 PM Subject: etherchannel on 5.2.1 - possible? > I am looking for performance. Not fail-over.. > > Does anyone have this working with either > intel or broadcom nics? > > Anyone have any good site that talks about what is needed to make this work > as well? - I do have a Cisco switch and it fully supports this. > > I need a little advice on setting this up... > > Thanks in advance! > > -JBD > > _______________________________________________ > freebsd-performance@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-performance > To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org" > > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone (023) 8024 3137 or return the E.mail to postmaster@multiplay.co.uk. From owner-freebsd-performance@FreeBSD.ORG Mon Apr 19 08:16:20 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 1BB4616A4CE for ; Mon, 19 Apr 2004 08:16:20 -0700 (PDT) Received: from flake.decibel.org (flake.decibel.org [66.143.173.58]) by mx1.FreeBSD.org (Postfix) with SMTP id 9EE7E43D45 for ; Mon, 19 Apr 2004 08:16:19 -0700 (PDT) (envelope-from decibel@decibel.org) Received: (qmail 68482 invoked by uid 1001); 19 Apr 2004 15:16:17 -0000 Date: Mon, 19 Apr 2004 10:16:16 -0500 From: "Jim C. 
Nasby" To: Uwe Doering Message-ID: <20040419151616.GT87362@nasby.net> References: <20040416163845.GG87362@nasby.net> <20040416221211.GM87362@nasby.net> <4080DF9F.3040302@geminix.org> <20040419022043.GO87362@nasby.net> <408373C0.7080502@geminix.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <408373C0.7080502@geminix.org> X-Operating-System: FreeBSD 4.9-RELEASE-p3 i386 X-Distributed: Join the Effort! http://www.distributed.net User-Agent: Mutt/1.5.6i cc: freebsd-performance@freebsd.org Subject: Re: How does disk caching work? X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 19 Apr 2004 15:16:20 -0000 On Mon, Apr 19, 2004 at 08:37:52AM +0200, Uwe Doering wrote: > Jim C. Nasby wrote: > >On Sat, Apr 17, 2004 at 09:41:19AM +0200, Uwe Doering wrote: > >[...] > >A few questions if I may... > > > >What's a good way to tune amount of space dedicated to IO buffers? > > You can tune the number of i/o buffers, and therefore indirectly the > amount of memory they may allocate, by using the variable 'kern.nbuf' in > '/boot/loader.conf'. Note that this number gets multiplied by 16384 > (the default filesystem block size) to arrive at the amount of memory it > results in. > > My experience is that with large amounts of RAM this area becomes > unduely big, though. It's not that you have to skimp on RAM in this > enviroment, but the disk i/o buffers eat away at the KVM region (kernel > virtual memory), which happens to be just 1 GB by default and doesn't > grow with the RAM size. So it can be a good idea to actually reduce the > number of disk i/o buffers (compared to its auto-scaled default) on > systems with plenty of RAM (since you don't need that many buffers, > anyway, due to the VM interaction I just described) and save the > available KVM rather for other purposes (kernel resources). Systems > that run out of KVM are prone to kernel panics, given the right > combination of circumstances. Yes, I was thinking the same thing. What I don't know is what would be a good value to use. dirtybuf in systat -v is typically less than 3000, which makes 261,000 buffer seem wasteful, but of course that's neglecting the read caching aspect. > >What impact will vm_min|max_cache have on system performance? Is there > >any advantage to setting it fairly high? > > I'm not quite sure which variables you are referring to. In FreeBSD > there are 'vm.v_cache_min' and 'vm.v_cache_max'. I don't recommend > tuning them, though, without having a very deep and thorough look at the > kernel sources. Many of these variables don't really do what their name > suggests, and there are interdependencies between some of them. You can > lock up your server by tuning them improperly. Sorry, I shouldn't have been lazy and actually looked up the settings. Yes, those are the settings I was reffering to. Someone else had cranked them up so that the machine was maintaining about 1.7G in cache; he said that he'd noticed a reduction in disk IO when he did that. I haven't been able to see any difference in disk IO, though it seems logical that setting cache too high would hurt write caching and actually increase disk IO. It's currently set to whatever the kernel thought best, so I'll just leave it there. > >The machine I'm tuning is a dual Opteron box with 4G of ram, a mirror > >and a 6 disk RAID10. It's running PostgreSQL. 
> I'm not a PostgreSQL expert, but there have been discussions on this
> mailing list and elsewhere about tuning PostgreSQL. I suggest taking
> a look at the archives.

Yes, I'm familiar with them. The big question that always seems to
come up is how the disk caching actually works, but I think that's
been cleared up now.

From: Uwe Doering
To: freebsd-performance@freebsd.org
Date: Mon, 19 Apr 2004 18:25:47 +0200
Subject: Re: How does disk caching work?

Jim C. Nasby wrote:
> On Mon, Apr 19, 2004 at 08:37:52AM +0200, Uwe Doering wrote:
>> [...]
> Yes, I was thinking the same thing. What I don't know is what would
> be a good value to use. dirtybuf in systat -v is typically less than
> 3000, which makes 261,000 KB of buffers seem wasteful, but of course
> that's neglecting the read-caching aspect.

With regard to the VM interaction I explained earlier, the same goes
for read caching. File and directory data is kept in VM objects
attached to the internal vnodes (files etc.) once read in. So large
quantities of disk i/o buffers aren't needed for read caching, either.

We have 'kern.nbuf="4096"' on our production systems with 2 GB RAM,
which results in a 64 MB disk i/o cache. These machines are used for
server hosting purposes and therefore run all sorts of applications at
the same time. Look at the URL in my signature for more details.

I unfortunately don't know how much buffer space a dedicated database
server would need, but I suspect that you won't notice any difference
between 256 MB (the default) and 64 MB (kern.nbuf="4096").

More interesting is probably to crank up 'vfs.hirunningspace' and
'vfs.lorunningspace' (both 'sysctl' variables) in order not to stall
write operations when there are plenty of outstanding read requests
waiting for completion -- FreeBSD's classical bottleneck on disk i/o
oriented servers. In case your RAID controller has a large i/o buffer
of its own (16 MB or more) you may want to use values in this range:

    vfs.hirunningspace=8388608
    vfs.lorunningspace=6291456

These are in bytes (see the '/etc/sysctl.conf' sketch at the end of
this message). Also, disable aggressive read-ahead in the controller,
if possible. This feature is for MS Windows and would be
counter-productive with FreeBSD. FreeBSD knows better than the
controller if and when to read ahead.

>>> What impact will vm_min|max_cache have on system performance? Is
>>> there any advantage to setting it fairly high?
>>
>> I'm not quite sure which variables you are referring to. In FreeBSD
>> there are 'vm.v_cache_min' and 'vm.v_cache_max'. [...]
>
> Sorry, I shouldn't have been lazy; I should have actually looked up
> the settings. Yes, those are the settings I was referring to. Someone
> else had cranked them up so that the machine was maintaining about
> 1.7 GB in cache; he said that he'd noticed a reduction in disk IO
> when he did that. I haven't been able to see any difference in disk
> IO, though it seems logical that setting the cache too high would
> hurt write caching and actually increase disk IO. It's currently set
> to whatever the kernel thought best, so I'll just leave it there.

Well, I'm afraid your colleague must have been imagining things. The
cache queue (the 'Cache' column in 'top') is just a phase in the
laundering procedure (VM page recycling) between the inactive queue
('Inact' in 'top') and the free queue ('Free' in 'top'). So these
variables have nothing to do with disk i/o performance.
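Putting the suggested values into '/etc/sysctl.conf' form (a minimal
sketch; the numbers are the ones from this message and assume a RAID
controller with a 16 MB or larger buffer, so treat them as a starting
point rather than a recommendation):

    # /etc/sysctl.conf
    # allow up to 8 MB of writes in flight before writers stall...
    vfs.hirunningspace=8388608
    # ...and resume queuing writes once the in-flight total
    # drops below 6 MB
    vfs.lorunningspace=6291456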
     Uwe
--
Uwe Doering  |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net

From: "Jim C. Nasby"
To: Uwe Doering
Date: Mon, 19 Apr 2004 13:23:45 -0500
Subject: Re: How does disk caching work?

Thanks very much for all your help. The only remaining question I have
is: is there something we can monitor to gauge what impact (if any)
changes to these settings have? Will changes to hi|lorunningspace just
show up as increased KB/s or TPS write performance in gstat? How can
we tell if we've raised them too far?

Likewise, what can we measure regarding nbuf? If I'm understanding
things correctly, runningspace comes out of nbuf, so obviously nbuf
needs to be greater than that, but what symptoms will we see if it's
set too low?

On Mon, Apr 19, 2004 at 06:25:47PM +0200, Uwe Doering wrote:
> With regard to the VM interaction I explained earlier, the same goes
> for read caching. File and directory data is kept in VM objects
> attached to the internal vnodes (files etc.) once read in. So large
> quantities of disk i/o buffers aren't needed for read caching,
> either.
> [...]
> More interesting is probably to crank up 'vfs.hirunningspace' and
> 'vfs.lorunningspace' (both 'sysctl' variables) in order not to stall
> write operations when there are plenty of outstanding read requests
> waiting for completion. [...] These are in bytes.
> Also, disable aggressive read-ahead in the controller, if possible.
> This feature is for MS Windows and would be counter-productive with
> FreeBSD. FreeBSD knows better than the controller if and when to
> read ahead.

From: "Will Saxon"
To: "J.D. Bronson", freebsd-performance@freebsd.org
Date: Mon, 19 Apr 2004 10:06:45 -0400
Subject: RE: etherchannel on 5.2.1 - possible?

> -----Original Message-----
> From: J.D. Bronson
> Sent: Monday, April 19, 2004 9:46 AM
> Subject: etherchannel on 5.2.1 - possible?
>
> I am looking for performance, not fail-over.
>
> Does anyone have this working with either Intel or Broadcom NICs?
>
> I need a little advice on setting this up...

I have used the ng_fec netgraph module with both Broadcom 5703X and HP
NC7170 NICs (the latter uses the em driver).

This is how to set it up. First you have to have the ng_fec module
loaded. Then:

    # ngctl mkpeer fec dummy fec
    # ngctl msg fec0: add_iface '"bge0"'
    # ngctl msg fec0: add_iface '"bge1"'

Obviously, replace bge with em or whatever other driver you are using.
ng_fec supports up to 4 links.

At this point you will have a fec0 interface that you can manipulate
normally with ifconfig.
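The steps above, gathered into a single sketch (the interface names
and address are placeholders; the kldload line assumes ng_fec is built
as a module, which is the usual case):

    #!/bin/sh
    # load the netgraph Fast EtherChannel module
    kldload ng_fec

    # create the fec0 bundle and attach two physical links
    ngctl mkpeer fec dummy fec
    ngctl msg fec0: add_iface '"bge0"'
    ngctl msg fec0: add_iface '"bge1"'

    # then configure the bundle like any normal interface
    # (192.0.2.10 is a placeholder address)
    ifconfig fec0 inet 192.0.2.10 netmask 255.255.255.0 up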
I have noticed that sometimes I have to bring the interface up and
down a couple of times to get it to start passing traffic. Whenever
you 'ifconfig up' or assign an address to fec0, it resets the bundle.

One thing that is annoying is that ng_fec doesn't work with vlans.
There is an ng_vlan module that was recently released, but ng_fec
doesn't work with it because it isn't quite like other netgraph
modules. Almost all of my FreeBSD machines use vlans, so I am not
making heavy use of ng_fec. We aren't pushing enough data to make it
really necessary anyway.

There is also ng_one2many, which does implement failover and channel
bonding, but not using the etherchannel technique. I think it uses
round robin.

-Will

From: "Igor Shmukler"
To: "Uwe Doering"
Date: Tue, 20 Apr 2004 02:06:29 +0400
Subject: Re: How does disk caching work?

> Well, I'm afraid your colleague must have been imagining things. The
> cache queue (the 'Cache' column in 'top') is just a phase in the
> laundering procedure (VM page recycling) between the inactive queue
> ('Inact' in 'top') and the free queue ('Free' in 'top'). So these
> variables have nothing to do with disk i/o performance.

I am not sure you are correct here; I understand things very
differently. While it is a fact that the number of pages in the cache
queue does not affect IO throughput, changing vm settings such as
vm.stats.vm.v_cache_min, vm.stats.vm.v_cache_max,
vm.stats.vm.v_free_target and vm.stats.vm.v_free_min should have an
effect on disk IO.

The very reason JD came up with cache pages is to minimize IO traffic.
If we require a larger number of free pages, we cause the OS to remove
references at an earlier point.
This should cause the kernel to re-read some of the pages that would
otherwise just have been requeued to the active queue.

Having a larger cache queue would require the VM to start cleaning
dirty pages earlier, which results in some additional write traffic as
well. However, this is not that bad, because here it is a zero-sum
game: if pages are to become free, they have to be written out
regardless of the cache queue size, just at a later point. However,
there is a benefit to a larger cache bucket. The upside is that if the
machine often experiences bursts in memory demand (pretty much any
real-world server does), you are able to accommodate the changing load
without blocking.

IS.

From: Scott Lambert
To: freebsd-performance@freebsd.org
Date: Mon, 19 Apr 2004 23:27:16 -0400
Subject: Re: command piped into bzip not using all available CPU

On Mon, Apr 19, 2004 at 09:09:21AM -0500, Jim C. Nasby wrote:
> Why would I expect to see it use only one CPU? It was CPU bound, not
> disk bound. There were two CPU-intensive processes running; why
> wouldn't they each use a different CPU?

At the time you took the snapshot, both processes were running on the
same CPU; note that the 'C' column in your top output reads 0 for all
three processes.

FreeBSD 4.x or 5.2? If 5.2, SCHED_4BSD or SCHED_ULE?
> > > The command I'm running is:
> > >
> > >     pg_dump -vZ0 ogr | bzip2 > ogr-20040416.sql.bz2
> > >
> > >   PID USERNAME PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> > > 17334 decibel  109    0 10856K  7164K CPU0   0  11:05 65.77% 65.77% bzip2
> > > 17335 pgsql      4    0   154M   124M sbwait 0   5:54 34.03% 34.03% postgres
> > > 17333 decibel   -8    0 20128K  3236K pipdwt 0   0:46  2.88%  2.88% pg_dump

--
Scott Lambert  KC5MLE  Unix SysAdmin  lambert@lambertfam.org

From: Uwe Doering
To: freebsd-performance@freebsd.org
Date: Tue, 20 Apr 2004 07:45:33 +0200
Subject: Re: How does disk caching work?

Jim C. Nasby wrote:
> Thanks very much for all your help. The only remaining question I
> have is: is there something we can monitor to gauge what impact (if
> any) changes to these settings have? Will changes to
> hi|lorunningspace just show up as increased KB/s or TPS write
> performance in gstat? How can we tell if we've raised them too far?

I don't know of any simple means of monitoring the effect, short of
doing your own benchmark tests. If these variables are too low you
will notice that write throughput suffers during peak read demand.
That's when we started to examine the kernel sources and found the
reason.

To be on the safe side, don't make 'vfs.hirunningspace' larger than a
fraction of the overall disk i/o buffer space (derived from
'kern.nbuf'), and also not larger than the buffer space in the disk
controller. 'vfs.lorunningspace' is used to implement a hysteresis and
should be 1/2 to 3/4 of 'vfs.hirunningspace'.

> Likewise, what can we measure regarding nbuf? If I'm understanding
> things correctly, runningspace comes out of nbuf, so obviously nbuf
> needs to be greater than that, but what symptoms will we see if it's
> set too low?

Maybe just bad performance due to the complete flushing of the disk
i/o buffers (remember that metadata is cached in there), maybe a
system lockup. Can't tell.
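One quick way to check the relationship described above is to read the
relevant values back with sysctl (read-only here; kern.nbuf counts
buffers of 16384 bytes by default, while the runningspace values are
in bytes):

    sysctl kern.nbuf vfs.hirunningspace vfs.lorunningspace
    # e.g. kern.nbuf=4096 means 4096 * 16384 = 64 MB of buffer cache,
    # so an 8 MB vfs.hirunningspace stays comfortably below it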
Just make sure that these variables stay smaller than the disk i/o
buffer cache, and you won't have to bother with the consequences of
overdoing it.

     Uwe
--
Uwe Doering  |  EscapeBox - Managed On-Demand UNIX Servers
gemini@geminix.org  |  http://www.escapebox.net

From: Uwe Doering
To: freebsd-performance@freebsd.org
Date: Tue, 20 Apr 2004 08:17:08 +0200
Subject: Re: How does disk caching work?

Igor Shmukler wrote:
> I am not sure you are correct here; I understand things very
> differently. While it is a fact that the number of pages in the cache
> queue does not affect IO throughput, changing vm settings such as
> vm.stats.vm.v_cache_min, vm.stats.vm.v_cache_max,
> vm.stats.vm.v_free_target and vm.stats.vm.v_free_min should have an
> effect on disk IO.
>
> The very reason JD came up with cache pages is to minimize IO
> traffic. If we require a larger number of free pages, we cause the OS
> to remove references at an earlier point. This should cause the
> kernel to re-read some of the pages that would otherwise just have
> been requeued to the active queue.
>
> Having a larger cache queue would require the VM to start cleaning
> dirty pages earlier, which results in some additional write traffic
> as well. However, this is not that bad, because here it is a zero-sum
> game: if pages are to become free, they have to be written out
> regardless of the cache queue size, just at a later point. However,
> there is a benefit to a larger cache bucket.
The upside is that if the machine often experiences bursts in memory demand (pretty much any real-world server would), you are able to accommodate changing load without blocking. Well, I didn't claim that the cache queue was useless. It does have its merits. And there is a certain default amount configured by the kernel's auto-scaling code already. What I was trying to point out is that these variables don't necessarily do what their name suggests. Take 'vm.v_cache_max', for example. When you crank that up, instead of increasing the size of the cache queue it is actually the inactive queue that grows in size. This is because the kernel steals pages from the inactive queue when it temporarily runs out of pages in the cache queue, without having to block for i/o as long as there are clean (not written to or already laundered) pages in the inactive queue. When it finds dirty pages during this scan it schedules them for background synchronization with the disk, but again without blocking in the foreground. The reason for this algorithm is that it is better to keep pages in the inactive queue for as long as possible, rather than moving them over to the cache queue prematurely. Pages in the inactive queue can still be mapped into the memory space of processes, while pages in the cache queue have lost this association. So, quite naturally, when the VM system has to reactivate a page (put it back into the active queue) this operation tends to be less expensive when the page is still in the inactive queue. So, for reasons like these, I keep recommending that you either study the kernel sources before you try to tune the VM system, or leave these variables alone. Uwe -- Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers gemini@geminix.org | http://www.escapebox.net From owner-freebsd-performance@FreeBSD.ORG Tue Apr 20 07:15:58 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 9C2CA16A4CE for ; Tue, 20 Apr 2004 07:15:58 -0700 (PDT) Received: from f20.mail.ru (f20.mail.ru [194.67.57.52]) by mx1.FreeBSD.org (Postfix) with ESMTP id EA89043D48 for ; Tue, 20 Apr 2004 07:15:57 -0700 (PDT) (envelope-from shmukler@mail.ru) Received: from mail by f20.mail.ru with local id 1BFw2W-000L2G-00; Tue, 20 Apr 2004 18:15:56 +0400 Received: from [24.184.137.35] by msg.mail.ru with HTTP; Tue, 20 Apr 2004 18:15:56 +0400 From: =?koi8-r?Q?=22?=Igor Shmukler=?koi8-r?Q?=22=20?= To: =?koi8-r?Q?=22?=Uwe Doering=?koi8-r?Q?=22=20?= Mime-Version: 1.0 X-Mailer: mPOP Web-Mail 2.19 X-Originating-IP: [24.184.137.35] Date: Tue, 20 Apr 2004 18:15:56 +0400 In-Reply-To: <4084C064.5080506@geminix.org> Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 8bit Message-Id: cc: freebsd-performance@freebsd.org Subject: Re[2]: How does disk caching work? X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: =?koi8-r?Q?=22?=Igor Shmukler=?koi8-r?Q?=22=20?= List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Apr 2004 14:15:58 -0000 > >>>Sorry, I shouldn't have been lazy and actually looked up the settings. > >>>Yes, those are the settings I was referring to. Someone else had cranked > >>>them up so that the machine was maintaining about 1.7G in cache; he said > >>>that he'd noticed a reduction in disk IO when he did that.
I haven't > >>>been able to see any difference in disk IO, though it seems logical that > >>>setting cache too high would hurt write caching and actually increase > >>>disk IO. It's currently set to whatever the kernel thought best, so I'll > >>>just leave it there. > >> > >>Well, I'm afraid your colleague must have been imagining things. The > >>cache queue ('Cache' column in 'top') is just a phase in the laundering > >>procedure (VM page recycling) between the inactive queue ('Inact' in > >>'top') and the free queue ('Free' in 'top'). So these variables have > >>nothing to do with disk i/o performance. > > > > I am not sure you are correct here. I understand things very differently. > > While it is a fact that the number of pages in the cache queue does not affect IO throughput, changing vm settings such as: > > vm.stats.vm.v_cache_min, vm.stats.vm.v_cache_max, vm.stats.vm.v_free_target and vm.stats.vm.v_free_min should have an effect on disk IO. > > > > The very reason JD came up with cache pages is to minimize IO traffic. If we require a larger number of free pages we cause the OS to remove references at an earlier point. This should cause the kernel to re-read some of the pages that otherwise would just be requeued to the active queue. > > > > Having a larger cache queue would require the VM to start cleaning dirty pages earlier, which results in some additional write traffic as well. However, this is not that bad, because here it is a zero-sum game. If pages are to become free, they would have to be written out regardless of cache queue size, just at a later point. However, there is a benefit to a larger cache bucket. The upside is that if the machine often experiences bursts in memory demand (pretty much any real-world server would), you are able to accommodate changing load without blocking. > > Well, I didn't claim that the cache queue was useless. It does have > its merits. And there is a certain default amount configured by the > kernel's auto-scaling code already. Yes, kernel defaults for queue sizes should work for most of us. > What I was trying to point out is that these variables don't necessarily > do what their name suggests. Take 'vm.v_cache_max', for example. When > you crank that up, instead of increasing the size of the cache queue it > is actually the inactive queue that grows in size. > > This is because the kernel steals pages from the inactive queue when it > temporarily runs out of pages in the cache queue, without having to > block for i/o as long as there are clean (not written to or already > laundered) pages in the inactive queue. When it finds dirty pages > during this scan it schedules them for background synchronization with > the disk, but again without blocking in the foreground. > > The reason for this algorithm is that it is better to keep pages in the > inactive queue for as long as possible, rather than moving them over to > the cache queue prematurely. Pages in the inactive queue can still be > mapped into the memory space of processes, while pages in the cache > queue have lost this association. So, quite naturally, when the VM > system has to reactivate a page (put it back into the active queue) this > operation tends to be less expensive when the page is still in the > inactive queue. While you are correct that when the cache is empty the kernel will dip into the inactive queue, you are mistaken about other things. Pages on the cache queue still have the association. I wrote that in one of the previous posts. To sum it up: the cache queue is the same as the inactive queue except that it holds only clean pages.
If things were the way you suggest, the cache queue would be totally useless. I actually pretty much explained the whole rotation process. If you read my email again, you should understand what happens whenever a page is moved from inactive to cache and then to free. > So, for reasons like these, I keep recommending that you either study the > kernel sources before you try to tune the VM system, or leave these > variables alone. I am not sure whether studying kernel sources is really necessary. Virtually every UNIX (R) admin has had to tune a machine, despite sources not being available. From owner-freebsd-performance@FreeBSD.ORG Tue Apr 20 07:45:52 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 803ED16A4CE for ; Tue, 20 Apr 2004 07:45:52 -0700 (PDT) Received: from flake.decibel.org (flake.decibel.org [66.143.173.58]) by mx1.FreeBSD.org (Postfix) with SMTP id 04FE743D2F for ; Tue, 20 Apr 2004 07:45:52 -0700 (PDT) (envelope-from decibel@decibel.org) Received: (qmail 43200 invoked by uid 1001); 20 Apr 2004 14:45:47 -0000 Date: Tue, 20 Apr 2004 09:45:47 -0500 From: "Jim C. Nasby" To: freebsd-performance@freebsd.org Message-ID: <20040420144547.GX87362@nasby.net> References: <20040416220556.GL87362@nasby.net> <002701c42404$e9dbecf0$3102a8c0@metallus> <20040419022239.GP87362@nasby.net> <001801c425dd$1fda0b00$3102a8c0@metallus> <20040419140921.GS87362@nasby.net> <20040420032716.GC56561@laptop.lambertfam.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040420032716.GC56561@laptop.lambertfam.org> X-Operating-System: FreeBSD 4.9-RELEASE-p3 i386 X-Distributed: Join the Effort! http://www.distributed.net User-Agent: Mutt/1.5.6i Subject: Re: command piped into bzip not using all available CPU X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Apr 2004 14:45:52 -0000 decibel@fritz.1[9:34]~:6>uname -a FreeBSD fritz.distributed.net 5.2.1-RELEASE FreeBSD 5.2.1-RELEASE #1: Wed Apr 7 18:42:52 CDT 2004 root@fritz.distributed.net:/usr/obj/usr/src/sys/FRITZ amd64 decibel@fritz.1[9:35]/usr/src/sys/amd64/conf:9>grep -i sched FRITZ options SCHED_4BSD #4BSD scheduler options _KPOSIX_PRIORITY_SCHEDULING #Posix P1003_1B real-time extensions Also, don't read anything into the fact that they were on the same CPU for that snapshot; here's one showing the exact opposite: PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND 10336 dnetc 139 20 1344K 856K RUN 1 80.5H 90.14% 90.14% dnetc 10702 decibel 108 0 10856K 7304K CPU0 1 0:13 23.86% 22.41% bzip2 10703 pgsql 4 0 154M 78004K sbwait 0 0:07 10.92% 10.25% postgres FWIW, I recall seeing this same behavior under FreeBSD 4.x as well. On Mon, Apr 19, 2004 at 11:27:16PM -0400, Scott Lambert wrote: > On Mon, Apr 19, 2004 at 09:09:21AM -0500, Jim C. Nasby wrote: > > Why would I expect to see it only use one CPU? It was CPU bound, not > > disk bound. There were two CPU-intensive processes running, why wouldn't > > they each use a different CPU?
> > > > On Mon, Apr 19, 2004 at 12:08:32AM -0700, Aaron Seelye wrote: > > > I'm not sure of the exact technical reason, but as I understand it, it's > > > 47% idle on the total cpu power of the machine, which would indicate > > > that one cpu was 100% full, and the other was 3%, due to system usage, > > > i/o, or whatever else was running. This is quite normal in my > > > experience, and what you should expect to see. > > At the time you took the snapshot, both processes were running on the > same CPU. FreeBSD 4.x or 5.2? If 5.2, SCHED_4BSD or SCHED_ULE? > > > > > The command I'm running is: > > > > pg_dump -vZ0 ogr | bzip2 > ogr-20040416.sql.bz2 > > > > > > > > PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND > > > > 17334 decibel 109 0 10856K 7164K CPU0 0 11:05 65.77% 65.77% bzip2 > > > > 17335 pgsql 4 0 154M 124M sbwait 0 5:54 34.03% 34.03% postgres > > > > 17333 decibel -8 0 20128K 3236K pipdwt 0 0:46 2.88% 2.88% pg_dump > > -- > Scott Lambert KC5MLE Unix SysAdmin > lambert@lambertfam.org -- Jim C. Nasby, Database Consultant jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" From owner-freebsd-performance@FreeBSD.ORG Tue Apr 20 08:11:00 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3A13816A55F for ; Tue, 20 Apr 2004 08:10:43 -0700 (PDT) Received: from gen129.n001.c02.escapebox.net (gen129.n001.c02.escapebox.net [213.73.91.129]) by mx1.FreeBSD.org (Postfix) with ESMTP id C964D43D55 for ; Tue, 20 Apr 2004 08:10:15 -0700 (PDT) (envelope-from gemini@geminix.org) Message-ID: <40853D53.1040906@geminix.org> Date: Tue, 20 Apr 2004 17:10:11 +0200 From: Uwe Doering Organization: Private UNIX Site User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040119 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-performance@freebsd.org References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Received: from gemini by geminix.org with asmtp (TLSv1:AES256-SHA:256) (Exim 3.36 #1) id 1BFwt4-0000WK-00; Tue, 20 Apr 2004 17:10:14 +0200 Subject: Re: How does disk caching work? X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Apr 2004 15:11:00 -0000 Igor Shmukler wrote: >>What I was trying to point out is that these variables don't necessarily >>do what their name suggests. Take 'vm.v_cache_max', for example. When >>you crank that up, instead of increasing the size of the cache queue it >>is actually the inactive queue that grows in size. >> >>This is because the kernel steals pages from the inactive queue when it >>temporarily runs out of pages in the cache queue, without having to >>block for i/o as long as there are clean (not written to or already >>laundered) pages in the inactive queue.
When it finds dirty pages >>during this scan it schedules them for background synchronization with >>the disk, but again without blocking in the foreground. >> >>The reason for this algorithm is that it is better to keep pages in the >>inactive queue for as long as possible, rather than moving them over to >>the cache queue prematurely. Pages in the inactive queue can still be >>mapped into the memory space of processes, while pages in the cache >>queue have lost this association. So, quite naturally, when the VM >>system has to reactivate a page (put it back into the active queue) this >>operation tends to be less expensive when the page is still in the >>inactive queue. > > While you are correct that when the cache is empty the kernel will dip into the inactive queue, you are mistaken about other things. Pages on the cache queue still have the association. I wrote that in one of the previous posts. > > To sum it up: the cache queue is the same as the inactive queue except that it holds only clean pages. > > If things were the way you suggest, the cache queue would be totally useless. I think you're mixing up two different things here. The way I understand the kernel sources, the pages in the cache queue of course still have their association with the underlying VM object. Otherwise caching these pages would be useless. But they are no longer mapped into any process address space. If I may quote the relevant comment from vm_page_cache(): /* * Remove all pmaps and indicate that the page is not * writeable or mapped. */ vm_page_cache() is the function that moves the pages from the inactive to the cache queue once they are clean. Restoring the process address space mapping is what makes reactivating pages from the cache queue more expensive than just relinking them from the inactive queue, because a fault gets generated when the process tries to access the page. This fault then maps the page from the VM object into the process address space. This causes additional overhead. > I actually pretty much explained the whole rotation process. If you read my email again, you should understand what happens whenever a page is moved from inactive to cache and then to free. You may want to study the kernel sources some more, I'm afraid. >>So, for reasons like these, I keep recommending that you either study the >>kernel sources before you try to tune the VM system, or leave these >>variables alone. > > I am not sure whether studying kernel sources is really necessary. Virtually every UNIX (R) admin has had to tune a machine, despite sources not being available. Sorry, but you just proved my point ...
Uwe -- Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers gemini@geminix.org | http://www.escapebox.net From owner-freebsd-performance@FreeBSD.ORG Tue Apr 20 11:04:13 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C5D0D16A4D2 for ; Tue, 20 Apr 2004 11:04:13 -0700 (PDT) Received: from f16.mail.ru (f16.mail.ru [194.67.57.46]) by mx1.FreeBSD.org (Postfix) with ESMTP id 46EFC43D5F for ; Tue, 20 Apr 2004 11:04:13 -0700 (PDT) (envelope-from shmukler@mail.ru) Received: from mail by f16.mail.ru with local id 1BFzbP-000Pqh-00; Tue, 20 Apr 2004 22:04:11 +0400 Received: from [24.184.137.35] by msg.mail.ru with HTTP; Tue, 20 Apr 2004 22:04:11 +0400 From: =?koi8-r?Q?=22?=Igor Shmukler=?koi8-r?Q?=22=20?= To: =?koi8-r?Q?=22?=Uwe Doering=?koi8-r?Q?=22=20?= Mime-Version: 1.0 X-Mailer: mPOP Web-Mail 2.19 X-Originating-IP: [24.184.137.35] Date: Tue, 20 Apr 2004 22:04:11 +0400 In-Reply-To: <40853D53.1040906@geminix.org> Content-Type: text/plain; charset=koi8-r Content-Transfer-Encoding: 8bit Message-Id: cc: freebsd-performance@freebsd.org Subject: Re[2]: How does disk caching work? X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list Reply-To: =?koi8-r?Q?=22?=Igor Shmukler=?koi8-r?Q?=22=20?= List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Apr 2004 18:04:14 -0000 > >>The reason for this algorithm is that it is better to keep pages in the > >>inactive queue for as long as possible, rather than moving them over to > >>the cache queue prematurely. Pages in the inactive queue can still be > >>mapped into the memory space of processes, while pages in the cache > >>queue have lost this association. So, quite naturally, when the VM > >>system has to reactivate a page (put it back into the active queue) this > >>operation tends to be less expensive when the page is still in the > >>inactive queue. > > > > While you are correct that when the cache is empty the kernel will dip into the inactive queue, you are mistaken about other things. Pages on the cache queue still have the association. I wrote that in one of the previous posts. > > > > To sum it up: the cache queue is the same as the inactive queue except that it holds only clean pages. > > > > If things were the way you suggest, the cache queue would be totally useless. > > I think you're mixing up two different things here. The way I > understand the kernel sources, the pages in the cache queue of course > still have their association with the underlying VM object. Otherwise > caching these pages would be useless. But they are no longer mapped > into any process address space. If I may quote the relevant comment > from vm_page_cache(): > > /* > * Remove all pmaps and indicate that the page is not > * writeable or mapped. > */ > > vm_page_cache() is the function that moves the pages from the inactive > to the cache queue once they are clean. Restoring the process address > space mapping is what makes reactivating pages from the cache queue more > expensive than just relinking them from the inactive queue, because a > fault gets generated when the process tries to access the page. This > fault then maps the page from the VM object into the process address > space. This causes additional overhead. > > > I actually pretty much explained the whole rotation process.
If you read my email again, you should understand what happens whenever a page is moved from inactive to cache and then to free. > > You may want to study the kernel sources some more, I'm afraid. First you explicitly write that "pages in the cache queue have lost this association," then you tell me that I don't understand how VM works. Are you trying to suggest that mapping a page and changing its permissions is comparable with reading the page back from backing store? Studying sources never hurts, but filtering the lingo is just as helpful. > >>So, for reasons like these, I keep recommending that you either study the > >>kernel sources before you try to tune the VM system, or leave these > >>variables alone. > > > > I am not sure whether studying kernel sources is really necessary. Virtually every UNIX (R) admin has had to tune a machine, despite sources not being available. > > Sorry, but you just proved my point ... What was that? In the first email you write that the size of the cache queue does not affect disk traffic. In the next you say, no, I did not mean that; I just wanted to say that the cache queue holds pages that lost their association. Now you say, no, of course there is an association, someone just has to study the kernel sources better. The last argument is valid only because it's a moot point. Studying kernel sources never hurts anyone ... I did not originally intend to flame you. I simply thought that some of your answers were not correct. If you had answered off the list, I would not bother, but this is a public mailing list and it is a source of knowledge for many people. VM in particular is a grey area for many developers. Everyone knows what it's for, but few programmers really understand VM or VFS. Now you start giving me funny advice. That's not wise. Sincerely, IS. From owner-freebsd-performance@FreeBSD.ORG Tue Apr 20 12:50:15 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C1D0E16A4CF for ; Tue, 20 Apr 2004 12:50:15 -0700 (PDT) Received: from flake.decibel.org (flake.decibel.org [66.143.173.58]) by mx1.FreeBSD.org (Postfix) with SMTP id 6ED9B43D5D for ; Tue, 20 Apr 2004 12:50:15 -0700 (PDT) (envelope-from decibel@decibel.org) Received: (qmail 58438 invoked by uid 1001); 20 Apr 2004 19:50:10 -0000 Date: Tue, 20 Apr 2004 14:50:10 -0500 From: "Jim C. Nasby" To: freebsd-performance@freebsd.org Message-ID: <20040420195010.GZ87362@nasby.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Operating-System: FreeBSD 4.9-RELEASE-p3 i386 X-Distributed: Join the Effort! http://www.distributed.net User-Agent: Mutt/1.5.6i Subject: vfs.hirunningspace on a 3ware 8506 X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Apr 2004 19:50:15 -0000 Has anyone done any testing to see what value of vfs.hirunningspace is optimal for a 3ware 8506-8? -- Jim C. Nasby, Database Consultant jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
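P.S. For anyone who wants to check their own box before tweaking, both knobs can be queried in one go. The output below is illustrative only (the values shown are the stock defaults of this era, 1 MB and 512 KB), not a measurement from the 3ware box:

  sysctl vfs.hirunningspace vfs.lorunningspace
  vfs.hirunningspace: 1048576
  vfs.lorunningspace: 524288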
From owner-freebsd-performance@FreeBSD.ORG Tue Apr 20 13:23:00 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 42C7816A4CE for ; Tue, 20 Apr 2004 13:23:00 -0700 (PDT) Received: from rms04.rommon.net (rms04.rommon.net [212.54.2.140]) by mx1.FreeBSD.org (Postfix) with ESMTP id DED3543D55 for ; Tue, 20 Apr 2004 13:22:58 -0700 (PDT) (envelope-from pete@he.iki.fi) Received: from he.iki.fi (h91.vuokselantie10.fi [193.64.42.145]) by rms04.rommon.net (8.12.10/8.12.9) with ESMTP id i3KKMxmo071131; Tue, 20 Apr 2004 23:22:59 +0300 (EEST) (envelope-from pete@he.iki.fi) Message-ID: <4085869E.7090306@he.iki.fi> Date: Tue, 20 Apr 2004 23:22:54 +0300 From: Petri Helenius User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-us, en MIME-Version: 1.0 To: "Jim C. Nasby" References: <20040420195010.GZ87362@nasby.net> In-Reply-To: <20040420195010.GZ87362@nasby.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit cc: freebsd-performance@freebsd.org Subject: Re: vfs.hirunningspace on a 3ware 8506 X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Apr 2004 20:23:00 -0000 Jim C. Nasby wrote: >Has anyone done any testing to see what value of vfs.hirunningspace is >optimal for a 3ware 8506-8? > > Do the 3ware controllers actually care about this value due to the onboard processing and cache? I thought all writes are satisfied immediately? Pete From owner-freebsd-performance@FreeBSD.ORG Tue Apr 20 15:06:40 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 6EABB16A4CE for ; Tue, 20 Apr 2004 15:06:40 -0700 (PDT) Received: from gen129.n001.c02.escapebox.net (gen129.n001.c02.escapebox.net [213.73.91.129]) by mx1.FreeBSD.org (Postfix) with ESMTP id 01C3943D45 for ; Tue, 20 Apr 2004 15:06:40 -0700 (PDT) (envelope-from gemini@geminix.org) Message-ID: <40859EED.7040300@geminix.org> Date: Wed, 21 Apr 2004 00:06:37 +0200 From: Uwe Doering Organization: Private UNIX Site User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040119 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-performance@freebsd.org References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Received: from gemini by geminix.org with asmtp (TLSv1:AES256-SHA:256) (Exim 3.36 #1) id 1BG3O2-0009Ls-00; Wed, 21 Apr 2004 00:06:39 +0200 Subject: Re: How does disk caching work? X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Apr 2004 22:06:40 -0000 Igor Shmukler wrote: >>[...] >>>I actually pretty much explained the whole rotation process. If you read my email again, you should understand what happens whenever a page is moved from inactive to cache and then to free. >> >>You may want to study the kernel sources some more, I'm afraid. > > First you explicitly write that "pages in the cache queue have lost this association," then you tell me that I don't understand how VM works.
Quote from my initial email about this subject: "Pages in the inactive queue can still be mapped into the memory space of processes, while pages in the cache queue have lost this association." Which means that I specified right from the beginning which association I was referring to (process memory mapping), and that's what I repeated, in more detail, in my email following that. I didn't mention the association with the underlying VM object in this context, which remains intact. So I don't quite see what inconsistency you are accusing me of. >>>>So, for reasons like these, I keep recommending that you either study the >>>>kernel sources before you try to tune the VM system, or leave these >>>>variables alone. >>> >>>I am not sure whether studying kernel sources is really necessary. Virtually every UNIX (R) admin has had to tune a machine, despite sources not being available. >> >>Sorry, but you just proved my point ... > What was that? > > In the first email you write that the size of the cache queue does not affect disk traffic. In the next you say, no, I did not mean that; I just wanted to say that the cache queue holds pages that lost their association. Now you say, no, of course there is an association, someone just has to study the kernel sources better. > > The last argument is valid only because it's a moot point. Studying kernel sources never hurts anyone ... > > I did not originally intend to flame you. I simply thought that some of your answers were not correct. If you had answered off the list, I would not bother, but this is a public mailing list and it is a source of knowledge for many people. VM in particular is a grey area for many developers. Everyone knows what it's for, but few programmers really understand VM or VFS. Now you start giving me funny advice. That's not wise. Okay. I tried to be helpful and share my knowledge, but I don't insist on convincing anybody, at least not in my (scarce) spare time. Since I don't feel that this discussion is going to lead anywhere I suggest that we leave it at that and agree that we disagree.
Uwe -- Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers gemini@geminix.org | http://www.escapebox.net From owner-freebsd-performance@FreeBSD.ORG Tue Apr 20 15:55:16 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id A881C16A4CE for ; Tue, 20 Apr 2004 15:55:16 -0700 (PDT) Received: from gen129.n001.c02.escapebox.net (gen129.n001.c02.escapebox.net [213.73.91.129]) by mx1.FreeBSD.org (Postfix) with ESMTP id A4C7543D54 for ; Tue, 20 Apr 2004 15:55:15 -0700 (PDT) (envelope-from gemini@geminix.org) Message-ID: <4085AA4B.1020700@geminix.org> Date: Wed, 21 Apr 2004 00:55:07 +0200 From: Uwe Doering Organization: Private UNIX Site User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040119 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-performance@freebsd.org References: <20040420195010.GZ87362@nasby.net> <4085869E.7090306@he.iki.fi> In-Reply-To: <4085869E.7090306@he.iki.fi> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Received: from gemini by geminix.org with asmtp (TLSv1:AES256-SHA:256) (Exim 3.36 #1) id 1BG48z-000AIS-00; Wed, 21 Apr 2004 00:55:09 +0200 Subject: Re: vfs.hirunningspace on a 3ware 8506 X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 20 Apr 2004 22:55:16 -0000 Petri Helenius wrote: > Jim C. Nasby wrote: > >> Has anyone done any testing to see what value of vfs.hirunningspace is >> optimal for a 3ware 8506-8? >> > Do the 3ware controllers actually care about this value due to the > onboard processing and cache? I thought all writes are satisfied > immediately? The controller itself doesn't care, but the kernel does. With the current implementation, the amount of memory associated with outstanding read requests is subtracted from vfs.hirunningspace. With many concurrent read requests there is no reserve left for write operations, so write performance can suffer substantially. This balancing effect is actually intended in order to give read requests some priority, but in high performance systems with fast, caching raid controllers the default value of said variable is too low and therefore poses a bottleneck. Uwe -- Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers gemini@geminix.org | http://www.escapebox.net From owner-freebsd-performance@FreeBSD.ORG Wed Apr 21 10:06:08 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id C815216A4CE for ; Wed, 21 Apr 2004 10:06:08 -0700 (PDT) Received: from flake.decibel.org (flake.decibel.org [66.143.173.58]) by mx1.FreeBSD.org (Postfix) with SMTP id 53FFC43D55 for ; Wed, 21 Apr 2004 10:06:08 -0700 (PDT) (envelope-from decibel@decibel.org) Received: (qmail 48539 invoked by uid 1001); 21 Apr 2004 17:05:57 -0000 Date: Wed, 21 Apr 2004 12:05:56 -0500 From: "Jim C. Nasby" To: Uwe Doering Message-ID: <20040421170556.GB41429@nasby.net> References: <20040420195010.GZ87362@nasby.net> <4085869E.7090306@he.iki.fi> <4085AA4B.1020700@geminix.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4085AA4B.1020700@geminix.org> X-Operating-System: FreeBSD 4.9-RELEASE-p3 i386 X-Distributed: Join the Effort! 
http://www.distributed.net User-Agent: Mutt/1.5.6i cc: freebsd-performance@freebsd.org Subject: Re: vfs.hirunningspace on a 3ware 8506 X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Apr 2004 17:06:08 -0000 On Wed, Apr 21, 2004 at 12:55:07AM +0200, Uwe Doering wrote: > Petri Helenius wrote: > >Jim C. Nasby wrote: > > > >>Has anyone done any testing to see what value of vfs.hirunningspace is > >>optimal for a 3ware 8506-8? > >> > >Do the 3ware controllers actually care about this value due to the > >onboard processing and cache? I thought all writes are satisfied > >immediately? > > The controller itself doesn't care, but the kernel does. With the > current implementation, the amount of memory associated with outstanding > read requests is subtracted from vfs.hirunningspace. With many > concurrent read requests there is no reserve left for write operations, > so write performance can suffer substantially. > > This balancing effect is actually intended in order to give read > requests some priority, but in high performance systems with fast, > caching raid controllers the default value of said variable is too low > and therefore poses a bottleneck. Unfortunately, it seems the 8500 series only has 1.8MB of cache, so it seems like the out-of-the-box setting of 1M may not be too far off. Is it normally advisable to set vfs.hirunningspace = whatever the controller's cache is? -- Jim C. Nasby, Database Consultant jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" From owner-freebsd-performance@FreeBSD.ORG Wed Apr 21 11:42:48 2004 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id F2AF616A4CE for ; Wed, 21 Apr 2004 11:42:47 -0700 (PDT) Received: from gen129.n001.c02.escapebox.net (gen129.n001.c02.escapebox.net [213.73.91.129]) by mx1.FreeBSD.org (Postfix) with ESMTP id 8411B43D4C for ; Wed, 21 Apr 2004 11:42:47 -0700 (PDT) (envelope-from gemini@geminix.org) Message-ID: <4086C0A3.30407@geminix.org> Date: Wed, 21 Apr 2004 20:42:43 +0200 From: Uwe Doering Organization: Private UNIX Site User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.6) Gecko/20040119 X-Accept-Language: en-us, en MIME-Version: 1.0 To: freebsd-performance@freebsd.org References: <20040420195010.GZ87362@nasby.net> <4085869E.7090306@he.iki.fi> <4085AA4B.1020700@geminix.org> <20040421170556.GB41429@nasby.net> In-Reply-To: <20040421170556.GB41429@nasby.net> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Received: from gemini by geminix.org with asmtp (TLSv1:AES256-SHA:256) (Exim 3.36 #1) id 1BGMgH-0009x7-00; Wed, 21 Apr 2004 20:42:46 +0200 Subject: Re: vfs.hirunningspace on a 3ware 8506 X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 21 Apr 2004 18:42:48 -0000 Jim C. Nasby wrote: > On Wed, Apr 21, 2004 at 12:55:07AM +0200, Uwe Doering wrote: >>Petri Helenius wrote: >>>Jim C. 
Nasby wrote: >>> >>>>Has anyone done any testing to see what value of vfs.hirunningspace is >>>>optimal for a 3ware 8506-8? >>> >>>Do the 3ware controllers actually care about this value due to the >>>onboard processing and cache? I thought all writes are satisfied >>>immediately? >> >>The controller itself doesn't care, but the kernel does. With the >>current implementation, the amount of memory associated with outstanding >>read requests is subtracted from vfs.hirunningspace. With many >>concurrent read requests there is no reserve left for write operations, >>so write performance can suffer substantially. >> >>This balancing effect is actually intended in order to give read >>requests some priority, but in high performance systems with fast, >>caching raid controllers the default value of said variable is too low >>and therefore poses a bottleneck. > > Unfortunately, it seems the 8500 series only has 1.8MB of cache, so it > seems like the out-of-the-box setting of 1M may not be too far off. > > Is it normally advisable to set vfs.hirunningspace = whatever the > controller's cache is? Well, 1.8 MB is not much. The Adaptec controllers we use have 16 MB buffer space, so I set 'vfs.hirunningspace' to half of that (8 MB). Basically by the seat of my pants. My thinking was that in cases where all of this amount was used for write operations the kernel shouldn't be able to completely flush the controller's buffer in one go. Just a conservative approach. In your case, you'll probably get away with picking a value equal to or even larger than the controller's cache. It might result in a slight performance penalty, but if your database server really suffers from write starvation due to many concurrent read requests, removing this bottleneck is likely to outweigh that penalty by far. In any case, unless the effect of tweaking 'vfs.hirunningspace' is outright spectacular you will probably have to run benchmark tests in order to find the best value for your server. Uwe -- Uwe Doering | EscapeBox - Managed On-Demand UNIX Servers gemini@geminix.org | http://www.escapebox.net
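As a rough sketch of such a benchmark run (everything here is hypothetical -- the paths, file sizes, and candidate values are placeholders to adapt to your own array):

  #!/bin/sh
  # Sweep a few vfs.hirunningspace values and time a large sequential
  # write while a background read keeps the array busy.
  for hi in 1048576 2097152 4194304 8388608; do
      sysctl -w vfs.hirunningspace=$hi
      sysctl -w vfs.lorunningspace=$((hi / 2))  # keep the 1/2 ratio suggested earlier
      dd if=/raid/bigfile of=/dev/null bs=1m &  # background read load (placeholder path)
      /usr/bin/time dd if=/dev/zero of=/raid/testfile bs=1m count=1024
      wait
      rm -f /raid/testfile
  done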