From: Terry Lambert
Date: Wed, 23 Jul 2003 23:55:25 -0700
To: "Jim C. Nasby"
Cc: Sean Chittenden, freebsd-database@freebsd.org,
    freebsd-performance@freebsd.org, Christopher Weimann, Paul Pathiakis
Subject: Re: Tuning for PostGreSQL Database
Message-ID: <3F1F82DD.966C7F2F@mindspring.com>

"Jim C. Nasby" wrote:
[ ... quote of me and quote of Matt Dillon's "Blue Prints" article ...
]

> The question I have is: can pages in the inactive queue be used as disk
> cache?

The answer is "yes, they can be reactivated and written to before they
are flushed, if soft updates is enabled" and "yes, they can be
reactivated and read (but not written to) before they are flushed, if
soft updates is not enabled".

In general, this only happens for data pages, which is to say, the
pages containing user file data. Pages containing FS metadata are
specifically treated as "write through" or "virtually write through".
It also doesn't happen for data pages that are explicitly fsync'ed to
guarantee write ordering.

Once a write is started on a metadata page, the system marks the page
"busy" until it has been written out in dependency order. Effectively,
such pages are "read-only": reads do not stall, but new writes stall
until the write completes. This only happens *after* the write hits
the block I/O subsystem. In reality, the pages are treated as
copy-on-write, with a blocking semantic to ensure metadata
serialization (e.g. if there was a bwrite in progress and a bdwrite
was requested, it could go through, but another bdwrite would be
blocked until the first finished).

IF there are multiple operations in progress in the same page, AND
there are no dependencies between the operations, AND soft updates is
enabled, AND the write has been placed on the soft updates clock wheel
to be written, AND the wheel has not progressed to the point where the
write has actually been taken off the wheel and scheduled in the I/O
subsystem, THEN the writes may be scheduled to occur simultaneously,
IF there are no intermediate dependent writes that need to take place.

In other words, if the dependency is "soft", then it can gather any
modifications to a single page together and save I/O operations (or,
in the case of create-write-delete for a short-lived intermediate
file, it can avoid the writes altogether).
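
The gathering behaviour described above can be illustrated with a toy
model. This is only a sketch under stated assumptions: the class name
ClockWheel, its methods, the page labels and the slot count are all
invented for illustration and do not correspond to the real soft
updates code in the kernel. Dependency tracking is left out entirely,
so every queued change is treated as having only "soft" dependencies.

```python
from collections import OrderedDict


class ClockWheel:
    """Toy model of delayed writes queued on a timer wheel (hypothetical)."""

    def __init__(self, slots=4):
        # One mapping per slot: page id -> list of pending modifications.
        self.slots = [OrderedDict() for _ in range(slots)]
        self.hand = 0       # index of the slot that the next tick flushes
        self.io_ops = []    # writes actually handed to the "I/O subsystem"

    def _find(self, page):
        """Return the slot still holding this page, or None."""
        for slot in self.slots:
            if page in slot:
                return slot
        return None

    def delayed_write(self, page, change, delay=2):
        """Queue a change; merge it if the page is still on the wheel."""
        slot = self._find(page)
        if slot is None:
            slot = self.slots[(self.hand + delay) % len(self.slots)]
            slot[page] = []
        slot[page].append(change)

    def cancel(self, page):
        """The file went away before its writes were scheduled: drop them."""
        slot = self._find(page)
        if slot is not None:
            del slot[page]

    def tick(self):
        """Flush the current slot (one write per page), then advance."""
        slot = self.slots[self.hand]
        for page, changes in slot.items():
            self.io_ops.append((page, tuple(changes)))
        slot.clear()
        self.hand = (self.hand + 1) % len(self.slots)


wheel = ClockWheel()
wheel.delayed_write("inode-page-7", "update mtime")
wheel.delayed_write("inode-page-7", "update size")  # gathered with the first
wheel.delayed_write("tmp-file-page", "create")
wheel.delayed_write("tmp-file-page", "write data")
wheel.cancel("tmp-file-page")                       # deleted before any flush
for _ in range(4):
    wheel.tick()
print(wheel.io_ops)
```

In this run the two changes to the same page leave the wheel as a
single write, and the create-write-delete sequence produces no write
at all.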

All this boils down to one thing: in the normal case, metadata write
ordering is implicitly guaranteed unless it is specifically declined
at the time the FS is mounted (via the "async" option, the "noatime"
option, etc.); all of those options are disabled by default.

> Or maybe a better question would be: what does each memory category in
> top mean?
> Mem: 365M Active, 1400M Inact, 168M Wired, 76M Cache, 199M Buf, 3008K Free

Depends on the version of "top" you are running. The statistics we
keep are in "struct vmmeter" in the file /usr/src/sys/sys/vmmeter.h.
The meaning of these statistics varies slightly over time, so that's
not fixed either (but I've seen more changes in "top" than in
FreeBSD).

The place to look for their meanings is first in the source code for
the version of "top" you are running, to see what fields it uses and
how/if it combines them mathematically, and then second, in the code
that updates the variables you are interested in (usually meaning code
that lives in /usr/src/sys/vm/*.c).

Honestly, if you aren't able to dig the information out, you are not
likely to be able to understand the answer the way it was intended to
be understood, even if someone comes right out and tells you.

Kirk McKusick is rumored to be working on a FreeBSD Internals book,
but we are going on 3 years for that rumor. I started one, and I
updated it several times in the process, but, frankly, FreeBSD will
not stand still long enough for a single person to document it well,
and I discontinued work on it at about the 4.6-RELEASE level.

IMO, writing a good book takes at least 2080 hours on the part of the
author(s), which is equivalent to a full-time job for a year, and it
also takes a willingness on the part of the technical reviewer(s) to
spend a lot of time on the review process, in order for the book to be
any good (e.g.
I spent probably 200 total hours in the review process on Uresh
Vahalia's "UNIX Internals: The New Frontiers" for Prentice Hall's
technical editor on that project).

> Is there anywhere that clearly defines what each queue is, and how it's
> used?

The source code for a particular version tag does, for a version built
from that particular version tag, and probably only that version.

-- Terry
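
As a footnote to the question about the memory categories in "top":
the per-queue page counters kept in "struct vmmeter" can be inspected
without going through "top" at all. The sketch below assumes they are
exported as sysctl variables under vm.stats.vm with the names shown
(v_active_count and so on), and that the label-to-counter mapping is
the obvious one; both assumptions should be checked against the source
for the release in question. The function names and the sample text
are invented for illustration, and the page counts were chosen, for
4096-byte pages, to reproduce the figures in the "Mem:" line quoted
above. The "Buf" figure is left out of this sketch.

```python
# Label in the summary line -> counter under vm.stats.vm (assumed mapping).
QUEUES = [
    ("Active", "v_active_count"),
    ("Inact", "v_inactive_count"),
    ("Wired", "v_wire_count"),
    ("Cache", "v_cache_count"),
    ("Free", "v_free_count"),
]


def parse_sysctl(text):
    """Parse 'name: value' lines, as printed by sysctl(8), into a dict."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            values[name.strip()] = int(value)
    return values


def summarize(values, prefix="vm.stats.vm."):
    """Convert per-queue page counts into kilobytes."""
    page_size = values[prefix + "v_page_size"]
    summary = {}
    for label, counter in QUEUES:
        pages = values.get(prefix + counter)
        if pages is not None:   # a counter may not exist on every release
            summary[label] = pages * page_size // 1024
    return summary


# Made-up sample standing in for the output of `sysctl vm.stats.vm`.
SAMPLE = """\
vm.stats.vm.v_page_size: 4096
vm.stats.vm.v_active_count: 93440
vm.stats.vm.v_inactive_count: 358400
vm.stats.vm.v_wire_count: 43008
vm.stats.vm.v_cache_count: 19456
vm.stats.vm.v_free_count: 752
"""

for label, kbytes in summarize(parse_sysctl(SAMPLE)).items():
    print("%-6s %8dK" % (label, kbytes))
```

This only reports raw counters. It says nothing about how a given
version of "top" combines them, which is still a matter of reading
that version's source.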