From owner-freebsd-arch Sun May 13 5:38:42 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
    by hub.freebsd.org (Postfix) with ESMTP id CA7C237B423;
    Sun, 13 May 2001 05:38:36 -0700 (PDT)
    (envelope-from phk@critter.freebsd.dk)
Received: from critter (localhost [127.0.0.1])
    by critter.freebsd.dk (8.11.3/8.11.3) with ESMTP id f4DCcPp08076;
    Sun, 13 May 2001 14:38:25 +0200 (CEST)
    (envelope-from phk@critter.freebsd.dk)
To: arch@freebsd.org, current@freebsd.org
Subject: /dev/tty and device cloning.
From: Poul-Henning Kamp
Date: Sun, 13 May 2001 14:38:25 +0200
Message-ID: <8074.989757505@critter>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

/dev/tty is, as we all know, quite a hack: it is a variant symlink in disguise.

I have created a patch which cleans up the way /dev/tty works using the DEVFS cloning mechanism, and I would like to get feedback on it.

http://phk.freebsd.dk/patch/devtty.patch

20010513 devtty.patch

This patch slightly changes the way device cloning works. After the cloners have been called we try to find the match not by name but by the dev_t the successful cloner gave us. This allows "cloning by a different name" activities.

The patch to kern/tty_tty.c exploits this feature to remove the magic and bogus /dev/tty device; instead, /dev/tty by clone-magic points at the process' controlling terminal.
(The patch also contains a few cleanups) The following difference results: Without this patch: critter phk> tty /dev/ttypd critter phk> ls -li /dev/ttypd /dev/tty 12 crw-rw-rw- 1 root wheel 1, 0 13 Maj 14:22 /dev/tty 109 crw--w---- 1 phk tty 5, 13 13 Maj 14:28 /dev/ttypd With this patch: syv# tty /dev/ttyp0 syv# ls -li /dev/ttyp0 /dev/tty 87 crw--w---- 1 root tty 5, 0 May 13 12:27 /dev/tty 87 crw--w---- 1 root tty 5, 0 May 13 12:27 /dev/ttyp0 -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun May 13 10:22:44 2001 Delivered-To: freebsd-arch@freebsd.org Received: from netbank.com.br (garrincha.netbank.com.br [200.203.199.88]) by hub.freebsd.org (Postfix) with ESMTP id 3493A37B422 for ; Sun, 13 May 2001 10:22:39 -0700 (PDT) (envelope-from riel@conectiva.com.br) Received: from surriel.ddts.net (1-248.ctame701-1.telepar.net.br [200.181.137.248]) by netbank.com.br (Postfix) with ESMTP id 0754046806; Sun, 13 May 2001 14:24:03 -0300 (BRST) Received: from localhost (oqkcne@localhost [127.0.0.1]) by surriel.ddts.net (8.11.3/8.11.2) with ESMTP id f4DHMLi11752; Sun, 13 May 2001 14:22:25 -0300 Date: Sun, 13 May 2001 14:22:21 -0300 (BRST) From: Rik van Riel X-Sender: riel@imladris.rielhome.conectiva To: Matt Dillon Cc: arch@freebsd.org, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping In-Reply-To: <200105122358.f4CNwEr20137@earth.backplane.com> Message-ID: X-spambait: aardvark@kernelnewbies.org X-spammeplease: aardvark@nl.linux.org MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, 12 May 2001, Matt Dillon wrote: > :But if the larger processes never get a 
chance to make decent > :progress without thrashing, won't your system be slowed down > :forever by these (thrashing) large processes? > : > :It's nice to protect your small processes from the large ones, > :but if the large processes don't get to run to completion the > :system will never get out of thrashing... > > Consider the case where you have one large process and many small > processes. If you were to skew things to allow the large process to > run at the cost of all the small processes, you have just inconvenienced > 98% of your users so one ozob can run a big job. So we should not allow just one single large job to take all of memory, but we should allow some small jobs in memory too. > What if there are several big jobs? If you skew things in favor of > one the others could take 60 seconds *just* to recover their RSS when > they are finally allowed to run. So much for timesharing... you > would have to run each job exclusively for 5-10 minutes at a time > to get any sort of effiency, which is not practical in a timeshare > system. So there is really very little that you can do. If you don't do this very slow swapping, NONE of the big tasks will have the opportunity to make decent progress and the system will never get out of thrashing. 
If we simply make the "swap time slices" for larger processes larger than for smaller processes we: 1) have a better chance of the large jobs getting any work done 2) won't have the large jobs artificially increase memory load, because all time will be spent removing each other's RSS 3) can have more small jobs in memory at once, due to 2) 4) can be better for interactive performance due to 3) 5) have a better chance of getting out of the overload situation sooner I realise this would make the scheduling algorithm slightly more complex and I'm not convinced doing this would be worth it myself, but we may want to do some brainstorming over this ;) regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to aardvark@nl.linux.org (spam digging piggy) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sun May 13 16:37:43 2001 Delivered-To: freebsd-arch@freebsd.org Received: from beastie.mckusick.com (beastie.mckusick.com [209.31.233.184]) by hub.freebsd.org (Postfix) with ESMTP id E7B8137B424 for ; Sun, 13 May 2001 16:37:37 -0700 (PDT) (envelope-from mckusick@mckusick.com) Received: from beastie.mckusick.com (localhost [127.0.0.1]) by beastie.mckusick.com (8.9.3/8.9.3) with ESMTP id QAA21864; Sun, 13 May 2001 16:37:33 -0700 (PDT) (envelope-from mckusick@beastie.mckusick.com) Message-Id: <200105132337.QAA21864@beastie.mckusick.com> To: "Niels Chr. Bank-Pedersen" Subject: Re: Background Fsck Cc: arch@FreeBSD.ORG In-Reply-To: Your message of "Sat, 31 Mar 2001 14:12:36 +0200." 
<20010331141236.B98023@bank-pedersen.dk> Date: Sun, 13 May 2001 16:37:33 -0700 From: Kirk McKusick Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Sorry for the slow response, but I did not ignore your email :-) I believe that I have found and fixed the bug that was causing the errors that you have reported below. Kirk =-=-=-=-=-= Date: Sat, 31 Mar 2001 14:12:36 +0200 From: "Niels Chr. Bank-Pedersen" To: Kirk McKusick Cc: arch@FreeBSD.ORG Subject: Re: Background Fsck On Fri, Mar 30, 2001 at 03:38:42PM -0800, Kirk McKusick wrote: [...] > In general `fsck -p' will not fix > everything, but `fsck -y' should always succeed (though > success may be an empty filesystem). This made me remember a problem I see on a -current machine (rebuild yesterday): THE FOLLOWING FILE SYSTEM HAD AN UNEXPECTED INCONSISTENCY: ufs: /dev/da0s1f (/var) Automatic file system check failed . . . help! Enter full pathname of shell or RETURN for /bin/sh: # fsck -y /var ** /dev/da0s1f ** Last Mounted on /var ** Phase 1 - Check Blocks and Sizes PARTIALLY TRUNCATED INODE I=31753 UNEXPECTED SOFT UPDATE INCONSISTENCY INCORRECT BLOCK COUNT I=39698 (1904 should be 1808) CORRECT? yes PARTIALLY TRUNCATED INODE I=87341 UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 2 - Check Pathnames ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups FREE BLK COUNT(S) WRONG IN SUPERBLK SALVAGE? yes SUMMARY INFORMATION BAD SALVAGE? yes BLK(S) MISSING IN BIT MAPS SALVAGE? 
yes 391 files, 128648 used, 407159 free (351 frags, 50851 blocks, 0.1% fragmentation) ***** FILE SYSTEM MARKED CLEAN ***** ***** FILE SYSTEM WAS MODIFIED ***** # fsck -y /var ** /dev/da0s1f ** Last Mounted on /var ** Phase 1 - Check Blocks and Sizes PARTIALLY TRUNCATED INODE I=31753 UNEXPECTED SOFT UPDATE INCONSISTENCY PARTIALLY TRUNCATED INODE I=87341 UNEXPECTED SOFT UPDATE INCONSISTENCY ** Phase 2 - Check Pathnames ** Phase 3 - Check Connectivity ** Phase 4 - Check Reference Counts ** Phase 5 - Check Cyl groups 391 files, 128648 used, 407159 free (351 frags, 50851 blocks, 0.1% fragmentation) # tunefs -p /var tunefs: soft updates: (-n) enabled tunefs: maximum contiguous block count: (-a) 15 tunefs: rotational delay between contiguous blocks: (-d) 0 ms tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: (-o) time As I read your mail, this should not be possible, or? > Kirk TIA, Niels Chr. -- Niels Christian Bank-Pedersen, NCB1-RIPE. Network Manager, Tele Danmark NET, IP-section. "Hey, are any of you guys out there actually *using* RFC 2549?" To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon May 14 20:37:15 2001 Delivered-To: freebsd-arch@freebsd.org Received: from gull.mail.pas.earthlink.net (gull.mail.pas.earthlink.net [207.217.121.85]) by hub.freebsd.org (Postfix) with ESMTP id 5CE2437B423; Mon, 14 May 2001 20:37:11 -0700 (PDT) (envelope-from dleimbac@earthlink.net) Received: from 1Cust79.tnt1.starkville.ms.da.uu.net (1Cust79.tnt1.starkville.ms.da.uu.net [63.30.107.79]) by gull.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id UAA19677; Mon, 14 May 2001 20:37:07 -0700 (PDT) Message-Id: <200105150337.UAA19677@gull.mail.pas.earthlink.net> Date: Mon, 14 May 2001 22:39:32 CDT From: dave To: freebsd-questions@freebsd.org, arch@freebsd.org Subject: Gettimeofday Again... 
Reply-To: dleimbac@earthlink.net
X-Mailer: Spruce 0.6.5 for X11 w/smtpio 0.7.9
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Well I have been on the IRC in and out of mail list archives and cannot get
a good answer to this question...

Why does gettimeofday perform so poorly on FreeBSD vs the same hardware on
Linux 2.4.2?

---SNIP-----
#include <stdio.h>
#include <sys/time.h>

int main() {
    struct timeval tv1, tv2, tv3;

    gettimeofday(&tv1, 0);
    gettimeofday(&tv2, 0);
    gettimeofday(&tv3, 0);

    printf("Time 1 %d:%d\n", tv1.tv_sec, tv1.tv_usec);
    printf("Time 2 %d:%d\n", tv3.tv_sec, tv3.tv_usec);

}
----SNIP-----

I get anywhere from 14usec to 17usec just for the call to gettimeofday.

On the 2.4.2 linux kernel it's something like 3usec.

I just want to know why we are so much slower. I have heard the "caching" argument and it doesn't float very well since all I can find is CPU based L1 cache which should also apply to FreeBSD. That's principle of locality stuff and more on the hardware side. Since I am running on identical hardware I don't see how this is possible except for a potential problem/lack of optimization in FreeBSD.

Don't get me wrong... I love FreeBSD... and it generally outperforms linux everywhere I use it. I just don't understand this huge gap in performance.

Concerned....
Dave Leimbach To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon May 14 20:46:25 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 6F61237B42C; Mon, 14 May 2001 20:46:21 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4F3kLE45720; Mon, 14 May 2001 20:46:21 -0700 (PDT) (envelope-from dillon) Date: Mon, 14 May 2001 20:46:21 -0700 (PDT) From: Matt Dillon Message-Id: <200105150346.f4F3kLE45720@earth.backplane.com> To: dave Cc: freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... References: <200105150337.UAA19677@gull.mail.pas.earthlink.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG : :Well I have been on the IRC in and out of mail list archives and cannot get :a good answer to this question... : :Why does gettimeofday perform so poorly on FreeBSD vs the same hardware on :Linux 2.4.2? Why should it matter? gettimeofday() is not something that is typically called in a tight loop, it would be silly to optimize it. 17uS is plenty fast enough. Systems rarely have performance problems due to syscall overhead. 
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon May 14 22:43:15 2001 Delivered-To: freebsd-arch@freebsd.org Received: from inet03.citec.qld.gov.au (inet03.citec.qld.gov.au [203.5.10.10]) by hub.freebsd.org (Postfix) with ESMTP id 8C21F37B43C for ; Mon, 14 May 2001 22:43:06 -0700 (PDT) (envelope-from Tom.Bounxokvan@qsuper.qld.gov.au) Received: by inet03.citec.qld.gov.au; id PAA10543; Tue, 15 May 2001 15:42:54 +1000 (EST) From: Received: from pyxis-ext.qsuper.qld.gov.au(pyxis.qsuper.qld.gov.au 147.132.142.125) by inet03.citec.qld.gov.au via smap (V2.0) id xma010219; Tue, 15 May 01 15:42:43 +1000 Received: from pegasus.qsuper.qld.gov.au by pyxis-ext.qsuper.qld.gov.au via smtpd (for inet3.citec.qld.gov.au [147.132.176.23]) with SMTP; 15 May 2001 05:43:36 UT Received: (private information removed) X-Lotus-FromDomain: QSUPER To: freebsd-arch@FreeBSD.org Message-ID: <4A256A4D.001F7C42.00@pegasus.qsuper.qld.gov.au> Date: Tue, 15 May 2001 15:42:39 +1000 Subject: What is FreeBSD??? Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hello, I am a Linux user, but I'm thinking of giving FreeBSD ago. I am thinking of downloading FreeBSD today, but just wonder what is BSD like? Do it has GUI? Is it easy to connection to window partition in the case of multiple OS on one box? Does it include Samba and NSF? Is it good for Desktop user? The lattest version for now is 4.3 right, Is the next version will be 5.0???? please reply me all the detail and pictures of the BSD too if you can I really do appreciate it.. Thank you Tom EMAIL CONFIDENTIALITY: This email is for its intended recipient only and may contain confidential and/or privileged information. If this email has been sent to you in error, you must not copy or distribute any information which it contains. 
Following your receipt of this email, please forward to and notify the Government Superannuation Office at the email address listed above. Following such notification, please delete this email. Please note that any views expressed in this email are those of the individual sender, except where the sender specifically states such views to be those of the Government Superannuation Office. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon May 14 22:49: 8 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by hub.freebsd.org (Postfix) with ESMTP id 4112137B424 for ; Mon, 14 May 2001 22:49:02 -0700 (PDT) (envelope-from michael.schuster@sun.com) Received: from sun-gy.Germany.Sun.COM ([129.157.128.5]) by mercury.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id WAA03454; Mon, 14 May 2001 22:48:55 -0700 (PDT) Received: from hacker.Germany.Sun.COM (hacker [129.157.133.195]) by sun-gy.Germany.Sun.COM (8.9.3+Sun/8.9.3/ENSMAIL,v2.1p1) with ESMTP id HAA14865; Tue, 15 May 2001 07:48:53 +0200 (MEST) Received: from sun.com (localhost [127.0.0.1]) by hacker.Germany.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id HAA20032; Tue, 15 May 2001 07:48:52 +0200 (MEST) Message-ID: <3B00C344.4B952453@sun.com> Date: Tue, 15 May 2001 07:48:52 +0200 From: Michael Schuster X-Mailer: Mozilla 4.77 [en] (X11; U; SunOS 5.8 sun4u) X-Accept-Language: en MIME-Version: 1.0 To: Tom.Bounxokvan@qsuper.qld.gov.au Cc: freebsd-arch@freebsd.org Subject: Re: What is FreeBSD??? References: <4A256A4D.001F7C42.00@pegasus.qsuper.qld.gov.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Tom.Bounxokvan@qsuper.qld.gov.au wrote: > > Hello, > > I am a Linux user, but I'm thinking of giving FreeBSD ago. I am thinking of > downloading > FreeBSD today, that's a very good idea ... 
but why don't you go and have a look at http://www.freebsd.org/ ... there's lots of lovely stuff there. I hope you enjoy what you find and give FreeBSD a try, I'm sure you'll have lots of fun. btw: if you have specific questions that are not answered anywhere on that webpage, try freebsd-questions@freebsd.org first :-) regards Michael -- Michael Schuster / Michael.Schuster@sun.com Sun Microsystems GmbH / (+49 89) 46008-2974 | x62974 Sonnenallee 1, D-85551 Kirchheim-Heimstetten Recursion, n.: see 'Recursion' To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Mon May 14 23:38: 0 2001 Delivered-To: freebsd-arch@freebsd.org Received: from maynard.mail.mindspring.net (maynard.mail.mindspring.net [207.69.200.243]) by hub.freebsd.org (Postfix) with ESMTP id BAEE337B422 for ; Mon, 14 May 2001 23:37:56 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from mindspring.com (pool0356.cvx21-bradley.dialup.earthlink.net [209.179.193.101]) by maynard.mail.mindspring.net (8.9.3/8.8.5) with ESMTP id CAA10173; Tue, 15 May 2001 02:37:45 -0400 (EDT) Message-ID: <3B00CECF.9A3DEEFA@mindspring.com> Date: Mon, 14 May 2001 23:38:07 -0700 From: Terry Lambert Reply-To: tlambert2@mindspring.com X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Rik van Riel Cc: Matt Dillon , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Rik van Riel wrote: > So we should not allow just one single large job to take all > of memory, but we should allow some small jobs in memory too. Historically, this problem is solved with a "working set quota". 
> If you don't do this very slow swapping, NONE of the big tasks > will have the opportunity to make decent progress and the system > will never get out of thrashing. > > If we simply make the "swap time slices" for larger processes > larger than for smaller processes we: > > 1) have a better chance of the large jobs getting any work done > 2) won't have the large jobs artificially increase memory load, > because all time will be spent removing each other's RSS > 3) can have more small jobs in memory at once, due to 2) > 4) can be better for interactive performance due to 3) > 5) have a better chance of getting out of the overload situation > sooner > > I realise this would make the scheduling algorithm slightly > more complex and I'm not convinced doing this would be worth > it myself, but we may want to do some brainstorming over this ;) A per vnode working set quota with a per use count adjust would resolve most load thrashing issues. Programs with large working sets can either be granted a case by case exception (via rlimit), or, more likely just have their pages thrashed out more often. You only ever need to do this when you have exhausted memory to the point you are swapping, and then only when you want to reap cached clean pages; when all you have left is dirty pages in memory and swap, you are well and truly thrashing -- for the right reason: your system load is too high. It's also relatively easy to implement something like a per vnode working set quota, which can be self-enforced, without making the scheduler so ugly that you will never be able to do things like have per-CPU run queues for a very efficient SMP that deals with the cache locality issue naturally and easily (by merely setting migration policies for moving from one run queue to another, and by threads in a thread group having negative affinity for each other's CPUs, to maximize real concurrency). 
Pseudo code:

    IF THRASH_CONDITIONS
        IF (COPY_ON_WRITE_FAULT OR PAGE_FILL_OF_SBRKED_PAGE_FAULT)
            IF VNODE_OVER_WORKING_SET_QUOTA
                STEAL_PAGE_FROM_VNODE_LRU
            ELSE
                GET_PAGE_FROM_SYSTEM

Obviously, this would work for vnodes that were acting as backing store for programs, just as it would prevent a large mmap() with a traversal from thrashing everyone else's data and code out of core (which is, I think, a much worse and much more common problem).

Doing extremely complicated things is only going to get you into trouble... in particular, you don't want to have policy in effect to deal with border load conditions unless you are under those conditions in the first place.

The current scheduling algorithms are quite simple, relatively speaking, and it makes much more sense to make the thrashers fight with themselves, rather than them peeing in everyone's pool.

I think that badly written programs taking more time, as a result, is not a problem; if it is, it's one I could live with much more easily than cache-busting for no good reason, and slowing well behaved code down.

You need to penalize the culprit.

It's possible to do a more complicated working set quota, which actually applies to a process' working set, instead of to vnodes, out of context with the process, but I think that the vnode approach, particularly when you bump the working set quota up for each additional opener, using the count I suggested, to ensure proper locality of reference, is good enough to solve the problem.

At the very least, the system would not "freeze" with this approach, even if it could later recover.
-- Terry

From owner-freebsd-arch Tue May 15 1:40:20 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163])
    by hub.freebsd.org (Postfix) with ESMTP id B025237B43C;
    Tue, 15 May 2001 01:40:13 -0700 (PDT)
    (envelope-from phk@critter.freebsd.dk)
Received: from critter (localhost [127.0.0.1])
    by critter.freebsd.dk (8.11.3/8.11.3) with ESMTP id f4F8dup30517;
    Tue, 15 May 2001 10:39:57 +0200 (CEST)
    (envelope-from phk@critter.freebsd.dk)
To: dleimbac@earthlink.net
Cc: freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG
Subject: Re: Gettimeofday Again...
In-Reply-To: Your message of "Mon, 14 May 2001 22:39:32 CDT." <200105150337.UAA19677@gull.mail.pas.earthlink.net>
Date: Tue, 15 May 2001 10:39:56 +0200
Message-ID: <30515.989915996@critter>
From: Poul-Henning Kamp
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <200105150337.UAA19677@gull.mail.pas.earthlink.net>, dave writes:
>
>
>Well I have been on the IRC in and out of mail list archives and cannot get
>a good answer to this question...
>
>Why does gettimeofday perform so poorly on FreeBSD vs the same hardware on
>Linux 2.4.2?
>
>---SNIP-----
>#include <stdio.h>
>#include <sys/time.h>
>
>int main() {
>    struct timeval tv1, tv2, tv3;
>
>    gettimeofday(&tv1, 0);
>    gettimeofday(&tv2, 0);
>    gettimeofday(&tv3, 0);
>
>    printf("Time 1 %d:%d\n", tv1.tv_sec, tv1.tv_usec);
>    printf("Time 2 %d:%d\n", tv3.tv_sec, tv3.tv_usec);
>
>}
>----SNIP-----
>
>I get anywhere from 14usec to 17usec just for the call to gettimeofday.
>
>On the 2.4.2 linux kernel it's something like 3usec.
>
>I just want to know why we are so much slower.

Because we haven't particularly optimised it. If we want to, we can get to the point where a machine with a usable TSC doesn't even have to enter the kernel.
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 1:56:22 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp6.mindspring.com (smtp6.mindspring.com [207.69.200.110]) by hub.freebsd.org (Postfix) with ESMTP id 3DB6B37B424; Tue, 15 May 2001 01:56:16 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from mindspring.com (pool0356.cvx21-bradley.dialup.earthlink.net [209.179.193.101]) by smtp6.mindspring.com (8.9.3/8.8.5) with ESMTP id EAA05760; Tue, 15 May 2001 04:56:14 -0400 (EDT) Message-ID: <3B00EF40.A1232B75@mindspring.com> Date: Tue, 15 May 2001 01:56:32 -0700 From: Terry Lambert Reply-To: tlambert2@mindspring.com X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Matt Dillon Cc: dave , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... References: <200105150337.UAA19677@gull.mail.pas.earthlink.net> <200105150346.f4F3kLE45720@earth.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Matt Dillon wrote: > :Well I have been on the IRC in and out of mail list archives > :and cannot geta good answer to this question... > : > :Why does gettimeofday perform so poorly on FreeBSD vs the > :same hardware on Linux 2.4.2? > > Why should it matter? gettimeofday() is not something > that is typically called in a tight loop, it would be > silly to optimize it. 17uS is plenty fast enough. > > Systems rarely have performance problems due to syscall > overhead. 
I think that perhaps you aren't doing high performance server work, if you really think 17uS is "plenty fast enough".

I have an application where gettimeofday() was a significant fraction of the overhead; it was being used to provide for marketing eye-candy... basically, squid-compatible proxy logging that could be processed using common eye-candy tools normally used against squid logs for transaction in, time sent to back end, time data came from back end, and time data sent to client.

The gettimeofday() calls were _the_ major useless overhead, until I eliminated them by creating a zero system call version that someone (was it Bruce?) insisted was an impossibility (don't think that drift wasn't problematic to eliminate, in the case of a clock interrupt in the middle of the huge calculation).

That's a savings of 68uS per transaction (4 calls at 17uS each). IMO, the 12uS that Linux would have taken for the same 4 calls would also be too much overhead to be acceptable for the application in question.

The timecounter code has a huge calculation overhead, by default, as well. I really dislike the amount of data that has to be pushed around, and I really dislike the amount of code that has to execute to do the calculation, whether it be in kernel or in user space. It doesn't help that the data structure it uses is a hidden copy that only has a reference when you are making the call via a structure pointer.

Linux is generally faster, because it has a global "current time", and just copies it out from the global, while FreeBSD does this huge dance with many structure pointer dereferences each time, and ends up calling two functions deep to do the deed (or three, if you use time(2) instead of gettimeofday(2)).

Really ugly code, which appears designed to allow us to use alternate timecounters "because the i8254 is too slow", and then we just use it anyway, because the Pentium cycle counter is "not accurate enough".

Foo.
We were better off before all the timecounter crap, when we had much less data moving around, and a nice struct timeval global that could be cheaply copied; the new code doesn't do what it was supposed to do (let us use a less expensive method of getting a uS level time of day clock accuracy), and is vastly more expensive and complex. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 1:58: 3 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by hub.freebsd.org (Postfix) with ESMTP id D4D2A37B423; Tue, 15 May 2001 01:57:59 -0700 (PDT) (envelope-from phk@critter.freebsd.dk) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.3/8.11.3) with ESMTP id f4F8vdp31191; Tue, 15 May 2001 10:57:39 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: tlambert2@mindspring.com Cc: Matt Dillon , dave , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In-Reply-To: Your message of "Tue, 15 May 2001 01:56:32 PDT." <3B00EF40.A1232B75@mindspring.com> Date: Tue, 15 May 2001 10:57:39 +0200 Message-ID: <31189.989917059@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <3B00EF40.A1232B75@mindspring.com>, Terry Lambert writes: >I think that perhaps you aren't doing high performance >server work, if you really think 17uS is "plenty fast enough". > >I have an application where gettimeofday() was a significant >fraction of the overhead; In that case change the kernel to use getmicrotime() instead of microtime(). -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. 
From owner-freebsd-arch Tue May 15 3:48:26 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16])
    by hub.freebsd.org (Postfix) with ESMTP id 561AD37B422;
    Tue, 15 May 2001 03:48:08 -0700 (PDT)
    (envelope-from bde@zeta.org.au)
Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102])
    by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id UAA12912;
    Tue, 15 May 2001 20:47:52 +1000
Date: Tue, 15 May 2001 20:46:35 +1000 (EST)
From: Bruce Evans
X-Sender: bde@besplex.bde.org
To: Poul-Henning Kamp
Cc: dleimbac@earthlink.net, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG
Subject: Re: Gettimeofday Again...
In-Reply-To: <30515.989915996@critter>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Tue, 15 May 2001, Poul-Henning Kamp wrote:

> In message <200105150337.UAA19677@gull.mail.pas.earthlink.net>, dave writes:
> >
> >Why does gettimeofday perform so poorly on FreeBSD vs the same hardware on
> >Linux 2.4.2?
> >
> >---SNIP-----
> >#include <stdio.h>
> >#include <sys/time.h>
> >
> >int main() {
> >    struct timeval tv1, tv2, tv3;
> >
> >    gettimeofday(&tv1, 0);
> >    gettimeofday(&tv2, 0);
> >    gettimeofday(&tv3, 0);
> >
> >    printf("Time 1 %d:%d\n", tv1.tv_sec, tv1.tv_usec);
> >    printf("Time 2 %d:%d\n", tv3.tv_sec, tv3.tv_usec);
> >
> >}
> >----SNIP-----
> >
> >I get anywhere from 14usec to 17usec just for the call to gettimeofday.

I get 2 or 3 usec per call (on a Celeron 5.5 * 96MHz).

> >On the 2.4.2 linux kernel it's something like 3usec.
> >
> >I just want to know why we are so much slower.

Probably primarily because FreeBSD is misconfigured to use the i8254 timecounter and Linux is configured to use the TSC.
I can get closer to your results by suitably misconfiguring FreeBSD:

    # sysctl -w kern.timecounter.hardware=i8254

The test then takes 5, 6 or 7 usec per call.

    # sysctl -w kern.timecounter.hardware=TSC    # put it back before I forget :)

Secondarily, because Linux syscalls are faster than FreeBSD's in general. Most of the 2-3 usec time above is for generic syscall overhead (i.e., the overhead for an almost minimal syscall() like getppid() that has no args and returns one value in a register). Some measurements of this overhead on the above system by lmbench2:

    Linux-2.2.9:            0.7 usec
    Linux-2.4.0.something:  0.6 usec
    FreeBSD-4.something:    0.8 usec (maybe my version of it)
    FreeBSD-current:        1.4 usec (KTRACE but no INVARIANT* or WITNESS*, etc.)
    FreeBSD-current:        1.6 usec (my version of it)
    FreeBSD-current-SMP:    2.2 usec

gettimeofday() has slightly more overhead (for copying in 2 args and copying out one timeval), but not much more. The above times are for getppid() in a loop, and the corresponding time for gettimeofday() in a loop is only 2.045 usec. So my version of -current is taking only 0.445 usec to do the actual work for gettimeofday(). My version of microtime() even does more work than the standard one (it updates time_second, timecounter->tc_microtime and timecounter->tc_nanotime so that the `get' functions are monotonic).

> Because we havn't particularly optimised it. If we want to, we

Actually we (mainly you) have optimized most of it more than is possible (microtime-went-backwards problems are a symptom of over-optimization). OTOH, syscalls haven't been optimized; they have mostly been pessimized, especially in SMPng.

> can get to the point where a machine with a usable TSC doesn't
> even have to enter the kernel.

That would be about as useful as optimizing getppid(). (The details would be similar -- you would have to signal applications or something to tell them when a relevant kernel state change occurs.)
Optimizing syscalls generally would be more useful, but is still only important for unusual applications that make too many syscalls. Some history of the performance of gettimeofday() under FreeBSD: output from an old benchmark/test program written by wollman: ==> dx2-66 <== 1995/11/03: min 13, max 171, mean 14.286634, std 1.836667 1th: 14 (1053245 observations) 2th: 15 (587593 observations) 3th: 13 (299847 observations) 4th: 16 (51817 observations) 5th: 33 (934 observations) 2000/11/15: min 20, max 542, mean 21.843003, std 8.003137 1th: 21 (1237408 observations) 2th: 22 (610609 observations) 3th: 23 (68643 observations) 4th: 20 (64561 observations) 5th: 25 (2233 observations) ==> k6-233 <== 1998/02/21: min 2, max 124, mean 2.240976, std 0.617886 1th: 2 (1535123 observations) 2th: 3 (463782 observations) 3th: 18 (302 observations) 4th: 17 (234 observations) 5th: 19 (107 observations) ==> p5-133 <== 1996/07/12: min 3, max 472, mean 3.320346, std 0.694846 1th: 3 (1376420 observations) 2th: 4 (621949 observations) 3th: 12 (594 observations) 4th: 13 (281 observations) 5th: 11 (233 observations) 1998/02/21 pre-phk: min 3, max 595, mean 3.443382, std 0.767383 1th: 3 (1134768 observations) 2th: 4 (863276 observations) 3th: 15 (719 observations) 4th: 5 (299 observations) 5th: 16 (263 observations) 1998/02/21 post-phk: min 4, max 99, mean 4.614527, std 0.710407 1th: 5 (1195577 observations) 2th: 4 (801694 observations) 3th: 16 (707 observations) 4th: 6 (456 observations) 5th: 17 (420 observations) 1999/12/04: min 4, max 120, mean 4.630231, std 0.777733 1th: 5 (1222295 observations) 2th: 4 (775244 observations) 3th: 18 (444 observations) 4th: 19 (441 observations) 5th: 21 (250 observations) ==> cel-450 <== 1999/05/02: min 1, max 44, mean 1.683213, std 0.501948 1th: 2 (1359226 observations) 2th: 1 (639961 observations) 3th: 10 (286 observations) 4th: 11 (123 observations) 5th: 7 (116 observations) ==> cel-458 <== 1999/12/04: min 1, max 29, mean 1.498473, std 0.523034 1th: 1 
(1008000 observations) 2th: 2 (991299 observations) 3th: 9 (253 observations) 4th: 10 (144 observations) 5th: 6 (106 observations) ==> cel-522 <== 2000/10/27 min 1, max 125, mean 1.389330, std 0.628235 1th: 1 (1233282 observations) 2th: 2 (765817 observations) 3th: 15 (137 observations) 4th: 4 (119 observations) 5th: 5 (118 observations) 2001/05/15 min 2, max 1194, mean 2.045723, std 0.933668 1th: 2 (1922234 observations) 2th: 3 (76839 observations) 3th: 16 (360 observations) 4th: 15 (280 observations) 5th: 14 (108 observations) Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 4:33:50 2001 Delivered-To: freebsd-arch@freebsd.org Received: from pcnet1.pcnet.com (pcnet1.pcnet.com [204.213.232.3]) by hub.freebsd.org (Postfix) with ESMTP id 0BA8737B423; Tue, 15 May 2001 04:33:46 -0700 (PDT) (envelope-from eischen@vigrid.com) Received: (from eischen@localhost) by pcnet1.pcnet.com (8.8.7/PCNet) id HAA05286; Tue, 15 May 2001 07:33:09 -0400 (EDT) Date: Tue, 15 May 2001 07:33:08 -0400 (EDT) From: Daniel Eischen To: Terry Lambert Cc: Matt Dillon , dave , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In-Reply-To: <3B00EF40.A1232B75@mindspring.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tue, 15 May 2001, Terry Lambert wrote: > Matt Dillon wrote: > > :Well I have been on the IRC in and out of mail list archives > > :and cannot geta good answer to this question... > > : > > :Why does gettimeofday perform so poorly on FreeBSD vs the > > :same hardware on Linux 2.4.2? > > > > Why should it matter? gettimeofday() is not something > > that is typically called in a tight loop, it would be > > silly to optimize it. 17uS is plenty fast enough. > > > > Systems rarely have performance problems due to syscall > > overhead. 
> > I think that perhaps you aren't doing high performance > server work, if you really think 17uS is "plenty fast enough". > > I have an application where gettimeofday() was a significant > fraction of the overhead; it was being used to provide for > marketing eye-candy... basically, squid-compatible proxy > logging that could be processed using common eye-candy tools > normally used against squid logs for transaction in, time > sent to back end, time data came from back end, and time data > sent to client. The threads library was also changed to avoid gettimeofday() to avoid the overhead of an extra system call during a thread context switch. It needs the current time to wakeup waiting threads. We switched to a fixed interval timer and only called gettimeofday when the timer went off, but it was necessary to add a couple of hacks because it introduced a couple of other problems. I'd really like an mmap'able gettimeofday/clock_gettime thingie, or some counter whose rate I know. -- Dan Eischen To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 6:41:39 2001 Delivered-To: freebsd-arch@freebsd.org Received: from point.osg.gov.bc.ca (point.osg.gov.bc.ca [142.32.102.44]) by hub.freebsd.org (Postfix) with ESMTP id DCC4D37B422 for ; Tue, 15 May 2001 06:41:35 -0700 (PDT) (envelope-from Cy.Schubert@uumail.gov.bc.ca) Received: (from daemon@localhost) by point.osg.gov.bc.ca (8.8.7/8.8.8) id GAA23162; Tue, 15 May 2001 06:40:41 -0700 Received: from passer.osg.gov.bc.ca(142.32.110.29) via SMTP by point.osg.gov.bc.ca, id smtpda23160; Tue May 15 06:40:24 2001 Received: (from uucp@localhost) by passer.osg.gov.bc.ca (8.11.2/8.9.1) id f4FDeJg51195; Tue, 15 May 2001 06:40:19 -0700 (PDT) Received: from cwsys9.cwsent.com(10.2.2.1), claiming to be "cwsys.cwsent.com" via SMTP by passer9.cwsent.com, id smtpdG51185; Tue May 15 06:39:43 2001 Received: (from uucp@localhost) by cwsys.cwsent.com 
(8.11.3/8.9.1) id f4FDdcD09937; Tue, 15 May 2001 06:39:38 -0700 (PDT) Message-Id: <200105151339.f4FDdcD09937@cwsys.cwsent.com> Received: from localhost.cwsent.com(127.0.0.1), claiming to be "cwsys" via SMTP by localhost.cwsent.com, id smtpdcE9933; Tue May 15 06:39:06 2001 X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4 Reply-To: Cy Schubert - ITSD Open Systems Group From: Cy Schubert - ITSD Open Systems Group X-Sender: schubert To: tlambert2@mindspring.com Cc: Rik van Riel , Matt Dillon , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping In-reply-to: Your message of "Mon, 14 May 2001 23:38:07 PDT." <3B00CECF.9A3DEEFA@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Tue, 15 May 2001 06:39:06 -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <3B00CECF.9A3DEEFA@mindspring.com>, Terry Lambert writes: > Rik van Riel wrote: > > So we should not allow just one single large job to take all > > of memory, but we should allow some small jobs in memory too. > > Historically, this problem is solved with a "working set > quota". > > > If you don't do this very slow swapping, NONE of the big tasks > > will have the opportunity to make decent progress and the system > > will never get out of thrashing. 
> > > > If we simply make the "swap time slices" for larger processes > > larger than for smaller processes we: > > > > 1) have a better chance of the large jobs getting any work done > > 2) won't have the large jobs artificially increase memory load, > > because all time will be spent removing each other's RSS > > 3) can have more small jobs in memory at once, due to 2) > > 4) can be better for interactive performance due to 3) > > 5) have a better chance of getting out of the overload situation > > sooner > > > > I realise this would make the scheduling algorithm slightly > > more complex and I'm not convinced doing this would be worth > > it myself, but we may want to do some brainstorming over this ;) > > A per vnode working set quota with a per use count adjust > would resolve most load thrashing issues. Programs with > large working sets can either be granted a case by case > exception (via rlimit), or, more likely just have their > pages thrashed out more often. > > You only ever need to do this when you have exhausted > memory to the point you are swapping, and then only when > you want to reap cached clean pages; when all you have > left is dirty pages in memory and swap, you are well and > truly thrashing -- for the right reason: your system load > is too high. An operating system I worked on at one time, MVS, had this feature (not sure whether it still does today). We called it fencing (e.g. fencing an address space). An address space could be limited to the amount of real memory used. Conversely, important address spaces could be given a minimum amount of real memory, e.g. online applications such a CICS. Additionally instead of limiting an address space to a minimum or maximum amount of real memory, an address space could be limited to a maximum paging rate, giving the O/S the option of increasing its real memory to match its WSS, reducing paging of the specified address space to a preset limit. 
Of course this could have negative impact on other applications running on the system, which is why IBM recommended against using this feature. Regards, Phone: (250)387-8437 Cy Schubert Fax: (250)387-5766 Team Leader, Sun/Alpha Team Internet: Cy.Schubert@osg.gov.bc.ca Open Systems Group, ITSD, ISTA Province of BC To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 6:54:10 2001 Delivered-To: freebsd-arch@freebsd.org Received: from MPI-Softtech.Com (mpi.mpi-softtech.com [208.60.120.177]) by hub.freebsd.org (Postfix) with ESMTP id AD2FC37B424; Tue, 15 May 2001 06:54:06 -0700 (PDT) (envelope-from dleimbac@MPI-Softtech.Com) Received: from mpi.mpi-softtech.com (mpi.mpi-softtech.com [208.60.120.177]); by MPI-Softtech.Com (8.9.3/8.9.3/MPI-Softtech/evision: 1.3 $) with SMTP; id IAA27481; Tue, 15 May 2001 08:53:51 -0500 (CDT) Message-Id: <200105151353.IAA27481@MPI-Softtech.Com> Date: Tue, 15 May 2001 08:53:51 -0500 (CDT) From: Dave Leimbach Reply-To: Dave Leimbach Subject: Re: Gettimeofday Again... To: tlambert2@mindspring.com, eischen@vigrid.com Cc: dillon@earth.backplane.com, dleimbac@earthlink.net, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Content-MD5: 7dKEzUmoozXmCdoRMH4Upw== X-Mailer: dtmail 1.3.0 CDE Version 1.3 SunOS 5.7 sun4u sparc Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Thanks for all of the great answers... For now if I want to see better performance of the timing code I can just "sysctl -w kern.timecounter.hardware=TSC"?? Are these sysctl options documented anywhere other than the man page? I don't remember seeing that specific option listed. I generally find the documentation of FreeBSD in the man pages to be far ahead of that in linux and I really appreciate the efforts that are done there. 
In fact if you would like me to document all of the known sysctl's I will start my hunt for them as a new spare time project. One of the best things about linux is the runtime kernel configurability and I just recently learned that FreeBSD has a similar mechanism. Thanks again for the great responses everyone!! Dave Leimbach To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 7:11:14 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by hub.freebsd.org (Postfix) with ESMTP id 17ACE37B423; Tue, 15 May 2001 07:11:10 -0700 (PDT) (envelope-from phk@critter.freebsd.dk) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.3/8.11.3) with ESMTP id f4FEAjp33657; Tue, 15 May 2001 16:10:45 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: Dave Leimbach Cc: tlambert2@mindspring.com, eischen@vigrid.com, dillon@earth.backplane.com, dleimbac@earthlink.net, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In-Reply-To: Your message of "Tue, 15 May 2001 08:53:51 CDT." <200105151353.IAA27481@MPI-Softtech.Com> Date: Tue, 15 May 2001 16:10:45 +0200 Message-ID: <33655.989935845@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <200105151353.IAA27481@MPI-Softtech.Com>, Dave Leimbach writes: >Thanks for all of the great answers... > >For now if I want to see better performance of the timing code I can just >"sysctl -w kern.timecounter.hardware=TSC"?? You can try it, but depending on a lot of stuff it may not A) work nor B) be a good idea if it does. If you want a faster (but less exact!) 
gettimeofday, change the call to microtime() to getmicrotime() in the gettimeofday() in kern_time.c -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 7:12:17 2001 Delivered-To: freebsd-arch@freebsd.org Received: from casimir.physics.purdue.edu (casimir.physics.purdue.edu [128.210.146.111]) by hub.freebsd.org (Postfix) with ESMTP id 7312D37B423; Tue, 15 May 2001 07:12:12 -0700 (PDT) (envelope-from will@physics.purdue.edu) Received: by casimir.physics.purdue.edu (Postfix, from userid 1000) id 62C8E17D32; Tue, 15 May 2001 09:06:12 -0500 (EST) Date: Tue, 15 May 2001 09:06:12 -0500 From: Will Andrews To: Dave Leimbach Cc: tlambert2@mindspring.com, eischen@vigrid.com, dillon@earth.backplane.com, dleimbac@earthlink.net, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... Message-ID: <20010515090612.J11113@casimir.physics.purdue.edu> Reply-To: Will Andrews Mail-Followup-To: Will Andrews , Dave Leimbach , tlambert2@mindspring.com, eischen@vigrid.com, dillon@earth.backplane.com, dleimbac@earthlink.net, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG References: <200105151353.IAA27481@MPI-Softtech.Com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.15i In-Reply-To: <200105151353.IAA27481@MPI-Softtech.Com>; from dleimbac@MPI-Softtech.Com on Tue, May 15, 2001 at 08:53:51AM -0500 X-Operating-System: Linux 2.2.18 sparc64 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tue, May 15, 2001 at 08:53:51AM -0500, Dave Leimbach wrote: > Are these sysctl options documented anywhere other than the man page? I don't > remember seeing that specific option listed. The source. 
-- wca To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 7:40:36 2001 Delivered-To: freebsd-arch@freebsd.org Received: from MPI-Softtech.Com (mpi.mpi-softtech.com [208.60.120.177]) by hub.freebsd.org (Postfix) with ESMTP id 46BD037B423; Tue, 15 May 2001 07:40:28 -0700 (PDT) (envelope-from dleimbac@MPI-Softtech.Com) Received: from mpi.mpi-softtech.com (mpi.mpi-softtech.com [208.60.120.177]); by MPI-Softtech.Com (8.9.3/8.9.3/MPI-Softtech/evision: 1.3 $) with SMTP; id JAA01279; Tue, 15 May 2001 09:40:12 -0500 (CDT) Message-Id: <200105151440.JAA01279@MPI-Softtech.Com> Date: Tue, 15 May 2001 09:40:12 -0500 (CDT) From: Dave Leimbach Reply-To: Dave Leimbach Subject: Re: Gettimeofday Again... To: dleimbac@earthlink.net, wes@softweyr.com Cc: freebsd-questions@freebsd.org, arch@freebsd.org MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Content-MD5: IwjY9TXrjV34s8ShLBR/fg== X-Mailer: dtmail 1.3.0 CDE Version 1.3 SunOS 5.7 sun4u sparc Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG This is a valid point.... I will have to find the link that spawned this concern. It was an IBM exploration of gettimeofday on linux vs a comparable function in Windows. People immediately began claiming that linux is 2.5 x faster than Windows. I decided to try the "tests" myself and found that by the results that are generated FreeBSD came up more than 13 times slower than linux. I never admitted that the tests were a good benchmark for anything. I just thought that it was really damned slow :). I think SMP is a way more important topic so long as gettimeofday doesn't get called in thread context switches :).. This was mentioned before and the overhead of gettimeofday could come into play in different areas so I felt it was important to have a fast & accurate one. I have no idea how to implement one so I should probably shut up now :). 
Dave

>Date: Tue, 15 May 2001 08:20:48 -0600
>From: Wes Peters
>X-Accept-Language: en
>MIME-Version: 1.0
>To: dleimbac@earthlink.net
>CC: freebsd-questions@freebsd.org, arch@freebsd.org
>Subject: Re: Gettimeofday Again...
>Content-Transfer-Encoding: 7bit
>
>dave wrote:
>>
>> Well I have been on the IRC in and out of mail list archives and cannot get
>> a good answer to this question...
>>
>> Why does gettimeofday perform so poorly on FreeBSD vs the same hardware on
>> Linux 2.4.2?
>>
>> ---SNIP-----
>> #include <sys/time.h>
>> #include <stdio.h>
>>
>> int main() {
>>   struct timeval tv1, tv2, tv3;
>>
>>   gettimeofday(&tv1, 0);
>>   gettimeofday(&tv2, 0);
>>   gettimeofday(&tv3, 0);
>>
>>   printf("Time 1 %d:%d\n", tv1.tv_sec, tv1.tv_usec);
>>   printf("Time 2 %d:%d\n", tv3.tv_sec, tv3.tv_usec);
>>
>> }
>> ----SNIP-----
>>
>> I get anywhere from 14usec to 17usec just for the call to gettimeofday.
>>
>> On the 2.4.2 linux kernel its something like 3usec.
>>
>> I just want to know why we are so much slower.
>>
>> I have heard the "caching" argument and it doesn't float very well since
>> all I can find is CPU based L1 cache which should also apply to FreeBSD.
>> That's principle of locality stuff and more on the hardware side. Since I
>> am running on identical hardware I don't see how
>> this is possible except for a potential problem/lack of optimization in
>> FreeBSD.
>>
>> Don't get me wrong... I love FreeBSD... and it generally outperforms linux
>> everywhere I use it. I just don't understand this huge gap in performance.
>
>Show us a profile of an application where this is a problem. Premature
>optimization is the root of much evil. Would you rather have somebody
>spend time on optimizing gettimeofday, or working on SMP or kqueue or
>doing a security audit, something with meaning?
>
>--
> "Where am I, and what am I doing in this handbasket?"
> >Wes Peters Softweyr LLC >wes@softweyr.com http://softweyr.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 7:54: 5 2001 Delivered-To: freebsd-arch@freebsd.org Received: from bdr-xcon.matchlogic.com (mail.matchlogic.com [205.216.147.127]) by hub.freebsd.org (Postfix) with ESMTP id 23CDF37B42C; Tue, 15 May 2001 07:54:01 -0700 (PDT) (envelope-from crandall@matchlogic.com) Received: by mail.matchlogic.com with Internet Mail Service (5.5.2653.19) id ; Tue, 15 May 2001 08:53:56 -0600 Message-ID: <5FE9B713CCCDD311A03400508B8B30130828ED85@bdr-xcln.corp.matchlogic.com> From: Charles Randall To: 'Poul-Henning Kamp' , Dave Leimbach Cc: tlambert2@mindspring.com, eischen@vigrid.com, dillon@earth.backplane.com, dleimbac@earthlink.net, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: RE: Gettimeofday Again... Date: Tue, 15 May 2001 08:53:38 -0600 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Isn't that what kern.timecounter.method (documented in the microtime() man page) is for? That manual page states that this can be toggled. Charles -----Original Message----- From: Poul-Henning Kamp [mailto:phk@critter.freebsd.dk] Sent: Tuesday, May 15, 2001 8:11 AM To: Dave Leimbach Cc: tlambert2@mindspring.com; eischen@vigrid.com; dillon@earth.backplane.com; dleimbac@earthlink.net; freebsd-questions@FreeBSD.ORG; arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In message <200105151353.IAA27481@MPI-Softtech.Com>, Dave Leimbach writes: >Thanks for all of the great answers... > >For now if I want to see better performance of the timing code I can just >"sysctl -w kern.timecounter.hardware=TSC"?? You can try it, but depending on a lot of stuff it may not A) work nor B) be a good idea if it does. If you want a faster (but less exact!) 
gettimeofday, change the call to microtime() to getmicrotime() in the gettimeofday() in kern_time.c -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 8:22:42 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by hub.freebsd.org (Postfix) with ESMTP id 017C837B43C; Tue, 15 May 2001 08:22:37 -0700 (PDT) (envelope-from phk@critter.freebsd.dk) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.3/8.11.3) with ESMTP id f4FFMGp34495; Tue, 15 May 2001 17:22:16 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: Charles Randall Cc: Dave Leimbach , tlambert2@mindspring.com, eischen@vigrid.com, dillon@earth.backplane.com, dleimbac@earthlink.net, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In-Reply-To: Your message of "Tue, 15 May 2001 08:53:38 MDT." <5FE9B713CCCDD311A03400508B8B30130828ED85@bdr-xcln.corp.matchlogic.com> Date: Tue, 15 May 2001 17:22:16 +0200 Message-ID: <34493.989940136@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <5FE9B713CCCDD311A03400508B8B30130828ED85@bdr-xcln.corp.matchlogic.com>, Charles Randall writes: >Isn't that what kern.timecounter.method (documented in the microtime() man >page) is for? That manual page states that this can be toggled. No, that was a band-aid for a specific problem.
Poul-Henning -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 8:36:48 2001 Delivered-To: freebsd-arch@freebsd.org Received: from netbank.com.br (garrincha.netbank.com.br [200.203.199.88]) by hub.freebsd.org (Postfix) with ESMTP id C233E37B422 for ; Tue, 15 May 2001 08:36:43 -0700 (PDT) (envelope-from riel@conectiva.com.br) Received: from surriel.ddts.net (1-248.ctame701-1.telepar.net.br [200.181.137.248]) by netbank.com.br (Postfix) with ESMTP id 45EB546810; Tue, 15 May 2001 12:32:38 -0300 (BRST) Received: from localhost (msnpvu@localhost [127.0.0.1]) by surriel.ddts.net (8.11.3/8.11.2) with ESMTP id f4FFVLi01316; Tue, 15 May 2001 12:31:25 -0300 Date: Tue, 15 May 2001 12:31:21 -0300 (BRST) From: Rik van Riel X-Sender: riel@imladris.rielhome.conectiva To: Terry Lambert Cc: Matt Dillon , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping In-Reply-To: <3B00CECF.9A3DEEFA@mindspring.com> Message-ID: X-spambait: aardvark@kernelnewbies.org X-spammeplease: aardvark@nl.linux.org MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Mon, 14 May 2001, Terry Lambert wrote: > Rik van Riel wrote: > > So we should not allow just one single large job to take all > > of memory, but we should allow some small jobs in memory too. > > Historically, this problem is solved with a "working set > quota". This is a great idea for when the system is in-between normal loads and real thrashing. It will save small processes while slowing down memory hogs which are taking resources fairly. 
I'm not convinced it is any replacement for swapping, but it sure is a good way to delay swapping as long as possible. Also, having a working set size guarantee in combination with idle swapping will almost certainly give the proverbial root shell the boost it needs ;) > Doing extremely complicated things is only going to get > you into trouble... in particular, you don't want to > have policy in effect to deal with border load conditions > unless you are under those conditions in the first place. Agreed. > It's possible to do a more complicated working set quota, > which actually applies to a process' working set, instead > of to vnodes, out of context with the process, I guess in FreeBSD a per-vnode approach would be easier to implement while in Linux a per-process working set would be easier... regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://distro.conectiva.com/ Send all your spam to aardvark@nl.linux.org (spam digging piggy) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 10:24:47 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 210DD37B424 for ; Tue, 15 May 2001 10:24:44 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4FHOYt54576; Tue, 15 May 2001 10:24:34 -0700 (PDT) (envelope-from dillon) Date: Tue, 15 May 2001 10:24:34 -0700 (PDT) From: Matt Dillon Message-Id: <200105151724.f4FHOYt54576@earth.backplane.com> To: Terry Lambert Cc: Rik van Riel , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping References: <3B00CECF.9A3DEEFA@mindspring.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop:
FreeBSD.ORG :Rik van Riel wrote: :> So we should not allow just one single large job to take all :> of memory, but we should allow some small jobs in memory too. : :Historically, this problem is solved with a "working set :quota". We have a process-wide working set quota. It's called the 'memoryuse' resource. :... :> 5) have a better chance of getting out of the overload situation :> sooner :> :> I realise this would make the scheduling algorithm slightly :> more complex and I'm not convinced doing this would be worth :> it myself, but we may want to do some brainstorming over this ;) : :A per vnode working set quota with a per use count adjust :would resolve most load thrashing issues. Programs with It most certainly would not. Limiting the number of pages you allow to be 'cached' on a vnode by vnode basis would be a disaster. It has absolutely nothing whatsoever to do with thrashing or thrash-management. It would simply be an artificial limitation based on artificial assumptions that are as likely to be wrong as right. If I've learned anything working on the FreeBSD VM system, it's that the number of assumptions you make in regards to what programs do, how they do it, how much data they should be able to cache, and so forth is directly proportional to how badly you fuck up the paging algorithms. I implemented a special page-recycling algorithm in 4.1/4.2 (which is still there in 4.3). Basically it tries to predict when it is possible to throw away pages 'behind' a sequentially accessed file, so as not to allow that file to blow away your cache. E.g. if you have 128M of ram and you are sequentially accessing a 200MB file, obviously there is not much point in trying to cache the data as you read it. But being able to predict something like this is extremely difficult. In fact, nearly impossible. And without being able to make the prediction accurately you simply cannot determine how much data you should try to cache before you begin recycling it.
I wound up having to change the algorithm to act more like a heuristic -- it does a rough prediction but doesn't hold the system to it, then allows the page priority mechanism to refine the prediction. But it can take several passes (or non-passes) on the file before the page recycling stabilizes. So the gist of the matter is that FreeBSD (1) already has process-wide working set limitations which are activated when the system is under load, and (2) already has a heuristic that attempts to predict when not to cache pages. Actually several heuristics (a number of which were in place in the original CSRG code). -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 10:37:20 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 6059637B422; Tue, 15 May 2001 10:37:16 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4FHbBL55271; Tue, 15 May 2001 10:37:11 -0700 (PDT) (envelope-from dillon) Date: Tue, 15 May 2001 10:37:11 -0700 (PDT) From: Matt Dillon Message-Id: <200105151737.f4FHbBL55271@earth.backplane.com> To: Terry Lambert Cc: dave , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... References: <200105150337.UAA19677@gull.mail.pas.earthlink.net> <200105150346.f4F3kLE45720@earth.backplane.com> <3B00EF40.A1232B75@mindspring.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :I have an application where gettimeofday() was a significant :fraction of the overhead; it was being used to provide for :marketing eye-candy...
basically, squid-compatible proxy :logging that could be processed using common eye-candy tools :normally used against squid logs for transaction in, time :sent to back end, time data came from back end, and time data :sent to client. : :The gettimeofday() calls were _the_ major useless overhead, :until I eliminated them by creating a zero system call version Terry, to be blunt... if you need performance you don't go making superfluous system calls for every transaction. If you want marketing eye candy, all you need to do is a statistical measurement... do fine measurement of 10% of the transactions rather than fine measurement of 100% of the transactions, and you are done. Also, using gettimeofday() is a ridiculous way to measure fine grained time billions of times in production code. I mean, sure, it works... but getitimer() is about 5 times faster. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 17: 6:17 2001 Delivered-To: freebsd-arch@freebsd.org Received: from maila.telia.com (maila.telia.com [194.22.194.231]) by hub.freebsd.org (Postfix) with ESMTP id 0CB5037B423 for ; Tue, 15 May 2001 17:06:03 -0700 (PDT) (envelope-from roger.larsson@norran.net) Received: from there (h164n1fls31o925.telia.com [213.65.254.164]) by maila.telia.com (8.11.2/8.11.0) with SMTP id f4G05fe26435; Wed, 16 May 2001 02:05:41 +0200 (CEST) Message-Id: <200105160005.f4G05fe26435@maila.telia.com> Content-Type: text/plain; charset="iso-8859-1" From: Roger Larsson To: Matt Dillon Subject: Re: on load control / process swapping Date: Wed, 16 May 2001 01:55:13 +0200 X-Mailer: KMail [version 1.2.1] Cc: Rik van Riel , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu References: <3B00CECF.9A3DEEFA@mindspring.com> <200105151724.f4FHOYt54576@earth.backplane.com> In-Reply-To: <200105151724.f4FHOYt54576@earth.backplane.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender:
owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tuesday 15 May 2001 19:24, Matt Dillon wrote: > I implemented a special page-recycling algorithm in 4.1/4.2 (which is > still there in 4.3). Basically it tries predict when it is possible to > throw away pages 'behind' a sequentially accessed file, so as not to > allow that file to blow away your cache. E.G. if you have 128M of ram > and you are sequentially accessing a 200MB file, obviously there is > not much point in trying to cache the data as you read it. > > But being able to predict something like this is extremely difficult. > In fact, nearly impossible. And without being able to make the > prediction accurately you simply cannot determine how much data you > should try to cache before you begin recycling it. I wound up having > to change the algorithm to act more like a heuristic -- it does a rough > prediction but doesn't hold the system to it, then allows the page > priority mechanism to refine the prediction. But it can take several > passes (or non-passes) on the file before the page recycling > stabilizes. > Are the heuristics persistent? Or will the first use after boot use the rough prediction? For how long time will the heuristic stick? Suppose it is suddenly used in a slightly different way. Like two sequential readers instead of one... 
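The quoted case of 128M of RAM and a sequentially read 200MB file can be made concrete with a toy simulation. This is only a sketch under stated assumptions, not FreeBSD's page-recycling code: page counts stand in for megabytes, and lru_hits and keep_first_hits are hypothetical names for a plain LRU cache and for a policy that simply stops caching new pages once memory is full.

```python
from collections import OrderedDict


def lru_hits(cache_pages, file_pages, passes):
    """Count cache hits for repeated sequential reads under plain LRU."""
    cache = OrderedDict()  # page number -> None, least recently used first
    hits = 0
    for _ in range(passes):
        for page in range(file_pages):
            if page in cache:
                cache.move_to_end(page)  # mark as most recently used
                hits += 1
            else:
                if len(cache) >= cache_pages:
                    cache.popitem(last=False)  # evict least recently used
                cache[page] = None
    return hits


def keep_first_hits(cache_pages, file_pages, passes):
    """Same workload, but pages read after the cache fills are not cached."""
    cache = set()
    hits = 0
    for _ in range(passes):
        for page in range(file_pages):
            if page in cache:
                hits += 1
            elif len(cache) < cache_pages:
                cache.add(page)
    return hits
```

With a 128-page cache and a 200-page file, plain LRU never hits, because each page is evicted shortly before it is needed again, while the second policy keeps 128 pages resident and hits on them in every later pass. The second policy only works here because the access pattern is known in advance, which is exactly the prediction the message calls nearly impossible.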
/RogerL -- Roger Larsson Skellefteå Sweden From owner-freebsd-arch Tue May 15 17:17:13 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id ED3BB37B422 for ; Tue, 15 May 2001 17:17:11 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4G0GwY65956; Tue, 15 May 2001 17:16:58 -0700 (PDT) (envelope-from dillon) Date: Tue, 15 May 2001 17:16:58 -0700 (PDT) From: Matt Dillon Message-Id: <200105160016.f4G0GwY65956@earth.backplane.com> To: Roger Larsson Cc: Rik van Riel , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping References: <3B00CECF.9A3DEEFA@mindspring.com> <200105151724.f4FHOYt54576@earth.backplane.com> <200105160005.f4G05fe26435@maila.telia.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :Are the heuristics persistent? :Or will the first use after boot use the rough prediction? :For how long time will the heuristic stick? Suppose it is suddenly used in :a slightly different way. Like two sequential readers instead of one... : :/RogerL :Roger Larsson :Skellefteå :Sweden It's based on the VM page cache, so it's adaptive over time. I wouldn't call it persistent; it is nothing more than a simple heuristic that 'normally' throws a page away but 'sometimes' caches it. In other words, you lose some performance on the front end in order to gain some later on. If you loop through a file enough times, most of the file winds up getting cached. It's still experimental so it is only lightly tied into the system. It seems to work, though, so at some point in the future I'll probably try to put some significant prediction in. But as I said, it's a very difficult thing to predict. 
You can't just put your foot down and say 'I'll cache X amount of file Y'. That doesn't work at all. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 19:16: 8 2001 Delivered-To: freebsd-arch@freebsd.org Received: from technokratis.com (modemcable052.174-202-24.mtl.mc.videotron.ca [24.202.174.52]) by hub.freebsd.org (Postfix) with ESMTP id 9DD6437B423 for ; Tue, 15 May 2001 19:15:54 -0700 (PDT) (envelope-from bmilekic@technokratis.com) Received: (from bmilekic@localhost) by technokratis.com (8.11.3/8.11.3) id f4G2OlO51871; Tue, 15 May 2001 22:24:47 -0400 (EDT) (envelope-from bmilekic) Date: Tue, 15 May 2001 22:24:47 -0400 From: Bosko Milekic To: Matt Dillon Cc: freebsd-arch@FreeBSD.ORG Subject: Re: Mbuf slab [new allocator] Message-ID: <20010515222447.A51806@technokratis.com> References: <20010503195904.A53281@technokratis.com> <200105051833.f45IXiW49096@earth.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200105051833.f45IXiW49096@earth.backplane.com>; from dillon@earth.backplane.com on Sat, May 05, 2001 at 11:33:44AM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Sat, May 05, 2001 at 11:33:44AM -0700, Matt Dillon wrote: > > :Hello, > : > : Anyone interested in the mbuf subsystem code should probably read this. > :Others may still read it, but it is somewhat longer than your average Email, > :so consider this a warning. :-) > : Also, although I tried my best to cover most issues here, feel free to let > :me know if I should clarify some points. > : > : Not so long ago, as I'm sure some of you remember, Alfred committed a patch > :... > > Sounds good. You know the motto - first make it work, then make it fast. > > -Matt Ok, I just wrote my last exam today, and I'm back at this. 
Terry commented at some point regarding the fact that "slab" allocators are not really worth it nowadays and mentioned something about the Dynix allocator being less "bad." I feel that I have been somewhat imprecise when first talking about the new mb_slab (it shouldn't really be called "mb_slab.") The allocator is based on memcache, which is based on Dynix and Hoard at least, as far as I have seen from the prior two. Further, I've been looking at the Linux implementation of their slab allocator (in 2.4 was it?) and I have also jotted down some neat things. For one thing, I will be changing the way freeing is done in mb_slab right now, and it will likely turn out that freeing in the general case will no longer require acquiring a lock at all. So, to put this back into perspective, I apologize for being misleading with the "mb_slab" name and hereby rename the allocator to "mb_malloc." :-) Once I am done with the mb_free() modifications, I'll re-post a new URL with a new version asking for review. I'd like to see the code committed sometime soon, as I think it's worth it (and time to move on), but I can't possibly convince everyone (and am not really up for it, to be honest) right now. If someone happens to have a major objection to the new allocator being introduced, however, I will be glad to provide any information I can, if that is needed. A couple of people have contacted me off-list and brought forward a couple of questions, for which I hope I have provided adequate answers. For reference purposes (and if you actually want to look at the implementation, and possibly offer improvements), the present version of the code is at http://people.freebsd.org/~bmilekic/code/mb_slab/ . 
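The message does not say how freeing without a lock will be achieved. One common way to get that property, shown here purely as an illustrative sketch and not as the mb_slab/mb_malloc design, is a small per-thread cache in front of a locked shared pool. Every name below (BufferPool, local_max, lock_acquisitions) is hypothetical, and Python threads stand in for CPUs.

```python
import threading


class BufferPool:
    """Shared pool guarded by a lock, fronted by per-thread free lists."""

    def __init__(self, local_max=8, buf_size=256):
        self._lock = threading.Lock()
        self._shared = []                  # buffers any thread may take
        self._local = threading.local()    # per-thread free list lives here
        self._local_max = local_max
        self._buf_size = buf_size
        self.lock_acquisitions = 0         # bookkeeping for the example only

    def _cache(self):
        cache = getattr(self._local, "free", None)
        if cache is None:
            cache = self._local.free = []
        return cache

    def alloc(self):
        cache = self._cache()
        if cache:                          # common case: no lock needed
            return cache.pop()
        with self._lock:                   # slow path: go to the shared pool
            self.lock_acquisitions += 1
            if self._shared:
                return self._shared.pop()
        return bytearray(self._buf_size)

    def free(self, buf):
        cache = self._cache()
        if len(cache) < self._local_max:   # common case: no lock needed
            cache.append(buf)
            return
        with self._lock:                   # local cache full: spill over
            self.lock_acquisitions += 1
            self._shared.append(buf)
```

The lock is touched only when the local list is empty on allocation or full on free, so a thread that allocates and frees in a steady pattern stays on the lock-free path.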
Cheers, -- Bosko Milekic bmilekic@technokratis.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Tue May 15 21:25:36 2001 Delivered-To: freebsd-arch@freebsd.org Received: from wiproecmx1.wipro.com (wiproecmx1.wipro.com [164.164.31.5]) by hub.freebsd.org (Postfix) with ESMTP id 572E237B422 for ; Tue, 15 May 2001 21:25:29 -0700 (PDT) (envelope-from amarnath.jolad@wipro.com) Received: from ecvwall1.wipro.com (ecvwall1.wipro.com [192.168.181.23]) by wiproecmx1.wipro.com (8.9.3/8.9.3) with SMTP id JAA26492 for ; Wed, 16 May 2001 09:45:40 -0500 (GMT) Received: from ecvwall1.wipro.com ([192.168.181.23]) by ecmail.mail.wipro.com (Netscape Messaging Server 4.15) with SMTP id GDEUTD00.D7O for ; Wed, 16 May 2001 09:52:25 +0530 Received: from wipro.com ([192.168.91.47]) by helium.mail.wipro.com (Netscape Messaging Server 4.15) with ESMTP id GDEUQT00.J6R; Wed, 16 May 2001 09:50:53 +0530 Message-ID: <3B02007A.E9257BA2@wipro.com> Date: Wed, 16 May 2001 09:52:18 +0530 From: Amarnath Jolad Reply-To: amarnath.jolad@wipro.com Organization: Wipro Technologies X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.16-22 i686) X-Accept-Language: en MIME-Version: 1.0 Cc: Roger Larsson , Rik van Riel , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Kernel Debugger References: <3B00CECF.9A3DEEFA@mindspring.com> <200105151724.f4FHOYt54576@earth.backplane.com> <200105160005.f4G05fe26435@maila.telia.com> <200105160016.f4G0GwY65956@earth.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Hi all, Is there any kernel debugger for linux like adb/crash/kadb. If so, from where can I get them. Thanks in advance. 
-Amar From owner-freebsd-arch Wed May 16 0:51:19 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 35D9037B422; Wed, 16 May 2001 00:51:11 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id RAA22374; Wed, 16 May 2001 17:50:55 +1000 Date: Wed, 16 May 2001 17:49:37 +1000 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Dave Leimbach Cc: dleimbac@earthlink.net, wes@softweyr.com, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In-Reply-To: <200105151440.JAA01279@MPI-Softtech.Com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tue, 15 May 2001, Dave Leimbach wrote: > I think SMP is a way more important topic so long as gettimeofday doesn't get > called in thread context switches :).. This was mentioned before, and gettimeofday() (actually the internal kernel part of it) certainly does get called in thread context switches. This has a fairly small impact on context switching time. According to lmbench2 for 2 tiny processes (see my previous mail for some details on the machine):

    Linux 2.2.9             1 usec
    Linux 2.4.0.something   1 usec
    FreeBSD-4.0.something   1 usec
    FreeBSD-current         2 usec

microtime() in the context switch takes about 0.4 usec. This accounts for about half of the pessimizations in SMPng according to the above measurements. But this may be misleading since most of the times in the above are rounded to the nearest usec. 
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 0:57:33 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 0220B37B423; Wed, 16 May 2001 00:57:29 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id RAA23073; Wed, 16 May 2001 17:57:07 +1000 Date: Wed, 16 May 2001 17:55:49 +1000 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Poul-Henning Kamp Cc: Dave Leimbach , tlambert2@mindspring.com, eischen@vigrid.com, dillon@earth.backplane.com, dleimbac@earthlink.net, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In-Reply-To: <33655.989935845@critter> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tue, 15 May 2001, Poul-Henning Kamp wrote: > In message <200105151353.IAA27481@MPI-Softtech.Com>, Dave Leimbach writes: > >Thanks for all of the great answers... > > > >For now if I want to see better performance of the timing code I can just > >"sysctl -w kern.timecounter.hardware=TSC"?? > > You can try it, but depending on a lot of stuff it may not A) work nor > B) be a good idea if it does. > > If you want a faster (but less exact!) gettimeofday, change the > call to microtime() to getmicrotime() in the gettimeofday() in > kern_time.c This would be insignificantly faster unless the hardware counter read by microtime() is slow (e.g., if it is an i8254), since most of the overhead for gettimeofday() is for the syscall and not to actually determine the time (unless the hardware counter is slow). Even in the kernel, getmicrotime() is only a significant optimization for the case where the hardware counter is slow. 
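The difference between the two kernel calls discussed above can be pictured with a small userland analogy. This is a sketch only, assuming the description in the thread (getmicrotime() is faster but less exact because it does not read the hardware counter on every call); TickClock and its methods are hypothetical names, not kernel interfaces.

```python
import time


class TickClock:
    """Caches a timestamp at each tick; coarse reads never touch the source."""

    def __init__(self, source=time.time):
        self._source = source
        self.source_reads = 0   # how often the (possibly slow) counter is read
        self.tick()

    def _read(self):
        self.source_reads += 1
        return self._source()

    def tick(self):
        """Refresh the cached value; stands in for the periodic clock tick."""
        self._cached = self._read()

    def precise(self):
        """Analogous to microtime(): always reads the underlying counter."""
        return self._read()

    def coarse(self):
        """Analogous to getmicrotime(): returns the value from the last tick."""
        return self._cached
```

If reading the source is cheap, as with the TSC, coarse() saves little, which matches the point that the syscall itself dominates the cost of gettimeofday().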
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 1:23:48 2001 Delivered-To: freebsd-arch@freebsd.org Received: from tisch.mail.mindspring.net (tisch.mail.mindspring.net [207.69.200.157]) by hub.freebsd.org (Postfix) with ESMTP id 6B23737B423 for ; Wed, 16 May 2001 01:23:38 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from mindspring.com (pool0141.cvx7-bradley.dialup.earthlink.net [209.178.164.141]) by tisch.mail.mindspring.net (8.9.3/8.8.5) with ESMTP id EAA30610; Wed, 16 May 2001 04:23:27 -0400 (EDT) Message-ID: <3B0238EB.DF435099@mindspring.com> Date: Wed, 16 May 2001 01:23:07 -0700 From: Terry Lambert Reply-To: tlambert2@mindspring.com X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Matt Dillon Cc: Rik van Riel , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping References: <3B00CECF.9A3DEEFA@mindspring.com> <200105151724.f4FHOYt54576@earth.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Matt Dillon wrote: > :> So we should not allow just one single large job to take all > :> of memory, but we should allow some small jobs in memory too. > : > :Historically, this problem is solved with a "working set > :quota". > > We have a process-wide working set quota. It's called > the 'memoryuse' resource. It's not terrifically useful for limiting pageout as a result of excessive demand pagein operations. > :A per vnode working set quota with a per use count adjust > :would resolve most load thrashing issues. Programs with > > It most certainly would not. Limiting the number of pages > you allow to be 'cached' on a vnode by vnode basis would > be a disaster. I don't know whether to believe you, or Dave Cutler... 8-). 
> It has absolutely nothing whatsoever to do with thrashing > or thrash-management. It would simply be an artificial > limitation based on artificial assumptions that are as > likely to be wrong as right. I have a lot of problems with most of FreeBSD's anti-thrash "protection"; I don't think many people are really running it at a very high load. I think a lot of the "administrative limits" are stupid; in particular, I think it's really dumb to have 70% free resources, and yet enforce administrative limits as if all machines were shell account servers at an ISP where the customers are just waiting for the operators to turn their heads for a second so they can run 10,000 IRC "bots". I also have a problem with the preallocation of contiguous pageable regions of real memory via zalloci() in order to support inpcb and tcpcb structures, which inherently mean that I have to statically preallocate structures for IPs, TCP structures, and sockets, as well as things like file descriptors. In other words, I have to guess the future characteristics of my load, rather than having the OS do the best it can in any given situation. Not to mention the allocation of an entire mbuf per socket. > If I've learned anything working on the FreeBSD VM > system, it's that the number of assumptions you make > in regards to what programs do, how they do it, how > much data they should be able to cache, and so forth > is directly proportional to how badly you fuck up the > paging algorithms. I've personally experienced thrash from a moronic method of implementing "ld", which mmap's all the .o files, and then seeks all over heck, randomly, in order to perform the actual link. It makes that specific operation very fast, at the expense of the rest of the system. The result of this is that everything else on the system gets thrashed out of core, including the X server, and the very simple and intuitive "move mouse, wiggle cursor" breaks, which then breaks the entire paradigm. 
FreeBSD is susceptible to this problem. So was SVR4 UNIX. The way SVR4 "repaired" the problem was to invent a new scheduling class, "fixed", which would guarantee time slices to the X server. Thus, as fast as "ld" thrashed pages it wasn't interested in out, "X" thrashed them back in. The interactive experience was degraded by the excessive paging. I implemented a different approach in UnixWare 2.x; it didn't end up making it into the main UnixWare source tree (I was barely able to get my /procfs based rfork() into the thing, with the help of some good engineers from NJ); but it was a per-vnode working set quota approach. It operated in much the way I described, and it fixed the problem: the only program that got thrashed by "ld" was "ld"; everything else on the system had LRU pages present when they needed to run. The "ld" program wasn't affected itself until you started running low on buffer cache. IMO, anything that results in the majority of programs remaining reasonably runnable, and penalizes only the programs making life hell for everyone else, and only kicks in when life is truly starting to go to hell, is a good approach. I really don't care that I got the idea from Dave Cutler's work in VMS, instead of arriving at it on my own (though the per-vnode nature of mine is, I think, an historically unique approach). > I implemented a special page-recycling algorithm in > 4.1/4.2 (which is still there in 4.3). Basically it > tries to predict when it is possible to throw away pages > 'behind' a sequentially accessed file, so as not to > allow that file to blow away your cache. E.G. if you > have 128M of ram and you are sequentially accessing a > 200MB file, obviously there is not much point in trying > to cache the data as you read it. IMO, the ability to stream data like this is why Sun, in Solaris 2.8, felt the need to "invent" separate VM and buffer caches once again -- "everything old is new again". 
Also, IMO, I feel that the rationale used to justify this decision was poorly defended, and that there are much better implementations one could have -- including simple red queueing for large data sets. It was a cop-out on their part, having to do with not setting up simple high and low water marks to keep things like a particular FS or networking subsystem from monopolizing memory. Instead, they now have this artificial divide, where under typical workloads, one pool lies largely fallow (which one depends on the server role). I guess that's not a problem, if your primary aftermarket, marked-up revenue generation sale item is DRAM... If the code you are referring to is the code that I think it is, I don't think it's useful, except for something like a web server with large objects to serve. Even then, discarding the entire concept of locality of reference when you notice sequential access seems bogus. Realize that average web server service objects are on the order of 10k, not 200M. Realize also the _absolutely disastrous_ effect that code kicking in would have on, for example, an FTP server immediately after the release of FreeBSD ISO images to the net. You would basically not cache that data which is your primary hottest content -- turning virtually assured cache hits into cache misses. > But being able to predict something like this is > extremely difficult. In fact, nearly impossible. I would say that it could be reduced to a stochastic and iterative process, but (see above), that it would be a terrible idea for all but something like a popular MP3 server... even then, you start discarding useful data under burst loads, and we're back to cache missing. > And without being able to make the prediction > accurately you simply cannot determine how much data > you should try to cache before you begin recycling it. I should think that would be obvious: nearly everything you can, based on locality and number of concurrent references. 
It's only when you attempt prefetch that it actually becomes complicated; deciding to throw away a clean page later instead of _now_ costs you practically nothing. > So the gist of the matter is that FreeBSD (1) already > has process-wide working set limitations which are > activated when the system is under load, They are largely useless, since they are also active even when the system is not under load, so they act as preemptive drags on performance. They are also (as was pointed out in an earlier thread) _not_ applied to mmap() and other regions, so they are easily subverted. > and (2) already has a heuristic that attempts to predict > when not to cache pages. Actually several heuristics (a > number of which were in place in the original CSRG code). I would argue that the CPU vs. memory vs. disk speed pendulum is moving back the other way, and that it's time to reconsider these algorithms once again. If it's done correctly, they would be adaptive based on knowing the data rate for each given subsystem. We have gigabit NICs these days, which can fully monopolize a PCI bus very easily with few cards -- doing nothing but network I/O at burst rate on a 66MHz 64 bit PCI bus, things max out at 4 cards -- and that's if you can get them to transfer the data directly to each other, with no host intervention being required, which you can't. The fastest memory bus I've seen in Intel class hardware is 133MHz; at 64 bits, that's twice as fast as the 64 bit 66MHz PCI bus. Disks are pig-slow comparatively; in all cases, they're going to be limited to the I/O bus speed anyway, and as rotational speeds have gone up, seek latency has failed to keep pace. Most fast IDE ("multimedia") drives still turn off thermal recalibration in order to keep streaming. I think you need to stress a system -- really stress it, so that you are hitting some hardware limit because of the way FreeBSD uses the hardware -- in order to understand where the real problems in FreeBSD lie. 
Otherwise, it's just like profiling a program over a tiny workload: the actual cost of servicing real work gets lost in the costs associated with initialization. It's pretty obvious from some of the recent bugs I've run into that no one has attempted to open more than 32767 sockets in a production environment using a FreeBSD system. It's also obvious that no one has attempted to have more than 65535 client connections open on a FreeBSD box. There are similar (obvious in retrospect) problems in the routing and other code (what is with the alias requirement for a 255.255.255.255 netmask, for example? Has no one heard of VLANs, without explicit VLAN code?). The upshot is that things are failing to scale under a number of serious stress loads, and rather than defending the past, we should be looking at fixing the problems. I'm personally very happy to have the Linux geeks interested in covering this territory cooperatively with the FreeBSD geeks. We need to be clever about causing scaling problems, and more clever about fixing them, IMO. 
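The per-vnode working set quota described earlier in this message can be illustrated with a toy page cache. This is a sketch under stated assumptions, not the UnixWare code: QuotaCache and x_server_hits are hypothetical names, files stand in for vnodes, and the "ld" workload is modelled as a stream of never-repeated pages interleaved with a small, repeatedly used "X" working set.

```python
from collections import OrderedDict


class QuotaCache:
    """Global LRU page cache with a cap on resident pages per file."""

    def __init__(self, capacity, per_file_quota):
        self.capacity = capacity
        self.quota = per_file_quota
        self.lru = OrderedDict()   # (file, page) -> None, oldest first
        self.per_file = {}         # file -> OrderedDict of pages, oldest first

    def _evict(self, key):
        name, page = key
        del self.lru[key]
        del self.per_file[name][page]

    def access(self, name, page):
        """Touch a page; return True on a cache hit."""
        key = (name, page)
        pages = self.per_file.setdefault(name, OrderedDict())
        if key in self.lru:
            self.lru.move_to_end(key)
            pages.move_to_end(page)
            return True
        if len(pages) >= self.quota:
            # Over quota: the file pays with its own oldest page.
            self._evict((name, next(iter(pages))))
        elif len(self.lru) >= self.capacity:
            # Cache full: fall back to global LRU eviction.
            self._evict(next(iter(self.lru)))
        self.lru[key] = None
        pages[page] = None
        return False


def x_server_hits(cache, rounds=5, x_pages=10, ld_pages_per_round=100):
    """Hits seen by the small 'X' working set while 'ld' streams pages."""
    hits = 0
    next_ld_page = 0
    for _ in range(rounds):
        for page in range(x_pages):
            hits += cache.access("X", page)
        for _ in range(ld_pages_per_round):
            cache.access("ld", next_ld_page)
            next_ld_page += 1
    return hits
```

With a 64-page cache and no effective quota (quota equal to capacity), the streaming file pushes every "X" page out between rounds. With a 32-page quota the streaming file only evicts its own pages, and the "X" pages stay resident after the first round.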
-- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 1:28:56 2001 Delivered-To: freebsd-arch@freebsd.org Received: from tisch.mail.mindspring.net (tisch.mail.mindspring.net [207.69.200.157]) by hub.freebsd.org (Postfix) with ESMTP id 96D7837B423; Wed, 16 May 2001 01:28:51 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from mindspring.com (pool0141.cvx7-bradley.dialup.earthlink.net [209.178.164.141]) by tisch.mail.mindspring.net (8.9.3/8.8.5) with ESMTP id EAA19088; Wed, 16 May 2001 04:28:48 -0400 (EDT) Message-ID: <3B023A55.D8E21C03@mindspring.com> Date: Wed, 16 May 2001 01:29:09 -0700 From: Terry Lambert Reply-To: tlambert2@mindspring.com X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Matt Dillon Cc: dave , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... References: <200105150337.UAA19677@gull.mail.pas.earthlink.net> <200105150346.f4F3kLE45720@earth.backplane.com> <3B00EF40.A1232B75@mindspring.com> <200105151737.f4FHbBL55271@earth.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG Matt Dillon wrote: > :The gettimeofday() calls were _the_ major useless overhead, > :until I eliminated them by creating a zero system call version > > Terry, to be blunt... if you need performance you don't go > making superfluous system calls for every transaction. If > you want marketing eye candy, all you need to do is a > statistical measurement... do fine measurement of 10% of > the transactions rather then fine measurement of 100% of > the transaction, and you are done. > > Also, using gettimeofday() is a ridiculous way to measure > fine grained time billions of times in production code. I > mean, sure, it works... but getitimer() is about 5 times > faster. 
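The statistical measurement suggested in the quoted text can be sketched as follows. This is a minimal illustration, not code from either poster: timed_sample is a hypothetical helper, handler stands for whatever processes a transaction, and time.perf_counter() is used simply because it is the standard Python interval timer.

```python
import random
import time


def timed_sample(transactions, handler, rate=0.10, rng=None):
    """Run handler on every transaction, timing only a random fraction."""
    if rng is None:
        rng = random.Random()
    timings = []
    for txn in transactions:
        if rng.random() < rate:
            # Sampled transaction: pay for two clock reads.
            start = time.perf_counter()
            handler(txn)
            timings.append(time.perf_counter() - start)
        else:
            # Unsampled transaction: no clock reads at all.
            handler(txn)
    return timings
```

Sampling only helps when aggregate statistics are wanted. It does not help when every log record needs its own timestamp.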
I didn't need it for elapsed time measurements, I needed it for log timestamps, so getitimer() would be useless. BTW... to be blunt, gettimeofday() wouldn't be _the_ major useless overhead, if I were making superfluous system calls. ;^). -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 1:37:20 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 459CD37B422; Wed, 16 May 2001 01:37:09 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id SAA26931; Wed, 16 May 2001 18:37:00 +1000 Date: Wed, 16 May 2001 18:35:42 +1000 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Matt Dillon Cc: Terry Lambert , dave , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In-Reply-To: <200105151737.f4FHbBL55271@earth.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Tue, 15 May 2001, Matt Dillon wrote: > Also, using gettimeofday() is a ridiculous way to measure fine grained > time billions of times in production code. I mean, sure, it works... > but getitimer() is about 5 times faster. This seems unlikely, since most of the overhead for both is in the syscall. 
Actual testing shows that getitimer() is a whole 10% faster on a Celeron 5.5 * 95 MHz:

    $ sysctl kern.timecounter.hardware
    kern.timecounter.hardware: TSC
    $ sysctl machdep.tsc_freq
    machdep.tsc_freq: 522493830
    $ cat z.c
    #include <sys/types.h>
    #include <sys/time.h>
    #include <err.h>
    #include <stdlib.h>

    /* Time one million gettimeofday() calls. */
    int
    main(void)
    {
            struct timeval tv;
            int i;

            for (i = 0; i < 1000000; i++)
                    if (gettimeofday(&tv, NULL) != 0)
                            err(1, "gettimeofday");
            exit(0);
    }
    $ cc -O -o z z.c
    $ time ./z
            2.04 real         0.47 user         1.57 sys
    $ cat q.c
    #include <sys/types.h>
    #include <sys/time.h>
    #include <err.h>
    #include <stdlib.h>

    /* Arm a 100 second real-time timer, then time one million getitimer() calls. */
    int
    main(void)
    {
            struct itimerval it;
            int i;

            it.it_interval.tv_sec = 0;
            it.it_interval.tv_usec = 0;
            it.it_value.tv_sec = 100;
            it.it_value.tv_usec = 0;
            if (setitimer(ITIMER_REAL, &it, NULL) != 0)
                    err(1, "setitimer");
            for (i = 0; i < 1000000; i++)
                    if (getitimer(ITIMER_REAL, &it) != 0)
                            err(1, "getitimer");
            exit(0);
    }
    $ cc -O -o q q.c
    $ time ./q
            1.84 real         0.46 user         1.37 sys

This is for my version of -current, which has a few pessimizations in microtime(). Plain -current would be a few nsec faster. OTOH, gettimeofday() may be significantly slower than getitimer() if the hardware timecounter is slow:

    # sysctl -w kern.timecounter.hardware=i8254
    $ sysctl kern.timecounter.hardware
    kern.timecounter.hardware: i8254
    $ time ./z
            5.06 real         0.44 user         4.60 sys

Bruce

From owner-freebsd-arch Wed May 16 5:54:59 2001 Delivered-To: freebsd-arch@freebsd.org Received: from scaup.mail.pas.earthlink.net (scaup.mail.pas.earthlink.net [207.217.121.49]) by hub.freebsd.org (Postfix) with ESMTP id BCAA537B422; Wed, 16 May 2001 05:54:54 -0700 (PDT) (envelope-from dleimbac@earthlink.net) Received: from 1Cust72.tnt1.starkville.ms.da.uu.net (1Cust72.tnt1.starkville.ms.da.uu.net [63.30.107.72]) by scaup.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id FAA14005; Wed, 16 May 2001 05:54:53 -0700 (PDT) Message-Id: <200105161254.FAA14005@scaup.mail.pas.earthlink.net> Date: Wed, 16 May 2001 07:57:19 CDT 
From: dave To: questions@freebsd.org, arch@freebsd.org Subject: python fork call raised my load over 400! Reply-To: dleimbac@earthlink.net X-Mailer: Spruce 0.6.5 for X11 w/smtpio 0.7.9 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG If you have a block of free time today check this out! I keyed this in interactively with Python ----SNIP-------- import os while 1: os.fork() -----SNIP------- This user run program brought my system to a load of 419 with the system using 94% of the resources and 500 user processes on my AMD Duron 800 box with 256MB RAM... I don't know that the processor/RAM is relevant but I could not fork anymore! I started manually killing the processes with ctrl-c and ctrl-d until I could log in as root and killall -9 python. It was not good. The system did NOT crash and all my resources came back after all the python processes were killed. I have a friend who tested the same 3 lines of python code right now on his linux box. He ended up rebooting but he may not have tried to manually kill the processes to get back. My ultimate question is ... should I be comparing FreeBSD to Linux? Does it really matter if Linux is performing better or worse than FreeBSD? Still a user process probably shouldn't be able to hose the whole system IMHO. 
Dave To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 6: 4:29 2001 Delivered-To: freebsd-arch@freebsd.org Received: from patan.sun.com (patan.Sun.COM [192.18.98.43]) by hub.freebsd.org (Postfix) with ESMTP id D58D837B424; Wed, 16 May 2001 06:04:20 -0700 (PDT) (envelope-from michael.schuster@sun.com) Received: from sun-gy.Germany.Sun.COM ([129.157.128.5]) by patan.sun.com (8.9.3+Sun/8.9.3) with ESMTP id GAA09340; Wed, 16 May 2001 06:04:14 -0700 (PDT) Received: from hacker.Germany.Sun.COM (hacker [129.157.133.195]) by sun-gy.Germany.Sun.COM (8.9.3+Sun/8.9.3/ENSMAIL,v2.1p1) with ESMTP id PAA22818; Wed, 16 May 2001 15:04:12 +0200 (MEST) Received: from sun.com (localhost [127.0.0.1]) by hacker.Germany.Sun.COM (8.9.3+Sun/8.9.3) with ESMTP id PAA18760; Wed, 16 May 2001 15:04:11 +0200 (MEST) Message-ID: <3B027ACB.30F5CF56@sun.com> Date: Wed, 16 May 2001 15:04:11 +0200 From: Michael Schuster Reply-To: questions@FreeBSD.ORG X-Mailer: Mozilla 4.77 [en] (X11; U; SunOS 5.8 sun4u) X-Accept-Language: en MIME-Version: 1.0 To: dleimbac@earthlink.net Cc: questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: python fork call raised my load over 400! References: <200105161254.FAA14005@scaup.mail.pas.earthlink.net> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG dave wrote: > > If you have a block of free time today check this out! > > I keyed this in interactively with Python > ----SNIP-------- > > import os > > while 1: > os.fork() > -----SNIP------- this is a classical fork bomb, and the system behaved very much as designed. If you're using this to compare Linux to FreeBSD, you'd better reconsider and get yourself proper benchmarks. btw: pls. don't cross-post, questions is quite enough. 
> This user run program brought my system to a load of 419 with the system
> using
> 94% of the resources and 500 user processes on my AMD Duron 800 box with
> 256MB RAM...

of course: every new process needs resources, and as new processes get more
CPU share than older ones, the newly forked processes would immediately
fork again.

> I don't know that the processor/RAM is relevant but I could not fork
> anymore!

of course you couldn't, you completely filled up your machine and were
still doing so - getting a word in edgeways was impossible.

> My ultimate question is ... should I be comparing FreeBSD to Linux?
> Does it really matter if Linux is performing better or worse than FreeBSD?

see above - this is about the worst type of "benchmark" I've ever seen.

> Still a user process probably shouldn't be able to hose the whole system
> IMHO.

sorry, that's the way Unix's fair-share scheduler works.

for more details, look into "Design and Implementation of 4.4 BSD"

HTH
Michael
--
Michael Schuster / Michael.Schuster@sun.com
Sun Microsystems GmbH / (+49 89) 46008-2974 | x62974
Sonnenallee 1, D-85551 Kirchheim-Heimstetten

Recursion, n.: see 'Recursion'

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Wed May 16 6:40:44 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from ringworld.nanolink.com (ringworld.nanolink.com [195.24.48.13]) by hub.freebsd.org (Postfix) with SMTP id 3961E37B424 for ; Wed, 16 May 2001 06:40:40 -0700 (PDT) (envelope-from roam@orbitel.bg)
Received: (qmail 30770 invoked by uid 1000); 16 May 2001 13:39:59 -0000
Date: Wed, 16 May 2001 16:39:59 +0300
From: Peter Pentchev
To: dave
Cc: questions@freebsd.org
Subject: Re: python fork call raised my load over 400!
Message-ID: <20010516163959.B30670@ringworld.oblivion.bg> References: <200105161254.FAA14005@scaup.mail.pas.earthlink.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200105161254.FAA14005@scaup.mail.pas.earthlink.net>; from dleimbac@earthlink.net on Wed, May 16, 2001 at 07:57:19AM -0500 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, May 16, 2001 at 07:57:19AM -0500, dave wrote: [snip description of a classic forkbomb] > I have a friend who tested the same 3 lines of python code right now on his > linux box. He ended up rebooting but he may not have tried to manually > kill the processes to get back. > > My ultimate question is ... should I be comparing FreeBSD to Linux? > Does it really matter if Linux is performing better or worse than FreeBSD? > > Still a user process probably shouldn't be able to hose the whole system > IMHO. (followup to -questions only) It won't be able to, if you define proper limits in /etc/login.conf (for the forking, look at the 'maxproc' limit). G'luck, Peter -- No language can express every thought unambiguously, least of all this one. 
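A minimal sketch of the per-process counterpart of the 'maxproc' limit mentioned above, assuming a Unix-like system where Python's resource module exposes RLIMIT_NPROC. The helper name cap_process_count is made up for illustration and is not from the thread.

```python
import resource


def cap_process_count(max_procs):
    """Lower the soft per-user process limit to at most max_procs.

    Hypothetical helper: returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    # The soft limit may not exceed the hard limit, so clamp the request.
    if hard != resource.RLIM_INFINITY:
        max_procs = min(max_procs, hard)
    resource.setrlimit(resource.RLIMIT_NPROC, (max_procs, hard))
    return resource.getrlimit(resource.RLIMIT_NPROC)
```

Called before running untrusted code, this makes os.fork() start failing once the user owns that many processes, rather than letting the loop fill the process table. The limit is inherited by child processes.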
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Wed May 16 8:18: 3 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from bdr-xcon.matchlogic.com (mail.matchlogic.com [205.216.147.127]) by hub.freebsd.org (Postfix) with ESMTP id C5F6F37B422 for ; Wed, 16 May 2001 08:17:58 -0700 (PDT) (envelope-from crandall@matchlogic.com)
Received: by mail.matchlogic.com with Internet Mail Service (5.5.2653.19) id ; Wed, 16 May 2001 09:17:46 -0600
Message-ID: <5FE9B713CCCDD311A03400508B8B30130828EDA8@bdr-xcln.corp.matchlogic.com>
From: Charles Randall
To: 'Matt Dillon' , Roger Larsson
Cc: Rik van Riel , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu
Subject: RE: on load control / process swapping
Date: Wed, 16 May 2001 09:17:21 -0600
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On a related note, we have a process (currently on Solaris, but possibly moving to FreeBSD) that reads a 26 GB file just once for a database load. On Solaris, we use the directio() function call to tell the filesystem to bypass the buffer cache for this file descriptor.

From the Solaris directio() man page,

DIRECTIO_ON
The system behaves as though the application is not going to reuse the file data in the near future. In other words, the file data is not cached in the system's memory pages.

We found that without this, Solaris was aggressively trying to cache the huge input file at the expense of database load performance (but we knew that we'd never access it again). For some applications this is a huge win (random I/O on a file much larger than memory seems to be another case).

Would there be an advantage to having a similar feature in FreeBSD (if not already present)?

-Charles

-----Original Message-----
From: Matt Dillon [mailto:dillon@earth.backplane.com]
Sent: Tuesday, May 15, 2001 6:17 PM
To: Roger Larsson
Cc: Rik van Riel; arch@FreeBSD.ORG; linux-mm@kvack.org; sfkaplan@cs.amherst.edu
Subject: Re: on load control / process swapping

:Are the heuristics persistent?
:Or will the first use after boot use the rough prediction?
:For how long time will the heuristic stick? Suppose it is suddenly used in
:a slightly different way. Like two sequential readers instead of one...
:
:/RogerL
:Roger Larsson
:Skellefteå
:Sweden

It's based on the VM page cache, so it's adaptive over time. I wouldn't call it persistent; it is nothing more than a simple heuristic that 'normally' throws a page away but 'sometimes' caches it. In other words, you lose some performance on the frontend in order to gain some later on. If you loop through a file enough times, most of the file winds up getting cached. It's still experimental so it is only lightly tied into the system. It seems to work, though, so at some point in the future I'll probably try to put some significant prediction in. But as I said, it's a very difficult thing to predict. You can't just put your foot down and say 'I'll cache X amount of file Y'. That doesn't work at all.
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 10:14:25 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 2004937B424 for ; Wed, 16 May 2001 10:14:23 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GHEFs72217; Wed, 16 May 2001 10:14:15 -0700 (PDT) (envelope-from dillon) Date: Wed, 16 May 2001 10:14:15 -0700 (PDT) From: Matt Dillon Message-Id: <200105161714.f4GHEFs72217@earth.backplane.com> To: Charles Randall Cc: Roger Larsson , Rik van Riel , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: RE: on load control / process swapping References: <5FE9B713CCCDD311A03400508B8B30130828EDA8@bdr-xcln.corp.matchlogic.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG : :We found that without this, Solaris was aggressively trying to cache the :huge input file at the expense of database load performance (but we knew :that we'd never access it again). For some applications this is a huge win :(random I/O on a file much larger than memory seems to be another case). : :Would there be an advantage to having a similar feature in FreeBSD (if not :already present)? : :-Charles We've talked about implementing O_DIRECT. I think it's a good idea. In regards to the particular case of scanning a huge multi-gigabyte file, FreeBSD has a sequential detection heuristic which does a pretty good job preventing cache blow-aways by depressing the priority of the data as it is read or written. FreeBSD will still try to cache a good chunk, but it won't sacrifice all available memory.
If you access the data via the VM system, through mmap, you get even more control through the madvise() syscall. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 10:26:53 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 3F4FD37B422 for ; Wed, 16 May 2001 10:26:50 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GHQg472438; Wed, 16 May 2001 10:26:42 -0700 (PDT) (envelope-from dillon) Date: Wed, 16 May 2001 10:26:42 -0700 (PDT) From: Matt Dillon Message-Id: <200105161726.f4GHQg472438@earth.backplane.com> To: Terry Lambert Cc: Rik van Riel , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping References: <3B00CECF.9A3DEEFA@mindspring.com> <200105151724.f4FHOYt54576@earth.backplane.com> <3B0238EB.DF435099@mindspring.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :I think a lot of the "administrative limits" are stupid; :in particular, I think it's really dumb to have 70% free :resources, and yet enforce administrative limits as if all :... The 'memoryuse' resource limit is not enforced unless the system is under memory pressure. :... :> And without being able to make the prediction :> accurately you simply cannot determine how much data :> you should try to cache before you begin recycling it. : :I should think that would be obvious: nearly everything :you can, based on locality and number of concurrent :references. It's only when you attempt prefetch that it :actually becomes complicated; deciding to throw away a :clean page later instead of _now_ costs you practically :nothing. :... Prefetching has nothing to do with what we've been talking about. 
We don't have a problem caching prefetched pages that aren't used. The problem we have is determining when to throw away data once it has been used by a program. :... :> So the jist of the matter is that FreeBSD (1) already :> has process-wide working set limitations which are :> activated when the system is under load, : :They are largely useless, since they are also active even :when the system is not under load, so they act as preemptive :... This is not true. Who told you this? This is absolutely not true. :drags on performance. They are also (as was pointed out in :an earlier thread) _not_ applied to mmap() and other regions, :so they are easily subverted. :... : :-- Terry : This is not true. The 'memoryuse' limit applies to all in-core pages associated with the process, whether mmap()'d or not. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 10:41:52 2001 Delivered-To: freebsd-arch@freebsd.org Received: from perninha.conectiva.com.br (perninha.conectiva.com.br [200.250.58.156]) by hub.freebsd.org (Postfix) with ESMTP id 939C437B422 for ; Wed, 16 May 2001 10:41:42 -0700 (PDT) (envelope-from riel@conectiva.com.br) Received: from burns.conectiva (burns.conectiva [10.0.0.4]) by perninha.conectiva.com.br (Postfix) with SMTP id 7BB1C16B50 for ; Wed, 16 May 2001 14:41:35 -0300 (EST) Received: (qmail 25576 invoked by uid 0); 16 May 2001 17:40:13 -0000 Received: from duckman.distro.conectiva (HELO duckman.conectiva.com.br) (root@10.0.17.2) by burns.conectiva with SMTP; 16 May 2001 17:40:13 -0000 Received: from localhost (riel@localhost) by duckman.conectiva.com.br (8.11.3/8.11.3) with ESMTP id f4GHfZq16610; Wed, 16 May 2001 14:41:35 -0300 X-Authentication-Warning: duckman.distro.conectiva: riel owned process doing -bs Date: Wed, 16 May 2001 14:41:35 -0300 (BRST) From: Rik van Riel X-X-Sender: To: Matt Dillon Cc: Charles Randall , Roger Larsson , , , Subject: Re: RE: 
on load control / process swapping In-Reply-To: <200105161714.f4GHEFs72217@earth.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 16 May 2001, Matt Dillon wrote: > In regards to the particular case of scanning a huge multi-gigabyte > file, FreeBSD has a sequential detection heuristic which does a > pretty good job preventing cache blow-aways by depressing the priority > of the data as it is read or written. FreeBSD will still try to cache > a good chunk, but it won't sacrifice all available memory. If you > access the data via the VM system, through mmap, you get even more > control through the madvise() syscall. There's one thing "wrong" with the drop-behind idea though; it penalises data even when it's still in core and we're reading it for the second or third time. Maybe it would be better to only do drop-behind when we're actually allocating new memory for the vnode in question and let re-use of already present memory go "unpunished" ? Hmmm, now that I think about this more, it _could_ introduce some different fairness issues. Darn ;) regards, Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... 
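The madvise() route mentioned in the quoted paragraph can be sketched as follows. This is only an illustration, assuming Python 3.8 or later on a platform where the mmap module exposes madvise() and MADV_SEQUENTIAL. The function name checksum_sequential is made up for the example, and the hint is advisory, so the kernel is free to ignore it.

```python
import mmap


def checksum_sequential(path):
    """Sum the bytes of a non-empty file via mmap, hinting sequential access."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Advisory hint only; skipped where the platform lacks it.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            total = 0
            chunk = 1 << 20  # walk the mapping 1 MiB at a time
            for offset in range(0, len(m), chunk):
                total += sum(m[offset:offset + chunk])
            return total & 0xFFFFFFFF
```

The hint changes only how the kernel is asked to treat the pages, not the result, so the function returns the same value with or without it.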
http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Wed May 16 10:43:26 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 1854D37B42C; Wed, 16 May 2001 10:43:19 -0700 (PDT) (envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GHhEl72847; Wed, 16 May 2001 10:43:14 -0700 (PDT) (envelope-from dillon)
Date: Wed, 16 May 2001 10:43:14 -0700 (PDT)
From: Matt Dillon
Message-Id: <200105161743.f4GHhEl72847@earth.backplane.com>
To: Bruce Evans
Cc: Terry Lambert , dave , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG
Subject: Re: Gettimeofday Again...
References:
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:
:On Tue, 15 May 2001, Matt Dillon wrote:
:
:> Also, using gettimeofday() is a ridiculous way to measure fine grained
:> time billions of times in production code. I mean, sure, it works...
:> but getitimer() is about 5 times faster.
:
:(Bruce)
:
:This seems unlikely, since most of the overhead for both is in the syscall.
:Actual testing shows that getitimer() is a whole 10% faster on a Celeron
:5.5 * 95 MHz:

I just ran the test using the default timecounter on a 4.3 box (P3). timecounter.method was 0, timecounter.hardware was i8254. In that case the itimer was about 4 times faster. So this was using the 'slow' itimer as you indicate below. I don't change the timecounter method defaults, and I sure hope you aren't advocating that people change their timecounter defaults. If the TSC is a reasonable default, the system should figure it out and use it without requiring intervention.

But, be that as it may, Terry's argument doesn't hold water.
Logging and performance just don't go together if you are trying to log thousands of connections/sec, so Terry deserves whatever hell he's gotten himself into and shouldn't blame gettimeofday() for his problems. If someone needs high performance logging, there are plenty of ways of offloading it. A network monitor would be my first choice. You could also shove a UDP packet off to another machine for each connection, etc etc etc.

Personally speaking, I don't give a damn about contrived cpu-intensive performance figures because if cpu were my only problem the solution would be to simply throw another machine into the rack. You have to look at these things from a revenue/price/performance point of view. If I have, say, a webserver, which is cpu-bound serving a thousand pages a second, and each of those pages is worth a fraction of a cent to my bottom line, then the cost of purchasing another machine to help with the load is negligible compared to the revenue each machine is generating. And if it isn't, someone has fucked the business plan and the algorithms up and needs to go fix them. It's that simple.

-Matt

:This is for my version of -current, which has a few pessimizations in
:microtime(). Plain -current would be a few nsec faster.
:
:OTOH, gettimeofday() may be significantly slower than getitimer() if the
:hardware timecounter is slow:
:...
:(Bruce)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Wed May 16 10:54:28 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 097F037B422 for ; Wed, 16 May 2001 10:54:20 -0700 (PDT) (envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GHsCd73025; Wed, 16 May 2001 10:54:12 -0700 (PDT) (envelope-from dillon)
Date: Wed, 16 May 2001 10:54:12 -0700 (PDT)
From: Matt Dillon
Message-Id: <200105161754.f4GHsCd73025@earth.backplane.com>
To: Rik van Riel
Cc: Charles Randall , Roger Larsson , , ,
Subject: Re: RE: on load control / process swapping
References:
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:
:There's one thing "wrong" with the drop-behind idea though;
:it penalises data even when it's still in core and we're
:reading it for the second or third time.

It's not dropping the data, it's dropping the priority. And yes, it does penalize the data somewhat. On the other hand, if the data happens to still be in the cache and you scan it a second time, the page priority gets bumped up relative to what it already was, so the net effect is that the data becomes high priority after a few passes.

:Maybe it would be better to only do drop-behind when we're
:actually allocating new memory for the vnode in question and
:let re-use of already present memory go "unpunished" ?

You get an equivalent effect even without dropping the priority, because you blow away prior pages when reading a file that is larger than main memory so they don't exist at all when you re-read. But you do not get the expected 'recycling' characteristics versus the rest of the system if you do not make a distinction between sequential and random access.
You want to slightly depress the priority behind a sequential access because the 'cost' of re-reading the disk sequentially is nothing compared to the cost of re-reading the disk randomly (by about a 30:1 ratio!). So keeping randomly seek/read data is more important by degrees than keeping sequentially read data.

This isn't to say that it isn't important to try to cache sequentially read data, just that the cost of throwing away sequentially read data is much lower than the cost of throwing away randomly read data on a general purpose machine.

Terry's description of 'ld' mmap()ing and doing all sorts of random seeking causing most UNIXes, including FreeBSD, to have a brainfart if the dataset is too big to fit in the cache is true as far as it goes, but there really isn't much we can do about that situation 'automatically'. Without hints, the system can't predict the fact that it should be trying to cache the whole of the object files being accessed randomly. A hint could make performance much better... a simple madvise(... MADV_SEQUENTIAL) on the mapped memory inside LD would probably be beneficial, as would madvise(... MADV_WILLNEED).

-Matt

:Hmmm, now that I think about this more, it _could_ introduce
:some different fairness issues.
Darn ;) : :regards, : :Rik To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 10:57:52 2001 Delivered-To: freebsd-arch@freebsd.org Received: from superconductor.rush.net (superconductor.rush.net [208.9.155.8]) by hub.freebsd.org (Postfix) with ESMTP id 2E40937B422 for ; Wed, 16 May 2001 10:57:49 -0700 (PDT) (envelope-from bright@superconductor.rush.net) Received: (from bright@localhost) by superconductor.rush.net (8.11.2/8.11.2) id f4GHv8I23828; Wed, 16 May 2001 13:57:08 -0400 (EDT) Date: Wed, 16 May 2001 13:57:07 -0400 From: Alfred Perlstein To: Rik van Riel Cc: Matt Dillon , Charles Randall , Roger Larsson , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping Message-ID: <20010516135707.H12365@superconductor.rush.net> References: <200105161714.f4GHEFs72217@earth.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0us In-Reply-To: ; from riel@conectiva.com.br on Wed, May 16, 2001 at 02:41:35PM -0300 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Rik van Riel [010516 13:42] wrote: > On Wed, 16 May 2001, Matt Dillon wrote: > > > In regards to the particular case of scanning a huge multi-gigabyte > > file, FreeBSD has a sequential detection heuristic which does a > > pretty good job preventing cache blow-aways by depressing the priority > > of the data as it is read or written. FreeBSD will still try to cache > > a good chunk, but it won't sacrifice all available memory. If you > > access the data via the VM system, through mmap, you get even more > > control through the madvise() syscall. > > There's one thing "wrong" with the drop-behind idea though; > it penalises data even when it's still in core and we're > reading it for the second or third time. 
>
> Maybe it would be better to only do drop-behind when we're
> actually allocating new memory for the vnode in question and
> let re-use of already present memory go "unpunished" ?
>
> Hmmm, now that I think about this more, it _could_ introduce
> some different fairness issues. Darn ;)

Both of you guys are missing the point.

The directio interface is meant to reduce the stress of a large sequential operation on a file where caching is of no use.

Even if you depress the worthiness of the pages you've still blown rather large amounts of unrelated data out of the cache in order to allocate new cacheable pages.

A simple solution would involve passing along flags such that if the IO occurs to a non-previously-cached page the buf/page is immediately placed on the free list upon completion. That way the next IO can pull the now useless bufferspace from the freelist.

Basically you add another buffer queue for "throw away" data that exists as a "barely cached" queue. This way your normal data doesn't compete on the LRU with non-cached data.

As a hack it looks like one could use the QUEUE_EMPTYKVA buffer queue under FreeBSD for this, however I think one might lose the minimal amount of caching that could be done.

If the direct IO happens to a page that's previously cached you adhere to the previous behavior.

A more fancy approach might map in user pages into the kernel to do the IO directly, however on large MP this may cause pain because the vm may need to issue ipi to invalidate tlb entries.

It's quite simple in theory, the hard part is the code.

-Alfred Perlstein
--
Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom.
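The "throw away" idea above can be illustrated with a toy model. This is a sketch only: ToyPageCache is a hypothetical user-space simulation of an LRU with a direct flag, not FreeBSD's buffer cache code, and it simply declines to retain missed pages rather than modelling a real free list.

```python
from collections import OrderedDict


class ToyPageCache:
    """Toy LRU page cache; pages read with direct=True are not retained."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lru = OrderedDict()  # page number -> None, oldest entry first

    def read(self, page, direct=False):
        """Simulate reading one page; return True on a cache hit."""
        if page in self.lru:
            # Already cached: behave as usual, even for direct reads.
            self.lru.move_to_end(page)
            return True
        if direct:
            # Miss under direct I/O: the page is not kept, so it never
            # competes with normal data for space in the LRU.
            return False
        self.lru[page] = None
        if len(self.lru) > self.capacity:
            self.lru.popitem(last=False)  # evict the least recently used
        return False
```

In this model, warming four pages and then scanning a hundred others with direct=True leaves the original four cached, whereas the same scan with direct=False evicts all of them.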
http://www.egr.unlv.edu/~slumos/on-netbsd.html To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 11: 1:37 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 1B1EF37B423 for ; Wed, 16 May 2001 11:01:34 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GI1Oc73283; Wed, 16 May 2001 11:01:24 -0700 (PDT) (envelope-from dillon) Date: Wed, 16 May 2001 11:01:24 -0700 (PDT) From: Matt Dillon Message-Id: <200105161801.f4GI1Oc73283@earth.backplane.com> To: Alfred Perlstein Cc: Rik van Riel , Charles Randall , Roger Larsson , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping References: <200105161714.f4GHEFs72217@earth.backplane.com> <20010516135707.H12365@superconductor.rush.net> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :Both of you guys are missing the point. : :The directio interface is meant to reduce the stress of a large :seqential operation on a file where caching is of no use. : :Even if you depress the worthyness of the pages you've still :blown rather large amounts of unrelated data out of the cache :in order to allocate new cacheable pages. : :A simple solution would involve passing along flags such that if :the IO occurs to a non-previously-cached page the buf/page is :immediately placed on the free list upon completion. That way the :next IO can pull the now useless bufferspace from the freelist. : :Basically you add another buffer queue for "throw away" data that :exists as a "barely cached" queue. This way your normal data :doesn't compete on the LRU with non-cached data. 
:
:As a hack one it looks like one could use the QUEUE_EMPTYKVA
:buffer queue under FreeBSD for this, however I think one might
:loose the minimal amount of caching that could be done.
:
:If the direct IO happens to a page that's previously cached
:you adhere to the previous behavior.
:
:A more fancy approach might map in user pages into the kernel to
:do the IO directly, however on large MP this may cause pain because
:the vm may need to issue ipi to invalidate tlb entries.
:
:It's quite simple in theory, the hard part is the code.
:
:-Alfred Perlstein

I think someone tried to implement O_DIRECT a while back, but it was fairly complex to try to do away with caching entirely.

I think our best bet to 'start' an implementation of O_DIRECT is to support the flag in open() and fcntl(), and have it simply modify the sequential detection heuristic to throw away pages and buffers rather than simply depressing their priority.

Eventually we can implement the direct-I/O piece of the equation.

I could do this first part in an hour, I think. When I get home....
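A sketch of what the fcntl() side of such an interface looks like from user space, assuming a platform where Python exposes os.O_DIRECT (not every system or filesystem supports it). try_enable_direct is a hypothetical helper name, and the example only toggles the flag; real direct I/O usually also needs suitably aligned buffers and offsets, which are not shown.

```python
import fcntl
import os


def try_enable_direct(fd):
    """Try to set O_DIRECT on an open descriptor; return True on success."""
    flag = getattr(os, "O_DIRECT", None)
    if flag is None:
        return False  # this platform does not define the flag
    try:
        current = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, current | flag)
    except OSError:
        return False  # the kernel or filesystem refused the request
    # Confirm that the flag is actually reported on the descriptor.
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & flag)
```

Returning False instead of raising lets a caller fall back to ordinary buffered reads when the flag is unavailable.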
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 11:11: 2 2001 Delivered-To: freebsd-arch@freebsd.org Received: from superconductor.rush.net (superconductor.rush.net [208.9.155.8]) by hub.freebsd.org (Postfix) with ESMTP id 6952037B424 for ; Wed, 16 May 2001 11:10:56 -0700 (PDT) (envelope-from bright@superconductor.rush.net) Received: (from bright@localhost) by superconductor.rush.net (8.11.2/8.11.2) id f4GIAir28578; Wed, 16 May 2001 14:10:44 -0400 (EDT) Date: Wed, 16 May 2001 14:10:43 -0400 From: Alfred Perlstein To: Matt Dillon Cc: Rik van Riel , Charles Randall , Roger Larsson , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping Message-ID: <20010516141042.I12365@superconductor.rush.net> References: <200105161714.f4GHEFs72217@earth.backplane.com> <20010516135707.H12365@superconductor.rush.net> <200105161801.f4GI1Oc73283@earth.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0us In-Reply-To: <200105161801.f4GI1Oc73283@earth.backplane.com>; from dillon@earth.backplane.com on Wed, May 16, 2001 at 11:01:24AM -0700 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Matt Dillon [010516 14:01] wrote: > > I think someone tried to implement O_DIRECT a while back, but it > was fairly complex to try to do away with caching entirely. > > I think our best bet to 'start' an implementation of O_DIRECT is > to support the flag in open() and fcntl(), and have it simply > modify the sequential detection heuristic to throw away pages > and buffers rather then simply depressing their priority. yes, as i said: > :A simple solution would involve passing along flags such that if > :the IO occurs to a non-previously-cached page the buf/page is > :immediately placed on the free list upon completion. 
That way the > :next IO can pull the now useless bufferspace from the freelist. > : > :Basically you add another buffer queue for "throw away" data that > :exists as a "barely cached" queue. This way your normal data > :doesn't compete on the LRU with non-cached data. > > Eventually we can implement the direct-I/O piece of the equation. > > I could do this first part in an hour, I think. When I get home.... Thank you. -Alfred To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 11:58:32 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp017.mail.yahoo.com (smtp017.mail.yahoo.com [216.136.174.114]) by hub.freebsd.org (Postfix) with SMTP id 5AFBC37B42C for ; Wed, 16 May 2001 11:58:25 -0700 (PDT) (envelope-from fbsdq@yahoo.com) Received: from h2.impactidealsolutions.com (HELO support10) (216.98.200.91) by smtp.mail.vip.sc5.yahoo.com with SMTP; 16 May 2001 18:58:25 -0000 X-Apparently-From: Message-Id: Date: Wed, 16 May 2001 13:01:57 -0600 X-Priority: 3 From: Peter X-Mailer: Mail Warrior To: michael.schuster@sun.com, "questions@FreeBSD.ORG" , "questions@FreeBSD.ORG" , "arch@FreeBSD.ORG" MIME-Version: 1.0 Subject: Re: python fork call raised my load over 400! Content-Type: Text/Plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8Bit X-Mailer-Version: v3.57 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG . . . .|> Still a user process probably shouldn't be able to hose the whole system . . . .|> IMHO. . . . .| . . . .|sorry, that's the way Unix's fair-share scheduler works. Isn't that what user limits are for? man login.conf [I think -- Never could get it to work properly, but then again I'm the only one using my system]. On 05/16/2001 7:04:11 AM, Michael Schuster is quoted as saying: . . . .|dave wrote: . . . .|> . . . .|> If you have a block of free time today check this out! . . . .|> . . . .|> I keyed this in interactively with Python . 
. . .|> ----SNIP-------- . . . .|> . . . .|> import os . . . .|> . . . .|> while 1: . . . .|> os.fork() . . . .|> -----SNIP------- . . . .| . . . .|this is a classical fork bomb, and the system behaved very much as . . . .|designed. If you're using this to compare Linux to FreeBSD, you'd better . . . .|reconsider and get yourself proper benchmarks. . . . .| . . . .|btw: pls. don't cross-post, questions is quite enough. . . . .| . . . .| . . . .|> This user run program brought my system to a load of 419 with the system . . . .|> using . . . .|> 94% of the resources and 500 user processes on my AMD Duron 800 box with . . . .|> 256MB RAM... . . . .| . . . .|of course: every new process needs resources, and as new processes get more . . . .|CPU share than older ones, the newly forked processes would immediately . . . .|fork again. . . . .| . . . .|> I don't know that the processor/RAM is relevant but I could not fork . . . .|> anymore! . . . .| . . . .|of course you couldn't, you completely filled up your machine are were . . . .|still doing so - getting a word in egdeways was impossible. . . . .| . . . .|> My ultimate question is ... should I be comparing FreeBSD to Linux? . . . .|> Does it really matter if Linux is performing better or worse than FreeBSD? . . . .| . . . .|see above - this about the worst type of "benchmark" I've ever seen. . . . .| . . . .|> Still a user process probably shouldn't be able to hose the whole system . . . .|> IMHO. . . . .| . . . .|sorry, that's the way Unix's fair-share scheduler works. . . . .| . . . .|for more details, look into "Design and Implementation of 4.4 BSD" . . . .| . . . .|HTH . . . .|Michael . . . .|-- . . . .|Michael Schuster / Michael.Schuster@sun.com . . . .|Sun Microsystems GmbH / (+49 89) 46008-2974 | x62974 . . . .|Sonnenallee 1, D-85551 Kirchheim-Heimstetten . . . .| . . . .|Recursion, n.: see 'Recursion' . . . .| . . . .|To Unsubscribe: send mail to majordomo@FreeBSD.org . . . 
.|with "unsubscribe freebsd-questions" in the body of the message www.nul.cjb.net www.FreeBSD.org _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 13:18:59 2001 Delivered-To: freebsd-arch@freebsd.org Received: from midten.fast.no (midten.fast.no [213.188.8.11]) by hub.freebsd.org (Postfix) with ESMTP id AE34F37B424 for ; Wed, 16 May 2001 13:18:41 -0700 (PDT) (envelope-from Tor.Egge@fast.no) Received: from fast.no (IDENT:tegge@midten.fast.no [213.188.8.11]) by midten.fast.no (8.9.3/8.9.3) with ESMTP id WAA99982; Wed, 16 May 2001 22:18:25 +0200 (CEST) Message-Id: <200105162018.WAA99982@midten.fast.no> To: dillon@earth.backplane.com Cc: arch@FreeBSD.ORG Subject: Re: on load control / process swapping From: Tor.Egge@fast.no In-Reply-To: Your message of "Wed, 16 May 2001 11:01:24 -0700 (PDT)" References: <200105161801.f4GI1Oc73283@earth.backplane.com> X-Mailer: Mew version 1.70 on Emacs 19.34.1 Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Wed, 16 May 2001 22:18:25 +0200 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > I think someone tried to implement O_DIRECT a while back, but it > was fairly complex to try to do away with caching entirely. > > I think our best bet to 'start' an implementation of O_DIRECT is > to support the flag in open() and fcntl(), and have it simply > modify the sequential detection heuristic to throw away pages > and buffers rather then simply depressing their priority. > > Eventually we can implement the direct-I/O piece of the equation. > > I could do this first part in an hour, I think. When I get home.... I've used something like the following patch since FreeBSD 3.3-STABLE. 
On a Dell 2450 machine running a FreeBSD 4.3-RELEASE SMP kernel it increases idle time from 0% to 95% when running a test program with 100 threads that each reads 256K from random sector aligned locations in a 10 GB file. Read speed is increased from 120 MB/s to 160 MB/s. This implementation is not semantically correct since it doesn't check for dirty pages in the vm object. Index: sys/sys/vnode.h =================================================================== RCS file: /home/ncvs/src/sys/sys/vnode.h,v retrieving revision 1.150 diff -u -r1.150 vnode.h --- sys/sys/vnode.h 2001/05/01 08:34:44 1.150 +++ sys/sys/vnode.h 2001/05/09 16:09:32 @@ -220,6 +220,7 @@ #define IO_VMIO 0x20 /* data already in VMIO space */ #define IO_INVAL 0x40 /* invalidate after I/O */ #define IO_ASYNC 0x80 /* bawrite rather then bdwrite */ +#define IO_NOBUFFER 0x100 /* bypass buffer cache */ /* * Modes. Some values same as Ixxx entries from inode.h for now. Index: sys/sys/file.h =================================================================== RCS file: /home/ncvs/src/sys/sys/file.h,v retrieving revision 1.28 diff -u -r1.28 file.h --- sys/sys/file.h 2001/02/15 16:34:10 1.28 +++ sys/sys/file.h 2001/02/15 19:14:53 @@ -56,7 +56,7 @@ */ struct file { LIST_ENTRY(file) f_list;/* list of active files */ - short f_flag; /* see fcntl.h */ + int f_flag; /* see fcntl.h */ #define DTYPE_VNODE 1 /* file */ #define DTYPE_SOCKET 2 /* communications endpoint */ #define DTYPE_PIPE 3 /* pipe */ Index: sys/sys/fcntl.h =================================================================== RCS file: /home/ncvs/src/sys/sys/fcntl.h,v retrieving revision 1.10 diff -u -r1.10 fcntl.h --- sys/sys/fcntl.h 2000/04/22 15:22:21 1.10 +++ sys/sys/fcntl.h 2000/04/25 19:33:55 @@ -98,15 +98,18 @@ /* Defined by POSIX 1003.1; BSD default, but must be distinct from O_RDONLY. 
*/ #define O_NOCTTY 0x8000 /* don't assign controlling terminal */ +/* Bypass buffer cache */ +#define O_DIRECT 0x00010000 + #ifdef _KERNEL /* convert from open() flags to/from fflags; convert O_RD/WR to FREAD/FWRITE */ #define FFLAGS(oflags) ((oflags) + 1) #define OFLAGS(fflags) ((fflags) - 1) /* bits to save after open */ -#define FMASK (FREAD|FWRITE|FAPPEND|FASYNC|FFSYNC|FNONBLOCK) +#define FMASK (FREAD|FWRITE|FAPPEND|FASYNC|FFSYNC|FNONBLOCK|O_DIRECT) /* bits settable by fcntl(F_SETFL, ...) */ -#define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM) +#define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM|O_DIRECT) #endif /* Index: sys/kern/vfs_vnops.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_vnops.c,v retrieving revision 1.116 diff -u -r1.116 vfs_vnops.c --- sys/kern/vfs_vnops.c 2001/04/29 02:44:49 1.116 +++ sys/kern/vfs_vnops.c 2001/05/09 16:09:02 @@ -352,6 +360,10 @@ ioflag = 0; if (fp->f_flag & FNONBLOCK) ioflag |= IO_NDELAY; +#ifdef DIRECTIO + if (fp->f_flag & O_DIRECT) + ioflag |= IO_NOBUFFER; +#endif VOP_LEASE(vp, p, cred, LEASE_READ); vn_lock(vp, LK_SHARED | LK_NOPAUSE | LK_RETRY, p); if ((flags & FOF_OFFSET) == 0) Index: sys/ufs/ufs/ufs_readwrite.c =================================================================== RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_readwrite.c,v retrieving revision 1.77 diff -u -r1.77 ufs_readwrite.c --- sys/ufs/ufs/ufs_readwrite.c 2001/05/01 08:34:45 1.77 +++ sys/ufs/ufs/ufs_readwrite.c 2001/05/09 16:09:33 @@ -42,6 +42,12 @@ #define WRITE ffs_write #define WRITE_S "ffs_write" +#ifdef DIRECTIO +extern int allowrawread; +extern int ffs_rawread __P((struct vnode *vp, + struct uio *uio)); +#endif + #include #include #include @@ -86,6 +92,14 @@ mode = ip->i_mode; uio = ap->a_uio; ioflag = ap->a_ioflag; +#ifdef DIRECTIO + if ((ioflag & IO_NOBUFFER) != 0 && allowrawread != 0 && + uio->uio_iovcnt == 1 && + (uio->uio_offset & (DEV_BSIZE - 1)) == 0 && + 
(uio->uio_resid & (DEV_BSIZE - 1)) == 0 && + uio->uio_resid == uio->uio_iov->iov_len) + return ffs_rawread(vp, uio); +#endif #ifdef DIAGNOSTIC if (uio->uio_rw != UIO_READ) @@ -251,7 +265,7 @@ * doing sequential access. */ error = cluster_read(vp, ip->i_size, lbn, - size, NOCRED, uio->uio_resid, seqcount, &bp); + size, NOCRED, blkoffset + uio->uio_resid, seqcount, &bp); else if (seqcount > 1) { /* * If we are NOT allowed to cluster, then --- /dev/null Wed May 16 21:49:24 2001 +++ sys/ufs/ufs/ufs_rawread.c Sun Nov 26 06:01:31 2000 @@ -0,0 +1,307 @@ +/*- + * Copyright (c) 2000 Tor Egge + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ * + * $FreeBSD:$ + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +static int ffs_rawread_readahead __P((struct vnode *vp, + caddr_t udata, + off_t offset, + size_t len, + struct proc *p, + struct buf *bp, + caddr_t sa)); +int ffs_rawread __P((struct vnode *vp, + struct uio *uio)); + +static void ffs_rawreadwakeup __P((struct buf *bp)); + + +static int rawbufcnt = 350; +SYSCTL_INT(_debug, OID_AUTO, rawbufcnt, CTLFLAG_RD, &rawbufcnt, 0, ""); + +unsigned long allowrawread = 1; +SYSCTL_INT(_debug, OID_AUTO, allowrawread, CTLFLAG_RW, &allowrawread, 0, ""); + +static unsigned long rawreadahead = 1; +SYSCTL_INT(_debug, OID_AUTO, rawreadahead, CTLFLAG_RW, &rawreadahead, 0, ""); + +static int +ffs_rawread_readahead(vp, udata, offset, len, p, bp, sa) + struct vnode *vp; + caddr_t udata; + off_t offset; + size_t len; + struct proc *p; + struct buf *bp; + caddr_t sa; +{ + int error; + u_int iolen; + off_t blockno; + int blockoff; + int bsize; + struct vnode *dp; + int bforwards; + + bsize = vp->v_mount->mnt_stat.f_iosize; + + iolen = ((vm_offset_t) udata) & PAGE_MASK; + bp->b_bcount = len; + if (bp->b_bcount + iolen > bp->b_kvasize) { + bp->b_bcount = bp->b_kvasize; + if (iolen != 0) + bp->b_bcount -= PAGE_SIZE; + } + bp->b_flags = B_PHYS; + bp->b_iocmd = BIO_READ; + bp->b_iodone = ffs_rawreadwakeup; + bp->b_data = udata; + bp->b_saveaddr = sa; + bp->b_offset = offset; + blockno = bp->b_offset / bsize; + blockoff = (bp->b_offset % bsize) / DEV_BSIZE; + if ((daddr_t) blockno != blockno) { + return EINVAL; /* blockno overflow */ + } + + bp->b_lblkno = bp->b_blkno = blockno; + if (!useracc(bp->b_data, bp->b_bcount, VM_PROT_WRITE)) { + return EFAULT; + } + + error = VOP_BMAP(vp, bp->b_lblkno, &dp, &bp->b_blkno, &bforwards, + NULL); + if (error != 0) { + return error; + } + + if (bp->b_bcount + blockoff * DEV_BSIZE > bsize * (1 + 
bforwards)) + bp->b_bcount = bsize * (1 + bforwards) - blockoff * DEV_BSIZE; + bp->b_bufsize = bp->b_bcount; + bp->b_blkno += blockoff; + bp->b_dev = dp->v_rdev; + + vmapbuf(bp); + + (void) VOP_STRATEGY(dp, bp); + return 0; +} + +int +ffs_rawread(vp, uio) + struct vnode *vp; + struct uio *uio; +{ + int error, nerror; + struct buf *bp, *nbp, *tbp; + caddr_t sa, nsa, tsa; + u_int iolen; + int spl; + caddr_t udata; + long resid; + off_t offset; + struct proc *p; + + udata = uio->uio_iov->iov_base; + resid = uio->uio_resid; + offset = uio->uio_offset; + p = uio->uio_procp ? uio->uio_procp : curproc; + + if ((offset % DEV_BSIZE) != 0 || (resid % DEV_BSIZE) != 0) + return EINVAL; + + /* + * keep the process from being swapped + */ + PHOLD(p); + + error = 0; + nerror = 0; + + bp = NULL; + nbp = NULL; + sa = NULL; + nsa = NULL; + + while (resid > 0) { + + if (bp == NULL) { /* Setup first read */ + /* XXX: Leave some bufs for swap */ + bp = getpbuf(&rawbufcnt); + sa = bp->b_data; + bp->b_vp = vp; + bp->b_error = 0; + error = ffs_rawread_readahead(vp, udata, offset, + resid, p, bp, sa); + if (error != 0) + break; + + if (resid > bp->b_bufsize) { /* Setup fist readahead */ + /* XXX: Leave bufs for swap */ + if (rawreadahead != 0) + nbp = trypbuf(&rawbufcnt); + else + nbp = NULL; + if (nbp != NULL) { + nsa = nbp->b_data; + nbp->b_vp = vp; + nbp->b_error = 0; + + nerror = ffs_rawread_readahead(vp, + udata + + bp->b_bufsize, + offset + + bp->b_bufsize, + resid - + bp->b_bufsize, + p, + nbp, + nsa); + if (nerror) { + relpbuf(nbp, &rawbufcnt); + nbp = NULL; + } + } + } + } + + spl = splbio(); + while ((bp->b_flags & B_DONE) == 0) { + tsleep((caddr_t)bp, PRIBIO, "rawrd", 0); + } + splx(spl); + + vunmapbuf(bp); + + iolen = bp->b_bcount - bp->b_resid; + if (iolen == 0 && (bp->b_ioflags & BIO_ERROR) == 0) { + nerror = 0; /* Ignore possible beyond EOF error */ + break; /* EOF */ + } + + if ((bp->b_ioflags & BIO_ERROR) != 0) { + error = bp->b_error; + break; + } + resid -= iolen; + 
udata += iolen; + offset += iolen; + if (iolen < bp->b_bufsize) { + /* Incomplete read. Try to read remaining part */ + error = ffs_rawread_readahead(vp, + udata, + offset, + bp->b_bufsize - iolen, + p, + bp, + sa); + if (error) + break; + } else if (nbp != NULL) { /* Complete read with readahead */ + + tbp = bp; + bp = nbp; + nbp = tbp; + + tsa = sa; + sa = nsa; + nsa = tsa; + + if (resid <= bp->b_bufsize) { /* No more readaheads */ + relpbuf(nbp, &rawbufcnt); + nbp = NULL; + } else { /* Setup next readahead */ + nerror = ffs_rawread_readahead(vp, + udata + + bp->b_bufsize, + offset + + bp->b_bufsize, + resid - + bp->b_bufsize, + p, + nbp, + nsa); + if (nerror != 0) { + relpbuf(nbp, &rawbufcnt); + nbp = NULL; + } + } + } else if (nerror != 0) {/* Deferred Readahead error */ + break; + } else if (resid > 0) { /* More to read, no readahead */ + error = ffs_rawread_readahead(vp, udata, offset, + resid, p, bp, sa); + if (error != 0) + break; + } + } + + if (bp != NULL) + relpbuf(bp, &rawbufcnt); + if (nbp != NULL) { /* Run down readahead buffer */ + spl = splbio(); + while ((nbp->b_flags & B_DONE) == 0) { + tsleep((caddr_t)nbp, PRIBIO, "rawrd", 0); + } + splx(spl); + vunmapbuf(nbp); + relpbuf(nbp, &rawbufcnt); + } + + if (error == 0) + error = nerror; + PRELE(p); + uio->uio_resid = resid; + return error; +} + +static void +ffs_rawreadwakeup(bp) + struct buf *bp; +{ + wakeup((caddr_t) bp); +} + Index: sys/conf/options =================================================================== RCS file: /home/ncvs/src/sys/conf/options,v retrieving revision 1.271 diff -u -r1.271 options --- sys/conf/options 2001/05/13 20:52:36 1.271 +++ sys/conf/options 2001/05/16 17:36:04 @@ -378,6 +380,7 @@ REGRESSION opt_global.h SIMPLELOCK_DEBUG opt_global.h VFS_BIO_DEBUG opt_global.h +DIRECTIO opt_global.h # These are VM related options VM_KMEM_SIZE opt_vm.h - Tor Egge To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From 
owner-freebsd-arch Wed May 16 13:27:26 2001 Delivered-To: freebsd-arch@freebsd.org Received: from perninha.conectiva.com.br (perninha.conectiva.com.br [200.250.58.156]) by hub.freebsd.org (Postfix) with ESMTP id 5A23D37B423 for ; Wed, 16 May 2001 13:27:21 -0700 (PDT) (envelope-from riel@conectiva.com.br) Received: from burns.conectiva (burns.conectiva [10.0.0.4]) by perninha.conectiva.com.br (Postfix) with SMTP id D0B5A16B55 for ; Wed, 16 May 2001 16:59:51 -0300 (EST) Received: (qmail 24949 invoked by uid 0); 16 May 2001 19:58:29 -0000 Received: from duckman.distro.conectiva (HELO duckman.conectiva.com.br) (root@10.0.17.2) by burns.conectiva with SMTP; 16 May 2001 19:58:29 -0000 Received: from localhost (riel@localhost) by duckman.conectiva.com.br (8.11.3/8.11.3) with ESMTP id f4GJxpg06073; Wed, 16 May 2001 16:59:51 -0300 X-Authentication-Warning: duckman.distro.conectiva: riel owned process doing -bs Date: Wed, 16 May 2001 16:59:51 -0300 (BRST) From: Rik van Riel X-X-Sender: To: Matt Dillon Cc: Charles Randall , Roger Larsson , , , Subject: Re: RE: on load control / process swapping In-Reply-To: <200105161754.f4GHsCd73025@earth.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 16 May 2001, Matt Dillon wrote: > :There's one thing "wrong" with the drop-behind idea though; > :it penalises data even when it's still in core and we're > :reading it for the second or third time. > > It's not dropping the data, it's dropping the priority. And yes, it > does penalize the data somewhat. On the otherhand if the data happens > to still be in the cache and you scan it a second time, the page priority > gets bumped up But doesn't it get pushed _down_ again after the process has read the data? Or is this a part of the code outside of vm/* which I haven't read yet? 
regards, Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 13:32:22 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 7075537B42C for ; Wed, 16 May 2001 13:32:19 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GKVkd77205; Wed, 16 May 2001 13:31:46 -0700 (PDT) (envelope-from dillon) Date: Wed, 16 May 2001 13:31:46 -0700 (PDT) From: Matt Dillon Message-Id: <200105162031.f4GKVkd77205@earth.backplane.com> To: Tor.Egge@fast.no Cc: arch@FreeBSD.ORG Subject: Re: on load control / process swapping References: <200105161801.f4GI1Oc73283@earth.backplane.com> <200105162018.WAA99982@midten.fast.no> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG : :> :> I think someone tried to implement O_DIRECT a while back, but it :> was fairly complex to try to do away with caching entirely. :> :> I think our best bet to 'start' an implementation of O_DIRECT is :> to support the flag in open() and fcntl(), and have it simply :> modify the sequential detection heuristic to throw away pages :> and buffers rather then simply depressing their priority. :> :> Eventually we can implement the direct-I/O piece of the equation. :> :> I could do this first part in an hour, I think. When I get home.... : :I've used something like the following patch since FreeBSD 3.3-STABLE. 
: :On a Dell 2450 machine running a FreeBSD 4.3-RELEASE SMP kernel it :increases idle time from 0% to 95% when running a test program with :100 threads that each reads 256K from random sector aligned locations :in a 10 GB file. Read speed is increased from 120 MB/s to 160 MB/s. : :This implementation is not semantically correct since it doesn't check :for dirty pages in the vm object. Ah, that's right... you were the one working on it. It looks like you've done some serious work on it since the last time we talked. Ok, I've done a quick once-over of the patch and I have a question: What happens if you've just written that file normally and there are still some uncommitted dirty buffers associated with it, and you then do an O_DIRECT read of the file? Do you get the old data or the new data? -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 13:42:12 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 44DDC37B422 for ; Wed, 16 May 2001 13:42:10 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GKfoR77402; Wed, 16 May 2001 13:41:50 -0700 (PDT) (envelope-from dillon) Date: Wed, 16 May 2001 13:41:50 -0700 (PDT) From: Matt Dillon Message-Id: <200105162041.f4GKfoR77402@earth.backplane.com> To: Rik van Riel Cc: Charles Randall , Roger Larsson , , , Subject: Re: RE: on load control / process swapping References: Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :> does penalize the data somewhat. On the other hand if the data happens :> to still be in the cache and you scan it a second time, the page priority :> gets bumped up : :But doesn't it get pushed _down_ again after the process has read :the data?
Or is this a part of the code outside of vm/* which I :haven't read yet? : :regards, : :Rik :-- :Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Well, I was going to answer, but I can't find the code. I'll have to look at it more closely. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 13:51: 6 2001 Delivered-To: freebsd-arch@freebsd.org Received: from midten.fast.no (midten.fast.no [213.188.8.11]) by hub.freebsd.org (Postfix) with ESMTP id 67B3B37B422 for ; Wed, 16 May 2001 13:51:03 -0700 (PDT) (envelope-from Tor.Egge@fast.no) Received: from fast.no (IDENT:tegge@midten.fast.no [213.188.8.11]) by midten.fast.no (8.9.3/8.9.3) with ESMTP id WAA01047; Wed, 16 May 2001 22:50:59 +0200 (CEST) Message-Id: <200105162050.WAA01047@midten.fast.no> To: dillon@earth.backplane.com Cc: arch@FreeBSD.ORG Subject: Re: on load control / process swapping From: Tor.Egge@fast.no In-Reply-To: Your message of "Wed, 16 May 2001 13:31:46 -0700 (PDT)" References: <200105162031.f4GKVkd77205@earth.backplane.com> X-Mailer: Mew version 1.70 on Emacs 19.34.1 Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Wed, 16 May 2001 22:50:59 +0200 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > Ok, I've done a quick once-over of the patch and I have a question: > What happens if you've just written that file normally and there are > still some uncommitted dirty buffers associated with it, and you then > do an O_DIRECT read of the file? Do you get the old data or the new > data? Currently, you get the old data. That's both semantically incorrect and a security hole. Some check for dirty buffers should be made if the OBJ_MIGHTBEDIRTY flag is set on the vm object.
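The stale-read problem described here can be shown with a small user-space model. This is only a sketch under stated assumptions: `struct toy_file` and every `toy_*` function are hypothetical names invented for illustration, not kernel interfaces, and flushing before the raw read is just one possible form the suggested check could take (the thread does not say which remedy was intended).

```c
#include <assert.h>
#include <string.h>

#define BLKSZ 16

/* Hypothetical toy model: one on-disk block plus one cached copy. */
struct toy_file {
    char disk[BLKSZ];    /* what the device currently holds */
    char cache[BLKSZ];   /* buffered copy, possibly newer than disk */
    int  might_be_dirty; /* stands in for the OBJ_MIGHTBEDIRTY flag */
};

/* Buffered write: only the cache is updated; the disk lags behind. */
static void toy_write(struct toy_file *f, const char *data)
{
    assert(strlen(data) < BLKSZ);
    memset(f->cache, 0, BLKSZ);
    strcpy(f->cache, data);
    f->might_be_dirty = 1;
}

/* Push the cached data out to the "device". */
static void toy_flush(struct toy_file *f)
{
    memcpy(f->disk, f->cache, BLKSZ);
    f->might_be_dirty = 0;
}

/* Raw read without any check: goes to the device and ignores the cache. */
static void toy_rawread_unchecked(const struct toy_file *f, char *out)
{
    memcpy(out, f->disk, BLKSZ);
}

/* Raw read with a dirty check: flush first if the data might be dirty. */
static void toy_rawread_checked(struct toy_file *f, char *out)
{
    if (f->might_be_dirty)
        toy_flush(f);
    memcpy(out, f->disk, BLKSZ);
}
```

After a buffered write that has not yet been flushed, the unchecked variant returns whatever the device held before the write, while the checked variant returns the newly written data.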
- Tor Egge To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 14:36:15 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id EF86F37B422 for ; Wed, 16 May 2001 14:36:12 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GLZdo78984; Wed, 16 May 2001 14:35:39 -0700 (PDT) (envelope-from dillon) Date: Wed, 16 May 2001 14:35:39 -0700 (PDT) From: Matt Dillon Message-Id: <200105162135.f4GLZdo78984@earth.backplane.com> To: Tor.Egge@fast.no Cc: arch@FreeBSD.ORG Subject: Re: on load control / process swapping References: <200105162031.f4GKVkd77205@earth.backplane.com> <200105162050.WAA01047@midten.fast.no> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG : :> Ok, I've done a quick once-over of the patch and I have a question: :> What happens if you've just written that file normally and there are :> still some uncommitted dirty buffers associated with it, and you then :> do an O_DIRECT read of the file? Do you get the old data or the new :> data? : :Currently, you get the old data. That's both semantically incorrect :and a security hole. Some check for dirty buffers should be made if :the OBJ_MIGHTBEDIRTY flag is set on the vm object. : :- Tor Egge Question number 2. You have this: error = cluster_read(vp, ip->i_size, lbn, - size, NOCRED, uio->uio_resid, seqcount, &bp); + size, NOCRED, blkoffset + uio->uio_resid, seqcount, &bp); What is the blkoffset adjustment for? Is that a bug fix for something else? -- In any case, with regard to the main patch. Why don't I commit the header file support pieces from your patch with some minor alignment cleanups to the struct file, but leave your rawread/rawwrite out until we can make it work properly.
Then I can use IO_NOBUFFER to cause the underlying VM pages to be freed (the underlying struct buf is already released in the existing code). The result will be the same low-VM-page-cache impact as your rawread/rawwrite code except for the extra buffer copy. I think I can reach about 90% of the performance you get simply by freeing the underlying VM pages because this will allow them to be reused in the next read(), and they will already be in the L2 cache. If I don't free the underlying VM pages the sequential read will force the L2 cache to cycle, and I'll bet that is why you get such drastically different idle times. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 14:48:25 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 67EF537B423 for ; Wed, 16 May 2001 14:48:23 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GLmNm79465; Wed, 16 May 2001 14:48:23 -0700 (PDT) (envelope-from dillon) Date: Wed, 16 May 2001 14:48:23 -0700 (PDT) From: Matt Dillon Message-Id: <200105162148.f4GLmNm79465@earth.backplane.com> To: freebsd-arch@FreeBSD.ORG Subject: Upgrading u_short cr_ref to u_int cr_ref in ucred on -stable References: <14721.48065.766815.376959@grasshopper.cs.duke.edu> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG As per conversations on this list 5-10 April (see 'Eliminate crget() from...') I intend to MFC the change Alfred made in -current to -stable for the ucred reference count. This will occur tonight. The change will turn cr_ref from a u_short to a u_int. 
Due to alignment issues, the other fields in the structure will not change their location and a quick run through the driver code seems to show that device drivers do not access cr_ref. That is, only crhold() seems to really reference the cr_ref field directly and devices do not appear to call crhold(), so I think we are plenty safe enough in regards to binary compatibility. This being between releases, if this change creates a serious issue somewhere along the line we can always rip it out and throw in Terry's stop-gap fix (that allocates a new copy when the ref count would otherwise overflow). I do not think that will be necessary though. -Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 14:57:31 2001 Delivered-To: freebsd-arch@freebsd.org Received: from superconductor.rush.net (superconductor.rush.net [208.9.155.8]) by hub.freebsd.org (Postfix) with ESMTP id BA8C237B422 for ; Wed, 16 May 2001 14:57:27 -0700 (PDT) (envelope-from bright@superconductor.rush.net) Received: (from bright@localhost) by superconductor.rush.net (8.11.2/8.11.2) id f4GLvPt12475; Wed, 16 May 2001 17:57:25 -0400 (EDT) Date: Wed, 16 May 2001 17:57:24 -0400 From: Alfred Perlstein To: Matt Dillon Cc: freebsd-arch@FreeBSD.ORG Subject: Re: Upgrading u_short cr_ref to u_int cr_ref in ucred on -stable Message-ID: <20010516175724.J12365@superconductor.rush.net> References: <14721.48065.766815.376959@grasshopper.cs.duke.edu> <200105162148.f4GLmNm79465@earth.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0us In-Reply-To: <200105162148.f4GLmNm79465@earth.backplane.com>; from dillon@earth.backplane.com on Wed, May 16, 2001 at 02:48:23PM -0700 X-all-your-base: are belong to us. 
Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Matt Dillon [010516 17:48] wrote: > As per conversations on this list 5-10 April (see > 'Eliminate crget() from...') I intend to MFC the change > Alfred made in -current to -stable for the ucred reference count. > This will occur tonight. Go for it. -- -Alfred To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 15:11:12 2001 Delivered-To: freebsd-arch@freebsd.org Received: from midten.fast.no (midten.fast.no [213.188.8.11]) by hub.freebsd.org (Postfix) with ESMTP id B1B4037B42C for ; Wed, 16 May 2001 15:11:09 -0700 (PDT) (envelope-from Tor.Egge@fast.no) Received: from fast.no (IDENT:tegge@midten.fast.no [213.188.8.11]) by midten.fast.no (8.9.3/8.9.3) with ESMTP id AAA02889; Thu, 17 May 2001 00:11:06 +0200 (CEST) Message-Id: <200105162211.AAA02889@midten.fast.no> To: dillon@earth.backplane.com Cc: arch@FreeBSD.ORG Subject: Re: on load control / process swapping From: Tor.Egge@fast.no In-Reply-To: Your message of "Wed, 16 May 2001 14:35:39 -0700 (PDT)" References: <200105162135.f4GLZdo78984@earth.backplane.com> X-Mailer: Mew version 1.70 on Emacs 19.34.1 Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Date: Thu, 17 May 2001 00:11:05 +0200 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > Question number 2. You have this: > > error = cluster_read(vp, ip->i_size, lbn, > - size, NOCRED, uio->uio_resid, seqcount, &bp); > + size, NOCRED, blkoffset + uio->uio_resid, seqcount, &bp); > > > What is the blkoffset adjustment for? Is that a bug fix for something > else? lbn doesn't reflect the least significant bits in uio->uio_offset, causing too small a readahead. Adding blkoffset to uio->uio_resid compensates for that. > In any case, with regard to the main patch.
Why don't I commit > the header file support pieces from your patch with some minor > alignment cleanups to the struct file, but leave your > rawread/rawwrite out until we can make it work properly. Fine. > Then I can use IO_NOBUFFER to cause the underlying VM pages to > be freed (the underlying struct buf is already released in the > existing code). The result will be the same low-VM-page-cache > impact as your rawread/rawwrite code except for the extra buffer > copy. I think I can reach about 90% of the performance you get > simply by freeing the underlying VM pages because this will > allow them to be reused in the next read(), and they will > already be in the L2 cache. If I don't free the underlying VM > pages the sequential read will force the L2 cache to cycle, and > I'll bet that is why you get such drastically different idle > times. Avoiding that copyout() is the major reason for increased idle time. The L2 cache will still cycle a lot with your suggested implementation for the load I used since the normal amount of outstanding IO is 25 MB (256 KB x 100). The L2 cache is a lot smaller than 25 MB.
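The 25 MB figure follows directly from the test parameters given earlier in the thread (100 threads, 256 KB per read). As a quick check, a minimal sketch; the function names are hypothetical, and the 256 KB cache size used in the note below is an assumed value for illustration, since the thread does not state the actual L2 size of the test machine.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of read data in flight when every thread has one request pending. */
static size_t outstanding_io_bytes(size_t nthreads, size_t bytes_per_read)
{
    /* Guard against overflow in the multiplication below. */
    assert(nthreads == 0 || bytes_per_read <= (size_t)-1 / nthreads);
    return nthreads * bytes_per_read;
}

/* Nonzero if the working set fits in a cache of the given size. */
static int fits_in_cache(size_t working_set, size_t cache_bytes)
{
    return working_set <= cache_bytes;
}
```

With 100 threads and 256 * 1024 bytes per read, `outstanding_io_bytes` returns 26214400 bytes, which is exactly 25 MB in units of 1024 * 1024 bytes. Against an assumed 256 KB cache that is a factor of 100 too large, so `fits_in_cache` returns 0.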
- Tor Egge To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 15:23:26 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id F1B4C37B422 for ; Wed, 16 May 2001 15:23:23 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4GMMpC81247; Wed, 16 May 2001 15:22:51 -0700 (PDT) (envelope-from dillon) Date: Wed, 16 May 2001 15:22:51 -0700 (PDT) From: Matt Dillon Message-Id: <200105162222.f4GMMpC81247@earth.backplane.com> To: Tor.Egge@fast.no Cc: arch@FreeBSD.ORG Subject: Re: on load control / process swapping References: <200105162135.f4GLZdo78984@earth.backplane.com> <200105162211.AAA02889@midten.fast.no> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG :.. :> allow them to be reused in the next read(), and they will :> already be in the L2 cache. If I don't free the underlying VM :> pages the sequential read will force the L2 cache to cycle, and :> I'll bet that is why you get such drastically different idle :> times. : :Avoiding that copyout() is the major reason for increased idle time. : :The L2 cache will still cycle a lot with your suggested implementation :for the load I used since the normal amount of outstanding IO is 25 MB :(256 KB x 100). The L2 cache is a lot smaller than 25 MB. : :- Tor Egge I'd have to see your test code. Doing a direct-read into a user buffer has no cache impact at all (DMA does not go through the cpu cache). If you are doing seek/read()s but not actually looking at the data that is returned, your test results are going to be seriously skewed.
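One way a read benchmark could avoid that skew is to touch every byte it reads, so the cost of pulling the data through the CPU cache is part of the measurement. The helper below is a sketch only: `read_and_touch` is a hypothetical name, not part of the posted test program, and a simple byte sum stands in for whatever work a real application would do on the data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>     /* size_t, NULL */
#include <sys/types.h>  /* ssize_t, off_t */
#include <unistd.h>     /* pread */

/*
 * Read up to len bytes at offset off into buf, then fold every byte
 * that was read into *sum, so the CPU really loads the returned data.
 * Returns the number of bytes read, or -1 on error (as pread does).
 */
static ssize_t read_and_touch(int fd, void *buf, size_t len, off_t off,
                              uint32_t *sum)
{
    const unsigned char *p = buf;
    ssize_t n, i;

    assert(buf != NULL && sum != NULL);
    n = pread(fd, buf, len, off);
    for (i = 0; i < n; i++)   /* loop is skipped when n is 0 or -1 */
        *sum += p[i];
    return n;
}
```

A benchmark would call this in place of a bare pread() and print the accumulated sum at the end, which also keeps the compiler from discarding the loop.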
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 16:31:35 2001 Delivered-To: freebsd-arch@freebsd.org Received: from midten.fast.no (midten.fast.no [213.188.8.11]) by hub.freebsd.org (Postfix) with ESMTP id 0817337B423 for ; Wed, 16 May 2001 16:31:17 -0700 (PDT) (envelope-from Tor.Egge@fast.no) Received: from fast.no (IDENT:tegge@midten.fast.no [213.188.8.11]) by midten.fast.no (8.9.3/8.9.3) with ESMTP id BAA04708; Thu, 17 May 2001 01:31:06 +0200 (CEST) Message-Id: <200105162331.BAA04708@midten.fast.no> To: dillon@earth.backplane.com Cc: arch@FreeBSD.ORG Subject: Re: on load control / process swapping From: Tor.Egge@fast.no In-Reply-To: Your message of "Wed, 16 May 2001 15:22:51 -0700 (PDT)" References: <200105162222.f4GMMpC81247@earth.backplane.com> X-Mailer: Mew version 1.70 on Emacs 19.34.1 Mime-Version: 1.0 Content-Type: Multipart/Mixed; boundary="--Next_Part(Thu_May_17_01:30:16_2001)--" Content-Transfer-Encoding: 7bit Date: Thu, 17 May 2001 01:31:06 +0200 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG ----Next_Part(Thu_May_17_01:30:16_2001)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit > I'd have to see your test code. Doing a direct-read into a user buffer > has no cache impact at all (DMA does not go through the cpu cache). > If you are doing seek/read()s but not actually looking at the data that > is returned, your test results are going to be seriously skewed. The test code does not look at the data. I sent a copy of it to you on January 7th 2000 (along with a previous version of the O_DIRECT patch). I agree that the 95% reduction in CPU usage is seriously skewed. The performance improvement for most real applications will be very small or even negative.
For some specialized applications it is a significant performance improvement, giving nearly the same performance as when bypassing the kernel file system and using the raw device directly. - Tor Egge ----Next_Part(Thu_May_17_01:30:16_2001)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Description: "Makefile" all: aiotest_lt_raw aiotest_lt aiotest_ut clean: rm -f aiotest_lt_raw aiotest_lt aiotest_ut aiotest_lt_raw: aiotest.c cc -static -D_THREAD_SAFE -D_PTHREADS -DLINUXTHREADS -DRAWREAD -O2 -I/usr/local/include/pthread/linuxthreads -o aiotest_lt_raw aiotest.c -L/usr/local/lib -llthread -llgcc_r aiotest_lt: aiotest.c cc -D_THREAD_SAFE -D_PTHREADS -DLINUXTHREADS -I/usr/local/include/pthread/linuxthreads -O2 -o aiotest_lt aiotest.c -L/usr/local/lib -llthread -llgcc_r aiotest_ut: aiotest.c cc -static -pthread -D_THREAD_SAFE -D_PTHREADS -O2 -o aiotest_ut aiotest.c ----Next_Part(Thu_May_17_01:30:16_2001)-- Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Description: "aiotest.c" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifndef __linux__ #include #endif #ifndef LINUXTHREADS #include struct myaio { struct aiocb cb; struct { int busy; pthread_mutex_t mutex; pthread_cond_t cond; } cond; struct myaio *next; struct myaio *prev; ssize_t retval; size_t reterrno; time_t started; int errwritten; }; static struct myaio *activeaios; static struct myaio *freeaios; static int freecnt; static pthread_mutex_t aiomutex = PTHREAD_MUTEX_INITIALIZER; static pthread_cond_t aiocond = PTHREAD_COND_INITIALIZER; static int aiostartcnt; static int aioendcnt; static volatile sig_atomic_t gotusr1; struct timeval maxlat; static struct timeval gotusr1time; static pthread_once_t aiothread_once = PTHREAD_ONCE_INIT; static pthread_t aiothread; static int 
aiothread_running; #endif int xreadlen; /* bytes */ int xreadoff; /* skip KB at eof */ static void runaiothread(void); #ifndef __linux__ #ifndef O_DIRECT ssize_t rawread(int fd, void *buf, size_t nbytes, off_t offset) { struct rawread rr; ssize_t ret; rr.udata = buf; rr.len = nbytes; rr.offset = offset; ret = ioctl(fd, FIORAWREAD, &rr); if (ret < 0 && errno == ENOTTY) ret = pread(fd, buf, nbytes, offset); return ret; } #endif #endif #ifndef LINUXTHREADS ssize_t aio_pread(const int fd, void *buf, const size_t buflen, const off_t off) { struct myaio *aio; int ret; size_t retval; int reterrno; pthread_mutex_lock(&aiomutex); if (freeaios != NULL) { assert(freecnt > 0); freecnt--; aio = freeaios; freeaios = aio->next; aio->next = NULL; aio->prev = NULL; } else { assert(freecnt == 0); pthread_once(&aiothread_once, runaiothread); while (aiothread_running == 0) pthread_cond_wait(&aiocond, &aiomutex); pthread_mutex_unlock(&aiomutex); aio = (struct myaio *) malloc(sizeof(struct myaio)); memset(aio, 0, sizeof(struct myaio)); pthread_mutex_init(&aio->cond.mutex, NULL); pthread_cond_init(&aio->cond.cond, NULL); aio->next = NULL; aio->prev = NULL; pthread_mutex_lock(&aiomutex); } assert(aio->cond.busy == 0); aio->cond.busy = 1; aio->cb.aio_fildes = fd; aio->cb.aio_offset = off; aio->cb.aio_buf = buf; aio->cb.aio_nbytes = buflen; aio->cb.aio_sigevent.sigev_notify = SIGEV_SIGNAL; aio->cb.aio_sigevent.sigev_signo = SIGUSR1; aio->cb.aio_sigevent.sigev_value.sigval_ptr = &aio->cb; aio->cb.aio_lio_opcode = 0; aio->cb.aio_reqprio = 0; aio->retval = 0; aio->started = time(0); aio->errwritten = 0; aio->prev = NULL; aio->next = activeaios; if (activeaios != NULL) activeaios->prev = aio; activeaios = aio; aiostartcnt++; ret = aio_read(&aio->cb); pthread_mutex_unlock(&aiomutex); assert(ret == 0); pthread_mutex_lock(&aio->cond.mutex); while (aio->cond.busy != 0) { pthread_cond_wait(&aio->cond.cond, &aio->cond.mutex); } pthread_mutex_unlock(&aio->cond.mutex); retval = aio->retval; reterrno = 
aio->reterrno; #if 0 assert((size_t) aio->retval == buflen); #endif pthread_mutex_lock(&aiomutex); assert(aio->next == NULL); assert(aio->prev == NULL); assert(aio != activeaios); assert(aio != freeaios); aio->next = freeaios; aio->prev = NULL; freeaios = aio; freecnt++; pthread_mutex_unlock(&aiomutex); errno = reterrno; return retval; } static void usr1handler(int sig) { (void) sig; if (gotusr1 == 0) gettimeofday(&gotusr1time, NULL); gotusr1 = 1; } void processusr1(void) { struct myaio *aio, *naio; int reterrno; int now; int qpos; pthread_mutex_lock(&aiomutex); now = time(0); qpos = 0; for (aio = activeaios; aio != NULL; aio =naio, qpos++) { naio = aio->next; reterrno = aio_error(&aio->cb); if (now - aio->started > 15 && (reterrno != EINPROGRESS || aio->errwritten == 0)) { printf("ERROR: aio used more than %d seconds: cb=%p, buflen=%u" ", qpos=%d %s, aiocnt=%d,%d\n", (int) (now - aio->started - 1), (void *) &aio->cb, aio->cb.aio_nbytes, qpos, aio->next == NULL ? "" : "(more elements)", aiostartcnt, aioendcnt); aio->errwritten = 1; } if (reterrno == EINPROGRESS) continue; else if (reterrno < 0) { assert(errno == EINVAL); assert(now - aio->started < 10); } else { aioendcnt++; assert(aio->prev != NULL || aio == activeaios); aio->retval = aio_return(&aio->cb); aio->reterrno = reterrno; if (aio->next != NULL) aio->next->prev = aio->prev; if (aio->prev != NULL) aio->prev->next = aio->next; if (aio == activeaios) activeaios = aio->next; aio->prev = NULL; aio->next = NULL; pthread_mutex_lock(&aio->cond.mutex); aio->cond.busy = 0; pthread_cond_signal(&aio->cond.cond); pthread_mutex_unlock(&aio->cond.mutex); } } pthread_mutex_unlock(&aiomutex); } void *aiothreadmeat(void *dummy) { sigset_t sigs_to_block; struct sigaction act; struct timeval now, lat; struct sched_param schedparam; int policy; if (pthread_getschedparam(pthread_self(), &policy, &schedparam) == 0) { printf("Initial Aiothread priority was %d\n", schedparam.sched_priority); schedparam.sched_priority += 4; if 
(pthread_setschedparam(pthread_self(), policy, &schedparam) == 0) { if (pthread_getschedparam(pthread_self(), &policy, &schedparam) == 0) printf("Bumped priority of Aiothread to %d\n", schedparam.sched_priority); else printf("Failed rereading Aiothread priority\n"); } else printf("Failed Bumping Aiothread priority\n"); } else printf("Failed reading initial Aiothread priority\n"); act.sa_handler=usr1handler; sigemptyset(&act.sa_mask); act.sa_flags=0; sigaction(SIGUSR1,&act,NULL); sigemptyset(&sigs_to_block); sigaddset(&sigs_to_block, SIGUSR1); pthread_sigmask(SIG_UNBLOCK, &sigs_to_block, NULL); pthread_mutex_lock(&aiomutex); aiothread_running = 1; pthread_cond_broadcast(&aiocond); pthread_mutex_unlock(&aiomutex); gettimeofday(&gotusr1time, NULL); gotusr1 = 1; while (1) { if (gotusr1 != 0) { gettimeofday(&now, NULL); if (now.tv_usec >= gotusr1time.tv_usec) { lat.tv_usec = now.tv_usec - gotusr1time.tv_usec; lat.tv_sec = now.tv_sec - gotusr1time.tv_sec; } else { lat.tv_usec = now.tv_usec + 1000000 - gotusr1time.tv_usec; lat.tv_sec = now.tv_sec - 1 - gotusr1time.tv_sec; } if (lat.tv_sec > maxlat.tv_sec || (lat.tv_sec == maxlat.tv_sec && lat.tv_usec >= maxlat.tv_usec)) maxlat = lat; gotusr1 = 0; processusr1(); } sleep(1); } abort(); } static void runaiothread(void) { pthread_create(&aiothread, NULL, aiothreadmeat, NULL); } #endif /* 10000 MB test file */ #define FILESIZE 10000 static off_t filesize; int writefile(void) { char *buf; size_t buflen; int fd; int count; ssize_t wgot; struct stat stbuf; buflen = 1024 * 1024; buf = (char *) malloc(buflen); assert(buf != NULL); filesize = (off_t) FILESIZE * (off_t) buflen; fd = open("largefile", O_RDWR | O_CREAT, 0666); assert(fd >= 0); #if 1 fstat(fd, &stbuf); if (stbuf.st_size < filesize) { for (count = 0; count < FILESIZE; count++) { wgot = write(fd, buf, buflen); assert(wgot == buflen); } } #endif #ifdef RAWREAD #ifdef O_DIRECT { int flags; flags = fcntl(fd, F_GETFL, 0); flags |= O_DIRECT; fcntl(fd, F_SETFL, flags); } #endif 
#endif return fd; } static pthread_mutex_t cntmutex = PTHREAD_MUTEX_INITIALIZER; static int startreadcnt; static int donereadcnt; static off_t donereadbytes; void *readthread(void *data) { int fd; size_t buflen; char *buf; ssize_t rgot; off_t loc; sigset_t sigs_to_block; fd = (int) data; buflen = xreadlen; buf = (char *) malloc(buflen); assert(buf != NULL); sigemptyset(&sigs_to_block); sigaddset(&sigs_to_block, SIGUSR1); pthread_sigmask(SIG_BLOCK, &sigs_to_block, NULL); sleep(1); while (1) { loc = (off_t) (random() % (FILESIZE * 2048 - xreadoff)) * (off_t) 512; #if 0 loc &= ~ 32767LL; #endif pthread_mutex_lock(&cntmutex); startreadcnt++; pthread_mutex_unlock(&cntmutex); #ifdef LINUXTHREADS #if defined(RAWREAD) && !defined(O_DIRECT) rgot = rawread(fd, buf, buflen, loc); #else rgot = pread(fd, buf, buflen, loc); #endif #else rgot = aio_pread(fd, buf, buflen, loc); #endif if (rgot != buflen) { printf("rgot=%d, buflen=%d, loc=%qd, startreadcnt=%d,%d\n", rgot, buflen, loc, startreadcnt, donereadcnt); } assert(rgot == buflen); pthread_mutex_lock(&cntmutex); donereadcnt++; donereadbytes += buflen; pthread_mutex_unlock(&cntmutex); } return NULL; } int main(int argc, char **argv) { int fd; int cnt; pthread_t curthread; int startcntcopy, donecntcopy; sigset_t sigs_to_block; struct timeval stime; struct timeval now; struct timeval report; struct timeval delta; double fdelta; double rate; double mbrate; struct timeval tvsel; int nthreads; xreadlen = 1024; if (argc >= 2) { xreadlen = atoi(argv[1]); if (xreadlen < 0 || xreadlen > 2097152) xreadlen = 1024; xreadlen = (xreadlen + 511) & ~511; } xreadoff = (xreadlen / 512) - 1; nthreads = 250; if (argc >= 3) { nthreads = atoi(argv[2]); if (nthreads < 1 || nthreads > 1000) nthreads = 1; } fd = writefile(); #if 1 sigemptyset(&sigs_to_block); sigaddset(&sigs_to_block, SIGUSR1); pthread_sigmask(SIG_BLOCK, &sigs_to_block, NULL); #endif srandom(time(NULL)); gettimeofday(&stime, NULL); report = stime; report.tv_sec++; for (cnt = 0; cnt < 
nthreads; cnt++) { pthread_create(&curthread, NULL, readthread, (void *) fd); } while (1) { #if 0 sleep(1); /* XXX: Does not work */ #else gettimeofday(&now, NULL); if (now.tv_sec < report.tv_sec || (now.tv_sec == report.tv_sec && now.tv_usec < report.tv_usec)) { if (report.tv_usec >= now.tv_usec) { tvsel.tv_sec = report.tv_sec - now.tv_sec; tvsel.tv_usec = report.tv_usec - now.tv_usec; } else { tvsel.tv_sec = report.tv_sec -now.tv_sec - 1; tvsel.tv_usec = report.tv_usec + 1000000 - now.tv_usec; } select(1, NULL, NULL, NULL, &tvsel); continue; } report.tv_sec++; #endif gettimeofday(&now, NULL); if (now.tv_usec >= stime.tv_usec) { delta.tv_sec = now.tv_sec - stime.tv_sec; delta.tv_usec = now.tv_usec - stime.tv_usec; } else { delta.tv_sec = now.tv_sec - stime.tv_sec - 1; delta.tv_usec = now.tv_usec + 1000000 - stime.tv_usec; } fdelta = delta.tv_sec + ((double) delta.tv_usec) / 1000000.0; pthread_mutex_lock(&cntmutex); startcntcopy = startreadcnt; donecntcopy = donereadcnt; pthread_mutex_unlock(&cntmutex); rate = (double) donecntcopy / (double) fdelta; mbrate = (double) donereadbytes / ((double) (fdelta) * 1048576.0); printf("%d(+%d) read operations time=%6.3f, rate=%6.3f tps/s, %6.3f MB/s\n", donecntcopy, startcntcopy - donecntcopy, fdelta, rate, mbrate); #ifndef LINUXTHREADS printf("lat=%d.%06d\n", maxlat.tv_sec, maxlat.tv_usec); #endif fflush(stdout); } } ----Next_Part(Thu_May_17_01:30:16_2001)---- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Wed May 16 17:31:53 2001 Delivered-To: freebsd-arch@freebsd.org Received: from nothing-going-on.demon.co.uk (pc-62-31-42-140-hy.blueyonder.co.uk [62.31.42.140]) by hub.freebsd.org (Postfix) with ESMTP id CD0F337B423; Wed, 16 May 2001 17:31:45 -0700 (PDT) (envelope-from nik@nothing-going-on.demon.co.uk) Received: (from nik@localhost) by nothing-going-on.demon.co.uk (8.11.3/8.11.3) id f4H0S0f99954; Thu, 17 May 2001 01:28:00 +0100 (BST) 
(envelope-from nik)
Date: Thu, 17 May 2001 01:27:54 +0100
From: Nik Clayton
To: Matt Dillon
Cc: Bruce Evans, Terry Lambert, dave, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG
Subject: Re: Gettimeofday Again...
Message-ID: <20010517012753.A99822@catkin.nothing-going-on.org>
References: <200105161743.f4GHhEl72847@earth.backplane.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="VbJkn9YxBvnuCH5J"
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200105161743.f4GHhEl72847@earth.backplane.com>; from dillon@earth.backplane.com on Wed, May 16, 2001 at 10:43:14AM -0700
Organization: FreeBSD Project
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

--VbJkn9YxBvnuCH5J
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Wed, May 16, 2001 at 10:43:14AM -0700, Matt Dillon wrote:
> I don't change the timecounter method defaults, and I sure hope you
> aren't advocating that people change their timecounter defaults. If
> the TSC is a reasonable default, the system should figure it out and
> use it without requiring intervention.

At the risk of going off at a slight tangent, TSC is not a good default on some hardware. In particular, at least some laptops will start to gain or lose time almost immediately they boot if the timecounter is set to TSC. On such machines it has to be set to i8254 in /etc/sysctl.conf. This is covered in the FAQ.
N

--=20
FreeBSD: The Power to Serve http://www.freebsd.org/
FreeBSD Documentation Project http://www.freebsd.org/docproj/

--- 15B8 3FFC DDB4 34B0 AA5F 94B7 93A8 0764 2C37 E375 ---

--VbJkn9YxBvnuCH5J
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.5 (FreeBSD)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjsDGwkACgkQk6gHZCw343VRrgCeIfaMEu910VOMfaxiCT5o519V
ukAAn1IlEd4EtsloREYcP6VlfICwTgz3
=6KZe
-----END PGP SIGNATURE-----

--VbJkn9YxBvnuCH5J--

From owner-freebsd-arch Wed May 16 19:56: 2 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from peter3.wemm.org (c1315225-a.plstn1.sfba.home.com [65.0.135.147]) by hub.freebsd.org (Postfix) with ESMTP id 962DF37B422; Wed, 16 May 2001 19:55:48 -0700 (PDT) (envelope-from peter@wemm.org)
Received: from overcee.netplex.com.au (overcee.wemm.org [10.0.0.3]) by peter3.wemm.org (8.11.0/8.11.0) with ESMTP id f4H2tmM57663; Wed, 16 May 2001 19:55:48 -0700 (PDT) (envelope-from peter@wemm.org)
Received: from wemm.org (localhost [127.0.0.1]) by overcee.netplex.com.au (Postfix) with ESMTP id 35895380E; Wed, 16 May 2001 19:55:48 -0700 (PDT) (envelope-from peter@wemm.org)
X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4
To: Nik Clayton
Cc: Matt Dillon, Bruce Evans, Terry Lambert, dave, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG
Subject: Re: Gettimeofday Again...
In-Reply-To: <20010517012753.A99822@catkin.nothing-going-on.org>
Date: Wed, 16 May 2001 19:55:48 -0700
From: Peter Wemm
Message-Id: <20010517025548.35895380E@overcee.netplex.com.au>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Nik Clayton wrote:
> On Wed, May 16, 2001 at 10:43:14AM -0700, Matt Dillon wrote:
> > I don't change the timecounter method defaults, and I sure hope you
> > aren't advocating that people change their timecounter defaults.
> > If the TSC is a reasonable default, the system should figure it out and
> > use it without requiring intervention.
>
> At the risk of going off at a slight tangent, TSC is not a good default
> on some hardware. In particular, at least some laptops will start to
> gain or lose time almost immediately they boot if the timecounter is set
> to TSC. It has to be set to i8254 in /etc/sysctl.conf. This is covered
> in the FAQ.

FYI: Pentium4 cpus have:
Features=0x3febfbff,ACC>

The last one is interesting. As I understand it, ACC ("Auto Clock Correction") allows for TSC correction in spite of varying cpu clocks.

This is important because of the variable cpu speed throttling to keep the heat down.

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5

From owner-freebsd-arch Wed May 16 20:21:19 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from green.bikeshed.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 5B6A937B422; Wed, 16 May 2001 20:21:16 -0700 (PDT) (envelope-from green@green.bikeshed.org)
Received: from localhost (green@localhost) by green.bikeshed.org (8.11.2/8.11.1) with ESMTP id f4H3LB333474; Wed, 16 May 2001 23:21:15 -0400 (EDT) (envelope-from green@green.bikeshed.org)
Message-Id: <200105170321.f4H3LB333474@green.bikeshed.org>
X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4
To: Alfred Perlstein
Cc: Matt Dillon, freebsd-arch@FreeBSD.ORG
Subject: Re: Upgrading u_short cr_ref to u_int cr_ref in ucred on -stable
In-Reply-To: Message from Alfred Perlstein of "Wed, 16 May 2001 17:57:24 EDT." <20010516175724.J12365@superconductor.rush.net>
From: "Brian F.
Feldman"
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 16 May 2001 23:21:10 -0400
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Alfred Perlstein wrote:
> * Matt Dillon [010516 17:48] wrote:
> > As per conversations on this list 5-10 April (see
> > 'Eliminate crget() from...') I intend to MFC the change
> > Alfred made in -current to -stable for the ucred reference count.
> > This will occur tonight.
>
> Go for it.

Did you want to introduce xucred at the same time?

--
 Brian Fundakowski Feldman \ FreeBSD: The Power to Serve! /
 green@FreeBSD.org `------------------------------'

From owner-freebsd-arch Wed May 16 20:36:41 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 493A037B423; Wed, 16 May 2001 20:36:39 -0700 (PDT) (envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4H3ads87696; Wed, 16 May 2001 20:36:39 -0700 (PDT) (envelope-from dillon)
Date: Wed, 16 May 2001 20:36:39 -0700 (PDT)
From: Matt Dillon
Message-Id: <200105170336.f4H3ads87696@earth.backplane.com>
To: "Brian F. Feldman"
Cc: Alfred Perlstein, freebsd-arch@FreeBSD.ORG
Subject: Re: Upgrading u_short cr_ref to u_int cr_ref in ucred on -stable
References: <200105170321.f4H3LB333474@green.bikeshed.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> > This will occur tonight.
:>
:> Go for it.
:
:Did you want to introduce xucred at the same time?
:
:--
: Brian Fundakowski Feldman \ FreeBSD: The Power to Serve!
/
: green@FreeBSD.org `------------------------------'

I've got too much on my plate at the moment, but I will throw in an MFC of the change that turns the crhold() #define into a real procedure, and a little later I will throw an overflow check and panic into -current (and -stable in 3 days) as per Terry's suggestion.

-Matt

From owner-freebsd-arch Wed May 16 21:42:39 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from green.bikeshed.org (localhost [127.0.0.1]) by hub.freebsd.org (Postfix) with ESMTP id 72AB637B423; Wed, 16 May 2001 21:42:36 -0700 (PDT) (envelope-from green@green.bikeshed.org)
Received: from localhost (green@localhost) by green.bikeshed.org (8.11.2/8.11.1) with ESMTP id f4H4gZf51783; Thu, 17 May 2001 00:42:35 -0400 (EDT) (envelope-from green@green.bikeshed.org)
Message-Id: <200105170442.f4H4gZf51783@green.bikeshed.org>
X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4
To: Matt Dillon
Cc: Alfred Perlstein, freebsd-arch@FreeBSD.ORG
Subject: Re: Upgrading u_short cr_ref to u_int cr_ref in ucred on -stable
In-Reply-To: Message from Matt Dillon of "Wed, 16 May 2001 20:36:39 PDT." <200105170336.f4H3ads87696@earth.backplane.com>
From: "Brian F. Feldman"
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Thu, 17 May 2001 00:42:34 -0400
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Matt Dillon wrote:
>
> :> > This will occur tonight.
> :>
> :> Go for it.
> :
> :Did you want to introduce xucred at the same time?
> :
> :--
> : Brian Fundakowski Feldman \ FreeBSD: The Power to Serve! /
> : green@FreeBSD.org `------------------------------'
>
> I've got too much on my plate at the moment, but I will throw in
> MFCing changing the crhold() #define to be a real procedure, and
> a little later I will throw an overflow check and panic into -current
> (and -stable in 3 days) as per Terry's suggestion.
It would really be a good idea to do it instead of breaking the API completely, I think. I've been meaning to test it all on 4.3 and commit it, but I'm still finishing school. Finals time means too much worry and not enough FreeBSD hacking :( It should be an easy change, though.

--
 Brian Fundakowski Feldman \ FreeBSD: The Power to Serve! /
 green@FreeBSD.org `------------------------------'

From owner-freebsd-arch Wed May 16 21:47:13 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id D869437B423; Wed, 16 May 2001 21:47:10 -0700 (PDT) (envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4H4lA188014; Wed, 16 May 2001 21:47:10 -0700 (PDT) (envelope-from dillon)
Date: Wed, 16 May 2001 21:47:10 -0700 (PDT)
From: Matt Dillon
Message-Id: <200105170447.f4H4lA188014@earth.backplane.com>
To: "Brian F. Feldman"
Cc: Alfred Perlstein, freebsd-arch@FreeBSD.ORG
Subject: Re: Upgrading u_short cr_ref to u_int cr_ref in ucred on -stable
References: <200105170442.f4H4gZf51783@green.bikeshed.org>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:> a little later I will throw an overflow check and panic into -current
:> (and -stable in 3 days) as per Terry's suggestion.
:
:It would really be a good idea to do it instead of breaking the API
:completely, I think. I've been meaning to test it all on 4.3 and commit it,
:but I'm still finishing school. Finals time means too much worry and not
:enough FreeBSD hacking :( It should be an easy change, though.
:
:--
: Brian Fundakowski Feldman \ FreeBSD: The Power to Serve! /

Well, I agree in principle, but I don't think the changes to cr_ref break any existing programs (though obviously I can't be 100% sure).
A much bigger problem is going to be Tor's O_DIRECT stuff... that requires the struct file's f_flag to go from short -> u_int. So far I haven't found anything that breaks. If it turns out to be an issue I can split the flags field into two shorts (there are unused shorts elsewhere in the structure).

-Matt

From owner-freebsd-arch Wed May 16 23:32:58 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by hub.freebsd.org (Postfix) with ESMTP id 6212837B422; Wed, 16 May 2001 23:32:53 -0700 (PDT) (envelope-from phk@critter.freebsd.dk)
Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.3/8.11.3) with ESMTP id f4H6WiL09653; Thu, 17 May 2001 08:32:44 +0200 (CEST) (envelope-from phk@critter.freebsd.dk)
To: Peter Wemm
Cc: Nik Clayton, Matt Dillon, Bruce Evans, Terry Lambert, dave, freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG
Subject: Re: Gettimeofday Again...
In-Reply-To: Your message of "Wed, 16 May 2001 19:55:48 PDT." <20010517025548.35895380E@overcee.netplex.com.au>
Date: Thu, 17 May 2001 08:32:44 +0200
Message-ID: <9651.990081164@critter>
From: Poul-Henning Kamp
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

In message <20010517025548.35895380E@overcee.netplex.com.au>, Peter Wemm writes:
>FYI: Pentium4 cpus have:
>Features=0x3febfbff MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,,ACC>
>
>The last one is interesting. As I understand it, ACC ("Auto Clock
>Correction") allows for TSC correction in spite of varying cpu clocks.
>
>This is important because of the variable cpu speed throttling to keep the
>heat down.

Hmm, does anyone have any docs on that?
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.

From owner-freebsd-arch Wed May 16 23:48:38 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 33F0037B423 for ; Wed, 16 May 2001 23:48:24 -0700 (PDT) (envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4H6lkk88458; Wed, 16 May 2001 23:47:46 -0700 (PDT) (envelope-from dillon)
Date: Wed, 16 May 2001 23:47:46 -0700 (PDT)
From: Matt Dillon
Message-Id: <200105170647.f4H6lkk88458@earth.backplane.com>
To: Tor.Egge@fast.no
Cc: arch@FreeBSD.ORG
Subject: Preliminary O_DIRECT patch (for review only, not yet tested!)
References: <200105162222.f4GMMpC81247@earth.backplane.com> <200105162331.BAA04708@midten.fast.no>
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

This is my preliminary O_DIRECT patch so far, against -stable at the moment (obviously it will be committed to -current first, but I have to test it on -stable). It seems to work for reads. It doesn't work for writes yet (the buffers still get cached).

Basically it takes Tor's infrastructure with some minor modifications, removes the rawread/rawwrite stuff, and then adds a B_DIRECT flag to the buffer cache. write()s are converted to synchronous writes, and both read()s and write()s attempt to completely free the underlying VM pages, and the buffer is released.

I need to figure out how to free underlying buffers/VM for write() operations before I can commit any of this. It could be a while.
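For orientation, here is a rough userland sketch of how a program might request direct I/O once a kernel supports it. It is not part of the patch. The helper names enable_direct_io() and round_up_sector() are hypothetical; the fcntl(F_GETFL)/fcntl(F_SETFL) sequence and the rounding of the read length to a multiple of 512 bytes mirror what Tor's aiotest.c does earlier in this thread. Setting the flag through F_SETFL assumes the kernel lists O_DIRECT among the flags fcntl() may change.

```c
#define _GNU_SOURCE	/* exposes O_DIRECT on Linux; not needed on FreeBSD */
#include <fcntl.h>
#include <stddef.h>

/*
 * Round len up to the next multiple of 512 bytes, the sector size
 * the test program assumes for its transfers.
 */
size_t
round_up_sector(size_t len)
{
	return (len + 511) & ~(size_t)511;
}

/*
 * Ask for direct I/O on an already-open descriptor.
 * Returns 0 on success, -1 if the flags cannot be read or set,
 * or if this system's headers do not define O_DIRECT at all.
 */
int
enable_direct_io(int fd)
{
#ifdef O_DIRECT
	int flags;

	flags = fcntl(fd, F_GETFL, 0);
	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_DIRECT) == -1 ? -1 : 0;
#else
	(void)fd;
	return -1;
#endif
}
```

A caller would open the file normally, call enable_direct_io(fd), and then issue pread() calls whose lengths have been passed through round_up_sector(). Whether a given filesystem honours the flag is up to the kernel, so a failure here should be treated as "fall back to buffered I/O" rather than as a fatal error.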
--

I've looked at the rawread/rawwrite issue and I believe it may be possible to use the already-existing B_VMIO flag coupled with some VM magic to achieve the equivalent in the buffer cache itself rather than having to write a rawread/rawwrite function for each filesystem. Filesystems already support B_VMIO. If it is possible, then we'll have a general raw I/O solution.

-Matt

Index: kern/vfs_bio.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.242.2.7
diff -u -r1.242.2.7 vfs_bio.c
--- kern/vfs_bio.c 2001/03/02 16:45:12 1.242.2.7
+++ kern/vfs_bio.c 2001/05/17 04:21:37
@@ -1230,7 +1230,7 @@
 	/* unlock */
 	BUF_UNLOCK(bp);
-	bp->b_flags &= ~(B_ORDERED | B_ASYNC | B_NOCACHE | B_AGE | B_RELBUF);
+	bp->b_flags &= ~(B_ORDERED | B_ASYNC | B_NOCACHE | B_AGE | B_RELBUF | B_DIRECT);
 	splx(s);
 }
@@ -1296,7 +1296,7 @@
 	/* unlock */
 	BUF_UNLOCK(bp);
-	bp->b_flags &= ~(B_ORDERED | B_ASYNC | B_NOCACHE | B_AGE | B_RELBUF);
+	bp->b_flags &= ~(B_ORDERED | B_ASYNC | B_NOCACHE | B_AGE | B_RELBUF | B_DIRECT);
 	splx(s);
 }
@@ -1328,12 +1328,15 @@
 	vm_page_flag_clear(m, PG_ZERO);
 	/*
 	 * Might as well free the page if we can and it has
-	 * no valid data.
+	 * no valid data. We also free the page if the
+	 * buffer was used for direct I/O
 	 */
 	if ((bp->b_flags & B_ASYNC) == 0 && !m->valid && m->hold_count == 0) {
 		vm_page_busy(m);
 		vm_page_protect(m, VM_PROT_NONE);
 		vm_page_free(m);
+	} else if (bp->b_flags & B_DIRECT) {
+		vm_page_try_to_free(m);
 	} else if (vm_page_count_severe()) {
 		vm_page_try_to_cache(m);
 	}
@@ -2187,7 +2190,7 @@
 		}
 		splx(s);
-		bp->b_flags &= ~B_DONE;
+		bp->b_flags &= ~(B_DONE | B_DIRECT);
 	} else {
 		/*
 		 * Buffer is not in-core, create new buffer. The buffer
@@ -2267,7 +2270,7 @@
 		allocbuf(bp, size);
 		splx(s);
-		bp->b_flags &= ~B_DONE;
+		bp->b_flags &= ~(B_DONE | B_DIRECT);
 	}
 	return (bp);
 }
Index: kern/vfs_vnops.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_vnops.c,v
retrieving revision 1.87.2.6
diff -u -r1.87.2.6 vfs_vnops.c
--- kern/vfs_vnops.c 2001/02/26 04:23:16 1.87.2.6
+++ kern/vfs_vnops.c 2001/05/17 05:17:55
@@ -334,6 +334,8 @@
 	ioflag = 0;
 	if (fp->f_flag & FNONBLOCK)
 		ioflag |= IO_NDELAY;
+	if (fp->f_flag & O_DIRECT)
+		ioflag |= IO_DIRECT;
 	VOP_LEASE(vp, p, cred, LEASE_READ);
 	vn_lock(vp, LK_SHARED | LK_NOPAUSE | LK_RETRY, p);
 	if ((flags & FOF_OFFSET) == 0)
@@ -374,6 +376,8 @@
 		ioflag |= IO_APPEND;
 	if (fp->f_flag & FNONBLOCK)
 		ioflag |= IO_NDELAY;
+	if (fp->f_flag & O_DIRECT)
+		ioflag |= IO_DIRECT;
 	if ((fp->f_flag & O_FSYNC) ||
 	    (vp->v_mount && (vp->v_mount->mnt_flag & MNT_SYNCHRONOUS)))
 		ioflag |= IO_SYNC;
Index: sys/buf.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/buf.h,v
retrieving revision 1.88.2.3
diff -u -r1.88.2.3 buf.h
--- sys/buf.h 2000/12/30 01:51:10 1.88.2.3
+++ sys/buf.h 2001/05/17 04:18:35
@@ -191,12 +191,14 @@
  * if b_bufsize and b_bcount are not. ( b_bufsize is
  * always at least DEV_BSIZE aligned, though ).
  *
+ * B_DIRECT Hint (along with B_RELBUF) that we should attempt to
+ * completely free the pages underlying the buffer.
  */
 #define B_AGE 0x00000001 /* Move to age queue when I/O done. */
 #define B_NEEDCOMMIT 0x00000002 /* Append-write in progress. */
 #define B_ASYNC 0x00000004 /* Start I/O, do not wait. */
-#define B_UNUSED0 0x00000008 /* Old B_BAD */
+#define B_DIRECT 0x00000008 /* direct I/O flag (pls free vmio) */
 #define B_DEFERRED 0x00000010 /* Skipped over for cleaning */
 #define B_CACHE 0x00000020 /* Bread found us in the cache. */
 #define B_CALL 0x00000040 /* Call b_iodone from biodone. */
@@ -231,7 +233,7 @@
 	"\33paging\32xxx\31writeinprog\30want\27relbuf\26dirty" \
 	"\25read\24raw\23phys\22clusterok\21malloc\20nocache" \
 	"\17locked\16inval\15scanned\14error\13eintr\12done\11freebuf" \
-	"\10delwri\7call\6cache\4bad\3async\2needcommit\1age"
+	"\10delwri\7call\6cache\4direct\3async\2needcommit\1age"
 /*
  * These flags are kept in b_xflags.
Index: sys/fcntl.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/fcntl.h,v
retrieving revision 1.9.2.1
diff -u -r1.9.2.1 fcntl.h
--- sys/fcntl.h 2000/08/22 01:46:30 1.9.2.1
+++ sys/fcntl.h 2001/05/17 04:01:47
@@ -98,15 +98,18 @@
 /* Defined by POSIX 1003.1; BSD default, but must be distinct from O_RDONLY. */
 #define O_NOCTTY 0x8000 /* don't assign controlling terminal */
+/* Attempt to bypass buffer cache */
+#define O_DIRECT 0x00010000
+
 #ifdef _KERNEL
 /* convert from open() flags to/from fflags; convert O_RD/WR to FREAD/FWRITE */
 #define FFLAGS(oflags) ((oflags) + 1)
 #define OFLAGS(fflags) ((fflags) - 1)
 /* bits to save after open */
-#define FMASK (FREAD|FWRITE|FAPPEND|FASYNC|FFSYNC|FNONBLOCK)
+#define FMASK (FREAD|FWRITE|FAPPEND|FASYNC|FFSYNC|FNONBLOCK|O_DIRECT)
 /* bits settable by fcntl(F_SETFL, ...) */
-#define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM)
+#define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM|O_DIRECT)
 #endif
 /*
Index: sys/file.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/file.h,v
retrieving revision 1.22.2.5
diff -u -r1.22.2.5 file.h
--- sys/file.h 2001/02/26 04:23:21 1.22.2.5
+++ sys/file.h 2001/05/17 04:34:53
@@ -56,15 +56,14 @@
  */
 struct file {
 	LIST_ENTRY(file) f_list;/* list of active files */
-	short f_flag; /* see fcntl.h */
+	short f_FILLER3; /* (old f_flag) */
 #define DTYPE_VNODE 1 /* file */
 #define DTYPE_SOCKET 2 /* communications endpoint */
 #define DTYPE_PIPE 3 /* pipe */
 #define DTYPE_FIFO 4 /* fifo (named pipe) */
 #define DTYPE_KQUEUE 5 /* event queue */
 	short f_type; /* descriptor type */
-	short f_FILLER1; /* (OLD) reference count */
-	short f_FILLER2; /* (OLD) references from message queue */
+	u_int f_flag; /* see fcntl.h */
 	struct ucred *f_cred; /* credentials associated with descriptor */
 	struct fileops {
 		int (*fo_read) __P((struct file *fp, struct uio *uio,
Index: sys/vnode.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/vnode.h,v
retrieving revision 1.111.2.4
diff -u -r1.111.2.4 vnode.h
--- sys/vnode.h 2000/12/30 01:51:10 1.111.2.4
+++ sys/vnode.h 2001/05/17 04:49:14
@@ -213,6 +213,7 @@
 #define IO_VMIO 0x20 /* data already in VMIO space */
 #define IO_INVAL 0x40 /* invalidate after I/O */
 #define IO_ASYNC 0x80 /* bawrite rather then bdwrite */
+#define IO_DIRECT 0x100 /* attempt to bypass buffer cache */
 /*
  * Modes. Some values same as Ixxx entries from inode.h for now.
Index: ufs/ufs/ufs_readwrite.c
===================================================================
RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_readwrite.c,v
retrieving revision 1.65.2.6
diff -u -r1.65.2.6 ufs_readwrite.c
--- ufs/ufs/ufs_readwrite.c 2000/12/30 01:51:11 1.65.2.6
+++ ufs/ufs/ufs_readwrite.c 2001/05/17 06:26:16
@@ -278,6 +278,15 @@
 		}
 		/*
+		 * If IO_DIRECT then set B_DIRECT for the buffer. This
+		 * will cause us to attempt to release the buffer later on
+		 * and will cause the buffer cache to attempt to free the
+		 * underlying pages.
+		 */
+		if (ioflag & IO_DIRECT)
+			bp->b_flags |= B_DIRECT;
+
+		/*
 		 * We should only get non-zero b_resid when an I/O error
 		 * has occurred, which should cause us to break above.
 		 * However, if the short read did not cause an error,
@@ -319,12 +328,12 @@
 		if (error)
 			break;
-		if ((ioflag & IO_VMIO) &&
-		    (LIST_FIRST(&bp->b_dep) == NULL)) {
+		if ((ioflag & (IO_VMIO|IO_DIRECT)) &&
+		    (LIST_FIRST(&bp->b_dep) == NULL)) {
 			/*
-			 * If there are no dependencies, and
-			 * it's VMIO, then we don't need the buf,
-			 * mark it available for freeing. The VM has the data.
+			 * If there are no dependencies, and it's VMIO,
+			 * then we don't need the buf, mark it available
+			 * for freeing. The VM has the data.
 			 */
 			bp->b_flags |= B_RELBUF;
 			brelse(bp);
@@ -346,8 +355,8 @@
 	 * so it must have come from a 'break' statement
 	 */
 	if (bp != NULL) {
-		if ((ioflag & IO_VMIO) &&
-		    (LIST_FIRST(&bp->b_dep) == NULL)) {
+		if ((ioflag & (IO_VMIO|IO_DIRECT)) &&
+		    (LIST_FIRST(&bp->b_dep) == NULL)) {
 			bp->b_flags |= B_RELBUF;
 			brelse(bp);
 		} else {
@@ -449,7 +458,7 @@
 	resid = uio->uio_resid;
 	osize = ip->i_size;
 	flags = 0;
-	if ((ioflag & IO_SYNC) && !DOINGASYNC(vp))
+	if ((ioflag & (IO_SYNC|IO_DIRECT)) && !DOINGASYNC(vp))
 		flags = B_SYNC;
 	if (object && (object->flags & OBJ_OPT)) {
@@ -486,6 +495,8 @@
 		    ap->a_cred, flags, &bp);
 		if (error != 0)
 			break;
+		if (ioflag & IO_DIRECT)
+			bp->b_flags |= B_DIRECT;
 		if (uio->uio_offset + xfersize > ip->i_size) {
 			ip->i_size = uio->uio_offset + xfersize;
@@ -498,11 +509,12 @@
 		error =
 		    uiomove((char *)bp->b_data + blkoffset, (int)xfersize, uio);
-		if ((ioflag & IO_VMIO) &&
-		    (LIST_FIRST(&bp->b_dep) == NULL))
+		if ((ioflag & (IO_VMIO|IO_DIRECT)) &&
+		    (LIST_FIRST(&bp->b_dep) == NULL)) {
 			bp->b_flags |= B_RELBUF;
+		}
-		if (ioflag & IO_SYNC) {
+		if (ioflag & (IO_SYNC|IO_DIRECT)) {
 			(void)bwrite(bp);
 		} else if (vm_page_count_severe() ||
 		    buf_dirty_count_severe() ||
Index: vm/vm_page.c
===================================================================
RCS file: /home/ncvs/src/sys/vm/vm_page.c,v
retrieving revision 1.147.2.6
diff -u -r1.147.2.6 vm_page.c
--- vm/vm_page.c 2001/03/03 23:06:09 1.147.2.6
+++ vm/vm_page.c 2001/05/17 04:22:38
@@ -1353,6 +1353,31 @@
 }
 /*
+ * vm_page_try_to_free()
+ *
+ * Attempt to free the page. If we cannot free it, we do nothing.
+ * 1 is returned on success, 0 on failure.
+ */ + +int +vm_page_try_to_free(m) + vm_page_t m; +{ + if (m->dirty || m->hold_count || m->busy || m->wire_count || + (m->flags & (PG_BUSY|PG_UNMANAGED))) { + return(0); + } + vm_page_test_dirty(m); + if (m->dirty) + return(0); + vm_page_busy(m); + vm_page_protect(m, VM_PROT_NONE); + vm_page_free(m); + return(1); +} + + +/* * vm_page_cache * * Put the specified page onto the page cache queue (if appropriate). Index: vm/vm_page.h =================================================================== RCS file: /home/ncvs/src/sys/vm/vm_page.h,v retrieving revision 1.75.2.5 diff -u -r1.75.2.5 vm_page.h --- vm/vm_page.h 2000/12/30 01:51:11 1.75.2.5 +++ vm/vm_page.h 2001/05/17 04:23:05 @@ -406,6 +406,7 @@ vm_page_t vm_page_grab __P((vm_object_t, vm_pindex_t, int)); void vm_page_cache __P((register vm_page_t)); int vm_page_try_to_cache __P((vm_page_t)); +int vm_page_try_to_free __P((vm_page_t)); void vm_page_dontneed __P((register vm_page_t)); static __inline void vm_page_copy __P((vm_page_t, vm_page_t)); static __inline void vm_page_free __P((vm_page_t)); To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 0:28:23 2001 Delivered-To: freebsd-arch@freebsd.org Received: from superconductor.rush.net (superconductor.rush.net [208.9.155.8]) by hub.freebsd.org (Postfix) with ESMTP id 35B3637B424 for ; Thu, 17 May 2001 00:28:21 -0700 (PDT) (envelope-from bright@superconductor.rush.net) Received: (from bright@localhost) by superconductor.rush.net (8.11.2/8.11.2) id f4H7S6J12802; Thu, 17 May 2001 03:28:06 -0400 (EDT) Date: Thu, 17 May 2001 03:28:06 -0400 From: Alfred Perlstein To: Matt Dillon Cc: Tor.Egge@fast.no, arch@FreeBSD.ORG Subject: Re: Preliminary O_DIRECT patch (for review only, not yet tested!) 
Message-ID: <20010517032806.N12365@superconductor.rush.net> References: <200105162222.f4GMMpC81247@earth.backplane.com> <200105162331.BAA04708@midten.fast.no> <200105170647.f4H6lkk88458@earth.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0us In-Reply-To: <200105170647.f4H6lkk88458@earth.backplane.com>; from dillon@earth.backplane.com on Wed, May 16, 2001 at 11:47:46PM -0700 X-all-your-base: are belong to us. Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG * Matt Dillon [010517 02:49] wrote: > > I've looked at the rawread/rawwrite issue and I believe it may be > possible to use the already-existing B_VMIO flag coupled with > some VM magic to achieve the equivalent in the buffer cache itself > rather then having to write a rawread/rawwrite function for each > filesystem. Filesystems already support B_VMIO. If it is possible, > then we'll have a general raw I/O solution. I'm not sure what the rawread/rawwrite functions do (I'll review them if/when I get a chance) however if what they do is avoid remapping things into kernel memory it'd be really nice to see them go into the system. Avoiding vm tricks if possible would be nice. -Alfred To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 0:46: 4 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id A9A3C37B422; Thu, 17 May 2001 00:45:59 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id RAA17041; Thu, 17 May 2001 17:45:47 +1000 Date: Thu, 17 May 2001 17:44:28 +1000 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Matt Dillon Cc: Terry Lambert , dave , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... 
In-Reply-To: <200105161743.f4GHhEl72847@earth.backplane.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Wed, 16 May 2001, Matt Dillon wrote: > I just ran the test using the default timecounter on a 4.3 box (P3). > timercounter.method was 0, timecounter.hardware was i8254. In > that case the itimer was about 4 times faster. So this was using > the 'slow' itimer as you indicate below. Unfortunately, 4.3 (like all 4.x) defaults to the pessimized configuration of always using the i8254 for no good reason. This is because apm is configured in GENERIC, and clock.c disables the TSC if apm is configured even if apm is disabled (as it is by default). -current achieves the same pessimization by enabling apm by default although clock.c is smarter. > I don't change the timercounter method defaults, and I sure hope you > aren't advocating that people change their timecounter defaults. If > the TSC is a reasonable default, the system should figure it out and > use it without requiring intervention. It's only a reasonable default if apm (or possibly acpica) is configured (and used). Efficiency is not the only advantage of the TSC timecounter. It has a higher resolution and is more robust if interrupt latency is high.
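A rough sketch of the resolution difference mentioned above, under stated assumptions: the i8254 counts at its nominal 1193182 Hz, and the TSC counts at the CPU clock rate. The helper name `ns_per_tick` is hypothetical (it is not a kernel function), and the 500 MHz figure in the note below is only an example value for a P3-class CPU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nanoseconds per counter tick, rounded down.
 * Illustrative helper only; hz is the counter frequency in Hz.
 */
uint64_t
ns_per_tick(uint64_t hz)
{
	assert(hz != 0);	/* a stopped counter has no resolution */
	return (UINT64_C(1000000000) / hz);
}
```

With these assumptions, `ns_per_tick(1193182)` yields 838 for the i8254, while `ns_per_tick(500000000)` yields 2 for a hypothetical 500 MHz TSC.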
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 1: 6:42 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mass.dis.org (mass.dis.org [216.240.45.41]) by hub.freebsd.org (Postfix) with ESMTP id AF99537B422; Thu, 17 May 2001 01:06:38 -0700 (PDT) (envelope-from msmith@mass.dis.org) Received: from mass.dis.org (localhost [127.0.0.1]) by mass.dis.org (8.11.3/8.11.3) with ESMTP id f4H8DhE01424; Thu, 17 May 2001 01:13:44 -0700 (PDT) (envelope-from msmith@mass.dis.org) Message-Id: <200105170813.f4H8DhE01424@mass.dis.org> X-Mailer: exmh version 2.1.1 10/15/1999 To: Bruce Evans Cc: freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In-reply-to: Your message of "Thu, 17 May 2001 17:44:28 +1000." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 17 May 2001 01:13:43 -0700 From: Mike Smith Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > I don't change the timercounter method defaults, and I sure hope you > > aren't advocating that people change their timecounter defaults. If > > the TSC is a reasonable default, the system should figure it out and > > use it without requiring intervention. > > It's only a reasonable default if apm (or possibly acpica) is configured > (and used). The TSC is never a reasonable default; there is no good way to be certain that the TSC is and/or will remain stable. Even with ACPI, you can't be entirely sure. (Modulo Peter's comments about new P4 features, which I have not investigated yet.) -- ... every activity meets with opposition, everyone who acts has his rivals and unfortunately opponents also. But not because people want to be opponents, rather because the tasks and relationships force people to take different points of view. [Dr. 
Fritz Todt] V I C T O R Y N O T V E N G E A N C E To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 1:11:58 2001 Delivered-To: freebsd-arch@freebsd.org Received: from critter.freebsd.dk (critter.freebsd.dk [212.242.86.163]) by hub.freebsd.org (Postfix) with ESMTP id 7FC3137B422; Thu, 17 May 2001 01:11:53 -0700 (PDT) (envelope-from phk@critter.freebsd.dk) Received: from critter (localhost [127.0.0.1]) by critter.freebsd.dk (8.11.3/8.11.3) with ESMTP id f4H8BlL10773; Thu, 17 May 2001 10:11:47 +0200 (CEST) (envelope-from phk@critter.freebsd.dk) To: Mike Smith Cc: Bruce Evans , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... In-Reply-To: Your message of "Thu, 17 May 2001 01:13:43 PDT." <200105170813.f4H8DhE01424@mass.dis.org> Date: Thu, 17 May 2001 10:11:47 +0200 Message-ID: <10771.990087107@critter> From: Poul-Henning Kamp Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <200105170813.f4H8DhE01424@mass.dis.org>, Mike Smith writes: >> > I don't change the timercounter method defaults, and I sure hope you >> > aren't advocating that people change their timecounter defaults. If >> > the TSC is a reasonable default, the system should figure it out and >> > use it without requiring intervention. >> >> It's only a reasonable default if apm (or possibly acpica) is configured >> (and used). > >The TSC is never a reasonable default; there is no good way to be certain >that the TSC is and/or will remain stable. Even with ACPI, you can't be >entirely sure. Right. I have tried some hybrid schemes where the TSC is interpolating between i8254 interrupts, but it is all but impossible to maintain continuity on a clock-throttling laptop... 
-- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 4:31:47 2001 Delivered-To: freebsd-arch@freebsd.org Received: from nothing-going-on.demon.co.uk (pc-62-31-42-140-hy.blueyonder.co.uk [62.31.42.140]) by hub.freebsd.org (Postfix) with ESMTP id A577B37B43E for ; Thu, 17 May 2001 04:31:30 -0700 (PDT) (envelope-from nik@nothing-going-on.demon.co.uk) Received: (from nik@localhost) by nothing-going-on.demon.co.uk (8.11.3/8.11.3) id f4HBJOJ03178 for arch@freebsd.org; Thu, 17 May 2001 12:19:25 +0100 (BST) (envelope-from nik) Date: Thu, 17 May 2001 12:19:02 +0100 From: Nik Clayton To: arch@freebsd.org Subject: [PATCH] syscons ioctl() to grab text mode buffer Message-ID: <20010517121902.A3047@catkin.nothing-going-on.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="8GpibOaaTibBMecb" Content-Disposition: inline User-Agent: Mutt/1.2.5i Organization: FreeBSD Project Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG --8GpibOaaTibBMecb Content-Type: multipart/mixed; boundary="nFreZHaLTZJo0R7j" Content-Disposition: inline --nFreZHaLTZJo0R7j Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Two things for review. The first set of patches add a new ioctl, CONS_SCRSHOT, to syscons. This allows user mode programs to request a dump of the current contents of the text mode video buffer, typically consisting of tuples of the form (character code x video attribute). This isn't a security problem, because it will only work on vtys that you already have read access to. 
The second, scrshot.c, uses the ioctl to dump the contents of the video memory to stdout. Usage is

    scrshot /dev/ttyv0 > shot.scr

Both of these were (IIRC) originally written by jmallet back in the 2.2.x days; I've just forward-ported them to -current.

I'd like to commit both of these, with scrshot becoming src/usr.bin/scrshot (I'll write a man page before I commit it). There's a third utility, shot2gif, which reads the screen dumps and kicks out GIF files -- I'll bring that in as a port.

In case you're wondering, this should make it a bit easier for the doc. project to generate accurate screenshots.

1. You don't need to be in X to do screenshots.
2. The size of the .scr files is smaller than the equivalent .gif or .png files.
3. An accompanying shot2txt utility is trivial to write, making the production of text-only alternatives for the images much less effort.
4. Should a screen dump need to be changed (perhaps to fix a typo), it's easier to fix a .scr file than it is to recapture the screen.
5. shot2gif parses the same font files as syscons, giving a much closer rendition of the output from a text screen than doing the capture in X.
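As a hedged illustration of point 3, the core of a shot2txt-style converter might look like the sketch below. It is not the shot2txt utility itself (none is included in this message). It assumes each cell of the dump is a 16-bit value in host byte order, with the character code in the low byte and the video attribute in the high byte, as in the VGA text layout. The function name `cells_to_text` and the choice of '.' for non-printable codes are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert rows * cols text-mode cells into plain text, one line per row.
 * ASSUMPTION: low byte = character code, high byte = video attribute.
 * The attribute byte is simply discarded.
 * `out` must hold at least rows * (cols + 1) + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL.
 */
size_t
cells_to_text(const uint16_t *cells, int cols, int rows, char *out)
{
	size_t n = 0;
	int x, y;

	assert(cells != NULL && out != NULL && cols > 0 && rows > 0);
	for (y = 0; y < rows; y++) {
		for (x = 0; x < cols; x++) {
			unsigned char ch = cells[y * cols + x] & 0xff;

			/* Keep printable ASCII, replace everything else. */
			out[n++] = (ch >= 0x20 && ch < 0x7f) ? (char)ch : '.';
		}
		out[n++] = '\n';
	}
	out[n] = '\0';
	return (n);
}
```

Since the dump written by scrshot carries no header, a real tool would have to be told the screen dimensions (for example 80x25) before calling a function like this on the file contents.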
N -- FreeBSD: The Power to Serve http://www.freebsd.org/ FreeBSD Documentation Project http://www.freebsd.org/docproj/ --- 15B8 3FFC DDB4 34B0 AA5F 94B7 93A8 0764 2C37 E375 --- --nFreZHaLTZJo0R7j Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="syscons2.diff" Content-Transfer-Encoding: quoted-printable Index: dev/syscons/syscons.c =================================================================== RCS file: /home/ncvs/src/sys/dev/syscons/syscons.c,v retrieving revision 1.357 diff -u -r1.357 syscons.c --- dev/syscons/syscons.c 2001/05/01 08:12:05 1.357 +++ dev/syscons/syscons.c 2001/05/17 09:42:15 @@ -838,6 +838,24 @@ splx(s); return 0; + case CONS_SCRSHOT: /* get a screen shot */ + { + scrshot_t *ptr = (scrshot_t*)data; + s = spltty(); + if (ISGRAPHSC(scp)) { + splx(s); + return EOPNOTSUPP; + } + if (scp->xsize != ptr->xsize || scp->ysize != ptr->ysize) { + splx(s); + return EINVAL; + } + copyout ((void*)scp->vtb.vtb_buffer, ptr->buf, + ptr->xsize * ptr->ysize * sizeof(u_int16_t)); + splx(s); + return 0; + } + case VT_SETMODE: /* set screen switcher mode */ { struct vt_mode *mode; Index: sys/consio.h =================================================================== RCS file: /home/ncvs/src/sys/sys/consio.h,v retrieving revision 1.6 diff -u -r1.6 consio.h --- sys/consio.h 2000/04/27 13:34:31 1.6 +++ sys/consio.h 2001/05/16 22:54:44 @@ -239,6 +239,16 @@ /* release the current keyboard */ #define CONS_RELKBD _IO('c', 111) +/* Snapshot the current video buffer */ +#define CONS_SCRSHOT _IOWR('c', 105, scrshot_t) + +struct scrshot { + int xsize; + int ysize; + u_int16_t* buf; +}; +typedef struct scrshot scrshot_t; + /* get/set the current
terminal emulator info. */ #define TI_NAME_LEN 32 #define TI_DESC_LEN 64 --nFreZHaLTZJo0R7j Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="scrshot.c" Content-Transfer-Encoding: quoted-printable /*- * Copyright (c) 2001 Nik Clayton * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer, * without modification, immediately at the beginning of the file. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
* * $FreeBSD$ */ #include #include #include #include #include #include #include #include #include /* * Given the path to a syscons terminal (e.g., "/dev/ttyv0"), tries to * snapshot the video memory of that terminal, using the CONS_SCRSHOT * ioctl, and writes the results to stdout. */ int main(int argc, char *argv[]) { int fd; int result; scrshot_t shot; vid_info_t info; if (argc != 2) errx(1, "improper # of args"); fd = open(argv[1], O_RDWR); if (fd < 0) { perror(argv[1]); exit(1); } info.size = sizeof(info); result = ioctl(fd, CONS_GETINFO, &info); if (result != 0) { perror("getinfo failed"); exit(1); } shot.buf = malloc(info.mv_csz * info.mv_rsz * sizeof(u_int16_t)); if (!shot.buf) { perror("couldn't allocate shot space"); exit(1); } shot.xsize = info.mv_csz; shot.ysize = info.mv_rsz; result = ioctl (fd, CONS_SCRSHOT, &shot); if (result != 0) { perror("CONS_SCRSHOT failed"); exit(1); } write(1, shot.buf, shot.xsize * shot.ysize * sizeof(u_int16_t)); return 0; } --nFreZHaLTZJo0R7j-- --8GpibOaaTibBMecb Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.5 (FreeBSD) Comment: For info see http://www.gnupg.org iEYEARECAAYFAjsDs6UACgkQk6gHZCw343UpEwCgityz05R9NdQAvOwq6t7wX0oW w8UAoI1RZGjl1tjZFtbUrcko9HHXpj5v =mDb5 -----END PGP SIGNATURE----- --8GpibOaaTibBMecb-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 4:52:23 2001 Delivered-To: freebsd-arch@freebsd.org Received: from whale.sunbay.crimea.ua (whale.sunbay.crimea.ua [212.110.138.65]) by hub.freebsd.org (Postfix) with ESMTP id 0A75A37B424; Thu, 17 May 2001 04:52:14 -0700 (PDT) (envelope-from ru@whale.sunbay.crimea.ua) Received: (from ru@localhost) by whale.sunbay.crimea.ua (8.11.2/8.11.2) id f4HBq9L60381; Thu, 17 May 2001 14:52:09 +0300 (EEST) (envelope-from ru) Date: Thu, 17 May 2001 14:52:09 +0300 From: Ruslan Ermilov
To: Nik Clayton Cc: arch@FreeBSD.ORG Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer Message-ID: <20010517145209.C55371@sunbay.com> Mail-Followup-To: Nik Clayton , arch@FreeBSD.ORG References: <20010517121902.A3047@catkin.nothing-going-on.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010517121902.A3047@catkin.nothing-going-on.org>; from nik@FreeBSD.ORG on Thu, May 17, 2001 at 12:19:02PM +0100 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, May 17, 2001 at 12:19:02PM +0100, Nik Clayton wrote: > Two things for review. > [...] > The second, scrshot.c uses the ioctl to dump the contents of the video > memory to stdout. Usage is > > scrshot /dev/ttyv0 > shot.scr > There are some style(9) and -security issues with this: --- scrshot.c~ Thu May 17 14:42:40 2001 +++ scrshot.c Thu May 17 14:50:22 2001 @@ -28,9 +28,9 @@ * $FreeBSD$ */ +#include #include #include -#include #include #include @@ -48,7 +48,6 @@ main(int argc, char *argv[]) { int fd; - int result; scrshot_t shot; vid_info_t info; @@ -56,33 +55,24 @@ errx(1, "improper # of args"); fd = open(argv[1], O_RDWR); - if (fd < 0) { - perror(argv[1]); - exit(1); - } + if (fd < 0) + err(1, "%s", argv[1]); info.size = sizeof(info); - result = ioctl(fd, CONS_GETINFO, &info); - if (result != 0) { - perror("getinfo failed"); - exit(1); - } + if (ioctl(fd, CONS_GETINFO, &info) == -1) + err(1, "ioctl(CONS_GETINFO)"); shot.buf = malloc(info.mv_csz * info.mv_rsz * sizeof(u_int16_t)); - if (!shot.buf) { - perror("couldn't allocate shot space"); - exit(1); - } + if (shot.buf == NULL) + err(1, "couldn't allocate shot space"); shot.xsize = info.mv_csz; shot.ysize = info.mv_rsz; - result = ioctl (fd, CONS_SCRSHOT, &shot); - if (result != 0) { - perror("CONS_SCRSHOT failed"); - exit(1); - } + if (ioctl(fd, CONS_SCRSHOT, &shot) == -1) + err(1, "ioctl(CONS_SCRSHOT)"); - write(1, shot.buf, shot.xsize * 
shot.ysize * sizeof(u_int16_t)); + (void) write(STDOUT_FILENO, shot.buf, + shot.xsize * shot.ysize * sizeof(u_int16_t)); - return 0; + exit(0); } Cheers, -- Ruslan Ermilov Oracle Developer/DBA, ru@sunbay.com Sunbay Software AG, ru@FreeBSD.org FreeBSD committer, +380.652.512.251 Simferopol, Ukraine http://www.FreeBSD.org The Power To Serve http://www.oracle.com Enabling The Information Age To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 5: 2: 9 2001 Delivered-To: freebsd-arch@freebsd.org Received: from hawk.mail.pas.earthlink.net (hawk.mail.pas.earthlink.net [207.217.120.22]) by hub.freebsd.org (Postfix) with ESMTP id 2357437B424; Thu, 17 May 2001 05:02:04 -0700 (PDT) (envelope-from dleimbac@earthlink.net) Received: from 1Cust53.tnt1.starkville.ms.da.uu.net (1Cust53.tnt1.starkville.ms.da.uu.net [63.30.107.53]) by hawk.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id FAA26853; Thu, 17 May 2001 05:02:00 -0700 (PDT) Message-Id: <200105171202.FAA26853@hawk.mail.pas.earthlink.net> Date: Thu, 17 May 2001 07:04:21 CDT From: dave To: Poul-Henning Kamp , Mike Smith Cc: Bruce Evans , freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... Reply-To: dleimbac@earthlink.net X-Mailer: Spruce 0.6.5 for X11 w/smtpio 0.7.9 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 8bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG So its then generally bad to have apm and TSC enabled? Of course the new information submitted about the Pentium IV may imply that there are ways around this. Dave On Thu, 17 May 2001, Poul-Henning Kamp wrote: > Date: Thu, 17 May 2001 10:11:47 +0200 > To: Mike Smith > From: Poul-Henning Kamp > Subject: Re: Gettimeofday Again... 
> > In message <200105170813.f4H8DhE01424@mass.dis.org>, Mike Smith writes: > >> > I don't change the timercounter method defaults, and I sure hope > you > >> > aren't advocating that people change their timecounter defaults. > If > >> > the TSC is a reasonable default, the system should figure it out > and > >> > use it without requiring intervention. > >> > >> It's only a reasonable default if apm (or possibly acpica) is > configured > >> (and used). > > > >The TSC is never a reasonable default; there is no good way to be > certain > >that the TSC is and/or will remain stable. Even with ACPI, you can't be > > >entirely sure. > > Right. I have tried some hybrid schemes where the TSC is interpolating > between i8254 interrupts, but it is all but impossible to maintain > continuity on a clock-throttling laptop... > > -- > Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 > phk@FreeBSD.ORG | TCP/IP since RFC 956 > FreeBSD committer | BSD since 4.3-tahoe > Never attribute to malice what can adequately be explained by > incompetence. 
> > To Unsubscribe: send mail to majordomo@FreeBSD.org > with "unsubscribe freebsd-questions" in the body of the message To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 5: 9: 7 2001 Delivered-To: freebsd-arch@freebsd.org Received: from ringworld.nanolink.com (ringworld.nanolink.com [195.24.48.13]) by hub.freebsd.org (Postfix) with SMTP id 5D1BE37B422 for ; Thu, 17 May 2001 05:09:04 -0700 (PDT) (envelope-from roam@orbitel.bg) Received: (qmail 43652 invoked by uid 1000); 17 May 2001 12:08:23 -0000 Date: Thu, 17 May 2001 15:08:23 +0300 From: Peter Pentchev To: Ruslan Ermilov Cc: Nik Clayton , arch@FreeBSD.ORG Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer Message-ID: <20010517150823.A39834@ringworld.oblivion.bg> Mail-Followup-To: Ruslan Ermilov , Nik Clayton , arch@FreeBSD.ORG References: <20010517121902.A3047@catkin.nothing-going-on.org> <20010517145209.C55371@sunbay.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010517145209.C55371@sunbay.com>; from ru@FreeBSD.ORG on Thu, May 17, 2001 at 02:52:09PM +0300 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, May 17, 2001 at 02:52:09PM +0300, Ruslan Ermilov wrote: > On Thu, May 17, 2001 at 12:19:02PM +0100, Nik Clayton wrote: > > Two things for review. > > > [...] > > The second, scrshot.c uses the ioctl to dump the contents of the video > > memory to stdout. 
Usage is > > > > scrshot /dev/ttyv0 > shot.scr > > > There are some style(9) and -security issues with this: > > --- scrshot.c~ Thu May 17 14:42:40 2001 > +++ scrshot.c Thu May 17 14:50:22 2001 [snip[ > - result = ioctl(fd, CONS_GETINFO, &info); > - if (result != 0) { > - perror("getinfo failed"); > - exit(1); > - } > + if (ioctl(fd, CONS_GETINFO, &info) == -1) > + err(1, "ioctl(CONS_GETINFO)"); Wouldn't it be better to check for < 0 here, too? More compatible in the long run.. G'luck, Peter -- If this sentence were in Chinese, it would say something else. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 5:12: 1 2001 Delivered-To: freebsd-arch@freebsd.org Received: from whale.sunbay.crimea.ua (whale.sunbay.crimea.ua [212.110.138.65]) by hub.freebsd.org (Postfix) with ESMTP id 4CAC537B422; Thu, 17 May 2001 05:11:40 -0700 (PDT) (envelope-from ru@whale.sunbay.crimea.ua) Received: (from ru@localhost) by whale.sunbay.crimea.ua (8.11.2/8.11.2) id f4HCBbY62213; Thu, 17 May 2001 15:11:37 +0300 (EEST) (envelope-from ru) Date: Thu, 17 May 2001 15:11:37 +0300 From: Ruslan Ermilov To: Nik Clayton , arch@FreeBSD.ORG Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer Message-ID: <20010517151137.E55371@sunbay.com> Mail-Followup-To: Nik Clayton , arch@FreeBSD.ORG References: <20010517121902.A3047@catkin.nothing-going-on.org> <20010517145209.C55371@sunbay.com> <20010517150823.A39834@ringworld.oblivion.bg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010517150823.A39834@ringworld.oblivion.bg>; from roam@orbitel.bg on Thu, May 17, 2001 at 03:08:23PM +0300 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, May 17, 2001 at 03:08:23PM +0300, Peter Pentchev wrote: > On Thu, May 17, 2001 at 02:52:09PM +0300, Ruslan Ermilov wrote: > > On Thu, May 17, 2001 at 12:19:02PM 
+0100, Nik Clayton wrote: > > > Two things for review. > > > > > [...] > > > The second, scrshot.c uses the ioctl to dump the contents of the video > > > memory to stdout. Usage is > > > > > > scrshot /dev/ttyv0 > shot.scr > > > > > There are some style(9) and -security issues with this: > > > > --- scrshot.c~ Thu May 17 14:42:40 2001 > > +++ scrshot.c Thu May 17 14:50:22 2001 > [snip[ > > - result = ioctl(fd, CONS_GETINFO, &info); > > - if (result != 0) { > > - perror("getinfo failed"); > > - exit(1); > > - } > > + if (ioctl(fd, CONS_GETINFO, &info) == -1) > > + err(1, "ioctl(CONS_GETINFO)"); > > Wouldn't it be better to check for < 0 here, too? > More compatible in the long run.. > Nope, see RETURN VALUES in ioctl(2) manpage; see POSIX then. All sysctl's return -1 on error, not <0. Cheers, -- Ruslan Ermilov Oracle Developer/DBA, ru@sunbay.com Sunbay Software AG, ru@FreeBSD.org FreeBSD committer, +380.652.512.251 Simferopol, Ukraine http://www.FreeBSD.org The Power To Serve http://www.oracle.com Enabling The Information Age To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 5:15:27 2001 Delivered-To: freebsd-arch@freebsd.org Received: from ringworld.nanolink.com (ringworld.nanolink.com [195.24.48.13]) by hub.freebsd.org (Postfix) with SMTP id 9A8EB37B424 for ; Thu, 17 May 2001 05:15:23 -0700 (PDT) (envelope-from roam@orbitel.bg) Received: (qmail 43740 invoked by uid 1000); 17 May 2001 12:14:43 -0000 Date: Thu, 17 May 2001 15:14:42 +0300 From: Peter Pentchev To: Ruslan Ermilov Cc: Nik Clayton , arch@FreeBSD.ORG Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer Message-ID: <20010517151442.B39834@ringworld.oblivion.bg> Mail-Followup-To: Ruslan Ermilov , Nik Clayton , arch@FreeBSD.ORG References: <20010517121902.A3047@catkin.nothing-going-on.org> <20010517145209.C55371@sunbay.com> <20010517150823.A39834@ringworld.oblivion.bg> <20010517151137.E55371@sunbay.com> 
Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010517151137.E55371@sunbay.com>; from ru@FreeBSD.ORG on Thu, May 17, 2001 at 03:11:37PM +0300 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, May 17, 2001 at 03:11:37PM +0300, Ruslan Ermilov wrote: > On Thu, May 17, 2001 at 03:08:23PM +0300, Peter Pentchev wrote: > > On Thu, May 17, 2001 at 02:52:09PM +0300, Ruslan Ermilov wrote: > > > On Thu, May 17, 2001 at 12:19:02PM +0100, Nik Clayton wrote: > > > > Two things for review. > > > > > > > [...] > > > > The second, scrshot.c uses the ioctl to dump the contents of the video > > > > memory to stdout. Usage is > > > > > > > > scrshot /dev/ttyv0 > shot.scr > > > > > > > There are some style(9) and -security issues with this: > > > > > > --- scrshot.c~ Thu May 17 14:42:40 2001 > > > +++ scrshot.c Thu May 17 14:50:22 2001 > > [snip[ > > > - result = ioctl(fd, CONS_GETINFO, &info); > > > - if (result != 0) { > > > - perror("getinfo failed"); > > > - exit(1); > > > - } > > > + if (ioctl(fd, CONS_GETINFO, &info) == -1) > > > + err(1, "ioctl(CONS_GETINFO)"); > > > > Wouldn't it be better to check for < 0 here, too? > > More compatible in the long run.. > > > Nope, see RETURN VALUES in ioctl(2) manpage; see POSIX then. > All sysctl's return -1 on error, not <0. Oh ok, I didn't know that the explicit -1 return value was standardized. G'luck, Peter -- I am the meaning of this sentence. 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 5:20:40 2001 Delivered-To: freebsd-arch@freebsd.org Received: from whale.sunbay.crimea.ua (whale.sunbay.crimea.ua [212.110.138.65]) by hub.freebsd.org (Postfix) with ESMTP id 45DC237B422 for ; Thu, 17 May 2001 05:20:33 -0700 (PDT) (envelope-from ru@whale.sunbay.crimea.ua) Received: (from ru@localhost) by whale.sunbay.crimea.ua (8.11.2/8.11.2) id f4HCKRp63165 for arch@FreeBSD.ORG; Thu, 17 May 2001 15:20:27 +0300 (EEST) (envelope-from ru) Date: Thu, 17 May 2001 15:20:27 +0300 From: Ruslan Ermilov To: arch@FreeBSD.ORG Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer Message-ID: <20010517152027.A62561@sunbay.com> Mail-Followup-To: arch@FreeBSD.ORG References: <20010517121902.A3047@catkin.nothing-going-on.org> <20010517145209.C55371@sunbay.com> <20010517150823.A39834@ringworld.oblivion.bg> <20010517151137.E55371@sunbay.com> <20010517151442.B39834@ringworld.oblivion.bg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010517151442.B39834@ringworld.oblivion.bg>; from roam@orbitel.bg on Thu, May 17, 2001 at 03:14:42PM +0300 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, May 17, 2001 at 03:14:42PM +0300, Peter Pentchev wrote: > On Thu, May 17, 2001 at 03:11:37PM +0300, Ruslan Ermilov wrote: > > On Thu, May 17, 2001 at 03:08:23PM +0300, Peter Pentchev wrote: > > > On Thu, May 17, 2001 at 02:52:09PM +0300, Ruslan Ermilov wrote: > > > > On Thu, May 17, 2001 at 12:19:02PM +0100, Nik Clayton wrote: > > > > > Two things for review. > > > > > > > > > [...] > > > > > The second, scrshot.c uses the ioctl to dump the contents of the video > > > > > memory to stdout. 
Usage is > > > > > > > > > > scrshot /dev/ttyv0 > shot.scr > > > > > > > > > There are some style(9) and -security issues with this: > > > > > > > > --- scrshot.c~ Thu May 17 14:42:40 2001 > > > > +++ scrshot.c Thu May 17 14:50:22 2001 > > > [snip[ > > > > - result = ioctl(fd, CONS_GETINFO, &info); > > > > - if (result != 0) { > > > > - perror("getinfo failed"); > > > > - exit(1); > > > > - } > > > > + if (ioctl(fd, CONS_GETINFO, &info) == -1) > > > > + err(1, "ioctl(CONS_GETINFO)"); > > > > > > Wouldn't it be better to check for < 0 here, too? > > > More compatible in the long run.. > > > > > Nope, see RETURN VALUES in ioctl(2) manpage; see POSIX then. > > All sysctl's return -1 on error, not <0. > > Oh ok, I didn't know that the explicit -1 return value was standardized. > Hmm, -mdoc's ``.Rv -std syscall'' was here for years, and most libc_sys manpages that use it declare POSIX.1 conformance. -- Ruslan Ermilov Oracle Developer/DBA, ru@sunbay.com Sunbay Software AG, ru@FreeBSD.org FreeBSD committer, +380.652.512.251 Simferopol, Ukraine http://www.FreeBSD.org The Power To Serve http://www.oracle.com Enabling The Information Age To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 5:23:49 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id ADD8337B422; Thu, 17 May 2001 05:23:44 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id WAA06692; Thu, 17 May 2001 22:23:41 +1000 Date: Thu, 17 May 2001 22:22:13 +1000 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Mike Smith Cc: freebsd-questions@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Gettimeofday Again... 
In-Reply-To: <200105170813.f4H8DhE01424@mass.dis.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, 17 May 2001, Mike Smith wrote: > > > I don't change the timecounter method defaults, and I sure hope you > > > aren't advocating that people change their timecounter defaults. If > > > the TSC is a reasonable default, the system should figure it out and > > > use it without requiring intervention. > > > > It's only a reasonable default if apm (or possibly acpica) is configured > > (and used). > > The TSC is never a reasonable default; there is no good way to be certain > that the TSC is and/or will remain stable. Even with ACPI, you can't be > entirely sure. This must be why Linux uses it by default ;-). See linux/arch/i386/config.in, option CONFIG_X86_TSC. Linux-2.4.1 still only uses it to give an offset from the last i8254 clock interrupt, like FreeBSD used to do 3+ years ago before timecounters. This may limit the errors from the TSC frequency changing to between -10 and 0 msec (hopefully the frequency is calibrated when it is as large as possible; then if it slows down you underestimate the offset).
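The offset-from-last-interrupt scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Linux or FreeBSD code: the function name, the parameter names, and the 100 Hz tick (10 msec) are all hypothetical.

```c
#include <stdint.h>

#define TICK_USEC 10000	/* assumed 100 Hz clock interrupt: 10 msec per tick */

/*
 * Estimate the microseconds elapsed since the last clock interrupt from
 * a TSC delta.  tsc_hz is the calibrated TSC frequency.  All names here
 * are hypothetical.
 */
uint64_t
tsc_offset_usec(uint64_t tsc_now, uint64_t tsc_at_tick, uint64_t tsc_hz)
{
	uint64_t delta, max_cycles;

	if (tsc_hz == 0)
		return (0);

	delta = tsc_now - tsc_at_tick;
	max_cycles = tsc_hz / (1000000 / TICK_USEC);

	/* Never report more than one tick; the next interrupt resets the base. */
	if (delta >= max_cycles)
		return (TICK_USEC);

	return (delta * 1000000 / tsc_hz);
}
```

With a calibration of 500 MHz, 2,500,000 cycles map to 5000 usec. If the CPU has been throttled to 250 MHz, the same 5 msec of real time produces only 1,250,000 cycles and the estimate drops to 2500 usec, so the error stays on the negative side and within one tick.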
Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 7:25:40 2001 Delivered-To: freebsd-arch@freebsd.org Received: from point.osg.gov.bc.ca (point.osg.gov.bc.ca [142.32.102.44]) by hub.freebsd.org (Postfix) with ESMTP id E1B7137B422 for ; Thu, 17 May 2001 07:25:36 -0700 (PDT) (envelope-from Cy.Schubert@uumail.gov.bc.ca) Received: (from daemon@localhost) by point.osg.gov.bc.ca (8.8.7/8.8.8) id HAA00166; Thu, 17 May 2001 07:24:37 -0700 Received: from passer.osg.gov.bc.ca(142.32.110.29) via SMTP by point.osg.gov.bc.ca, id smtpda00163; Thu May 17 07:24:28 2001 Received: (from uucp@localhost) by passer.osg.gov.bc.ca (8.11.2/8.9.1) id f4HEONI68008; Thu, 17 May 2001 07:24:23 -0700 (PDT) Received: from cwsys9.cwsent.com(10.2.2.1), claiming to be "cwsys.cwsent.com" via SMTP by passer9.cwsent.com, id smtpdN67990; Thu May 17 07:24:05 2001 Received: (from uucp@localhost) by cwsys.cwsent.com (8.11.3/8.9.1) id f4HEO4g05590; Thu, 17 May 2001 07:24:04 -0700 (PDT) Message-Id: <200105171424.f4HEO4g05590@cwsys.cwsent.com> Received: from localhost.cwsent.com(127.0.0.1), claiming to be "cwsys" via SMTP by localhost.cwsent.com, id smtpdHu5586; Thu May 17 07:23:15 2001 X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4 Reply-To: Cy Schubert - ITSD Open Systems Group From: Cy Schubert - ITSD Open Systems Group X-Sender: schubert To: Tor.Egge@fast.no Cc: dillon@earth.backplane.com, arch@FreeBSD.ORG Subject: Re: on load control / process swapping In-reply-to: Your message of "Thu, 17 May 2001 01:31:06 +0200." 
<200105162331.BAA04708@midten.fast.no> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 17 May 2001 07:23:15 -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In message <200105162331.BAA04708@midten.fast.no>, Tor.Egge@fast.no writes: > ----Next_Part(Thu_May_17_01:30:16_2001)-- > Content-Type: Text/Plain; charset=us-ascii > Content-Transfer-Encoding: 7bit > > > I'd have to see your test code. Doing a direct-read into a user buffer > > has no cache impact at all (DMA does not go through the cpu cache). > > If you are doing seek/read()s but not actually looking at the data that > > is returned, your test results are going to be seriously skewed. > > The test code does not look at the data. I sent a copy of it to you > at January 7th 2000 (along with a previous version of the O_DIRECT > patch). > > I agree that the 95% reduction in CPU usage is seriously skewed. The > performance improvement for most real applications will be very small > or even negative. For some specialized applications it is a > significant performance improvement, giving nearly the same > performance as when bypassing the kernel file system and using the raw > device directly. Specialised applications such as a DBMS using its own cache, e.g. Oracle with a large SGA, would benefit from this. VxFS and Solaris UFS (since Solaris 2.6) have an option to turn on directio for all objects on a filesystem. Assuming you have a disk with only tablespaces on it (quite common in an Oracle shop), applying directio to a whole filesystem is not as damaging to other I/O as one would think because there would be no other I/O to the disk, only Oracle I/O. 
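For a per-file rather than per-filesystem request, an application might ask for direct I/O at open time, roughly as sketched below. This is only an illustration: the helper name open_datafile is hypothetical, the O_DIRECT flag is assumed to be the one from the patch under discussion, and its availability and exact semantics (some systems also require aligned buffers) are platform dependent.

```c
#include <errno.h>
#include <fcntl.h>

#ifndef O_DIRECT
#define O_DIRECT 0	/* assumption: fall back to ordinary I/O where unsupported */
#endif

/*
 * Hypothetical helper: open a data file read-only, asking the kernel to
 * bypass the buffer cache where it can.  Returns the descriptor, or -1
 * with errno set.
 */
int
open_datafile(const char *path)
{
	int fd;

	fd = open(path, O_RDONLY | O_DIRECT);
	/* Some filesystems reject the flag with EINVAL; retry without it. */
	if (fd == -1 && errno == EINVAL)
		fd = open(path, O_RDONLY);
	return (fd);
}
```

A DBMS that keeps its own cache would use such a descriptor for its tablespace files only, leaving ordinary buffered I/O in place for everything else.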
Regards, Phone: (250)387-8437 Cy Schubert Fax: (250)387-5766 Team Leader, Sun/Alpha Team Internet: Cy.Schubert@osg.gov.bc.ca Open Systems Group, ITSD, ISTA Province of BC To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 9:38: 9 2001 Delivered-To: freebsd-arch@freebsd.org Received: from Awfulhak.org (awfulhak.demon.co.uk [194.222.196.252]) by hub.freebsd.org (Postfix) with ESMTP id 544E437B422; Thu, 17 May 2001 09:37:57 -0700 (PDT) (envelope-from brian@Awfulhak.org) Received: from hak.lan.Awfulhak.org (root@hak.lan.Awfulhak.org [172.16.0.12]) by Awfulhak.org (8.11.3/8.11.3) with ESMTP id f4HGbtP02139; Thu, 17 May 2001 17:37:55 +0100 (BST) (envelope-from brian@lan.Awfulhak.org) Received: from hak.lan.Awfulhak.org (brian@localhost [127.0.0.1]) by hak.lan.Awfulhak.org (8.11.3/8.11.3) with ESMTP id f4HGbsb65668; Thu, 17 May 2001 17:37:54 +0100 (BST) (envelope-from brian@hak.lan.Awfulhak.org) Message-Id: <200105171637.f4HGbsb65668@hak.lan.Awfulhak.org> X-Mailer: exmh version 2.3.1 01/18/2001 with nmh-1.0.4 To: Warner Losh Cc: Brian Somers , Bruce Evans , cvs-all@FreeBSD.org, current@FreeBSD.org, freebsd-arch@FreeBSD.org Subject: Re: Where to put include files (was: cvs commit: src Makefile.inc1) In-Reply-To: Message from Warner Losh of "Thu, 17 May 2001 10:00:25 MDT." <200105171600.f4HG0Pl05438@billy-club.village.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 17 May 2001 17:37:54 +0100 From: Brian Somers Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG [cc'd to -arch and not to cvs-committers] For anyone that's reading -arch and hasn't seen this on -current, the thread is discussing userland sources that have -I../../sys in their Makefile and then #include . I think everyone agrees that these headers should be made public, the question is ``where to put them ?''. 
Warner wrote: > In message <200105171233.f4HCXhb62786@hak.lan.Awfulhak.org> Brian > Somers writes: > : Solaris calls it's ioctl files /usr/include/sys/_io.h so I'd > : spell digiio.h /usr/include/sys/digi_io.h. > > Actually, the more I think about it, the more I like putting it in > /usr/include/sys/fooio.h. We have lots of other files there now. The > down side to this approach is that it breaks up the driver sources > that we've been trying to concentrate into sys/dev/foo/* (or > introduces asymetry such that you can't just toss in a -I/sys and have > the same tree that gets stuck under /usr/include). The SHARED variable in src/include/Makefile makes this side of things tricky too - we've got to be careful that we either keep our sources together and maintain a resemblance of the hierarchy in /usr/include or split our sources. When I was working on Solaris I found it better to have the *io.h files in sys (separate from the driver) as it made it very clear that it was a public interface - the driver lived somewhere that just got built into a module and wasn't seen by the outside world. So I think I'd tend to vote (FWIW) for moving digiio.h (and other similar things) out of sys/dev// and into sys/sys/. Comments ? -- Brian Don't _EVER_ lose your sense of humour ! 
To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 9:55:40 2001 Delivered-To: freebsd-arch@freebsd.org Received: from whale.sunbay.crimea.ua (whale.sunbay.crimea.ua [212.110.138.65]) by hub.freebsd.org (Postfix) with ESMTP id 71FBE37B422; Thu, 17 May 2001 09:55:14 -0700 (PDT) (envelope-from ru@whale.sunbay.crimea.ua) Received: (from ru@localhost) by whale.sunbay.crimea.ua (8.11.2/8.11.2) id f4HGqpd91246; Thu, 17 May 2001 19:52:51 +0300 (EEST) (envelope-from ru) Date: Thu, 17 May 2001 19:52:51 +0300 From: Ruslan Ermilov To: Brian Somers Cc: Warner Losh , Bruce Evans , cvs-all@FreeBSD.ORG, current@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG Subject: Re: Where to put include files (was: cvs commit: src Makefile.inc1) Message-ID: <20010517195251.A90318@sunbay.com> Mail-Followup-To: Brian Somers , Warner Losh , Bruce Evans , cvs-all@FreeBSD.ORG, current@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG References: <200105171637.f4HGbsb65668@hak.lan.Awfulhak.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200105171637.f4HGbsb65668@hak.lan.Awfulhak.org>; from brian@Awfulhak.org on Thu, May 17, 2001 at 05:37:54PM +0100 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, May 17, 2001 at 05:37:54PM +0100, Brian Somers wrote: > [cc'd to -arch and not to cvs-committers] > > For anyone that's reading -arch and hasn't seen this on -current, the > thread is discussing userland sources that have -I../../sys in their > Makefile and then #include . > > I think everyone agrees that these headers should be made public, the > question is ``where to put them ?''. > > Warner wrote: > > In message <200105171233.f4HCXhb62786@hak.lan.Awfulhak.org> Brian > > Somers writes: > > : Solaris calls it's ioctl files /usr/include/sys/_io.h so I'd > > : spell digiio.h /usr/include/sys/digi_io.h. 
> > > > Actually, the more I think about it, the more I like putting it in > > /usr/include/sys/fooio.h. We have lots of other files there now. The > > down side to this approach is that it breaks up the driver sources > > that we've been trying to concentrate into sys/dev/foo/* (or > > introduces asymetry such that you can't just toss in a -I/sys and have > > the same tree that gets stuck under /usr/include). > > The SHARED variable in src/include/Makefile makes this side of things > tricky too - we've got to be careful that we either keep our sources > together and maintain a resemblance of the hierarchy in /usr/include > or split our sources. > > When I was working on Solaris I found it better to have the *io.h > files in sys (separate from the driver) as it made it very clear that > it was a public interface - the driver lived somewhere that just got > built into a module and wasn't seen by the outside world. > > So I think I'd tend to vote (FWIW) for moving digiio.h (and other > similar things) out of sys/dev// and into sys/sys/. > > Comments ? > More to that. There are 59 Makefiles that have -I${.CURDIR}/(../)+sys in them. All these are bogus. We should get rid of all of them (-I's). So far, I have found sbin/mount_* use headers from /sys/miscfs/ that are not installed into /usr/include, but should be. Where should these be installed? /usr/include/fs/ or should we preserve the /usr/include/miscfs/ layout like in /sys/miscfs? Modern fs'es install their headers into include/fs and old ones in include/. 
Cheers, -- Ruslan Ermilov Oracle Developer/DBA, ru@sunbay.com Sunbay Software AG, ru@FreeBSD.org FreeBSD committer, +380.652.512.251 Simferopol, Ukraine http://www.FreeBSD.org The Power To Serve http://www.oracle.com Enabling The Information Age To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 10: 9:46 2001 Delivered-To: freebsd-arch@freebsd.org Received: from dragon.nuxi.com (trang.nuxi.com [209.152.133.57]) by hub.freebsd.org (Postfix) with ESMTP id 87A5037B422; Thu, 17 May 2001 10:09:42 -0700 (PDT) (envelope-from obrien@NUXI.com) Received: (from obrien@localhost) by dragon.nuxi.com (8.11.3/8.11.1) id f4HH9dn60653; Thu, 17 May 2001 10:09:39 -0700 (PDT) (envelope-from obrien) Date: Thu, 17 May 2001 10:09:39 -0700 From: "David O'Brien" To: Peter Pentchev Cc: Ruslan Ermilov , Nik Clayton , arch@FreeBSD.ORG Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer Message-ID: <20010517100939.A60619@dragon.nuxi.com> Reply-To: obrien@FreeBSD.ORG References: <20010517121902.A3047@catkin.nothing-going-on.org> <20010517145209.C55371@sunbay.com> <20010517150823.A39834@ringworld.oblivion.bg> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010517150823.A39834@ringworld.oblivion.bg>; from roam@orbitel.bg on Thu, May 17, 2001 at 03:08:23PM +0300 X-Operating-System: FreeBSD 5.0-CURRENT Organization: The NUXI BSD group X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3 90 76 5D 69 58 D9 98 7A X-Pgp-Rsa-Keyid: 1024/34F9F9D5 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG On Thu, May 17, 2001 at 03:08:23PM +0300, Peter Pentchev wrote: > > - result = ioctl(fd, CONS_GETINFO, &info); > > - if (result != 0) { > > - perror("getinfo failed"); > > - exit(1); > > - } > > + if (ioctl(fd, CONS_GETINFO, &info) == -1) > > + err(1, "ioctl(CONS_GETINFO)"); > > Wouldn't it be better to check 
for < 0 here, too? > More compatible in the long run.. Uh.. why? RETURN VALUES If an error has occurred, a value of -1 is returned and errno is set to indicate the error. The return is set to "-1", not -20, not -100; but -1, period. Or were you planning on changing this in the future? Using "< 0" is just sloppy coding and it even caused me big problems porting bits to 286 Xenix in the past. -- -- David (obrien@FreeBSD.org) To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 10:17: 5 2001 Delivered-To: freebsd-arch@freebsd.org Received: from wall.polstra.com (rtrwan160.accessone.com [206.213.115.74]) by hub.freebsd.org (Postfix) with ESMTP id 6311137B422 for ; Thu, 17 May 2001 10:17:01 -0700 (PDT) (envelope-from jdp@wall.polstra.com) Received: from vashon.polstra.com (vashon.polstra.com [206.213.73.13]) by wall.polstra.com (8.11.3/8.11.1) with ESMTP id f4HHGn013674; Thu, 17 May 2001 10:16:49 -0700 (PDT) (envelope-from jdp@wall.polstra.com) Received: (from jdp@localhost) by vashon.polstra.com (8.11.3/8.11.0) id f4HHGmg91136; Thu, 17 May 2001 10:16:48 -0700 (PDT) (envelope-from jdp) Date: Thu, 17 May 2001 10:16:48 -0700 (PDT) Message-Id: <200105171716.f4HHGmg91136@vashon.polstra.com> To: arch@freebsd.org From: John Polstra Cc: phk@critter.freebsd.dk Subject: Re: Gettimeofday Again... In-Reply-To: <9651.990081164@critter> References: <9651.990081164@critter> Organization: Polstra & Co., Seattle, WA Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG In article <9651.990081164@critter>, Poul-Henning Kamp wrote: > In message <20010517025548.35895380E@overcee.netplex.com.au>, Peter Wemm writes > : > > >FYI: Pentium4 cpus have: > >Features=0x3febfbff > MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,,ACC> > > > >The last one is interesting. As I understand it, ACC ("Auto Clock > >Correction") allows for TSC correction in spite of varying cpu clocks. 
> > > >This is important because of the variable cpu speed throttling to keep the > >heat down. > > Hmm, anyone has any doc on that ? I'm not so sure ACC is the right label for it. I assume it's bit 29, since all the other flags print out in ascending bit order. The CPUID description in the P4 instruction set reference labels this bit "TM" and describes it as follows: Thermal Monitor. The processor implements the thermal monitor automatic thermal control circuitry (TCC). That is from 24547103.pdf which can be found at developer.intel.com among the P4 documents. A different volume in the same series, 24547203.pdf, says a little bit about this in section 12.14.4. They mention the "software controlled clock modulation facilities," but it all looks oriented toward controlling the temperature of the chip rather than compensating for somebody else's control over the clock rate. In section 12.14.6 they say: "The Performance Event monitoring architecture provides an event that counts the number of clock cycles that the processor clock has been modulated by the thermal monitor." However, at least the way I read the document, there are ways the clock can be slowed down by software which are not reflected in this performance counter. But don't take my word for it. I didn't study the docs very carefully. :-) John -- John Polstra jdp@polstra.com John D. Polstra & Co., Inc. Seattle, Washington USA "Disappointment is a good sign of basic intelligence." 
-- Chögyam Trungpa To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 11: 9:39 2001 Delivered-To: freebsd-arch@freebsd.org Received: from smtp03.primenet.com (smtp03.primenet.com [206.165.6.133]) by hub.freebsd.org (Postfix) with ESMTP id B75AA37B422; Thu, 17 May 2001 11:09:36 -0700 (PDT) (envelope-from tlambert@usr05.primenet.com) Received: (from daemon@localhost) by smtp03.primenet.com (8.9.3/8.9.3) id LAA17687; Thu, 17 May 2001 11:09:35 -0700 (MST) Received: from usr05.primenet.com(206.165.6.205) via SMTP by smtp03.primenet.com, id smtpdAAAqLaiEI; Thu May 17 11:09:25 2001 Received: (from tlambert@localhost) by usr05.primenet.com (8.8.5/8.8.5) id LAA09113; Thu, 17 May 2001 11:16:32 -0700 (MST) From: Terry Lambert Message-Id: <200105171816.LAA09113@usr05.primenet.com> Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer To: roam@orbitel.bg (Peter Pentchev) Date: Thu, 17 May 2001 18:16:32 +0000 (GMT) Cc: ru@FreeBSD.ORG (Ruslan Ermilov), nik@FreeBSD.ORG (Nik Clayton), arch@FreeBSD.ORG In-Reply-To: <20010517151442.B39834@ringworld.oblivion.bg> from "Peter Pentchev" at May 17, 2001 03:14:42 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk X-Loop: FreeBSD.ORG > > > Wouldn't it be better to check for < 0 here, too? > > > More compatible in the long run.. > > > > > Nope, see RETURN VALUES in ioctl(2) manpage; see POSIX then. > > All sysctl's return -1 on error, not <0. > > Oh ok, I didn't know that the explicit -1 return value was standardized. On a practical note, the code generated merely inverts the sense of the same cmpl at default optimization, and at -O2 ends up being either: testl %eax,%eax jge .L3 or: cmpl $-1,%eax jne .L3 So the number of instruction cycles is identical. 
Off the top of my head, I can think of a number of architectures where "<0" would be more efficient (single bit test), but I personally prefer the "== -1" test, as being more exact. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 17:32:18 2001 Delivered-To: freebsd-arch@freebsd.org Received: from nothing-going-on.demon.co.uk (pc-62-31-42-140-hy.blueyonder.co.uk [62.31.42.140]) by hub.freebsd.org (Postfix) with ESMTP id 00EAF37B42C; Thu, 17 May 2001 17:32:08 -0700 (PDT) (envelope-from nik@nothing-going-on.demon.co.uk) Received: (from nik@localhost) by nothing-going-on.demon.co.uk (8.11.3/8.11.3) id f4I08f808073; Fri, 18 May 2001 01:08:41 +0100 (BST) (envelope-from nik) Date: Fri, 18 May 2001 01:07:20 +0100 From: Nik Clayton To: Nik Clayton Cc: arch@freebsd.org Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer Message-ID: <20010518010720.A8037@catkin.nothing-going-on.org> References: <20010517121902.A3047@catkin.nothing-going-on.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="oLBj+sq0vYjzfsbl" Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010517121902.A3047@catkin.nothing-going-on.org>; from nik@freebsd.org on Thu, May 17, 2001 at 12:19:02PM +0100 Organization: FreeBSD Project Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: X-Loop: FreeBSD.ORG --oLBj+sq0vYjzfsbl Content-Type: multipart/mixed; boundary="yrj/dFKFPuw6o+aM" Content-Disposition: inline --yrj/dFKFPuw6o+aM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, May 17, 2001 at 12:19:02PM +0100, Nik Clayton wrote: > The second, scrshot.c uses the ioctl to dump the contents of the video > memory to stdout. 
Attached is an updated version, with Ruslan's patches. I've also

(a) Dug through the petrification layer of my mail archives, and discovered that a chunk of the code was originally written by Joel Holveck, who is now credited appropriately.
(b) Knocked together a man page.
(c) Tweaked the output format.

    Byte 1   Output format version (currently 1)
    Byte 2   Width of the display at snapshot time, in characters
    Byte 3   Depth of the display at snapshot time, in characters
    Byte 4+  Snapshot data

Hopefully this should allow a little bit of future proofing, should more information need to be included in the future (e.g., the name of the font that was loaded at the time, that sort of thing). Any comments about better ways to do this, and/or other information that should be part of the file format, are appreciated. N -- FreeBSD: The Power to Serve http://www.freebsd.org/ FreeBSD Documentation Project http://www.freebsd.org/docproj/ --- 15B8 3FFC DDB4 34B0 AA5F 94B7 93A8 0764 2C37 E375 --- --yrj/dFKFPuw6o+aM Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="scrshot.c" Content-Transfer-Encoding: quoted-printable /*- * Copyright (c) 2001 Joel Holveck and Nik Clayton * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer, * without modification, immediately at the beginning of the file. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #include #include #include #include #include #include #include #include #include #define VERSION 1 /* File format version */ /* * Given the path to a syscons terminal (e.g., "/dev/ttyv0"), tries to * snapshot the video memory of that terminal, using the CONS_SCRSHOT * ioctl, and writes the results to stdout. 
 */
int
main(int argc, char *argv[])
{
	int fd;
	scrshot_t shot;
	vid_info_t info;

	if (argc != 2)
		errx(1, "improper # of args");

	fd = open(argv[1], O_RDWR);
	if (fd < 0)
		err(1, "%s", argv[1]);

	info.size = sizeof(info);
	if (ioctl(fd, CONS_GETINFO, &info) == -1)
		err(1, "ioctl(CONS_GETINFO)");

	shot.buf = malloc(info.mv_csz * info.mv_rsz * sizeof(u_int16_t));
	if (shot.buf == NULL)
		err(1, "couldn't allocate shot space");

	shot.xsize = info.mv_csz;
	shot.ysize = info.mv_rsz;
	if (ioctl(fd, CONS_SCRSHOT, &shot) == -1)
		err(1, "ioctl(CONS_SCRSHOT)");

	printf("%c%c%c", VERSION, shot.xsize, shot.ysize);
	/* Flush stdio so the header precedes the unbuffered write(2) below. */
	(void)fflush(stdout);

	(void)write(STDOUT_FILENO, shot.buf,
	    shot.xsize * shot.ysize * sizeof(u_int16_t));

	exit(0);
}
--yrj/dFKFPuw6o+aM Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="scrshot.1" .\" Copyright (c) 2001 Nik Clayton .\" All rights reserved .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR .\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES .\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
.\" IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, .\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT .\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, .\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY .\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. .\" .\" $FreeBSD$ .\" .Dd May 17, 2001 .Dt SCRSHOT 1 .Os .Sh NAME .Nm scrshot .Nd capture the contents of a syscons terminal .Sh SYNOPSIS .Nm .Ar device .Sh DESCRIPTION The .Nm utility uses the .Xr syscons 4 .Li CONS_SCRSHOT ioctl to capture the current contents of the terminal device given as the first argument. .Nm writes version and additional information to the standard output, followed by the contents of the terminal device. .Sh IMPLEMENTATION NOTES PC video memory is typically arranged in two-byte tuples, one per character position. In each tuple, the first byte is the character code, and the second byte is the character's colour attribute. .Pp The colour attribute byte is further broken down into the low nybble, which specifies which of 16 different foreground colours is active, and the high nybble, which specifies which of 16 different background colours is active. .Pp .Bl -hang -offset indent -compact .It 0 Black .It 1 Blue .It 2 Green .It 3 Cyan .It 4 Red .It 5 Magenta .It 6 Brown .It 7 White .It 8 Grey .It 9 Light Blue .It 10 Light Green .It 11 Light Cyan .It 12 Light Red .It 13 Light Magenta .It 14 Yellow .It 15 White .El .Pp It can be seen that the last 8 colours are brighter versions of the first 8. .Pp For example, the two bytes .Bd -literal -offset indent 65 158 .Ed .Pp specify an uppercase A (character code 65), in yellow (low nybble 14) on a light blue background (high nybble 9). 
.Pp The .Nm output contains a small header which includes additional information which may be useful to utilities processing the output. .Pp The first byte of the header contains the version number. Subsequent bytes depend on the version number. .Bl -column "Version " "4 and up" -offset indent .It Sy Version Ta Sy Byte Ta Sy Meaning .It 1 Ta 2 Ta Terminal width, in characters .It Ta 3 Ta Terminal depth, in characters .It Ta 4 and up Ta The snapshot data .El .Sh RETURN VALUES The .Nm utility exits 0 on success or >0 if an error occurred. .Sh EXAMPLES The command: .Bd -literal -offset indent .Ic scrshot /dev/ttyv0 > shot.scr .Ed .Pp will capture the contents of the first virtual terminal, and redirect the output to the .Li shot.scr file. .Sh SEE ALSO .Xr syscons 4 , .Xr ascii 7 , .Xr watch 8 . .Pp The various .Li shot2* utilities in the .Li textproc category of the ports collection. .Sh HISTORY A .Nm utility appeared in .Fx 5.0 and was backported to .Fx 4.4 . .Sh AUTHORS .An Joel Holveck Aq joelh@gnu.org and .An Nik Clayton Aq nik@FreeBSD.org --yrj/dFKFPuw6o+aM-- --oLBj+sq0vYjzfsbl Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.5 (FreeBSD) Comment: For info see http://www.gnupg.org iEYEARECAAYFAjsEZ7cACgkQk6gHZCw343WWSQCgg5nwmQR5owsizubXvYUgo28W h/UAoIt7CKWdcjkmNu0JkemFNn9UP30a =QEEw -----END PGP SIGNATURE----- --oLBj+sq0vYjzfsbl-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 21:41:29 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 7D35B37B423 for ; Thu, 17 May 2001 21:41:18 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4I4eiB05429; Thu, 17 May 2001 21:40:44 -0700 (PDT) (envelope-from dillon) 
Date: Thu, 17 May 2001 21:40:44 -0700 (PDT) From: Matt Dillon Message-Id: <200105180440.f4I4eiB05429@earth.backplane.com> To: Tor.Egge@fast.no Cc: arch@FreeBSD.ORG Subject: Final O_DIRECT patch (first stage, without rawread/rawwrite) References: <200105162222.f4GMMpC81247@earth.backplane.com> <200105162331.BAA04708@midten.fast.no> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: X-Loop: FreeBSD.ORG Ok, I've done some pretty good testing of this patch. The problem with write() not freeing the buffer was due to the clustering code. bdwrite() called bqrelse() which cleared B_RELBUF and B_DIRECT. That was easy to fix. This patch should cause O_DIRECT I/O to operate without polluting the buffer cache. As an added bonus I managed to keep the write clustering code intact, so the I/O should be mostly optimal (as far as that goes). The patch below is for -stable. I will continue testing it on stable through the weekend. I'll probably commit the -current version on the weekend and the stable version the weekend after. This is the first stage. The second stage will be to figure out how best to implement the zero-copy rawread/rawwrite functionality using Tor's code as a reference point. -Matt Index: kern/vfs_bio.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v retrieving revision 1.242.2.7 diff -u -r1.242.2.7 vfs_bio.c --- kern/vfs_bio.c 2001/03/02 16:45:12 1.242.2.7 +++ kern/vfs_bio.c 2001/05/18 04:32:52 @@ -1230,7 +1230,7 @@ /* unlock */ BUF_UNLOCK(bp); - bp->b_flags &= ~(B_ORDERED | B_ASYNC | B_NOCACHE | B_AGE | B_RELBUF); + bp->b_flags &= ~(B_ORDERED | B_ASYNC | B_NOCACHE | B_AGE | B_RELBUF | B_DIRECT); splx(s); } @@ -1242,6 +1242,8 @@ * biodone() to requeue an async I/O on completion. It is also used when * known good buffers need to be requeued but we think we may need the data * again soon. + * + * XXX we should be able to leave the B_RELBUF hint set on completion. 
*/ void bqrelse(struct buf * bp) @@ -1328,12 +1330,15 @@ vm_page_flag_clear(m, PG_ZERO); /* * Might as well free the page if we can and it has - * no valid data. + * no valid data. We also free the page if the + * buffer was used for direct I/O */ if ((bp->b_flags & B_ASYNC) == 0 && !m->valid && m->hold_count == 0) { vm_page_busy(m); vm_page_protect(m, VM_PROT_NONE); vm_page_free(m); + } else if (bp->b_flags & B_DIRECT) { + vm_page_try_to_free(m); } else if (vm_page_count_severe()) { vm_page_try_to_cache(m); } Index: kern/vfs_cluster.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v retrieving revision 1.92.2.5 diff -u -r1.92.2.5 vfs_cluster.c --- kern/vfs_cluster.c 2001/03/02 16:45:12 1.92.2.5 +++ kern/vfs_cluster.c 2001/05/18 04:33:46 @@ -490,6 +490,15 @@ } else { tbp->b_dirtyoff = tbp->b_dirtyend = 0; tbp->b_flags &= ~(B_ERROR|B_INVAL); + /* + * XXX the bdwrite()/bqrelse() issued during + * cluster building clears B_RELBUF (see bqrelse() + * comment). If direct I/O was specified, we have + * to restore it here to allow the buffer and VM + * to be freed. 
+ */ + if (tbp->b_flags & B_DIRECT) + tbp->b_flags |= B_RELBUF; } biodone(tbp); } Index: kern/vfs_vnops.c =================================================================== RCS file: /home/ncvs/src/sys/kern/vfs_vnops.c,v retrieving revision 1.87.2.6 diff -u -r1.87.2.6 vfs_vnops.c --- kern/vfs_vnops.c 2001/02/26 04:23:16 1.87.2.6 +++ kern/vfs_vnops.c 2001/05/17 05:17:55 @@ -334,6 +334,8 @@ ioflag = 0; if (fp->f_flag & FNONBLOCK) ioflag |= IO_NDELAY; + if (fp->f_flag & O_DIRECT) + ioflag |= IO_DIRECT; VOP_LEASE(vp, p, cred, LEASE_READ); vn_lock(vp, LK_SHARED | LK_NOPAUSE | LK_RETRY, p); if ((flags & FOF_OFFSET) == 0) @@ -374,6 +376,8 @@ ioflag |= IO_APPEND; if (fp->f_flag & FNONBLOCK) ioflag |= IO_NDELAY; + if (fp->f_flag & O_DIRECT) + ioflag |= IO_DIRECT; if ((fp->f_flag & O_FSYNC) || (vp->v_mount && (vp->v_mount->mnt_flag & MNT_SYNCHRONOUS))) ioflag |= IO_SYNC; Index: sys/buf.h =================================================================== RCS file: /home/ncvs/src/sys/sys/buf.h,v retrieving revision 1.88.2.3 diff -u -r1.88.2.3 buf.h --- sys/buf.h 2000/12/30 01:51:10 1.88.2.3 +++ sys/buf.h 2001/05/18 04:02:02 @@ -191,12 +191,16 @@ * if b_bufsize and b_bcount are not. ( b_bufsize is * always at least DEV_BSIZE aligned, though ). * + * B_DIRECT Hint that we should attempt to completely free + * the pages underlying the buffer. B_DIRECT is + * sticky until the buffer is released and typically + * only has an effect when B_RELBUF is also set. */ #define B_AGE 0x00000001 /* Move to age queue when I/O done. */ #define B_NEEDCOMMIT 0x00000002 /* Append-write in progress. */ #define B_ASYNC 0x00000004 /* Start I/O, do not wait. */ -#define B_UNUSED0 0x00000008 /* Old B_BAD */ +#define B_DIRECT 0x00000008 /* direct I/O flag (pls free vmio) */ #define B_DEFERRED 0x00000010 /* Skipped over for cleaning */ #define B_CACHE 0x00000020 /* Bread found us in the cache. */ #define B_CALL 0x00000040 /* Call b_iodone from biodone. 
*/ @@ -231,7 +235,7 @@ "\33paging\32xxx\31writeinprog\30want\27relbuf\26dirty" \ "\25read\24raw\23phys\22clusterok\21malloc\20nocache" \ "\17locked\16inval\15scanned\14error\13eintr\12done\11freebuf" \ - "\10delwri\7call\6cache\4bad\3async\2needcommit\1age" + "\10delwri\7call\6cache\4direct\3async\2needcommit\1age" /* * These flags are kept in b_xflags. Index: sys/fcntl.h =================================================================== RCS file: /home/ncvs/src/sys/sys/fcntl.h,v retrieving revision 1.9.2.1 diff -u -r1.9.2.1 fcntl.h --- sys/fcntl.h 2000/08/22 01:46:30 1.9.2.1 +++ sys/fcntl.h 2001/05/17 04:01:47 @@ -98,15 +98,18 @@ /* Defined by POSIX 1003.1; BSD default, but must be distinct from O_RDONLY. */ #define O_NOCTTY 0x8000 /* don't assign controlling terminal */ +/* Attempt to bypass buffer cache */ +#define O_DIRECT 0x00010000 + #ifdef _KERNEL /* convert from open() flags to/from fflags; convert O_RD/WR to FREAD/FWRITE */ #define FFLAGS(oflags) ((oflags) + 1) #define OFLAGS(fflags) ((fflags) - 1) /* bits to save after open */ -#define FMASK (FREAD|FWRITE|FAPPEND|FASYNC|FFSYNC|FNONBLOCK) +#define FMASK (FREAD|FWRITE|FAPPEND|FASYNC|FFSYNC|FNONBLOCK|O_DIRECT) /* bits settable by fcntl(F_SETFL, ...) 
*/ -#define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM) +#define FCNTLFLAGS (FAPPEND|FASYNC|FFSYNC|FNONBLOCK|FPOSIXSHM|O_DIRECT) #endif /* Index: sys/file.h =================================================================== RCS file: /home/ncvs/src/sys/sys/file.h,v retrieving revision 1.22.2.5 diff -u -r1.22.2.5 file.h --- sys/file.h 2001/02/26 04:23:21 1.22.2.5 +++ sys/file.h 2001/05/17 04:34:53 @@ -56,15 +56,14 @@ */ struct file { LIST_ENTRY(file) f_list;/* list of active files */ - short f_flag; /* see fcntl.h */ + short f_FILLER3; /* (old f_flag) */ #define DTYPE_VNODE 1 /* file */ #define DTYPE_SOCKET 2 /* communications endpoint */ #define DTYPE_PIPE 3 /* pipe */ #define DTYPE_FIFO 4 /* fifo (named pipe) */ #define DTYPE_KQUEUE 5 /* event queue */ short f_type; /* descriptor type */ - short f_FILLER1; /* (OLD) reference count */ - short f_FILLER2; /* (OLD) references from message queue */ + u_int f_flag; /* see fcntl.h */ struct ucred *f_cred; /* credentials associated with descriptor */ struct fileops { int (*fo_read) __P((struct file *fp, struct uio *uio, Index: sys/vnode.h =================================================================== RCS file: /home/ncvs/src/sys/sys/vnode.h,v retrieving revision 1.111.2.4 diff -u -r1.111.2.4 vnode.h --- sys/vnode.h 2000/12/30 01:51:10 1.111.2.4 +++ sys/vnode.h 2001/05/17 04:49:14 @@ -213,6 +213,7 @@ #define IO_VMIO 0x20 /* data already in VMIO space */ #define IO_INVAL 0x40 /* invalidate after I/O */ #define IO_ASYNC 0x80 /* bawrite rather then bdwrite */ +#define IO_DIRECT 0x100 /* attempt to bypass buffer cache */ /* * Modes. Some values same as Ixxx entries from inode.h for now. 
Index: ufs/ufs/ufs_readwrite.c =================================================================== RCS file: /home/ncvs/src/sys/ufs/ufs/ufs_readwrite.c,v retrieving revision 1.65.2.6 diff -u -r1.65.2.6 ufs_readwrite.c --- ufs/ufs/ufs_readwrite.c 2000/12/30 01:51:11 1.65.2.6 +++ ufs/ufs/ufs_readwrite.c 2001/05/18 03:51:55 @@ -278,6 +278,15 @@ } /* + * If IO_DIRECT then set B_DIRECT for the buffer. This + * will cause us to attempt to release the buffer later on + * and will cause the buffer cache to attempt to free the + * underlying pages. + */ + if (ioflag & IO_DIRECT) + bp->b_flags |= B_DIRECT; + + /* * We should only get non-zero b_resid when an I/O error * has occurred, which should cause us to break above. * However, if the short read did not cause an error, @@ -319,12 +328,12 @@ if (error) break; - if ((ioflag & IO_VMIO) && - (LIST_FIRST(&bp->b_dep) == NULL)) { + if ((ioflag & (IO_VMIO|IO_DIRECT)) && + (LIST_FIRST(&bp->b_dep) == NULL)) { /* - * If there are no dependencies, and - * it's VMIO, then we don't need the buf, - * mark it available for freeing. The VM has the data. + * If there are no dependencies, and it's VMIO, + * then we don't need the buf, mark it available + * for freeing. The VM has the data. 
*/ bp->b_flags |= B_RELBUF; brelse(bp); @@ -346,8 +355,8 @@ * so it must have come from a 'break' statement */ if (bp != NULL) { - if ((ioflag & IO_VMIO) && - (LIST_FIRST(&bp->b_dep) == NULL)) { + if ((ioflag & (IO_VMIO|IO_DIRECT)) && + (LIST_FIRST(&bp->b_dep) == NULL)) { bp->b_flags |= B_RELBUF; brelse(bp); } else { @@ -486,6 +495,8 @@ ap->a_cred, flags, &bp); if (error != 0) break; + if (ioflag & IO_DIRECT) + bp->b_flags |= B_DIRECT; if (uio->uio_offset + xfersize > ip->i_size) { ip->i_size = uio->uio_offset + xfersize; @@ -498,10 +509,19 @@ error = uiomove((char *)bp->b_data + blkoffset, (int)xfersize, uio); - if ((ioflag & IO_VMIO) && - (LIST_FIRST(&bp->b_dep) == NULL)) + if ((ioflag & (IO_VMIO|IO_DIRECT)) && + (LIST_FIRST(&bp->b_dep) == NULL)) { bp->b_flags |= B_RELBUF; + } + /* + * If IO_SYNC each buffer is written synchronously. Otherwise + * if we have a severe page deficiency write the buffer + * asynchronously. Otherwise try to cluster, and if that + * doesn't do it then either do an async write (if O_DIRECT), + * or a delayed write (if not). + */ + if (ioflag & IO_SYNC) { (void)bwrite(bp); } else if (vm_page_count_severe() || @@ -516,6 +536,9 @@ } else { bawrite(bp); } + } else if (ioflag & IO_DIRECT) { + bp->b_flags |= B_CLUSTEROK; + bawrite(bp); } else { bp->b_flags |= B_CLUSTEROK; bdwrite(bp); Index: vm/vm_page.c =================================================================== RCS file: /home/ncvs/src/sys/vm/vm_page.c,v retrieving revision 1.147.2.6 diff -u -r1.147.2.6 vm_page.c --- vm/vm_page.c 2001/03/03 23:06:09 1.147.2.6 +++ vm/vm_page.c 2001/05/17 04:22:38 @@ -1353,6 +1353,31 @@ } /* + * vm_page_try_to_free() + * + * Attempt to free the page. If we cannot free it, we do nothing. + * 1 is returned on success, 0 on failure. 
+ */ + +int +vm_page_try_to_free(m) + vm_page_t m; +{ + if (m->dirty || m->hold_count || m->busy || m->wire_count || + (m->flags & (PG_BUSY|PG_UNMANAGED))) { + return(0); + } + vm_page_test_dirty(m); + if (m->dirty) + return(0); + vm_page_busy(m); + vm_page_protect(m, VM_PROT_NONE); + vm_page_free(m); + return(1); +} + + +/* * vm_page_cache * * Put the specified page onto the page cache queue (if appropriate). Index: vm/vm_page.h =================================================================== RCS file: /home/ncvs/src/sys/vm/vm_page.h,v retrieving revision 1.75.2.5 diff -u -r1.75.2.5 vm_page.h --- vm/vm_page.h 2000/12/30 01:51:11 1.75.2.5 +++ vm/vm_page.h 2001/05/17 04:23:05 @@ -406,6 +406,7 @@ vm_page_t vm_page_grab __P((vm_object_t, vm_pindex_t, int)); void vm_page_cache __P((register vm_page_t)); int vm_page_try_to_cache __P((vm_page_t)); +int vm_page_try_to_free __P((vm_page_t)); void vm_page_dontneed __P((register vm_page_t)); static __inline void vm_page_copy __P((vm_page_t, vm_page_t)); static __inline void vm_page_free __P((vm_page_t)); To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 22:58:35 2001 Delivered-To: freebsd-arch@freebsd.org Received: from blount.mail.mindspring.net (blount.mail.mindspring.net [207.69.200.226]) by hub.freebsd.org (Postfix) with ESMTP id 2E7C237B422 for ; Thu, 17 May 2001 22:58:30 -0700 (PDT) (envelope-from tlambert2@mindspring.com) Received: from mindspring.com (pool0518.cvx7-bradley.dialup.earthlink.net [209.178.166.8]) by blount.mail.mindspring.net (8.9.3/8.8.5) with ESMTP id BAA21884; Fri, 18 May 2001 01:58:15 -0400 (EDT) Message-ID: <3B04BA0D.8E0CAB90@mindspring.com> Date: Thu, 17 May 2001 22:58:37 -0700 From: Terry Lambert Reply-To: tlambert2@mindspring.com X-Mailer: Mozilla 4.7 [en]C-CCK-MCD {Sony} (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Matt Dillon Cc: Rik van Riel , Charles Randall , Roger Larsson , 
arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping References: <200105161754.f4GHsCd73025@earth.backplane.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: X-Loop: FreeBSD.ORG Matt Dillon wrote: > Terry's description of 'ld' mmap()ing and doing all > sorts of random seeking causing most UNIXes, including > FreeBSD, to have a brainfart if the dataset is too big > to fit in the cache is true as far as it goes, but > there really isn't much we can do about that situation > 'automatically'. Without hints, the system can't predict > the fact that it should be trying to cache the whole of > the object files being accessed randomly. A hint could > make performance much better... a simple madvise(... > MADV_SEQUENTIAL) on the mapped memory inside LD would > probably be beneficial, as would madvise(... MADV_WILLNEED). I don't understand how either of those things could help but make overall performance worse. The problem is the program in question is seeking all over the place, potentially multiple times, in order to avoid building the table in memory itself. For many symbols, like "printf", it will hit the area of the library containing their addresses many, many times. The problem in this case is _truly_ that the program in question is _really_ trying to optimize its performance at the expense of other programs in the system. The system _needs_ to make page-ins by this program come _at the expense of this program_, rather than thrashing all other programs out of core, only to have the quanta given to these (now higher priority) programs used to thrash the pages back in, instead of doing real work. The problem is what to do about this badly behaved program, so that the system itself doesn't spend unnecessary time undoing its evil, and so that other (well behaved) programs are not unfairly penalized.
Cutler suggested a working set quota (first in VMS, later in NT) to deal with these programs. -- Terry To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Thu May 17 23:20:35 2001 Delivered-To: freebsd-arch@freebsd.org Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id 734BD37B422 for ; Thu, 17 May 2001 23:20:32 -0700 (PDT) (envelope-from dillon@earth.backplane.com) Received: (from dillon@localhost) by earth.backplane.com (8.11.3/8.11.2) id f4I6KNd05878; Thu, 17 May 2001 23:20:23 -0700 (PDT) (envelope-from dillon) Date: Thu, 17 May 2001 23:20:23 -0700 (PDT) From: Matt Dillon Message-Id: <200105180620.f4I6KNd05878@earth.backplane.com> To: Terry Lambert Cc: Rik van Riel , Charles Randall , Roger Larsson , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping References: <200105161754.f4GHsCd73025@earth.backplane.com> <3B04BA0D.8E0CAB90@mindspring.com> Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: X-Loop: FreeBSD.ORG :I don't understand how either of those things could help :but make overall performance worse. : :The problem is the program in question is seeking all :over the place, potentially multiple times, in order :to avoid building the table in memory itself. : :For many symbols, like "printf", it will hit the area :of the library containing their addresses many, many :times. : :The problem in this case is _truly_ that the program in :question is _really_ trying to optimize its performance :at the expense of other programs in the system. The linker is seeking randomly as a side effect of the linking algorithm. It is not doing it on purpose to try to save memory. 
Forcing the VM system to think it's sequential causes the VM system to perform read-aheads, generally reducing the actual amount of physical seeking that must occur by increasing the size of the chunks read from disk. Even if the linker's dataset is huge, increasing the chunk size is beneficial because linkers ultimately access the entire object file anyway. Trying to save a few seeks is far more important than reading extra data and having to throw half of it away. :The problem is what to do about this badly behaved program, :so that the system itself doesn't spend unnecessary time :undoing its evil, and so that other (well behaved) programs :are not unfairly penalized. : :Cutler suggested a working set quota (first in VMS, later :in NT) to deal with these programs. : :-- Terry The problem is not the resident set size, it's the seeking that the program is causing as a matter of course. Be that as it may, the resident set size can be capped with the 'memoryuse' resource limit. The system imposes the specified limit only when the memory subsystem is under pressure. You can also reduce the amount of random seeking the linker does by ordering the object modules within the library to forward-reference the dependencies.
-Matt To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri May 18 1:31:35 2001 Delivered-To: freebsd-arch@freebsd.org Received: from nothing-going-on.demon.co.uk (pc-62-31-42-140-hy.blueyonder.co.uk [62.31.42.140]) by hub.freebsd.org (Postfix) with ESMTP id 30D9237B422; Fri, 18 May 2001 01:31:31 -0700 (PDT) (envelope-from nik@nothing-going-on.demon.co.uk) Received: (from nik@localhost) by nothing-going-on.demon.co.uk (8.11.3/8.11.3) id f4I8Tan10440; Fri, 18 May 2001 09:29:36 +0100 (BST) (envelope-from nik) Date: Fri, 18 May 2001 09:28:16 +0100 From: Nik Clayton To: Nik Clayton Cc: arch@freebsd.org Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer Message-ID: <20010518092815.B10344@catkin.nothing-going-on.org> References: <20010517121902.A3047@catkin.nothing-going-on.org> <20010518010720.A8037@catkin.nothing-going-on.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="ZfOjI3PrQbgiZnxM" Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010518010720.A8037@catkin.nothing-going-on.org>; from nik@freebsd.org on Fri, May 18, 2001 at 01:07:20AM +0100 Organization: FreeBSD Project Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: X-Loop: FreeBSD.ORG --ZfOjI3PrQbgiZnxM Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, May 18, 2001 at 01:07:20AM +0100, Nik Clayton wrote: > (c) Tweaked the output format. >=20 > Byte 1 Output format version (currently 1) > Byte 2 Width of the display at snapshot time, in characters > Byte 3 Depth of the display at snapshot time, in characters > Byte 4+ Snapshot data Following the suggestion of Jeroen C. 
van Gelderen this is now Bytes 1 thru 8 Literal "SCRSHOT_" Byte 9 Version number Byte 10 Remaining number of bytes in the header Byte 11 Width of the display Byte 12 Depth of the display So a dump of an 80x25 screen would start (in hex) 53 43 52 53 48 4f 54 5f 01 02 50 19=20 ----------------------- -- -- -- -- | | | | ` 25 decimal | | | `--- 80 decimal | | `------ 2 remaining bytes of header data | `--------- File format version 1 `------------------------ Literal "SCRSHOT_" N --=20 FreeBSD: The Power to Serve http://www.freebsd.org/ FreeBSD Documentation Project http://www.freebsd.org/docproj/ --- 15B8 3FFC DDB4 34B0 AA5F 94B7 93A8 0764 2C37 E375 --- --ZfOjI3PrQbgiZnxM Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.5 (FreeBSD) Comment: For info see http://www.gnupg.org iEYEARECAAYFAjsE3R4ACgkQk6gHZCw343Ue9QCcDkgdZKQnE4iNXLnhcy2WcK++ jsgAoIjdF4XJF956nEARtcEU5Ay8Hvwq =5b/e -----END PGP SIGNATURE----- --ZfOjI3PrQbgiZnxM-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri May 18 1:31:56 2001 Delivered-To: freebsd-arch@freebsd.org Received: from nothing-going-on.demon.co.uk (pc-62-31-42-140-hy.blueyonder.co.uk [62.31.42.140]) by hub.freebsd.org (Postfix) with ESMTP id 515F037B422 for ; Fri, 18 May 2001 01:31:53 -0700 (PDT) (envelope-from nik@nothing-going-on.demon.co.uk) Received: (from nik@localhost) by nothing-going-on.demon.co.uk (8.11.3/8.11.3) id f4I8AJd10354; Fri, 18 May 2001 09:10:19 +0100 (BST) (envelope-from nik) Date: Fri, 18 May 2001 09:09:57 +0100 From: Nik Clayton To: Matt Dillon Cc: Tor.Egge@fast.no, arch@FreeBSD.ORG Subject: Re: Final O_DIRECT patch (first stage, without rawread/rawwrite) Message-ID: <20010518090956.A10344@catkin.nothing-going-on.org> References: <200105162222.f4GMMpC81247@earth.backplane.com> <200105162331.BAA04708@midten.fast.no> <200105180440.f4I4eiB05429@earth.backplane.com> 
Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="EeQfGwPcQSOJBaQU" Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200105180440.f4I4eiB05429@earth.backplane.com>; from dillon@earth.backplane.com on Thu, May 17, 2001 at 09:40:44PM -0700 Organization: FreeBSD Project Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: X-Loop: FreeBSD.ORG --EeQfGwPcQSOJBaQU Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, May 17, 2001 at 09:40:44PM -0700, Matt Dillon wrote: > The patch below is for -stable. I will continue testing it on stable > through the weekend. I'll probably commit the -current version on the > weekend and the stable version the weekend after. Man page update? N --=20 FreeBSD: The Power to Serve http://www.freebsd.org/ FreeBSD Documentation Project http://www.freebsd.org/docproj/ --- 15B8 3FFC DDB4 34B0 AA5F 94B7 93A8 0764 2C37 E375 --- --EeQfGwPcQSOJBaQU Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.5 (FreeBSD) Comment: For info see http://www.gnupg.org iEYEARECAAYFAjsE2NQACgkQk6gHZCw343UzJgCeLScwpluw/fZCgOX6mB/hbAtD uxYAn1G08G2AM8FDsMFCL3pP+x+2Uh9X =MnZ5 -----END PGP SIGNATURE----- --EeQfGwPcQSOJBaQU-- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri May 18 3: 0:33 2001 Delivered-To: freebsd-arch@freebsd.org Received: from areilly.bpc-users.org (CPE-144-132-234-126.nsw.bigpond.net.au [144.132.234.126]) by hub.freebsd.org (Postfix) with SMTP id 2871737B423 for ; Fri, 18 May 2001 03:00:24 -0700 (PDT) (envelope-from areilly@bigpond.net.au) Received: (qmail 21171 invoked by uid 1000); 18 May 2001 10:00:16 -0000 From: "Andrew Reilly" Date: Fri, 18 May 2001 20:00:16 +1000 To: Matt Dillon Cc: Terry Lambert , Rik van Riel , Charles Randall , 
Roger Larsson , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Subject: Re: on load control / process swapping Message-ID: <20010518200016.A21017@gurney.reilly.home> References: <200105161754.f4GHsCd73025@earth.backplane.com> <3B04BA0D.8E0CAB90@mindspring.com> <200105180620.f4I6KNd05878@earth.backplane.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200105180620.f4I6KNd05878@earth.backplane.com>; from dillon@earth.backplane.com on Thu, May 17, 2001 at 11:20:23PM -0700 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: X-Loop: FreeBSD.ORG On Thu, May 17, 2001 at 11:20:23PM -0700, Matt Dillon wrote: >Terry wrote: > :The problem in this case is _truly_ that the program in > :question is _really_ trying to optimize its performance > :at the expense of other programs in the system. > > The linker is seeking randomly as a side effect of > the linking algorithm. It is not doing it on purpose to try > to save memory. Forcing the VM system to think it's > sequential causes the VM system to perform read-aheads, > generally reducing the actual amount of physical seeking > that must occur by increasing the size of the chunks > read from disk. Even if the linker's dataset is huge, > increasing the chunk size is beneficial because linkers > ultimately access the entire object file anyway. Trying > to save a few seeks is far more important than reading > extra data and having to throw half of it away. I know that this problem is real in the case of database index accesses---databases have data sets larger than RAM almost by definition---and that the problem (of dealing with "randomly" accessed memory mapped files) should be neatly solved in general. But is this issue of linking really the linchpin? Are there _any_ programs and library sets where the union of the code sizes is larger than physical memory?
I haven't looked at the problem myself, but (on the surface) it doesn't seem too likely. There is a grand total of 90M of .a files on my system (/usr/lib, /usr/X11/lib, and /usr/local/lib), and I doubt that even a majority of them would be needed at once. -- Andrew To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri May 18 6:56:20 2001 Delivered-To: freebsd-arch@freebsd.org Received: from helium.chromatix.org.uk (turnover.lancs.ac.uk [148.88.17.220]) by hub.freebsd.org (Postfix) with ESMTP id 1296B37B43C for ; Fri, 18 May 2001 06:56:12 -0700 (PDT) (envelope-from chromi@cyberspace.org) Received: from dolphin.chromatix.org.uk ([192.168.239.105]) by helium.chromatix.org.uk with esmtp (Exim 3.15 #5) id 150kd2-0007iU-00; Fri, 18 May 2001 14:49:16 +0100 X-Sender: chromi@helium.chromatix.org.uk Message-Id: In-Reply-To: <200105180620.f4I6KNd05878@earth.backplane.com> References: <200105161754.f4GHsCd73025@earth.backplane.com> <3B04BA0D.8E0CAB90@mindspring.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Fri, 18 May 2001 14:49:09 +0100 To: Matt Dillon , Terry Lambert From: Jonathan Morton Subject: Re: on load control / process swapping Cc: Rik van Riel , Charles Randall , Roger Larsson , arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG >:Cutler suggested a working set quota (first in VMS, later >:in NT) to deal with these programs. > The problem is not the resident set size, it's the > seeking that the program is causing as a matter of > course. The RSS of 'ld' isn't the problem, no. However, the working-set idea would place an effective and sensible limit of the size of the disk cache, by ensuring that other apps aren't being paged out beyond their non-working sets. 
Does this make sense? FWIW, I've been running with a 2-line hack in my kernel for some weeks now, which essentially forces the RSS of each process not to be forced below some arbitrary "fair share" of the physical memory available. It's not a very clean hack, but it improves performance by a very large margin under a thrashing load. The only problem I'm seeing is a deadlock when I run out of VM completely, but I think that's a separate issue that others are already working on. To others: is there already a means whereby we can (almost) calculate the WS of a given process? The "accessed" flag isn't a good one, but maybe the 'age' value is better. However, I haven't quite clicked on how the 'age' value is affected in either direction. -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@cyberspace.org (not for attachments) big-mail: chromatix@penguinpowered.com uni-mail: j.d.morton@lancaster.ac.uk The key to knowledge is not to rely on people to teach you it. Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/ -----BEGIN GEEK CODE BLOCK----- Version 3.12 GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? 
PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) -----END GEEK CODE BLOCK----- To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri May 18 7:12:27 2001 Delivered-To: freebsd-arch@freebsd.org Received: from whale.sunbay.crimea.ua (whale.sunbay.crimea.ua [212.110.138.65]) by hub.freebsd.org (Postfix) with ESMTP id DE7CA37B423; Fri, 18 May 2001 07:12:08 -0700 (PDT) (envelope-from ru@whale.sunbay.crimea.ua) Received: (from ru@localhost) by whale.sunbay.crimea.ua (8.11.2/8.11.2) id f4IEBBa86305; Fri, 18 May 2001 17:11:11 +0300 (EEST) (envelope-from ru) Date: Fri, 18 May 2001 17:11:11 +0300 From: Ruslan Ermilov To: Brian Somers , Warner Losh , Bruce Evans , cvs-all@FreeBSD.ORG, current@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG Subject: Re: Where to put include files (was: cvs commit: src Makefile.inc1) Message-ID: <20010518171110.D81893@sunbay.com> Mail-Followup-To: Brian Somers , Warner Losh , Bruce Evans , cvs-all@FreeBSD.ORG, current@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG References: <200105171637.f4HGbsb65668@hak.lan.Awfulhak.org> <20010517195251.A90318@sunbay.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010517195251.A90318@sunbay.com>; from ru@FreeBSD.org on Thu, May 17, 2001 at 07:52:51PM +0300 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, May 17, 2001 at 07:52:51PM +0300, Ruslan Ermilov wrote: [...] > > There are 59 Makefiles that have -I${.CURDIR}/(../)+sys in them. > All these are bogus. We should get rid of all of them (-I's). > > So far, I have found sbin/mount_* use headers from /sys/miscfs/ > that are not installed into /usr/include, but should be. Where > should these be installed? 
/usr/include/fs/ or > should we preserve the /usr/include/miscfs/ layout like in > /sys/miscfs? Modern fs'es install their headers into include/fs > and old ones in include/. > I have removed the -I${.CURDIR}/.../sys from the half of Makefiles that do not actually need it. Here is the rest of Makefiles that have the -I${.CURDIR}/.../sys in them, and it's currently required because they use headers from /sys that do not get installed into /usr/include (but should): sbin/atm/atm/Makefile sbin/atm/fore_dnld/Makefile sbin/atm/ilmid/Makefile sbin/mount_null/Makefile sbin/mount_portal/Makefile sbin/mount_umap/Makefile sbin/mount_union/Makefile sbin/vinum/Makefile usr.sbin/acpi/Makefile.inc very interesting example! usr.sbin/ancontrol/Makefile usr.sbin/dpt/dpt_ctlinfo/Makefile usr.sbin/dpt/dpt_ctls/Makefile usr.sbin/dpt/dpt_dm/Makefile usr.sbin/dpt/dpt_led/Makefile these even don't compile!!! usr.sbin/dpt/dpt_sig/Makefile usr.sbin/dpt/dpt_softc/Makefile usr.sbin/dpt/dpt_sysinfo/Makefile usr.sbin/mlxcontrol/Makefile usr.sbin/pciconf/Makefile usr.sbin/pnpinfo/Makefile usr.sbin/pstat/Makefile usr.sbin/raycontrol/Makefile usr.sbin/setkey/Makefile usr.sbin/sicontrol/Makefile Cheers, -- Ruslan Ermilov Oracle Developer/DBA, ru@sunbay.com Sunbay Software AG, ru@FreeBSD.org FreeBSD committer, +380.652.512.251 Simferopol, Ukraine http://www.FreeBSD.org The Power To Serve http://www.oracle.com Enabling The Information Age To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Fri May 18 7:54:17 2001 Delivered-To: freebsd-arch@freebsd.org Received: from whale.sunbay.crimea.ua (whale.sunbay.crimea.ua [212.110.138.65]) by hub.freebsd.org (Postfix) with ESMTP id 86D0037B422; Fri, 18 May 2001 07:54:04 -0700 (PDT) (envelope-from ru@whale.sunbay.crimea.ua) Received: (from ru@localhost) by whale.sunbay.crimea.ua (8.11.2/8.11.2) id f4IErZr91193; Fri, 18 May 2001 17:53:35 +0300 (EEST) (envelope-from ru) Date: Fri, 
18 May 2001 17:53:35 +0300
From: Ruslan Ermilov
To: Brian Somers, Warner Losh, Bruce Evans,
    cvs-all@FreeBSD.ORG, current@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG
Subject: Re: Where to put include files (was: cvs commit: src Makefile.inc1)
Message-ID: <20010518175335.A90576@sunbay.com>
Mail-Followup-To: Brian Somers, Warner Losh, Bruce Evans,
    cvs-all@FreeBSD.ORG, current@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG
References: <200105171637.f4HGbsb65668@hak.lan.Awfulhak.org>
    <20010517195251.A90318@sunbay.com> <20010518171110.D81893@sunbay.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010518171110.D81893@sunbay.com>; from ru@FreeBSD.org
    on Fri, May 18, 2001 at 05:11:11PM +0300
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Fri, May 18, 2001 at 05:11:11PM +0300, Ruslan Ermilov wrote:
> On Thu, May 17, 2001 at 07:52:51PM +0300, Ruslan Ermilov wrote:
> [...]
> >
> > There are 59 Makefiles that have -I${.CURDIR}/(../)+sys in them.
> > All these are bogus.  We should get rid of all of them (-I's).
> >
> > So far, I have found sbin/mount_* use headers from /sys/miscfs/
> > that are not installed into /usr/include, but should be.  Where
> > should these be installed?  /usr/include/fs/ or
> > should we preserve the /usr/include/miscfs/ layout like in
> > /sys/miscfs?  Modern fs'es install their headers into include/fs
> > and old ones in include/.
> >
> I have removed the -I${.CURDIR}/.../sys from the half of Makefiles
> that do not actually need it.  Here is the rest of Makefiles that
> have the -I${.CURDIR}/.../sys in them, and it's currently required
> because they use headers from /sys that do not get installed into
> /usr/include (but should):
> [...]
>
> sbin/mount_null/Makefile
> sbin/mount_portal/Makefile
> sbin/mount_umap/Makefile
> sbin/mount_union/Makefile
>
FS headers should go into /usr/include/fs/fs.h, one per
each filesystem.

Boris, could you please move smbfs.h one directory up from the
/usr/include/fs/smbfs/?  Also, installing of smbfs_node.h and
smbfs_subr.h seems to be not required as these are used solely
within the kernel.

Cheers,
-- 
Ruslan Ermilov          Oracle Developer/DBA,
ru@sunbay.com           Sunbay Software AG,
ru@FreeBSD.org          FreeBSD committer,
+380.652.512.251        Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age

From owner-freebsd-arch Fri May 18 14:53: 4 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from obsecurity.dyndns.org (adsl-63-207-60-32.dsl.lsan03.pacbell.net [63.207.60.32])
    by hub.freebsd.org (Postfix) with ESMTP
    id E702F37B42C; Fri, 18 May 2001 14:53:00 -0700 (PDT)
    (envelope-from kris@obsecurity.org)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
    id F339766D16; Fri, 18 May 2001 14:52:59 -0700 (PDT)
Date: Fri, 18 May 2001 14:52:59 -0700
From: Kris Kennaway
To: Nik Clayton
Cc: arch@freebsd.org
Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer
Message-ID: <20010518145259.A42294@xor.obsecurity.org>
References: <20010517121902.A3047@catkin.nothing-going-on.org>
    <20010518010720.A8037@catkin.nothing-going-on.org>
    <20010518092815.B10344@catkin.nothing-going-on.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-md5;
    protocol="application/pgp-signature"; boundary="r5Pyd7+fXNt84Ff3"
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010518092815.B10344@catkin.nothing-going-on.org>;
    from nik@freebsd.org on Fri, May 18, 2001 at 09:28:16AM +0100
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

--r5Pyd7+fXNt84Ff3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, May 18, 2001 at 09:28:16AM +0100, Nik Clayton wrote:
> On Fri, May 18, 2001 at 01:07:20AM +0100, Nik Clayton wrote:
> >   (c) Tweaked the output format.
> >
> >       Byte 1     Output format version (currently 1)
> >       Byte 2     Width of the display at snapshot time, in characters
> >       Byte 3     Depth of the display at snapshot time, in characters
> >       Byte 4+    Snapshot data
>
> Following the suggestion of Jeroen C. van Gelderen this is now
>
>     Bytes 1 thru 8    Literal "SCRSHOT_"
>     Byte 9            Version number
>     Byte 10           Remaining number of bytes in the header
>     Byte 11           Width of the display
>     Byte 12           Depth of the display

Would you like to submit a file(1) signature which detects these files
to Christos Zoulas?

Kris

--r5Pyd7+fXNt84Ff3
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.5 (FreeBSD)
Comment: For info see http://www.gnupg.org

iD8DBQE7BZm6Wry0BWjoQKURAsghAKDM7iGs/aafFj5Rrhqzuxh7ALeSmwCgvSpH
lHkvVG3qcFjMJM9Soj8/Se4=
=NCAj
-----END PGP SIGNATURE-----

--r5Pyd7+fXNt84Ff3--

From owner-freebsd-arch Fri May 18 16:35:40 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from nothing-going-on.demon.co.uk (pc-62-31-42-140-hy.blueyonder.co.uk [62.31.42.140])
    by hub.freebsd.org (Postfix) with ESMTP
    id 427EA37B422; Fri, 18 May 2001 16:35:36 -0700 (PDT)
    (envelope-from nik@nothing-going-on.demon.co.uk)
Received: (from nik@localhost)
    by nothing-going-on.demon.co.uk (8.11.3/8.11.3) id f4INOpY15673;
    Sat, 19 May 2001 00:24:51 +0100 (BST)
    (envelope-from nik)
Date: Sat, 19 May 2001 00:24:51 +0100
From: Nik Clayton
To: Kris Kennaway
Cc: Nik Clayton, arch@freebsd.org
Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer
Message-ID: <20010519002451.B15530@catkin.nothing-going-on.org>
References: <20010517121902.A3047@catkin.nothing-going-on.org>
    <20010518010720.A8037@catkin.nothing-going-on.org>
    <20010518092815.B10344@catkin.nothing-going-on.org>
    <20010518145259.A42294@xor.obsecurity.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-md5;
    protocol="application/pgp-signature"; boundary="0QFb0wBpEddLcDHQ"
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010518145259.A42294@xor.obsecurity.org>;
    from kris@obsecurity.org on Fri, May 18, 2001 at 02:52:59PM -0700
Organization: FreeBSD Project
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

--0QFb0wBpEddLcDHQ
Content-Type: multipart/mixed; boundary="FFoLq8A0u+X9iRU8"
Content-Disposition: inline

--FFoLq8A0u+X9iRU8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Fri, May 18, 2001 at 02:52:59PM -0700, Kris Kennaway wrote:
> > Following the suggestion of Jeroen C. van Gelderen this is now
> >
> >     Bytes 1 thru 8    Literal "SCRSHOT_"
> >     Byte 9            Version number
> >     Byte 10           Remaining number of bytes in the header
> >     Byte 11           Width of the display
> >     Byte 12           Depth of the display
>
> Would you like to submit a file(1) signature which detects these files
> to Christos Zoulas ?

Done.  I've attached it here as well for completeness.
N
-- 
FreeBSD: The Power to Serve             http://www.freebsd.org/
FreeBSD Documentation Project           http://www.freebsd.org/docproj/

          --- 15B8 3FFC DDB4 34B0 AA5F  94B7 93A8 0764 2C37 E375 ---

--FFoLq8A0u+X9iRU8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=magic

#
# Files generated by FreeBSD scrshot(1)/vidcontrol(1) utilities
#
0	string	SCRSHOT_	scrshot(1) screenshot,
>8	byte	x		version %d,
>9	byte	2		%d bytes in header,
>>10	byte	x		%d chars wide by
>>11	byte	x		%d chars high

--FFoLq8A0u+X9iRU8--

--0QFb0wBpEddLcDHQ
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.5 (FreeBSD)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjsFr0IACgkQk6gHZCw343XpTgCdF/iUnVTsdWkkuyoN4tAUSeKg
iXEAmwbCSSZ/SFHNcIu73t6yOMAHFDET
=Pxbt
-----END PGP SIGNATURE-----

--0QFb0wBpEddLcDHQ--

From owner-freebsd-arch Fri May 18 16:43: 5 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from kalaid.f2f.com.ua (kalaid.f2f.com.ua [62.149.0.33])
    by hub.freebsd.org (Postfix) with ESMTP
    id 175AA37B42C; Fri, 18 May 2001 16:42:40 -0700 (PDT)
    (envelope-from sobomax@mail-in.net)
Received: from mail.uic-in.net (root@[212.35.189.4])
    by kalaid.f2f.com.ua (8.11.3/8.11.1) with ESMTP id f4INhpE26880;
    Sat, 19 May 2001 02:43:53 +0300 (EEST)
    (envelope-from sobomax@mail-in.net)
Received: from notebook.vega.com (das0-l86.uic-in.net [212.35.189.213])
    by mail.uic-in.net (8.11.3/8.11.3) with ESMTP id f4INgJx36064;
    Sat, 19 May 2001 02:42:28 +0300 (EEST)
    (envelope-from sobomax@mail-in.net)
Date: Sat, 19 May 2001 02:42:28 +0300 (EEST)
Message-Id: <200105182342.f4INgJx36064@mail.uic-in.net>
To: nik@FreeBSD.org, ru@FreeBSD.org, audit@FreeBSD.org, arch@FreeBSD.org
From: Maxim Sobolev
Reply-To: sobomax@FreeBSD.org
Subject: Integrating new scrshot(1) utility into vidcontrol(1) [patch for review]
X-Mailer: Pygmy
(v0.5.7)
Content-type: text/plain
Content-Transfer-Encoding: quoted-printable
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

Ok, as it was agreed I've integrated scrshot(1) into vidcontrol(1) and
also added ability to dump contents of display buffer in plain text
format, so you don't even need a special utility to see what's going on
on a console of display-less machine. :-)

Please somebody review attached patches (esp. manpage).

Thank you!

-Maxim

Index: vidcontrol.1
===================================================================
RCS file: /home/ncvs/src/usr.sbin/vidcontrol/vidcontrol.1,v
retrieving revision 1.34
diff -d -u -r1.34 vidcontrol.1
--- vidcontrol.1	2001/04/18 15:51:56	1.34
+++ vidcontrol.1	2001/05/18 23:32:37
@@ -36,6 +36,8 @@
 .Op Fl M Ar char
 .Op Fl m Cm on | off
 .Op Fl r Ar foreground Ar background
+.Op Fl p
+.Op Fl P
 .Op Fl s Ar number
 .Op Fl t Ar N | Cm off
 .Op Fl x
@@ -185,6 +187,21 @@
 Used together with the
 .Xr moused 8
 daemon for text mode cut & paste functionality.
+.It Fl p
+Capture the current contents of the video buffer corresponding
+to the terminal device referred to by standard input.
+.Nm
+writes contents of the video buffer to the standard
+output in a raw binary format. For details about that
+format see
+.Sx Format of Video Buffer Dump
+below.
+.It Fl P
+Same as
+.Fl p ,
+but dump contents of the video buffer in a plain text format
+ignoring nonprintable characters and information about text
+attributes.
 .It Fl r Ar foreground background
 Change reverse mode colors to
 .Ar foreground
@@ -253,6 +270,106 @@
 option. See
 .Xr syscons 4
 for more details on this kernel option.
+.Ss Format of Video Buffer Dump
+The
+.Nm
+utility uses the
+.Xr syscons 4
+.Dv CONS_SCRSHOT
+.Xr ioctl 2
+to capture the current contents of the video buffer.
+.Nm
+writes version and additional information to the standard
+output, followed by the contents of the terminal device.
+.Pp
+PC video memory is typically arranged in two byte tuples,
+one per character position. In each tuple, the first byte
+will be the character code, and the second byte is the
+character's color attribute.
+.Pp
+The color attribute byte is further broken down in to the
+low nibble, which specifies which of 16 different foreground
+colors is active, and the high nibble, which specifies which
+of 16 different background colors is active.
+.Pp
+.Bl -hang -offset indent -compact
+.It 0
+Black
+.It 1
+Blue
+.It 2
+Green
+.It 3
+Cyan
+.It 4
+Red
+.It 5
+Magenta
+.It 6
+Brown
+.It 7
+White
+.It 8
+Grey
+.It 9
+Light Blue
+.It 10
+Light Green
+.It 11
+Light Cyan
+.It 12
+Light Red
+.It 13
+Light Magenta
+.It 14
+Yellow
+.It 15
+White
+.El
+.Pp
+It can be seen that the last 8 colors are brighter
+versions of the first 8.
+.Pp
+For example, the two bytes
+.Pp
+.Dl "65 158"
+.Pp
+specify an uppercase A (character code 65), in
+yellow (low nibble 14) on a light blue background
+(high nibble 9).
+.Pp
+The
+.Nm
+output contains a small header which includes additional
+information which may be useful to utilities processing
+the output.
+.Pp
+The first 10 bytes are always arranged as follows:
+.Bl -column "Byte range" "Contents" -offset indent
+.It Sy "Byte Range	Contents"
+.It "1 thru 8	Literal text" Dq Li SCRSHOT_
+.It "9	File format version number"
+.It "10	Remaining number of bytes in the header"
+.El
+.Pp
+Subsequent bytes depend on the version number.
+.Bl -column "Version" "13 and up" -offset indent
+.It Sy "Version	Byte	Meaning"
+.It "1	11	Terminal width, in characters"
+.It "	12	Terminal depth, in characters"
+.It "	13 and up	The snapshot data"
+.El
+.Pp
+So a dump of an 80x25 screen would start (in hex)
+.Bd -literal -offset indent
+53 43 52 53 48 4f 54 5f 01 02 50 19
+----------------------- -- -- -- --
+          |              |  |  |  ` 25 decimal
+          |              |  |  `--- 80 decimal
+          |              |  `------ 2 remaining bytes of header data
+          |              `--------- File format version 1
+          `------------------------ Literal "SCRSHOT_"
+.Ed
 .Sh VIDEO OUTPUT CONFIGURATION
 .Ss Boot Time Configuration
 You may set the following variables in
@@ -329,6 +446,18 @@
 some LCD models):
 .Pp
 .Dl vidcontrol -g 100x37 VESA_800x600
+.Pp
+The following command will capture the contents of the first virtual
+terminal, and redirect the output to the
+.Pa shot.scr
+file:
+.Pp
+.Dl vidcontrol -p < /dev/ttyv0 > shot.scr
+.Pp
+The following command will dump contents of the fourth virtual terminal
+to the standard output in the human readable format:
+.Pp
+.Dl vidcontrol -P < /dev/ttyv3
 .Sh SEE ALSO
 .Xr kbdcontrol 1 ,
 .Xr vidfont 1 ,
@@ -339,5 +468,13 @@
 .Xr rc.conf 5 ,
 .Xr kldload 8 ,
 .Xr moused 8
+.Xr watch 8
+.Pp
+The various
+.Li shot2*
+utilities in the
+.Li textproc
+category of the
+.Em "Ports Collection" .
 .Sh AUTHORS
 .An S\(/oren Schmidt Aq sos@FreeBSD.org
Index: vidcontrol.c
===================================================================
RCS file: /home/ncvs/src/usr.sbin/vidcontrol/vidcontrol.c,v
retrieving revision 1.36
diff -d -u -r1.36 vidcontrol.c
--- vidcontrol.c	2001/04/21 13:50:32	1.36
+++ vidcontrol.c	2001/05/18 23:32:37
@@ -50,6 +50,11 @@
 #define _VESA_800x600_DFL_ROWS 25
 #define _VESA_800x600_DFL_FNSZ 16
 
+#define DUMP_RAW	0
+#define DUMP_TXT	1
+
+#define DUMP_FMT_REV	1
+
 char 	legal_colors[16][16] = {
 	"black", "blue", "green", "cyan",
 	"red", "magenta", "brown", "white",
@@ -70,8 +75,8 @@
 	fprintf(stderr, "%s\n%s\n%s\n%s\n",
 "usage: vidcontrol [-r fg bg] [-b color] [-c appearance] [-d] [-l scrmap]",
 "                  [-i adapter | mode] [-L] [-M char] [-m on|off]",
-"                  [-f size file] [-s number] [-t N|off] [-x] [-g geometry]", 
-"                  [mode] [fgcol [bgcol]] [show]");
+"                  [-f size file] [-s number] [-t N|off] [-x] [-g geometry]",
+"                  [-p] [-P] [mode] [fgcol [bgcol]] [show]");
 	exit(1);
 }
 
@@ -638,6 +643,77 @@
 		info.mv_rev.fore, info.mv_rev.back);
 }
 
+/*
+ * Snapshot the video memory of that terminal, using the CONS_SCRSHOT
+ * ioctl, and writes the results to stdout either in the special
+ * binary format (see manual page for details), or in the plain
+ * text format.
+ */
+void
+dump_screen(int mode)
+{
+	scrshot_t shot;
+	vid_info_t info;
+
+	info.size = sizeof(info);
+	if (ioctl(0, CONS_GETINFO, &info) == -1) {
+		warn("failed to obtain current video mode parameters");
+		return;
+	}
+
+	shot.buf = alloca(info.mv_csz * info.mv_rsz * sizeof(u_int16_t));
+	if (shot.buf == NULL) {
+		warn("failed to allocate memory for dump");
+		return;
+	}
+
+	shot.xsize = info.mv_csz;
+	shot.ysize = info.mv_rsz;
+	if (ioctl(0, CONS_SCRSHOT, &shot) == -1) {
+		warn("failed to get dump of the screen");
+		return;
+	}
+
+	if (mode == DUMP_RAW) {
+		printf("SCRSHOT_%c%c%c%c", DUMP_FMT_REV, 2,
+		    shot.xsize, shot.ysize);
+		fflush(stdout);
+
+		(void)write(STDOUT_FILENO, shot.buf,
+			    shot.xsize * shot.ysize * sizeof(u_int16_t));
+	} else {
+		char *line;
+		int x, y;
+		u_int16_t ch;
+
+		line = alloca(shot.xsize + 1);
+		if (line == NULL) {
+			warn("failed to allocate memory for line buffer");
+			return;
+		}
+
+		for (y = 0; y < shot.ysize; y++) {
+			for (x = 0; x < shot.xsize; x++) {
+				ch = shot.buf[x + (y * shot.xsize)];
+				ch &= 0xff;
+				if (isprint(ch) == 0)
+					ch = ' ';
+				line[x] = (char)ch;
+			}
+
+			/* Trim trailing spaces */
+			do {
+				line[x--] = '\0';
+			} while (line[x] == ' ' && x != 0);
+
+			puts(line);
+		}
+		fflush(stdout);
+	}
+
+	return;
+}
+
 int
 main(int argc, char **argv)
 {
@@ -648,7 +724,7 @@
 	info.size = sizeof(info);
 	if (ioctl(0, CONS_GETINFO, &info) < 0)
 		err(1, "must be on a virtual console");
-	while((opt = getopt(argc, argv, "b:c:df:g:i:l:LM:m:r:s:t:x")) != -1)
+	while((opt = getopt(argc, argv, "b:c:df:g:i:l:LM:m:pPr:s:t:x")) != -1)
 		switch(opt) {
 			case 'b':
 				set_border_color(optarg);
@@ -689,6 +765,12 @@
 				break;
 			case 'm':
 				set_mouse(optarg);
+				break;
+			case 'p':
+				dump_screen(DUMP_RAW);
+				break;
+			case 'P':
+				dump_screen(DUMP_TXT);
 				break;
 			case 'r':
 				set_reverse_colors(argc, argv, &optind);

From owner-freebsd-arch Fri May 18 19:18:57 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from netbank.com.br (garrincha.netbank.com.br [200.203.199.88])
    by hub.freebsd.org (Postfix) with ESMTP id 921C137B440
    for ; Fri, 18 May 2001 19:18:52 -0700 (PDT)
    (envelope-from riel@conectiva.com.br)
Received: from surriel.ddts.net (1-135.ctame701-1.telepar.net.br [200.181.137.135])
    by netbank.com.br (Postfix) with ESMTP
    id 13D5B46815; Fri, 18 May 2001 23:17:22 -0300 (BRST)
Received: from localhost (pexatx@localhost [127.0.0.1])
    by surriel.ddts.net (8.11.3/8.11.2) with ESMTP id f4J2Ibu02011;
    Fri, 18 May 2001 23:18:37 -0300
Date: Fri, 18 May 2001 23:18:37 -0300 (BRST)
From: Rik van Riel
X-Sender: riel@imladris.rielhome.conectiva
To: Jonathan Morton
Cc: Matt Dillon, Terry Lambert, Charles Randall, Roger Larsson,
    arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu
Subject: Re: on load control / process swapping
In-Reply-To:
Message-ID:
X-spambait: aardvark@kernelnewbies.org
X-spammeplease: aardvark@nl.linux.org
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Fri, 18 May 2001, Jonathan Morton wrote:

> FWIW, I've been running with a 2-line hack in my kernel for some weeks
> now, which essentially forces the RSS of each process not to be forced
> below some arbitrary "fair share" of the physical
memory available.
> It's not a very clean hack, but it improves performance by a very
> large margin under a thrashing load.  The only problem I'm seeing is a
> deadlock when I run out of VM completely, but I think that's a
> separate issue that others are already working on.

I'm pretty sure I know what you're running into.

Say you guarantee a minimum of 3% of memory for each process;
now when you have 30 processes running your memory is full and
you cannot reclaim any pages when one of the processes runs
into a page fault.

The minimum RSS guarantee is a really nice thing to prevent the
proverbial root shell from thrashing, but it really only works
if you drop such processes every once in a while and swap them
out completely. You especially need to do this when you're
getting tight on memory and you have idle processes sitting
around using their minimum RSS worth of RAM ;)

It'd work great together with load control though. I guess I
should post a patch for - simple&naive - load control code once
I've got the inodes and the dirty page writeout code balancing
fixed.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/         http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)

From owner-freebsd-arch Fri May 18 19:57:15 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from helium.chromatix.org.uk (turnover.lancs.ac.uk [148.88.17.220])
    by hub.freebsd.org (Postfix) with ESMTP id BA2E837B42C
    for ; Fri, 18 May 2001 19:57:11 -0700 (PDT)
    (envelope-from chromi@cyberspace.org)
Received: from dolphin.chromatix.org.uk ([192.168.239.105])
    by helium.chromatix.org.uk with esmtp (Exim 3.15 #5)
    id 150wuv-0008A3-00; Sat, 19 May 2001 03:56:33 +0100
X-Sender: chromi@helium.chromatix.org.uk
Message-Id:
In-Reply-To:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Sat, 19 May 2001 03:56:14 +0100
To: Rik van Riel
From: Jonathan Morton
Subject: Re: on load control / process swapping
Cc: Matt Dillon, Terry Lambert, Charles Randall, Roger Larsson,
    arch@FreeBSD.ORG, linux-mm@kvack.org, sfkaplan@cs.amherst.edu
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

>> FWIW, I've been running with a 2-line hack in my kernel for some weeks
>> now, which essentially forces the RSS of each process not to be forced
>> below some arbitrary "fair share" of the physical memory available.
>> It's not a very clean hack, but it improves performance by a very
>> large margin under a thrashing load.  The only problem I'm seeing is a
>> deadlock when I run out of VM completely, but I think that's a
>> separate issue that others are already working on.
>
>I'm pretty sure I know what you're running into.
>
>Say you guarantee a minimum of 3% of memory for each process;
>now when you have 30 processes running your memory is full and
>you cannot reclaim any pages when one of the processes runs
>into a page fault.

Actually I already thought of that one, and made it a "fair share" of the
system rather than a fixed amount.  IOW, the guaranteed amount is something
like (total_memory / nr_processes).  I think I was even sane enough to
lower this value slightly to allow for some buffer/cache memory, but I
didn't allow for locked pages (including the kernel itself).

The deadlock happened when the swap ran out, not the physical RAM, and is
independent of this particular hack - remember I'm running with some
out_of_memory() fixes and some other hackery I did a month or so ago
(remember that massive "OOM killer" thread?).  I should try to figure those
out and present cleaned-up versions for further perusal...

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V?
PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

From owner-freebsd-arch Fri May 18 20:30:35 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from obsecurity.dyndns.org (adsl-63-207-60-32.dsl.lsan03.pacbell.net [63.207.60.32])
    by hub.freebsd.org (Postfix) with ESMTP
    id AED8937B424; Fri, 18 May 2001 20:30:24 -0700 (PDT)
    (envelope-from kris@obsecurity.org)
Received: by obsecurity.dyndns.org (Postfix, from userid 1000)
    id 4DA7067B2A; Fri, 18 May 2001 20:30:24 -0700 (PDT)
Date: Fri, 18 May 2001 20:30:24 -0700
From: Kris Kennaway
To: Greg Lehey
Cc: arch@FreeBSD.org, Ruslan Ermilov, cvs-committers@FreeBSD.org,
    cvs-all@FreeBSD.org
Subject: Compiler-neutral warning flags
Message-ID: <20010518203024.A20917@xor.obsecurity.org>
Reply-To: arch@FreeBSD.org
References: <200105181040.f4IAeYi56574@freefall.freebsd.org>
    <20010519111635.I7513@wantadilla.lemis.com>
    <20010518191949.A2362@xor.obsecurity.org>
    <20010519124613.C64759@wantadilla.lemis.com>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-md5;
    protocol="application/pgp-signature"; boundary="MGYHOYXEY6WxJCY8"
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010519124613.C64759@wantadilla.lemis.com>;
    from grog@lemis.com on Sat, May 19, 2001 at 12:46:13PM +0930
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

--MGYHOYXEY6WxJCY8
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

On Sat, May 19, 2001 at 12:46:13PM +0930, Greg Lehey wrote:
> On Friday, 18 May 2001 at 19:19:49 -0700, Kris Kennaway wrote:
> > On Sat, May 19, 2001 at 11:16:35AM +0930, Greg Lehey wrote:
> >> On Friday, 18 May 2001 at  3:40:34 -0700, Ruslan Ermilov wrote:
> >>> ru          2001/05/18
03:40:34 PDT
> >>>
> >>>   Modified files:
> >>>     usr.bin/scrshot      Makefile
> >>>   Log:
> >>>   Remove GCC-ism (-Wall).
> >>
> >> I suspect I've missed something here.  What's wrong with -Wall?
> >
> > It's a GCC-ism.
>
> I thought we were using gcc.  What's the flag that provides equivalent
> functionality?

We don't have a compiler-neutral way to do this.

It would be great to be able to turn on -Werror and -Wall when
building with gcc, as various parts of the tree get clean, to prevent
the introduction of new warnings.

NetBSD have a way to enable compiler warning flags (and other related
stuff, like compiling in debugging assertions) which we should
probably look into.  I can take a look at this over the weekend.

Kris

--MGYHOYXEY6WxJCY8
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.5 (FreeBSD)
Comment: For info see http://www.gnupg.org

iD8DBQE7BejPWry0BWjoQKURAroJAJ9nKS9pMA4PpHPWaBTJ+vVJeREs1wCgihow
euw4LqgrapshM+CNyjP0Q8k=
=Xv5x
-----END PGP SIGNATURE-----

--MGYHOYXEY6WxJCY8--

From owner-freebsd-arch Fri May 18 21:22:16 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from wantadilla.lemis.com (wantadilla.lemis.com [192.109.197.80])
    by hub.freebsd.org (Postfix) with ESMTP
    id 06F3837B422; Fri, 18 May 2001 21:21:53 -0700 (PDT)
    (envelope-from grog@lemis.com)
Received: by wantadilla.lemis.com (Postfix, from userid 1004)
    id F26846ACBE; Sat, 19 May 2001 13:26:58 +0930 (CST)
Date: Sat, 19 May 2001 13:26:58 +0930
From: Greg Lehey
To: arch@FreeBSD.org
Cc: Ruslan Ermilov, cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: Compiler-neutral warning flags
Message-ID: <20010519132658.H64759@wantadilla.lemis.com>
References: <200105181040.f4IAeYi56574@freefall.freebsd.org>
    <20010519111635.I7513@wantadilla.lemis.com>
    <20010518191949.A2362@xor.obsecurity.org>
    <20010519124613.C64759@wantadilla.lemis.com>
    <20010518203024.A20917@xor.obsecurity.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010518203024.A20917@xor.obsecurity.org>;
    from kris@obsecurity.org on Fri, May 18, 2001 at 08:30:24PM -0700
Organization: LEMIS, PO Box 460, Echunga SA 5153, Australia
Phone: +61-8-8388-8286
Fax: +61-8-8388-8725
Mobile: +61-418-838-708
WWW-Home-Page: http://www.lemis.com/~grog
X-PGP-Fingerprint: 6B 7B C3 8C 61 CD 54 AF 13 24 52 F8 6D A4 95 EF
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Friday, 18 May 2001 at 20:30:24 -0700, Kris Kennaway wrote:
> On Sat, May 19, 2001 at 12:46:13PM +0930, Greg Lehey wrote:
>> On Friday, 18 May 2001 at 19:19:49 -0700, Kris Kennaway wrote:
>>> On Sat, May 19, 2001 at 11:16:35AM +0930, Greg Lehey wrote:
>>>> On Friday, 18 May 2001 at  3:40:34 -0700, Ruslan Ermilov wrote:
>>>>> ru          2001/05/18 03:40:34 PDT
>>>>>
>>>>>   Modified files:
>>>>>     usr.bin/scrshot      Makefile
>>>>>   Log:
>>>>>   Remove GCC-ism (-Wall).
>>>>
>>>> I suspect I've missed something here.  What's wrong with -Wall?
>>>
>>> It's a GCC-ism.
>>
>> I thought we were using gcc.  What's the flag that provides equivalent
>> functionality?
>
> We don't have a compiler-neutral way to do this.
>
> It would be great to be able to turn on -Werror and -Wall when
> building with gcc, as various parts of the tree get clean, to prevent
> the introduction of new warnings.

Have I missed more than I thought?  Are we contemplating using a
different compiler?  That would make it more reasonable.

> NetBSD have a way to enable compiler warning flags (and other related
> stuff, like compiling in debugging assertions) which we should
> probably look into.  I can take a look at this over the weekend.

Sounds good.

Greg
--
Finger grog@lemis.com for PGP public key
See complete headers for address and phone numbers

From owner-freebsd-arch Fri May 18 22: 4:40 2001
Delivered-To: freebsd-arch@freebsd.org
Received: from dragon.nuxi.com (trang.nuxi.com [209.152.133.57])
    by hub.freebsd.org (Postfix) with ESMTP
    id 18D3A37B424; Fri, 18 May 2001 22:04:36 -0700 (PDT)
    (envelope-from obrien@NUXI.com)
Received: (from obrien@localhost)
    by dragon.nuxi.com (8.11.3/8.11.1) id f4J527620078;
    Fri, 18 May 2001 22:02:07 -0700 (PDT)
    (envelope-from obrien)
Date: Fri, 18 May 2001 22:02:07 -0700
From: "David O'Brien"
To: Greg Lehey
Cc: arch@FreeBSD.org, Ruslan Ermilov, cvs-committers@FreeBSD.org,
    cvs-all@FreeBSD.org
Subject: Re: Compiler-neutral warning flags
Message-ID: <20010518220207.A20060@dragon.nuxi.com>
Reply-To: obrien@FreeBSD.org
References: <200105181040.f4IAeYi56574@freefall.freebsd.org>
    <20010519111635.I7513@wantadilla.lemis.com>
    <20010518191949.A2362@xor.obsecurity.org>
    <20010519124613.C64759@wantadilla.lemis.com>
    <20010518203024.A20917@xor.obsecurity.org>
    <20010519132658.H64759@wantadilla.lemis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010519132658.H64759@wantadilla.lemis.com>;
    from grog@lemis.com on Sat, May 19, 2001 at 01:26:58PM +0930
X-Operating-System: FreeBSD 5.0-CURRENT
Organization: The NUXI BSD group
X-Pgp-Rsa-Fingerprint: B7 4D 3E E9 11 39 5F A3  90 76 5D 69 58 D9 98 7A
X-Pgp-Rsa-Keyid: 1024/34F9F9D5
Sender: owner-freebsd-arch@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

On Sat, May 19, 2001 at 01:26:58PM +0930, Greg Lehey wrote:
> Have I missed more than I thought?  Are we contemplating using a
> different compiler?  That would make it more reasonable.

It is a lofty goal.
But it also affects tools like lint, as other tools do not know what to do with -Wall. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat May 19 8:38:57 2001 Delivered-To: freebsd-arch@freebsd.org Received: from warsaw.newsscope.com (warsaw.newsscope.com [64.21.68.186]) by hub.freebsd.org (Postfix) with ESMTP id 69C0137B42C; Sat, 19 May 2001 08:38:54 -0700 (PDT) (envelope-from chuck@newsscope.com) Received: from localhost (chuck@localhost) by warsaw.newsscope.com (8.11.1/8.11.1) with ESMTP id f4JH8xd63820; Sat, 19 May 2001 13:08:59 -0400 (EDT) (envelope-from chuck@newsscope.com) Date: Sat, 19 May 2001 13:08:59 -0400 (EDT) From: Chuck Youse To: Terry Lambert Cc: Peter Pentchev , Ruslan Ermilov , Nik Clayton , arch@FreeBSD.ORG Subject: Re: [PATCH] syscons ioctl() to grab text mode buffer In-Reply-To: <200105171816.LAA09113@usr05.primenet.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Thu, 17 May 2001, Terry Lambert wrote: > On a practical note, the code generated merely inverts the sense > of the same cmpl at default optimization, and at -O2 ends up being > either: > > testl %eax,%eax > jge .L3 > > or: > > cmpl $-1,%eax > jne .L3 > > So the number of instruction cycles is identical. Off the top of Well, the second sequence is longer by a byte. I don't have the data sheets handy, it probably also takes another cycle as it involves an immediate operand sign-extended to 32-bits instead of just a register-register compare. 
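A rough illustration of the two return-value tests that would plausibly compile to the sequences quoted above. The function names are invented for this sketch, and the assumption that the code under discussion was choosing between `< 0` and `== -1` is inferred from the assembly, not stated in the thread.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: treat any negative return value as failure.
 * A sign test like this is what "testl %eax,%eax; jge" implements.
 */
bool
failed_if_negative(int rv)
{
	return (rv < 0);
}

/*
 * Hypothetical helper: treat only -1 as failure.
 * An equality test like this is what "cmpl $-1,%eax; jne" implements.
 */
bool
failed_if_minus_one(int rv)
{
	return (rv == -1);
}
```

The two tests agree for 0 and -1 and differ only for other negative values. In the standard IA-32 encodings, `testl %eax,%eax` assembles to two bytes (85 C0) and `cmpl $-1,%eax` to three (83 F8 FF, the sign-extended 8-bit immediate form), which matches the one-byte difference mentioned above. Whether there is any cycle difference depends on the processor.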
Just nitpicking Chuck To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message From owner-freebsd-arch Sat May 19 8:59:20 2001 Delivered-To: freebsd-arch@freebsd.org Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by hub.freebsd.org (Postfix) with ESMTP id 2921537B422; Sat, 19 May 2001 08:59:14 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from bde.zeta.org.au (bde.zeta.org.au [203.2.228.102]) by mailman.zeta.org.au (8.9.3/8.8.7) with ESMTP id BAA15690; Sun, 20 May 2001 01:59:10 +1000 Date: Sun, 20 May 2001 01:57:45 +1000 (EST) From: Bruce Evans X-Sender: bde@besplex.bde.org To: Ruslan Ermilov Cc: Brian Somers , Warner Losh , cvs-all@FreeBSD.ORG, current@FreeBSD.ORG, freebsd-arch@FreeBSD.ORG Subject: Re: Where to put include files (was: cvs commit: src Makefile.inc1) In-Reply-To: <20010518175335.A90576@sunbay.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG On Fri, 18 May 2001, Ruslan Ermilov wrote: > > sbin/mount_null/Makefile > > sbin/mount_portal/Makefile > > sbin/mount_umap/Makefile > > sbin/mount_union/Makefile > > > FS headers should go into /usr/include/fs/fs.h, one per > each filesystem. without a slash? This isn't so clear. Lots of headers may be needed for _fsck. 
Bruce

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sat May 19 11:52:51 2001 Delivered-To: freebsd-arch@freebsd.org Received: from nothing-going-on.demon.co.uk (pc-62-31-42-140-hy.blueyonder.co.uk [62.31.42.140]) by hub.freebsd.org (Postfix) with ESMTP id F203837B424; Sat, 19 May 2001 11:52:42 -0700 (PDT) (envelope-from nik@nothing-going-on.demon.co.uk) Received: (from nik@localhost) by nothing-going-on.demon.co.uk (8.11.3/8.11.3) id f4JIiZ522306; Sat, 19 May 2001 19:44:35 +0100 (BST) (envelope-from nik) Date: Sat, 19 May 2001 19:44:35 +0100 From: Nik Clayton To: sobomax@FreeBSD.org Cc: nik@FreeBSD.org, ru@FreeBSD.org, audit@FreeBSD.org, arch@FreeBSD.org Subject: Re: Integrating new scrshot(1) utility into vidcontrol(1) [patch for review] Message-ID: <20010519194435.A22224@catkin.nothing-going-on.org> References: <200105182342.f4INgJx36064@mail.uic-in.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="h31gzZEtNLTqOjlF" Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200105182342.f4INgJx36064@mail.uic-in.net>; from sobomax@mail-in.net on Sat, May 19, 2001 at 02:42:28AM +0300 Organization: FreeBSD Project Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

--h31gzZEtNLTqOjlF Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sat, May 19, 2001 at 02:42:28AM +0300, Maxim Sobolev wrote:
> Ok, as it was agreed I've integrated scrshot(1) into
> vidcontrol(1)

Any opinions on selecting this option by default if vidcontrol is
invoked as scrshot?

> and also added ability to dump contents
> of display buffer in plain text format, so you don't
> even need a special utility to see what's going on
> on a console of display-less machine. :-)

Isn't that what watch(8) is for?

[...]

> +	if (mode == DUMP_RAW) {
> +		printf("SCRSHOT_%c%c%c%c", DUMP_FMT_REV, 2,
> +		    shot.xsize, shot.ysize);
> +		fflush(stdout);

You've duplicated a bug of mine, "SCRSHOT_" should probably be a #define.

[...]

> +			/* Trim trailing spaces */
> +			do {
> +				line[x--] = '\0';
> +			} while (line[x] == ' ' && x != 0);

I'm not sure that's necessary (nor is the trimming of notionally
unprintable characters). The point was to get an accurate dump of the
video buffer contents.

We've already had requests to automatically convert the 8 bit line
drawing characters into 7 bit ones (which shot2txt now does). I'd
prefer to see all the post-processing done outside of vidcontrol,
otherwise you're on a slope in terms of what functionality is deemed
acceptable for vidcontrol vs. what functionality has to be put in
another utility.

Apart from those quibbles, great. Thanks for doing the work.

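For reference, the trailing-space trimming quoted above can be sketched as a standalone function. This is a minimal illustration, not the code from the patch: `trim_trailing_spaces` is a hypothetical name, and it assumes each row has already been copied into a NUL-terminated buffer.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: strip trailing spaces
 * from a NUL-terminated row in place and return the new length.
 */
size_t
trim_trailing_spaces(char *line)
{
	size_t len = strlen(line);

	/* Walk back from the end, overwriting each trailing space. */
	while (len > 0 && line[len - 1] == ' ')
		line[--len] = '\0';
	return (len);
}
```

Checking `len > 0` before looking at the previous character means a row consisting only of spaces is reduced to an empty string, whereas the quoted do/while stops once `x` reaches 0 and so appears to leave column 0 untouched.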
N

--
FreeBSD: The Power to Serve http://www.freebsd.org/
FreeBSD Documentation Project http://www.freebsd.org/docproj/
--- 15B8 3FFC DDB4 34B0 AA5F 94B7 93A8 0764 2C37 E375 ---

--h31gzZEtNLTqOjlF Content-Type: application/pgp-signature Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.5 (FreeBSD)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjsGvxIACgkQk6gHZCw343UegACdFjYxkdhupSRto5zIe+2aebon
yF0AnAm4iXYRsNbmgvNeiAyqxgvZRL5+
=5eXE
-----END PGP SIGNATURE-----

--h31gzZEtNLTqOjlF--

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sat May 19 12:40:18 2001 Delivered-To: freebsd-arch@freebsd.org Received: from kalaid.f2f.com.ua (kalaid.f2f.com.ua [62.149.0.33]) by hub.freebsd.org (Postfix) with ESMTP id C709937B422; Sat, 19 May 2001 12:39:53 -0700 (PDT) (envelope-from sobomax@mail-in.net) Received: from mail.uic-in.net (root@[212.35.189.4]) by kalaid.f2f.com.ua (8.11.3/8.11.1) with ESMTP id f4JJfDE44190; Sat, 19 May 2001 22:41:15 +0300 (EEST) (envelope-from sobomax@mail-in.net) Received: from notebook.vega.com (das0-l92.uic-in.net [212.35.189.219]) by mail.uic-in.net (8.11.3/8.11.3) with ESMTP id f4JJdjr00870; Sat, 19 May 2001 22:39:47 +0300 (EEST) (envelope-from sobomax@mail-in.net) Date: Sat, 19 May 2001 22:39:47 +0300 (EEST) Message-Id: <200105191939.f4JJdjr00870@mail.uic-in.net> To: nik@FreeBSD.ORG Cc: nik@FreeBSD.ORG, ru@FreeBSD.ORG, audit@FreeBSD.ORG, arch@FreeBSD.ORG From: Maxim Sobolev Reply-To: sobomax@FreeBSD.ORG Subject: Re: Integrating new scrshot(1) utility into vidcontrol(1) [patch for review] X-Mailer: Pygmy (v0.5.7) In-Reply-To: <20010519194435.A22224@catkin.nothing-going-on.org> Content-type: text/plain Content-Transfer-Encoding: quoted-printable Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions) List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

On Sat,
19 May 2001 19:44:35 +0100, Nik Clayton wrote:
> On Sat, May 19, 2001 at 02:42:28AM +0300, Maxim Sobolev wrote:
> > Ok, as it was agreed I've integrated scrshot(1) into
> > vidcontrol(1)
>
> Any opinions on selecting this option by default if vidcontrol is
> invoked as scrshot?

I don't really think it is worth the effort. There is no POLA to preserve.

> > and also added ability to dump contents
> > of display buffer in plain text format, so you don't
> > even need a special utility to see what's going on
> > on a console of display-less machine. :-)
>
> Isn't that what watch(8) is for?

No, this and watch(8)'s functionality are orthogonal, because
`vidcontrol -P' can be used to dump what *is* already on the
terminal, while watch(8) shows what *will* be on the terminal after
it has been started.

> > +	if (mode == DUMP_RAW) {
> > +		printf("SCRSHOT_%c%c%c%c", DUMP_FMT_REV, 2,
> > +		    shot.xsize, shot.ysize);
> > +		fflush(stdout);
>
> You've duplicated a bug of mine, "SCRSHOT_" should probably be #define.

Well, frankly speaking I don't think it matters. I won't have a
problem, though, if you change it to whatever you think is
appropriate.

> > +			/* Trim trailing spaces */
> > +			do {
> > +				line[x--] = '\0';
> > +			} while (line[x] == ' ' && x != 0);
>
> I'm not sure that's necessary (nor is the trimming of notionally
> unprintable characters). The point was to get an accurate dump of the
> video buffer contents.

As I clearly stated in the manpage, the `-P' option is intended only
for getting a quick'n'dirty human-readable dump. For an accurate dump
with attributes, 8-bit cleanness and so on, one should really use the
`-p' option.

> We've already had requests to automatically convert the 8 bit line
> drawing characters into 7 bit ones (which shot2txt now does).
> I'd prefer to see all the post-processing done outside of vidcontrol,
> otherwise you're on a slope in terms of what functionality is deemed
> acceptable for vidcontrol vs. what functionality has to be put in
> another utility.

In my view, in 90% of cases my -P option will cover the users' needs;
-p plus an external utility will cover the remaining 10%.

> Apart from those quibbles, great. Thanks for doing the work.

Thank *you* for doing it. I've merely rearranged things a bit.

BTW, I think that the kernel's part of this feature has
to be extended to dump not only visible portion of the
screen buffer, but the whole history buffer as well. What
do you think?

-Maxim

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message

From owner-freebsd-arch Sat May 19 12:43:25 2001 Delivered-To: freebsd-arch@freebsd.org Received: from tao.org.uk (genesis.tao.org.uk [212.135.162.62]) by hub.freebsd.org (Postfix) with ESMTP id 8993C37B424; Sat, 19 May 2001 12:43:20 -0700 (PDT) (envelope-from joe@tao.org.uk) Received: by tao.org.uk (Postfix, from userid 100) id 25A8F22; Sat, 19 May 2001 20:43:19 +0100 (BST) Date: Sat, 19 May 2001 20:43:19 +0100 From: Josef Karthauser To: sobomax@FreeBSD.ORG Cc: nik@FreeBSD.ORG, ru@FreeBSD.ORG, audit@FreeBSD.ORG, arch@FreeBSD.ORG Subject: Re: Integrating new scrshot(1) utility into vidcontrol(1) [patch for review] Message-ID: <20010519204319.B2145@tao.org.uk> References: <20010519194435.A22224@catkin.nothing-going-on.org> <200105191939.f4JJdjr00870@mail.uic-in.net> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-md5; protocol="application/pgp-signature"; boundary="cvVnyQ+4j833TQvp" Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200105191939.f4JJdjr00870@mail.uic-in.net>; from sobomax@mail-in.net on Sat, May 19, 2001 at 10:39:47PM +0300 Sender: owner-freebsd-arch@FreeBSD.ORG Precedence: bulk List-ID: List-Archive: (Web Archive) List-Help: (List Instructions)
List-Subscribe: List-Unsubscribe: X-Loop: FreeBSD.ORG

--cvVnyQ+4j833TQvp Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

On Sat, May 19, 2001 at 10:39:47PM +0300, Maxim Sobolev wrote:
>
> BTW, I think that the kernel's part of this feature has
> to be extended to dump not only visible portion of the
> screen buffer, but the whole history buffer as well. What
> do you think?
>

Now _that_'d be cool. :)

Joe

--cvVnyQ+4j833TQvp Content-Type: application/pgp-signature Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.4 (FreeBSD)
Comment: For info see http://www.gnupg.org

iEYEARECAAYFAjsGzNYACgkQXVIcjOaxUBZ5OQCdEp+xN2iQ9mEHaSuyjCuJjxsL
aNgAoLmnHbYx9qBuNipo2Lf0CEWVrhXt
=cEZb
-----END PGP SIGNATURE-----

--cvVnyQ+4j833TQvp--

To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-arch" in the body of the message