From owner-freebsd-hackers Fri Sep 6 11:10:11 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id LAA17279 for hackers-outgoing; Fri, 6 Sep 1996 11:10:11 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id LAA17226 for ; Fri, 6 Sep 1996 11:09:47 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id LAA11517; Fri, 6 Sep 1996 11:02:06 -0700
From: Terry Lambert
Message-Id: <199609061802.LAA11517@phaeton.artisoft.com>
Subject: Re: FreeBSD vs. Linux 96 (my impressions)
To: koshy@india.hp.com (A JOSEPH KOSHY)
Date: Fri, 6 Sep 1996 11:02:05 -0700 (MST)
Cc: terry@lambert.org, jkh@time.cdrom.com, jehamby@lightside.com, imp@village.org, lada@ws2301.gud.siemens.co.at, dennis@etinc.com, hackers@FreeBSD.org
In-Reply-To: <199609060508.AA079396512@fakir.india.hp.com> from "A JOSEPH KOSHY" at Sep 6, 96 10:08:31 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

> tl> This is mostly because the BSD namei() interface is a piece of shit no
> tl> one seems prepared to allow a change to because there are one or two
> tl> CSRG hackers locked in a closet somewhere, and every once in a while
> tl> they shove something out under the door, and God Forbid we lose out
> tl> on the ability to integrate those occasional changes.
>
> On another point, I did some basic kernel profiling while doing some
> assorted operations (make kernel, find | cpio -O /dev/null) etc.
>
> Surprisingly `namei' turned out to be the single biggest contributor to
> time spent in the kernel.

I can understand the find -- a more balanced benchmark would be to run
a DOS/Windows client running the Ziff-Davis NetBench suite (DiskMix)
against a BSD server.  Alternatively, there are several commercial
suites which would cost you several hundred dollars to acquire (or
more than that to rewrite).

The LM/Bench stuff is not much better than the find, since it biases
FS operations toward directory ops -- the single biggest FS usage is
read calls, then writes, then directory ops, then all other ops.

On my own system, profiling shows that the single biggest cost is data
copies, by about a factor of 5:1 over all other sources of delay.  You
can fix this somewhat by picking an optimal bcopy() implementation per
processor in the uiomove() code (the first sketch below shows the
idea).  The uiomove() code is also needlessly complex, in order to
support the "struct fileops" abstraction (deadfs -- unnecessary;
specfs -- replaced by devfs so as not to use fileops; and pipes --
should be implemented in an unexported FS name space).

You can get about 2% by cleaning up the relative root code, at the
cost of having to specify a relative root vnode in all cases, by
inheriting the root at process creation from the fork()ing process.
This only means that you have to set the root for the init process,
something you do for its current directory anyway.

The namei() call tends to copy the path string around, and so is a big
offender; this is correctable with a couple of interface changes.  The
nameifree() change drops it about 10%, for instance, by moving the
alloc/free operations to the same API level and reducing the extra
testing that has to go on everywhere in the error cases.
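To illustrate the per-processor bcopy() selection mentioned above,
here is a minimal user-space sketch, not FreeBSD's actual uiomove()
code; the cpu_class, bcopy_i586, and copy_init names are made up.  The
point is that the copy routine is chosen once at boot, so the hot path
pays a single indirect call and no per-copy branching:

#include <stddef.h>
#include <string.h>

typedef void (*copy_fn_t)(const void *src, void *dst, size_t len);

static void
bcopy_generic(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);		/* portable fallback */
}

static void
bcopy_i586(const void *src, void *dst, size_t len)
{
	/* would be a pipeline-tuned copy on a real Pentium */
	memmove(dst, src, len);
}

static copy_fn_t cpu_bcopy = bcopy_generic;

/* called once at boot, after the CPU has been identified */
void
copy_init(int cpu_class)
{
	if (cpu_class == 5)		/* hypothetical Pentium class */
		cpu_bcopy = bcopy_i586;
}

/* the uiomove()-style hot path: one indirect call, no tests */
void
copy_to_user(const void *kern, void *user, size_t len)
{
	cpu_bcopy(kern, user, len);
}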
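The root-inheritance change is small; a sketch, with an illustrative
fdinfo structure standing in for the real per-process file descriptor
state.  fork() copies one more vnode pointer, and init gets its root
set explicitly the same way its current directory already is:

struct vnode;				/* opaque here */

struct fdinfo {
	struct vnode *fd_cdir;		/* current directory */
	struct vnode *fd_rdir;		/* root directory, now always set */
};

/* at fork(): the child inherits both vnodes from its parent */
void
fd_inherit(struct fdinfo *child, const struct fdinfo *parent)
{
	child->fd_cdir = parent->fd_cdir;
	child->fd_rdir = parent->fd_rdir;	/* the one new assignment */
}

/* at boot: init's root is set explicitly, exactly as its cwd is */
void
fd_init_proc0(struct fdinfo *fd, struct vnode *rootvp)
{
	fd->fd_cdir = rootvp;
	fd->fd_rdir = rootvp;
}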
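The nameifree() point is about symmetry: if the layer that allocates
the path buffer is also the layer that frees it, the lookup internals
never have to free anything on their error paths.  A minimal sketch,
with hypothetical names (nameibuf, namei_alloc, namei_free):

#include <stdlib.h>
#include <string.h>

struct nameibuf {
	char	*nb_path;	/* private copy of the path */
};

static int
lookup(struct nameibuf *nb)
{
	(void)nb;
	return (0);		/* stub: the real walk happens here */
}

int
namei_alloc(struct nameibuf *nb, const char *path)
{
	nb->nb_path = strdup(path);
	return (nb->nb_path == NULL ? -1 : 0);
}

void
namei_free(struct nameibuf *nb)
{
	free(nb->nb_path);
	nb->nb_path = NULL;
}

int
do_lookup(const char *path)
{
	struct nameibuf nb;
	int error;

	if (namei_alloc(&nb, path) != 0)
		return (-1);
	error = lookup(&nb);	/* error or not, cleanup is identical */
	namei_free(&nb);
	return (error);
}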
Changing the path string into a pre-parsed list of path components is
about another 6% win, and you can get another 8% by putting in the
change to not go through with the allocation on a preexisting element.

This complicates the parsing of symbolic links, since it means you
have to loop-unroll the mutual recursion (which is how symbolic links
are currently implemented).  You also have to reduce kernel stack
usage to get there -- another reason for a pre-parsed list that is not
allocated on the stack.  (Sketches of the component list and the
unrolled symlink loop follow below.)

Moving a lot of the flag-based complexity out of VOP_LOOKUP will
flatten the function call graph and save another 8% in the non-failure
case, as well as making the code less subject to misimplementation, by
moving it out of the per-FS VOP_LOOKUP code.  For instance, the
directory name cache code wants to be in the common lookup code
instead of the per-FS lookup code.  You would use a per-FS-instance
(vfsstruct) flag to enable/disable the six or so cache conditions
(create/delete/negative cache, etc.).

The union FS would have to be expanded to include cache information
for its inferior FSs -- basically an issue for the FS layers which fan
out 1:N mappings.

Finally, presorting the function vector list at the time you register
the FS allows you to change the indirect function references for the
VOP_* vnode_if.c calls into macro references, which throws out the
additional stack and function call overhead of simply using the VOP
interface at all (push-call-push-call-ret-pop-ret-pop simply
decomposes to push-call-ret-pop).  This is only about 1% for the
VOP_LOOKUP, but ends up being about 7% overall in the Ziff-Davis
benchmarks.
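To make the pre-parsed component list concrete, a minimal user-space
sketch -- the struct and function names are made up; the real change
would live in namei()/lookup().  The path is scanned for '/' exactly
once, up front, instead of at every directory level:

#include <stdio.h>
#include <string.h>

struct component {
	const char *name;
	size_t      len;
};

/* split "usr/share/misc" into {"usr","share","misc"}; returns count */
size_t
parse_path(const char *path, struct component *comp, size_t max)
{
	size_t n = 0;

	while (*path != '\0' && n < max) {
		while (*path == '/')		/* skip separators */
			path++;
		if (*path == '\0')
			break;
		comp[n].name = path;
		comp[n].len = strcspn(path, "/");
		path += comp[n].len;
		n++;
	}
	return (n);
}

int
main(void)
{
	struct component comp[32];
	size_t i, n = parse_path("usr/share/misc", comp, 32);

	for (i = 0; i < n; i++)
		printf("%.*s\n", (int)comp[i].len, comp[i].name);
	return (0);
}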
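And a sketch of the unrolled symlink recursion: a bounded loop that
re-enters the walk with the link target, instead of lookup() and
namei() calling each other once per link.  resolve_one() here is a
hypothetical stand-in for the per-component work, and the constants
are illustrative:

#include <stddef.h>

#define MAXSYMLINKS	32
#define ELOOP_ERR	(-2)

/* returns 0 done, 1 "hit a symlink, new path in buf", <0 error */
static int
resolve_one(char *buf, size_t bufsize)
{
	(void)buf; (void)bufsize;
	return (0);		/* stand-in: pretend nothing is a link */
}

int
resolve(char *buf, size_t bufsize)
{
	int loops, ret;

	for (loops = 0; loops < MAXSYMLINKS; loops++) {
		ret = resolve_one(buf, bufsize);
		if (ret <= 0)		/* finished, or a real error */
			return (ret);
		/* ret == 1: buf now holds the link target; iterate */
	}
	return (ELOOP_ERR);	/* too many links: fail flat, no unwind */
}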
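Finally, a sketch of the vector-to-macro point (the names and slot
numbers are illustrative, not the actual vnode_if.c layout): once the
vector is presorted into fixed slots at FS registration time, the
wrapper function's extra stack frame can be expanded away at the call
site, which is exactly the push-call-ret-pop collapse above.

struct vop_args;			/* opaque argument block */

typedef int (*vop_t)(struct vop_args *);

struct vnode {
	vop_t	*v_ops;			/* presorted vector, fixed slots */
};

#define VOFF_LOOKUP	0		/* slot fixed at registration */

/* before: a real function, one extra frame per VOP call */
int
vop_lookup_wrapper(struct vnode *vp, struct vop_args *ap)
{
	return (vp->v_ops[VOFF_LOOKUP](ap));
}

/* after: the same dispatch, expanded in line at the call site */
#define VOP_LOOKUP(vp, ap)	((vp)->v_ops[VOFF_LOOKUP](ap))

					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.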