Date:      Fri, 03 Mar 1995 23:34:41 -0800
From:      David Greenman <davidg@Root.COM>
To:        "Russell L. Carter" <rcarter@geli.com>
Cc:        current@FreeBSD.org
Subject:   Re: "feel" of recent systems 
Message-ID:  <199503040734.XAA00315@corbin.Root.COM>
In-Reply-To: Your message of "Fri, 03 Mar 95 22:03:09 PST." <199503040603.WAA18138@geli.clusternet> 

>|   There are multiple bugs in several parts of the system that could be
>|causing your specific problem. The NCR driver has been changing, for instance,
>
>#1, the hesitancy is *not* a problem. (IMHO)

   Well, I think unexplained hesitancies are a problem. If they're caused by
known and acceptable reasons, then that's fine. ...but I'd be a poor kernel
developer if I overlooked anomalous behavior caused by unknown reasons.

>|and this might have something to do with it. ...and of course there are the
>|problems with buffer management/directory caching that we still haven't found
>|an optimal solution for. We always strive to strike the best balance between
>|overall performance and responsiveness...but -current isn't production code,
>
>#2, Who claimed it was, or ever should be?  I take jkh's representations
>    to heart.

   The above was in response to the "could anyone improving the system
comment on their philosophy for doing these changes?  Does the overall system
throughput improve?" pseudo flamebait. This is a leading question and suggests
that we intentionally "improve" the system by making it perform worse. This
of course is silly. In the past few days I've personally been very concerned
about system stability - there were serious bugs in vfs_bio.c and vfs_cluster.c
that would cause the machine to hang. There may still be bugs - but
performance decreasing or not, these changes are required if you want the
system to run longer than a few hours.

>|and we make no representations that it is. If you could be more specific
>|about certain kinds of operations that appear slower, this would help us
>
>#3, (This will come off the wrong way, but damn the torpedos:) 
>     Use the system dammit, and you'll notice the delays...

   I could go into a long description about the extensive testing that I do
here on multiple machines...on how I've spent over 100 hours of time just in
the past few weeks doing various forms of load testing and analysis...but
instead I'll answer this with "I do". I've noticed slowdowns only during
certain tests - and then only with certain configurations of memory. I don't
know how much memory your machine has, but it appears that the worst case is
about 16MB of RAM. The machine I do most of the FreeBSD development on has
64MB...and I haven't noticed any problems in that case.

>|find the problems (I saw your Bonnie results...these really aren't very
>|useful by themselves, however, as they are affected too much by local disk
>|fragmentation).
>
>#4, Wrong!   Nothing personal intended!!!!!!
>
>I've used these on a couple of dozen systems, running a lot of different
>unices, and if they had susceptibilities I would smoke them out myself.
>I have absolutely nothing to gain by using inaccurate tools.

   Indeed, it's always important to use the right tools. The primary problem
that we have been trying to solve has to do with faulty algorithms for
directory and metadata caching. There are some severe problems with directory
cache buffers getting flushed out by 'VMIO' buffers. Something seems to cause
the directory cache to shrink and then stay small. This doesn't happen all the
time, and seems to be triggered by specific events. This obviously has nothing
to do with linear or random disk access, which the Bonnie benchmark tests. This
aside, Bonnie (and any test of sequential disk access) is often skewed by
filesystem fragmentation and unless it is run on a freshly newfs'd filesystem,
it isn't a very good measure of a system's throughput capability.

>If you're trying to say the scsi system isn't moderately broken, performance
>wise, since the 021095-SNAP, I'd really like to know why.

   I'm not saying this at all. It may very well be the case that the NCR
driver isn't performing as it should be - perhaps Stefan Esser might be able
to say something about this. I suppose my only point in all of this is that
it is too soon to be complaining about the day-to-day performance differences
of -current. I admit that there are problems, and I promise to do what I can
to resolve them...we won't release 2.1 until this has been fixed.
   I might also add that John Dyson (the author of vfs_bio.c and parts of
vfs_cluster.c) has put a lot of time into solving the performance problems. He
was called out of town on emergency business yesterday and won't be able to
continue working on this until sometime late next week. Just before he left,
he came up with a set of changes which are thought to solve the problem...the
only hitch is that during the testing his root filesystem was corrupted. I
can send you these "fixes" if you'd like. :-)
   I've become ill in the past two days (cold or flu), but if I'm not feeling
too bad tomorrow, I may work on this some. I'd be happy to include you in
testing if you're interested (and if I come up with something).

-DG


