Date: Fri, 03 Mar 1995 23:34:41 -0800
From: David Greenman <davidg@Root.COM>
To: "Russell L. Carter" <rcarter@geli.com>
Cc: current@FreeBSD.org
Subject: Re: "feel" of recent systems
Message-ID: <199503040734.XAA00315@corbin.Root.COM>
In-Reply-To: Your message of "Fri, 03 Mar 95 22:03:09 PST." <199503040603.WAA18138@geli.clusternet>
>|   There are multiple bugs in several parts of the system that could be
>|causing your specific problem. The NCR driver has been changing, for instance,
>
>#1, the hesitancy is *not* a problem. (IMHO)

   Well, I think unexplained hesitancies are a problem. If they're caused by
known and acceptable reasons, then that's fine. ...but I'd be a poor kernel
developer if I overlooked anomalous behavior caused by unknown reasons.

>|and this might have something to do with it. ...and of course there are the
>|problems with buffer management/directory caching that we still haven't found
>|an optimal solution for. We always strive to strike the best balance between
>|overall performance and responsiveness...but -current isn't production code,
>
>#2, Who claimed it was, or ever should be? I take jkh's representations
>    to heart.

   The above was in response to the "could anyone improving the system
comment on their philosophy for doing these changes? Does the overall system
throughput improve?" pseudo flamebait. This is a leading question and suggests
that we intentionally "improve" the system by making it perform worse. This of
course is silly. In the past few days I've personally been very concerned
about system stability - there were serious bugs in vfs_bio.c and
vfs_cluster.c that would cause the machine to hang. There may still be bugs -
but performance decreasing or not, these changes are required if you want the
system to run longer than a few hours.

>|and we make no representations that it is. If you could be more specific
>|about certain kinds of operations that appear slower, this would help us
>
>#3, (This will come off the wrong way, but damn the torpedos:)
>    Use the system dammit, and you'll notice the delays...

   I could go into a long description about the extensive testing that I do
here on multiple machines...on how I've spent over 100 hours of time just in
the past few weeks doing various forms of load testing and analysis...but
instead I'll answer this with "I do". I've noticed slowdowns only during
certain tests - and then only with certain configurations of memory. I don't
know how much memory your machine has, but it appears that the worst case is
about 16MB of RAM. The machine I do most of the FreeBSD development on has
64MB...and I haven't noticed any problems in that case.

>|find the problems (I saw your Bonnie results...these really aren't very
>|useful by themselves, however, as they are affected too much by local disk
>|fragmentation).
>
>#4, Wrong! Nothing personal intended!!!!!!
>
>I've used these on a couple of dozen systems, running a lot of different
>unices, and if they had susceptibilities I would smoke them out myself.
>I have absolutely nothing to gain by using inaccurate tools.

   Indeed, it's always important to use the right tools. The primary problem
that we have been trying to solve has to do with faulty algorithms for
directory and metadata caching. There are some severe problems with directory
cache buffers getting flushed out by 'VMIO' buffers. Something seems to cause
the directory cache to shrink and then stay small. This doesn't happen all the
time, and seems to be triggered by specific events. This obviously has nothing
to do with linear or random disk access, which is what the Bonnie benchmark
tests. This aside, Bonnie (and any test of sequential disk access) is often
skewed by filesystem fragmentation, and unless it is run on a freshly newfs'd
filesystem, it isn't a very good measure of a system's throughput capability.

>If you're trying to say the scsi system isn't moderately broken, performance
>wise, since the 021095-SNAP, I'd really like to know why.

   I'm not saying this at all.
It may very well be the case that the NCR driver isn't performing as it should
be - perhaps Stefan Esser might be able to say something about this.

   I suppose my only point in all of this is that it is too soon to be
complaining about the day-to-day performance differences of -current. I admit
that there are problems, and I promise to do what I can to resolve them...we
won't release 2.1 until this has been fixed.

   I might also add that John Dyson (the author of vfs_bio.c and parts of
vfs_cluster.c) has put a lot of time into solving the performance problems. He
was called out of town on emergency business yesterday and won't be able to
continue working on this until sometime late next week. Just before he left,
he came up with a set of changes which are thought to solve the problem...the
only hitch is that during the testing his root filesystem was corrupted. I can
send you these "fixes" if you'd like. :-)

   I've become ill in the past two days (cold or flu), but if I'm not feeling
too bad tomorrow, I may work on this some. I'd be happy to include you in
testing if you're interested (and if I come up with something).

-DG