Date: Fri, 16 May 1997 10:34:27 -0700 (MST)
From: Terry Lambert <terry@lambert.org>
To: james@westongold.com (James Mansion)
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: mmap()
Message-ID: <199705161734.KAA17584@phaeton.artisoft.com>
In-Reply-To: <337C3FAE.4295@westongold.com> from "James Mansion" at May 16, 97 12:06:22 pm
> > so John would have to make a decision to accept the non-0 degradation
> > on that basis.
>
> I take issue with moans about 'it costs more than zero' and being so
> defensive about a few cycles.  It's not the first time this attitude
> is apparent on this list.

[ ... ]

> I know that VM pagein code is used a lot, but how much of the total of
> (say) ftp.cdrom.com is in this code?  Say 5%?  Heck, you could add 20%
> to the code path before you get a 'measurable' difference of 1%.  (And
> do I believe it's 5%?  No.  It'll be less than this).

[ ... ]

I am not opposed to this change.  I am suggesting that you write the
code.

I am also cautioning you as to who will have the final say on whether
or not your code is integrated, and the issues he may be looking at
after having been beaten up on various performance issues vis-a-vis
Linux for about three years now (making that particular code path a
bit sensitive).

> It's much more likely in my view that the kind of graph closure coding
> that you propose for a fine grained SMP system will have a measurable
> impact.

You're right.  It will.  It will make the UP kernel preemptive, and
then FreeBSD will get the same 60% improvement in I/O latency that
UnixWare got when we did it at USL.  Only FreeBSD will probably be
better than 60%, because it will use Soft Updates instead of Delayed
Ordered Writes within the concurrency graph.

And it will have the effect of reducing interprocessor synchronization
(no matter whether it's by IPI or by MESI hardware coherency), and
therefore FreeBSD will be SMP scalable to more processors than UnixWare
before bus effects make it cost-ineffective to add more processors.

Not that this has anything to do with predictive fault-ahead in a VM
system.

> Please, people, be reasonable in your opposition to extending code
> paths.

Again, I'm not opposed.  Feel free to implement your code changes, and
after doing so, feel free to submit them with as flimsy or as robust a
set of performance tradeoff benchmarks as you see fit.  It's not like
I'll have any part in the decision to adopt or not adopt the code.

The real problem I have is that you seem to be asking someone else to
do the code.  It would be a different story if you had done the code
and were simply asking that it be integrated (I happen to have this
particular problem myself).

> Well, clearly you have to check to see if the flag is set, but then
> you've got a whole bunch of flags to check anyway.
>
> Not sure why you need to save the last value - why not just page in
> multiple pages right away?  If the latency for getting a subsequent
> page is low, this will probably be cheaper than trying to set up an
> async request.

Because MADV_SEQUENTIAL is the wrong flag to use for this.  Really,
what the flag does is deprioritize the pages behind the current access
point.  Arguably, it should be named something else, and
MADV_SEQUENTIAL should be reserved for meaning what you apparently want
it to mean... it's broken to deprioritize the pages if your sequential
access is (for example) occurring on anything larger than 4k blocks.
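To make the distinction concrete, here is a minimal sketch, assuming
the 4.4BSD-style madvise(2) interface.  The hint_mapping() name and the
64k prefetch window are just placeholders, MADV_WILLNEED only
approximates the explicit fault-ahead being asked for, and how
aggressively any particular VM implementation acts on either hint is
not guaranteed:

    #include <sys/types.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Hypothetical helper: map a file read-only and hint the VM system.
     * MADV_SEQUENTIAL mostly means "pages behind the access point may
     * be reclaimed early"; it does not promise aggressive fault-ahead.
     * MADV_WILLNEED is the closer match for "page this range in now".
     */
    int
    hint_mapping(const char *path)
    {
        struct stat sb;
        size_t chunk;
        char *p;
        int fd;

        if ((fd = open(path, O_RDONLY)) == -1)
            return (-1);
        if (fstat(fd, &sb) == -1) {
            close(fd);
            return (-1);
        }
        p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return (-1);
        }

        /* The mapping will be walked front to back. */
        (void)madvise(p, (size_t)sb.st_size, MADV_SEQUENTIAL);

        /* Ask for the first chunk explicitly, ahead of touching it. */
        chunk = sb.st_size < 64 * 1024 ? (size_t)sb.st_size : 64 * 1024;
        (void)madvise(p, chunk, MADV_WILLNEED);

        /* ... read the mapping sequentially here ... */

        munmap(p, (size_t)sb.st_size);
        close(fd);
        return (0);
    }

Whether an explicit MADV_WILLNEED actually beats letting the fault path
do its own clustering is exactly the kind of thing the benchmarks
mentioned above would have to show.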
> > No.  You'd think data pages would, though.  Data pages might benefit
> > from this optimization as well, so you may be able to sell it that
> > way.  It's just that your case is odd, so it's not likely to attract
> > a lot of effort to optimize.  I'd suggest you make the changes
> > yourself, if you can, and then see what it does as far as performance.
>
> (Can't, I'm afraid, owing to hardware failure at the moment.  Not to
> mention imminent arrival of triplets.)

Well, if you are asking someone else to do the code, you need to ask
John Dyson directly.  Or on -current, not on -hackers.

> I would have thought that code pages would benefit too, especially if
> you have an opportunity to perform reordering of the functions so that
> there is good locality of reference.  Admittedly, ld doesn't do this
> now.

Well, you are assuming predictive forward locality, actually.  So ld is
not the only thing that would be involved... the compiler code generator
ought to generate function blocks out of order relative to their
physical location in the source file, ideally, if this were the case.
Alternately, functions should get their own ELF sections and always be
PIC so that the sections could be reordered for page preference.

Even then, you are at best engaged in statistical branch path
prediction... when do you turn "learning" off?  It's like the problem
with back-propagation neural nets once they've been trained.  You have
to reset the data state for each sample so that the training is not
adversely affected by the output no longer being clamped; so when do
you "clamp" the object order?

> I think this is relevant, though Win32-targeted:
>   http://www.cs.washington.edu/homes/bershad/etch/index.html
>
> I'm sure it's not the only such system.

No.  The University of Utah reference I gave in the CLUSTERING
discussion is similar, but for UNIX and UNIX-like systems (thanks for
the pointer to this one, though).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.