Date: Fri, 16 May 1997 12:06:22 +0100
From: James Mansion <james@westongold.com>
To: freebsd-hackers@freebsd.org
Subject: Re: mmap()
Message-ID: <337C3FAE.4295@westongold.com>
References: <199705152313.QAA16073@phaeton.artisoft.com>

Terry Lambert wrote:
>
> > > Maybe you can convince John Dyson that coding this would be fun (it
> > > might even actually *be* fun 8-)), and then checking the degradation
> > > this causes in the general case to see if it's unacceptably high for
> > > your special case.
> >
> > I can't see that this would be a high cost. You'd only tell the real
> > benefit on a loaded system anyway.
>
> It would have a cost higher than not doing it (ie: non-zero). Using
> mmap() and then doing sequential I/O is probably a very limited market,

I disagree that there is a limited market. If you have BLOB data stored
in an RDBMS then I'd think it quite likely that you'll access sequential
pages, and the same goes for any sort of table or index scan. I can't
see that it makes much sense to COPY data into memory that has been
allocated swap space if you don't need to. Of course, in such cases you
might not have given the sequential hint, and a heuristic such as the
one you describe is more valuable then.

I'd also expect it to be a worthwhile technique for web servers and FTP
servers.

> so John would have to make a decision to accept the non-0 degradation
> on that basis.

I take issue with moans about 'it costs more than zero' and being so
defensive about a few cycles. It's not the first time this attitude has
been apparent on this list.

Consider this case:
 - you have a store with every pagein
 - you have a decision with every pagein on whether to start another
   (maybe several) reads

These decisions will take a few branches only. How long - a
microsecond? 10 microseconds? This is for an operation which is going
to a hardware device FFS!

I know that VM pagein code is used a lot, but how much of the total
load of (say) ftp.cdrom.com is in this code? Say 5%? Heck, you could
add 20% to the code path before you get a 'measurable' difference of
1%. (And do I believe it's 5%? No. It'll be less than this.)
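
The 5% and 20% figures above combine as a simple product: the overall hit is the share of time spent in the code path times the relative slowdown of that path. A minimal sketch in C of that estimate, where the function names `overall_slowdown` and `max_path_slowdown` are invented here for illustration and the inputs are the guesses from the text, not measurements:

```c
#include <assert.h>

/*
 * Fraction by which total run time grows when a code path that accounts
 * for `path_share` of the total becomes `path_slowdown` slower.
 * Both arguments are fractions (0.05 means 5%).
 */
double overall_slowdown(double path_share, double path_slowdown)
{
    assert(path_share >= 0.0 && path_share <= 1.0);
    assert(path_slowdown >= 0.0);
    return path_share * path_slowdown;
}

/*
 * Inverse question: how much slower may the path get before the total
 * grows by more than `budget`?  Requires a non-zero path share.
 */
double max_path_slowdown(double path_share, double budget)
{
    assert(path_share > 0.0 && path_share <= 1.0);
    assert(budget >= 0.0);
    return budget / path_share;
}
```

With a 5% share, `overall_slowdown(0.05, 0.20)` comes out at about 0.01, i.e. the 1% quoted above, and `max_path_slowdown(0.05, 0.01)` gives back roughly the 20% headroom.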

The same argument applies to the single SMP binary case - if the kernel
is 10% of the load profile and you degrade the whole lot by 10% (which
is a LOT for this sort of thing) you'll get a massive hit of a whole 1%.
Big deal.

We *know* that the Linux kernel is relatively unaffected by compiler
optimisation levels, provided that the obvious frame-pointer expense
isn't incurred, and there are suggestions that changes to e.g. register
calling will still have negligible effect, so it is somewhat unlikely
that minor algorithms of this sort are going to hurt. It's much more
likely in my view that the kind of graph closure coding that you
propose for a fine-grained SMP system will have a measurable impact.

Please, people, be reasonable in your opposition to extending code
paths. It's not bad if the cost is low and you get material benefits,
and it's quite easy to estimate upper bounds for the user-level hit
that you'd see for most things.

>
> I'm not saying that it's unlikely, only that there *is* a trade involved,
> and so a decision to be made.

Sure, but let's keep things in perspective. The knee-jerk reaction
seems to be 'there's a cost! There's a cost!'. Clearly there is, and
clearly there's no gain without some (small) pain.

>
> > Could you even use the simpler and hugely kludgy approach of 'if the
> > faulting process' descriptor has sequential access, then fault the next
> > <n> pages' where <n> is some tunable value?
>
> This would lead to bursty operation without high and low watermarks...
> and you are back to storing the last page for that compare.

Well, not necessarily, if the 'follow-on' pages are faulted using a
low-level facility that DOESN'T make this decision. You'll only start
another multi-page read when you miss the current set. Conceivably you
could fault them all in with a single synchronous operation.
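
The 'fault the next <n> pages' idea can be modelled in a few lines to see how often a new multi-page read would start during a sequential scan. This is a toy user-space simulation only: `sim_map`, `sim_touch`, `sim_scan` and `SIM_PAGES` are hypothetical names, and nothing here reflects the actual FreeBSD VM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGES 1024          /* hypothetical object size, in pages */

struct sim_map {
    bool   resident[SIM_PAGES]; /* which pages are already in core */
    bool   sequential;          /* stands in for the sequential hint */
    size_t cluster;             /* the tunable <n>: extra pages per miss */
    size_t reads;               /* number of reads started so far */
};

void sim_init(struct sim_map *m, bool sequential, size_t cluster)
{
    memset(m, 0, sizeof *m);
    m->sequential = sequential;
    m->cluster = cluster;
}

/*
 * Touch one page.  On a miss, start one read covering the faulting page
 * and, if the hint is set, the next `cluster` pages.  The follow-on pages
 * are marked resident directly, without re-running the decision for each.
 */
void sim_touch(struct sim_map *m, size_t page)
{
    assert(page < SIM_PAGES);
    if (m->resident[page])
        return;                 /* hit: no decision and no I/O */

    size_t last = page + (m->sequential ? m->cluster : 0);
    if (last >= SIM_PAGES)
        last = SIM_PAGES - 1;   /* clamp at the end of the object */

    for (size_t p = page; p <= last; p++)
        m->resident[p] = true;
    m->reads++;
}

/* Scan pages [0, npages) in order; return how many reads were started. */
size_t sim_scan(struct sim_map *m, size_t npages)
{
    assert(npages <= SIM_PAGES);
    for (size_t p = 0; p < npages; p++)
        sim_touch(m, p);
    return m->reads;
}
```

In this model a full scan with the hint set and `cluster` of 7 starts 128 reads instead of 1024, and the decision is only taken on a miss, so no 'last page' value has to be stored between faults. Whether the real pagein path could be structured this way is exactly the open question in the thread.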

Hopefully this would fit well with the layout of files on disk, and you
could limit it to a read of data from the same track or cylinder so
that the extra cost in terms of I/O will be low. It's only an advisory
optimisation anyway.

>
> > Factor 2 degradation seems a big hit, particularly when using a
> > technique that 'should' be efficient, but maybe this would be cheap
> > and remove much of the performance hit?
>
> Well, it's a "not doing an optimization" hit, not a "doing something
> wrong hit". Frankly, sequentially accessing mmap()'ed data is... weird.

It's not wrong by any means, and is quite common if you have to work on
large files (say, larger than core) that you will normally scan but may
also seek in. It's dumb to make the swapper work hard if you are just
doing read-only access.

It's also the case that you might have these large data stores shared
between multiple processes, and reading chunks into private swappable
memory seems doubly wasteful in this case.

>
> 8-).
>
> Like I said, best bet is to talk to John (or do it yourself).
>
> > The technique you mention is more along the lines of one that one might
> > use for cases where the MADV_SEQUENTIAL 'hint' has NOT been given and
> > you still want a heuristic to identify when sequential access is
> > nevertheless occurring locally, as might be the case with (say) a DBMS.
>
> Yes.
>
> > In this case, the hint is plainly given and you might act on it more
> > directly.
>
> Yes again; but you then add a compare for the hint, and still need to
> save the last value, and compare it if the hint is set. I guess it
> saves you the save in the general case.

Well, clearly you have to check to see if the flag is set, but then
you've got a whole bunch of flags to check anyway. I'm not sure why you
need to save the last value - why not just page in multiple pages right
away? If the latency for getting a subsequent page is low, this will
probably be cheaper than trying to set up an async request.
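
For reference, the user-level pattern under discussion (a read-only shared mapping scanned front to back, with the MADV_SEQUENTIAL hint given) looks roughly like the sketch below. The function name `scan_file_mmap` and the byte-sum workload are made up for illustration; the system calls are the standard open(), fstat(), mmap(), madvise() and munmap().

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* lets glibc declare madvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Sum every byte of `path` through a read-only shared mapping.
 * Returns 0 on success with the result in *sum, or -1 on error.
 */
int scan_file_mmap(const char *path, unsigned long *sum)
{
    assert(path != NULL && sum != NULL);
    *sum = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {      /* a zero-length mmap() would fail */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    /* Advisory only: the kernel may ignore it, so failure is not fatal. */
    (void)madvise(map, len, MADV_SEQUENTIAL);

    const unsigned char *p = map;
    for (size_t i = 0; i < len; i++)
        *sum += p[i];           /* each new page touched may fault */

    return munmap(map, len);
}
```

No data is copied into private, swap-backed memory here, and several processes mapping the same file share the same pages. How much readahead the hint actually buys is up to the kernel.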

>
> Really, I think the hit is negligible; however, it's not my decision to
> make, no matter what. I'm just suggesting possible approaches to the
> problem.
>
> > What happens for executable image pages? Do those get any readahead
> > now?
>
> No. You'd think data pages would, though. Data pages might benefit from
> this optimization as well, so you may be able to sell it that way. It's
> just that your case is odd, so it's not likely to attract a lot of effort
> to optimize. I'd suggest you make the changes yourself, if you can, and
> then see what it does as far as performance.

(Can't, I'm afraid, owing to hardware failure at the moment. Not to
mention imminent arrival of triplets.)

I would have thought that code pages would benefit too, especially if
you have an opportunity to reorder the functions so that there is good
locality of reference. Admittedly, ld doesn't do this now.

I think this is relevant, though it is targeted at Win32:
http://www.cs.washington.edu/homes/bershad/etch/index.html

I'm sure it's not the only such system.

>
> Regards,
> Terry Lambert
> terry@lambert.org
> ---
> Any opinions in this posting are my own and not those of my present
> or previous employers.

--
Westongold Ltd
C++/Java Multithread development and libraries
+44 1920 444284
info@westongold.com