Date: Wed, 5 Nov 2014 05:24:17 +1100 (EST) From: Bruce Evans <brde@optusnet.com.au> To: Brooks Davis <brooks@freebsd.org> Cc: Hans Petter Selasky <hps@selasky.org>, Mateusz Guzik <mjguzik@gmail.com>, jmallett@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org, Poul-Henning Kamp <phk@phk.freebsd.dk>, src-committers@freebsd.org, Bruce Evans <brde@optusnet.com.au>, Konstantin Belousov <kostikbel@gmail.com>, Julian Elischer <julian@freebsd.org> Subject: Re: svn commit: r274017 - head/sys/kern Message-ID: <20141105040359.F1613@besplex.bde.org> In-Reply-To: <20141104151101.GG29192@spindle.one-eyed-alien.net> References: <201411030746.sA37kpPu037113@svn.freebsd.org> <54573AEE.9010602@freebsd.org> <54573B87.7000801@freebsd.org> <54573CD2.1000702@selasky.org> <20141103092132.GH29497@dft-labs.eu> <20141103100847.GK53947@kib.kiev.ua> <20141104045159.E1605@besplex.bde.org> <79454.1415043356@critter.freebsd.dk> <20141104054144.GB4032@dft-labs.eu> <20141104151101.GG29192@spindle.one-eyed-alien.net>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, 4 Nov 2014, Brooks Davis wrote: > On Tue, Nov 04, 2014 at 06:41:44AM +0100, Mateusz Guzik wrote: >> re-sent with trimmed cc >> >> On Mon, Nov 03, 2014 at 07:35:56PM +0000, Poul-Henning Kamp wrote: >>> -------- >>> In message <20141104045159.E1605@besplex.bde.org>, Bruce Evans writes: >>> >>>> This optimization is probably minor, but reminds me of other syscalls >>>> that would benefit using a single largish allocation up front: >>>> - all vfs calls that start with namei(). They allocate PATH_MAX (1K) >>>> bytes and more. An 8K stack has plently to spare after allocating >>>> 1K. However, if malloc() is used then the PATH_MAX limit shouldn't >>>> exist. Only a limit to prevent denial of service is needed. >>> >>> We should actually roll a new rev of all syscalls which take a path >>> and have them pass strlen(path)+1 into the kernel. >>> >>> That would allow both precise allocation and faster copyin, followed >>> by a check that the last byte is (still) a NUL. >> >> I think we can speed up things on amd64 no problem without affecting >> userspace. >> >> amd64's copyinstr (and most likely all others) copy byte by byte. What >> we can do is copy word size and fall back to byte by byte on page fault. >> Ten on each iteration check if 0 was encountered. The speed of copyinstr is unimportant. If it were, then someone would have optimized it 25 years ago when memory was slower and caches and pipelines and branch prediction were smaller or nonexistent, so that the improvement would have been larger, bith relatively and absolutely. amd64 ran for many years with unoptimized C versions of most string functions in libc. These were only 2-8 times slower than best possible. No one noticed the slowness or the speedup from changing them. copyinstr is a few orders of magnitude (base 10) less important. > As long as you handle unaligned starting addresses, you can safely copy > aligned-word by aligned-word without the risk of spurious pagefaults > (libc makes this assumption all over the place). You may trip other > potential memory protection mechanims which aren't limited to page > alignment, but for exisiting hardware you don't need to worry about it. x86 libc string functions mostly avoid doing alignment, since that would just be slower for the usual case where everthing is aligned. This depends on the hardware allowing misaligned accesses and being only slightly slower for them. The main exceptions are in functions "optimized" to do word-sized accesses. They have to align to avoid running off the end of arrays and getting pagefaults or worse. I doubt that word accesses is a useful optimization for normal use of string functions. Testing showed that the extra startup time is amortized by the string length being just 1 or 2 words, but that is for cached strings. The cached case now runs so fast (several GB/sec even with bytewise accesses) that it is hard to find uses where its overhead is not in the noise. The uncached case runs slow due to cache misses; bytewise accesses add a little noise to its slowness. Pathname lookups in the kernel are probably closest to the uncached case. A typical use is 1 pathname access per file (to open the file). > That said, I prefer phk's suggestion in many ways. It places the burden on > userspace where it belongs and allows precise allocation in the kernel. I don't prefer it, of course. The kernel still can't trust any user parameters. copyin() may be a little faster than copyinstr(), but now userland is slower since it has to do a strlen() before every syscall that passes a counted string. Better try the modest project of changing so to require counted strings everywhere. BTW, copystr() should never have been implemented in asm. It just gives copyinstr()'s error handling for kernel strings. This is convenient, but "optimizing" copystr() by writing it in asm is even less important than for copyinstr(). It can be written in a few lines of C using standard string functions, e.g., by using almost the same code as phk proposed for user pathnames: size_t lencopied, slen; slen = strlen(kfaddr); lencopied = slen + 1; if (lencopied >= len) lencopied = len; bcopy(kfaddr, kdaddr, lencopied); if (done != NULL) *done = lencopied; return (slen < len ? 0 : ENAMETOOLONG); This is a little simpler than for copying from user space, since bcopy() can't trap. This is likely to be faster than the "optimized" asm versions since it doesn't always use bytewise accesses. strlen() gives unnecessary accesses, but they might not be bytewise. The man page is quite quite broken: - it says that the [source] string is at most 'len' bytes long, but I think 'len' is actually the size of the target buffer; this should be at least 1 larger than the source or target string length. - it doesn't say if the target buffer contains a (terminated) string on error. Since termination is impossible when len == 0, it cannot require termination. These bugs are missing in at least the i386 asm implementation. I think it just copies as much of the source string as possible. The result is terminated iff there is space for the terminator. The above is supposed to match the i386 implementation, not the man page. copystr() can now be implemented even more simply but much more slowly using snprintf(). Bruce
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20141105040359.F1613>