Date: Sat, 3 Feb 2018 22:52:59 +1100 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: brooks@freebsd.org
Cc: freebsd-bugs@freebsd.org
Subject: Re: [Bug 225626] r325865 malloc vs bzero
Message-ID: <20180203215302.T1064@besplex.bde.org>
In-Reply-To: <bug-225626-8-RsDlqNTq6z@https.bugs.freebsd.org/bugzilla/>
References: <bug-225626-8@https.bugs.freebsd.org/bugzilla/> <bug-225626-8-RsDlqNTq6z@https.bugs.freebsd.org/bugzilla/>

On Fri, 2 Feb 2018 a bug that doesn't want replies@freebsd.org wrote:

> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=225626
>
> --- Comment #1 from Brooks Davis <brooks@FreeBSD.org> ---
> I'd agree it's pointless, but there's seriously nothing wrong with the fix
> other than making a path that isn't performance relevant slightly slower. If
> you want to submit a patch that would likely be fine. Bug reports aren't the
> places for this discussion.

Changing to using malloc() is correct, since the data is too large to put
on the kernel stack.

> Note that memset should be used in preference to bzero as the compiler should
> be able to elide most of the cost of the memset since it can emit it inline and
> then delete the dead stores.

Note that memset() should _not_ be used in preference to bzero() since:
- using memset() in the kernel is a style bug, except possibly with a
  nonzero fill byte
- the existence of memset() in the kernel is an implementation style bug,
  except possibly with a nonzero fill byte. It was intentionally left out
  in 4.4BSD and in old versions of FreeBSD. It is mainly compatibility
  cruft for contrib'ed code that doesn't know kernel APIs and used to have
  private definitions of it duplicated ad nauseam.
- using memset() instead of bzero() in the kernel is a pessimization. Since
  memset() is only compatibility cruft and should not be used, it is
  intentionally not as optimized as bzero().

One of the optimizations is that bzero() is optimized to let the compiler
inline it (up to a too-hard-coded size of 64 bytes), while memset() is
pessimized to not let the compiler inline it. The kernel is compiled with
-ffreestanding. This turns off all builtins, since a kernel function named
foo() is in general unrelated to a standard function named foo(). None are
turned back on in <sys> headers, but bzero() is optimized using
__builtin_memset().

Not so similarly for memcpy().
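
For illustration, the bzero() inlining described above could be wrapped in a
header roughly as follows. This is only a sketch, not the actual FreeBSD
definition: the names sketch_bzero(), sketch_bzero_slow() and
SKETCH_BZERO_INLINE_MAX are hypothetical, the 64-byte cutoff is the one
mentioned in the text, and __builtin_constant_p() and __builtin_memset() are
GCC/Clang builtins that remain available under -ffreestanding because they
are requested explicitly by name.

```c
#include <stddef.h>

/* Hypothetical cutoff; 64 bytes is the size mentioned in the text. */
#define SKETCH_BZERO_INLINE_MAX	64

/*
 * Hypothetical out-of-line fallback.  It stands in for the real,
 * separately compiled zeroing routine.
 */
static void
sketch_bzero_slow(void *buf, size_t len)
{
	unsigned char *p = buf;

	while (len-- > 0)
		*p++ = 0;
}

/*
 * If the length is a compile-time constant no larger than the cutoff,
 * hand the job to __builtin_memset() so that the compiler may expand it
 * inline (and drop dead stores).  Otherwise call the out-of-line routine.
 */
#define sketch_bzero(buf, len) do {					\
	if (__builtin_constant_p(len) &&				\
	    (len) <= SKETCH_BZERO_INLINE_MAX)				\
		__builtin_memset((buf), 0, (len));			\
	else								\
		sketch_bzero_slow((buf), (len));			\
} while (0)
```

With this shape, sketch_bzero(&s, sizeof(s)) on a small structure can be
expanded inline, while a call with a run-time length always goes through the
out-of-line function.
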
The use of memcpy() in the kernel is now just a style bug, since the
compiler is not allowed to inline it (except in my version of course).
However, in old versions of FreeBSD which were not compiled with
-ffreestanding, memcpy() was supposed to be used instead of bcopy() for all
small fixed-sized copies that the compiler would inline up to a
machine-dependent (MD) size, but for no other cases. The compatibility
cruft of an extern memcpy() was added for cases where the compiler didn't
inline memcpy(). Since memcpy() was unimportant, it was intentionally not as
optimized as bzero(). It wasn't pessimized enough to prevent it being used
as a style bug. Perhaps a linker warning like the one for gets() should have
been used to inhibit its use. Warnings from -Winline are related. This
should have been implemented like bcopy() is now, with an internal
conversion to __builtin_memcpy() for small fixed-sized copies, but with a
fallback to an implementation-detail function like __memcpy() to keep
memcpy() out of the KPI.

FreeBSD was changed to use -ffreestanding because without it the compiler
is allowed to inline functions like printf() and gcc started doing that (it
converts printf(3) into puts() galore, and puts() doesn't exist in the
kernel). This broke all inlining, but no one cared (except me of course).

It turns out that inlining and other optimizations and pessimizations make
little difference. bcopy() was only significantly faster than memcpy() for
large copies on Pentium-1 in ~1997, using special optimizations for
Pentium-1 that are pessimizations on most later x86 CPUs. After removing
these optimizations, bcopy() is almost the same as memcpy() on x86. bcopy()
has more setup overhead, so tends to be slower.

Another development is "fast strings" on newer x86. With this, "rep movsb"
is faster than "rep movsd" since it has less setup and finishup overhead.
The implementation of both bcopy() and memcpy() is still generic with some
tuning for the original i386, so it doesn't benefit much from this.
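
To make the "rep movsb" method concrete, here is a minimal sketch of a copy
routine built on it. It is an illustration only, not code from the FreeBSD
tree: the name copy_rep_movsb() is hypothetical, the inline assembly uses
GCC/Clang extended asm syntax, and a plain byte loop is substituted on
non-x86 targets so the sketch still compiles there.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy n bytes from src to dst (non-overlapping)
 * and return dst, in the manner of memcpy().
 */
static void *
copy_rep_movsb(void *dst, const void *src, size_t n)
{
	void *ret = dst;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	/*
	 * "rep movsb" copies the number of bytes held in the count
	 * register from the source-index to the destination-index
	 * register.  The "+D", "+S" and "+c" constraints tell the
	 * compiler that all three registers are read and modified.
	 * The direction flag is assumed clear, as the usual x86 ABIs
	 * require at function boundaries.
	 */
	__asm__ volatile("rep movsb"
	    : "+D" (dst), "+S" (src), "+c" (n)
	    :
	    : "memory");
#else
	/* Portable stand-in for targets without "rep movsb". */
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n-- > 0)
		*d++ = *s++;
#endif
	return (ret);
}
```

The whole copy is one instruction with no alignment or tail handling in
software, which is where the lower setup and finishup overhead comes from.
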
"rep movs*" still has a lot of internal setup overhead, so it is a bad method for small copies. "fast strings" also affects inlining. Compilers used to generate lots of "[rep] movs*"s, but this was a bad method for almost all sizes so compilers stopped doing this long ago. For small sizes, it is bad because of high internal setup overhead, and for large sizes it is bad because the library function might be able to do it better and the compiler doesn't know how much better or worse the library function is. Now with "fast strings", bot the compiler and library can just use "rep movsb" for large copies, but this is hard to configure. "large" is quite large -- normally 1K or even 4K. The compiler can and does have zillions of variants depending on -march. Having zillions of variants in the library is not so easy, and might end up as a pessimization unless the correct variant is selected at compile time. Bruce