Date: Wed, 13 Mar 2002 23:34:11 +0100 (CET)
From: BOUWSMA Beery <freebsd-user@dcf77-zeit.netscum.dyndns.dk>
To: hackers@freebsd.org
Subject: [LONG] Re: Performance of FreeBSD vs NetBSD (was: Re: Performance of -current vs -stable)
Message-ID: <200203132234.g2DMYBO03364@beerswilling.netscum.dyndns.dk>

(sorry for the delay in following-up to this thread; when the Big Blue Room is cloudless and approaching 25 degrees at this time of year, I feel an uncontrollable craving to lock myself in that room most of the day)

I wrote:
> > Hmmm, a few weeks ago I did some totally unscientific testing, noting
> > that -current was much slower than -stable, by playing an mp3 with an

And then a lot of people responded. So let me attempt to restate things, and possibly clear things up thereby. Or, you can just skip straight to the end, where I reveal just what I did to restore similar performance with FreeBSD that I saw under NetBSD, which shall terminate this thread. One can hope.

My observations were as follows:

o) I had problems doing ``work'' and listening to mp3s with a native mpg123 binary under both FreeBSD-CURRENT and 4.5-STABLE.
o) I had no problems with a comparable native binary and NetBSD-current.
o) Both FreeBSD-CURRENT and FreeBSD-stable performed roughly identically, both with and without the kernel WITNESS option, so I wasn't seeing the killer performance there that others have noted, just as a side note.

I then asked if any of the config options I posted from part of my kernel configuration for -STABLE were known show-stoppers to be avoided. By the time I had updated my archive of the mailing lists a day or two later, nobody had pointed an accusing finger, so I've decided to do somewhat more extensive testing.

Seat-of-the-pants observation with `top' showed an apparent improvement by a factor of two in CPU usage when running the native NetBSD binary under NetBSD.

Other observations I've made, that I'll be using as datapoints later, are that a normal `buildworld' for both -current and -stable on this 75MHz hardware takes somewhere around 1000 to 1100 minutes or so; also, a `nice'd `installworld' (out of necessity niced in order to get relatively real-time audio playback with only a few pauses each minute) took two or three hours when running mpg123.

Then I took one of the FreeBSD binaries and re-linked it statically, in order to run it under NetBSD as well as both FreeBSDen. With this, the FreeBSD performance was unchanged, whilst that of NetBSD actually improved by `top' to a ratio of ~3:1 CPU usage by FreeBSD.

Now I'll be doing other tests, to guess whether this is a real system-like issue, or if it only affects mpg123, or my audio setup, or what. Ideas include timing a comparable build process under NetBSD (which does rather differ from that of FreeBSD, so perhaps only for amusement value), and attempting to run the same build process with both Net- and FreeBSD. Other tests limited to a `buildkernel' may be tried, so I can get more results than a build per day and a half.

Hey, oboyoboy, one -STABLE FreeBSD test gave these results:

bash-2.05a$ time /usr/obj/ports/5.0/usr/ports/audio/mpg123/work/mpg123-0.59r/mpg123-static-O3-current -t -v /usr/home/mp3/hr-XXL-chillout-11.aug.mp3
[...]
Playing MPEG stream from hr-XXL-chillout-11.aug.mp3 ...
Junk at the beginning 00000000
MPEG 1.0, Layer: III, Freq: 44100, mode: Joint-Stereo, modext: 2, BPF : 522
Channels: 2, copyright: No, original: Yes, CRC: No, emphasis: 0.
Bitrate: 160 Kbits/s, Extension value: 0
Audio: 1:1 conversion, rate: 44100, encoding: signed 16 bit, channels: 2
Frame# 308633 [ 0], Time: 134:22.24 [00:00.00],
[120:20] Decoding of hr-XXL-chillout-11.aug.mp3 finished.

real 35m43.727s
user 33m41.078s
sys 0m19.797s
bash-2.05a$

This seems to imply that at 35 realtime minutes for a 120 minute file, FreeBSD-STABLE can play back at about 3 1/2 times realtime on a lightly loaded system. Much closer to the NetBSD `top' CPU ratio.

This points to the actual sound k0deZ as being responsible for the slowdown that I experience.

Now I'll try to respond to points others have made and further muddy the waters, or something.

Martin Ankerl noted:
> One real test is to
> measure how long your machine needs to decode a stream without threads with
> 100% CPU.
> Using mpg123 you can do this with
> time mpg123 -t mp3stream.mp3

Good idea. Here's NetBSD-current compared with FreeBSD-current (same static binary on all three OSen):

(time /usr/obj/ports/5.0/usr/ports/audio/mpg123/work/mpg123-0.59r/mpg123-static-O3-current -t -v /usr/home/mp3/hr-XXL-chillout-21.okt.mp3 )

NetBSD:
[...]
Playing MPEG stream from hr-XXL-chillout-21.okt.mp3 ...
Junk at the beginning 00000000
MPEG 1.0, Layer: III, Freq: 44100, mode: Joint-Stereo, modext: 2, BPF : 417
Channels: 2, copyright: No, original: Yes, CRC: No, emphasis: 0.
Bitrate: 128 Kbits/s, Extension value: 0
Audio: 1:1 conversion, rate: 44100, encoding: signed 16 bit, channels: 2
Frame# 382927 [ 0], Time: 166:42.99 [00:00.00],
[174:13] Decoding of hr-XXL-chillout-21.okt.mp3 finished.

real 49m56.748s
user 48m10.644s
sys 0m18.401s

FreeBSD-CURRENT:
[...]
Frame# 382927 [ 0], Time: 166:42.99 [00:00.00],
[174:13] Decoding of hr-XXL-chillout-21.okt.mp3 finished.

real 52m28.443s
user 49m5.392s
sys 0m48.884s

This difference here is *not* something I'm going to lose sleep over; both systems mostly comparably idle -- this would be a Junior Nitpicking Kernel Hacker task, if anything.

> or
> time mpg123 -s mp3stream.mp3 > /dev/null
> if you additionally want to measure the I/O time.

Why not? I've got three OSen to compare, so here I try FreeBSD-4.5-STABLE against NetBSD.

time /usr/obj/ports/5.0/usr/ports/audio/mpg123/work/mpg123-0.59r/mpg123-static-O3-current -s /home/mp3/radio1-johnpeel-07.nov.mp3 > /dev/null

NetBSD:
[...]
Playing MPEG stream from radio1-johnpeel-07.nov.mp3 ...
Junk at the beginning 00000000
MPEG 1.0 layer III, 128 kbit/s, 44100 Hz joint-stereo
[120:00] Decoding of radio1-johnpeel-07.nov.mp3 finished.
2306.8u 10.6s 39:43.09 97.2% 0+0k 0+0io 533pf+0w

FreeBSD-STABLE:
[...]
[120:00] Decoding of radio1-johnpeel-07.nov.mp3 finished.
2076.543u 13.556s 35:59.23 96.7% 539+482k 0+0io 1582pf+0w

Well, well.
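As a rough cross-check, the decode timings quoted so far can be turned into multiples of realtime. This is only a sketch under stated assumptions: it takes the bracketed length that mpg123 prints (for example [120:20]) as the playing time, which is the same reading used for the "3 1/2 times realtime" estimate and not something verified here; the wall-clock times are the `real' (or elapsed) figures rewritten by hand as minutes:seconds; and the function names are made up for illustration.

```python
# Sketch: realtime multiple = stream length / wall-clock decode time.
# Figures are copied from the `time' outputs quoted in this message.

def seconds(mmss: str) -> float:
    """Convert a 'minutes:seconds' string such as '35:43.727' to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + float(secs)

def realtime_factor(stream_length: str, wall_clock: str) -> float:
    """How many times faster than realtime the decode ran."""
    return seconds(stream_length) / seconds(wall_clock)

# (label, bracketed stream length, wall-clock time)
RUNS = [
    ("FreeBSD-STABLE  -t", "120:20", "35:43.727"),
    ("NetBSD-current  -t", "174:13", "49:56.748"),
    ("FreeBSD-CURRENT -t", "174:13", "52:28.443"),
    ("NetBSD-current  -s", "120:00", "39:43.09"),
    ("FreeBSD-STABLE  -s", "120:00", "35:59.23"),
]

for label, length, wall in RUNS:
    print(f"{label}: {realtime_factor(length, wall):.2f}x realtime")
```

All five runs land between roughly 3.0 and 3.5 times realtime, so with the audio device out of the picture the three systems decode at comparable speed.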
According to *this* test, I would expect *not* to be seeing far worse performance with -stable than with NetBSD.

Brian T.Schellenberger noted:
> FWIW, I listten to music and do other things all the time, but I have 512M of
> ram and a 900MHz CPU, and I'm guessng he doesn't.

Well, I do (sort of), but what fun is that? :-) I mean, for most of what I do, it makes little difference if my machine is 99,9% idle vs 97% idle. Using today's `slow' machines makes inefficiencies all the more obvious, and it's great when I can take these two machines that a friend tossed out, never wanting to set eyes upon again, and with FreeBSD or NetBSD or whatever, get a workstation (or server) which can play smooth audio plus let one do Real Work.

Kris Kennaway observed:
> > | As you are no doubt aware there are significant infrastructural
> > | changes in -current relating to SMP scalability. [...]
> > | Basically, it's a known issue.
> > At -stable as well as -current or at -current only?
> What I'm talking about is a -current issue only. I don't recall
> reading the earlier thread.

Okay. I see a minor performance hit with -current that isn't enough to get me riled up, with the above tests (decoding only, audio path doesn't enter the picture). However, when the audio path enters the picture in both -current and -stable, then I see a major bigtime performance drop, while NetBSD's observed unscientific unofficial performance doesn't appear to suffer.

Given that -current has known issues, and is in a state of flux, and I see only a minor performance difference between -current and -stable, I'll probably just conduct further tests against -stable.

Also, the last time I did any serious audio work with FreeBSD was back when 3.3(?) was -stable, and then there were definite issues with 4.0 as -current that made it unsuitable for any of the uses I needed. I really should throw together a releng-3 machine as Yet Another Reference for a number of things I see.
However, I've been using one of the PCI audio cards with 4-stable recently without experiencing the problems with audio sampling that 4-current of a couple years back gave me.

And another thing, back when I was using FreeBSD-3.x for my audio machines, I saw noticeably better performance with them compared to NetBSD-current of the time, such that I settled on FreeBSD for the audio sampling/encoding machines I built then. Not a factor of two or three or anything, but enough to give me more breathing room.

Luigi Rizzo asked again:
> > > > what compile time options were used in the two cases ?
> > > > They surely can make a huge difference.
> > > Could it also be a possibility, that the NetBSD defaults differ from
> > > the FreeBSD defaults, I think this could make some difference too. :)
> > actually he mentioned in his post that he used the _same_ binary on fbsd
> > and netbsd (statically linked, netbsd with fbsd emu layer)
> actually later in the same message he mentioned he used a different
> binary, and the "top" output showed two different names.

This all goes to show that my messages are no glistening example of brevity and clarity, and that I probably need to go into politics.

The long story is that I recompiled all three *BSD binaries long ago to get a modest speedup (50%?) by stealing the NetBSD options for FreeBSD, or vice versa, or something. So the native binaries were somewhat similar, if not identical.

For reference, here's what gave me the FreeBSD-STABLE binary that may have been used for my initial report, before building the static version:

<clickety-click> waaah! I just lost my working directory from .build_done.mpg123-0.59r_4 including all my hacks and Makefile modifications! Curses.
Oh well, here's a guess as to what I used to get the FreeBSD-CURRENT binary, which I relinked statically to use for the tests after my initial observations, based on the Makefile contents and the options I gave:

-O -pipe -O3 -mpentium -mcpu=pentium -march=pentium -Wall -ansi -pedantic -funroll-all-loops -ffast-math -fomit-frame-pointer \
    -DROT_I386 -DI386_ASSEM -DREAL_IS_FLOAT -DPENTIUM_OPT -DREAD_MMAP -DUSE_MMAP -DOSS -DTERM_CONTROL

The NetBSD options probably resemble

netbsd-i386-elf:
    $(MAKE) CC=cc LDFLAGS=-static \
        OBJECTS='decode_i386.o dct64_i386.o decode_i586.o \
            audio_sun.o term.o' \
        CFLAGS='$(CFLAGS) -Wall -ansi -pedantic -O4 -fomit-frame-pointer \
            -funroll-all-loops -ffast-math -DROT_I386 \
            -DI386_ASSEM -DPENTIUM_OPT -DREAL_IS_FLOAT -DUSE_MMAP \
            -DREAD_MMAP -DNETBSD -DTERM_CONTROL' \
        mpg123-make

Since I wasn't suffering awful performance with this binary natively, I didn't tweak it much beyond what that package build gave me.

Initially I was using the native binaries from each OS and release, but after building the FreeBSD static binaries, I used that for later tests. When I ran that on both FreeBSD-current and -stable, I saw performance almost identical to the non-static binaries, so I didn't bother to capture the `top' output that I already had from the native shared library binaries, which is why the executable names differed. Where I did see a change, in running the FreeBSD static binary with NetBSD compatibility, I did capture the new `top' output. Sorry that it wasn't clear that the pictured `top' outputs under FreeBSD were valid for the -static binary as well.

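To make the difference between the two option sets easier to see, here is a minimal sketch that compares them as plain sets. The flag strings are copied from the guesses above; whatever $(CFLAGS) expands to on the NetBSD side is unknown and deliberately left out, and the helper name only_in is invented for this illustration.

```python
# Sketch: which compiler options appear in one build but not the other?
# Both lists are the guessed options quoted in this message; the NetBSD
# list omits the unknown expansion of $(CFLAGS).

FREEBSD_FLAGS = """
-O -pipe -O3 -mpentium -mcpu=pentium -march=pentium -Wall -ansi -pedantic
-funroll-all-loops -ffast-math -fomit-frame-pointer
-DROT_I386 -DI386_ASSEM -DREAL_IS_FLOAT -DPENTIUM_OPT -DREAD_MMAP
-DUSE_MMAP -DOSS -DTERM_CONTROL
""".split()

NETBSD_FLAGS = """
-Wall -ansi -pedantic -O4 -fomit-frame-pointer
-funroll-all-loops -ffast-math -DROT_I386
-DI386_ASSEM -DPENTIUM_OPT -DREAL_IS_FLOAT -DUSE_MMAP
-DREAD_MMAP -DNETBSD -DTERM_CONTROL
""".split()

def only_in(first, second):
    """Return the flags present in `first` but absent from `second`, sorted."""
    return sorted(set(first) - set(second))

print("FreeBSD only:", " ".join(only_in(FREEBSD_FLAGS, NETBSD_FLAGS)))
print("NetBSD only: ", " ".join(only_in(NETBSD_FLAGS, FREEBSD_FLAGS)))
```

Apart from the optimisation level and the Pentium tuning options, the only differences are the audio backend defines (-DOSS against -DNETBSD), which fits the native binaries being "somewhat similar, if not identical".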
FWIW, I've just built a new -stable binary, using

cc -O -pipe -DMAXPARTITIONS=16 -DINET6 -O3 -mpentium -mcpu=pentium -march=pentium -O3 -Wall -ansi -pedantic -funroll-all-loops -ffast-math -fomit-frame-pointer -DROT_I386 -DI386_ASSEM -DREAL_IS_FLOAT -DPENTIUM_OPT -DREAD_MMAP -DUSE_MMAP -DTERM_CONTROL -c mpg123.c

(the -DMAXPARTITIONS is intended for kernel/world building, but I didn't see an obvious way to specify a world-only `make' in make.conf the way one can for kernel builds, so everything gets it so that I don't have to hack as many kernel source files -- I think `COPTS' does the opposite of what I want, giving additional flags used when *not* building world, if I read right)

The -DOSS is missing to match NetBSD, but still no joy at startup, where `truss' reveals many seconds spent in

THIS SOFTWARE COMES WITH ABSOLUTELY NO WARRANTY! USE AT YOUR OWN RISK!
write(2,0xbfbff384,71) = 71 (0x47)
open("/dev/dsp",0x1,00) = 3 (0x3)
ioctl(3,SNDCTL_DSP_GETBLKSIZE,0x8093f20) = 0 (0x0)
ioctl(3,AUDIO_COMPAT_FLUSH,0x0) = 0 (0x0)
ioctl(3,SNDCTL_DSP_SETFMT,0xbfbffa10) = 0 (0x0)
ioctl(3,SNDCTL_DSP_STEREO,0xbfbffa0c) = 0 (0x0)
ioctl(3,SNDCTL_DSP_SPEED,0xbfbffa08) = 0 (0x0)
ioctl(3,SNDCTL_DSP_SETFMT,0xbfbffa10) = 0 (0x0)
ioctl(3,SNDCTL_DSP_STEREO,0xbfbffa0c) = 0 (0x0)
ioctl(3,SNDCTL_DSP_SPEED,0xbfbffa08) = 0 (0x0)
[hundreds such lines snipped]
ioctl(3,SNDCTL_DSP_SETFMT,0xbfbffa10) = 0 (0x0)
ioctl(3,SNDCTL_DSP_STEREO,0xbfbffa0c) = 0 (0x0)
ioctl(3,SNDCTL_DSP_SPEED,0xbfbffa08) = 0 (0x0)
close(3) = 0 (0x0)
__sysctl(0xbfbffa00,0x2,0x8094900,0xbfbff9fc,0x0,0x0) = 0 (0x0)
sigaction(SIGINT,0xbfbffa5c,0xbfbffa44) = 0 (0x0)
open("hr-XXL-clubnite-from-27.okt.2001.mp3",0x0,07) = 3 (0x3)
lseek(3,0x0,2) = 146176000 (0x8b67800)

=-=-=-=-=-=-=-=-=-=-=-=

Well, if you're down here, you either read all of the above, so that you've earned hearing about what I discovered gave me CPU back, or else you've skipped down here to learn that. Here goes.

Now, with -stable, I'm running the static mpg123 I used earlier, and I see

CPU states: 28.3% user, 0.0% nice, 3.1% system, 2.7% interrupt, 65.9% idle
  PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
  250 beer      33    0  141M  2524K RUN      3:41 29.15% 29.15% mpg123-static-
  157 root      10  -52  888K   548K nanslp   1:18  0.10%  0.10% radioclkd
   90 root       2  -52 2560K  1544K select   0:11  0.00%  0.00% ntpd
  244 root      28    0 1440K  1180K RUN      0:07  0.00%  0.00% top

As a reminder, here's how NetBSD-native looked:

> CPU states: 38.1% user, 0.0% nice, 1.5% system, 1.0% interrupt, 59.4% idle
>   PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>   229 beer      10    0  308K  3828K aud_wr   1:17 37.16% 37.16% mpg123

And here's how NetBSD with this same FreeBSD static binary seemed to look:

> CPU states: 20.3% user, 0.0% nice, 1.0% system, 0.0% interrupt, 78.7% idle
>   PID USERNAME PRI NICE  SIZE    RES STATE    TIME   WCPU    CPU COMMAND
>   241 beer      36    0  512K  2020K RUN      0:24 21.71% 21.63% mpg123-stati

I'm not going to worry over this unscientific difference; I'm already seeing about a factor of two improvement under FreeBSD. But why?

> sound card: sbc0: <ESS ES1868> at port 0x220-0x22f,0x388-0x38b,0x330-0x331 irq 5 drq 1,0 on isa0
> pcm0: <ESS 18xx DSP> on sbc0

Just as a test, I switched out this card, which I had previously used for all the measurements and observations, for a different one:

pcm0: <Creative CT5880-C> port 0xfcc0-0xfcff irq 9 at device 11.0 on pci0

So, it seems that the `sbc' soundblaster k0deZ, as used by my ES1868, are responsible for the slowdown I saw.

What remains to do are such things as...

o) Seeing if NetBSD's performance changes any way with this card
   [answer: maybe...]
o) Seeing if the 20%-CPU figure I saw under NetBSD is repeatable, or...
   [answer: yes, except that with this card, it's more like 17%]
o) Trying other genuine SB16 cards to see if there's a difference
o) Timing a `buildkernel' or something with mpg123 in parallel
   [answer: dropouts happen occasionally since I didn't `nice' the mpg123, but it takes a whopping 64min]
o) Doing real ``work'' while listening to music

thanks to all who replied, and hope something above is useful
barry bouwsma

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message