Date: Thu, 27 Apr 2006 17:08:11 +0300 (EEST)
From: Dmitry Pryanishnikov <dmitry@atlantis.dp.ua>
To: freebsd-stable@freebsd.org
Subject: RELENG_4 -> 5 -> 6: significant performance regression
Message-ID: <20060427160536.M96305@atlantis.atlantis.dp.ua>
Hello!

I've done some simple (yet, I hope, reality-reflecting) performance benchmarking of different STABLE branches (4 vs 5 vs 6) on the following hardware:

    CPU: Pentium II/Pentium II Xeon/Celeron (334.09-MHz 686-class CPU)
      Origin = "GenuineIntel"  Id = 0x665  Stepping = 5
      Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
    real memory = 134152192 (127 MB)
    ...
    rl0: <RealTek 8139 10/100BaseTX> port 0xe800-0xe8ff mem 0xdc101000-0xdc1010ff irq 5 at device 20.0 on pci0
    ...
    fxp0: <Intel 82559 Pro/100 Ethernet> port 0xe400-0xe43f mem 0xdc100000-0xdc100fff,0xdc000000-0xdc0fffff irq 7 at device 19.0 on pci0
    ...
    ad0: 76351MB <SAMSUNG SP0802N TK100-24> at ata0-master UDMA33

For each test I simply restored a precompiled 4-, 5- or 6-STABLE system onto the same HDD partition. I used the following kernel config for 4-STABLE:

    ident TEST
    machine i386
    maxusers 32
    makeoptions CONF_CFLAGS=-fno-builtin
    makeoptions DEBUG=-g
    options INCLUDE_CONFIG_FILE
    cpu I686_CPU
    options COMPAT_43
    options USER_LDT
    options SYSVSHM
    options SYSVSEM
    options SYSVMSG
    options INVARIANTS
    options INVARIANT_SUPPORT
    options USERCONFIG
    options INET
    options FAST_IPSEC
    options IPSEC_FILTERGIF
    pseudo-device ether
    pseudo-device vlan 1
    pseudo-device loop
    pseudo-device bpf
    pseudo-device ppp 8
    options PPP_BSDCOMP
    options PPP_DEFLATE
    options PPP_FILTER
    options IPFIREWALL
    options IPFW2
    options IPFIREWALL_VERBOSE
    options IPFIREWALL_VERBOSE_LIMIT=100
    options IPFIREWALL_FORWARD
    options IPDIVERT
    options IPSTEALTH
    options ICMP_BANDLIM
    options DUMMYNET
    options FFS
    options FFS_ROOT
    options SOFTUPDATES
    options QUOTA
    options P1003_1B
    options _KPOSIX_PRIORITY_SCHEDULING
    options _KPOSIX_VERSION=199309L
    pseudo-device pty
    pseudo-device crypto
    device isa
    device atkbdc0 at isa? port IO_KBD
    device atkbd0 at atkbdc? irq 1
    device psm0 at atkbdc? irq 12
    device vga0 at isa?
    pseudo-device splash
    device sc0 at isa?
    options SC_HISTORY_SIZE=1000
    options SC_TWOBUTTON_MOUSE
    device npx0 at nexus? port IO_NPX flags 0x0 irq 13
    device ata
    device atadisk
    options ATA_STATIC_ID
    device fdc0 at isa? port IO_FD1 irq 6 drq 2
    device fd0 at fdc0 drive 0
    device fd1 at fdc0 drive 1
    device sio0 at isa? port IO_COM1 irq 4
    device sio1 at isa? port IO_COM2 irq 3
    device pci

and modified it slightly for 5/6-STABLE. Here is the diff ("<" = 4-only option, ">" = 5/6-only):

    > options SCHED_4BSD
    < options USER_LDT
    < options USERCONFIG
    < pseudo-device ether
    < pseudo-device vlan 1
    < pseudo-device loop
    < pseudo-device bpf
    < pseudo-device ppp 8
    > device ether
    > device loop
    > device bpf
    < options IPFW2
    > options IPFIREWALL_FORWARD_EXTENDED
    < options ICMP_BANDLIM
    < options FFS_ROOT
    < options P1003_1B
    < options _KPOSIX_VERSION=199309L
    < pseudo-device pty
    < pseudo-device crypto
    > device pty
    > device crypto
    < device atkbdc0 at isa? port IO_KBD
    < device atkbd0 at atkbdc? irq 1
    < device psm0 at atkbdc? irq 12
    < device vga0 at isa?
    < pseudo-device splash
    < device sc0 at isa?
    ---
    > device atkbdc
    > device atkbd
    > device psm
    > options KBD_INSTALL_CDEV
    > device vga
    > device splash
    > device sc
    < device npx0 at nexus? port IO_NPX flags 0x0 irq 13
    > device npx
    < device fdc0 at isa? port IO_FD1 irq 6 drq 2
    < device fd0 at fdc0 drive 0
    < device fd1 at fdc0 drive 1
    < device sio0 at isa? port IO_COM1 irq 4
    < device sio1 at isa? port IO_COM2 irq 3

I also set kern.hz="100" in /boot/loader.conf on every system, and effectively excluded ipfw from the game with an 'add 1 pass all from any to any' rule. I hope I've compared apples with apples this way.

For every x-STABLE, I downloaded a large ISO image via FTP in binary mode twice: once using the rl NIC and once using the fxp one, both in 10baseT mode (which gave a transfer rate of approx. 1 Mbyte/s). I noted the CPU utilization reported by "systat -vm 1" once the numbers had stabilized.
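The per-branch procedure described above might look roughly like the following root shell session on the test box. This is only a sketch under assumptions: the message does not say how 10baseT mode was forced or which FTP client was used, so the ifconfig(8) media setting, fetch(1) as the client, and the host name and file paths are hypothetical stand-ins. For the second run of each branch, substitute fxp0 for rl0.

```shell
#!/bin/sh
# Sketch of one benchmark run (run as root on the FreeBSD test box).
# Assumes kern.hz="100" is already in /boot/loader.conf and the box
# was rebooted; confirm the setting took effect (look for hz = 100):
sysctl kern.clockrate

# Take ipfw out of the picture: rule 1 passes all traffic.
ipfw add 1 pass all from any to any

# ASSUMPTION: force the NIC under test (rl0 here, fxp0 on the second
# run) into 10baseT mode via its media setting.
ifconfig rl0 media 10baseT/UTP

# Pull a large ISO image over FTP in the background.
# HYPOTHETICAL host, path and output file; fetch(1) stands in for
# whatever FTP client was actually used.
fetch -o /var/tmp/test.iso ftp://ftp.example.org/pub/test.iso &

# While the transfer runs, watch %Sys/%Intr/%Idle on another terminal:
#   systat -vm 1

# CPU-bound probe during the transfer: its wall-clock time is the
# number to compare across branches.
time md5 -t

# Wait for the background download to finish.
wait
```

Because the session changes firewall and NIC settings and needs a FreeBSD host, it is meant to be read and adapted rather than pasted into another machine.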
Here are the results (averages; %User and %Nice are close to zero):

    Branch + NIC     %Sys  %Intr  %Idle
    RELENG_4 + rl0     14     14     72
    RELENG_4 + fxp0    14     10     76
    RELENG_5 + rl0     40     30     30
    RELENG_5 + fxp0    35     25     40
    RELENG_6 + rl0     45     40     15
    RELENG_6 + fxp0    45     35     20

I tried to verify these numbers by running 'md5 -t' in parallel with the download and measuring its wall-clock time with "time md5 -t". Indeed, under RELENG_4 this benchmark took 43 sec of wall-clock time, vs 2:01 under RELENG_5 and 2:05 under RELENG_6, i.e. about 2.8 and 2.9 times longer (I don't understand why the difference between 5 and 6 is so small here).

I would call these numbers discouraging. Such high CPU usage for the relatively simple task of writing _only_ 10 Mbit/s of incoming traffic to the HDD will surely prevent deployment of 6-STABLE on many not-very-powerful production servers. Am I missing something simple regarding compile-time or runtime optimization?

Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail: dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE