Date: Wed, 23 Mar 2005 16:45:40 -0500
From: Sven Willenberger <sven@dmv.com>
To: freebsd-amd64@freebsd.org
Subject: Re: amr performance woes and a bright side [UPDATE]
Message-ID: <1111614340.10569.22.camel@lanshark.dmv.com>
In-Reply-To: <1110895353.4291.16.camel@lanshark.dmv.com>
References: <1110847561.3412.38.camel@lanshark.dmv.com> <1110895353.4291.16.camel@lanshark.dmv.com>
On Tue, 2005-03-15 at 09:02 -0500, Sven Willenberger wrote:
> On Mon, 2005-03-14 at 21:59 -0700, Scott Long wrote:
> > Sven Willenberger wrote:
> > > I have been testing a new box for ultimate use as a postgresql server:
> > > dual opteron (2.2GHz), 8G RAM, LSI 320-2x megaraid (battery-backed
> > > memory) with 2 single 73 GB drives and an 8x146GB RAID 0+1 array
> > > (hitachi U320 10k RPM). In doing so I have also tested the amd64
> > > 5.3-stable release against gentoo x86_64 and Fedora FC3/x86_64.
> > >
> > > First the bad news:
> > >
> > > The linux boxen were configured with the postgres data drives on the
> > > raid 0+1 using XFS with a separate pg_xlog on a different drive. Both
> > > gentoo and FC3 were using 2.6.x kernels using the x86_64 distro.
> > > pgbench was initialized using no scaling factor (1 mill rows), scaling
> > > 10 (10 million) and 100.
> > > With no scaling the linux boxen hit about 160 tps using 10 connections
> > > and 1000-2000 transactions. The BSD system hit 100-120 tps. This is a
> > > difference I could potentially live with. Now enter the scaled tables:
> > > Linux systems hit marks of 450+ tps when pgbenching against millions of
> > > rows while the BSD box stayed at 100 tps or worse, dipping as low as
> > > 90 tps.
> > >
> > > Bonnie benchmarks:
> > > Linux:
> > > Sequential output: Per Char = 65000 K/sec, Block = 658782 K/sec,
> > > Rewrite = 639654 K/sec
> > > Sequential input: Per Char = 66240 K/sec, Block = 1278993 K/sec
> > > Sequential create: create 641/sec, read n/a, delete 205/sec
> > > Random create: create 735/sec, read n/a, delete 126/sec
> > >
> > > BSD:
> > > Sequential output: Per Char = 370 K/sec (!!), Block = 132281 K/sec,
> > > Rewrite = 124070 K/sec
> > > Sequential input: Per Char = 756 K/sec, Block = 700402 K/sec
> > > Sequential create: create 139/sec, read 6308/sec, delete n/a
> > > Random create: create 137/sec, read 5877/sec, delete n/a
> > >
> > > The bonnie tests were run several times with similar results.
> > >
> > > It would seem to me that the pgbench marks and tests are being hampered
> > > by comparatively poor I/O to the raid array and disks under the amr
> > > driver control. I am hoping there are some tweaks that I could do or
> > > perhaps some patches to the driver in -CURRENT that could be
> > > applied/backported/MFC'ed to try and improve this performance.
> > >
> > > Oh, the "bright" side? FreeBSD is the only OS here that didn't kernel
> > > Oops due to memory allocation issues, or whatever caused them (the
> > > backtrace showed kmalloc). That may be because of the XFS file system
> > > (I didn't try EXT3 or its kin) or because of issues with LSI and the
> > > linux kernel or who knows what. I am hoping to get the stability and OS
> > > performance of FreeBSD and the raw disk performance witnessed in the
> > > Linux systems all rolled up into one. Help?
> > >
> > > Sven
> >
> > First of all, are you using the same hardware and just switching the OS?
> > Are you sure that the RAID and disk cache settings are identical?
> > Second, some of the Linux numbers are very hard to believe; PCI-X has a
> > theoretical bandwidth of 1066 MB/sec, so it's highly unlikely that you're
> > going to get 1249 MB/sec out of it in the block read test. bonnie is an
> > excellent tool for testing the randomness of cache effects and memory
> > bandwidth; it's not so good at testing actual I/O performance =-)
> >
> > So setting aside the bonnie tests, the PGSQL stats do indeed show a
> > problem. Is PGSQL threaded?
> > If so, you might be running into some of the threading performance
> > problems that are well known and are being worked on. I don't know a
> > whole lot about PGSQL or the tests that you are talking about, but if
> > you had an easy recipe for duplicating your test environment, I'd like
> > to experiment some myself.
> >
> > Scott
>
> Yes, these tests were done on the same hardware, with the same hardware
> raid configuration, with fresh OS installs for each battery of tests. The
> bonnie numbers do seem a bit out of whack upon closer scrutiny.
>
> As far as setting up PGSQL, in each case it was set up from packages
> (FreeBSD ports, Gentoo emerge, FC3 yum) and the postgresql.conf file was
> adjusted to use the same set of values for memory, etc.
>
> pgbench (postgresql-contrib) was run as follows for testing:
> pgbench -i -U postgres/pgsql pgtest (where the User is either postgres
> or pgsql depending on platform and pgtest is the test db set up using
> createdb)
> pgbench -c 10 -t 1000 -U pgsql pgtest
> pgbench -c 10 -t 4000 -U pgsql pgtest
> pgbench -c 10 -t 10000 -U pgsql pgtest
> pgbench -i -s 10 -U pgsql pgtest (scaling factor of 10 to increase the
> table sizes for benchmarking)
> pgbench -c 10 -t 1000 etc ...
> pgbench -i -s 100 -U pgsql pgtest
>
> Sven

Just thought I would share this: after moving around to a few more OSes
(including Solaris 10, which showed performance numbers similar to those
of FreeBSD), one of the hard drives in the array failed. I suspect it was
a factory-defective drive as it was brand new -- not sure whether that
impacted performance in the original tests, but it may have, as the
controller kept trying to skip past bad sectors.

Anyway, now with a battery-backed raid controller in place, 6 functioning
drives (raid striped across 3 pairs of mirrors), and write-back enabled,
I am seeing pgbench give me readings of 650+ tps!! :-)

These numbers would indicate to me (as did the odd bonnie++ numbers) that
the 2.6 linux kernel with xfs was using some heavy-duty write caching. I
suspect the kernel panics I was seeing were the result of poor handling of
the defective drive when trying to fsync, or of memory mismanagement of
the cached write data. At any rate, I am glad I can stay with the FreeBSD
option now.

Sven
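For anyone who wants to duplicate the test environment Scott asked about,
the recipe quoted above boils down to roughly the following shell sequence.
This is a minimal sketch, not the script actually used: the database name
pgtest and the user pgsql are taken from the message, and it assumes the
same series of transaction counts was repeated at each scaling factor.

#!/bin/sh
# Rough reconstruction of the pgbench runs described above.
# Assumes pgbench (from postgresql-contrib) is in the PATH and that a
# database superuser named "pgsql" exists (it may be "postgres" on Linux).

createdb -U pgsql pgtest                  # create the test database

for scale in 1 10 100; do
    # (re)initialize the pgbench tables at the given scaling factor
    pgbench -i -s $scale -U pgsql pgtest
    # 10 concurrent clients, increasing transaction counts per client
    for t in 1000 4000 10000; do
        pgbench -c 10 -t $t -U pgsql pgtest
    done
done

Note that pgbench -i recreates its benchmark tables, so each scaling
factor starts from a freshly initialized database.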
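The message does not list which postgresql.conf values were made identical
across platforms, only that the memory-related settings were. For the
PostgreSQL releases of that era the relevant knobs would typically look
something like the following; the numbers are illustrative placeholders,
not Sven's actual settings.

# postgresql.conf (illustrative values only -- the settings actually used
# in these tests are not given in the message)
shared_buffers = 20000           # shared buffer cache, in 8 kB pages
sort_mem = 8192                  # per-sort memory in kB (renamed work_mem in 8.0)
effective_cache_size = 500000    # planner's estimate of the OS cache, in 8 kB pages
fsync = true                     # flush WAL to disk at commit time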