Date: Mon, 08 Jan 2007 00:15:22 +0100
From: Ivan Voras <ivoras@fer.hr>
To: freebsd-questions@freebsd.org
Subject: Re: Tuning PostgreSQL for bulk imports
Message-ID: <enruui$v68$1@sea.gmane.org>
In-Reply-To: <200701071618.11242.kirk@strauser.com>
References: <200701071301.27423.kirk@strauser.com> <enrnt5$9h6$1@sea.gmane.org> <200701071618.11242.kirk@strauser.com>
Kirk Strauser wrote:
> On Sunday 07 January 2007 15:15, Ivan Voras wrote:
>> - What might help you is to keep the WAL (write-ahead-log, i.e.
>> journal) files on a completely separate (and fast) drive from the
>> rest of the database, to allow parallelism and speed. For best
>> results, format it with 32k blocks/8k fragments.
>
> Thanks for the idea. Assuming I actually get my wish of a matched set
> of 4 high speed drives, would I be better off setting one aside for
> the journal, or striping them all together so everything benefits?

Everything is first written to the WAL and then copied to the "normal"
database, so WAL speed is important, but striping all 4 drives for it
is probably overkill. The logs are written sequentially, so seek times
are not that important. (There's a newfs sketch for the WAL drive at
the end of this mail.)

What are your current I/O rates? Since you currently have only two
drives, you might simply be limited by the available disk bandwidth...

>> - If you don't specifically need the atomicity of transactions, you
>> might divide your import into many small transactions, for example
>> one for every 100,000 rows instead of doing 8 million at once.
>
> Would that actually make a difference in total elapsed time spent
> importing?

It will certainly lower the final "commit" time and avoid copying
large chunks between the WAL and the database, but I don't have my own
measurements. (See the batching sketch at the end of this mail.)

Do you have an SMP machine? If so, you can try importing NCPU of the
big transactions in parallel. This could help you almost linearly, if
your disks allow it.

(See http://www.postgresql.org/docs/8.1/interactive/runtime-config-wal.html
- you can increase commit_delay and decrease commit_siblings to get a
sort of localized async operation across several parallel
transactions; a sample is at the end of this mail.)

I've just remembered - do you run VACUUM ANALYZE after your big
imports? You should: it won't help the imports themselves, but it will
keep your SELECTs fast afterwards.

>> Also, what version of PostgreSQL are you using? As a general rule,
>> the newer the version, the faster it is. This is especially true if
>> you're using 7.x - go to 8.1.5 immediately (but don't use 8.2 until
>> 8.2.1 gets out).
>
> We're already running 8.2 because it fixed some problems we were
> having with 8.1.5. Other than the excessively long import times, it's
> absolutely screaming and we couldn't be more pleased.

Ok. Be careful not to use SELECT ... LIMIT ALL ;)
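
PS - a few quick, untested sketches to make the above concrete. All
device, path, table and file names below are made up, so adjust them
to your setup.

Formatting the separate WAL drive with 32k blocks/8k fragments on
FreeBSD would look something like (with the postmaster stopped):

  # /dev/da2s1d and /wal are placeholders; -U enables soft updates
  newfs -U -b 32768 -f 8192 /dev/da2s1d
  mount /dev/da2s1d /wal
  # move the cluster's WAL directory there and symlink it back
  mv /usr/local/pgsql/data/pg_xlog /wal/
  ln -s /wal/pg_xlog /usr/local/pgsql/data/pg_xlog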
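
Splitting the import into smaller transactions just means committing
every ~100,000 rows instead of once at the end, e.g. with pre-split
input files:

  BEGIN;
  COPY big_table FROM '/tmp/part01.csv' WITH CSV;  -- ~100,000 rows
  COMMIT;
  BEGIN;
  COPY big_table FROM '/tmp/part02.csv' WITH CSV;
  COMMIT;
  -- ...and so on for the remaining chunks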
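
Running NCPU imports in parallel is just a matter of starting several
clients at once, e.g. from sh on a 4-CPU box:

  # import_part1.sql .. import_part4.sql are pre-split scripts
  for n in 1 2 3 4; do
      psql mydb -f import_part$n.sql &
  done
  wait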
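
The commit_delay/commit_siblings knobs go into postgresql.conf; the
numbers below are only a starting point, not measured values:

  # wait up to 10 ms before flushing the WAL if at least 2 other
  # transactions are open, so parallel commits can share one fsync
  commit_delay = 10000     # in microseconds
  commit_siblings = 2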
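
And the post-import cleanup is simply:

  -- a bare "VACUUM ANALYZE;" processes the whole database
  VACUUM ANALYZE big_table;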