Date: Sun, 7 Jan 2007 20:54:47 -0600
From: Kirk Strauser <kirk@strauser.com>
To: freebsd-questions@freebsd.org
Subject: Re: Tuning PostgreSQL for bulk imports
Message-ID: <200701072054.48442.kirk@strauser.com>
In-Reply-To: <20070108122245.53e3140b@localhost>
References: <200701071301.27423.kirk@strauser.com> <200701071618.11242.kirk@strauser.com> <20070108122245.53e3140b@localhost>
On Sunday 07 January 2007 19:22, Norberto Meijome wrote:
> I could be wrong and this may only apply to generating indexes?

That's what we're doing now. By dropping the table and recreating it, all the index maintenance gets deferred until one batch at the end (which is vastly faster in practice). We also wrap the whole thing in a transaction so that select queries still have access to the old data until the instant the commit is finished, assuming that the import doesn't fail for some reason and get rolled back.

> Something also to consider is, do you *have* to import all that data every
> time? I.e., can you create data partitions, assign a new table to each
> day, create the appropriate rules, and then only dump from FoxPro the
> last day's info?

I wish. First, we run the import hourly (more or less). I save the output of the last hour's FoxPro->PostgreSQL conversion and run a hand-written, optimized diff against it. If nothing has changed, that table gets skipped. If a reasonably small percentage of rows have changed (which is almost always the case), then I re-write it as a series of deletes followed by a bulk import. Basically, I patch the table. It's nice to see a twenty-million-row table update reduced to a patch file 100 lines long. Oh, for the record, my diff algorithm is written in Python and is still I/O-limited, even when several copies are running in parallel.

> if cpu is hitting too hard, you can always run nice pgsql while importing
> this - it will still take 100% cpu, but yield as soon as something else
> needs to do some work. Of course, this won't help if you are I/O bound (I
> think?). *are* you CPU bound?

{Un,}fortunately, no. The CPUs are still idle enough to get a lot of other processing done without slowing the import process.

> Let us know what solution you come up with, and good luck :)

Will do, and thanks!
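
For anyone who wants to try the same trick, the transactional drop-and-reload looks roughly like this. It's a minimal sketch using psycopg2, not our production script; the table, columns, and file name are invented for the example:

import psycopg2

# PostgreSQL DDL is transactional, so readers keep seeing the old table
# until the COMMIT lands; if anything fails, the rollback leaves the old
# data untouched.  (Connection string and schema are made up.)
conn = psycopg2.connect("dbname=example")
try:
    with conn.cursor() as cur:
        cur.execute("DROP TABLE IF EXISTS invoices")
        cur.execute("""
            CREATE TABLE invoices (
                id    integer,
                total numeric(12, 2)
            )
        """)
        # Bulk-load with COPY rather than row-at-a-time INSERTs.
        with open("invoices.tsv") as fp:
            cur.copy_from(fp, "invoices")
        # Build the index once, after the load, so there is no per-row
        # index maintenance while the data streams in.
        cur.execute("CREATE INDEX invoices_id_idx ON invoices (id)")
    conn.commit()
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()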
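
The diff-and-patch step is conceptually just this. A toy sketch: it assumes tab-separated dumps with the primary key in the first column, and it holds both files in RAM; the real script is rather more heavily optimized, but the idea is the same:

def load_rows(path):
    """Map primary key -> full line for one dump file."""
    rows = {}
    with open(path) as fp:
        for line in fp:
            rows[line.split("\t", 1)[0]] = line
    return rows

def make_patch(old_path, new_path):
    """Return (keys to DELETE, lines to feed back in through COPY)."""
    old = load_rows(old_path)
    new = load_rows(new_path)
    # New or changed rows get re-imported; their old versions, plus any
    # rows that vanished entirely, get deleted first.
    deletes = [key for key in old
               if key not in new or old[key] != new[key]]
    changed = [line for key, line in new.items() if old.get(key) != line]
    return deletes, changed

The keys come out as one batch of deletes and the changed lines go straight back in through COPY, which is how a twenty-million-row table update collapses into a 100-line patch.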
--
Kirk Strauser