Date: Sun, 7 Jan 2007 20:54:47 -0600
From: Kirk Strauser <kirk@strauser.com>
To: freebsd-questions@freebsd.org
Subject: Re: Tuning PostgreSQL for bulk imports
Message-ID: <200701072054.48442.kirk@strauser.com>
In-Reply-To: <20070108122245.53e3140b@localhost>
References: <200701071301.27423.kirk@strauser.com> <200701071618.11242.kirk@strauser.com> <20070108122245.53e3140b@localhost>
On Sunday 07 January 2007 19:22, Norberto Meijome wrote:
> I could be wrong and this may only apply to generating indexes?
That's what we're doing now. By dropping the table and recreating it, all
the index maintenance gets deferred until one batch at the end (which is
vastly faster in practice). We also wrap the whole thing in a transaction
so that select queries still have access to the old data until the instant
the commit is finished, assuming that the import doesn't fail for some
reason and get rolled back.
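
In case it helps anyone following along, here's roughly what that looks
like. I'm assuming psycopg2 as the driver, and the table, columns, and
file name are invented for the sketch - this isn't our real schema:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
    cur = conn.cursor()
    try:
        # Everything from here to commit() is one transaction, so a
        # failed import rolls back and the old table survives intact.
        cur.execute("DROP TABLE invoices")
        cur.execute("CREATE TABLE invoices (id integer, total numeric)")
        # Bulk-load with COPY while the table has no indexes yet.
        with open("invoices.tsv") as f:
            cur.copy_from(f, "invoices")
        # One batch index build at the end, instead of paying the
        # maintenance cost once per inserted row.
        cur.execute("CREATE INDEX invoices_id_idx ON invoices (id)")
        conn.commit()
    except Exception:
        conn.rollback()
        raise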
> Something also to consider is, do you *have* to import all that data
> every time? I.e., can you create data partitions, assign a new table to
> each day, create the appropriate rules, and then only dump from FoxPro
> the last day's info?
I wish. First, we run the import hourly (more or less). I save the output
of the last hour's FoxPro->PostgreSQL conversion, and run a hand-written
optimized diff against it. If nothing has changed, that table gets
skipped. If a reasonably small percentage of rows has changed (which is
almost always the case), then I re-write it as a series of deletes
followed by a bulk import. Basically, I patch the table. It's nice to see
a twenty million row table update reduced to a patch file 100 lines long.
Oh, for the record, my diff algorithm is written in Python and is still
I/O limited, even when several copies are running in parallel.
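
The rough shape of it, if anyone's curious (the key column, file names,
and table are made up for illustration, and the real thing is quite a
bit more involved):

    from io import StringIO
    import psycopg2

    def load_rows(path):
        # Map primary key -> full tab-separated line; assume the key
        # is the first column.
        rows = {}
        with open(path) as f:
            for line in f:
                rows[line.split("\t", 1)[0]] = line
        return rows

    old = load_rows("invoices.prev.tsv")  # saved output of last run
    new = load_rows("invoices.tsv")       # this hour's conversion

    deleted = [k for k in old if k not in new]
    # Catches both modified rows and brand-new keys.
    changed = [k for k in new if old.get(k) != new[k]]

    if deleted or changed:
        conn = psycopg2.connect("dbname=mydb")
        cur = conn.cursor()
        # The "patch": delete stale rows, then bulk-load replacements.
        for key in deleted + changed:
            cur.execute("DELETE FROM invoices WHERE id = %s", (key,))
        cur.copy_from(StringIO("".join(new[k] for k in changed)),
                      "invoices")
        conn.commit()
    # else: nothing changed, so the table gets skipped entirely.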
> If the CPU is hitting too hard, you can always run nice pgsql while
> importing this - it will still take 100% CPU, but yield as soon as
> something else needs to do some work. Of course, this won't help if you
> are I/O bound (I think?). *Are* you CPU bound?
{Un,}fortunately, no. The CPUs are still idle enough to get a lot of other
processing done without slowing the import process.
> Let us know what solution you come up with, and good luck :)
Will do, and thanks!
-- 
Kirk Strauser
