Date: Mon, 7 Nov 2005 11:29:32 -0600
From: Kirk Strauser <kirk@strauser.com>
To: freebsd-questions@freebsd.org
Subject: Re: Fast diff command for large files?
Message-ID: <200511071129.34262.kirk@strauser.com>
In-Reply-To: <cone.1131381646.500858.17113.1000@zoraida.natserv.net>
References: <200511040956.19087.kirk@strauser.com> <200511041129.17912.kirk@strauser.com> <cone.1131381646.500858.17113.1000@zoraida.natserv.net>
On Monday 07 November 2005 10:40, francisco@natserv.net wrote:
> I had the same setup a while back.
> A few suggestions.
Thanks for the tips; unfortunately, any fix that involves touching the
FoxPro code is basically impossible. It's not that we *can't*, but that
the sole FoxPro programmer at our company is completely occupied with other
projects.
> What type of system is this? In particular, can any record be modified,
> or are only recent records changed?
Nope - every line in each table is subject to change.
Here's how our current system works:
1) Copy each FoxPro table file (and associated memo file if one exists) to a
Unix server via Samba.
2) Run my modified version of the "xbase" program to convert each table to a
tab-delimited file that can be loaded into PostgreSQL using the "copy
table" command. These files are named "foo.dump", "bar.dump", etc.
3) If "foo.dump-old" exists:
a) Using Andrew's algorithm, get the difference between foo.dump-old and
foo.dump. Write these out as a set of "delete from ..." commands and
a "copy table" command. Pipe this relatively tiny file into the
"psql" command to upload the modifications.
Otherwise:
b) Use the psql command to upload foo.dump
4) "mv foo.dump foo.dump-old"
5) Profit!
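Steps 3 and 4 can be sketched roughly as below. This is not Andrew's algorithm (the message doesn't describe it); it is a plain set difference of the two dump files, written under hypothetical assumptions: the first tab-separated column of each dump line is a unique key stored in a column named `id`, keys contain no COPY backslash escapes, and the dumps are ISO-8859-1. The names `diff_to_sql`, `read_lines`, `sync_table` and the `database` argument are made up for illustration.

```python
import os
import subprocess


def diff_to_sql(old_lines, new_lines, table, key_column="id"):
    """Build a psql script that turns the rows in old_lines into new_lines.

    Hypothetical assumptions: the first tab-separated field of each line is
    a unique key stored in key_column, and lines use COPY's text format.
    """
    old, new = set(old_lines), set(new_lines)
    removed = sorted(old - new)  # rows deleted or changed since the last run
    added = sorted(new - old)    # rows inserted or changed since the last run

    out = ["BEGIN;"]
    for line in removed:
        key = line.split("\t", 1)[0].replace("'", "''")  # escape SQL quotes
        out.append(f"DELETE FROM {table} WHERE {key_column} = '{key}';")
    if added:
        # COPY ... FROM stdin reads the data rows that follow, up to "\."
        out.append(f"COPY {table} FROM stdin;")
        out.extend(added)
        out.append("\\.")
    out.append("COMMIT;")
    return "\n".join(out) + "\n"


def read_lines(path):
    # Split only on "\n" so bytes like 0x85 in Latin-1 data aren't
    # mistaken for line breaks.
    with open(path, encoding="latin-1", newline="\n") as f:
        return [line.rstrip("\n") for line in f]


def sync_table(table, database, dump_dir="."):
    """Step 3 (3a or 3b) followed by step 4 for one table."""
    new_path = os.path.join(dump_dir, f"{table}.dump")
    old_path = new_path + "-old"
    if os.path.exists(old_path):
        # 3a: upload only the differences
        sql = diff_to_sql(read_lines(old_path), read_lines(new_path), table)
    else:
        # 3b: no previous dump, so load every row (assumes an empty table)
        sql = diff_to_sql([], read_lines(new_path), table)
    subprocess.run(
        ["psql", "-v", "ON_ERROR_STOP=1", database],
        input=sql,
        encoding="latin-1",
        env={**os.environ, "PGCLIENTENCODING": "LATIN1"},
        check=True,
    )
    os.replace(new_path, old_path)  # step 4: mv foo.dump foo.dump-old
```

A call such as `sync_table("foo", "mydb")` would then handle one table. A changed row shows up in both `removed` and `added`, so it is deleted by key and re-inserted by the COPY; wrapping everything in one transaction keeps web visitors from seeing a half-applied update. Holding both dumps in memory as sets is the main cost of this approach.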
I've already cut the runtime in half. The next big step is going to be
getting our Windows admin to install rsync on the fileserver so that we can
minimize the time spent in step one. With the exception of the space
required by keeping the old version of the dump files (step 4), this is
exceeding all of our performance expectations by a wide margin.
Even better, step 3a cuts the time that the PostgreSQL server has to spend
committing the new data by several orders of magnitude. The net effect is
that our web visitors don't see a noticeable slowdown during the import
stage.
-- 
Kirk Strauser
