Date: Mon, 7 Nov 2005 11:29:32 -0600
From: Kirk Strauser <kirk@strauser.com>
To: freebsd-questions@freebsd.org
Subject: Re: Fast diff command for large files?
Message-ID: <200511071129.34262.kirk@strauser.com>
In-Reply-To: <cone.1131381646.500858.17113.1000@zoraida.natserv.net>
References: <200511040956.19087.kirk@strauser.com> <200511041129.17912.kirk@strauser.com> <cone.1131381646.500858.17113.1000@zoraida.natserv.net>
On Monday 07 November 2005 10:40, francisco@natserv.net wrote:
> I had the same setup a while back.
> A few suggestions.

Thanks for the tips; unfortunately, any fix that involves touching the
FoxPro code is basically impossible. It's not that we *can't*, but that
the sole FoxPro programmer at our company is completely occupied with
other projects.

> What type of system is this? In particular, can any record be modified,
> or are only recent records changed?

Nope: every line in each table is subject to change. Here's how our
current system works:

1) Copy each FoxPro table file (and its associated memo file, if one
   exists) to a Unix server via Samba.

2) Run my modified version of the "xbase" program to convert each table
   to a tab-delimited file that can be loaded into PostgreSQL using the
   "copy table" command. These files are named "foo.dump", "bar.dump",
   etc.

3) If "foo.dump-old" exists:

   a) Using Andrew's algorithm, get the difference between foo.dump-old
      and foo.dump. Write these out as a set of "delete from ..."
      commands and a "copy table" command. Pipe this relatively tiny
      file into the "psql" command to upload the modifications.

   Otherwise:

   b) Use the psql command to upload foo.dump.

4) "mv foo.dump foo.dump-old"

5) Profit!

I've already cut the runtime in half. The next big step is going to be
getting our Windows admin to install rsync on the fileserver so that we
can minimize the time spent in step 1. With the exception of the space
required to keep the old versions of the dump files (step 4), this is
exceeding all of our performance expectations by a wide margin.

Even better, step 3a cuts the time that the PostgreSQL server has to
spend committing the new data by several orders of magnitude.
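Andrew's algorithm itself isn't spelled out in this message, so what follows is only a minimal sketch of steps 3 and 4 using a plain set difference of lines, not necessarily what Andrew proposed. It assumes (hypothetically) that each dump is tab-delimited text already in PostgreSQL COPY format, that column 0 holds a unique primary key named `id`, and that the files are Latin-1. The names `read_rows`, `copy_block`, `diff_to_sql` and `sync_table` are illustrative only. Requires Python 3.9+.

```python
import os
import subprocess

# Hypothetical assumptions, not from the original setup: dumps are
# tab-delimited COPY text, column 0 is a unique key named "id" containing
# no quotes-escaping surprises or COPY backslash escapes, files are Latin-1.
ENCODING = "iso-8859-1"


def read_rows(path: str) -> set[str]:
    """Return the set of lines (rows) in a dump file."""
    with open(path, encoding=ENCODING) as f:
        return set(f.read().splitlines())


def copy_block(table: str, rows: list[str]) -> list[str]:
    """Emit a psql 'COPY ... FROM STDIN' block terminated by '\\.'."""
    return [f"copy {table} from stdin;", *rows, "\\."]


def diff_to_sql(old_path: str, new_path: str, table: str) -> str:
    """Step 3a: SQL that turns the old dump's rows into the new dump's rows."""
    old_rows, new_rows = read_rows(old_path), read_rows(new_path)
    removed = sorted(old_rows - new_rows)  # rows deleted or changed
    added = sorted(new_rows - old_rows)    # rows inserted or changed

    out = ["begin;"]  # one transaction, so readers never see a partial update
    for row in removed:
        key = row.split("\t", 1)[0].replace("'", "''")  # escape SQL quotes
        out.append(f"delete from {table} where id = '{key}';")
    if added:
        out.extend(copy_block(table, added))
    out.append("commit;")
    return "\n".join(out) + "\n"


def sync_table(table: str, dbname: str) -> None:
    """Steps 3 and 4 for one table: upload changes, then rotate the dump."""
    new_path, old_path = f"{table}.dump", f"{table}.dump-old"
    if os.path.exists(old_path):
        sql = diff_to_sql(old_path, new_path, table)  # step 3a
    else:
        # Step 3b: full load; assumes the target table starts out empty.
        with open(new_path, encoding=ENCODING) as f:
            rows = f.read().splitlines()
        sql = "\n".join(copy_block(table, rows)) + "\n"
    # ON_ERROR_STOP makes psql exit non-zero on an SQL error; check=True
    # then raises, so the old dump is only replaced after a good upload.
    subprocess.run(["psql", "-v", "ON_ERROR_STOP=1", dbname],
                   input=sql, encoding=ENCODING, check=True)
    os.replace(new_path, old_path)  # step 4: mv foo.dump foo.dump-old
```

A changed row lands in both sets, so it is deleted by key and then re-copied; that is why the deletes come before the COPY. Holding both dumps in memory as sets trades RAM for simplicity; for very large files, a streaming sort-and-merge (for example `sort` followed by `comm`) would be the usual alternative.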
The net effect is that our web visitors don't see a noticeable slowdown
during the import stage.

--
Kirk Strauser