From: Kirk Strauser
To: freebsd-questions@freebsd.org
Date: Fri, 4 Nov 2005 11:29:12 -0600
Message-Id: <200511041129.17912.kirk@strauser.com>
In-Reply-To: <436B8ADF.4000703@mac.com>
Subject: Re: Fast diff command for large files?

On Friday 04 November 2005 10:22, Chuck Swiger wrote:
> Multigigabyte? Find another approach to solving the problem, a text-based
> diff is going to require excessive resources and time. A 64-bit platform
> with 2 GB of RAM & 3GB of swap requires ~1000 seconds to diff ~400MB.

There really aren't many options. For the patient, here's what's happening:

Our legacy application runs on FoxPro. Our web application runs on a
PostgreSQL database that's a mirror of the FoxPro tables.

We do the mirroring by running a program that dumps the FoxPro tables out as
tab-delimited files. Thus far, we'd been using PostgreSQL's "copy from"
command to read those files into the database. In reality, though, a very,
very small percentage of rows in those tables actually change. So, I wrote
a program that takes the output of diff and converts it into a series of
"delete" and "insert" commands; benchmarking shows that this is roughly 300
times faster in our use.

And that's why I need a fast diff. Even if it takes as long as the database
bulk loads, we can run it on another server and use 20 seconds of CPU for
PostgreSQL instead of 45 minutes. The practical upshot is that the
database will never get sluggish, even if the other "diff server" is loaded
to the gills.

-- 
Kirk Strauser
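
For illustration, the diff-to-SQL conversion described above might look roughly like the following. This is a minimal sketch, not the program mentioned in the message: it assumes default-format `diff old new` output, one tab-delimited row per line, and a unique key in the first column. The function names, the `customers` table, and its column names are all hypothetical.

```python
def sql_literal(value):
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def diff_to_sql(diff_lines, table, columns):
    """Turn default-format `diff old new` output into DELETE and INSERT statements.

    Assumption: each row is one tab-delimited line and columns[0] is a unique key.
    """
    deletes, inserts = [], []
    for line in diff_lines:
        line = line.rstrip("\n")
        if line.startswith("< "):
            # Row present only in the old dump: delete it by key.
            key = line[2:].split("\t")[0]
            deletes.append(
                f"DELETE FROM {table} WHERE {columns[0]} = {sql_literal(key)};"
            )
        elif line.startswith("> "):
            # Row present only in the new dump: insert it.
            values = ", ".join(sql_literal(v) for v in line[2:].split("\t"))
            inserts.append(
                f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({values});"
            )
        # Hunk headers ("2c2", "3a4") and "---" separators carry no row data.
    # Emit every delete before any insert so a changed row never collides
    # with its own old version.
    return deletes + inserts


# Hypothetical sample: one changed row (key 2) and one added row (key 4).
sample = ["2c2", "< 2\tBob\t10", "---", "> 2\tBob\t12", "3a4", "> 4\tDan\t7"]
for statement in diff_to_sql(sample, "customers", ["id", "name", "balance"]):
    print(statement)
```

A changed row shows up as a delete followed by an insert of the new version, so unchanged rows are never touched. Real use would want parameterized queries rather than string quoting, and the whole batch wrapped in a single transaction.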