Date: Mon, 7 Nov 2005 09:48:22 -0600
From: Kirk Strauser <kirk@strauser.com>
To: freebsd-questions@freebsd.org
Subject: Re: Fast diff command for large files?
Message-ID: <200511070948.27910.kirk@strauser.com>
In-Reply-To: <cb5206420511060539qe4d7c40i198e806950c60482@mail.gmail.com>
References: <200511040956.19087.kirk@strauser.com> <200511060657.39674.kirk@strauser.com> <cb5206420511060539qe4d7c40i198e806950c60482@mail.gmail.com>
On Sunday 06 November 2005 07:39, Andrew P. wrote:
> Note, that the difference must be kept in RAM, so it won't work if there
> are multi-gig diffs, but it will work very fast if the diffs are only
> 10-100Mb, it will work at close to I/O speed if the diff is under 10Mb.
Thanks, Andrew! My Python script runs that algorithm in 17 seconds on a
400MB file while using 10% CPU.
For anyone interested, here's my implementation. Note that Python's
readline() method always returns a string, even at EOF, where it returns an
empty string. (A blank line in the file comes back as "\n", so it can't be
mistaken for EOF.) Also, empty strings evaluate as false, which is why
"if not (oldline or newline): break" exits the loop once both files are
exhausted.
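A quick check of the readline() behavior the loop relies on. This is a small sketch that uses io.StringIO as a stand-in for a real file (an assumption for illustration; files returned by open() in text mode behave the same way):

```python
import io

# io.StringIO stands in for an open text file in this sketch.
f = io.StringIO("first\n\nlast\n")

lines = []
while True:
    line = f.readline()
    if not line:  # readline() returns "" only at EOF
        break
    lines.append(line)

# The blank middle line comes back as "\n", which is truthy,
# so it does not end the loop early.
print(lines)               # ['first\n', '\n', 'last\n']
print(repr(f.readline()))  # '' (every call after EOF returns "")
```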
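# oldfile and newfile are assumed to be already-open text files.
old_records = []    # oldfile lines not yet matched in newfile
new_records = []    # newfile lines not yet matched in oldfile
while True:
    oldline, newline = oldfile.readline(), newfile.readline()
    if not (oldline or newline):
        break    # both files are at EOF
    if oldline == newline:
        continue
    # Cancel oldline against a pending newfile line, or hold it for later.
    try:
        new_records.remove(oldline)
    except ValueError:
        if oldline:    # don't record the "" returned at EOF
            old_records.append(oldline)
    # Same check for newline against pending oldfile lines.
    try:
        old_records.remove(newline)
    except ValueError:
        if newline:
            new_records.append(newline)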
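Wrapped up as a self-contained function, the same loop can be tried on in-memory data. This is a sketch: the name unordered_diff, the (old_only, new_only) return value, and the io.StringIO sample data are illustrative assumptions, not part of the original script.

```python
import io


def unordered_diff(oldfile, newfile):
    """Return (old_only, new_only): lines found in only one of the two files.

    Lines are compared pairwise by position; a mismatched line waits in a
    pending list until the same line turns up in the other file, so lines
    that were merely reordered cancel out. Only unmatched lines stay in memory.
    """
    old_records = []  # oldfile lines not yet matched in newfile
    new_records = []  # newfile lines not yet matched in oldfile
    while True:
        oldline, newline = oldfile.readline(), newfile.readline()
        if not (oldline or newline):  # both files exhausted
            break
        if oldline == newline:
            continue
        try:
            new_records.remove(oldline)
        except ValueError:
            if oldline:  # skip the "" returned once oldfile hits EOF
                old_records.append(oldline)
        try:
            old_records.remove(newline)
        except ValueError:
            if newline:  # skip the "" returned once newfile hits EOF
                new_records.append(newline)
    return old_records, new_records


old = io.StringIO("a\nb\nc\nd\n")
new = io.StringIO("b\na\nc\ne\nf\n")
old_only, new_only = unordered_diff(old, new)
print(old_only)  # ['d\n']
print(new_only)  # ['e\n', 'f\n']
```

Each list.remove() call scans the whole pending list, so the loop slows down as the number of unmatched lines grows; it is at its best in the small-diff case described above. For much larger diffs, one option (not what this script does) would be to track pending lines in a collections.Counter instead of a list.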
> Hope this gives you some idea.
It did. It must've been a long work week, because that all seems so obvious
in retrospect but was completely opaque at the time. Thanks again!
-- 
Kirk Strauser
