Date: Mon, 02 Jul 2007 00:55:25 -0700
From: Garrett Cooper <youshi10@u.washington.edu>
To: "[LoN]Kamikaze" <LoN_Kamikaze@gmx.de>
Cc: ports@FreeBSD.org
Subject: Re: +CONTENTS files
Message-ID: <4688AF6D.90904@u.washington.edu>
In-Reply-To: <46889F5D.70801@gmx.de>
References: <46887FD3.3080307@u.washington.edu> <46889F5D.70801@gmx.de>
[LoN]Kamikaze wrote:
> Garrett Cooper wrote:
>
>> Pardon me for being naive, but wouldn't it be wiser for all of the data
>> in the +CONTENTS file to be aggregated into sections instead of having
>> line by line info?
>>
>> Example (net/samba_3.0.25a):
>>
>> @comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb
>> man/man1/log2pcap.1.gz
>> [~100 lines of repetitive data...]
>> @comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc
>> man/man8/vfs_notify_fam.8.gz
>>
>> Could be aggregated into:
>>
>> @MD5
>> 9e94560ac5e757d3bc5f922dcf3ab4fb man/man1/log2pcap.1.gz
>> c58f068d603a12d4af867c15cf77e636 man/man1/nmblookup.1.gz
>> [etc..]
>> @end MD5
>>
>> or something similar to XML.
>>
>> This would reduce the filesize from n bytes to n - (9 + 4 -1) *
>> i_entries + 8. In larger package files this would reduce the amount of
>> data parsing by a long shot. Also, more powerful scripting languages
>> like Perl, Python, or smart parsers in C could make short work of this
>> data and just extract the MD5 elements for comparison.
>>
>> Also, by doing a little extra work when creating packages by
>> organizing all the sections together, I think that the file size could
>> be reduced by a large degree.
>>
>> Similar fields to @comment MD5 could be reduced I believe, but with
>> less benefit maybe, other than just the @unexec rmdir, etc lines.
>>
>> Either that, or the data should be organized into separate files I
>> think (increases number of files, but reduces overall processing time IMO).
>>
>> Thanks,
>> -Garrett
>>
>
> In some cases the order of data stored is important and thus it cannot be
> separated into sections. Also, this layout allows for very simple parsing with
> usual UNIX tools (sed, cut, awk, perl, simply everything). Unlike XML, which is
> rather complex and thus does not belong into base, in my opinion.
>

I didn't say XML exactly. I said XML-like, with implied begin and end tags,
but keeping with the Makefile-like syntax of @MD5 ... @end MD5, or something
similar.
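
To make the @MD5 ... @end MD5 idea concrete, here is a minimal Python sketch.
It is only an illustration, not how the pkg tools work: the function names
aggregate_md5 and parse_md5_section are hypothetical, the section syntax is
the proposed one rather than an existing format, and it assumes each
"@comment MD5:" line is paired with the path line that follows it, as in the
example quoted above (a real +CONTENTS file may order them differently). It
also moves every other line ahead of the block, so it ignores the ordering
concern raised in the quoted reply.

```python
# Hypothetical helpers for the proposed "@MD5 ... @end MD5" section layout.
MD5_PREFIX = "@comment MD5:"


def aggregate_md5(lines):
    """Collapse '@comment MD5:<hash>' / path pairs into one @MD5 block.

    Assumes the hash line comes first and the path line follows it, as in
    the example in this message. All other lines are passed through
    unchanged and placed before the block.
    """
    entries = []
    other = []
    pending = None  # hash waiting for its path line
    for line in lines:
        if line.startswith(MD5_PREFIX):
            pending = line[len(MD5_PREFIX):]
        elif pending is not None:
            entries.append((pending, line))
            pending = None
        else:
            other.append(line)
    block = ["@MD5"] + [f"{digest} {path}" for digest, path in entries] + ["@end MD5"]
    return other + block


def parse_md5_section(text):
    """Return {path: md5} for every entry inside @MD5 ... @end MD5."""
    checksums = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "@MD5":
            in_section = True
        elif line == "@end MD5":
            in_section = False
        elif in_section and line:
            # Split on the first run of whitespace only, so paths that
            # contain spaces stay intact.
            digest, path = line.split(None, 1)
            checksums[path] = digest
    return checksums


if __name__ == "__main__":
    old = [
        "@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb",
        "man/man1/log2pcap.1.gz",
        "@comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc",
        "man/man8/vfs_notify_fam.8.gz",
    ]
    new = aggregate_md5(old)
    print("\n".join(new))
    # Compare sizes in bytes, counting one newline per line.
    print(sum(len(l) + 1 for l in old), "->", sum(len(l) + 1 for l in new))
```

Running the script on a real packing list would show whether the savings are
worth the loss of the one-record-per-line layout for a given package.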

The only plus I can see is from cut, but I would think that sed, awk, and
perl would work much better with a revised format. My point is that the
+CONTENTS file is bloated a lot by useless lines, and I would think it would
help speed up package processing if it was clipped or reduced somehow.

Plus, expat is under the MIT license, which I believe is compatible with the
BSD license (or at least more compatible than the GPL variants). The only
difference that stands out in the MIT license, from what I can tell, is that
paragraph 3 of the BSD license isn't present.

-Garrett