From owner-freebsd-ports@FreeBSD.ORG  Mon Jul  2 06:46:59 2007
Return-Path: <owner-freebsd-ports@FreeBSD.ORG>
X-Original-To: ports@FreeBSD.org
Delivered-To: freebsd-ports@FreeBSD.ORG
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
	by hub.freebsd.org (Postfix) with ESMTP id ED91116A400
	for <ports@FreeBSD.org>; Mon,  2 Jul 2007 06:46:59 +0000 (UTC)
	(envelope-from LoN_Kamikaze@gmx.de)
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20])
	by mx1.freebsd.org (Postfix) with SMTP id 6394613C457
	for <ports@FreeBSD.org>; Mon,  2 Jul 2007 06:46:59 +0000 (UTC)
	(envelope-from LoN_Kamikaze@gmx.de)
Received: (qmail invoked by alias); 02 Jul 2007 06:46:57 -0000
Received: from nat-wh-1.rz.uni-karlsruhe.de (EHLO mobileKamikaze.norad)
	[129.13.72.169]
	by mail.gmx.net (mp046) with SMTP; 02 Jul 2007 08:46:57 +0200
X-Authenticated: #5465401
X-Provags-ID: V01U2FsdGVkX18FnRxsdBMFLVOU+sWlbBn6zKW3ldSKYjHHJrIld7
	IHqOt4XwmSkh66
Message-ID: <46889F5D.70801@gmx.de>
Date: Mon, 02 Jul 2007 08:46:53 +0200
From: "[LoN]Kamikaze" <LoN_Kamikaze@gmx.de>
User-Agent: Thunderbird 2.0.0.4 (X11/20070616)
MIME-Version: 1.0
To: Garrett Cooper <youshi10@u.washington.edu>
References: <46887FD3.3080307@u.washington.edu>
In-Reply-To: <46887FD3.3080307@u.washington.edu>
X-Enigmail-Version: 0.95.1
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
Cc: ports@FreeBSD.org
Subject: Re: +CONTENTS files
X-BeenThere: freebsd-ports@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Porting software to FreeBSD <freebsd-ports.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-ports>,
	<mailto:freebsd-ports-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-ports>
List-Post: <mailto:freebsd-ports@freebsd.org>
List-Help: <mailto:freebsd-ports-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-ports>,
	<mailto:freebsd-ports-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 02 Jul 2007 06:47:00 -0000

Garrett Cooper wrote:
> Pardon me for being naive, but wouldn't it be wiser for all of the data
> in the +CONTENTS file to be aggregated into sections instead of having
> line by line info?
> 
> Example (net/samba_3.0.25a):
> 
> @comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb
> man/man1/log2pcap.1.gz
> [~100 lines of repetitive data...]
> @comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc
> man/man8/vfs_notify_fam.8.gz
> 
>    Could be aggregated into:
> 
> @MD5
> 9e94560ac5e757d3bc5f922dcf3ab4fb man/man1/log2pcap.1.gz
> c58f068d603a12d4af867c15cf77e636 man/man1/nmblookup.1.gz
> [etc..]
> @end MD5
> 
>    or something similar to XML.
> 
>    This would reduce the filesize from n bytes to n - (9 + 4 -1) *
> i_entries + 8. In larger package files this would reduce the amount of
> data parsing by a long shot. Also, more powerful scripting languages
> like Perl, Python, or smart parsers in C could make short work of this
> data and just extract the MD5 elements for comparison.
> 
>    Also, by doing a little extra work when creating packages by
> organizing all the sections together, I think that the file size could
> be reduced by a large degree.
> 
>    Similar fields to @comment MD5 could be reduced I believe, but with
> less benefit maybe, other than just the @unexec rmdir, etc lines.
> 
>    Either that, or the data should be organized into separate files I
> think (increases number of files, but reduces overall processing time IMO).
> 
> Thanks,
> -Garrett


In some cases the order of the stored data is important, so it cannot be
separated into sections. Also, this layout allows very simple parsing with the
usual UNIX tools (sed, cut, awk, perl, just about anything). XML, by contrast,
is rather complex and therefore does not belong in base, in my opinion.
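
To illustrate both points, here is a rough sketch in Python (any of the tools
above would do just as well). It reads a packing list line by line and resolves
each file against the most recent @cwd, which is one place where order matters.
It assumes that an "@comment MD5:" line belongs to the file line directly
before it. The function name parse_contents, the sample @cwd values and the
placement of the second file under /tmp/demo are made up for the example; only
the two file names and checksums come from the quoted mail.

```python
def parse_contents(lines):
    """Return (path, md5) pairs in the order they appear in the packing list.

    Assumption: an '@comment MD5:' line applies to the file line before it.
    """
    cwd = ""          # current prefix, changed by @cwd lines
    entries = []      # list of (absolute_or_relative_path, md5_or_None)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@cwd "):
            # Every file line after this one is relative to the new prefix.
            cwd = line[len("@cwd "):]
        elif line.startswith("@comment MD5:"):
            # Attach the checksum to the most recent file, if there is one.
            if entries:
                path, _ = entries[-1]
                entries[-1] = (path, line[len("@comment MD5:"):])
        elif not line.startswith("@"):
            # Plain line: a file name relative to the current prefix.
            path = cwd.rstrip("/") + "/" + line if cwd else line
            entries.append((path, None))
        # All other @directives are ignored in this sketch.
    return entries


# Hypothetical sample: the @cwd values are invented for illustration.
sample = """@cwd /usr/local
man/man1/log2pcap.1.gz
@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb
@cwd /tmp/demo
man/man8/vfs_notify_fam.8.gz
@comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc
""".splitlines()

entries = parse_contents(sample)
for path, md5 in entries:
    print(md5, path)
```

If the checksums were moved into a separate @MD5 section, every entry there
would have to carry the full path, or the section would have to repeat the
@cwd switches, because the prefix is otherwise only known from the position
in the file.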