From: Alexander Leidinger
To: Garrett Cooper
Cc: ports@FreeBSD.org, "\[LoN\]Kamikaze"
Date: Mon, 02 Jul 2007 11:57:33 +0200
Subject: Re: +CONTENTS files
Message-ID: <20070702115733.3fotau92scgs4g4s@webmail.leidinger.net>
In-Reply-To: <4688AF6D.90904@u.washington.edu>
References: <46887FD3.3080307@u.washington.edu> <46889F5D.70801@gmx.de> <4688AF6D.90904@u.washington.edu>
List-Id: Porting software to FreeBSD

Quoting Garrett Cooper (from Mon, 02 Jul 2007 00:55:25 -0700):

> [LoN]Kamikaze wrote:
>> Garrett Cooper wrote:
>>
>>> Pardon me for being naive, but wouldn't it be wiser for all of the data
>>> in the +CONTENTS file to be aggregated into sections instead of having
>>> line by line info?
>>>
>>> Example (net/samba_3.0.25a):
>>>
>>> @comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb
>>> man/man1/log2pcap.1.gz
>>> [~100 lines of repetitive data...]
>>> @comment MD5:9f5fc8df2a1383a175e165ef2e0b10cc
>>> man/man8/vfs_notify_fam.8.gz
>>>
>>> Could be aggregated into:
>>>
>>> @MD5
>>> 9e94560ac5e757d3bc5f922dcf3ab4fb man/man1/log2pcap.1.gz
>>> c58f068d603a12d4af867c15cf77e636 man/man1/nmblookup.1.gz
>>> [etc..]
>>> @end MD5
>>>
>>> or something similar to XML.
>>>
>>> This would reduce the filesize from n bytes to n - (9 + 4 - 1) *
>>> i_entries + 8. In larger package files this would reduce the amount of
>>> data parsing by a long shot. Also, more powerful scripting languages
>>> like Perl, Python, or smart parsers in C could make short work of this
>>> data and just extract the MD5 elements for comparison.
>>>
>>> Also, by doing a little extra work when creating packages by
>>> organizing all the sections together, I think that the file size could
>>> be reduced by a large degree.
>>>
>>> Similar fields to @comment MD5 could be reduced I believe, but with
>>> less benefit maybe, other than just the @unexec rmdir, etc lines.
>>>
>>> Either that, or the data should be organized into separate files I
>>> think (increases number of files, but reduces overall processing time IMO).
>> In some cases the order of data stored is important and thus it cannot be
>> separated into sections. Also, this layout allows for very simple parsing with
>> usual UNIX tools (sed, cut, awk, perl, simply everything). Unlike XML, which is
>> rather complex and thus does not belong into base, in my opinion.

We have libbsdxml in the base already (an old version of expat, which is
also in the ports).

> I didn't say XML exactly. I say XML-like, with implied end and begin
> tags, but keeping with the Makefile like syntax of @MD5 ... @end MD5,
> or something similar.

The problem is that a change would break existing installations, as
they cannot cope with such a new format. Feel free to propose
improvements, but you need to keep in mind that any supported
FreeBSD release has to be able to install packages with only the
package tools available in the base system.

> My point being is that the +CONTENTS file is bloated a lot by
> useless lines, and it would help speed up package processing if it was
> clipped or reduced somehow I would think.

You need to provide numbers. Without them this is pure speculation.
And you have to explain why the current parsing routines cannot be
sped up for the current format; maybe the implementation is just a
little bit outdated compared to today's parsing knowledge...

Bye,
Alexander.

-- 
Life is a grand adventure -- or it is nothing.
                -- Helen Keller

http://www.Leidinger.net  Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org    netchild @ FreeBSD.org  : PGP ID = 72077137
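The size part of the "provide numbers" request can be measured rather than argued. Below is a minimal Python sketch, not part of the FreeBSD package tools: it assumes a packing list in which each plain file line is immediately followed by its "@comment MD5:<hash>" line, rewrites those pairs into the @MD5 ... @end MD5 section proposed in the thread, and reports both byte counts. The function names (to_sections, render_aggregated, size) and the sample data are hypothetical, chosen only for illustration, and the sketch deliberately ignores the ordering concerns raised above (for example paths that depend on a preceding @cwd).

```python
"""Sketch: estimate how many bytes the proposed @MD5 section would save.

Assumption (hypothetical layout): every plain file line in the packing
list is immediately followed by its "@comment MD5:<hash>" line.
"""

MD5_PREFIX = "@comment MD5:"


def to_sections(lines):
    """Split packing-list lines into (other_lines, md5_pairs).

    A plain file line directly followed by an MD5 comment becomes a
    (hash, path) pair; every other line is kept in its original order.
    """
    others, pairs = [], []
    i = 0
    while i < len(lines):
        line = lines[i]
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if not line.startswith("@") and nxt.startswith(MD5_PREFIX):
            pairs.append((nxt[len(MD5_PREFIX):], line))
            i += 2
        else:
            others.append(line)
            i += 1
    return others, pairs


def render_aggregated(others, pairs):
    """Render the hypothetical aggregated layout proposed in the thread."""
    out = list(others)
    out.append("@MD5")
    out.extend(f"{md5} {path}" for md5, path in pairs)
    out.append("@end MD5")
    return out


def size(lines):
    """Byte size of the lines when each is written with a trailing newline."""
    return sum(len(line.encode()) + 1 for line in lines)


if __name__ == "__main__":
    # Made-up sample in the current line-by-line layout.
    sample = [
        "@cwd /usr/local",
        "man/man1/log2pcap.1.gz",
        "@comment MD5:9e94560ac5e757d3bc5f922dcf3ab4fb",
        "man/man1/nmblookup.1.gz",
        "@comment MD5:c58f068d603a12d4af867c15cf77e636",
    ]
    others, pairs = to_sections(sample)
    aggregated = render_aggregated(others, pairs)
    # Prints old size, new size, and bytes saved: 155 143 12
    print(size(sample), size(aggregated), size(sample) - size(aggregated))
```

Feeding it a real +CONTENTS file (for example via open(path).read().splitlines()) would give actual size numbers for a package such as net/samba. It says nothing about parse time; that would still need a separate benchmark of the existing package tools.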