From: "Teske, Devin"
To: "Daniel O'Connor"
Subject: Re: BIND segway -> python -> first-class ports
Date: Thu, 12 Dec 2013 16:24:40 +0000
Cc: Kevin Oberman, Devin Teske, "freebsd-stable@freebsd.org Stable", "Teske, Devin", Darren Pilgrim
List-Id: Production branch of FreeBSD source code

On Dec 11, 2013, at 11:07 PM, Daniel O'Connor wrote:
>
> On 12 Dec 2013, at 17:32, Teske, Devin wrote:
>> On Dec 11, 2013, at 9:46 PM, Daniel O'Connor wrote:
>>> On 12 Dec 2013, at 12:24, Teske, Devin wrote:
>>>>> Thanks, if only I'd known about this 6 months ago :)
>>>>
>>>> I just wrote it from scratch, so didn't exist until today ;D
>>>
>>> Hah nice, although I imagine there is plenty of legal XML it can't parse.
>>>
>>> That plays to another point about this sort of work - it's very hard to write shell script that will work properly in all cases (things like spaces, or even newlines and unprintable characters in filenames).
>>>
>>
>> If I had spent more time on it, then it would be able to parse any
>> XML. However, it wasn't worth going further without first having
>> a look at the C code that produces the output.
>>
>> For example, different XML encoding libraries may encode the
>> property values more or less strictly (for example, are values
>> properly encoded to prevent a value of "</name>" (for example)
>> from prematurely terminating the property, borking the XML
>> validation?). My guess would be that it would be encoded fully as
>> "&lt;/name&gt;".
>>
>> Just a matter of extending the extract_data() and extract_attr()
>> functions and then generalizing a little more.
>
> I think looking at what produces it is 'cheating' and can end up biting you in the ass later on.
>
> Basically my point is that there needs to be _some_ interchange format where you can reliably parse output from tools generating it (which by and by might be written by different people with different assumptions etc). So a core extremely robust parser is necessary.
>
> Perhaps there could be a base tool which can take such output and convert it to a set of struct commands. That is really my second choice, but I think that it is politically infeasible to modify our /bin/sh to parse XML (or any other useful interchange format).

Two 'nits'...

I remember having these same types of discussions decades ago. They
seem to repeat themselves every 6-12 years.

I seem to recall that every time the topic of format parsing and data
mgmt comes up, there's a split between two types of people.

A. The folks that want "purpose built" parsers that compartmentalize
   the logic

and

B. The folks that want a "general built" parser that has to
   potentially be tuned for the data that you're parsing.

In my experience in building, developing, and *using* both...

Nit 1. The general-purpose tool forces you to use its own data
structure for access, and it can still fail on edge cases unless you
"cheat", as you suggest, and look at the code generating the output
that you will feed to the generalized parser.

NB: Notice how you don't get away from the fact that you really *ought*
to be looking at the code that generates the output (always) to make
sure you don't have a gaping edge-case.

Nit 2. The purpose-built parser can often lend simplicity to a
situation where possible.
That is to say, if you can get by with a simple parser, more often than
not, this approach may be desirable because you localize the logic to
the point where changes will occur less often. Conversely, we find
that changes to the generalized library may unintentionally break the
parsing of multiple code-points when all you wanted was to add some
basic "thing".

Ultimately, the benefit of not over-complicating every "parse-job" is
that...

+ With localized logic, you won't have to worry about the end-to-end
  regression testing that a generalized parser requires.

That's why all the great generalized parsers have their own
test-harnesses and a giant pool of sample data to make sure that each
change is rigorously tested against each/every known format.

That's great, but a purpose-built parser can last 15-20 years without a
change (be it written in C, C++, Obj-C, Assembly, whatever) because the
only time it will change is when the format it parses changes.

So what we gain by (a) giving up the use of a generalized parser to
"Parse The World"(tm) and (b) using a localized purpose-built parser for
individualized parse-jobs...

+ Longevity of code
+ Equal or lesser cost of maintenance
+ A little team-work

Just my 2-cents. Been doing the whole "Parse The World"(tm) thing for a
while and it's given me some perspective.
-- 
Devin