From owner-freebsd-hackers@FreeBSD.ORG Fri May 11 17:27:34 2007
Message-ID: <17988.42853.894408.181184@bhuda.mired.org>
Date: Fri, 11 May 2007 13:27:01 -0400
To: Ivan Voras
References: <200705102105.27271.blackdragon@highveldmail.co.za> <4643C7DB.6000408@elischer.org> <17988.35412.231093.411177@bhuda.mired.org> <17988.40311.210855.381093@bhuda.mired.org>
From: Mike Meyer
Cc: freebsd-hackers@freebsd.org
Subject: Re: SQL in the base system

In , Ivan Voras typed:
> Mike Meyer wrote:
>
> > Yes, they are present no matter what representation you use. The
> > question is - how do the answers change if you change the
> > format.
> > These days, cross-platform means you deal with length as well
> > as endian issues. Or maybe you don't, depending on the db. I know the
> > answers for text files (easy, easy, very, yes). Can you propose a db
> > scheme that gets the same answers?
>
> I think I don't understand the question. If the database contains number
> "42" in a field typed "int32", in a row, and handles endianness well, why
> would I get a different number on different platforms?

The question is "How hard is it to move a db to a radically different
platform and still use it?" Endianness is a biggie; word length (what
does int32 turn into on a platform with 40 bit words and 20 bit
half-words?) is less of a problem than it used to be; pointer sizes
may be an issue, etc.

Text files work on every Unix platform that uses the character set of
your system. For the application at hand, working on any FreeBSD
system would be equivalent.

> (A side note about sqlite: it's actually weakly typed - you store and
> receive strings).

Which is part of why answering the questions needs a detailed
proposal.

> > I hate to tell you this, but your XML solution would still consist of
> > a bunch of one-off file formats for each and every purpose. Using XML
> > just fixes the syntax for the file, not the semantics. Settling on XML
> > (or JSON, or INI, or cap files, or ...) is sort of like settling on
> > UTF, only less obviously a win. Sure, you get to use canned code that
> > will turn your text file into a structure in memory. But you still have
> > to figure out what it all means.
> >
> > As you say, the XML toolset is the real win. Smart editors,
> > validators, schemas (which make the editors and validators even more
> > powerful) are all good things. Most people don't really seem
> > interested in this beyond editors. That's not really much of a win.
> >
> I agree that validation in XML is a strong point - but one of the reasons
> people like text files is that they DON'T usually have validation
> features :)

No, it's that the validation is usually simple enough that it can be
done manually. Of course, for both XML and other text formats, people
generally don't bother - they feed the data to the program, and let it
blow up if the file is invalid. I suspect it's because the validation
is generally a separate step. If we changed the config files to XML,
and vi were configured to interactively underline in red everything
that wasn't valid according to the schema, admins would almost
certainly *love* it.(*)

>      | pro                           | contra
> ----------------------------------------------------------------------
> XML  | standard tools, validation,   | evil manual parsing, bad rep
>      | can embed multiple data       |
>      | structures in a standard way  |

bloated compared to alternatives, one document element per file.

Since XML is text, you really want this row to be named something like
"ad-hoc".

> ----------------------------------------------------------------------
> text | standard tools, sometimes     | no validation, manual parsing,
>      | human readable                | usually one data structure per
>      |                               | file

Choosing a specific format - XML, JSON, S-expressions, cap, INI,
etc. - just wires down the tokenizer (trivial) and structuring (varies
from trivial to complex). If you're not taking advantage of anything
beyond what XML offers on top of that, there's not much point in
picking XML over anything else - including ad-hoc formats.

http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.
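
A minimal sketch of the point about formats only wiring down the tokenizer and the structuring, using Python's standard library. The "server" section, the "port" setting and all function names are hypothetical, made up for illustration; nothing here comes from an actual FreeBSD config file. Two loaders turn INI and JSON text into the same in-memory structure, and the step that decides what the values mean is the same code either way.

```python
import configparser
import json

# Hypothetical config content, expressed in two different syntaxes.
INI_TEXT = """
[server]
port = 8080
"""

JSON_TEXT = '{"server": {"port": "8080"}}'


def load_ini(text):
    """Tokenize and structure INI text into nested dicts of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def load_json(text):
    """Tokenize and structure JSON text into nested dicts."""
    return json.loads(text)


def interpret(config):
    """Apply the semantics: the part no file format supplies for you."""
    port = int(config["server"]["port"])
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

Both loaders produce the same nested dict for the inputs above, so interpret() neither knows nor cares which syntax was on disk. Swapping in an XML loader would change only the first step, which is why the choice of format matters mostly for the tooling that comes with it.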