Date: Mon, 4 Aug 2014 14:04:00 -0700
From: John-Mark Gurney
To: Phil Shafer
Cc: arch@freebsd.org, Poul-Henning Kamp, marcel@freebsd.org, "Simon J. Gerraty"
Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML
Message-ID: <20140804210400.GG88623@funkthat.com>
In-Reply-To: <201408041449.s74Emwk0019816@idle.juniper.net>

Phil Shafer wrote this message on Mon, Aug 04, 2014 at 10:48 -0400:
> Poul-Henning Kamp writes:
> >First off, this is not just ENOMEM, this is also invalid UTF-8 strings,
> >NULL pointers and much more bogosity.
>
> Yup, there are 26 failure cases at present, ranging from missing
> close braces in format strings to unbalanced open/close calls.
>
> >>Seeing broken output is better than limping
> >>along with output that looks right but isn't.
> >The output should preferably be explicitly broken, so that nobody
> >downstream mistakenly takes it and runs with it.
>
> I think we're in agreement, but there is the question of what
> constitutes sufficient problems to trigger abort. I'm coding the
> UTF-8 support now and that's a perfect example. If the output
> character set (the user's LANG setting) doesn't support a character
> of output (U+10D6), does that constitute a complete failure? I'll

It depends...  For output to a terminal/text, you should use iconv's
ICONV_SET_TRANSLITERATE option (see iconvctl(3), which wasn't linked
from iconv(3), but now is)...

> presumably give flags to tailor the behavior, but by default, I'd
> be upset if character conversion issues like this turned into
> complete failure. But a format string with an invalid UTF-8 sequence
> would be more severe.
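The split between "transliterate what the output charset can't represent" and "fail hard on invalid UTF-8" could look roughly like the C sketch below.  utf8_to_charset_translit() is a hypothetical helper name, not part of libxo or iconv.  It assumes iconvctl(3) with ICONV_SET_TRANSLITERATE where that macro is defined (FreeBSD, GNU libiconv) and otherwise falls back to the glibc-style "//TRANSLIT" suffix on the target charset name.  The fixed-size output buffer is a simplification; a real implementation would grow it on E2BIG.

```c
#include <iconv.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert a UTF-8 string to the character set
 * `tocode`, transliterating characters the target set cannot represent
 * (e.g. U+10D6 in an ASCII locale) instead of failing.  Invalid UTF-8
 * input is still a hard error.  Returns a malloc'd NUL-terminated
 * string, or NULL on error.
 */
static char *
utf8_to_charset_translit(const char *utf8, const char *tocode)
{
	iconv_t cd;
#ifdef ICONV_SET_TRANSLITERATE
	/* FreeBSD and GNU libiconv: enable transliteration via iconvctl(3). */
	int on = 1;

	cd = iconv_open(tocode, "UTF-8");
	if (cd == (iconv_t)-1)
		return NULL;
	if (iconvctl(cd, ICONV_SET_TRANSLITERATE, &on) == -1) {
		iconv_close(cd);
		return NULL;
	}
#else
	/* glibc has no iconvctl(3); a "//TRANSLIT" suffix on the target
	 * charset name requests transliteration instead. */
	char target[64];

	snprintf(target, sizeof(target), "%s//TRANSLIT", tocode);
	cd = iconv_open(target, "UTF-8");
	if (cd == (iconv_t)-1)
		return NULL;
#endif

	size_t inleft = strlen(utf8);
	/* Generous fixed buffer, since one input character may transliterate
	 * to several output characters.  E2BIG is treated as an error here. */
	size_t cap = inleft * 4 + 16;
	char *out = malloc(cap);
	char *in = (char *)utf8;	/* iconv() does not modify the input */
	char *op = out;
	size_t outleft = cap - 1;	/* keep room for the terminating NUL */

	if (out == NULL) {
		iconv_close(cd);
		return NULL;
	}
	/* Convert, then flush any shift state; EILSEQ means invalid input. */
	if (iconv(cd, &in, &inleft, &op, &outleft) == (size_t)-1 ||
	    iconv(cd, NULL, NULL, &op, &outleft) == (size_t)-1) {
		free(out);
		iconv_close(cd);
		return NULL;
	}
	*op = '\0';
	iconv_close(cd);
	return out;
}
```

Call setlocale(LC_ALL, "") first so the transliteration follows the user's locale (nl_langinfo(CODESET) gives that locale's charset name for `tocode`).  In the "C" locale glibc will typically substitute '?' rather than a close match, but either way the text output degrades instead of aborting, while "bad\xff" still comes back as NULL.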
>
> FWIW, the UTF-8 strategy for libxo is this:
> - all format strings are UTF-8
> - argument strings (%s) are UTF-8
> - "%ls" handles wide characters
> - "%hs" will handle locale-based strings
> - XML, JSON, and HTML will be UTF-8 output
> - text will be locale-based

This looks like exactly what I had in mind...  Though for XML and HTML,
you might want to emit the proper declaration that says the encoding is
UTF-8 (e.g. <?xml version="1.0" encoding="UTF-8"?>)...  How about making
that an option that can be turned off?  That way, if someone wants to
nest the output in another document, they can pass the option to turn it
off, while by default you end up w/ a properly formed HTML or XML
document.

> The painful part is that I've been using vsnprintf as the plumbing
> for formatting strings, but it doesn't handle field widths for UTF-8
> data correctly, so I'll need to start doing that by hand myself.

iconv or another i18n library (e.g. wcwidth(3)/wcswidth(3)) should help
w/ that...  Some languages, like Thai, have combining characters, so even
though there might be a 6-byte UTF-8 sequence, it'll only take up one
column...

-- 
  John-Mark Gurney                    Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
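For the field-width problem, a minimal C sketch of column-aware padding follows: roughly what "%-*s" would do if it counted display columns instead of bytes.  The helper names (utf8_decode, utf8_columns, pad_columns) are hypothetical, and the zero-width table is a deliberately tiny stand-in covering only the combining diacriticals (U+0300 to U+036F) and the Thai above/below vowel and tone marks.  Everything else counts as one column, so double-width CJK is not handled, and overlong or surrogate encodings are not rejected.  A real implementation would use wcwidth(3)/wcswidth(3) in a UTF-8 locale or a full Unicode width table.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, incomplete table of zero-width (combining) code points. */
static int
is_combining(uint32_t cp)
{
	return (cp >= 0x0300 && cp <= 0x036F) ||	/* combining diacriticals */
	    cp == 0x0E31 ||				/* Thai MAI HAN-AKAT */
	    (cp >= 0x0E34 && cp <= 0x0E3A) ||		/* Thai vowels above/below */
	    (cp >= 0x0E47 && cp <= 0x0E4E);		/* Thai tone marks etc. */
}

/* Decode one UTF-8 sequence at s into *cp; return its byte length, or 0
 * if the lead or a continuation byte is malformed (a NUL counts as bad). */
static size_t
utf8_decode(const unsigned char *s, uint32_t *cp)
{
	size_t len, i;

	if (s[0] < 0x80) {
		*cp = s[0];
		len = 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		*cp = s[0] & 0x1F;
		len = 2;
	} else if ((s[0] & 0xF0) == 0xE0) {
		*cp = s[0] & 0x0F;
		len = 3;
	} else if ((s[0] & 0xF8) == 0xF0) {
		*cp = s[0] & 0x07;
		len = 4;
	} else
		return 0;
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		*cp = (*cp << 6) | (s[i] & 0x3F);
	}
	return len;
}

/* Display columns of a UTF-8 string, or -1 if it is not valid UTF-8. */
static int
utf8_columns(const char *str)
{
	const unsigned char *s = (const unsigned char *)str;
	int cols = 0;
	uint32_t cp;
	size_t n;

	while (*s != '\0') {
		if ((n = utf8_decode(s, &cp)) == 0)
			return -1;
		if (!is_combining(cp))
			cols++;		/* assumes one column per base character */
		s += n;
	}
	return cols;
}

/* Copy str into dst left-justified in a field `width` columns wide.
 * Returns the bytes written (excluding NUL), or -1 on invalid UTF-8 or
 * if dst is too small. */
static int
pad_columns(char *dst, size_t dstsz, const char *str, int width)
{
	int cols = utf8_columns(str);
	size_t len = strlen(str);

	if (cols < 0)
		return -1;
	size_t pad = cols < width ? (size_t)(width - cols) : 0;
	if (len + pad + 1 > dstsz)
		return -1;
	memcpy(dst, str, len);
	memset(dst + len, ' ', pad);
	dst[len + pad] = '\0';
	return (int)(len + pad);
}
```

With Thai "กิ" (U+0E01 followed by the combining vowel U+0E34), strlen() reports 6 bytes but utf8_columns() reports 1 column, so pad_columns() with a width of 4 appends three spaces, whereas printf("%-4s") appends none because its width counts bytes.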