From: John-Mark Gurney
Date: Thu, 31 Jul 2014 10:55:47 -0700
To: Phil Shafer
Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML
Message-ID: <20140731175547.GO43962@funkthat.com>
In-Reply-To: <201407302324.s6UNOB2H087915@idle.juniper.net>
References: <20140730193819.GM43962@funkthat.com> <201407302324.s6UNOB2H087915@idle.juniper.net>
Cc: sjg@freebsd.org, arch@freebsd.org, marcel@freebsd.org

Phil Shafer wrote this message on Wed, Jul 30, 2014 at 19:24 -0400:
> John-Mark Gurney writes:
> >My vote would be to use and *enforce* UTF-8 by the API. That means if
> >someone passes a string in, it must be properly formed UTF-8...
>
> I can certainly see making this an option, detecting the high-bit
> and inspecting the following 1-5 bytes to ensure the corresponding
> high two bits are set appropriately. But what action would you
> expect the library to take when invalid strings are passed in?

Return an error? printf can return an error, yet most people don't
check it, so there is no real difference in API/bugs...

The reason I even suggest this is that JSON requires the output to be
in Unicode, not some special locale encoding. See section 3 of:
https://www.ietf.org/rfc/rfc4627.txt

Besides, we should finally move to UTF-8 for the file system and other
parts of the system... I do like the idea of random binary filenames,
but we really should stop sticking our heads in the sand. We will only
make ourselves look silly when 2020 rolls around if we don't...

> libxo supports a warning flag, that will trigger warnings on stderr
> for things like invalid or malformed format strings, but I'm not
> sure I'd be happy if the library skipped invalid strings.

printf may skip parts of your strings if you don't check its return
value... Plus, if the API states you must pass in UTF-8 strings, and
someone doesn't properly encode/convert to UTF-8, it's their bug, not
the library's bug...
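
As a rough illustration of the check discussed above (look at the lead
byte, then verify that each following byte has its high two bits set to
10), here is a minimal C sketch. It is not libxo code: the function name
utf8_validate and the 0 / -1 return convention are assumptions made for
this example. It follows RFC 3629, which limits a sequence to 4 bytes
(so 1-3 continuation bytes rather than the 1-5 of the older 6-byte
definition) and also rejects overlong forms and surrogates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of libxo: returns 0 if buf[0..len) is
 * well-formed UTF-8 per RFC 3629, or -1 at the first malformed sequence.
 */
int
utf8_validate(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes expected after c */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {     /* high bit clear: plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return -1;      /* stray continuation byte or invalid lead byte */
        }

        if (len - i - 1 < need)
            return -1;      /* sequence truncated by end of buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];

            if ((cc & 0xC0) != 0x80)
                return -1;  /* high two bits of a continuation must be 10 */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return -1;      /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return -1;      /* UTF-16 surrogate, not a valid scalar value */
        if (cp > 0x10FFFF)
            return -1;      /* beyond the Unicode range */

        i += need + 1;
    }
    return 0;
}
```

An API that enforces UTF-8 could run a check like this on each string
argument and report failure through its return value, much as printf
does, leaving it to the caller to check.
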
We have too many encoding issues already in our source tree, and we
need to get better about making sure we don't have them, and this will
help...

> BTW, this issue is driven by "w"s use of wide characters (for
> days of the week).

Plus, enforcing UTF-8 will make the w versions easier, and allow the
library to output other widths of UTF if wanted/requested.

-- 
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."