From: Phil Shafer
To: Poul-Henning Kamp
Cc: arch@freebsd.org, John-Mark Gurney, marcel@freebsd.org, "Simon J. Gerraty"
Subject: Re: XML Output: libxo - provide single API to output TXT, XML, JSON and HTML
Date: Mon, 4 Aug 2014 10:48:58 -0400
Message-ID: <201408041449.s74Emwk0019816@idle.juniper.net>
In-Reply-To: <63132.1406924887@critter.freebsd.dk>
List-Id: Discussion related to FreeBSD architecture

Poul-Henning Kamp writes:
>First of, this is not just ENOMEM, this is also invalid UTF-8 strings,
>NULL pointers and much more bogosity.

Yup, there are 26 failure cases at present, ranging from missing close
braces in format strings to unbalanced open/close calls.
>>Seeing broken output is better than limping
>>along with output that looks right but isn't.

>The output should preferably be explicitly broken, so that nobody
>downstream mistakenly takes it and runs with it.

I think we're in agreement, but there is the question of which
problems are severe enough to trigger an abort. I'm coding the UTF-8
support now and that's a perfect example. If the output character
set (the user's LANG setting) doesn't support a character of output
(U+10D6), does that constitute a complete failure? I'll presumably
provide flags to tailor the behavior, but by default, I'd be upset if
character conversion issues like this turned into complete failure.
But a format string with an invalid UTF-8 sequence would be more
severe.

FWIW, the UTF-8 strategy for libxo is this:

- all format strings are UTF-8
- argument strings (%s) are UTF-8
- "%ls" handles wide characters
- "%hs" will handle locale-based strings
- XML, JSON, and HTML will be UTF-8 output
- text will be locale-based

The painful part is that I've been using vsnprintf as the plumbing
for formatting strings, but it doesn't handle field widths for UTF-8
data correctly, so I'll need to start doing that by hand myself.

Thanks,
 Phil
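
To illustrate the field-width problem with vsnprintf: the printf family counts a "%s" field width in bytes, so a multi-byte UTF-8 string gets too little padding. The sketch below is not libxo's code. The helper names utf8_codepoints and utf8_pad_right_justify are hypothetical, the input is assumed to be valid UTF-8, and code points are used as a rough stand-in for display columns (wide and combining characters would need something like wcwidth instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Count code points in a NUL-terminated UTF-8 string. Every byte that is
 * not a continuation byte (10xxxxxx) starts a new code point.
 * Assumption: the input is already valid UTF-8.
 */
static size_t
utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/*
 * Hypothetical helper: right-justify s in a field of `width` code points,
 * like "%*s" but counting code points instead of bytes. Returns what
 * snprintf returns: the number of bytes the full result needs,
 * excluding the terminating NUL.
 */
static int
utf8_pad_right_justify(char *dst, size_t dstsize, const char *s, size_t width)
{
    size_t cps = utf8_codepoints(s);
    size_t pad = (width > cps) ? width - cps : 0;

    assert(dst != NULL || dstsize == 0);

    /* Emit the padding as an empty string with an explicit width. */
    return snprintf(dst, dstsize, "%*s%s", (int)pad, "", s);
}
```

With U+10D6 encoded as the three bytes E1 83 96, a plain "%3s" adds no padding at all because the string is already three bytes long, while the helper adds two spaces so the field holds three code points.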