From owner-freebsd-standards@FreeBSD.ORG Sun Feb 21 22:44:16 2010
Date: Sun, 21 Feb 2010 23:28:11 +0100
From: Joerg Sonnenberger
To: freebsd-standards@FreeBSD.org
Subject: UTF-8 and wchar_t
Message-ID: <20100221222811.GA10638@britannica.bec.de>

Hi all,

while reviewing some libarchive code, I stumbled upon the code that converts UTF-8 to wide strings. Like a lot of other software, it currently blindly assumes that wchar_t ~= UCS-4.

My question is whether FreeBSD intentionally makes that decision (and therefore should define __STDC_ISO_10646__, as ISO C99 requires), or, if not, what correct way of reading UTF-8 it allows. Unlike NetBSD, FreeBSD still lacks iconv(3) support in base, so the usual approach of converting to the locale charset and then using mbtowc etc. is not possible.

Joerg

PS: Please keep me in CC.
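To make the question concrete, here is a minimal C sketch of the conversion under discussion. It is not libarchive's actual code, and the names utf8_decode_one, wchar_is_ucs4 and utf8_to_wcs are hypothetical. The sketch decodes UTF-8 into code points itself (no iconv(3) needed). It then copies those code points straight into wchar_t only when the implementation defines __STDC_ISO_10646__, which in C99 means wchar_t values are ISO 10646 code points, and when WCHAR_MAX can hold U+10FFFF. Otherwise it refuses instead of silently assuming wchar_t ~= UCS-4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/*
 * Decode one UTF-8 sequence from s (n bytes available). On success store
 * the code point in *cp and return the bytes consumed (1..4); return 0 for
 * truncated, malformed, overlong, surrogate or out-of-range input.
 */
static size_t
utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
	uint32_t c;
	size_t len, i;

	assert(s != NULL && cp != NULL);
	if (n == 0)
		return 0;
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	} else if ((s[0] & 0xE0) == 0xC0) {
		c = s[0] & 0x1F; len = 2;
	} else if ((s[0] & 0xF0) == 0xE0) {
		c = s[0] & 0x0F; len = 3;
	} else if ((s[0] & 0xF8) == 0xF0) {
		c = s[0] & 0x07; len = 4;
	} else
		return 0;	/* stray continuation byte or invalid lead byte */
	if (n < len)
		return 0;	/* truncated sequence */
	for (i = 1; i < len; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		c = (c << 6) | (s[i] & 0x3F);
	}
	/* Reject overlong forms, UTF-16 surrogates and values above U+10FFFF. */
	if ((len == 2 && c < 0x80) || (len == 3 && c < 0x800) ||
	    (len == 4 && c < 0x10000) || (c >= 0xD800 && c <= 0xDFFF) ||
	    c > 0x10FFFF)
		return 0;
	*cp = c;
	return len;
}

/* Is it sanctioned by the C standard to store code points in wchar_t? */
static int
wchar_is_ucs4(void)
{
#ifdef __STDC_ISO_10646__
	/* Also require that wchar_t can represent the full range. */
	return (uintmax_t)WCHAR_MAX >= 0x10FFFF;
#else
	return 0;	/* encoding of wchar_t is implementation-defined */
#endif
}

/*
 * Convert NUL-terminated UTF-8 to a wide string in dst (dstlen elements).
 * Returns the number of wide characters written, excluding L'\0', or
 * (size_t)-1 on malformed input, lack of space, or if wchar_t is not
 * known to hold ISO 10646 code points.
 */
static size_t
utf8_to_wcs(wchar_t *dst, size_t dstlen, const char *src)
{
	const unsigned char *s = (const unsigned char *)src;
	size_t n, used, out = 0;
	uint32_t cp;

	assert(dst != NULL && src != NULL);
	if (!wchar_is_ucs4() || dstlen == 0)
		return (size_t)-1;
	n = strlen(src);
	while (n > 0) {
		used = utf8_decode_one(s, n, &cp);
		/* Malformed input, or no room for this char plus L'\0'. */
		if (used == 0 || out + 1 >= dstlen)
			return (size_t)-1;
		dst[out++] = (wchar_t)cp;
		s += used;
		n -= used;
	}
	dst[out] = L'\0';
	return out;
}
```

On a system that does not define __STDC_ISO_10646__, utf8_to_wcs() always fails. That is exactly the gap the question points at: without the macro, or without iconv(3) to reach the locale charset and then mbtowc, there is no portable way to get from UTF-8 to wchar_t.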