From: Hans Petter Selasky
To: freebsd-hackers@freebsd.org
Cc: FreeBSD Hackers Mailing List, Nick Hibma
Date: Wed, 19 Nov 2008 09:21:12 +0100
Subject: Re: Unicode USB strings conversion
Message-Id: <200811190921.13859.hselasky@c2i.net>
In-Reply-To: <200811190842.59377.nick@van-laarhoven.org>

On Wednesday 19 November 2008, Nick Hibma wrote:
> In the USB code (and I bet it is the same in the USB4BSD code) unicode
> characters in strings are converted in a very crude way to ASCII. As I have
> a user on the line who sees rubbish in his logs and when using
> usbctl/usbdevs/etc., I bet this is the problem.
>
> I'd like to try and fix this problem by using libkern/libiconv.
>
> 1) Is this the right approach to convert UTF8 to printable string in the
> kernel?
>
> 2) Is this needed at all in the short term future? I remember seeing
> attempts at making the kernel use UTF8.
>
> 3) Does anyone know of a good example in the code without me having to hunt
> through the kernel to find it?
>
> For reference: The code that needs replacing is:
>
> usbd_get_string():
>
>         s = buf;
>         n = size / 2 - 1;
>         for (i = 0; i < n && i < len - 1; i++) {
>                 c = UGETW(us.bString[i]);
>                 /* Convert from Unicode, handle buggy strings. */
>                 if ((c & 0xff00) == 0)
>                         *s++ = c;
>                 else if ((c & 0x00ff) == 0 && swap)
>                         *s++ = c >> 8;
>                 else
>                         *s++ = '?';
>         }
>         *s++ = 0;
>
> I haven't got the USB specs handy, but I believe that this is a simple way
> of converting LE and BE UTF8 to ASCII.

Alternatively, you could search for a better language ID. Currently both the
old and the new USB stack select the first language ID in the language ID
table (string descriptor 0). The device probably offers an English language
ID, just not as the first entry.

--HPS
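The loop quoted above keeps only UTF-16 code units below U+0100 and turns everything else into '?'. Below is a minimal sketch of a lossless alternative. It assumes the raw string descriptor bytes are available and decodes the UTF-16LE bString payload, including surrogate pairs, into NUL-terminated UTF-8. The function name usb_string_to_utf8 is hypothetical and is not part of either USB stack. The sketch also does not reproduce the byte-swapped "buggy strings" workaround that the original `swap` flag handles.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not in the FreeBSD USB stacks): decode the UTF-16LE
 * payload of a USB string descriptor into NUL-terminated UTF-8.
 * desc points at the raw descriptor: bLength, bDescriptorType, bString...
 * Unpaired surrogates become '?'. Output is truncated on a character
 * boundary, so it always remains valid UTF-8. Returns bytes written.
 */
static size_t
usb_string_to_utf8(const uint8_t *desc, size_t desclen, char *out,
    size_t outsize)
{
	size_t len, i, n, o = 0;
	uint8_t enc[4];

	if (outsize == 0)
		return 0;
	if (desclen < 2 || desc[1] != 3 /* string descriptor type */) {
		out[0] = '\0';
		return 0;
	}
	len = desc[0] < desclen ? desc[0] : desclen;
	for (i = 2; i + 1 < len; i += 2) {
		uint32_t c = desc[i] | (desc[i + 1] << 8);	/* little-endian */

		if (c >= 0xd800 && c <= 0xdbff && i + 3 < len) {
			/* High surrogate: combine with a following low one. */
			uint32_t lo = desc[i + 2] | (desc[i + 3] << 8);
			if (lo >= 0xdc00 && lo <= 0xdfff) {
				c = 0x10000 + ((c - 0xd800) << 10) + (lo - 0xdc00);
				i += 2;
			} else
				c = '?';
		} else if (c >= 0xd800 && c <= 0xdfff)
			c = '?';	/* unpaired surrogate */

		/* Encode code point c as 1 to 4 UTF-8 bytes. */
		if (c < 0x80) {
			enc[0] = c; n = 1;
		} else if (c < 0x800) {
			enc[0] = 0xc0 | (c >> 6);
			enc[1] = 0x80 | (c & 0x3f); n = 2;
		} else if (c < 0x10000) {
			enc[0] = 0xe0 | (c >> 12);
			enc[1] = 0x80 | ((c >> 6) & 0x3f);
			enc[2] = 0x80 | (c & 0x3f); n = 3;
		} else {
			enc[0] = 0xf0 | (c >> 18);
			enc[1] = 0x80 | ((c >> 12) & 0x3f);
			enc[2] = 0x80 | ((c >> 6) & 0x3f);
			enc[3] = 0x80 | (c & 0x3f); n = 4;
		}
		if (o + n >= outsize)	/* keep room for the NUL */
			break;
		memcpy(out + o, enc, n);
		o += n;
	}
	out[o] = '\0';
	return o;
}
```

Whether the resulting UTF-8 is "printable" on the console or in logs is a separate question. The sketch only shows that no information is lost during decoding.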
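The suggestion of searching for a better language ID can be sketched as follows. String descriptor 0 carries a table of little-endian 16-bit LANGIDs. The hypothetical helper usb_pick_langid picks a LANGID from that table in this order:

1. US English (0x0409).
2. Any LANGID whose primary-language bits (the low 10 bits) are English (0x09).
3. Otherwise, the first entry, which matches the current behaviour described above.

This preference order is an assumption for illustration. It is not something either USB stack is known to implement.

```c
#include <stddef.h>
#include <stdint.h>

#define LANGID_EN_US		0x0409		/* English (United States) */
#define LANGID_PRIMARY(id)	((id) & 0x03ff)	/* primary language bits */
#define LANGID_PRIMARY_ENGLISH	0x09

/*
 * Hypothetical helper: choose a language ID from string descriptor 0.
 * Prefers US English, then any English variant, then the first entry.
 * Returns 0 if the descriptor is malformed or the table is empty.
 */
static uint16_t
usb_pick_langid(const uint8_t *desc, size_t desclen)
{
	size_t len, i;
	uint16_t first = 0, english = 0;

	if (desclen < 4 || desc[1] != 3 /* string descriptor type */)
		return 0;
	len = desc[0] < desclen ? desc[0] : desclen;
	for (i = 2; i + 1 < len; i += 2) {
		uint16_t id = (uint16_t)(desc[i] | (desc[i + 1] << 8));

		if (id == LANGID_EN_US)
			return id;
		if (first == 0)
			first = id;
		if (english == 0 &&
		    LANGID_PRIMARY(id) == LANGID_PRIMARY_ENGLISH)
			english = id;
	}
	return english != 0 ? english : first;
}
```

For example, a device listing German (0x0407) first and UK English (0x0809) second would get 0x0809 instead of 0x0407.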