From: "Conrad J. Sabatier" <conrads@cox.net>
To: freebsd-questions@FreeBSD.org
Date: Tue, 8 Nov 2011 18:42:36 -0600
Subject: "Unprintable" 8-bit characters
Message-ID: <20111108184236.3a78ebf6@cox.net>

Pardon me if this may seem like a stupid question, but this is something that's been bugging me for a long time, and none of my research has turned up anything useful yet.

I've been trying to understand what the deal is with regard to displaying the "extended" 8-bit character set, i.e., 8-bit characters with the MSB set. More specifically, I'm trying to figure out how to get the "ls" command to properly display filenames containing characters in this extended set.

I have some MP3 files, for instance, whose names contain certain European characters, such as the lowercase "u" with umlaut (code 0xfc in the Latin set, according to gucharmap), that I just can't get ls to display properly. ls seems to consider these characters "unprintable", and the best I've been able to produce in the ls output is backslash interpretations of the characters using either the -B or -b options; otherwise the default "?" is displayed in their place.

The strange thing is that these characters display just fine in xterm, gnome-terminal, etc. I can copy and paste them from the gucharmap utility into a shell command line or other application, and they appear as they should, but ls simply refuses to display them. I can print them using the printf command; even bash's builtin echo seems to have no problem with them. Only ls appears to have this problem.
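
To make the symptom concrete, here is a minimal Python sketch that mimics the three renderings described above: the default "?", the backslash form, and the intended character. It is only an illustration of the byte-level behaviour, not how ls is actually implemented. The file name, the render() helper and the two charsets compared are assumptions made up for the example.

```python
def render(raw: bytes, charset: str, escape: bool = False) -> str:
    """Render a file name byte by byte under a single-byte charset.

    Hypothetical helper: bytes the charset cannot decode to a printable
    character become "?" or, with escape=True, a backslash followed by
    three octal digits.
    """
    out = []
    for byte in raw:
        try:
            ch = bytes([byte]).decode(charset)
        except UnicodeDecodeError:
            ch = ""  # this byte has no meaning in the chosen charset
        if ch and ch.isprintable():
            out.append(ch)
        elif escape:
            out.append("\\%03o" % byte)
        else:
            out.append("?")
    return "".join(out)


# Made-up file name containing the byte 0xfc.
NAME = b"M\xfcller.mp3"

print(render(NAME, "ascii"))                         # M?ller.mp3
print(render(NAME, "ascii", escape=True))            # M\374ller.mp3
print(render(NAME, "latin-1") == "M\u00fcller.mp3")  # True
```

The same bytes come out three different ways depending only on which charset the program assumes, which matches what the terminal and ls appear to disagree about.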

I've experimented with various locales, using the LC_* variables as well as the LANG variable (as documented in the environment section of the ls man page), all to no avail.

Is this an inherent limitation of ls, or is there some workaround or other solution? Do we need a new en_*.UTF-16 locale? Should we consider extending the ls command to handle these characters? Or is there something about all of this that I'm just not "getting"?

As an additional note, I notice that in the text console, this same character code (0xfc) produces an entirely different character (a lowercase n in a raised position, as for the exponent in a mathematical expression). Is there, in fact, no standardization of the representation of these "high bit" characters?

Thanks to anyone who can help clear up this long-standing mystery for me.

-- 
Conrad J. Sabatier
conrads@cox.net