From owner-freebsd-standards@FreeBSD.ORG Mon May 4 11:08:07 2009
From: FreeBSD bugmaster
To: freebsd-standards@FreeBSD.org
Date: Mon, 4 May 2009 11:08:06 GMT
Message-Id: <200905041108.n44B868k098887@freefall.freebsd.org>
Subject: Current problem reports assigned to freebsd-standards@FreeBSD.org

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.      Description
--------------------------------------------------------------------------------
o stand/133369 standards  [patch] test(1) with 3 or 4 arguments
o stand/130067 standards  Wrong numeric limits in system headers?
o stand/129554 standards  lp(1) [patch] Implement -m and -t options
o stand/129524 standards  FreeBSD 7.0 isnt detecting my hardrives with raid5
o stand/128546 standards  ls -p does not follow symlinks
o bin/125855   standards  sh(1) allows for multiline, non-escaped control struct
o stand/124860 standards  flockfile(3) doesn't work when the memory has been exh
o stand/123688 standards  POSIX standard changes in unistd.h and grp.h
o stand/121921 standards  [patch] Add leap second support to at(1), atrun(8)
o stand/121568 standards  [patch] ln(1): wrong "ln -s" behaviour
o stand/120947 standards  xsm ignores system.xsm and .xsmstartup
o stand/119804 standards  [patch] [locale] Invalid (long)date format in pl_PL.IS
o stand/116826 standards  [patch] sh support for POSIX character classes
o stand/116477 standards  rm(1): rm behaves unexpectedly when using -r and relat
o bin/116413   standards  incorrect getconf(1) handling of unsigned constants gi
o stand/116081 standards  make does not work with the directive sinclude
p stand/107561 standards  [libc] [patch] [request] Missing SUS function tcgetsid
o stand/104743 standards  [headers] [patch] Wrong values for _POSIX_ minimal lim
o stand/100017 standards  [Patch] Add fuser(1) functionality to fstat(1)
o stand/96236  standards  [patch] [posix] sed(1) incorrectly describes a functio
o stand/96016  standards  [headers] clock_getres et al should be in
o stand/94729  standards  [libc] fcntl() throws undocumented ENOTTY
o kern/93705   standards  [headers] [patch] ENODATA and EGREGIOUS (for glibc com
o stand/92362  standards  [headers] [patch] Missing SIGPOLL in kernel headers
a stand/86484  standards  [patch] mkfifo(1) uses wrong permissions
o stand/83845  standards  [libm] [patch] add log2() and log2f() support for libm
o stand/82654  standards  C99 long double math functions are missing
o stand/81287  standards  [patch] fingerd(8) might send a line not ending in CRL
a stand/80293  standards  sysconf() does not support well-defined unistd values
o stand/79056  standards  [feature request] [atch] regex(3) regression tests
o stand/70813  standards  [patch] ls(1) not Posix compliant
o stand/66357  standards  make POSIX conformance problem ('sh -e' & '+' command-
s kern/64875   standards  [libc] [patch] [request] add a system call: fdatasync(
s stand/62858  standards  malloc(0) not C99 compliant
o stand/56476  standards  cd9660 unicode support simple hack
p stand/55112  standards  glob.h, glob_t's gl_pathc should be "size_t", not "int
o stand/54839  standards  [pcvt] pcvt deficits
o stand/54833  standards  [pcvt] more pcvt deficits
o stand/54410  standards  one-true-awk not POSIX compliant (no extended REs)
o stand/46119  standards  Priority problems for SCHED_OTHER using pthreads
o stand/44425  standards  getcwd() succeeds even if current dir has perm 000.
p stand/41576  standards  POSIX compliance of ln(1)
o stand/39256  standards  snprintf/vsnprintf aren't POSIX-conformant for strings
s stand/36076  standards  Implementation of POSIX fuser command
o kern/27835   standards  [libc] execve() doesn't conform to execve(2) spec in s
a docs/26003   standards  getgroups(2) lists NGROUPS_MAX but not syslimits.h
o stand/25777  standards  [kernel] [patch] atime not updated on exec
o bin/25542    standards  sh(1) null char in quoted string
s stand/24590  standards  timezone function not compatible witn Single Unix Spec
o bin/24390    standards  ln(1) Replacing old dir-symlinks when using /bin/ln
o stand/21519  standards  sys/dir.h should be deprecated some more
s bin/14925    standards  getsubopt isn't poisonous enough

52 problems total.
From owner-freebsd-standards@FreeBSD.ORG Mon May 4 11:29:18 2009
From: Florian Forster
To: freebsd-standards@freebsd.org
Date: Mon, 4 May 2009 13:29:16 +0200
Message-ID: <20090504112916.GI25815@verplant.org>
Subject: isnan(3) not available in C99/POSIX2001 mode

Hi,

I'm having problems compiling a C99 program, and I believe this to be a
problem in FreeBSD's libc.

In the macro `isnan' is defined. The macro checks the size of its
argument to determine whether it's a float, double, or long double and
calls `isnanf', `isnan', or `__isnanl' respectively.
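The size-based dispatch described above can be sketched as follows. This is not the actual FreeBSD <math.h> text: the names `my_isnan`, `my_isnanf`, `my_isnand` and `my_isnanl` are hypothetical, and the helpers use the "NaN compares unequal to itself" property instead of calling into libm. What the sketch is meant to show is why every helper the macro can expand to has to be declared wherever the macro itself is visible.

```c
#include <math.h>  /* NAN and INFINITY, used by callers of my_isnan() */

/* Hypothetical per-type helpers. A NaN is the only value that compares
 * unequal to itself; this assumes the compiler is not told to ignore NaNs
 * (for example via -ffast-math). */
static int my_isnanf(float x)       { return x != x; }
static int my_isnand(double x)      { return x != x; }
static int my_isnanl(long double x) { return x != x; }

/* Type-generic front end that dispatches on sizeof(x). Every branch of the
 * ?: chain is type-checked for every argument, so each helper must be
 * declared wherever this macro is visible, even for calls that never take
 * that branch. Where long double has the same size as double, long double
 * arguments take the double branch; converting a NaN to double still
 * yields a NaN, so the result is unaffected. */
#define my_isnan(x)                                  \
    (sizeof(x) == sizeof(float)  ? my_isnanf(x) :    \
     sizeof(x) == sizeof(double) ? my_isnand(x) :    \
                                   my_isnanl(x))
```

For example, `my_isnan(NAN)`, `my_isnan((double)NAN)` and `my_isnan((long double)NAN)` are all nonzero, while `my_isnan(INFINITY)` is zero. Portable C99 code should simply use the standard `isnan` macro; the sketch only illustrates the declaration dependency.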
The problem is that the `isnan' macro is defined if `__ISO_C_VISIBLE >= 1999',
but `isnanf' is only declared if `__BSD_VISIBLE' is defined. In my case (C99
and POSIX 2001 requested) this results in a compiler warning about an
implicit declaration of `isnanf'. (The file `c99_nan.c' is attached.)

  [octo@collectd ~]$ c99 -D_POSIX_C_SOURCE=200112L -o c99_nan c99_nan.c
  c99_nan.c: In function 'main':
  c99_nan.c:21: warning: implicit declaration of function 'isnanf'

I was able to reproduce the problem on the following systems:

  - FreeBSD *** 7.0-STABLE FreeBSD 7.0-STABLE #3: Tue Jun 24 12:35:54 CEST 2008
    ***:/usr/obj/usr/src/sys/SEHRKERN2 i386
  - FreeBSD *** 7.1-RELEASE FreeBSD 7.1-RELEASE #0: Thu Jan 1 14:37:25 UTC 2009
    ***:/usr/obj/usr/src/sys/GENERIC i386

The version information in /usr/include/math.h is:

  - $FreeBSD: src/lib/msun/src/math.h,v 1.62 2007/01/07 07:54:21 das Exp $
  - $FreeBSD: src/lib/msun/src/math.h,v 1.62.6.1 2008/11/25 02:59:29 kensmith Exp $

The `c99' binary refused to print any version information, but the
installed version of GCC is 4.2.1 20070719. Using `gcc -std=c99' instead
of `c99' makes no difference as far as this warning is concerned.

A quick search at did not yield any related bug reports.

Please CC me on replies; I'm not subscribed to this list.
Regards,
-octo
--
Florian octo Forster
Hacker in training
GnuPG: 0x91523C3D
http://verplant.org/

From owner-freebsd-standards@FreeBSD.ORG Wed May 6 08:32:05 2009
From: Oliver Fromme
To: freebsd-standards@FreeBSD.ORG, juli@clockworksquid.com
Date: Wed, 6 May 2009 10:31:38 +0200 (CEST)
Message-Id: <200905060831.n468VcRE018431@lurza.secnetix.de>
Subject: Re: Shouldn't cat(1) use the C locale?

Juli Mallett wrote:
 > The cat manpage suggests that the infamous, non-standard -v extension
 > is ASCII-oriented, but cat(1) these days uses isprint and pals and
 > calls setlocale(LC_CTYPE, ""), which for those of us with dodgy
 > environments (mine includes LC_ALL=en_US.UTF-8) means that "cat -v"
 > behaves radically differently from what the manual page describes.
 >
 > Does anyone see any reason for our extensions, etc., to work with
 > LC_CTYPE != C? It doesn't make a lot of sense to me. I'd like to
 > change it if there's not a good reason to keep it broken this way,
 > like:
 >
 > -        setlocale(LC_CTYPE, "");
 > +        setlocale(LC_CTYPE, "C");
 >
 > Thoughts, etc.?

This is a difficult matter. I guess when you ask n people, you will
get n different opinions. Well, here's mine ...

I think this is a bug in the manual page. When cat(1) is using the
current locale, that's perfectly correct behaviour in a world that is
clearly moving away from ASCII, towards unicode. "Fixing" it by always
using the ASCII locale would be a step backwards. Instead, it is better
to work on bringing all of the tools into compliance with multibyte
character encodings in general, and with UTF8 in particular, which
seems to be the most important unicode encoding these days (and
probably UTF16, too).

So I think the manual page should be fixed so it says that the -v
option handles non-printing characters in the current locale, and cat
needs to be fixed to handle multibyte chars correctly if the -v option
is used with a UTF locale.

By the way, your patch would probably be a POLA violation.
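To make the two behaviours under discussion concrete, here is a minimal sketch of how a cat -v style filter might render a single byte. It is not FreeBSD's cat.c; the function name `cat_v_byte` and the `use_locale` switch are hypothetical. With `use_locale` set, anything isprint() accepts in the current LC_CTYPE is copied through (the locale-dependent behaviour); without it, only 7-bit printable ASCII is copied and everything else gets the ^X / M- notation that cat(1) documents for -v.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: render one input byte the way a cat -v style filter
 * might, writing a NUL-terminated string of at most 4 characters into out
 * and returning its length.
 *
 * use_locale != 0: bytes that isprint() accepts in the current LC_CTYPE
 *                  are copied unchanged (locale-dependent behaviour).
 * use_locale == 0: only 7-bit printable ASCII is copied unchanged.
 *
 * Everything else uses the classic notation: "^X" for control characters,
 * "^?" for DEL, and an "M-" prefix for bytes with the high bit set.
 * Newline and tab are passed through (BSD cat only makes them visible
 * with -e and -t). */
static size_t cat_v_byte(unsigned char c, int use_locale, char out[5])
{
    char *p = out;
    int printable = use_locale ? isprint(c) != 0 : (c >= 0x20 && c < 0x7f);

    if (c == '\n' || c == '\t' || printable) {
        *p++ = (char)c;
    } else {
        if (c & 0x80) {            /* high bit set: "M-" + low 7 bits */
            *p++ = 'M';
            *p++ = '-';
            c &= 0x7f;
        }
        if (c == 0x7f) {           /* DEL */
            *p++ = '^';
            *p++ = '?';
        } else if (c < 0x20) {     /* C0 control: ^@ through ^_ */
            *p++ = '^';
            *p++ = (char)(c | 0x40);
        } else {                   /* printable once the high bit is gone */
            *p++ = (char)c;
        }
    }
    *p = '\0';
    return (size_t)(p - out);
}
```

Under the C locale both settings give the same output. They differ only for bytes above 0x7f that the current LC_CTYPE classifies as printable, e.g. 0xE4 (ä) under an ISO8859-1 or ISO8859-15 locale: with `use_locale` it is copied as-is, without it it becomes "M-d".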
I currently have LC_CTYPE=de_DE.ISO8859-15 on most of my machines
(because FreeBSD's UTF support is too incomplete at the moment), and I'm
occasionally using "cat -v" to look for non-printable characters in that
locale. In fact, I have a zsh function:

    diff -u =(cat $1) =(cat -v $1)

Your patch would break that.

I'm already somewhat annoyed that locale support was broken in
strings(1). Some time ago, it used the current locale, so I could use it
on German texts with my LC_CTYPE setting. At some point, they probably
introduced a patch similar to yours and instead provided the -e option,
which does not work as expected ("-e S" is completely useless because it
prints characters that are non-printable in ISO8859 locales). Since then
I have been forced to use cat -v for that purpose. Now you're proposing
to break that, too.

I hope that explains a little bit why I'm against that change. ;-)

Best regards
   Oliver

PS: If you set LC_* to a UTF locale, but your environment (i.e. tools
and data) is not UTF-compliant, breakage is expected. If you still want
to keep that LC_* setting, a workaround would be to make aliases like
cat='LC_CTYPE=C cat' for tools that seem to be broken.

I also recommend *not* setting LC_ALL, but setting LANG instead. The
difference is that you can override LANG, as in the above example
("LC_CTYPE=C cat"). You cannot override LC_ALL, because LC_ALL overrides
everything else. See the environ(7) manual page for details.

--
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
Commercial register: Munich registry court, HRA 74606; management:
secnetix Verwaltungsgesellsch. mbH, commercial register: Munich registry
court, HRB 125758; managing directors: Maik Bachmann, Olaf Erb, Ralf Gebhart

FreeBSD services, products and more: http://www.secnetix.de/bsd

"Perl will consistently give you what you want,
unless what you want is consistency."
    -- Larry Wall

From owner-freebsd-standards@FreeBSD.ORG Wed May 6 16:08:31 2009
From: Garrett Wollman
To: freebsd-standards@freebsd.org, juli@clockworksquid.com
Date: Wed, 6 May 2009 11:36:08 -0400
Message-ID: <18945.44648.875780.605560@khavrinen.csail.mit.edu>
In-Reply-To: <200905060831.n468VcRE018431@lurza.secnetix.de>
Subject: Re: Shouldn't cat(1) use the C locale?

< said:

> I think this is a bug in the manual page.  When cat(1) is
> using the current locale, that's perfectly correct behaviour
> in a world that is clearly moving away from ASCII, towards
> unicode.

Maybe your part of the world....

> So I think the manual page should be fixed so it says that
> the -v option handles non-printing characters in the current
> locale, and cat needs to be fixed to handle multibyte chars
> correctly if the -v option is used with a UTF locale.

This is a Bad Idea. cat -v ought to work properly when the input does
not consist of "characters" at all.

-GAWollman

From owner-freebsd-standards@FreeBSD.ORG Wed May 6 17:08:12 2009
From: Oliver Fromme
To: freebsd-standards@FreeBSD.ORG, juli@clockworksquid.com
Date: Wed, 6 May 2009 19:07:45 +0200 (CEST)
Message-Id: <200905061707.n46H7jqs042942@lurza.secnetix.de>
In-Reply-To: <18945.44648.875780.605560@khavrinen.csail.mit.edu>
Subject: Re: Shouldn't cat(1) use the C locale?

Garrett Wollman wrote:
 > This is a Bad Idea. cat -v ought to work properly when the input does
 > not consist of "characters" at all.

It depends on your definition of "properly". For me, it already does
work properly (using an ISO8859 locale). It also works properly for
people using a US-ASCII (or C) locale. It does not seem to work properly
for Juli, who is using a multibyte UTF locale.

Normally cat is agnostic of the encoding of its input data, because it
is handled like binary data. But if the -v option is used, it has to
actually look at the data in order to decide what is printable and what
is not. This has two consequences: First, it has to know the encoding of
the input, and second, it has to know what is considered "printable".

The problem is that cat has no knowledge of the encoding of its input
data. Strictly speaking, the locale (LC_CTYPE) specifies only the
properties of the output device. Furthermore, conversion between
different encodings would be beyond the scope of cat (there are other
tools for this). Therefore the only reasonable thing to do is to assume
that input and output use the same encoding.
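The kind of conversion left to "other tools" here (recode, iconv) is particularly simple in the ISO8859-1 to UTF-8 direction, because every ISO8859-1 byte value equals the Unicode code point it encodes. The sketch below, with the hypothetical name `latin1_to_utf8`, shows that mapping only; it is not how recode(1) is implemented, and it covers ISO8859-1 alone (ISO8859-15 differs in eight positions, e.g. 0xA4 is the euro sign there).

```c
#include <stddef.h>

/* Hypothetical helper: convert n ISO8859-1 bytes from in to UTF-8 in out.
 * out must have room for 2 * n bytes (worst case: every byte is >= 0x80).
 * Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    size_t i, o = 0;

    for (i = 0; i < n; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[o++] = c;                                  /* US-ASCII: unchanged */
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));   /* 110000xx */
            out[o++] = (unsigned char)(0x80 | (c & 0x3F)); /* 10xxxxxx */
        }
    }
    return o;
}
```

When such a conversion runs after cat -v under an ISO8859-1 locale, the ^ and M- escapes cat produced are plain ASCII and pass through unchanged; only the Latin-1 characters cat considered printable are re-encoded, e.g. 0xE4 (ä) becomes the two bytes 0xC3 0xA4.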
So, if you're working in a UTF locale and use cat to display a file on
the screen, that file should be UTF-encoded or UTF-compatible (such as
US-ASCII); otherwise it will look wrong, whether or not you use the -v
option.

The same is true for binary files. For example, if you have a binary
with embedded ISO8859 strings that you want to display on a UTF8
terminal, then the following works:

    LC_CTYPE=en_US.ISO8859-1 cat -v file | recode iso8859-1..utf8

It correctly displays German Umlauts and some other characters, but
escapes 8bit characters that are non-printable in the ISO8859-1 locale.

If you want to filter for US-ASCII characters only, then it's even
easier, because UTF8 is US-ASCII-compatible, so you don't need recode:

    LC_CTYPE=C cat -v file

If you don't use a multibyte locale, and your files aren't multibyte
encoded either, then you don't have any of the above problems, of
course, and cat will work either way.

Best regards
   Oliver

--
Oliver Fromme

"If Java had true garbage collection, most programs would delete
themselves upon execution."
    -- Robert Sewell

From owner-freebsd-standards@FreeBSD.ORG Wed May 6 17:43:14 2009
From: Garrett Wollman
To: Oliver Fromme
Cc: freebsd-standards@freebsd.org, juli@clockworksquid.com
Date: Wed, 6 May 2009 13:43:05 -0400
Message-ID: <18945.52265.44038.498643@khavrinen.csail.mit.edu>
In-Reply-To: <200905061707.n46H7jqs042942@lurza.secnetix.de>
Subject: Re: Shouldn't cat(1) use the C locale?

< said:

> Normally cat is agnostic of the encoding of its input data,
> because it is handled like binary data.  But if the -v
> option is used, it has to actually look at the data in
> order to decide what is printable and what is not.
> This has two consequences:  First, it has to know the
> encoding of the input, and second, it has to know what
> is considered "printable".

I think that should be fairly obvious: the input is a stream of bytes,
which may or may not encode characters in any locale.

> The same is true for binary files.  For example, if you have
> a binary with embedded ISO8859 strings that you want to display
> on a UTF8 terminal, then the following works:
>
>     LC_CTYPE=en_US.ISO8859-1 cat -v file | recode iso8859-1..utf8
>
> It correctly displays German Umlauts and some other characters,
> but escapes 8bit characters that are non-printable in the
> ISO8859-1 locale.

Now try the same thing on a binary with UTF-8 strings in it.

(UTF-8 at least gives you a validity constraint on possible multibyte
characters, which arbitrary multibyte encodings do not necessarily
provide. This mitigates the "reading frame" problem, because the first
byte of an actual UTF-8 character cannot be the n'th byte (n > 1) of
any UTF-8 character.)

-GAWollman
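The validity constraint and the "reading frame" property mentioned above can be illustrated with a small decoder. This is a sketch, not code from cat(1) or FreeBSD's libc; `utf8_is_cont`, `utf8_seq_len` and `utf8_resync` are hypothetical names. Lead bytes (0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx) and continuation bytes (10xxxxxx) occupy disjoint ranges, so a reader dropped at an arbitrary offset in binary data can skip continuation bytes to reach the next possible character boundary, and can reject byte runs that are not well-formed UTF-8, for example to escape them the way cat -v escapes non-printable bytes.

```c
#include <stddef.h>

/* Nonzero if b is a UTF-8 continuation byte (10xxxxxx). */
static int utf8_is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

/* Length (1-4) of the well-formed UTF-8 sequence starting at s[0..n),
 * or 0 if the bytes at s do not start one. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, i;
    unsigned long cp;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                                    /* US-ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else
        return 0;   /* continuation byte or 0xF8..0xFF: cannot start a char */

    if (n < len)
        return 0;                                    /* truncated sequence */
    for (i = 1; i < len; i++) {
        if (!utf8_is_cont(s[i]))
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
        (len == 4 && cp < 0x10000) || (cp >= 0xD800 && cp <= 0xDFFF) ||
        cp > 0x10FFFF)
        return 0;
    return len;
}

/* Offset of the first byte in s[0..n) that could start a character,
 * i.e. the next possible "reading frame" boundary (n if there is none). */
static size_t utf8_resync(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n && utf8_is_cont(s[i]))
        i++;
    return i;
}
```

For example, the byte 0xA4 on its own is rejected (it can only be the second or later byte of a character), while 0xC3 0xA4 is accepted as a two-byte sequence. Arbitrary multibyte encodings such as Shift_JIS give no such guarantee, because their trailing bytes can overlap the lead-byte range.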