From: "Julian H. Stacey"
To: hackers@freebsd.org
Subject: Re: patch for /usr/src/usr.bin/fmt/ (not 8 bit clean) for German & French
Organization: http://berklix.com BSD Unix Linux Consultancy, Munich Germany
Message-Id: <201311131748.rADHmpxG084992@fire.js.berklix.net>
In-reply-to: Your message "Tue, 12 Nov 2013 21:17:37 +0100."
    <20131112201737.GA52200@lorvorc.mips.inka.de>
Date: Wed, 13 Nov 2013 18:48:51 +0100
Cc: Jordan Hubbard, FreeBSD-gnats-submit@freebsd.org, "Bernhard Riedel (Work)", Astrid Jekat, Christian Weisgerber

Christian Weisgerber wrote:
> Julian H. Stacey:
>
> > I don't know about ISO 8859-1 and UTF-8, (I dislike & avoid
> > national char set stuff as much as possible), but I want
>
> That is your problem right there.

My perspective & experience (or `problem', as you mislabel it) is that
I was supporting Unix Internationalisation back in 1985, & have long
since tired of aggravating German umlaut issues (even back then umlauts
had AE OE UE [& SS] replacements, but few used them).

Your problem is that, being German, you had an incentive to attain
umlauts, &, probably being younger, wasted less time achieving them by
going straight to the since-available UTF; but you are myopic about
others who may be averse to wasting more time on national oddities that
cleaner Roman derivatives like Italian & English etc find superfluous.

It seemed best to make fmt.c 8 bit clean[er], to help process arbitrary
text, harm no one, & not disturb users of eg UTF.

Your problem is that you would obstruct a cleaner fmt, so fmt continues
to fail until users are forced to waste their time as you did, reading
up on & configuring internationalisation variables some of them don't
need. **

> > to be able to edit files that simultaneously contain eg all
> > of English German & French etc, so setting some var to eg
> > just German would be inappropriate. 8 bit clean would be ideal,
> > next best would be my patches I suppose.
>
> You MUST define a character set for this. "8-bit clean" is meaningless
> for a tool that deals with runs of characters.
Without a defined > character set, you have no idea what those bytes mean. Is 0x90 a Not true. See below. ** > printable character? Is it a control character? Is it part of a > multibyte character? > > And setting, for example, LC_CTYPE=de_DE.ISO8859-1 does in no way > limit you to German. For LC_CTYPE purposes, the language/country > part of the locale specification isn't used. > > This is definitely a PEBKAC. Avoid junk acronyms. Re-Read original post http://lists.freebsd.org/pipermail/freebsd-hackers/2010-May/031901.html Particularly: Example: Pasting notes into an xterm, clauses from http://seafrance.com in English then French original & German, to get the feel of what an unclear English translation **: Sometimes I mouse paste from Firefox in English, French, German & other languages, making notes in a single file with vi in an xterm, all with standard env. no Locale. & it edits OK in vi, & displays with cat in xterm, till !}fmt in vi wraps long lines, when fmt breaks it. So I fixed fmt. It would Not be appropriate to set a German locale, nor a French etc. Other utils might misbehave now or later See eg man sort re LC_ALL. No way I'd keep exiting vi & resetting LC_CTYPE between mouse pastes from different language pages, The default American works fine. I'm not bothered if vi+xterm might mis-display some odd accent, as I can see something is there, so long as fmt does not strip the accent, but FreeBSD fmt.c Does strip the French accents & German umlauts, that's why I fixed fmt.c Summary: Making fmt.c 8 bit cleaner would not break UTF & unicode I believe so no reason to object to removal of fmt.c '& 0x7f' cruft etc. Cheers, Julian -- Julian Stacey, BSD Unix Linux C Sys Eng Consultant, Munich http://berklix.com Interleave replies below like a play script. Indent old text with "> ". Send plain text, not quoted-printable, HTML, base64, or multipart/alternative. Extradite NSA spy chief Alexander. http://berklix.eu/jhs/blog/2013_10_30