Date: Sat, 17 Mar 2018 16:33:03 +0200
From: Lena@lena.kiev.ua
To: freebsd-questions@FreeBSD.org
Subject: Re: printable characters in LC_CTYPE
Message-ID: <20180317143303.GA2281@lena.kiev>
In-Reply-To: <20180315232301.GA804@lena.kiev>

I wrote:

> In FreeBSD 10 or less (like 8.4) the file
> /usr/src/share/mklocale/ru_RU.KOI8-R.src
> contains a list of printable characters:
>
> PRINT 0x20 - 0x7e 0x80 - 0xff
>
> mklocale(1) converted that source file to binary file
> /usr/share/locale/ru_RU.KOI8-R/LC_CTYPE
> Apparently, this file is used by iswprint(3)
>
> In FreeBSD 11.1 `man mklocale` says
> "mklocale has been replaced by localedef(1) in FreeBSD 11.0",
> and list of printable characters for LC_CTYPE=ru_RU.KOI8-R is different,
> excludes characters in 0x80-0xBF range including 0x9A (non-breaking space).
> As a consequence, ports mail/mutt14 and editors/aee
> which use iswprint(3)
> show garbage (escape sequences) instead of non-breaking space
> and some other characters.
> I need to change the binary file /usr/share/locale/ru_RU.KOI8-R/LC_CTYPE
> but how?
> I installed FreeBSD 11.1 i386 from memstick.img, downloaded sources.
> What to feed to localedef(1), where are the source files?
> Are they /usr/src/tools/tools/locale/etc/common.UTF-8.src
> and /usr/src/tools/tools/locale/etc/final-maps/map.KOI8-R ?
> But they seem to not specify which characters are printable.
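
For anyone who wants to check which bytes a locale rejects, here is a rough sketch. It assumes a single-byte locale such as ru_RU.KOI8-R, where btowc(3) maps each byte to one wide character. The function name count_rejected is made up for this illustration; the program stays in whatever LC_CTYPE is active, which is the "C" locale until setlocale(3) is called.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Count the bytes in [lo, hi] that iswprint(3) rejects under the
 * currently active LC_CTYPE. Hypothetical helper, not part of any port.
 */
int count_rejected(int lo, int hi)
{
    int rejected = 0;

    assert(lo >= 0 && hi <= 0xFF && lo <= hi);
    for (int c = lo; c <= hi; c++) {
        wint_t wc = btowc(c);   /* WEOF if the byte is not a valid character */

        if (wc == WEOF || !iswprint(wc)) {
            printf("0x%02X is not printable\n", (unsigned)c);
            rejected++;
        }
    }
    return rejected;
}
```

To test a real locale, call setlocale(LC_CTYPE, "ru_RU.KOI8-R") first (from <locale.h>), check that it does not return NULL, and then call count_rejected(0x80, 0xFF).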

I found these definitions in /usr/include/_ctype.h:

#define _CTYPE_A 0x00000100L /* Alpha */
#define _CTYPE_C 0x00000200L /* Control */
#define _CTYPE_D 0x00000400L /* Digit */
#define _CTYPE_G 0x00000800L /* Graph */
#define _CTYPE_L 0x00001000L /* Lower */
#define _CTYPE_P 0x00002000L /* Punct */
#define _CTYPE_S 0x00004000L /* Space */
#define _CTYPE_U 0x00008000L /* Upper */
#define _CTYPE_X 0x00010000L /* X digit */
#define _CTYPE_B 0x00020000L /* Blank */
#define _CTYPE_R 0x00040000L /* Print */
#define _CTYPE_I 0x00080000L /* Ideogram */
#define _CTYPE_T 0x00100000L /* Special */
#define _CTYPE_Q 0x00200000L /* Phonogram */
#define _CTYPE_N 0x00400000L /* Number (superset of digit) */

The binary file /usr/share/locale/ru_RU.KOI8-R/LC_CTYPE contains an array
of 256 4-byte words at offset 0x28, stored in little-endian (reversed)
byte order. The array corresponds to characters 0x00 - 0xFF, one word per
character, so the word for character c is at offset 0x28 + 4*c.
For example:

00 02 00 00 (0x00000200) at offset 0x028 for char 0x00 (a control character),
00 40 06 00 (0x00064000) at offset 0x0A8 for char 0x20 (blank),
00 00 00 00 (0x00000000) at offset 0x290 for char 0x9A (non-breaking space).

Using `mcedit` from port misc/mc (F4 key in Midnight Commander),
I edited that binary file in hex mode: I changed each
"** ** *0 **" to "** ** *4 **"
at offsets 0x228 - 0x424 (128 times). This sets the _CTYPE_R (Print) bit
and so marks characters 0x80 - 0xFF (including non-breaking space 0x9A)
as printable.
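
The hex edit above could also be done programmatically. The following is only a sketch under the assumptions stated in this message: that the table starts at offset 0x28 and holds 256 little-endian 4-byte words, as I observed on FreeBSD 11.1 i386. I have not checked this against any description of the file format. All function and macro names here are made up for the illustration. Unlike the hex edit, it ORs the Print bit in instead of overwriting the nibble, so any other flags already set are kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTYPE_R      0x00040000UL /* Print bit, same value as _CTYPE_R */
#define TABLE_OFFSET 0x28         /* observed start of the array (assumption) */
#define ENTRY_SIZE   4            /* one 4-byte word per character */
#define TABLE_END    (TABLE_OFFSET + 256 * ENTRY_SIZE)

/* File offset of the flag word for character ch (0x00 - 0xFF). */
size_t ctype_entry_offset(unsigned ch)
{
    assert(ch <= 0xFF);
    return TABLE_OFFSET + (size_t)ch * ENTRY_SIZE;
}

/* Read a 4-byte little-endian word. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 4-byte little-endian word. */
void write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/*
 * Set the Print bit for characters 0x80 - 0xFF in an in-memory copy of
 * the LC_CTYPE file. Returns the number of entries that were changed,
 * or 0 if the buffer is too short to hold the whole table.
 */
size_t mark_high_half_printable(unsigned char *buf, size_t len)
{
    size_t changed = 0;

    if (len < TABLE_END)
        return 0;
    for (unsigned ch = 0x80; ch <= 0xFF; ch++) {
        unsigned char *entry = buf + ctype_entry_offset(ch);
        uint32_t flags = read_le32(entry);

        if (!(flags & CTYPE_R)) {
            write_le32(entry, flags | CTYPE_R);
            changed++;
        }
    }
    return changed;
}
```

To use it, read the whole file into a buffer, call mark_high_half_printable(buf, len), and write the buffer to a copy of the file rather than over the original.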

I did the same with /usr/share/locale/uk_UA.KOI8-U/LC_CTYPE

Also, a patch for port mail/mutt14:

--- pager.c.orig	2007-05-23 04:17:53.000000000 +0300
+++ pager.c	2018-03-17 11:08:28.792617000 +0200
@@ -1144,6 +1144,8 @@
 	else
 	  col = t;
       }
+      else if (wc == '\r' /* ^M CR 0x0d */ && buf[ch+1] == '\n')
+	;
       else if (wc < 0x20 || wc == 0x7f)
       {
 	if (col + 2 > wrap_cols)
@@ -1154,11 +1156,11 @@
       }
       else if (wc < 0x100)
       {
-	if (col + 4 > wrap_cols)
+	if (col + 1 > wrap_cols)
 	  break;
-	col += 4;
+	col += 1;
 	if (pa)
-	  printw ("\\%03o", wc);
+	  addch (wc);
       }
       else
       {
--- protos.h.orig	2007-05-23 04:17:53.000000000 +0300
+++ protos.h	2018-03-17 10:55:09.328186000 +0200
@@ -343,14 +343,14 @@
 
 #ifdef LOCALES_HACK
 #define IsPrint(c) (isprint((unsigned char)(c)) || \
-	((unsigned char)(c) >= 0xa0))
-#define IsWPrint(wc) (iswprint(wc) || wc >= 0xa0)
+	((unsigned char)(c) >= 0x80))
+#define IsWPrint(wc) (iswprint(wc) || wc >= 0x80)
 #else
 #define IsPrint(c) (isprint((unsigned char)(c)) || \
 	(option (OPTLOCALES) ? 0 : \
-	((unsigned char)(c) >= 0xa0)))
+	((unsigned char)(c) >= 0x80)))
 #define IsWPrint(wc) (iswprint(wc) || \
-	(option (OPTLOCALES) ? 0 : (wc >= 0xa0)))
+	(option (OPTLOCALES) ? 0 : (wc >= 0x80)))
 #endif
 
 #define new_pattern() safe_calloc(1, sizeof (pattern_t))
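
To illustrate what the protos.h change does, here is a small stand-alone sketch of the LOCALES_HACK form of IsPrint with the old threshold (0xa0) and the new one (0x80). The function names are invented for this example and do not exist in mutt. In the default "C" locale isprint(3) rejects 0x9A, so only the new threshold lets the non-breaking space through.

```c
#include <assert.h>
#include <ctype.h>

/*
 * Same test as the LOCALES_HACK variant of IsPrint: accept the byte if
 * isprint(3) accepts it, or if it is at or above the given threshold.
 */
static int is_print_with_threshold(int c, unsigned threshold)
{
    unsigned char uc = (unsigned char)c;

    assert(threshold <= 0x100);
    return isprint(uc) || uc >= threshold;
}

/* Behaviour before the patch: bytes 0x80 - 0x9F depend on the locale. */
int is_print_old(int c)
{
    return is_print_with_threshold(c, 0xa0);
}

/* Behaviour after the patch: every byte from 0x80 up is accepted. */
int is_print_new(int c)
{
    return is_print_with_threshold(c, 0x80);
}
```

With the old threshold, the KOI8-R non-breaking space (0x9A) is printable only if the locale's LC_CTYPE says so, which is why the problem appeared after the move to localedef(1).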