From: Farhan Khan
Date: Wed, 20 Jun 2018 00:20:57 -0400
Subject: Re: Printing UTF-8 characters
To: cem@freebsd.org
Cc: freebsd-hackers@freebsd.org
References: <20180201072831.GA2239@c720-r314251> <20180202035130.C51F8156E80B@mail.bitblocks.com>

On Tue, Jun 19, 2018, 10:46 PM Conrad Meyer wrote:

> You want LC_CTYPE.
>
> On Tue, Jun 19, 2018 at 6:38 PM Farhan Khan wrote:
>
>> On Thu, Feb 1, 2018 at 10:51 PM, Bakul Shah wrote:
>> > On Thu, 01 Feb 2018 10:42:36 -0500 Farhan Khan wrote:
>> >> Sorry, that was a poorly phrased question on my part. Let me try again.
>> >> I am trying to make text align in columns in a terminal. My
>> >> understanding is that characters above 0x7E are 3 bytes in length. A
>> >> modern terminal will render that as either a single question-mark or
>> >> the character itself, making terminal column alignment easy. But how
>> >> would an older terminal display a 3-byte character? I am worried that
>> >> would render as 3 question marks and throw off column alignment. If
>> >> so, is there a proper way to perform alignment for both newer and
>> >> older terminals?
>> >
>> > UTF-8 can use up to 4 bytes to encode a Unicode code point,
>> > depending on the script.
>> >
>> > For what you want, you can use OpenOffice-like programs that
>> > understand Unicode and can do complex text layout. Normal
>> > terminal programs typically use monospace (fixed-width) fonts
>> > and are simply not capable of what you want. The assumption that
>> > one char means one rectangular cell on the screen is too
>> > deeply woven into them. Particularly for Indic languages this
>> > just doesn't work. You may have N Unicode code points, each of
>> > which requires 3 bytes, that all together map to a single glyph.
>>
>> Hi all,
>>
>> To follow up on my earlier, poorly asked question from a few months
>> back: how do I determine whether the terminal is capable of printing
>> UTF-8 encoded strings and/or Unicode in general?
>> The obvious answer is to check the LANG variable via getenv(3), but
>> what if you are using "en_US.UTF-8" vs "en_GB.UTF-8"? Should I just
>> check for the string "UTF-8" in the LANG variable?
>>
>> My concern is printing characters above 0x7F on terminals/encodings
>> that are not capable of displaying them, resulting in unusual
>> behavior.
>>
>> Thanks,
>>
>> --
>> Farhan Khan
>> PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE

Thanks Conrad! I looked up exactly how locale(1) works. Similar to what
you suggested, locale(1) does essentially this:

    setlocale(LC_ALL, "");
    charset = nl_langinfo(CODESET);

The result I needed was 'charset'.

Thanks!