From: Conrad Meyer
Reply-To: cem@freebsd.org
Date: Tue, 19 Jun 2018 19:46:18 -0700
Subject: Re: Printing UTF-8 characters
To: Farhan Khan
Cc: freebsd-hackers@freebsd.org

You want LC_CTYPE.

On Tue, Jun 19, 2018 at 6:38 PM Farhan Khan wrote:
> On Thu, Feb 1, 2018 at 10:51 PM, Bakul Shah wrote:
> > On Thu, 01 Feb 2018 10:42:36 -0500 Farhan Khan wrote:
> >> Sorry, that was a poorly phrased question on my part. Let me try again.
> >> I am trying to make text align in columns in a terminal. My
> >> understanding is that characters above 0x7E are 3 bytes in length. A
> >> modern terminal will render that as either a single question mark or
> >> the character itself, making terminal column alignment easy. But how
> >> would an older terminal display a 3-byte character? I am worried that
> >> it would render as 3 question marks and throw off column alignment. If
> >> so, is there a proper way to perform alignment for both newer and
> >> older terminals?
> >
> > UTF-8 can use up to 4 bytes to encode a Unicode code point,
> > depending on the script.
> >
> > For what you want, you can use OpenOffice-like programs that
> > understand Unicode and can do complex text layout. Normal
> > terminal programs typically use monospace (fixed-width) fonts
> > and are simply not capable of what you want. The assumption that
> > one char means one rectangular cell on the screen is too
> > deeply woven into them. Particularly for Indic languages this
> > just doesn't work: you may have N Unicode code points, each of
> > which requires 3 bytes, that together map to one single glyph.
>
> Hi all,
>
> To follow up on my earlier, poorly asked question from a few months
> back: how do I determine whether the terminal is capable of printing
> UTF-8 encoded strings and/or Unicode in general?
> The obvious answer is to check the LANG variable via getenv(3), but
> what if you are using "en_US.UTF-8" vs "en_GB.UTF-8"? Should I just
> check for the string "UTF-8" in the LANG variable?
>
> My concern is printing characters above 0x7F on terminals/encodings
> that are not capable of displaying them, resulting in unusual
> behavior.
>
> Thanks,
>
> --
> Farhan Khan
> PGP Fingerprint: B28D 2726 E2BC A97E 3854 5ABE 9A9F 00BC D525 16EE