From: Warner Losh
Date: Fri, 3 Nov 2023 07:23:02 -0600
Subject: Re: HEADS UP: IUTF8 to be enabled by default
To: garyj@gmx.de
Cc: Christos Margiolis, freebsd-arch@freebsd.org, bojan.novkovic@fer.hr
In-Reply-To: <20231103081529.016be29d@ernst.home>
List-Id: Discussion related to FreeBSD architecture
List-Archive: https://lists.freebsd.org/archives/freebsd-arch


On Fri, Nov 3, 2023, 1:15 AM Gary Jennejohn <garyj@gmx.de> wrote:
> On Thu, 2 Nov 2023 21:43:32 +0200
> Christos Margiolis <christos@freebsd.org> wrote:

> > Hello again and sorry for the poorly worded previous email,
> >
> > To give a bit more context, during EuroBSDCon 2023, me and Bojan
> > Novković started working on a patch to fix backspacing of UTF-8
> > characters in the tty driver. What was happening is if you typed a
> > multi-byte (>1 byte) UTF-8 character and then backspaced it, the driver
> > would actually delete only 1 byte from the character, instead of all its
> > bytes, which ended up leaving garbage in the buffer since the character
> > wasn't fully deleted. To test this, run cat(1), type a UTF-8 character
> > (e.g. é, è, à, non-Latin characters, etc.), press backspace only once,
> > and look at the output:
> >
> > $ cat
> > ??<backspace>
> > ??
> >
> > Bojan then implemented a new IUTF8 flag for stty [1], which enables
> > proper handling for UTF-8 backspacing in the tty driver [2].
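
The behaviour described above amounts to treating a multi-byte sequence as one unit on erase. A minimal sketch of that idea follows. It is not the code from commit [2]; the helper name utf8_erase_last is made up for illustration, and it relies only on the UTF-8 property that continuation bytes have the bit pattern 10xxxxxx. It does not validate the sequence.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the FreeBSD tty code): return the length that
 * buf would have after erasing its last UTF-8 character.
 */
static size_t
utf8_erase_last(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	if (len == 0)
		return (0);
	len--;				/* always drop at least one byte */
	/* Step back over continuation bytes (10xxxxxx) to the lead byte. */
	while (len > 0 && (buf[len] & 0xC0) == 0x80)
		len--;
	return (len);
}
```

With a buffer holding "a" followed by the two bytes of é (0xC3 0xA9), one erase leaves a length of 1. A byte-at-a-time erase would leave the stray 0xC3 lead byte behind, which is the garbage described in the report.
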
> >
> > In the Phabricator review of the tty(4) patch [3], I proposed the idea
> > of having the IUTF8 flag enabled by default. imp@ mentioned that since
> > the default locale is UTF-8, having the flag set by default shouldn't be
> > a problem.
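
For comparison with a system-wide default, the flag can also be set on a single terminal from userland through the termios interface. The sketch below assumes <termios.h> defines IUTF8, as it does on Linux and presumably on FreeBSD sources that include the change in [2]. The helper name tty_enable_iutf8 is hypothetical.

```c
#include <termios.h>

/*
 * Hypothetical helper: set IUTF8 on the terminal behind fd.
 * Returns 0 on success and -1 on error (errno is set by the termios call).
 */
static int
tty_enable_iutf8(int fd)
{
#ifdef IUTF8
	struct termios t;

	if (tcgetattr(fd, &t) == -1)
		return (-1);		/* e.g. fd is not a terminal */
	t.c_iflag |= IUTF8;		/* an input flag, like ICRNL or IXON */
	return (tcsetattr(fd, TCSANOW, &t));
#else
	(void)fd;
	return (-1);			/* assumption: no IUTF8 on this system */
#endif
}
```

A program would typically call this with the descriptor of its controlling terminal, such as standard input. Making the flag part of the defaults would remove the need for each session to do so.
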
> >
> > Two possible solutions I have thought of:
> >
> > 1. Add IUTF8 to TTYDEF_IFLAG in sys/sys/ttydefaults.h.
> > 2. Add a check in tty_init_termios() whether the current locale is
> >    UTF-8 (how?), and enable it there.
> >

> Use getenv("LANG") and check whether UTF-8 is part of the string?

This string is set too late for the default. Also, drivers don't have
access to process data.

Warner

> My LANG is set to C.UTF-8, for example.
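
The substring test suggested above is easy to write in a userland process, which is where LANG lives. As noted in the reply, a tty driver cannot see it, so this sketch only illustrates the check itself and would not work inside tty_init_termios(). The helper names are hypothetical, and the match is deliberately naive: it looks only for the literal "UTF-8".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true if a locale name mentions "UTF-8". */
static bool
locale_is_utf8(const char *lang)
{
	return (lang != NULL && strstr(lang, "UTF-8") != NULL);
}

/* Apply the check to this process's LANG; an unset LANG counts as "no". */
static bool
env_lang_is_utf8(void)
{
	return (locale_is_utf8(getenv("LANG")));
}

/* Usage sketch on sample locale names, including the C.UTF-8 case above. */
static void
locale_is_utf8_examples(void)
{
	assert(locale_is_utf8("C.UTF-8"));
	assert(locale_is_utf8("en_US.UTF-8"));
	assert(!locale_is_utf8("C"));
}
```

A more robust userland check would query the codeset through the locale API after setlocale(3) instead of parsing the variable by hand, but that does not help the kernel default either.
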

> > What do you think? Could this change cause any side-effects we haven't
> > thought about?
> >
> > Christos
> >
> > [1] https://cgit.freebsd.org/src/commit/?id=128f63cedc14ae21b35f74e11e2fe1a5659c58e8
> > [2] https://cgit.freebsd.org/src/commit/?id=9e589b0938579f3f4d89fa5c051f845bf754184d
> > [3] https://reviews.freebsd.org/D42067
> >


> --
> Gary Jennejohn