Date: Sat, 25 Jun 2022 01:51:50 +0900
From: Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
To: Hans Petter Selasky <hps@selasky.org>
Cc: Ivan Quitschal <tezeka@hotmail.com>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Kurt Jaeger <pi@freebsd.org>
Subject: Re: vt newcons mouse paste issue FIXED
Message-ID: <20220625015150.4f57017e7098ea591f57bd2a@dec.sakura.ne.jp>
In-Reply-To: <5196d98c-7b3a-55b4-3ef7-227b19b66721@selasky.org>
References: <CP6P284MB1900CA1ED5B5BADE054ECB34CBB29@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM>
 <f6c1ee1c-bdd9-c8d6-1385-145022e6765d@selasky.org>
 <CP6P284MB1900CC7B7F6343DAB1D1E5BCCBB29@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM>
 <41ef5c38-515f-739a-cb47-7cab0e609526@selasky.org>
 <CP6P284MB1900DD3D6F41CBAF38CF2CA4CBB29@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM>
 <20220623014847.067b18a5ba388639cf6009ce@dec.sakura.ne.jp>
 <fd0f9de9-98ac-87b4-2c9d-5fdc27bdb3c4@selasky.org>
 <CP6P284MB1900794465902392ACDA36C2CBB29@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM>
 <b868c9f4-0a2b-00b9-30e0-d612d17d4bba@selasky.org>
 <CP6P284MB1900578BBC31730413B30643CBB59@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM>
 <790bd76d-890f-cf09-a30d-c2e5fba91ec5@selasky.org>
 <20220624230215.82e02ac661cd7594624a1845@dec.sakura.ne.jp>
 <5bd74766-f2f0-3df2-0e8c-adabd110f913@selasky.org>
 <5196d98c-7b3a-55b4-3ef7-227b19b66721@selasky.org>
On Fri, 24 Jun 2022 17:29:26 +0200
Hans Petter Selasky <hps@selasky.org> wrote:

> Hi Tomoaki,
>
> On 6/24/22 16:48, Hans Petter Selasky wrote:
> > IDEOGRAPHIC (Full-width) SPACE
>
> According to this page:
>
> https://jkorpela.fi/chars/spaces.html
>
> There are multiple uni-code characters which are spaces. Should we
> support them all?
>
> --HPS

Nice page!

Maybe not all. My guesses, based on the "Sample" and "Width of the
character" fields, are below.

In the first column,
 'Y': Should be treated as space / word separator
 'N': Should NOT be treated as space / word separator
 'U': Unknown to me. Needs a native speaker to determine.

Someone may have objections, but basically I've treated breakable
spaces as space characters. See also URL [1] below.

Special cases:
 * Looking at the sample, U+1680 is shown as a dash, so I considered
   it 'N'.
 * I treated "QUAD" as purely graphical (non-semantic) use, so I
   considered it 'N'.
 * I considered U+205F 'N' because, in mathematical usage, an
   unintended line break could cause fatal confusion.

     Code    Name of the character
  Y  U+0020  SPACE
  N  U+00A0  NO-BREAK SPACE
  N  U+1680  OGHAM SPACE MARK
  Y  U+180E  MONGOLIAN VOWEL SEPARATOR
  N  U+2000  EN QUAD
  N  U+2001  EM QUAD
  Y  U+2002  EN SPACE (nut)
  Y  U+2003  EM SPACE (mutton)
  Y  U+2004  THREE-PER-EM SPACE (thick space)
  Y  U+2005  FOUR-PER-EM SPACE (mid space)
  Y  U+2006  SIX-PER-EM SPACE
  N  U+2007  FIGURE SPACE
  Y  U+2008  PUNCTUATION SPACE
  Y  U+2009  THIN SPACE
  Y  U+200A  HAIR SPACE
  Y  U+200B  ZERO WIDTH SPACE
  N  U+202F  NARROW NO-BREAK SPACE
  N  U+205F  MEDIUM MATHEMATICAL SPACE
  Y  U+3000  IDEOGRAPHIC SPACE
  N  U+FEFF  ZERO WIDTH NO-BREAK SPACE

Maybe the best approach would be to look into how Unicode
normalization treats them. But we Japanese would want U+3000 treated
as a space.

[1] https://en.wikipedia.org/wiki/Non-breaking_space

-- 
Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
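For illustration only, the Y/N classification in the table above could be expressed as a small lookup in C. This is a minimal sketch, not the actual vt(4) code: the names `word_separators` and `is_word_separator` are hypothetical, and it assumes the caller already has a decoded Unicode code point in a `uint32_t`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Code points marked 'Y' in the table above, in ascending order.
 * Hypothetical table; not taken from the FreeBSD source tree.
 */
static const uint32_t word_separators[] = {
	0x0020,	/* SPACE */
	0x180E,	/* MONGOLIAN VOWEL SEPARATOR */
	0x2002,	/* EN SPACE */
	0x2003,	/* EM SPACE */
	0x2004,	/* THREE-PER-EM SPACE */
	0x2005,	/* FOUR-PER-EM SPACE */
	0x2006,	/* SIX-PER-EM SPACE */
	0x2008,	/* PUNCTUATION SPACE */
	0x2009,	/* THIN SPACE */
	0x200A,	/* HAIR SPACE */
	0x200B,	/* ZERO WIDTH SPACE */
	0x3000,	/* IDEOGRAPHIC SPACE */
};

/*
 * Return true if the code point is one of the 'Y' entries, i.e. it
 * would end a word when selecting text. Everything else, including
 * the no-break spaces and the quads marked 'N', returns false.
 */
static bool
is_word_separator(uint32_t cp)
{
	size_t i;

	for (i = 0; i < sizeof(word_separators) / sizeof(word_separators[0]); i++) {
		if (word_separators[i] == cp)
			return (true);
	}
	return (false);
}
```

A linear scan is used because the list is short; if the set grew, a range check or binary search over the sorted array would be a natural replacement.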