Date:      Sat, 25 Jun 2022 01:51:50 +0900
From:      Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
To:        Hans Petter Selasky <hps@selasky.org>
Cc:        Ivan Quitschal <tezeka@hotmail.com>, "freebsd-current@freebsd.org" <freebsd-current@freebsd.org>, Kurt Jaeger <pi@freebsd.org>
Subject:   Re: vt newcons mouse paste issue FIXED
Message-ID:  <20220625015150.4f57017e7098ea591f57bd2a@dec.sakura.ne.jp>
In-Reply-To: <5196d98c-7b3a-55b4-3ef7-227b19b66721@selasky.org>
References:  <CP6P284MB1900CA1ED5B5BADE054ECB34CBB29@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM> <f6c1ee1c-bdd9-c8d6-1385-145022e6765d@selasky.org> <CP6P284MB1900CC7B7F6343DAB1D1E5BCCBB29@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM> <41ef5c38-515f-739a-cb47-7cab0e609526@selasky.org> <CP6P284MB1900DD3D6F41CBAF38CF2CA4CBB29@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM> <20220623014847.067b18a5ba388639cf6009ce@dec.sakura.ne.jp> <fd0f9de9-98ac-87b4-2c9d-5fdc27bdb3c4@selasky.org> <CP6P284MB1900794465902392ACDA36C2CBB29@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM> <b868c9f4-0a2b-00b9-30e0-d612d17d4bba@selasky.org> <CP6P284MB1900578BBC31730413B30643CBB59@CP6P284MB1900.BRAP284.PROD.OUTLOOK.COM> <790bd76d-890f-cf09-a30d-c2e5fba91ec5@selasky.org> <20220624230215.82e02ac661cd7594624a1845@dec.sakura.ne.jp> <5bd74766-f2f0-3df2-0e8c-adabd110f913@selasky.org> <5196d98c-7b3a-55b4-3ef7-227b19b66721@selasky.org>

On Fri, 24 Jun 2022 17:29:26 +0200
Hans Petter Selasky <hps@selasky.org> wrote:

> Hi Tomoaki,
> 
> On 6/24/22 16:48, Hans Petter Selasky wrote:
> > IDEOGRAPHIC (Full-width) SPACE
> 
> According to this page:
> 
> https://jkorpela.fi/chars/spaces.html
> 
> There are multiple uni-code characters which are spaces. Should we 
> support them all?
> 
> --HPS

Nice page!

Maybe not all. My guess, based on the "Sample" and "Width of the
character" fields on that page, is below. In the first column:

'Y': Should be treated as space / word separator
'N': Should NOT be treated as space / word separator
'U': Unknown to me. Needs a native speaker to determine.

Someone may have objections, but basically I've treated breakable
spaces as space characters. See also URL [1] below.

  Special cases:
    *Judging by the sample, U+1680 is rendered as a dash, so I
     considered it 'N'.
    *I treated the "QUAD" characters as purely graphical
     (non-semantic), so I considered them 'N'.
    *I considered U+205F 'N' because, in mathematical usage, an
     unintended line break could cause serious confusion.


  Code   Name of the character
Y U+0020 SPACE
N U+00A0 NO-BREAK SPACE
N U+1680 OGHAM SPACE MARK
Y U+180E MONGOLIAN VOWEL SEPARATOR
N U+2000 EN QUAD
N U+2001 EM QUAD
Y U+2002 EN SPACE (nut)
Y U+2003 EM SPACE (mutton)
Y U+2004 THREE-PER-EM SPACE (thick space)
Y U+2005 FOUR-PER-EM SPACE (mid space)
Y U+2006 SIX-PER-EM SPACE
N U+2007 FIGURE SPACE
Y U+2008 PUNCTUATION SPACE
Y U+2009 THIN SPACE
Y U+200A HAIR SPACE
Y U+200B ZERO WIDTH SPACE
N U+202F NARROW NO-BREAK SPACE
N U+205F MEDIUM MATHEMATICAL SPACE
Y U+3000 IDEOGRAPHIC SPACE
N U+FEFF ZERO WIDTH NO-BREAK SPACE
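
As a rough illustration only, the 'Y' rows of the table above could be
expressed as a predicate like the one below. The function name
is_word_separator() is hypothetical and is not taken from the vt(4)
sources; it just encodes my guesses from the table, which may well
need changes after discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption, not existing vt(4) code).
 * Returns true for the code points marked 'Y' in the table above,
 * i.e. those proposed to act as a space / word separator.
 */
static bool
is_word_separator(uint32_t c)
{
	switch (c) {
	case 0x0020:	/* SPACE */
	case 0x180E:	/* MONGOLIAN VOWEL SEPARATOR */
	case 0x2002:	/* EN SPACE */
	case 0x2003:	/* EM SPACE */
	case 0x2004:	/* THREE-PER-EM SPACE */
	case 0x2005:	/* FOUR-PER-EM SPACE */
	case 0x2006:	/* SIX-PER-EM SPACE */
	case 0x2008:	/* PUNCTUATION SPACE */
	case 0x2009:	/* THIN SPACE */
	case 0x200A:	/* HAIR SPACE */
	case 0x200B:	/* ZERO WIDTH SPACE */
	case 0x3000:	/* IDEOGRAPHIC SPACE */
		return (true);
	default:
		/* Everything marked 'N', and all non-space characters. */
		return (false);
	}
}
```

An explicit switch keeps each code point visible next to its name, so
changing a single 'Y' to 'N' (or back) is a one-line change.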

Maybe the best approach would be to look into how Unicode normalization
treats them. But we Japanese would want U+3000 treated as a space.


[1] https://en.wikipedia.org/wiki/Non-breaking_space

-- 
Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>


