Date: Fri, 21 Apr 2023 12:38:05 +0200
From: Dimitry Andric <dim@FreeBSD.org>
To: Ronald Klop <ronald-lists@klop.ws>
Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, current@freebsd.org
Subject: Re: find(1): I18N gone wild ?
Message-ID: <C54AFCA2-1064-432D-9573-A231A6E4163E@FreeBSD.org>
In-Reply-To: <564252502.12.1682071276296@mailrelay>
References: <202304172106.33HL6RUX051407@critter.freebsd.dk> <564252502.12.1682071276296@mailrelay>

On 21 Apr 2023, at 12:01, Ronald Klop <ronald-lists@klop.ws> wrote:
> From: Poul-Henning Kamp <phk@phk.freebsd.dk>
> Date: Monday, 17 April 2023 23:06
> To: current@freebsd.org
> Subject: find(1): I18N gone wild ?
>
> This surprised me:
>
> # mkdir /tmp/P
> # cd /tmp/P
> # touch FOO
> # touch bar
> # env LANG=C.UTF-8 find . -name '[A-Z]*' -print
> ./FOO
> # env LANG=en_US.UTF-8 find . -name '[A-Z]*' -print
> ./FOO
> ./bar
>
> Really ?! ...
>
> My Mac and a Linux server only give ./FOO in both cases. Just a 2 cents remark.

Same here. However, I have read that with Unicode, you should *never*
use [A-Z] or [0-9], but character classes instead. That seems to give
both files on macOS and Linux with [[:alpha:]]:

$ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
./BAR
./foo

and only the lowercase file with [[:lower:]]:

$ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
./foo

But on FreeBSD, these don't work at all:

$ LANG=en_US.UTF-8 find . -name '[[:alpha:]]*' -print
<nothing>
$ LANG=en_US.UTF-8 find . -name '[[:lower:]]*' -print
<nothing>

This is an interesting rabbit hole... :)

-Dimitry
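
For anyone who wants to repeat the comparison on their own system, the commands above can be wrapped in a small shell function. This is only a sketch, under a few assumptions: a POSIX sh, a mktemp(1) that supports -d, and the named locales being installed (a missing locale usually falls back to C behaviour without any warning). The function name match_names is made up for illustration, and the files are FOO and bar as in the original report. No expected output is shown, since the results differ per platform, as the thread demonstrates.

```sh
#!/bin/sh
# match_names LOCALE PATTERN
# Hypothetical helper: creates FOO and bar in a scratch directory and
# prints the names that find(1) matches for PATTERN under LOCALE.
match_names() {
    dir=$(mktemp -d) || return 1
    touch "$dir/FOO" "$dir/bar"
    # LC_ALL overrides LANG and every LC_* variable, so nothing inherited
    # from the calling environment can influence the match.
    # sort runs under the C locale so the output order is stable.
    (cd "$dir" && LC_ALL=$1 find . -name "$2" -print | LC_ALL=C sort)
    rm -rf "$dir"
}

# Try a range expression and two character classes under each locale.
for loc in C C.UTF-8 en_US.UTF-8; do
    for pat in '[A-Z]*' '[[:upper:]]*' '[[:lower:]]*'; do
        printf '%s %s: ' "$loc" "$pat"
        match_names "$loc" "$pat" | tr '\n' ' '
        echo
    done
done
```

Setting LC_ALL rather than LANG is deliberate here: LANG is only a fallback, so an LC_COLLATE or LC_ALL already present in the environment would otherwise win and hide the effect being tested.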
