Date:      Fri, 3 Feb 2023 15:26:17 +0100
From:      Eivind Nicolay Evensen <eivinde@terraplane.org>
To:        Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
Cc:        stable@freebsd.org
Subject:   Re: Grep with non-ascii
Message-ID:  <20230203152617.00e01686@elg.hjerdalen.lokalnett>
In-Reply-To: <20230203203948.23d66303bcae8c528202071a@dec.sakura.ne.jp>
References:  <20230203110642.70e4a076@elg.hjerdalen.lokalnett> <20230203203948.23d66303bcae8c528202071a@dec.sakura.ne.jp>

On Fri, 3 Feb 2023 20:39:48 +0900
Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:

> On Fri, 3 Feb 2023 11:06:42 +0100
> Eivind Nicolay Evensen <eivinde@terraplane.org> wrote:
> 
> > Hello.
> > 
> > I just noticed this today:
> >   
> > elg!ene[~]> printf "bø\nhei\nøl\n" | grep ø  
> > grep: trailing backslash (\)  
> > elg!ene[~]> echo $LC_CTYPE $LANG  
> > nb_NO.ISO8859-1 nb_NO.ISO8859-1
> > 
> > With GNU grep (ggrep) I get the result I expected:
> >   
> > elg!ene[~]> printf "bø\nhei\nøl\n" | ggrep ø  
> > bø
> > øl
> > 
> > Also, on OpenIndiana, Linux and NetBSD, grep gives the proper
> > result.
> > 
> > Is lib/libc/regex the right place to look into this if I
> > find the time, or does anybody know this code well enough to
> > recognize the problem?
> > 
> > Regards
> > -- 
> > Eivind Nicolay Evensen  
> 
> Possibly a locale problem, or it may depend on which command-line
> shell you are using.
> 
> I tried copying and pasting it to the command line and got the result below.
> 
> % printf "bø\nhei\nøl\n" | grep ø
> bø
> øl
> 
> I'm using LC_ALL=ja_JP.UTF-8, LANG=ja_JP.UTF-8 as locale and
> shells/zsh as command line shell.
> 
> What happens if you switch locale to nb_NO.UTF-8?
> 

It does indeed seem to be a locale problem, because it works when
I change it:

elg!ene[~]> grep ø
grep: trailing backslash (\)
(I select UTF-8 encoding in the xterm menu here)
elg!ene[~]> setenv LC_CTYPE nb_NO.UTF-8
elg!ene[~]> grep ø
zzz
æøå
æøå
^D

Perhaps this affects more of the ISO8859-1 locales as well. I just tried these (back to non-UTF-8 encoding in xterm):

elg!ene[~]> setenv LC_CTYPE sv_SE.ISO8859-1
elg!ene[~]> grep 
grep: trailing backslash (\)

and

elg!ene[~]> setenv LC_CTYPE de_DE.ISO8859-1
elg!ene[~]> grep 
grep: trailing backslash (\)
elg!ene[~]> grep 
grep: trailing backslash (\)
elg!ene[~]> 
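
To take the terminal encoding out of the picture, the same test can be
scripted with the character written as an octal escape. This is only a
sketch, not a diagnosis: it relies on ø being the single byte 0xF8
(octal 370) in ISO8859-1, it assumes the listed locales are installed,
it uses sh syntax rather than the csh-style setenv shown above, and
count_matches is a made-up helper name.

```sh
#!/bin/sh
# Sketch: feed the same three-line input to grep under a given locale.
# \370 is the ISO8859-1 byte for ø; writing it as an octal escape keeps
# the script independent of the terminal or editor encoding.
# count_matches is a hypothetical helper name, not part of any tool.
count_matches() {
    pattern=$(printf '\370')
    # -c prints the number of matching lines instead of the lines themselves
    printf 'b\370\nhei\n\370l\n' | LC_ALL="$1" grep -c "$pattern"
}

# Locale names are the ones from this thread plus C as a baseline;
# they must be installed for the comparison to mean anything.
for loc in C nb_NO.ISO8859-1 sv_SE.ISO8859-1 de_DE.ISO8859-1; do
    printf '%s: ' "$loc"
    # 2>&1 so that an error message from grep shows up next to the locale name
    count_matches "$loc" 2>&1
done
```

A locale where grep handles the byte should report a count of 2. A locale
that shows the problem should print the error message instead, which
would make it easier to list exactly which locales are affected.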


-- 
Eivind Nicolay Evensen


