Date:      Fri, 3 Feb 2023 15:18:53 +0100
From:      Eivind Nicolay Evensen <eivinde@terraplane.org>
To:        Eugene Grosbein <eugen@grosbein.net>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Grep with non-ascii
Message-ID:  <20230203151853.02732bd6@elg.hjerdalen.lokalnett>
In-Reply-To: <819a4336-9689-bdbe-a90d-8f1d7b842662@grosbein.net>
References:  <20230203110642.70e4a076@elg.hjerdalen.lokalnett> <819a4336-9689-bdbe-a90d-8f1d7b842662@grosbein.net>

On Fri, 3 Feb 2023 19:12:32 +0700
Eugene Grosbein <eugen@grosbein.net> wrote:

> 03.02.2023 17:06, Eivind Nicolay Evensen wrote:
> > Hello.
> > 
> > I just noticed this today:
> >   
> > elg!ene[~]> printf "bø\nhei\nøl\n" | grep ø  
> > grep: trailing backslash (\)  
> > elg!ene[~]> echo $LC_CTYPE $LANG  
> > nb_NO.ISO8859-1 nb_NO.ISO8859-1
> > 
> > While I get the result I expected with GNU grep:
> >   
> > elg!ene[~]> printf "bø\nhei\nøl\n" | ggrep ø  
> > bø
> > øl
> > 
> > Also, on OpenIndiana, Linux and NetBSD, grep gives the proper
> > result.
> > 
> > Is lib/libc/regex the right place to look into this if I
> > find the time, or does anybody know this enough to know the
> > problem?  
> 
> Try single quotes instead of double quotes.
> And please specify the system version, the shell name, and the shell
> version if it's not in the base system.

This is
elg!ene[~]> uname -a
FreeBSD elg.hjerdalen.lokalnett 13.2-PRERELEASE FreeBSD 13.2-PRERELEASE
#1: Tue Jan 31 11:23:29 CET 2023
ene@elg.hjerdalen.lokalnett:/usr/obj/usr/src/amd64.amd64/sys/ENE-spurv
amd64

I'm using the tcsh that comes with the base system. But I don't think the
quotes matter much, because grep fails on the pattern alone:

elg!ene[~]> grep ø
grep: trailing backslash (\)

The printf output was mostly there to give grep something to search, as
in the ggrep example, but anyway, with single quotes:

elg!ene[~]> printf 'bø\nhei\nøl\n' |grep ø
grep: trailing backslash (\)

And obviously:

elg!ene[~]> printf 'bø\nhei\nøl\n' 
bø
hei
øl

And it seems to be the same for any ISO 8859-1 character that is not
part of ASCII:

elg!ene[~]> grep ä
grep: trailing backslash (\)
elg!ene[~]> grep ß
grep: trailing backslash (\)
elg!ene[~]> grep ç
grep: trailing backslash (\)
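
To take the shell and terminal encoding out of the picture, the same
test can be written with explicit bytes. This is only a sketch, assuming
a POSIX sh rather than tcsh; the variable names are made up for
illustration. In ISO 8859-1, ø is the single byte 0xF8 (octal 370).
Whether the C locale avoids the error on the affected system has not
been checked here.

```shell
#!/bin/sh
# Sketch only: build the pattern from an octal escape so that no
# non-ASCII character has to pass through the shell's own quoting.
pattern=$(printf '\370')    # the byte 0xF8, i.e. ø in ISO 8859-1

# Confirm what grep will actually be handed as its pattern.
pattern_hex=$(printf '%s' "$pattern" | od -An -tx1 | tr -d ' \n')
echo "pattern bytes: $pattern_hex"

# Same three input lines as above (bø, hei, øl), written as raw bytes.
# LC_ALL=C makes grep treat the data as plain bytes; -c counts matching lines.
count=$(printf 'b\370\nhei\n\370l\n' | LC_ALL=C grep -c -- "$pattern")
echo "matching lines: $count"
```

Running the last command once with LC_ALL=C and once with
LC_ALL=nb_NO.ISO8859-1 should show whether the error depends on the
locale or on the byte value itself.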

-- 
Eivind Nicolay Evensen


