Date:      Thu, 27 Oct 2022 13:43:24 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 264275] sed complaining about trailing backslash when using Umlauts
Message-ID:  <bug-264275-227-YYiGRjxeCs@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-264275-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-264275-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264275

Daniel Tameling <tamelingdaniel@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tamelingdaniel@gmail.com

--- Comment #1 from Daniel Tameling <tamelingdaniel@gmail.com> ---
The error comes from trying to compile the umlaut as a regex. I managed to
create a small reproducer that just calls regcomp.
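A reproducer along these lines might look like the sketch below. It is not the exact program used here: the helper name compile_in_locale is hypothetical, and it assumes the locale passed in (e.g. de_DE.ISO8859-1) is installed. Flags 0 means a POSIX basic regular expression, which is what sed uses without -E.

```c
#include <locale.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical reproducer helper: switch to `locale`, compile `pattern`
 * as a basic regex with regcomp(), print the outcome, and return
 * regcomp()'s result code (0 on success). Returns -1 if the locale
 * is not installed.
 */
static int compile_in_locale(const char *locale, const char *pattern)
{
    regex_t re;
    char msg[128];
    int rc;

    if (setlocale(LC_ALL, locale) == NULL) {
        fprintf(stderr, "locale %s not available\n", locale);
        return -1;
    }

    rc = regcomp(&re, pattern, 0);  /* 0 = basic regular expression */
    if (rc != 0) {
        /* Turn the error code (e.g. REG_EESCAPE) into its message. */
        regerror(rc, &re, msg, sizeof msg);
        printf("%s: regcomp failed: %s\n", locale, msg);
    } else {
        printf("%s: ok\n", locale);
        regfree(&re);
    }
    return rc;
}
```

Called from main as compile_in_locale("de_DE.ISO8859-1", "\xe4"), this is expected on the affected libc to report the trailing backslash error described below, while compile_in_locale("de_DE.UTF-8", "\xc3\xa4") should not.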

The error seems to come from this snippet in the p_simp_re function in
lib/libc/regex/regcomp.c:

  if ((c & BACKSL) == 0 || may_escape(p, wc))
       ordinary(p, wc);
  else
       SETERROR(REG_EESCAPE);

Both checks in the if statement are false, and thus we end up in the else
branch with the trailing backslash error (REG_EESCAPE). In may_escape, this is
the return statement that gets taken:

  if (isalpha(ch) || ch == '\'' || ch == '`')
      return (false);
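To make the interaction of the two snippets explicit, here is a small model of them. It is a simplified sketch, not libc code: MODEL_BACKSL, model_may_escape and model_is_ordinary are hypothetical names, the flag value 1 << CHAR_BIT is an assumption standing in for regcomp.c's BACKSL, and only the quoted early return of may_escape is modelled (every other character is assumed to be escapable).

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical stand-in for BACKSL: a flag bit just above the 8-bit
 * character range, marking a character that was read after a '\'. */
#define MODEL_BACKSL (1 << CHAR_BIT)

/* Simplified may_escape(): only the "return (false)" quoted above is
 * modelled; all other characters are assumed to be escapable. */
static bool model_may_escape(wint_t ch)
{
    if (isalpha((int)ch) || ch == '\'' || ch == '`')
        return false;
    return true;
}

/* Mirrors the quoted branch in p_simp_re(): true means the character
 * goes to ordinary(), false means SETERROR(REG_EESCAPE) is reached. */
static bool model_is_ordinary(int c, wint_t wc)
{
    return (c & MODEL_BACKSL) == 0 || model_may_escape(wc);
}
```

In this model the error needs both conditions at once: the BACKSL bit set in c and isalpha() returning true for wc, which matches the ISO8859-1 case described next.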

ch is the wint_t representation of the umlaut, which is 0xe4. In
de_DE.ISO8859-1, the isalpha call returns true. (If I do the same with a UTF-8
ä in a UTF-8 locale, ch also becomes 0xe4, but the isalpha call returns false,
so this doesn't trigger the trailing backslash error.)
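The locale dependence of isalpha can be checked directly with a sketch like the following. The helper name umlaut_isalpha_in is hypothetical, and the results depend on which locales are installed; the values reported above (1 for de_DE.ISO8859-1, 0 for a UTF-8 locale) are from FreeBSD.

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: report whether the value 0xe4 (ä in ISO 8859-1,
 * and also the code point U+00E4) is classified as alphabetic by
 * isalpha() under `locale`. Returns -1 if the locale is not installed.
 */
static int umlaut_isalpha_in(const char *locale)
{
    int result;

    if (setlocale(LC_CTYPE, locale) == NULL)
        return -1;
    result = isalpha(0xe4) != 0;
    printf("%-20s isalpha(0xe4) = %d\n", locale, result);
    return result;
}
```

Calling it for "C", "de_DE.ISO8859-1" and "de_DE.UTF-8" in turn shows which locales make may_escape refuse the escaped umlaut.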

-- 
You are receiving this mail because:
You are the assignee for the bug.


