Date:      Thu, 24 Mar 2022 13:12:10 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        "Simon J. Gerraty" <sjg@juniper.net>
Cc:        "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>, Phil Shafer <phil@freebsd.org>,  FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: What's the locale for system files (e.g. /etc/fstab)?
Message-ID:  <CANCZdfrZjeU_%2BLRew9BOCdktDi3aTUoeEaBkrov9FccvwfaN0g@mail.gmail.com>
In-Reply-To: <71356.1648139436@kaos.jnpr.net>
References:  <70B211BB-15BA-47A4-8F9C-C833AA8C1EAA@freebsd.org> <202203241519.22OFJ3Mk098649@gndrsh.dnsmgr.net> <CANCZdfp1oJdC2HfU63U_3y4y%2BQE0TswdVSg%2Big4uS3RJC3yK3w@mail.gmail.com> <71356.1648139436@kaos.jnpr.net>

On Thu, Mar 24, 2022, 10:30 AM Simon J. Gerraty <sjg@juniper.net> wrote:

> Warner Losh <imp@bsdimp.com> wrote:
> > Config files, like fstab, have no locale and parsing them with a locale
> > leads to errors, even when the user or the system has a nondefault locale.
> >
> > >
> > > Put more generally, there's not a system-wide place which declares the
> > > encoding for system files, which leads to this problem where we
> > > interpret files from one user's locale using another user's locale.
> >
> > Well /etc/login.conf *IS* a system wide declaration of this type of
> > stuff, both lang= and charset= are declared there.
> >
> > Since system-wide files like these are always parsed without a locale,
> > this information is correct, but I'm not sure how it applies.
> >
> > It is always C.UTF-8. Anything else may, or may not, work based on
> > accidents of coincident encoding. Not everything can change locales, and
> > the fstab and other parsing routines in libc assume C.UTF-8 or even just
> > the ASCII-7/8 subset.
> >
> > >
> > > One solution would be a symlink in /etc that "points to" the name of the
> > > current system-wide locale name.
> > >
> > > % ls -Fl /etc/locale
> > > lrwxr-xr-x  1 root  wheel  7 Mar 23 15:42 /etc/locale@ -> C.UTF-8
> >
> > grep lang /etc/login.conf:
> >         :lang=C.UTF-8:
> >         :lang=ru_RU.UTF-8:\
> >
> > Probably what you want?
>
> I doubt it, one is from the entry for Russian users ;-)
>
> >
> > You can get this with the locale routines, no? No need for grep.
>
> I suspect not.
>
> AFAIK virtually everything about locale support tells you about the
> locale for the current process - which does not necessarily inform you
> of the locale that was in effect when a system file was last edited.
>
> I don't even know if it is guaranteed that everything that reads system
> files groks random locales - or what happens when you have 3 admins each
> preferring a different locale, do different entries in fstab for example
> get impacted and the result thus not readable by anyone?
>
> There's probably something to be said for enforcing something like
> C.UTF-8 for system files.
>

That is the primary reason system files are always C.UTF-8. There is no
way to tag a file as anything else, and some of these files are often
parsed from a context that can't set the locale, like the boot loader or
the kernel. Also, these files have formats that were defined back in the
7-bit ASCII time frame, and they only ever use the text literally.
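
To illustrate the "literal, no locale" point, here is a minimal Python
sketch. It is not how the libc routines are implemented; the helper name
parse_fstab_line and the sample line are made up for illustration. It
treats an fstab-style line purely as bytes and splits on ASCII
whitespace, so the result cannot depend on the process's locale:

```python
def parse_fstab_line(line: bytes):
    """Split one fstab-style line into fields without consulting any locale.

    Hypothetical helper for illustration only. Returns None for blank
    lines and comments, otherwise a list of raw byte-string fields.
    """
    line = line.strip()
    if not line or line.startswith(b"#"):
        return None
    # bytes.split() with no argument splits on ASCII whitespace only,
    # so bytes >= 0x80 pass through untouched.
    return line.split()


sample = b"/dev/ada0p2\t/\tufs\trw\t1\t1\n"
fields = parse_fstab_line(sample)
print(fields)
```

Because the fields stay as bytes, whatever encoding the admin's editor
used for a non-ASCII name is preserved exactly as written.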

Having said that, I'm unsure how you'd mount /<kanji-for-neko> from fstab,
or if that is well defined. The kernel just presents each name as a string
of bytes not containing '/'.
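
A rough Python sketch of why that is murky, assuming the mount point is
the single kanji U+732B and that the two candidate encodings are UTF-8
and EUC-JP (both picked only as examples):

```python
name = "\u732b"  # U+732B, the kanji usually read "neko" (cat)

utf8_path = b"/" + name.encode("utf-8")
eucjp_path = b"/" + name.encode("euc_jp")

# Same character, different byte strings: a directory created under one
# encoding will not match an fstab entry written under the other.
print(utf8_path.hex(), eucjp_path.hex())
print(utf8_path == eucjp_path)

# Neither encoding puts a '/' (0x2f) or NUL byte inside the name itself,
# which are the only bytes a path component cannot contain.
for p in (utf8_path, eucjp_path):
    component = p[1:]
    print(b"/" not in component and b"\x00" not in component)
```

Both byte strings are acceptable as names; which one fstab needs depends
entirely on the bytes the directory was created with, not on anyone's
current locale.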

Warner




