Date: Thu, 24 Mar 2022 09:31:33 -0600
From: Warner Losh <imp@bsdimp.com>
To: "Rodney W. Grimes" <freebsd-rwg@gndrsh.dnsmgr.net>
Cc: Phil Shafer <phil@freebsd.org>, FreeBSD Hackers <freebsd-hackers@freebsd.org>, "Simon J. Gerraty" <sjg@freebsd.org>
Subject: Re: What's the locale for system files (e.g. /etc/fstab)?
Message-ID: <CANCZdfp1oJdC2HfU63U_3y4y%2BQE0TswdVSg%2Big4uS3RJC3yK3w@mail.gmail.com>
In-Reply-To: <202203241519.22OFJ3Mk098649@gndrsh.dnsmgr.net>
References: <70B211BB-15BA-47A4-8F9C-C833AA8C1EAA@freebsd.org> <202203241519.22OFJ3Mk098649@gndrsh.dnsmgr.net>
[-- Attachment #1 --]
On Thu, Mar 24, 2022, 9:20 AM Rodney W. Grimes <freebsd-rwg@gndrsh.dnsmgr.net> wrote:
> > On 23 Mar 2022, at 11:51, Piotr Pawel Stefaniak wrote:
> > > mount: make libxo support more locale-aware
> > >
> > > "special", "node", and "mounter" are not guaranteed to be encoded with
> > > UTF-8. Use the appropriate modifier.
> > >
> > > - xo_emit("{:special}{L: on }{:node}{L: (}{:fstype}", sfp->f_mntfromname,
> > > + xo_emit("{:special/%hs}{L: on }{:node/%hs}{L: (}{:fstype}", sfp->f_mntfromname,
> > >       sfp->f_mntonname, sfp->f_fstypename);
> >
> > This recent "mount" patch highlights a libxo-related problem for which I
> > don't have a solution:
> >
> > There are several files for which the encoding is not known. Since
> > locale is user specific, we don't know how to interpret the contents of
> > /etc/fstab. It's presumably been encoded with the locale of the user who
> > wrote it, but that information is lost.
>
> Since you say "locale is user specific" it makes me want to say that
> this should come from the environment set by "default:" in /etc/login.conf,
> no need for a new file or anything special.
>
Config files like fstab have no locale, and parsing them with one leads to
errors, even when the user or the system has a non-default locale.
>
> > Put more generally, there's not a system-wide place which declares the
> > encoding for system files, which leads to this problem where we
> > interpret files from one user's locale using another user's locale.
>
> Well /etc/login.conf *IS* a system wide declaration of this type of
> stuff, both lang= and charset= are declared there.
>
Since system-wide files like these are always parsed without a locale, this
information is correct, but I'm not sure how it applies.

It is always C.UTF-8. Anything else may or may not work, based on
accidents of coincident encoding. Not everything can change locales, and
the fstab and other parsing routines in libc assume C.UTF-8 or even just
the ASCII-7/8 subset.
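
To make "C.UTF-8 or just the ASCII subset" concrete, here is a minimal sketch of a check that never consults the locale: it classifies raw bytes as 7-bit ASCII or well-formed UTF-8. The names is_ascii() and is_utf8() are made up for illustration; they are not libc or libxo routines, and this is not how the fstab parser is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if every byte in s[0..len) is 7-bit ASCII. */
bool
is_ascii(const char *s, size_t len)
{
	const unsigned char *p = (const unsigned char *)s;

	assert(s != NULL);
	for (size_t i = 0; i < len; i++)
		if (p[i] > 0x7f)
			return (false);
	return (true);
}

/*
 * Return true if s[0..len) is well-formed UTF-8. Rejects stray
 * continuation bytes, truncated sequences, overlong forms, UTF-16
 * surrogates, and code points above U+10FFFF.
 */
bool
is_utf8(const char *s, size_t len)
{
	const unsigned char *p = (const unsigned char *)s;
	size_t i = 0;

	assert(s != NULL);
	while (i < len) {
		unsigned char c = p[i];
		unsigned long cp, min;
		size_t need;

		if (c < 0x80) {			/* plain ASCII byte */
			i++;
			continue;
		} else if ((c & 0xe0) == 0xc0) {	/* 2-byte sequence */
			need = 1; cp = c & 0x1f; min = 0x80;
		} else if ((c & 0xf0) == 0xe0) {	/* 3-byte sequence */
			need = 2; cp = c & 0x0f; min = 0x800;
		} else if ((c & 0xf8) == 0xf0) {	/* 4-byte sequence */
			need = 3; cp = c & 0x07; min = 0x10000;
		} else {
			return (false);		/* stray continuation or bad lead */
		}

		if (len - i - 1 < need)
			return (false);		/* sequence runs off the end */
		for (size_t k = 1; k <= need; k++) {
			if ((p[i + k] & 0xc0) != 0x80)
				return (false);	/* not a continuation byte */
			cp = (cp << 6) | (p[i + k] & 0x3f);
		}
		if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
			return (false);
		i += need + 1;
	}
	return (true);
}
```

A mount point stored as ISO-8859-1 bytes fails is_utf8() while the same name stored as UTF-8 passes, which is the "accident of coincident encoding" case: only the ASCII portion is safe under every assumption.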
>
> > One solution would be a symlink in /etc that "points to" the name of the
> > current system-wide locale.
> >
> > % ls -Fl /etc/locale
> > lrwxr-xr-x 1 root wheel 7 Mar 23 15:42 /etc/locale@ -> C.UTF-8
>
> grep lang /etc/login.conf:
> :lang=C.UTF-8:
> :lang=ru_RU.UTF-8:\
>
> Probably what you want?
>
You can get this with the locale routines, no? No need for grep.
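
A minimal sketch of the locale-routines route, using only setlocale(3) and nl_langinfo(3): passing an empty string to setlocale() adopts whatever the environment specifies, and nl_langinfo(CODESET) reports the resulting character encoding. The helper name query_ctype_locale() and its "spec" parameter are invented here for illustration. Note that this reports the locale of the calling process, not an encoding for any particular file.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Select the LC_CTYPE locale named by spec and report its name and codeset.
 * An empty spec ("") means "take it from the environment" (LC_ALL,
 * LC_CTYPE, LANG). Returns 0 on success, -1 if the locale is unavailable.
 *
 * The returned strings point at storage owned by libc and may be
 * overwritten by later setlocale() or nl_langinfo() calls.
 */
int
query_ctype_locale(const char *spec, const char **name, const char **codeset)
{
	assert(spec != NULL && name != NULL && codeset != NULL);

	if (setlocale(LC_CTYPE, spec) == NULL)
		return (-1);
	*name = setlocale(LC_CTYPE, NULL);	/* query only, no change */
	*codeset = nl_langinfo(CODESET);
	return (0);
}
```

Calling query_ctype_locale("", &name, &codeset) from a login session should reflect the LANG value that login.conf's lang= capability placed in the environment, so no grep of the file is needed.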
Warner
>
> > (Or "/etc/system.locale" ?)
> >
> > If the symlink doesn't exist, would "C.UTF-8" be a suitable default
> > moving forwards? It certainly would not be backwards compatible, since
> > an existing fstab could have non-UTF-8 strings in it, encoded with the
> > locale of the user who touched the file. But there's really no
> > backwards compatible solution, given that there's no guarantee that (for
> > any specific FreeBSD system) all system files were written with the same
> > locale. Fun, eh? ;^)
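
Purely to illustrate the proposal quoted above, here is a sketch of how a program might read such a symlink and fall back to "C.UTF-8" when it is absent. Neither /etc/locale nor the helper system_locale_name() exists today; both are hypothetical, and the path is passed in as a parameter rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define	FALLBACK_LOCALE	"C.UTF-8"	/* default suggested in the thread */

/*
 * Copy the target of the symlink at link_path into buf as a locale name.
 * Returns 1 if the symlink was read, 0 if the fallback was used instead
 * (link missing, not a symlink, empty, or too long for buf).
 */
int
system_locale_name(const char *link_path, char *buf, size_t bufsize)
{
	ssize_t n;
	size_t len;

	assert(link_path != NULL && buf != NULL && bufsize > 1);

	/* readlink(2) does not NUL-terminate, so reserve one byte. */
	n = readlink(link_path, buf, bufsize - 1);

	/* A result that fills the buffer may be truncated; do not trust it. */
	if (n > 0 && (size_t)n < bufsize - 1) {
		buf[n] = '\0';
		return (1);
	}

	len = strlen(FALLBACK_LOCALE);
	if (len > bufsize - 1)
		len = bufsize - 1;
	memcpy(buf, FALLBACK_LOCALE, len);
	buf[len] = '\0';
	return (0);
}
```

The symlink target is only a name, so the link may dangle; readlink(2) returns the stored string without resolving it. This does nothing for files already written in some other encoding, which is the backwards-compatibility gap the proposal itself points out.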
> >
> > Opinions, thoughts, please?
> >
> > Thanks,
> > Phil
> >
> >
>
> --
> Rod Grimes
> rgrimes@freebsd.org
>
>
