Date:      Tue, 9 Jan 2024 11:29:25 -0800
From:      Xin LI <delphij@gmail.com>
To:        Olivier Certner <olce@freebsd.org>
Cc:        Xin LI <delphij@freebsd.org>, Mike Karels <mike@karels.net>, src-committers@freebsd.org,  dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Subject:   Re: git: 2f036705f337 - main - Document the two recent newsyslog(8) change (-c option and <compress> configuration option).
Message-ID:  <CAGMYy3tzXv%2Bp7CCAvNU5YQxoia6Thn3pazkc_xSZYfHN=tctEw@mail.gmail.com>
In-Reply-To: <2683023.poxlI1A5LX@ravel>
References:  <202312290846.3BT8kOiO029918@gitrepo.freebsd.org> <90D0905E-AA46-4351-AEE0-9ED9D835DB50@karels.net> <2683023.poxlI1A5LX@ravel>

Hi, Olivier,

On Tue, Jan 9, 2024 at 2:19 AM Olivier Certner <olce@freebsd.org> wrote:

> [...]
> > Sorry not to have noticed this in the review; it was only when I saw this
> > message that it sunk in that we now have *three* ways to specify
> > compression, and I'm not even sure what the precedence is.  I would have
> > thought that <compress> would replace -c.  It's a mess if the config file
> > has entries that specify J and X flags as well as none, the config file has
> > <compress> zstd, and the -c option is given as well.  We now have a knob
> > to override the knob to override a knob. The only reason to keep -c that
> > I can think of is to specify a different compression in a single
> > invocation, but as noted, changing compression requires manual operations
> > that make it unreasonable to change it invocation by invocation.
>
> I agree.  Two possibilities that I can think of from here: Remove '-c' or
> make it enable compression regardless of the log files' individual settings.
>

I am open to removing '-c'.

Could you please clarify what you mean by "make it enable compression" --
did you mean that we mark all log files to be compressible?  (It's probably
not a good idea as some "log" files may be binary and not really
compressible).

>
> > I still think it would be much better to add an option letter to select
> > the default compression as specified by <compress>.  This would eliminate
> > the need for "legacy", and it would add the ability to have both a global
> > default and an exception.  I think the redefinition of the existing flags
> > to have different meanings if <compress> is given is messy.
>
> I didn't think about that at first.  I agree.
>
> If people want to be able to override compression settings globally, which
> I find useful, one could introduce another directive such as
> <compress_override> taking a boolean to request to apply the <compress>
> option regardless of the individual compression letters.
>
> Another possibility is just to rename "<compress>" to
> "<compress_override>" (so, this time, not a boolean) and keep its current
> behavior.  This would match one of the suggestions above about '-c', but
> then there's the question of which one takes precedence, and I think that
> the command-line specification should prevail (for practical purposes and
> POLA).
>
> > The entry for -c says that we plan to change the default to "none" in
> > 15.0.  Hopefully that would be done via <compress> and not -c.  However,
> > there was significant pushback on "none" being the default.
>
> I think the default should be "no <compress_override>", i.e., no
> directive.  This may argue for having "none" mean "don't change anything"
> (as if the directive wasn't there) and having something else to deactivate
> compression, such as "no_compression" (which is really an override).  If
> "none" is confusing, then just forgo it completely, and have 'newsyslog'
> plain fail on it (but keep "no_compression" as just described).
>
> If there is consensus, I'd then change the 'J' flag currently used for all
> log files to the new chosen flag for generic compression, and have
> <compress_override> set to "bzip2" in a first step (for POLA).  Then, it
> could be changed to something else, e.g., 'zstd'.
>
> Setting it to 'none' seems to me the worst solution (but far from being
> the end of the world).
>

Changing the meaning of all four legacy compression type letters to "file
is compressible" is part of the intention.  The goal is to discourage using
them as a way to specify a compression type, in favor of using the
administrator configured value.
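
As an illustration of the semantics described above, here is a minimal Python sketch. It is a model for discussion only, not newsyslog's actual C code. The function name resolve_compression is hypothetical, the letter-to-method mapping follows newsyslog.conf(5), and the handling of "legacy" (letters keep their per-method meaning) and "none" (no compression at all) is an assumption based on this thread. It deliberately leaves out '-c', since its precedence is still under discussion.

```python
# Hypothetical model of the flag semantics discussed in this thread.
# Per-method meaning of the four legacy letters, per newsyslog.conf(5).
LEGACY_FLAGS = {"J": "bzip2", "X": "xz", "Y": "zstd", "Z": "gzip"}


def resolve_compression(entry_flags, compress_setting="legacy"):
    """Return the compression method for one log entry, or None.

    entry_flags: the flags field of a newsyslog.conf entry, e.g. "JC".
    compress_setting: the value given to <compress> (assumed values:
    "legacy", "none", or a method name such as "zstd").
    """
    letters = [f for f in entry_flags if f in LEGACY_FLAGS]
    if not letters:
        # No compression letter: the file is not marked compressible.
        return None
    if compress_setting == "legacy":
        # Assumed: letters keep their original per-method meaning.
        return LEGACY_FLAGS[letters[0]]
    if compress_setting == "none":
        # Assumed: the administrator disabled compression globally.
        return None
    # Any compression letter only means "file is compressible";
    # the method comes from the administrator-configured value.
    return compress_setting


if __name__ == "__main__":
    for flags in ("J", "XC", "C"):
        for setting in ("legacy", "zstd", "none"):
            print(flags, setting, resolve_compression(flags, setting))
```

In this model an entry flagged 'J' is compressed with zstd once <compress> is set to zstd, while an entry without any of the four letters is never compressed.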

That said, 'none' is a reasonable default in many ways, as explained
before: it makes grep'ing easier; compression is not really that helpful in
the modern world because hard drives are much larger than they were in the
'90s; it reduces the number of times data gets rewritten to SSDs; and it
avoids hourly CPU load bursts on busy systems.

'bzip2' could be a good second-best default (because for most
configurations it's how the log files are compressed with today's
defaults), but if the administrator has already configured their systems to
use a different method, this would break their configuration anyway.


> More deeply, I remember having seen at least two claims that using the
> filesystem's compression is better, without arguments.  I don't agree with
> that in practice.  The only advantage of in-filesystem compression, besides
> the administrative simplification that you can also get with the override
> above, is to get O(1) random access to big log files, and I don't see any
> compelling and common use case for it.  You certainly want to get to the
> end of the current log quickly, but that one precisely is not handled by
> 'newsyslog' and stays uncompressed (at the application level).  When you
> want to search for strings or patterns, you have to grep the whole file
> anyway.  You may want to immediately reach the end of some historical log
> file, e.g., when manually going back in time from the current log, but this
> should have negligible latency, and if it doesn't, then just use more and
> smaller log archives.  Same thing if you have a more sophisticated setup
> with an index of log text: Jumping to a particular location in the log file
> should have negligible latency, else apply the same recipe.  If your setup
> with index requires a single, never rotated, log file, then you're not even
> using 'newsyslog' in the first place (or should not).  Although I agree
> that in this case using a compressed filesystem (or a randomly accessible
> archive) can make sense (if your index doesn't already cover the results
> expected from your searches), I very much doubt this is a common setup.
>

There are other benefits of not compressing rotated logs.  For busy
systems, the hourly newsyslog run would process larger logs and cause CPU
workload bursts.

And when logs are compressed, the data is read back and compressed data is
rewritten to disk / SSDs, causing additional wear of the flash storage, and
all that comes with no significant benefit for modern hardware.
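
To make the extra-write argument concrete, here is a rough back-of-the-envelope sketch in Python. The function name bytes_written, the 100 MiB log size, and the 10:1 compression ratio are made-up values for illustration, not measurements. It counts payload bytes only and ignores filesystem overhead.

```python
def bytes_written(log_bytes, ratio=None):
    """Total payload bytes written to storage for one rotated log.

    log_bytes: size of the uncompressed log at rotation time.
    ratio: assumed compression ratio (original / compressed),
           or None when the rotated log is left uncompressed.
    """
    total = log_bytes  # the log is written once as it is produced
    if ratio is not None:
        # At rotation the whole file is read back and a compressed
        # copy is written in addition to the original writes.
        total += log_bytes / ratio
    return total


if __name__ == "__main__":
    size = 100 * 1024 * 1024  # hypothetical 100 MiB log
    plain = bytes_written(size)
    packed = bytes_written(size, ratio=10)  # hypothetical 10:1 ratio
    print("uncompressed:", plain)
    print("compressed:  ", packed)
    print("extra writes:", packed - plain)
```

Under these assumed numbers, compression adds a full read of the log plus a second, smaller write on every rotation; the actual cost depends on the real ratio and on how the filesystem handles the rewritten blocks.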

(I don't think it's common to have log files indexed after rotation; a more
common use case would be to use [u]grep to look for a certain pattern).


> Moreover, using in-filesystem compression can lead to degrading the
> compression ratio, since the compression method on ZFS is chosen per
> dataset, which includes a bunch of other files and use cases preventing the
> administrator from choosing the best, and slowest, compression methods.  To
> avoid this problem, one can use a separate dataset for /var/log (anyone?),
> but changing this on already running systems is a greater burden than just
> changing the compression settings in the 'newsyslog' configuration files.
>

Yes, and that's not a big concern.  Achieving the maximum compression ratio
is probably never the goal for most scenarios (not limited to logs, but
also other places) where compression is used, and one always has to balance
between the cost and benefit.

If someone is distributing a release image to many thousands of users
over the Internet, it makes a lot of sense to try the best compression
for a 5% reduction in size, because that adds up in bandwidth cost and
improves the experience for users.  It doesn't make as much sense to
save, let's say, a few MBs of disk space at the expense of spending a few
more minutes every hour and adding "bursts" of slower response time on a
server, which is usually undesirable in production.

Cheers,
