Date:      Wed, 23 Jan 2019 00:15:03 +0100
From:      Tĳl Coosemans <tijl@FreeBSD.org>
To:        soralx@cydem.org, tcberner@FreeBSD.org
Cc:        rigoletto@FreeBSD.org, svn-ports-all@FreeBSD.org
Subject:   Re: svn commit: r490800 - in head/net-p2p: transmission-cli[...]
Message-ID:  <20190123001503.7f9c494e@kalimero.tijl.coosemans.org>
In-Reply-To: <20190122043000.3fcc2340@mscad14>
References:  <20190122043000.3fcc2340@mscad14>

On Tue, 22 Jan 2019 04:30:00 -0800 <soralx@cydem.org> wrote:
>>> New Revision: 490800
>>> URL: https://svnweb.freebsd.org/changeset/ports/490800
>>> 
>>> Log:
>>>   net-p2p/transmission-cli: change transmission's data size unit
>>>   conversion factors from 1000 to 1024, to match FreeBSD's blocksize. 
>> What blocksize?  The disk block sizes are determined by the hardware so  
> 
> # env | grep BLOCKSIZE
> BLOCKSIZE=K
> # uname -r
> 11.2-STABLE
> 
> The 'K' stands for "kilo[bytes]", where 1K = 1024 [bytes] on FreeBSD.
> This is the default on 9, 10, 11, 12, and 13-CURRENT (I forget when
> BLOCKSIZE was changed from 512 to 1024... it was a lo-o-ong time ago).
> Most basic tools like `ls` and `df` care about BLOCKSIZE.

And Linux defaults to 1024 too and Linux ls uses K, M and G as well.

>> FreeBSD isn't any different from Linux.  Why should the application
>> behave differently on FreeBSD than it does on Linux?  
> 
> Um... Because FreeBSD is not a Linux distribution? It is an operating
> system with its own style & philosophy -- and I think it would do much
> good for user experience to put some effort into keeping programs from
> ports & the base OS consistent? FreeBSD is all about consistency, no?
> 
> Pardon the bitterness, but I honestly fail to understand the argument
> "that's how it works on Linux, and they're fine -- so we should do it
> like that too!".

I didn't say FreeBSD should be like Linux.  I said the application should
be like it is on Linux (and everywhere else).  To make it different is
confusing/surprising, for users as well as for upstream developers that
receive bug reports from FreeBSD users.  I also suspect your change affects
some values in the configuration file which means the same configuration
file now has different behaviour on FreeBSD vs Linux.  That's not good.

>>> [...]
>>> + #define MEM_G_STR "GiB"
>>> + #define MEM_T_STR "TiB"  
>> 
>> Since they use GiB and TiB here...  
>> 
>>> +-#define DISK_K 1000
>>> ++#define DISK_K 1024
>>> + #define DISK_B_STR   "B"
>>> + #define DISK_K_STR "kB"
>>> + #define DISK_M_STR "MB"
>>> + #define DISK_G_STR "GB"
>>> + #define DISK_T_STR "TB"  
>> 
>> ...you should use KiB, MiB, GiB and TiB here...  
> 
> Good point: I agree that measurement units should be consistent.
> I wanted to keep changes to the minimum (plus I do not care so much
> about displayed units personally, as long as the numbers are correct),
> so that's why I didn't bother changing these. But the patch certainly
> can be improved, yes.
> 
> There are no prefixes "ki", "Mi", "Gi", etc. in FreeBSD, so I think
> we should simply change the MEM_* units to be similar to DISK_* and
> SPEED_* ("kB", "MB", "GB"...). Honestly, how many times have you heard
> anyone say: "There are 16 gibibytes of RAM in that machine!"?

They are used by pkg.  The binary prefixes have been adopted by all the
relevant standards bodies in the world.  More and more software has
started to adopt them.  Replacing correct use of the prefixes with
incorrect use is a regression.

FreeBSD is not a good example to follow in this case.  I was reading
gpart(8) a while ago and it uses both kB and KB when it means KiB, and it
even claims to support SI units when it doesn't, because it always uses
multiples of 1024.  For someone who has always used SI, a manual that
claims to support SI and then gives an example where 4k does not equal
4000 is highly confusing.

>>> +-#define SPEED_K 1000
>>> ++#define SPEED_K 1024
>>> + #define SPEED_B_STR  "B/s"
>>> + #define SPEED_K_STR "kB/s"
>>> + #define SPEED_M_STR "MB/s"  
>> 
>> ...and here as well (although using 1024 for bandwidth is weird).  
> 
> Notice that bandwidth is displayed in bytes/s, not bits/s. Applying
> SI prefixes to bps is desirable, but measuring throughput in 1000's
> of BYTES/s would be weird and unexpected.

Yet that's what pkg uses and what MacOS uses, and Ubuntu gives the example
of 50kB/s at https://wiki.ubuntu.com/UnitsPolicy, but I don't know if they
do this consistently.  As far as I can tell the trend is to use decimal
prefixes as much as possible, even for disk sizes and file sizes.

>> But really I think you should revert this change.  
> 
> I am a long-time user of transmission, and I've been patching it
> locally ever since they changed the scaling factor; I've got so
> many machines that patching locally wastes too much time, so I
> did my duty to improve the port, and rolled a patch & submitted
> it. Now you are telling me I've wasted my time, because I alone
> would want such a change?

I'm a transmission user too and now I have to keep a local change that
removes your patch.  Your change is just a personal preference.  A
bikeshed really.  You want red and I want blue.

The ports tree is just the wrong place to make such changes.  You need to
discuss this with the transmission developers.  Maybe they'll add an option
to the preferences dialog.

> How about this: we patch the port to adapt T. better to FreeBSD,
> enable the fix by default? Then, if there are unhappy users who
> care enough to complain and/or send a patch, the option can be
> turned off by default.

This isn't workable.  Imagine what a mess we would have if everybody
could have their own option with their local patches.  In 30000 ports.

>> If the submitter wants
>> this he can discuss that with the transmission developers.  
> 
> Point is to adapt transmission to work on FreeBSD, not to convince
> the program's developers to change their ways. Isn't the advantage
> of ports being able to customize?

No, the advantage is to have one place to get all software and one way to
install it.  Patches should be kept to a minimum, especially patches that
change the meaning of things.  Every patch is an additional maintenance
burden.  You didn't want to keep carrying that burden and now it got
pushed to somebody else.

> Below I include a message previously sent to Alexandre that goes
> into more detail about the issue.
> 
> ===================================8<===================================
>> I need to wait for my mentor to approve it (or not), but if the patch
>> gets approved I will just merge the patch itself, making it the default
>> without the UNITS OPTION.
> 
> I thought this was obvious, but perhaps I should explain.
> 
> In FreeBSD, we don't use SI prefixes [which apply to physical measures]
> for scaling digital data [which is not physical, thus has nothing to do
> with SI]; rather, we use traditional binary units. So, we have a base
> unit of bytes, and derived units of kilobytes, megabytes, etc. -- and
> to convert between them, a factor of 1024 is used (or 512, in case of
> sectors).

SI prefixes and SI units are two different things.  The prefixes are
dimensionless multipliers.  They can be applied to any quantity, including
bits and bytes.

> Transmission, on the other hand, incorrectly applies prefixes that carry
> a scale factor of 1000 (perhaps to match the behavior of the OSes it was
> written for?) to the base unit "byte", which produces an error of +2.4%;
> notably, this error multiplies when applying the conversion multiple
> times -- so when you're dealing with gigabytes of data, for instance,
> the error becomes 7.4%.
> 
> As a practical example, if I set a bandwidth limit of 512 "KB"/s in
> unpatched transmission, the actual limit will be 500 KB/s -- not a big
> difference, but technically not what I've asked for.

Transmission doesn't use KB/s anywhere.  It uses kB/s and interprets it
as 1000 B/s which is correct.  With your patch it's still using kB/s but
interprets it as 1024 B/s which is incorrect.
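
To make the two interpretations concrete, here is a minimal Python sketch.
The helper name limit_in_bytes is made up for illustration and is not taken
from Transmission's source; only the factors 1000 and 1024 come from the
SPEED_K lines in the patch quoted above.

```python
# Hypothetical helper, not Transmission code: convert a speed limit
# entered as "kB/s" into bytes per second for a given scale factor.
def limit_in_bytes(limit_k, speed_k):
    return limit_k * speed_k

upstream = limit_in_bytes(512, 1000)  # SPEED_K as shipped upstream
patched = limit_in_bytes(512, 1024)   # SPEED_K after the port patch

print(upstream)            # 512000
print(patched)             # 524288
print(patched - upstream)  # 12288
```

The same configured value of 512 therefore yields two different byte rates
depending on which build reads it, while the label stays "kB/s" in both.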

> If I have a torrent
> that tr.-cli tells me is 40GB, `ls -alfh` will show that its actual
> size is 37GB -- already a 3GB difference! And, let us say, I've got 222
> torrents, summing to 3.03TB total according to t-cli, then their actual
> size is 2.75TB, for a 0.25TB difference! huge! 3TB of data will not fit
> on a 3 "TB" disk, while 2.7TB might; it's a qualitative difference.

Except there is no difference, only the units change.  You're just
misinterpreting GB as GiB.  Your 3TB torrents will fit on a 3TB disk (if
you dd them as a tarball onto the disk without using any filesystem).
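
As a rough check of the figures quoted above, the following Python sketch
converts the same byte counts between decimal and binary units.  The
function name convert and the unit constants are illustrative assumptions,
not anything from Transmission or ls.

```python
# Decimal (SI) and binary (IEC) unit sizes in bytes.
GB, TB = 1000**3, 1000**4
GiB, TiB = 1024**3, 1024**4

def convert(value, from_unit, to_unit):
    """Re-express the same number of bytes in another unit."""
    return value * from_unit / to_unit

print(round(convert(40, GB, GiB), 2))    # 37.25
print(round(convert(3.03, TB, TiB), 2))  # 2.76
```

The number of bytes is identical on both sides of each conversion; only the
unit in which it is expressed changes.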

> IMO, using SI for digital data is misguided, as, again, data is not
> physical nor analog (i.e., you would not re-use the unit "byte" when
> scaling below 1; for ex., there is no such thing as a microbyte), so
> I think that we should not let transmission be trendy and fashionable
> on FreeBSD, but instead fix it to match the way the OS calculates data
> sizes.

Being standards compliant has nothing to do with being trendy and
fashionable.

> [Note that a byte is itself a derived unit, being made of 8 bits most
> commonly, but as far as the OS is concerned, it _is_ a base unit of
> data storage. So while it makes a lot of sense to apply SI prefixes
> to bits, the same cannot be said about bytes; the difference is that
> bytes measure quantity of data always, while bits can measure amount
> of information.]

So how many MiB go in 1.6 TiB and how many MB go in 1.6 TB?  How much is
1 TiB + 100 MiB and how much is 1 TB + 100 MB?  Why do you prefer what's
difficult to calculate over what's easy to calculate?
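
For reference, the answers to those questions can be worked out with a
short Python sketch.  The variable names are illustrative; Fraction is used
only to keep the arithmetic exact.

```python
from fractions import Fraction

MB, TB = 1000**2, 1000**4
MiB, TiB = 1024**2, 1024**4

# How many MB go in 1.6 TB, and how many MiB go in 1.6 TiB?
mb_in_tb = Fraction("1.6") * TB / MB
mib_in_tib = Fraction("1.6") * TiB / MiB
print(mb_in_tb)           # 1600000
print(float(mib_in_tib))  # 1677721.6

# 1 TB + 100 MB expressed in TB, and 1 TiB + 100 MiB expressed in TiB.
sum_tb = Fraction(TB + 100 * MB, TB)
sum_tib = Fraction(TiB + 100 * MiB, TiB)
print(float(sum_tb))             # 1.0001
print(f"{float(sum_tib):.7f}")   # 1.0000954
```

The decimal results can be read off by shifting the decimal point; the
binary ones need an actual multiplication or division by 1024.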

> Thus, I believe the UNITS option should not only be included, but
> also made default, to make transmission more compatible with *BSD.
> ===================================8<===================================
> 
> P.S.:
>  I live in North America, but I use SI units for physical measures.
>  Why? is SI inherently better? Who cares; SI is _international_, and
>  using SI reduces *ambiguity*. That's what matters. So, are you still
>  in support of introducing multiple units for measuring data in bytes?
> 
>  Well, which one to settle on? Is *BSD currently wrong? Consider this.
>  What's easier for a binary computer (and a human dealing with binary
>  numbers) to calculate: divide by 1024 or divide by 1000? How often
>  do you view a hexdump of a binary file with decimal instead of hex
>  formatting?

FreeBSD isn't standards compliant.  I don't know if that's wrong per se,
but it's not something to be proud of.  As for division by 1000: this is
only done when presenting a number to a user, so it's not on a hot path,
and integer division by a known constant is converted to integer
multiplication plus a shift, so on today's CPUs I'd say there isn't any
noticeable difference.  I don't use decimal in a hexdump, but you didn't
use hex or binary in any example in your email.  You used decimal, and with
decimal numbers powers of ten are easier to work with.  The fact that the
hardware uses powers of two is just a low-level implementation detail that
is irrelevant, especially at GB and TB scales.
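
The multiply-plus-shift transformation can be illustrated with a small
Python sketch for unsigned 32-bit inputs.  The constant and shift below are
one valid pair for dividing by 1000, chosen here as an assumption for the
example; a particular compiler may emit a different but equivalent sequence.

```python
# Unsigned 32-bit division by 1000 done as a multiply and a shift.
# MAGIC is ceil(2**38 / 1000); the pair (MAGIC, SHIFT) is an
# illustrative choice, not taken from any specific compiler's output.
MAGIC = 274877907
SHIFT = 38

def div1000(n):
    """Return n // 1000 for 0 <= n < 2**32 without a division."""
    assert 0 <= n < 2**32
    return (n * MAGIC) >> SHIFT

# Spot-check edge cases against ordinary integer division.
for n in (0, 999, 1000, 1999, 123456789, 2**32 - 1):
    assert div1000(n) == n // 1000
```

The multiplication needs a 64-bit intermediate on real hardware, which is a
single instruction on current 64-bit CPUs.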

I still want this commit reverted.


