Date:      Thu, 07 Dec 2017 13:07:28 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 224160] wc -c is slow
Message-ID:  <bug-224160-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224160

            Bug ID: 224160
           Summary: wc -c is slow
           Product: Base System
           Version: CURRENT
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bin
          Assignee: freebsd-bugs@FreeBSD.org
          Reporter: wosch@FreeBSD.org

The wc(1) command has several optimizations to run as fast as possible.
However, it is still slow in some use cases, much slower than the GNU wc
command.

I tested with the OpenStreetMap database dump planet-latest.osm.bz2
(from https://wiki.openstreetmap.org/wiki/Planet.osm),
which is a 61GB bzip'd XML file.

I checked how large the uncompressed XML is, on a 32 CPU machine:

# FreeBSD wc
$ pbzip2 -dc planet-latest.osm.bz2 | time wc -c
908171295050
    4729.53 real      4400.69 user       199.34 sys

The wc(1) command was running at 100% CPU time, and pbzip2 was using only 500%
CPU time.


I ran the test again with GNU wc. The wc command was using only 20% CPU time,
and pbzip2 around 3000%.

# GNU wc
$ pbzip2 -dc planet-latest.osm.bz2 | time gwc -c
908171295050
    2003.15 real         8.86 user       355.53 sys

The FreeBSD wc(1) command is using about 500 times more user time (4400 <-> 9)
than GNU wc, and a little less system time (199 <-> 355). The bottleneck was
wc, not pbzip2.
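
For reference, the ratios can be recomputed from the two time(1) lines above.
This is only a small sketch; the variable names are mine, and the MB/s figures
are simply the byte count divided by the wall-clock time, so they measure the
whole pipeline rather than wc alone.

```python
# Numbers copied from the two runs reported above.
size_bytes = 908171295050

freebsd = {"real": 4729.53, "user": 4400.69, "sys": 199.34}
gnu = {"real": 2003.15, "user": 8.86, "sys": 355.53}

# How much more user CPU time FreeBSD wc burned than GNU wc.
user_ratio = freebsd["user"] / gnu["user"]

# End-to-end pipeline throughput in MB/s (1 MB = 10**6 bytes).
freebsd_mb_s = size_bytes / freebsd["real"] / 1e6
gnu_mb_s = size_bytes / gnu["real"] / 1e6

print(f"user time ratio: {user_ratio:.0f}x")          # about 497x
print(f"FreeBSD wc pipeline: {freebsd_mb_s:.0f} MB/s")  # about 192 MB/s
print(f"GNU wc pipeline:     {gnu_mb_s:.0f} MB/s")      # about 453 MB/s
```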

We should check why the optimization for wc -c for reading from stdin is not
working.
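
To illustrate what I would expect the -c fast path to do, here is a rough
sketch in Python. It is not the wc(1) source, and count_bytes and its bufsize
parameter are hypothetical names. The idea is that counting bytes never needs
to look at the data: for a regular file the size comes from fstat(2), and for
a pipe it is enough to add up the return values of read(2) with a large
buffer, so user time should stay close to zero.

```python
import os
import stat


def count_bytes(fd, bufsize=1 << 16):
    """Count the bytes available on fd without inspecting their contents."""
    st = os.fstat(fd)
    if stat.S_ISREG(st.st_mode):
        # Regular file: the size is already known, no read needed.
        # Assumption: the descriptor is positioned at the start of the file.
        return st.st_size

    # Pipe, socket, tty, ...: read in large chunks and only sum the lengths.
    total = 0
    while True:
        chunk = os.read(fd, bufsize)
        if not chunk:  # empty result means end of input
            return total
        total += len(chunk)
```

Calling count_bytes(0) would count stdin. Any per-byte work inside the loop
(newline or word scanning, multibyte decoding) would show up as user time,
which is what the numbers above look like.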

-- 
You are receiving this mail because:
You are the assignee for the bug.


