Date:      Fri, 08 Dec 2017 14:34:52 +0000
From:      bugzilla-noreply@freebsd.org
To:        freebsd-bugs@FreeBSD.org
Subject:   [Bug 224160] [patch] wc -c is slow
Message-ID:  <bug-224160-8-8XUdTcqYdN@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-224160-8@https.bugs.freebsd.org/bugzilla/>
References:  <bug-224160-8@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224160

Conrad Meyer <cem@freebsd.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
             Status|New                         |In Progress
            Summary|wc -c is slow               |[patch] wc -c is slow
           Assignee|freebsd-bugs@FreeBSD.org    |cem@freebsd.org

--- Comment #2 from Conrad Meyer <cem@freebsd.org> ---
wc(1) uses a stack buffer of size MAXBSIZE, or 64 kB.  Increasing this may help
(the buffer would have to move to the heap).

Secondly, there is an optimization for counting lines, and that same
optimization counts characters, but it is not used if wc is only asked to count
characters!  Silly.  It's also not used when wc reads from stdin!  Stupid.
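
The character-count fast path needs nothing more than the sum of read(2)
return values, and that sum does not depend on whether the descriptor came
from open(2) or is standard input.  A minimal sketch, assuming a hypothetical
helper named count_bytes (not part of wc.c); unlike the wc.c loop, it also
retries a read that was interrupted by a signal:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper (not wc(1)'s actual code): count the bytes
 * available on an open descriptor by summing read(2) return values,
 * without looking at the data.  It behaves the same for a regular file
 * and for STDIN_FILENO.  Returns 0 at EOF with the total in *charct,
 * or -1 on a read error with errno set by read(2).
 */
static int
count_bytes(int fd, unsigned char *buf, size_t bufsize, uintmax_t *charct)
{
	ssize_t len;

	*charct = 0;
	for (;;) {
		len = read(fd, buf, bufsize);
		if (len == 0)
			return (0);		/* EOF */
		if (len == -1) {
			if (errno == EINTR)
				continue;	/* interrupted by a signal: retry */
			return (-1);
		}
		*charct += (uintmax_t)len;
	}
}
```

In cnt() the descriptor would be STDIN_FILENO when file is NULL and the
open(2) result otherwise; the line-counting loop can add len to charct from
the same read, so both counts come from a single pass.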

Just enabling the optimization for stdin and for character-only counts gives
much better results, comparable to GNU wc:

 2097152000
~/obj/usr/home/conrad/src/freebsd/amd64.amd64/usr.bin/wc/wc -c  0.01s user 0.43s system 45% cpu 0.964 total

Bumping the buffer size to 4 MB yields a big improvement in system time.
(Note that the dd size was increased 10x relative to the run above.)
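
One rough way to see why a larger buffer cuts system time: each read(2) is a
system call, and a bigger buffer means fewer of them for the same input.  The
sketch below uses a hypothetical helper, min_read_calls, to compute the
minimum number of data-returning reads, assuming every read fills the whole
buffer.  That holds for regular files but not necessarily for pipes, so treat
the result as a lower bound rather than a measurement.

```c
#include <stdint.h>

/*
 * Hypothetical helper: minimum number of read(2) calls that return data
 * when consuming `total` bytes through a buffer of `bufsize` bytes
 * (ceiling division).  Assumes every read fills the buffer; the final
 * read that returns 0 at EOF is not counted.
 */
static uintmax_t
min_read_calls(uintmax_t total, uintmax_t bufsize)
{
	return ((total + bufsize - 1) / bufsize);
}
```

For the 20971520000-byte runs below, min_read_calls(20971520000, 65536) is
320000, while min_read_calls(20971520000, 4 * 1024 * 1024) is 5000, i.e. 64
times fewer calls.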

Before:
 20971520000
~/obj/usr/home/conrad/src/freebsd/amd64.amd64/usr.bin/wc/wc -c  0.14s user 3.99s system 42% cpu 9.653 total
After:
 20971520000
~/obj/usr/home/conrad/src/freebsd/amd64.amd64/usr.bin/wc/wc -c  0.12s user 1.90s system 40% cpu 4.954 total

GNU wc (gwc) is actually worse:
20971520000
gwc -c  0.21s user 2.91s system 48% cpu 6.490 total


Here is the PoC patch (whitespace-only changes elided with -w for legibility).
Note that it leaks memory: the malloc'd buffer is never freed.  4 MB may also
be totally inappropriate for small devices.

--- a/usr.bin/wc/wc.c
+++ b/usr.bin/wc/wc.c
@@ -199,15 +199,17 @@ cnt(const char *file)
        size_t clen;
        short gotsp;
        u_char *p;
-       u_char buf[MAXBSIZE];
+       u_char *buf;
        wchar_t wch;
        mbstate_t mbs;

+#define MY_BUF_SIZE (4 * 1024 * 1024)
+       buf = malloc(MY_BUF_SIZE);
+
        linect = wordct = charct = llct = tmpll = 0;
        if (file == NULL)
                fd = STDIN_FILENO;
-       else {
-               if ((fd = open(file, O_RDONLY, 0)) < 0) {
+       else if ((fd = open(file, O_RDONLY, 0)) < 0) {
                xo_warn("%s: open", file);
                return (1);
        }
@@ -218,8 +220,8 @@ cnt(const char *file)
         * lines than to get words, since the word count requires some
         * logic.
         */
-               if (doline) {
-                       while ((len = read(fd, buf, MAXBSIZE))) {
+       if (doline || dochar) {
+               while ((len = read(fd, buf, MY_BUF_SIZE))) {
                        if (len == -1) {
                                xo_warn("%s: read", file);
                                (void)close(fd);
@@ -230,6 +232,7 @@ cnt(const char *file)
                                    llct);
                        }
                        charct += len;
+                       if (doline) {
                                for (p = buf; len--; ++p)
                                        if (*p == '\n') {
                                                if (tmpll > llct)
@@ -239,7 +242,9 @@ cnt(const char *file)
                                        } else
                                                tmpll++;
                        }
+               }
                reset_siginfo();
+               if (doline)
                        tlinect += linect;
                if (dochar)
                        tcharct += charct;
@@ -270,13 +275,12 @@ cnt(const char *file)
                        return (0);
                }
        }
-       }

        /* Do it the hard way... */
 word:  gotsp = 1;
        warned = 0;
        memset(&mbs, 0, sizeof(mbs));
-       while ((len = read(fd, buf, MAXBSIZE)) != 0) {
+       while ((len = read(fd, buf, MY_BUF_SIZE)) != 0) {
                if (len == -1) {
                        xo_warn("%s: read", file != NULL ? file : "stdin");
                        (void)close(fd);
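
One way to address the leak noted above without giving up the larger buffer is
to allocate it once and reuse it for every file, since cnt() is called once
per file.  The sketch below is hypothetical (get_read_buf, LARGE_BUF_SIZE and
SMALL_BUF_SIZE are illustrative names, not from wc.c).  It also checks the
malloc(3) result, which the PoC does not, and falls back to a 64 kB buffer if
the 4 MB allocation fails.  It does not settle the small-device question:
choosing a smaller size up front would be a separate policy decision.

```c
#include <stddef.h>
#include <stdlib.h>

#define LARGE_BUF_SIZE	(4 * 1024 * 1024)	/* same value as the PoC's MY_BUF_SIZE */
#define SMALL_BUF_SIZE	(64 * 1024)		/* MAXBSIZE-sized fallback */

/*
 * Hypothetical helper: return a read buffer that is allocated on first
 * use and handed back unchanged on every later call, so calling it once
 * per file does not leak.  If the large allocation fails, try a smaller
 * one.  Returns NULL (leaving *sizep untouched) only if both fail;
 * otherwise stores the buffer size in *sizep.  The buffer stays
 * allocated until the process exits.
 */
static unsigned char *
get_read_buf(size_t *sizep)
{
	static unsigned char *buf;
	static size_t bufsize;

	if (buf == NULL) {
		bufsize = LARGE_BUF_SIZE;
		buf = malloc(bufsize);
		if (buf == NULL) {
			bufsize = SMALL_BUF_SIZE;
			buf = malloc(bufsize);
			if (buf == NULL)
				return (NULL);
		}
	}
	*sizep = bufsize;
	return (buf);
}
```

Inside cnt(), the malloc(MY_BUF_SIZE) line would become a call to
get_read_buf(&bufsize) with a NULL check, and the read() calls would pass
bufsize instead of MY_BUF_SIZE.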
