From: bugzilla-noreply@freebsd.org
To: freebsd-bugs@FreeBSD.org
Subject: [Bug 224160] [patch] wc -c is slow
Date: Fri, 08 Dec 2017 14:34:52 +0000
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=224160

Conrad Meyer changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |patch
             Status|New                         |In Progress
            Summary|wc -c is slow               |[patch] wc -c is slow
           Assignee|freebsd-bugs@FreeBSD.org    |cem@freebsd.org

--- Comment #2 from Conrad Meyer ---
wc(1) uses a stack buffer of size MAXBSIZE, or 64kB. Increasing this may help
(move it to the heap).

Secondly, there is an optimization for counting lines, and that same
optimization counts characters, but it is not used if wc is only asked to count
characters! Silly. It's also not used if wc is asked to count stdin! Stupid.

Just fixing stdin + character count optimization gives much better results,
comparable to GNU wc:

2097152000
~/obj/usr/home/conrad/src/freebsd/amd64.amd64/usr.bin/wc/wc -c  0.01s user 0.43s system 45% cpu 0.964 total

Bumping the buffer size to 4 MB yields big improvement in system time. (Note
that the dd size was increased 10x.)

Before:
20971520000
~/obj/usr/home/conrad/src/freebsd/amd64.amd64/usr.bin/wc/wc -c  0.14s user 3.99s system 42% cpu 9.653 total

After:
20971520000
~/obj/usr/home/conrad/src/freebsd/amd64.amd64/usr.bin/wc/wc -c  0.12s user 1.90s system 40% cpu 4.954 total

GNU wc is actually worse:
20971520000
gwc -c  0.21s user 2.91s system 48% cpu 6.490 total

Here is the PoC patch (whitespace changes elided (-w) for legibility). Note
that it leaks memory. 4 MB may be totally inappropriate for small devices,
too.

--- a/usr.bin/wc/wc.c
+++ b/usr.bin/wc/wc.c
@@ -199,15 +199,17 @@ cnt(const char *file)
        size_t clen;
        short gotsp;
        u_char *p;
-       u_char buf[MAXBSIZE];
+       u_char *buf;
        wchar_t wch;
        mbstate_t mbs;

+#define MY_BUF_SIZE (4 * 1024 * 1024)
+       buf = malloc(MY_BUF_SIZE);
+
        linect = wordct = charct = llct = tmpll = 0;
        if (file == NULL)
                fd = STDIN_FILENO;
-       else {
-               if ((fd = open(file, O_RDONLY, 0)) < 0) {
+       else if ((fd = open(file, O_RDONLY, 0)) < 0) {
                xo_warn("%s: open", file);
                return (1);
        }
@@ -218,8 +220,8 @@ cnt(const char *file)
         * lines than to get words, since the word count requires some
         * logic.
         */
-       if (doline) {
-               while ((len = read(fd, buf, MAXBSIZE))) {
+       if (doline || dochar) {
+               while ((len = read(fd, buf, MY_BUF_SIZE))) {
                        if (len == -1) {
                                xo_warn("%s: read", file);
                                (void)close(fd);
@@ -230,6 +232,7 @@ cnt(const char *file)
                                    llct);
                        }
                        charct += len;
+                       if (doline) {
                        for (p = buf; len--; ++p)
                                if (*p == '\n') {
                                        if (tmpll > llct)
@@ -239,7 +242,9 @@ cnt(const char *file)
                                } else
                                        tmpll++;
                }
+               }
                reset_siginfo();
+               if (doline)
                tlinect += linect;
                if (dochar)
                        tcharct += charct;
@@ -270,13 +275,12 @@ cnt(const char *file)
                                return (0);
                        }
                }
-       }

        /* Do it the hard way... */
 word:  gotsp = 1;
        warned = 0;
        memset(&mbs, 0, sizeof(mbs));
-       while ((len = read(fd, buf, MAXBSIZE)) != 0) {
+       while ((len = read(fd, buf, MY_BUF_SIZE)) != 0) {
                if (len == -1) {
                        xo_warn("%s: read", file != NULL ? file : "stdin");
                        (void)close(fd);
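
For illustration only, here is a standalone sketch of the fast path the comment describes: a heap-allocated read buffer, a byte count that needs no per-byte work, and a newline scan that runs only when lines are requested. It is not the committed fix and not code from wc.c. The names count_fd and COUNT_BUF_SIZE are hypothetical, the 4 MB size simply mirrors the PoC, and the EINTR retry is an addition that the PoC does not have. Unlike the PoC, the sketch frees the buffer on every exit path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical name; 4 MB mirrors MY_BUF_SIZE in the PoC patch. */
#define COUNT_BUF_SIZE (4 * 1024 * 1024)

/*
 * Count the bytes readable from fd and, if want_lines is nonzero,
 * the newlines among them. Returns 0 on success, or -1 if the
 * buffer cannot be allocated or read(2) fails.
 */
int
count_fd(int fd, int want_lines, uintmax_t *charct, uintmax_t *linect)
{
	unsigned char *buf, *p;
	ssize_t len;
	int rv = 0;

	assert(charct != NULL && linect != NULL);
	*charct = *linect = 0;

	/* Heap buffer: 4 MB is too large to put on the stack. */
	if ((buf = malloc(COUNT_BUF_SIZE)) == NULL)
		return (-1);

	for (;;) {
		len = read(fd, buf, COUNT_BUF_SIZE);
		if (len == 0)
			break;			/* end of input */
		if (len == -1) {
			if (errno == EINTR)
				continue;	/* assumption: retry on signal */
			rv = -1;
			break;
		}
		/* Byte count comes straight from read(2)'s return value. */
		*charct += (uintmax_t)len;
		/* Only scan the buffer when a line count was requested. */
		if (want_lines) {
			for (p = buf; len-- > 0; ++p)
				if (*p == '\n')
					++*linect;
		}
	}

	free(buf);	/* the PoC leaks here; release on all paths */
	return (rv);
}
```

A caller would pass STDIN_FILENO or a descriptor from open(2), which covers the stdin case mentioned above. On small devices the buffer size would likely need to be reduced, as the comment already warns.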