From owner-svn-src-stable-9@FreeBSD.ORG  Fri May  3 19:55:42 2013
Return-Path: <owner-svn-src-stable-9@FreeBSD.ORG>
Delivered-To: svn-src-stable-9@freebsd.org
Received: from mx1.freebsd.org (mx1.FreeBSD.org [8.8.178.115])
 by hub.freebsd.org (Postfix) with ESMTP id D1F60A05;
 Fri,  3 May 2013 19:55:42 +0000 (UTC) (envelope-from jilles@stack.nl)
Received: from mx1.stack.nl (relay02.stack.nl [IPv6:2001:610:1108:5010::104])
 by mx1.freebsd.org (Postfix) with ESMTP id 793F1109A;
 Fri,  3 May 2013 19:55:42 +0000 (UTC)
Received: from snail.stack.nl (snail.stack.nl [IPv6:2001:610:1108:5010::131])
 by mx1.stack.nl (Postfix) with ESMTP id A8F093592E6;
 Fri,  3 May 2013 21:55:40 +0200 (CEST)
Received: by snail.stack.nl (Postfix, from userid 1677)
 id 844EA28493; Fri,  3 May 2013 21:55:40 +0200 (CEST)
Date: Fri, 3 May 2013 21:55:40 +0200
From: Jilles Tjoelker <jilles@stack.nl>
To: Sergey Kandaurov <pluknet@freebsd.org>
Subject: Re: svn commit: r250215 - stable/9/lib/libc/locale
Message-ID: <20130503195540.GA52657@stack.nl>
References: <201305031552.r43FqiPN024580@svn.freebsd.org>
 <5183E899.4000503@freebsd.org>
 <CAE-mSO+B_p_HCbKwSO-rJ+dforcPEfThmOxy+Ki_1e9zPn3q_w@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAE-mSO+B_p_HCbKwSO-rJ+dforcPEfThmOxy+Ki_1e9zPn3q_w@mail.gmail.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: svn-src-stable@freebsd.org, svn-src-all@freebsd.org,
 src-committers@freebsd.org, Andrey Chernov <ache@freebsd.org>,
 svn-src-stable-9@freebsd.org
X-BeenThere: svn-src-stable-9@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: SVN commit messages for only the 9-stable src tree
 <svn-src-stable-9.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/svn-src-stable-9>, 
 <mailto:svn-src-stable-9-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-stable-9>
List-Post: <mailto:svn-src-stable-9@freebsd.org>
List-Help: <mailto:svn-src-stable-9-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/svn-src-stable-9>, 
 <mailto:svn-src-stable-9-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 03 May 2013 19:55:42 -0000

On Fri, May 03, 2013 at 09:49:05PM +0400, Sergey Kandaurov wrote:
> On 3 May 2013 20:40, Andrey Chernov <ache@freebsd.org> wrote:
> > I don't think this change is optimal (cause slowdown) since you add
> > unneeded strlen() in the loop for known-sized elements plus one strlen()
> > for tested property, and wctype_l() itself can be called very often, so
> > we definitely don't need slowdown here.
> > Since most of elements have equal sizes, your len1 == len2 condition
> > gains almost nothing for valid calls.

> Thanks, you are correct. With this change I got 2-6 times slowdown depending
> on the specified wctype. I will look at how it can be optimized.

As the original author of this change, I mainly considered programs that
do not use wctype(), which gain a reduced memory footprint and rtld
load. For programs that do use it, I expect fewer cache misses because
there is less pointer chasing.

It is certainly right that sh(1) will call wctype() fairly often if a
pattern contains a [[:ctype:]] after a *.

If the slowness is because of the strlen() calls, the propnames can be
constructed like "\5alnum\5alpha"...

The slowness could also be because strcmp() is implemented less badly
than memcmp(). With the current code, memcmp() can be changed to
strcmp().

Some sort of perfect hashing can also be an option, although it makes it
harder to add new properties or adds a build dependency on gperf(1) that
we would like to get rid of.

Sorry, I have not run my own benchmarks yet.

> > On 03.05.2013 19:52, Sergey Kandaurov wrote:
> >> Modified: stable/9/lib/libc/locale/wctype.c
> >> ==============================================================================
> >> --- stable/9/lib/libc/locale/wctype.c Fri May  3 15:28:31 2013        (r250214)
> >> +++ stable/9/lib/libc/locale/wctype.c Fri May  3 15:52:43 2013        (r250215)
> >> @@ -57,35 +57,54 @@ iswctype_l(wint_t wc, wctype_t charclass
> >>  wctype_t
> >>  wctype_l(const char *property, locale_t locale)
> >>  {
> >> -     static const struct {
> >> -             const char      *name;
> >> -             wctype_t         mask;
> >> -     } props[] = {
> >> -             { "alnum",      _CTYPE_A|_CTYPE_D },
> >> -             { "alpha",      _CTYPE_A },
> >> -             { "blank",      _CTYPE_B },
> >> -             { "cntrl",      _CTYPE_C },
> >> -             { "digit",      _CTYPE_D },
> >> -             { "graph",      _CTYPE_G },
> >> -             { "lower",      _CTYPE_L },
> >> -             { "print",      _CTYPE_R },
> >> -             { "punct",      _CTYPE_P },
> >> -             { "space",      _CTYPE_S },
> >> -             { "upper",      _CTYPE_U },
> >> -             { "xdigit",     _CTYPE_X },
> >> -             { "ideogram",   _CTYPE_I },     /* BSD extension */
> >> -             { "special",    _CTYPE_T },     /* BSD extension */
> >> -             { "phonogram",  _CTYPE_Q },     /* BSD extension */
> >> -             { "rune",       0xFFFFFF00L },  /* BSD extension */
> >> -             { NULL,         0UL },          /* Default */
> >> +     const char *propnames =
> >> +             "alnum\0"
> >> +             "alpha\0"
> >> +             "blank\0"
> >> +             "cntrl\0"
> >> +             "digit\0"
> >> +             "graph\0"
> >> +             "lower\0"
> >> +             "print\0"
> >> +             "punct\0"
> >> +             "space\0"
> >> +             "upper\0"
> >> +             "xdigit\0"
> >> +             "ideogram\0"    /* BSD extension */
> >> +             "special\0"     /* BSD extension */
> >> +             "phonogram\0"   /* BSD extension */
> >> +             "rune\0";       /* BSD extension */
> >> +     static const wctype_t propmasks[] = {
> >> +             _CTYPE_A|_CTYPE_D,
> >> +             _CTYPE_A,
> >> +             _CTYPE_B,
> >> +             _CTYPE_C,
> >> +             _CTYPE_D,
> >> +             _CTYPE_G,
> >> +             _CTYPE_L,
> >> +             _CTYPE_R,
> >> +             _CTYPE_P,
> >> +             _CTYPE_S,
> >> +             _CTYPE_U,
> >> +             _CTYPE_X,
> >> +             _CTYPE_I,
> >> +             _CTYPE_T,
> >> +             _CTYPE_Q,
> >> +             0xFFFFFF00L
> >>       };
> >> -     int i;
> >> +     size_t len1, len2;
> >> +     const char *p;
> >> +     const wctype_t *q;
> >>
> >> -     i = 0;
> >> -     while (props[i].name != NULL && strcmp(props[i].name, property) != 0)
> >> -             i++;
> >> +     len1 = strlen(property);
> >> +     q = propmasks;
> >> +     for (p = propnames; (len2 = strlen(p)) != 0; p += len2 + 1) {
> >> +             if (len1 == len2 && memcmp(property, p, len1) == 0)
> >> +                     return (*q);
> >> +             q++;
> >> +     }
> >>
> >> -     return (props[i].mask);
> >> +     return (0UL);
> >>  }
> >>
> >>  wctype_t wctype(const char *property)

-- 
Jilles Tjoelker