Date: Wed, 10 Oct 2001 20:09:47 -0400
From: Mike Barcroft <mike@FreeBSD.org>
To: Bakul Shah <bakul@bitblocks.com>
Cc: audit@FreeBSD.org
Subject: Re: strnstr(3) - New libc function for review
Message-ID: <20011010200947.F49828@coffee.q9media.com>
In-Reply-To: <200110101725.NAA06941@valiant.cnchost.com>; from bakul@bitblocks.com on Wed, Oct 10, 2001 at 10:25:25AM -0700
References: <20011009221220.C49828@coffee.q9media.com> <200110101725.NAA06941@valiant.cnchost.com>

Bakul Shah <bakul@bitblocks.com> writes:
> [synopsis:
> I am arguing to make strnstrn a part of libc. strnstrn instead
> of, or in addition to, strnstr. strnstrn is defined as
> char * strnstrn(const char* s, size_t slen, const char* p, size_t plen)
> ]
>
> > This is probably not needed for most uses for this. It would be rare
> > to have two non-NUL terminated strings. On the other hand, this could
> > be implemented in the future, if it's seen as useful. strnstr(3)
> > could easily be modified to call a strnstrn() and use strlen(3) to get
> > the missing size field.
>
> Well, as you point out strnstr (as per your definition) is
> not general enough but strnstrn is. The latter will accept
> non nul terminated strings and can be used to build strnstr:
>
> strnstr(a,b,l) == strnstrn(a,l, b, strlen(b))
>
> Actually it is not so rare to have two non nul terminated
> strings, for example when you are comparing a substring (part
> of a bigger string) against another string. strnstr will
> force nul-termination and you either have to allocate a new
> string or, worse (and the more frequent case), write a \nul
> into a string, while remembering to save and restore the
> overwritten char. This latter horrible habit is even
> enshrined in strtok, strsep and friends.
I did do some research into current uses of strnstr() and strnstrn().
strnstr() seemed to be the more popular of the two. Yes, strnstrn()
would be more general, but I really can't see a lot of uses for it.
> I don't care much about the name but do care about generality
> (IMHO the names str{,n}chr and str{,n}str stink -- naming a
> function after its argument types is pretty strange! But
> that is a separate discussion:).
>
> > > {
> > > while (slen >= plen) {
> > > if (strncmp(s, p, plen) == 0)
> > > return (char*)s;
> > > s++, slen--;
> > > }
> >
> > It seems to me, it would be a pessimization to call strncmp(3) when
> > you don't even have one character that matches.
>
> Good point! Okay, how about the following? It should be as
> efficient as your version.
>
> while (slen >= plen) {
> if (*s == *p && strncmp(s+1, p+1, plen-1) == 0)
> return (char*)s;
> s++; slen--;
> }
You missed the case where "If little is an empty string, big is returned".
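
For illustration, here is a minimal sketch of how the quoted loop could
cover the empty-pattern case. The name strnstrn() and its signature are
taken from the proposal quoted above; the body is an assumption, not
committed code. It also swaps strncmp(3) for memcmp(3), on the assumption
that counted buffers should not stop comparing at an embedded NUL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strnstrn(): find the first occurrence of the plen-byte
 * pattern p within the first slen bytes of s.  Neither buffer needs to
 * be NUL-terminated.
 */
char *
strnstrn(const char *s, size_t slen, const char *p, size_t plen)
{
	/* An empty pattern matches at the start of the searched buffer. */
	if (plen == 0)
		return ((char *)s);

	while (slen >= plen) {
		/* Test the first byte before paying for a full compare. */
		if (*s == *p && memcmp(s + 1, p + 1, plen - 1) == 0)
			return ((char *)s);
		s++;
		slen--;
	}
	return (NULL);
}
```

Without the early return, plen - 1 would wrap around to SIZE_MAX when
plen is 0, so the check is needed for correctness as well as for the
documented behaviour.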
> > > return 0;
> >
> > Do you mean: return (NULL) ?
>
> I mildly prefer 0 and removal of unnecessary parens but that
> is just a style issue. Not important.
Actually style is quite important, see style(9).
> > Yes, I recall seeing these algorithms discussed recently and I believe
> > it was concluded that making strstr(3) use one of these more advanced
> > algorithms would be a pessimization for most cases. That said, don't
> > let me hold you back from proposing alternative or complementary
> > functions to strnstr(3).
>
> No proof was proffered either way. I happen to believe it is
> a win even when you are searching sub 100 byte strings but
> I'll shut up until I can show that (or Andrew L. Neporada
> does that!). *If* it is a win, IMHO it is better to have one
> interface (strnstrn or whatever) that selects the appropriate
> algorithm for a number of reasons:
>
> a) most people simply want to use a function that meets their
> needs without doing any algorithmic analysis. They benefit
> automatically.
>
> b) if tomorrow you come up with a faster algorithm, a new
> strnstrn implementation that can take advantage of that
> will benefit existing programs as well (if they use shared
> libs).
>
> This is a philosophical argument about library design (not
> just strnstrn) which is why I put freebsd-hackers back in
> bcc:. To me it makes sense to provide algorithm specific
> functions *and* a generic function that selects the best one
> based on inputs. Use the specific version when you know
> exactly what you are doing and want better control over the
> behavior of your program; use the generic version when you
> want something `fast' but don't care beyond that. Sort of like
> providing VM for the masses -- not everyone needs or wants to
> do their own memory management!
I don't see how strnstr(3) would impede future optimizations; you
can just get the length by using strlen(3).
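
A rough sketch of that layering follows. The names my_strnstr() and
strnstrn() are hypothetical (my_strnstr is used only to avoid clashing
with a libc that already ships strnstr), and the naive search loop is
just a stand-in for whatever algorithm might be chosen later. The sketch
also assumes the search of big should stop at its first NUL, so the
effective length is clamped with memchr(3).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-buffer search; the algorithm here is a placeholder. */
static char *
strnstrn(const char *s, size_t slen, const char *p, size_t plen)
{
	if (plen == 0)
		return ((char *)s);
	while (slen >= plen) {
		if (*s == *p && memcmp(s + 1, p + 1, plen - 1) == 0)
			return ((char *)s);
		s++;
		slen--;
	}
	return (NULL);
}

/*
 * Sketch of a strnstr(3)-style wrapper: little is NUL-terminated, and
 * at most len bytes of big are searched.
 */
char *
my_strnstr(const char *big, const char *little, size_t len)
{
	/* Assumption: bytes after a NUL in big are not searched. */
	const char *end = memchr(big, '\0', len);
	size_t blen = (end != NULL) ? (size_t)(end - big) : len;

	return (strnstrn(big, blen, little, strlen(little)));
}
```

With this arrangement, a faster strnstrn() would speed up the wrapper
too, and callers of the wrapper would not need to change.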
Best regards,
Mike Barcroft
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-audit" in the body of the message
