FreeBSD Mail Archives

Date:      Wed, 10 Oct 2001 20:09:47 -0400
From:      Mike Barcroft <mike@FreeBSD.org>
To:        Bakul Shah <bakul@bitblocks.com>
Cc:        audit@FreeBSD.org
Subject:   Re: strnstr(3) - New libc function for review
Message-ID:  <20011010200947.F49828@coffee.q9media.com>
In-Reply-To: <200110101725.NAA06941@valiant.cnchost.com>; from bakul@bitblocks.com on Wed, Oct 10, 2001 at 10:25:25AM -0700
References:  <20011009221220.C49828@coffee.q9media.com> <200110101725.NAA06941@valiant.cnchost.com>


Bakul Shah <bakul@bitblocks.com> writes:
> [synopsis:
>  I am arguing to make strnstrn a part of libc.  strnstrn instead
>  of, or in addition to, strnstr.  strnstrn is defined as
>      char * strnstrn(const char* s, size_t slen, const char* p, size_t plen)
> ]
> 
> > This is probably not needed for most uses for this.  It would be rare
> > to have two non-NUL terminated strings.  On the other hand, this could 
> > be implemented in the future, if it's seen as useful.  strnstr(3)
> > could easily be modified to call a strnstrn() and use strlen(3) to get
> > the missing size field.
> 
> Well, as you point out strnstr (as per your definition) is
> not general enough but strnstrn is.  The latter will accept
> non nul terminated strings and can be used to build strnstr:
> 
>     strnstr(a,b,l) == strnstrn(a,l, b, strlen(b))
> 
> Actually it is not so rare to have two non nul terminated
> strings, for example when you are comparing a substring (part
> of a bigger string) against another string.  strnstr will
> force nul-termination and you either have to allocate a new
> string or, worse (and the more frequent case), write a \nul
> into a string, while remembering to save and restore the
> overwritten char.  This latter horrible habit is even
> enshrined in strtok, strsep and friends.

I did do some research into current uses of strnstr() and strnstrn().
strnstr() seemed to be the more popular of the two.  Yes, strnstrn()
would be more general, but I really can't see a lot of uses for it.

> I don't care much about the name but do care about generality
> (IMHO the names str{,n}chr and str{,n}str stink -- naming a
> function after its argument types is pretty strange!  But
> that is a separate discussion:).
> 
> > > {
> > > 	while (slen >= plen) {
> > > 		if (strncmp(s, p, plen) == 0)
> > > 			return (char*)s;
> > > 		s++, slen--;
> > > 	}
> > 
> > It seems to me, it would be a pessimization to call strncmp(3) when
> > you don't even have one character that matches.
> 
> Good point!  Okay, how about the following?  It should be as
> efficient as your version.
> 
> 	while (slen >= plen) {
> 		if (*s == *p && strncmp(s+1, p+1, plen-1) == 0)
> 			return (char*)s;
> 		s++; slen--;
> 	}

You missed the case where "If little is an empty string, big is returned".
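
One way to cover that case in the loop quoted above is sketched here.
This is an illustration only, not the code under review.  The name
sketch_strnstrn is hypothetical, and memcmp(3) is swapped in for
strncmp(3) on the assumption that neither buffer is NUL terminated.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: find the first occurrence of the plen bytes at p
 * within the slen bytes at s.  Neither buffer has to be NUL terminated.
 */
char *
sketch_strnstrn(const char *s, size_t slen, const char *p, size_t plen)
{
	/* An empty pattern matches at the start, as strstr(3) documents. */
	if (plen == 0)
		return ((char *)s);

	while (slen >= plen) {
		/* Check the first byte before paying for a full compare. */
		if (*s == *p && memcmp(s + 1, p + 1, plen - 1) == 0)
			return ((char *)s);
		s++;
		slen--;
	}
	return (NULL);
}
```

Because memcmp(3) does not stop at a NUL, this sketch treats an embedded
NUL in either buffer as ordinary data, which differs from the
strncmp(3)-based loop.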

> > > 	return 0;
> > 
> > Do you mean: return (NULL) ?
> 
> I mildly prefer 0 and removal of unnecessary parens but that
> is just a style issue.  Not important.

Actually style is quite important, see style(9).

> > Yes, I recall seeing these algorithms discussed recently and I believe
> > it was concluded that making strstr(3) use one of these more advanced
> > algorithms would be a pessimization for most cases.  That said, don't
> > let me hold you back from proposing alternative or complementary
> > functions to strnstr(3).
> 
> No proof was proffered either way.  I happen to believe it is
> a win even when you are searching sub 100 byte strings but
> I'll shut up until I can show that (or Andrew L. Neporada
> does that!).  *If* it is a win, IMHO it is better to have one
> interface (strnstrn or whatever) that selects the appropriate
> algorithm for a number of reasons:
> 
> a) most people simply want to use a function that meets their
>    needs without doing any algorithmic analysis.  They benefit
>    automatically.
> 
> b) if tomorrow you come up with a faster algorithm, a new
>    strnstrn implementation that can take advantage of that
>    will benefit existing programs as well (if they use shared
>    libs).
> 
> This is a philosophical argument about library design (not
> just strnstrn) which is why I put freebsd-hackers back in
> bcc:.  To me it makes sense to provide algorithm specific
> functions *and* a generic function that selects the best one
> based on inputs.  Use the specific version when you know
> exactly what you are doing and want better control over the
> behavior of your program; use the generic version when you
> want something `fast' but don't care beyond that.  Sort of like
> providing VM for the masses -- not everyone needs or wants to
> do their own memory management!

I don't see how strnstr(3) would impede future optimizations; you
can just get the length by using strlen(3).
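
That layering can be sketched as follows.  Both names are hypothetical
stand-ins, chosen to avoid clashing with any libc strnstr, and the
length-based search is a naive placeholder.  The sketch assumes the
strnstr(3) semantics under review: look at no more than len characters
of big, and never past a NUL in big.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-based search; a faster algorithm could replace it. */
static char *
sketch_search(const char *s, size_t slen, const char *p, size_t plen)
{
	if (plen == 0)
		return ((char *)s);
	for (; slen >= plen; s++, slen--)
		if (*s == *p && memcmp(s, p, plen) == 0)
			return ((char *)s);
	return (NULL);
}

/* Hypothetical strnstr-style wrapper: strlen(3) supplies the missing size. */
char *
sketch_strnstr(const char *big, const char *little, size_t len)
{
	/* Clamp len to the first NUL in big so the search stays in the string. */
	const char *end = memchr(big, '\0', len);
	size_t biglen = (end != NULL) ? (size_t)(end - big) : len;

	return (sketch_search(big, biglen, little, strlen(little)));
}
```

Callers of the wrapper would pick up any later improvement to the
length-based search without changing their own code.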

Best regards,
Mike Barcroft

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-audit" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20011010200947.F49828>
