From owner-freebsd-audit  Wed Oct 10 10:25:54 2001
Delivered-To: freebsd-audit@freebsd.org
Received: from valiant.cnchost.com (valiant.concentric.net [207.155.252.9])
	by hub.freebsd.org (Postfix) with ESMTP
	id 4707237B408; Wed, 10 Oct 2001 10:25:42 -0700 (PDT)
Received: from bitblocks.com (adsl-209-204-185-216.sonic.net [209.204.185.216])
	by valiant.cnchost.com
	id NAA06941; Wed, 10 Oct 2001 13:25:37 -0400 (EDT)
	[ConcentricHost SMTP Relay 1.14]
Message-ID: <200110101725.NAA06941@valiant.cnchost.com>
To: Mike Barcroft <mike@FreeBSD.org>
Cc: audit@FreeBSD.org
Subject: Re: strnstr(3) - New libc function for review 
In-reply-to: Your message of "Tue, 09 Oct 2001 22:12:20 EDT."
             <20011009221220.C49828@coffee.q9media.com> 
Date: Wed, 10 Oct 2001 10:25:25 -0700
From: Bakul Shah <bakul@bitblocks.com>
Sender: owner-freebsd-audit@FreeBSD.ORG

[synopsis:
 I am arguing for making strnstrn a part of libc, instead of,
 or in addition to, strnstr.  strnstrn is defined as
     char * strnstrn(const char* s, size_t slen, const char* p, size_t plen)
]

> This is probably not needed for most uses for this.  It would be rare
> to have two non-NUL terminated strings.  On the other hand, this could 
> be implemented in the future, if it's seen as useful.  strnstr(3)
> could easily be modified to call a strnstrn() and use strlen(3) to get
> the missing size field.

Well, as you point out, strnstr (as per your definition) is
not general enough, but strnstrn is.  The latter accepts
non-NUL-terminated strings and can be used to build strnstr:

    strnstr(a,b,l) == strnstrn(a,l, b, strlen(b))
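
A minimal sketch of that identity in C.  Only the strnstrn
signature comes from the synopsis; its body here is a naive
stand-in, and my_strnstr is a hypothetical name chosen to avoid
clashing with any libc strnstr.  One assumption beyond the
one-line identity: if a may end at a NUL before l bytes, the
haystack length has to be clamped first.

```c
#include <stddef.h>
#include <string.h>

/* Naive stand-in: find p[0..plen) inside s[0..slen). */
static char *
strnstrn(const char *s, size_t slen, const char *p, size_t plen)
{
	if (plen == 0)
		return (char *)s;
	for (; slen >= plen; s++, slen--)
		if (memcmp(s, p, plen) == 0)
			return (char *)s;
	return NULL;
}

/*
 * strnstr built on strnstrn.  Assumption: a may be NUL-terminated
 * before l bytes, so the search length is clamped with memchr.
 */
static char *
my_strnstr(const char *a, const char *b, size_t l)
{
	const char *end = memchr(a, '\0', l);
	size_t alen = (end != NULL) ? (size_t)(end - a) : l;

	return strnstrn(a, alen, b, strlen(b));
}
```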

Actually it is not so rare to have two non-NUL-terminated
strings, for example when you are comparing a substring (part
of a bigger string) against another string.  strnstr will
force NUL termination, and you either have to allocate a new
string or, worse (and the more frequent case), write a NUL
into a string while remembering to save and restore the
overwritten char.  This latter horrible habit is even
enshrined in strtok, strsep and friends.

I don't care much about the name but do care about generality
(IMHO the names str{,n}chr and str{,n}str stink -- naming a
function after its argument types is pretty strange!  But
that is a separate discussion:).

> > {
> > 	while (slen >= plen) {
> > 		if (strncmp(s, p, plen) == 0)
> > 			return (char*)s;
> > 		s++, slen--;
> > 	}
> 
> It seems to me, it would be a pessimization to call strncmp(3) when
> you don't even have one character that matches.

Good point!  Okay, how about the following?  It should be as
efficient as your version.

	while (slen >= plen) {
		if (*s == *p && strncmp(s+1, p+1, plen-1) == 0)
			return (char*)s;
		s++; slen--;
	}

> > 	return 0;
> 
> Do you mean: return (NULL) ?

I mildly prefer 0 and removal of unnecessary parens but that
is just a style issue.  Not important.
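
Putting the quoted pieces together, a complete function might
look like the sketch below.  Two things in it are assumptions
rather than anything settled in this thread: an empty pattern
matches at s (as strstr(3) does), and memcmp stands in for
strncmp so that an embedded NUL is compared like any other
byte.

```c
#include <stddef.h>
#include <string.h>

char *
strnstrn(const char *s, size_t slen, const char *p, size_t plen)
{
	/* Assumption: an empty pattern matches at the start. */
	if (plen == 0)
		return (char *)s;

	while (slen >= plen) {
		/* Compare the tails only once the first chars agree. */
		if (*s == *p && memcmp(s + 1, p + 1, plen - 1) == 0)
			return (char *)s;
		s++;
		slen--;
	}
	return NULL;
}
```

Neither argument needs a terminating NUL, so the pattern can
itself be a slice of a larger buffer, e.g.
strnstrn(buf, buflen, buf + off, n).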

> Yes, I recall seeing these algorithms discussed recently and I believe
> it was concluded that making strstr(3) use one of these more advanced
> algorithms would be a pessimization for most cases.  That said, don't
> let me hold you back from proposing alternative or complementary
> functions to strnstr(3).

No proof was proffered either way.  I happen to believe it is
a win even when you are searching sub-100-byte strings, but
I'll shut up until I can show that (or Andrew L. Neporada
does!).  *If* it is a win, IMHO it is better to have one
interface (strnstrn or whatever) that selects the appropriate
algorithm for a number of reasons:

a) most people simply want to use a function that meets their
   needs without doing any algorithmic analysis.  They benefit
   automatically.

b) if tomorrow you come up with a faster algorithm, a new
   strnstrn implementation that can take advantage of that
   will benefit existing programs as well (if they use shared
   libs).

This is a philosophical argument about library design (not
just strnstrn), which is why I put freebsd-hackers back in
bcc:.  To me it makes sense to provide algorithm-specific
functions *and* a generic function that selects the best one
based on its inputs.  Use the specific version when you know
exactly what you are doing and want better control over the
behavior of your program; use the generic version when you
want something `fast' but don't care beyond that.  Sort of like
providing VM for the masses -- not everyone needs or wants to
do their own memory management!
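
A rough sketch of that layering, to make the idea concrete.
Everything except the strnstrn signature is hypothetical: the
names strnstrn_naive and strnstrn_horspool, the choice of
Boyer-Moore-Horspool as the "advanced" algorithm, and above all
the cutoff HORSPOOL_MIN_PLEN, which is a placeholder and not a
measured value.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Algorithm-specific: plain scan with a first-character check. */
char *
strnstrn_naive(const char *s, size_t slen, const char *p, size_t plen)
{
	if (plen == 0)
		return (char *)s;
	for (; slen >= plen; s++, slen--)
		if (*s == *p && memcmp(s + 1, p + 1, plen - 1) == 0)
			return (char *)s;
	return NULL;
}

/* Algorithm-specific: Boyer-Moore-Horspool. */
char *
strnstrn_horspool(const char *s, size_t slen, const char *p, size_t plen)
{
	size_t skip[UCHAR_MAX + 1];
	size_t i;

	if (plen == 0)
		return (char *)s;
	/* A byte absent from the pattern lets us skip a full window. */
	for (i = 0; i <= UCHAR_MAX; i++)
		skip[i] = plen;
	/* Otherwise skip by distance from its last occurrence to the end. */
	for (i = 0; i + 1 < plen; i++)
		skip[(unsigned char)p[i]] = plen - 1 - i;
	while (slen >= plen) {
		if (memcmp(s, p, plen) == 0)
			return (char *)s;
		/* Shift according to the last byte of the current window. */
		i = skip[(unsigned char)s[plen - 1]];
		s += i;
		slen -= i;
	}
	return NULL;
}

/* Placeholder cutoff; a real one would have to come from benchmarks. */
#define HORSPOOL_MIN_PLEN 8

/* Generic entry point: picks an algorithm from the inputs. */
char *
strnstrn(const char *s, size_t slen, const char *p, size_t plen)
{
	if (plen < HORSPOOL_MIN_PLEN)
		return strnstrn_naive(s, slen, p, plen);
	return strnstrn_horspool(s, slen, p, plen);
}
```

Callers who have measured their workload can call one of the
specific functions directly; everyone else calls strnstrn and
picks up any later change to the selection logic for free.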

-- bakul
