Date: Tue, 9 Oct 2001 22:12:20 -0400
From: Mike Barcroft
To: Bakul Shah
Cc: audit@FreeBSD.org
Subject: Re: strnstr(3) - New libc function for review
Message-ID: <20011009221220.C49828@coffee.q9media.com>
References: <20011004215706.B34530@coffee.q9media.com> <200110100127.VAA22073@rodney.cnchost.com>
In-Reply-To: <200110100127.VAA22073@rodney.cnchost.com>; from bakul@bitblocks.com on Tue, Oct 09, 2001 at 06:27:38PM -0700
Organization: The FreeBSD Project

[-hackers moved back to BCC.  I had intended follow-ups to go to
-audit, but there might be some interested hackers reading now.
Follow-ups to this message go to -audit, thanks!  :) ]

Bakul Shah writes:
> > I would appreciate comments/reviews of the following new addition to
> > libc.  It is largely based off the current strstr(3) implementation.
>
> Sorry for not getting to this sooner.
>
> > /*
> >  * Find the first occurrence of find in s, where the search is limited to the
> >  * first slen characters of s.
> >  */
> > char *
> > strnstr(s, find, slen)
> > 	const char *s;
> > 	const char *find;
> > 	size_t slen;
> > {
> > 	char c, sc;
> > 	size_t len;
> >
> > 	if ((c = *find++) != '\0') {
> > 		len = strlen(find);
> > 		do {
> > 			do {
> > 				if ((sc = *s++) == '\0' || slen-- < 1)
> > 					return (NULL);
> > 			} while (sc != c);
> > 			if (len > slen)
> > 				return (NULL);
> > 		} while (strncmp(s, find, len) != 0);
> > 		s--;
> > 	}
> > 	return ((char *)s);
> > }
>
> Why not pass the length of the pattern as well?  Regardless,

This is probably not needed for most uses of this function.  It would
be rare to have two non-NUL terminated strings.  On the other hand,
this could be implemented in the future, if it's seen as useful.
strnstr(3) could easily be modified to call a strnstrn() and use
strlen(3) to get the missing size field.

> why not use simpler code that is easier to prove right?
>
> char*
> strnstr(const char *s, size_t slen, const char *p, size_t plen)

This prototype is inconsistent with any strn...(3) functions that I'm
aware of.

> {
> 	while (slen >= plen) {
> 		if (strncmp(s, p, plen) == 0)
> 			return (char*)s;
> 		s++, slen--;
> 	}

It seems to me that it would be a pessimization to call strncmp(3)
when you don't even have one character that matches.

> 	return 0;
> }

Do you mean: return (NULL)?

> Another reason for passing in both string lengths is to allow
> switching to a more efficient algorithm.  The above algorithm
> runs in slen*plen time.  Other more efficient algorithms have
> a startup cost that can be hidden for a fairly moderate value
> of slen*plen.  So you'd insert something like
>
> 	if (worth_it_to_run_KMP_algo(slen, plen))
> 		return kmp_strnstr(s, slen, p, plen);
>
> right above the while loop.  This makes such functions
> useful for much larger strings (e.g. when you have
> mmapped in the whole file).

Yes, I recall seeing these algorithms discussed recently and I believe
it was concluded that making strstr(3) use one of these more advanced
algorithms would be a pessimization for most cases.
That said, don't let me hold you back from proposing alternative or
complementary functions to strnstr(3).

Best regards,
Mike Barcroft
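
For readers who want to try the two approaches discussed in this thread, here is a minimal sketch of both, not the code that was proposed or committed. The names strnstr_sketch() and strnstrn_sketch() are hypothetical and chosen to avoid clashing with any libc symbol. The first function follows the quoted strnstr() but uses an ANSI prototype and, as an assumption of this sketch, tests slen before reading *s so that it never looks past the first slen characters. The second follows the two-length proposal, with two assumed changes: it checks the first character before comparing the rest (the objection raised above), and it uses memcmp(3) on the assumption that both buffers really are slen and plen bytes long.

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch: find the first occurrence of the NUL-terminated string find
 * within the first slen characters of s.  Hypothetical name.
 */
char *
strnstr_sketch(const char *s, const char *find, size_t slen)
{
	char c, sc;
	size_t len;

	if ((c = *find++) != '\0') {
		len = strlen(find);
		do {
			/* Scan for the first character of find. */
			do {
				/* Assumption: test slen before touching *s. */
				if (slen-- < 1 || (sc = *s++) == '\0')
					return (NULL);
			} while (sc != c);
			/* Not enough room left for the rest of find. */
			if (len > slen)
				return (NULL);
		} while (strncmp(s, find, len) != 0);
		s--;
	}
	return ((char *)s);
}

/*
 * Sketch: the two-length variant.  Both s and p are treated as raw
 * buffers of slen and plen bytes; neither needs a terminating NUL.
 * Hypothetical name; runs in O(slen * plen) time in the worst case.
 */
char *
strnstrn_sketch(const char *s, size_t slen, const char *p, size_t plen)
{
	if (plen == 0)
		return ((char *)s);
	while (slen >= plen) {
		/* Compare one character first; only then compare the rest. */
		if (*s == *p && memcmp(s + 1, p + 1, plen - 1) == 0)
			return ((char *)s);
		s++, slen--;
	}
	return (NULL);
}
```

With s pointing at "hello world", both sketches find "world" when slen is 11 and return NULL when slen is 10, since the match would then extend past the limit.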