From: Bakul Shah
To: Mike Barcroft
Cc: audit@FreeBSD.org
Subject: Re: strnstr(3) - New libc function for review
In-reply-to: Your message of "Tue, 09 Oct 2001 22:12:20 EDT." <20011009221220.C49828@coffee.q9media.com>
Date: Wed, 10 Oct 2001 10:25:25 -0700
Message-ID: <200110101725.NAA06941@valiant.cnchost.com>

[synopsis: I am arguing to make strnstrn a part of libc, instead of, or
in addition to, strnstr. strnstrn is defined as

	char *strnstrn(const char *s, size_t slen, const char *p, size_t plen)
]

> This is probably not needed for most uses for this.  It would be rare
> to have two non-NUL terminated strings.  On the other hand, this could
> be implemented in the future, if it's seen as useful.  strnstr(3)
> could easily be modified to call a strnstrn() and use strlen(3) to get
> the missing size field.

Well, as you point out, strnstr (as per your definition) is not general
enough but strnstrn is.  The latter will accept non-nul-terminated
strings and can be used to build strnstr:

	strnstr(a, b, l) == strnstrn(a, l, b, strlen(b))

Actually it is not so rare to have two non-nul-terminated strings, for
example when you are comparing a substring (part of a bigger string)
against another string.
strnstr will force nul-termination, and you either have to allocate a
new string or, worse (and the more frequent case), write a '\0' into a
string while remembering to save and restore the overwritten char.
This latter horrible habit is even enshrined in strtok, strsep and
friends.

I don't care much about the name but do care about generality (IMHO
the names str{,n}chr and str{,n}str stink -- naming a function after
its argument types is pretty strange!  But that is a separate
discussion :).

> > {
> > 	while (slen >= plen) {
> > 		if (strncmp(s, p, plen) == 0)
> > 			return (char*)s;
> > 		s++, slen--;
> > 	}
>
> It seems to me, it would be a pessimization to call strncmp(3) when
> you don't even have one character that matches.

Good point!  Okay, how about the following?  It should be as efficient
as your version.

	while (slen >= plen) {
		if (*s == *p && strncmp(s+1, p+1, plen-1) == 0)
			return (char*)s;
		s++; slen--;
	}

> > 	return 0;
>
> Do you mean: return (NULL) ?

I mildly prefer 0 and removal of unnecessary parens but that is just a
style issue.  Not important.

> Yes, I recall seeing these algorithms discussed recently and I believe
> it was concluded that making strstr(3) use one of these more advanced
> algorithms would be a pessimization for most cases.  That said, don't
> let me hold you back from proposing alternative or complimentary
> functions to strnstr(3).

No proof was proffered either way.  I happen to believe it is a win
even when you are searching sub-100-byte strings, but I'll shut up
until I can show that (or Andrew L. Neporada does that!).

*If* it is a win, IMHO it is better to have one interface (strnstrn or
whatever) that selects the appropriate algorithm, for a number of
reasons:

a) most people simply want to use a function that meets their needs
   without doing any algorithmic analysis.  They benefit automatically.
b) if tomorrow you come up with a faster algorithm, a new strnstrn
   implementation that can take advantage of that will benefit
   existing programs as well (if they use shared libs).

This is a philosophical argument about library design (not just
strnstrn), which is why I put freebsd-hackers back in bcc:.

To me it makes sense to provide algorithm-specific functions *and* a
generic function that selects the best one based on inputs.  Use the
specific version when you know exactly what you are doing and want
better control over the behavior of your program; use the generic
version when you want something `fast' but don't care beyond that.
Sort of like providing VM for the masses -- not everyone needs or
wants to do their own memory management!

-- bakul
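
For reference, the strnstrn loop discussed in this message can be
sketched as a complete function.  This is only a sketch under
assumptions not stated in the thread: the sk_ names are hypothetical
(chosen to avoid clashing with any libc symbol), memcmp is swapped in
for strncmp so that bytes after an embedded NUL are still compared, and
an empty pattern is assumed to match at the start of s, as strstr(3)
does.  sk_strnstr follows the identity
strnstr(a, b, l) == strnstrn(a, l, b, strlen(b)) given earlier, so it
assumes the first l bytes of a are readable.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of the proposed strnstrn: find the first
 * occurrence of the plen-byte pattern p in the slen-byte buffer s.
 * Neither argument needs to be NUL-terminated.
 */
char *
sk_strnstrn(const char *s, size_t slen, const char *p, size_t plen)
{
	/* Assumption: an empty pattern matches at the start of s. */
	if (plen == 0)
		return (char *)s;

	while (slen >= plen) {
		/* Test the first byte before paying for a full compare. */
		if (*s == *p && memcmp(s + 1, p + 1, plen - 1) == 0)
			return (char *)s;
		s++;
		slen--;
	}
	return NULL;
}

/*
 * strnstr built on top of strnstrn, per the identity in the message.
 * Assumption: the caller guarantees that a has at least l readable
 * bytes; b must be NUL-terminated because strlen(3) is applied to it.
 */
char *
sk_strnstr(const char *a, const char *b, size_t l)
{
	return sk_strnstrn(a, l, b, strlen(b));
}
```

For example, sk_strnstrn("hello world", 11, "o w", 3) returns a pointer
to "o world", while limiting slen to 5 makes a search for "world"
return NULL.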