From owner-freebsd-hackers Wed Jul 7 12:31:43 1999
Delivered-To: freebsd-hackers@freebsd.org
Received: from flood.ping.uio.no (flood.ping.uio.no [129.240.78.31])
	by hub.freebsd.org (Postfix) with ESMTP id C9E451502A
	for ; Wed, 7 Jul 1999 12:31:40 -0700 (PDT)
	(envelope-from des@flood.ping.uio.no)
Received: (from des@localhost)
	by flood.ping.uio.no (8.9.3/8.9.1) id VAA91346;
	Wed, 7 Jul 1999 21:31:11 +0200 (CEST)
	(envelope-from des)
To: Jamie Howard
Cc: Archie Cobbs, freebsd-hackers@FreeBSD.ORG, tech-userlevel@netbsd.org, tech@openbsd.org
Subject: Re: Repalcement for grep(1)
References:
From: Dag-Erling Smorgrav
Date: 07 Jul 1999 21:31:10 +0200
In-Reply-To: Jamie Howard's message of "Sun, 4 Jul 1999 21:32:22 -0400 (EDT)"
Message-ID:
Lines: 41
X-Mailer: Gnus v5.5/Emacs 19.34
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Jamie Howard writes:
> On Sun, 4 Jul 1999, Archie Cobbs wrote:
> > There are two special cases of bracket expressions: the
> > bracket expressions `[[:<:]]' and `[[:>:]]' match the null
> > string at the beginning and end of a word respectively.
> > Perhaps this will help with -w?
> Yes, I received a patch from Simon Burge which implements this. It also
> beats using [^A-Za-z] and [A-Za-z$] as I was and GNU grep does.

No, because there are scripts out there (e.g. ports/Mk/bsd.port.mk)
which rely on this behaviour. I suggest you explore the magic of the
nmatch and pmatch arguments to regexec() :)

Specifically, the pattern matched a word if:

    ((pmatch[0].rm_so == 0 || !isalpha(line[pmatch[0].rm_so-1])) &&
     (pmatch[0].rm_eo == len || !isalpha(line[pmatch[0].rm_eo])))

This is off the top of my head, from reading the man page: you'll have
to try it out to see if it works.
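
For illustration only, the check above might be wrapped up roughly as
follows. This is a sketch, not tested against any real grep: the helper
name first_match_is_word() is made up here, the regex is assumed to have
been compiled without REG_NOSUB, and line is assumed to be NUL-terminated
with the newline already stripped. The casts to unsigned char are an
addition, since isalpha() is undefined for negative char values.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if the first match of re in line is
 * bounded on both sides by a non-alphabetic character or by the start
 * or end of the line.  Only pmatch[0] is examined.
 */
static bool
first_match_is_word(const regex_t *re, const char *line)
{
	regmatch_t pmatch[1];
	size_t len, so, eo;

	assert(re != NULL && line != NULL);
	len = strlen(line);		/* length without the newline */

	/* nmatch == 1: we only want the offsets of the whole match. */
	if (regexec(re, line, 1, pmatch, 0) != 0)
		return false;		/* no match at all */

	so = (size_t)pmatch[0].rm_so;	/* offset of first matched char */
	eo = (size_t)pmatch[0].rm_eo;	/* offset just past the match */

	return (so == 0 || !isalpha((unsigned char)line[so - 1])) &&
	    (eo == len || !isalpha((unsigned char)line[eo]));
}
```

Note that this looks at the first match only. If that match fails the
test, a later match on the same line could still qualify, so a complete
-w would probably have to retry further along the line.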
You might want to replace isalpha with something less restrictive,
such as isalnum(), or:

    #define isword(x) (isalnum(x) || (x) == '_')

(judging from empirical observation, the latter corresponds to what
GNU grep does)

As for full-line matches (-x), simply check that

    (pmatch[0].rm_so == 0 && pmatch[0].rm_eo == len)

This should save you from playing games with back-references.

(both code snippets assume that line points to a line of text from the
input and that len is the length of that line minus the newline)

DES
-- 
Dag-Erling Smorgrav - des@flood.ping.uio.no


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message
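
A rough sketch of the -x check and the isword() macro described in the
message, under the same assumptions (line is NUL-terminated with the
newline stripped, the regex was compiled without REG_NOSUB). The helper
name first_match_is_full_line() is invented for this example, and the
cast inside isword() is an addition to keep negative char values away
from isalnum().

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stdbool.h>
#include <string.h>

/* Word-constituent test: alphanumerics plus underscore. */
#define isword(x) (isalnum((unsigned char)(x)) || (x) == '_')

/*
 * Hypothetical helper for -x: return true if the first match of re
 * covers the whole of line, from offset 0 to its length.
 */
static bool
first_match_is_full_line(const regex_t *re, const char *line)
{
	regmatch_t pmatch[1];
	size_t len;

	assert(re != NULL && line != NULL);
	len = strlen(line);		/* length without the newline */

	if (regexec(re, line, 1, pmatch, 0) != 0)
		return false;		/* no match at all */

	return pmatch[0].rm_so == 0 && (size_t)pmatch[0].rm_eo == len;
}
```

Swapping isword() in for isalpha() in the word-boundary test should then
treat digits and underscores as part of a word.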