Date: Tue, 15 Jul 2003 00:49:15 +0930
From: "Rob" <listone@deathbeforedecaf.net>
To: <hawkeyd@visi.com>, "questions at FreeBSD" <freebsd-questions@freebsd.org>
Subject: Re: sed(1) regular expression gurus
Message-ID: <00b001c34a1b$4ad17800$a4b826cb@goo>
References: <20030714140816.GA27439@sheol.localdomain>

OK, here's a solution using awk. It may be possible in sed, but awk has more
control statements for this kind of thing:

    awk --posix -F'[^0-9A-Za-z.]+' '
    $1 ~ /by/ {
        result = $2                      # the hostname following "by"
        for (i = 3; i <= NF; i++) {
            # keep any later field that looks like a dotted quad
            if ($i ~ /^([0-9]+\.){3}[0-9]+$/) {
                result = result " " $i
            }
        }
        print result
    }'

* Use the field separator to throw away anything that isn't a number,
  letter or period, so you don't have to worry about brackets anymore
* Match lines starting with 'by' and save the second word (which should
  be a hostname)
* Check the following words: if they match an IP address, they're saved
  too
* Then print the result!

There may be 'neater' ways of doing it, but it's the most concise example
I could come up with. You need to include the --posix option to get the
'{3}' notation to work (peculiar to GNU awk).

----- Original Message -----
From: "D J Hawkey Jr" <hawkeyd@visi.com>
Subject: Attn: sed(1) regular expression gurus

> Hi all.
>
> I'm getting really frustrated by a seemingly simple problem. I'm doing
> this under FreeBSD 4.5.
>
> Given these portions of an e-mail's multi-line Received header as tests:
>
>   by some.host.at.a.com (Postfix) with ESMTP id 3A4E07B03
>   by some.host.at.a.com (8.11.6) ESMTP;
>   by some.host.at.a.different.com (8.11.6p2/8.11.6) ESMTP;
>   by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03
>   by some.host.at.yet.another.com (123.4.56.789) id 3A4E07B03
>
> I want to isolate the addresses (one for the 1st through 3rd, two for
> the 4th and 5th). Here's the sed(1) command I'm playing with:
>
>   echo "by nospam.mc.mpls.visi.com (Postfix) with ESMTP id 3A4E07B03" \
>     |sed -E \
>       -e "s/by[[:space:]]+//" \
>       -e "s/(\((\[?([0-9]{1,3}\.){3}[0-9]{1,3}\]?){0}\)|id|with|E?SMTP).*//"
>
> In all cases, the parenthetical word is returned, when only the last
> two should return the parenthetical word. The idea behind the first
> branch of the second sed(1) command is to match anything that isn't a
> "digits.digits.digits.digits" pattern. I've tried simpler expressions
> like "\(\[?[^0-9.]+\]?\)", but it fails on the third example.
>
> What the devil am I doing wrong?? Am I exercising known bugs in GNU's
> sed(1)? Can anyone dream up a different solution - please, no Perl, but
> awk(1) is fine.
>
> Thanks,
> Dave
>
> --
>   ______________________                         ______________________
>   \__________________   \    D. J. HAWKEY JR.   /   __________________/
>      \________________/\     hawkeyd@visi.com    /\________________/
>                       http://www.visi.com/~hawkeyd/
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"
>
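
For anyone who wants to try the awk approach from the top of this message against the five test lines in the quoted question, here is a minimal sketch. It is a variant, not the exact command above: the function name extract_addrs is made up for illustration, the first field is compared with == instead of a regex match, and the dotted-quad pattern is spelled out in full so that no '{3}' interval (and therefore no --posix option) is needed. It also assumes the lines start with "by" with no leading whitespace, since a leading separator would leave $1 empty.

```shell
# Hypothetical helper: reads Received-header fragments on stdin and prints
# the hostname after "by", followed by anything that looks like an IP address.
extract_addrs() {
    awk -F'[^0-9A-Za-z.]+' '
    $1 == "by" {
        result = $2                      # hostname following "by"
        for (i = 3; i <= NF; i++) {
            # four dot-separated groups of digits, written out in full
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) {
                result = result " " $i
            }
        }
        print result
    }'
}

printf '%s\n' \
    'by some.host.at.a.com (Postfix) with ESMTP id 3A4E07B03' \
    'by some.host.at.a.com (8.11.6) ESMTP;' \
    'by some.host.at.a.different.com (8.11.6p2/8.11.6) ESMTP;' \
    'by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03' \
    'by some.host.at.yet.another.com (123.4.56.789) id 3A4E07B03' \
    | extract_addrs
# Prints:
#   some.host.at.a.com
#   some.host.at.a.com
#   some.host.at.a.different.com
#   some.host.at.another.com 123.4.56.789
#   some.host.at.yet.another.com 123.4.56.789
```

The version strings (8.11.6, 8.11.6p2) are dropped because they do not have four all-digit groups; only the last two lines carry a second word.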