From: "Rob"
To: "questions at FreeBSD"
Subject: Re: sed(1) regular expression gurus
Date: Tue, 15 Jul 2003 00:49:15 +0930

OK, here's a solution using awk. It may be possible in sed, but awk has
more control statements for this kind of thing:

    awk --posix -F'[^0-9A-Za-z.]+' '
    $1 ~ /by/ {
        result = $2
        for (i=3; i<=NF; i++) {
            if ($i ~ /^([0-9]+\.){3}[0-9]+$/) {
                result = result " " $i
            }
        }
        print result
    }'

* Use the field separator to throw away anything that isn't a number,
  letter or period - don't have to worry about brackets anymore
* Match lines starting with 'by' and save the second word (which should
  be a hostname)
* Check the following words - if they match an IP address, they're
  saved too
* Then print the result!
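
If your awk has no interval expressions at all, the same idea can be
sketched without '{3}' by splitting each candidate word on '.' and
checking that it has exactly four all-digit parts. This is only a
sketch under assumptions: extract_hosts and is_ip are hypothetical
names made up for this example, and the test for 'by' is written as an
exact comparison, which is slightly stricter than the match used in
the script above. The input is the five test lines from the original
message.

```shell
#!/bin/sh
# extract_hosts: hypothetical wrapper name, not from the original script.
# Reads Received-style lines on stdin, prints hostname plus any IP addresses.
extract_hosts() {
    awk -F'[^0-9A-Za-z.]+' '
    # is_ip: hypothetical helper; true when word is four dot-separated
    # runs of digits. parts, n and k are local variables.
    function is_ip(word,    parts, n, k) {
        n = split(word, parts, ".")
        if (n != 4) return 0
        for (k = 1; k <= 4; k++)
            if (parts[k] !~ /^[0-9]+$/) return 0
        return 1
    }
    # Exact comparison: stricter than the regex match on the first field.
    $1 == "by" {
        result = $2
        for (i = 3; i <= NF; i++)
            if (is_ip($i)) result = result " " $i
        print result
    }'
}

# The five test lines quoted from the original message.
extract_hosts <<'EOF'
by some.host.at.a.com (Postfix) with ESMTP id 3A4E07B03
by some.host.at.a.com (8.11.6) ESMTP;
by some.host.at.a.different.com (8.11.6p2/8.11.6) ESMTP;
by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03
by some.host.at.yet.another.com (123.4.56.789) id 3A4E07B03
EOF
```

With those five lines, the first three should print only the hostname
and the last two should print the hostname followed by 123.4.56.789.
Like the version above, this only checks the shape of the address
(four groups of digits), not whether each group is in range.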

There may be 'neater' ways of doing it, but it's the most concise
example I could come up with. You need to include the --posix option to
get the '{3}' interval notation to work (a requirement peculiar to GNU
awk).

----- Original Message -----
From: "D J Hawkey Jr"
Subject: Attn: sed(1) regular expression gurus

> Hi all.
>
> I'm getting really frustrated by a seemingly simple problem. I'm doing
> this under FreeBSD 4.5.
>
> Given these portions of an e-mail's multi-line Received header as tests:
>
>   by some.host.at.a.com (Postfix) with ESMTP id 3A4E07B03
>   by some.host.at.a.com (8.11.6) ESMTP;
>   by some.host.at.a.different.com (8.11.6p2/8.11.6) ESMTP;
>   by some.host.at.another.com ([123.4.56.789]) id 3A4E07B03
>   by some.host.at.yet.another.com (123.4.56.789) id 3A4E07B03
>
> I want to isolate the addresses (one for the 1st through 3rd, two for
> the 4th and 5th). Here's the sed(1) command I'm playing with:
>
>   echo "by nospam.mc.mpls.visi.com (Postfix) with ESMTP id 3A4E07B03" \
>    |sed -E \
>     -e "s/by[[:space:]]+//" \
>     -e "s/(\((\[?([0-9]{1,3}\.){3}[0-9]{1,3}\]?){0}\)|id|with|E?SMTP).*//"
>
> In all cases, the parenthetical word is returned, when only the last
> two should return the parenthetical word. The idea behind the first
> branch of the second sed(1) command is to match anything that isn't a
> "digits.digits.digits.digits" pattern. I've tried simpler expressions
> like "\(\[?[^0-9.]+\]?\)", but it fails on the third example.
>
> What the devil am I doing wrong?? Am I exercising known bugs in GNU's
> sed(1)? Can anyone dream up a different solution - please, no Perl, but
> awk(1) is fine.
>
> Thanks,
> Dave
>
> --
>   ______________________                         ______________________
>   \__________________   \   D. J. HAWKEY JR.   /   __________________/
>      \________________/\    hawkeyd@visi.com   /\________________/
>                        http://www.visi.com/~hawkeyd/