Date: Thu, 11 May 2000 13:56:10 -0600
From: Charles Randall <crandall@matchlogic.com>
To: Mitch Collinsworth <mkc@Graphics.Cornell.EDU>, Dan Larsson <dl@tyfon.net>
Cc: questions@FreeBSD.ORG
Subject: RE: regexp driving me nuts, help needed!
Message-ID: <5FE9B713CCCDD311A03400508B8B3013B256B8@bdr-xcln.is.matchlogic.com>
That seems like a lot of work. Try this:

% echo http://www.domain.com/www.blah/html.asp | perl -ne 'print $1,"\n" if m|http://www\.([^/]+)|i'
domain.com

This will work with a big list of URLs on stdin.

Charles

-----Original Message-----
From: Mitch Collinsworth [mailto:mkc@Graphics.Cornell.EDU]
Sent: Thursday, May 11, 2000 11:27 AM
To: Dan Larsson
Cc: questions@FreeBSD.ORG
Subject: Re: regexp driving me nuts, help needed!

>I need to get the domain and tld from an url.
>
>this is my idea of what would match and return 'domain.com':
>echo http://www.domain.com/html.asp | sed -e 's/\([\.a-zA-Z0-9]+[a-zA-Z]{2,3}\
>)/\1 /g'
>
>But that's not what sh thinks ( it returns the whole url )
>What regexp should I use to get the desired result?

Here's a perl 1-liner:

echo http://www.domain.com/html.asp |\
perl -e '$u=<>; $u=~s/http:\/\///; $u=~s/^www\.//i; $u=~s/\/.*$//; print $u'
domain.com

This works in stages, so it doesn't depend on the starting string
always containing all syntactical elements.

-Mitch

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-questions" in the body of the message
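For anyone who wants to stay with sed, as the original question did, the staged approach can be sketched without perl. This is only a sketch under some assumptions: the function name strip_url is made up for illustration, the input is one URL per line on stdin, and "domain and tld" is taken to mean the host name with a leading "www." removed, which is also what both perl versions return. It uses only POSIX basic regular expressions. In that syntax `+` is an ordinary character and an interval has to be written `\{2,3\}`, which is probably why the expression in the question never matched and sed printed the whole URL unchanged.

```shell
#!/bin/sh
# strip_url: hypothetical helper name, not from the thread.
# Reads URLs on stdin (one per line) and prints the host part
# with any leading "www." removed. Each stage is optional, so a
# line without a scheme or without a path still comes out right.
strip_url() {
    sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' \
        -e 's|^[Ww][Ww][Ww]\.||' \
        -e 's|/.*$||'
    # stage 1: drop a leading scheme such as "http://"
    # stage 2: drop a leading "www." in any letter case
    # stage 3: drop everything from the first "/" onward
}

echo http://www.domain.com/html.asp | strip_url
# prints: domain.com
```

Using `|` as the substitution delimiter avoids escaping every slash, and like the perl -ne version this handles a long list of URLs piped in on stdin. It does not try to reduce deeper host names (for example a.b.domain.com) to the registered domain.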