From owner-freebsd-questions@FreeBSD.ORG Wed Dec 31 20:49:22 2008
Date: Wed, 31 Dec 2008 15:20:14 -0500 (EST)
From: vogelke+software@pobox.com (Karl Vogel)
Reply-To: vogelke+software@pobox.com
To: freebsd-questions@FreeBSD.ORG
Subject: Re: well, blew it... sed or perl q again.
In-reply-to: <20081230193111.GA32641@thought.org>
	(message from Gary Kline on Tue, 30 Dec 2008 11:31:14 -0800)
Message-Id: <20081231202014.C8012BE14@kev.msw.wpafb.af.mil>
Organization: Oasis Systems Inc.
X-Disclaimer: I don't speak for the USAF or Oasis.
X-GPG-ID: 1024D/711752A0 2006-06-27 Karl Vogel
X-GPG-Fingerprint: 56EB 6DBF 4224 C953 F417 CC99 4C7C 7D46 7117 52A0

>> On Tue, 30 Dec 2008 11:31:14 -0800,
>> Gary Kline said:

G> The problem is that there are many, _many_ embedded "<A
G> HREF="http://whatever> Site" links in my hundreds, or thousands, of
G> files.  I only want to delete the "http://" lines, _not_ the other
G> Href links.

   Use perl.  You'll want the "i" option to do case-insensitive matching,
   plus "s" so a match can span multiple lines; the first quoted line
   above shows one of several places where a URL can cross a line-break.

   You might want to leave the originals completely alone.  I never trust
   programs to modify files in place:

     you% mkdir /tmp/work
     you% find . -type f -print | xargs grep -li 'http://junkfoo.com' > FILES
     you% pax -rwdv -pe /tmp/work < FILES

   Your perl script can just read FILES and overwrite the copies in the
   new directory.  You'll want to slurp each entire file into memory so
   you catch any URL that spans multiple lines.  Try the script below; it
   works for input like this:

     This <a href="http://junkfoo.com">Site</a> should go away too.
     And so should <a
     href="http://junkfoo.com/">Site</a> this
     And finally <a href=
     "http://junkfoo.com">Site</a> this

-- 
Karl Vogel                       I don't speak for the USAF or my company

The average person falls asleep in seven minutes.
                                        --item for a lull in conversation

---------------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;

my $URL = 'href=(.*?)"http://junkfoo.com/*"';
my $contents;
my $fh;
my $infile;
my $outfile;

while (<>) {
    chomp;
    $infile = $_;

    # Write the modified copy under /tmp/work, where pax put the files.
    s{^\./}{/tmp/work/};
    $outfile = $_;

    open ($fh, "< $infile") or die "$infile: $!";
    $contents = do { local $/; <$fh> };    # slurp the whole file
    close ($fh);

    $contents =~ s{     # substitute ...
        <a \s+          # ... any anchor tag
        $URL            # ... holding our URL
        .*?             # ... plus the link text ...
        </a>            # ... until we end
    } { }gixms;         # ... with a single space

    open ($fh, "> $outfile") or die "$outfile: $!";
    print $fh $contents;
    close ($fh);
}

exit(0);
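
   If you'd rather prove the pattern out before it touches real files, a
   self-contained test along these lines (same $URL pattern, with the
   sample text above hard-coded in a here-document) should print the
   three sample lines with every junkfoo.com anchor collapsed to a
   single space:

---------------------------------------------------------------------------
#!/usr/bin/perl -w
use strict;

# Same pattern the script above uses.
my $URL = 'href=(.*?)"http://junkfoo.com/*"';

# Sample input; the anchors break across lines in different places.
my $contents = <<'SAMPLE';
This <a href="http://junkfoo.com">Site</a> should go away too.
And so should <a
href="http://junkfoo.com/">Site</a> this
And finally <a href=
"http://junkfoo.com">Site</a> this
SAMPLE

# /i ignores case, /s lets "." cross newlines, /x allows the comments.
$contents =~ s{
    <a \s+      # any anchor tag
    $URL        # holding our URL
    .*?         # plus the link text
    </a>        # through the closing tag
} { }gixms;

print $contents;
---------------------------------------------------------------------------

   If the anchors come out, run the real script (saved as, say, "fixit")
   with FILES on its command line or on stdin:

     you% ./fixit FILES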