From: Mike Meyer
Date: Thu, 19 Jul 2001 21:28:37 -0500
To: Mikhail Teterin
Cc: questions@freebsd.org
Subject: Re: grep and \t (\r, etc.)
Message-ID: <15191.38741.225843.854067@guru.mired.org>
In-Reply-To: <78719817@toto.iv>

Mikhail Teterin types:
> Hi!
>
> I'm trying to clean up the HTML pages from the MSDOS-style
> EOL characters. Actually removing them is easy:
>
>     tr -d \\r < in > out
>
> does wonders, and, even better (removes spaces at EOL too):
>
>     perl -pi -e 's/[\r ]+$//g'
>
> seems to work, but to find them (I don't want to touch the "good"
> pages). I can not think of anything but grep. Which I can not make
> work :( For example:
>
>     find . -type -name '*.htm*' | xargs grep -E '\r$'
>
> just keeps listing all lines which end with ``r''... Any clues?

Instead of trying to do this with home-grown tools, try installing
tidy from the ports and just running it over all your html files. That
will clean those up, among other things.

http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
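
As a footnote to the question of finding only the affected pages, here is a
minimal sketch under a few assumptions. grep here treats \r in a pattern as a
plain r, which matches what the quoted command printed, so the sketch builds a
real carriage return with printf and puts that in the pattern instead. It
assumes a POSIX shell and a find that supports "-exec ... {} +"; the function
name list_crlf_html is hypothetical, and "-type f" is assumed where the quoted
command gave -type no argument.

```shell
#!/bin/sh
# Hypothetical helper: print the paths of *.htm* files under a directory
# (default: current directory) that contain a line ending in a carriage return.
list_crlf_html() {
    dir=${1:-.}
    # Command substitution strips trailing newlines only, so the CR survives.
    cr=$(printf '\r')
    # grep -l prints just the file name; "${cr}\$" is "CR at end of line".
    find "$dir" -type f -name '*.htm*' -exec grep -l "${cr}\$" {} +
}
```

Called as "list_crlf_html .", it prints one path per matching file and leaves
the "good" pages out. That list could then be handed to tidy or to the perl
one-liner quoted above; check the man page of the installed tidy for its
in-place option before running it over many files.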