Date: Sat, 11 Feb 2006 16:45:49 -0500
From: Parv
To: Kristian Vaaf
Cc: questions@freebsd.org
Subject: Re: Script to clean text files
Message-ID: <20060211214549.GA1674@holestein.holy.cow>
In-Reply-To: <7.0.1.0.2.20060211172807.0214a4b8@broadpark.no>

in message <7.0.1.0.2.20060211172807.0214a4b8@broadpark.no>,
wrote Kristian Vaaf thusly...
>
> Among other things, this script is supposed to add an empty line at
> the bottom of a file.
>
> But somehow it always removes the first line in a text file,
> how do I stop this?

Can you provide a small sample file complete w/ things that you
want to remove?

> #!/usr/local/bin/bash
> #
> # Remove CRLF, trailing whitespace and double lines.

What are "double lines"?

> # $ARBA: clean.sh,v 1.0 2007/11/11 15:09:05 vaaf Exp $
> #
> for file in `find -s . -type f -not -name ".*"`; do
>     if file -b "$file" | grep -q 'text'; then
>         echo >> "$file"
>         perl -i -pe 's/\015$//' "$file"
>         perl -i -pe 's/[^\S\n]+$//g' "$file"

Why do you have two perl runs?  Both substitutions can go in a
single invocation.  Also note that [^\S\n] is a double negative: it
matches any whitespace character other than a newline, so the second
substitution strips trailing blanks & tabs and leaves the newline
in place.

>
>         perl -pi -00 -e 1 "$file"
>         echo "$file: Done"
>     fi
> done

To remove CRLF, trailing whitespace, and to squeeze consecutive
blank lines into one ...

  {
    tr -d '\r' < "$file" \
    | sed -E -e 's/[[:space:]]+$//' \
    | cat -s - > "${file}.tmp"
  } && mv -f "${file}.tmp" "$file"


  - Parv
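
The pipeline above can also be wrapped as a plain filter, which makes
it easy to try on sample input before any file is touched.  This is
only a minimal sketch: the names clean_text and clean_file are made
up for illustration and are not from the original script, and it
assumes a cat that supports -s (the BSD and GNU versions do).

```shell
#!/bin/sh

# clean_text (hypothetical name): filter stdin to stdout.
clean_text() {
    tr -d '\r' |                    # delete every carriage return
    sed -e 's/[[:space:]]*$//' |    # strip trailing whitespace per line
    cat -s                          # squeeze adjacent blank lines into one
}

# clean_file (hypothetical name): rewrite the file named in $1 by way
# of a temporary file, as in the pipeline above.
clean_file() {
    clean_text < "$1" > "$1.tmp" && mv -f "$1.tmp" "$1"
}
```

For example, printf 'a \r\n\r\n\r\nb\t\n' | clean_text | od -c
shows whether the CRs, the trailing blank and tab, and the extra
blank line are gone.  Keeping the filter separate from the file
handling means the first line of a file is never at risk while the
filter is being tested.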