From: vogelke+software@pobox.com (Karl Vogel)
To: freebsd-questions@freebsd.org
Subject: Re: script to assist ASCII text
Date: Wed, 3 Sep 2008 21:33:30 -0400 (EDT)
Message-Id: <20080904013330.B1E92B7BD@kev.msw.wpafb.af.mil>
In-reply-to: <1219723211.4994.165.camel@localhost> (message from Gary Kline on Mon, 25 Aug 2008 21:00:10 -0700)
Reply-To: vogelke+software@pobox.com
Organization: Oasis Systems Inc.

>> On Mon, 25 Aug 2008 21:00:10 -0700,
>> Gary Kline said:

G> This had eluded me for years and it may not be possible, but here goes.
G> I write using vi or, less frequently vim. Is there any sh script that
G> would make sure that there were exactly one space ('\040') between words,
G> and three spaces between sentences? My definition of "a sentence" is a
G> string of words that ends in a period or question-mark, exclamation-mark,
G> or ellipse ("... . || ... ? || ... !) Also, any dash "--" could not have
G> any whitespace around it.

   I like a similar setup -- one space between words, sentences ending
   with a period followed by two spaces.  The GNU version of "fmt" handles
   this pretty well.  Here's the first part of your message, formatted to
   50-character-wide lines, with the type of spacing that drives me nuts:

     me% cat -n msg
          1  This had eluded me for years and it may not be
          2  possible, but here goes. I write using vi or,
          3  less frequently vim. Is there any sh script that
          4  would make sure that there were exactly one
          5  space ('\040') between words, and three spaces
          6  between sentences? My definition of "a sentence"
          7  is a string of words that ends in a period or
          8  question-mark, exclamation-mark, or ellipse.

   Putting one word on each line and then letting GNU fmt decide on
   sentence-handling does almost exactly what you want:

     me% gfmt -1 msg | gfmt -50 | cat -n
          1  This had eluded me for years and it may not be
          2  possible, but here goes.  I write using vi or,
          3  less frequently vim.  Is there any sh script
          4  that would make sure that there were exactly one
          5  space ('\040') between words, and three spaces
          6  between sentences?  My definition of "a sentence"
          7  is a string of words that ends in a period or
          8  question-mark, exclamation-mark, or ellipse.

   Here's a script I use as a driver for GNU fmt.  It looks for an optional
   environment variable FMTWIDTH to decide how long each line should be.
   This comes in handy if I call vi/vim from within a script:

     #!/bin/sh
     # driver for fmt.
     # Use FMTWIDTH as the line width when it is set.
     case "$FMTWIDTH" in
         "") opt= ;;
         *)  opt="-$FMTWIDTH" ;;
     esac

     # An explicit option on the command line overrides FMTWIDTH.
     case "$1" in
         -*) opt= ;;
         *)  ;;
     esac

     exec /usr/local/bin/gfmt $opt ${1+"$@"}

   Here's an alias I use for quickly reformatting a section of text in vim.
   I mark where to start using 'a', then move down to the end of the
   section and hit 'v':

     jmbk:'a,.!fmt -1|fmt'b

   A similar alias will reformat whatever paragraph I'm in, with no need
   for marks:

     }jmbk{ma}:'a,.!fmt -1|fmt'b

   The script below helps me clean up a file or message after running fmt,
   which makes strings like "U.S.A." look like the end of a sentence even
   when they're not.  This should give you some ideas.

-- 
Karl Vogel                      I don't speak for the USAF or my company

Panda Mating Fails; Veterinarian Takes Over
                                        --actual news headline, 1997

---------------------------------------------------------------------------
#!/usr/bin/perl
#
# $Id: cm,v 1.3 2008/08/17 20:25:49 vogelke Exp $
# $Source: /home/vogelke/bin/RCS/cm,v $
#
# cm: clean mail message

while (<>) {
    # Each pattern matches an abbreviation followed by TWO spaces, i.e.
    # a spot where fmt mistook the period for the end of a sentence.
    s/Jan\.  /Jan /g;
    s/Feb\.  /Feb /g;
    s/Aug\.  /Aug /g;
    s/Sept\.  /Sept /g;
    s/Oct\.  /Oct /g;
    s/Nov\.  /Nov /g;
    s/Dec\.  /Dec /g;

    s/Mr\.  /Mr. /g;
    s/Mrs\.  /Mrs. /g;
    s/Ms\.  /Ms. /g;
    s/Dr\.  /Dr. /g;
    s/Sen\.  /Senator /g;
    s/Rep\.  /Representative /g;

    s/U\.S\.A\.  /USA /g;
    s/U\.S\.  /US /g;
    s/D\.C\.  /DC /g;
    s/U\.N\.  /UN /g;
    s/B\.S\.  /BS /g;
    s/M\.B\.A\.  /MBA /g;

    s/ ([A-Z]\.)  / $1 /g;      # single initials, as in "John Q. Public"

    s/''/\"/g;
    s/``/\"/g;

    s/\342\200\231/'/g;         # These come from saving Firefox pages
    s/\342\200\234/"/g;
    s/\342\200\235/"/g;
    print;
}

exit(0);
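
   Two things from the original question are left over after gfmt: the
   three-space gap between sentences (gfmt gives two) and the "--" with no
   whitespace around it.  Here's a minimal sketch of a post-filter for
   both, assuming plain POSIX sed and text that gfmt has already filled.
   The name "postfmt" is made up for illustration; it isn't one of my
   scripts above.

```sh
#!/bin/sh
# postfmt: hypothetical post-filter, meant to run AFTER gfmt has filled
# the text and put two spaces after each sentence.

postfmt () {
    # 1st expression: widen exactly two spaces after . ? or ! to three.
    #                 Gaps that are already three spaces wide don't match.
    # 2nd expression: remove any whitespace on either side of "--".
    sed -e 's/\([.?!]\)  \([^ ]\)/\1   \2/g' \
        -e 's/[[:space:]]*--[[:space:]]*/--/g'
}

# Example use (assumes gfmt is installed, as in the message above):
#   gfmt -1 msg | gfmt -50 | postfmt
```

   Widening the gaps after filling can push a line a character or two
   past the width you gave gfmt, and the dash rule will also squeeze
   things like " --help" or a "-- " signature line, so this is only
   meant for plain prose.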