Date:      Fri, 28 May 2010 09:00:57 +0200
From:      Polytropon <freebsd@edvax.de>
To:        Gary Kline <kline@thought.org>
Cc:        FreeBSD Mailing List <freebsd-questions@freebsd.org>
Subject:   Re: any shortcuts to doc to ascii?
Message-ID:  <20100528090057.87144ef4.freebsd@edvax.de>
In-Reply-To: <20100527233607.GD19297@thought.org>
References:  <20100527013843.GA40751@thought.org> <20100527050302.da39c258.freebsd@edvax.de> <20100527233607.GD19297@thought.org>

On Thu, 27 May 2010 16:36:08 -0700, Gary Kline <kline@thought.org> wrote:
> 	i don't see any ascii suffix [for OOo].  i saved as .txt.

This should be right. The .txt extension refers to ASCII text,
at least in standard-compliant operating systems.



> 	same krap.  the \x94, x9d, \x9c...  same with catdoc.  i'll
> 	try antiword.  [forgot about that.  ]

This makes me believe that the original DOC file was created
with the wrong character set or language setting. "Windows" - as far
as I know - does not use standard locales as all other systems
do, but uses an arbitrary setting.

Another possibility is that the character that you think should be
an apostrophe isn't an apostrophe. I often see this in German
texts with misplaced apostrophes that are in fact grave or acute
accents, or some other Unicode character that just looks like
an apostrophe. For example, if the original document contains

	We don`t

and this ` is not a real ', then conversion tools will of course
use the "escape notation" for this unknown character. Other
characters that may lead to such "escape notation" replacements
are quotation marks (usually typographical ones), ellipses
and hyphens.
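
As an aside, the bytes \x94, \x9d and \x9c from your output would be
consistent with the last byte of the UTF-8 encodings of a long dash
and of the typographical double quotes (E2 80 94, E2 80 9D, E2 80 9C),
but that is only a guess without seeing the file. A minimal Python
sketch of the substitution idea follows, assuming the text has already
been decoded correctly. The names ASCII_MAP and to_ascii are made up
for illustration, and the table is certainly not complete.

```python
# Hypothetical mapping from look-alike characters to plain ASCII.
# Extend it with whatever else shows up in the document.
ASCII_MAP = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (typographic apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00b4": "'",    # acute accent misused as apostrophe
    "`": "'",         # grave accent misused as apostrophe
}


def to_ascii(text):
    """Replace the characters listed in ASCII_MAP; leave the rest alone."""
    return text.translate(str.maketrans(ASCII_MAP))


if __name__ == "__main__":
    print(to_ascii("We don\u2019t use \u201cquotes\u201d\u2026"))
```

Note that replacing ` everywhere is an assumption that fits prose only.
In text that contains shell commands it would do damage.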

I know I'm saying this too often, but you wouldn't have such
problems with LaTeX. :-)



> > I'm not sure in how far conflicting codepages may be involved.
> > It is known that "Windows" does have problems supporting standards,
> > and this applies to character sets and language variations, too.
> > 
> 
> 	your words could be emblazoned in 24k gold on some Monument
> 	of Truth. 

It's my job - I'm working for the Ministry of Truth. :-)



> i've been fighting going for mac to OOo and back...

Keep on fighting - I've got a new idea. It's much more complicated
than using OpenOffice for conversion - but it MIGHT work.

1. Open the DOC file in OpenOffice.

2. Mark all content you want to convert, e.g. with Ctrl+A.

3. Copy it into the edit buffer, Ctrl+C.

4. Open KDE's text editor (or any other text editor you have
   installed) and paste the edit buffer, Ctrl+V.

5. Save the file you now have in the editor. It should be all in
   ASCII and with correct interpretation of "special characters".

Because I don't have a test setup here, I cannot promise that
it will compensate for malformed encodings, but if OpenOffice shows a
character as an apostrophe, it should be transferred exactly as
that through the edit buffer.
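
To check whether the saved file really is all ASCII, something like
the following Python sketch could be used. It only reports bytes
above 0x7F and does not try to repair anything. The function name
find_non_ascii and the file name converted.txt are placeholders.

```python
def find_non_ascii(data):
    """Return (offset, byte value) pairs for every byte outside 7-bit ASCII."""
    return [(offset, byte) for offset, byte in enumerate(data) if byte > 0x7F]


if __name__ == "__main__":
    # "converted.txt" is a placeholder for the file saved in step 5.
    with open("converted.txt", "rb") as handle:
        for offset, byte in find_non_ascii(handle.read()):
            print("offset %d: \\x%02x" % (offset, byte))
```

If it prints nothing, the file contains only ASCII bytes.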



> 	ps: antiword same as catdoc.  back to my per substitutions.
> 	that works, along with vi's Builtin subs.  

The joy of modern programs: You start to do everything manually
again. :-)




-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


