Date: Mon, 24 Mar 2003 21:28:04 +0200
From: Giorgos Keramidas <keramida@ceid.upatras.gr>
To: Jeroen Ruigrok/asmodai <asmodai@wxs.nl>
Cc: freebsd-doc@FreeBSD.ORG
Subject: Re: docs/50211: [PATCH] Fix textfile creation
Message-ID: <20030324192804.GA26996@gothmog.gr>
In-Reply-To: <20030324074358.GL87781@nexus.ninth-circle.org>
References: <200303231710.h2NHAGEb024196@freefall.freebsd.org> <20030324020745.GA22656@gothmog.gr> <20030324024026.GA23139@gothmog.gr> <20030324074358.GL87781@nexus.ninth-circle.org>
On 2003-03-24 08:43, Jeroen Ruigrok/asmodai <asmodai@wxs.nl> wrote:
>-On [20030324 04:02], Giorgos Keramidas (keramida@ceid.upatras.gr) wrote:
>> I have been using w3m for producing text versions of the few Greek
>> documents I managed to write so far.  It just works.  No special
>> tweaking of ~/.w3m needed, no strange conversions done to 8-bit text.
>
> Problem with w3m is that it doesn't format quite as well as (e)links
> does (or did).
> w3m cuts text off at certain points, whereas elinks does it right.
>
> Example:
>
> w3m -dump -T text/html -cols 78:
>
> For questions about TenDRA, read the documentation before contacting <
> help@tendra.org>.

Ah, yes.  This is why I haven't been too persistent about switching to
w3m for everything.

> I need to add the explicit recognition of the HTML to w3m and lynx since
> they apparently look at the extension of the filename, which is not
> always .html, but also .html-text.

w3m seems to work fairly well with -T text/html, fwiw.

> I am not convinced either one is the best solution thus far.  I am
> going to hack elinks a bit to properly parse the Content-Type.
> Funnily though, it ``translated'' the 8-bit Greek from a page into
> latin-1 and I recognised Greek words. :)

It's too early to be certain how well it will work; I need to use it
for a while and read the documentation or source more carefully.

What you describe seems to be a result of the notion elinks has for
"output terminals".  Is there some way of forcing elinks to use a
"dumb" terminal which we can set to ISO-8859-7 or whatever else with
command line options when -dump is used?

It does recognise Greek text but uses 7-bit approximations for the
output characters.  For instance:

	Ellhnik'o ke'imeno.

which is "Greek text" in 7-bit approximations of ISO-8859-7 Greek.

This is not enough though.  For European texts we need a browser that
can -dump 8-bit text without doing funky things with the characters.
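
As a rough illustration of the difference between a 7-bit approximation
and an 8-bit clean dump, here is a minimal Python sketch.  The
transliteration table and both function names are hypothetical, made up
for this example; this is not the table elinks actually uses, and it
covers only the letters in the sample phrase.

```python
# Illustration only: APPROX is a hypothetical stand-in for whatever
# table a browser uses when it falls back to 7-bit output.
TEXT = "Ελληνικό κείμενο."

# Hypothetical 7-bit approximations, covering only the letters in TEXT.
APPROX = {
    "Ε": "E", "λ": "l", "η": "h", "ν": "n", "ι": "i", "κ": "k",
    "ό": "'o", "ε": "e", "ί": "'i", "μ": "m", "ο": "o",
}


def approximate_7bit(text: str) -> str:
    """Replace each Greek letter with its ASCII approximation."""
    return "".join(APPROX.get(ch, ch) for ch in text)


def dump_8bit(text: str) -> bytes:
    """Encode the text as ISO-8859-7, one byte per character."""
    return text.encode("iso-8859-7")


if __name__ == "__main__":
    # Lossy: the original Greek letters cannot be recovered reliably.
    print(approximate_7bit(TEXT))

    # Lossless: every Greek letter becomes a single byte >= 0x80.
    raw = dump_8bit(TEXT)
    print(raw.hex(" "))
    print(raw.decode("iso-8859-7") == TEXT)
```

The first form is readable on any terminal but throws information away;
the second round-trips exactly, which is what a text dump of a Greek
document needs.
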
I can't speak for Chinese, Japanese, Hangul or any other language that
uses wide characters or Unicode, so I'll leave this to more experienced
people who actually use those languages and encodings.

> Btw, I truly think Unix sucks hard when it comes to l10n and i18n.  Been
> trying to get my aterm to display Greek for the past hour or so.

Ehm, I'm not using aterm but it's small enough.  Let me install it for
a while... [ installs aterm port ] ...ah there it is.  It works fine
with Greek here.  I'm using a font that I hacked with xmbdfed to add
ISO-8859-7 Greek characters, derived from lucida-typewriter-10 and it
displays Greek fine, but fails to read *any* Greek at all from the
keyboard.

Transparencies and all the rest are nice, but terminal emulators
really need to grow up and learn to be 8-bit clean in 2003 :-(

- Giorgos

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message