From: Giorgos Keramidas
To: Jeroen Ruigrok/asmodai
Cc: freebsd-doc@FreeBSD.ORG
Date: Mon, 24 Mar 2003 21:28:04 +0200
Subject: Re: docs/50211: [PATCH] Fix textfile creation
Message-ID: <20030324192804.GA26996@gothmog.gr>
In-Reply-To: <20030324074358.GL87781@nexus.ninth-circle.org>

On 2003-03-24 08:43, Jeroen Ruigrok/asmodai wrote:
> -On [20030324 04:02], Giorgos Keramidas (keramida@ceid.upatras.gr) wrote:
>> I have been using w3m for producing text versions of the few Greek
>> documents I managed to write so far.  It just works.  No special
>> tweaking of ~/.w3m needed, no strange conversions done to 8-bit text.
>
> Problem with w3m is that it doesn't format quite as well as (e)links
> does (or did).
> w3m cuts text off at certain points, whereas elinks does it right.
>
> Example:
>
> w3m -dump -T text/html -cols 78:
>
>   For questions about TenDRA, read the documentation before contacting <
>   help@tendra.org>.

Ah, yes.  This is why I haven't been too persistent about switching to
w3m for everything.

> I need to add the explicit recognition of the HTML to w3m and lynx since
> they apparently look at the extension of the filename, which is not
> always .html, but also .html-text.

w3m seems to work fairly well with -T text/html, fwiw.

> I am not convinced either one is the best solution thus far.  I am
> going to hack elinks a bit to properly parse the Content-Type.
> Funnily though, it ``translated'' the 8-bit Greek from a page into
> latin-1 and I recognised Greek words. :)

It's a bit early to be certain how well it will work.  I need to use
it for a while and read the documentation or source more carefully
first.

What you describe seems to be a result of the notion elinks has of
"output terminals".  Is there some way of forcing elinks to use a
"dumb" terminal which we can set to ISO-8859-7 or whatever else with
command line options when -dump is used?

It does recognise Greek text but uses 7-bit approximations for the
output characters.  For instance:

	Ellhnik'o ke'imeno.

which is "Greek text" in 7-bit approximations of ISO-8859-7 Greek.
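To make the difference concrete, here is a small Python sketch of what
"7-bit approximation" means at the byte level.  The Greek spelling of
the phrase is my reconstruction from the approximation above (an
assumption, written with escapes so the snippet itself stays ASCII),
and high_bit_count() is a hypothetical helper, not part of elinks or
w3m.

```python
# Assumed original phrase: "Greek text" written in Greek.  Escapes are
# used so that this file itself stays 7-bit ASCII.
GREEK = ("\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03cc "
         "\u03ba\u03b5\u03af\u03bc\u03b5\u03bd\u03bf")

# ISO-8859-7 stores one byte per character; every Greek letter lands
# in the upper half of the byte range.
RAW = GREEK.encode("iso8859_7")

# The 7-bit approximation quoted in this message.
APPROX = b"Ellhnik'o ke'imeno."


def high_bit_count(data: bytes) -> int:
    """Count the bytes that need the eighth bit (values 0x80..0xFF)."""
    return sum(1 for b in data if b >= 0x80)


if __name__ == "__main__":
    print("characters:", len(GREEK), "bytes:", len(RAW))
    print("8-bit bytes in ISO-8859-7 form:", high_bit_count(RAW))
    print("8-bit bytes in approximation: ", high_bit_count(APPROX))
    # The 8-bit form round-trips; the approximation has lost the
    # original characters.
    print("round trip ok:", RAW.decode("iso8859_7") == GREEK)
```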
This is not enough though.  For European texts we need a browser that
can -dump 8-bit text without doing funky things with the characters.
I can't speak for Chinese, Japanese, Hangul or any other language that
uses wide characters or Unicode, so I'll leave this to more
experienced people who actually use those languages and encodings.

> Btw, I truly think Unix sucks hard when it comes to l10n and i18n.  Been
> trying to get my aterm to display Greek for the past hour or so.

Ehm, I'm not using aterm but it's small enough.  Let me install it for
a while...

[ installs aterm port ]

...ah there it is.  It works fine with Greek here.  I'm using a font
that I hacked with xmbdfed to add ISO-8859-7 Greek characters, derived
from lucida-typewriter-10 and it displays Greek fine, but fails to
read *any* Greek at all from the keyboard.

Transparencies and all the rest are nice, but terminal emulators
really need to grow up and learn to be 8-bit clean in 2003 :-(

- Giorgos
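PS: a rough way to test whether a given browser's -dump output is
8-bit clean could look like the Python sketch below.  It only compares
tallies of bytes with the high bit set between the HTML source and the
dumped text, so it is a heuristic: 8-bit bytes that live inside markup
(attributes, comments) would be reported as missing.  The function
names are hypothetical and nothing here calls w3m, lynx or elinks; you
would feed it the bytes they produce.

```python
from collections import Counter


def high_bytes(data: bytes) -> Counter:
    """Tally the bytes in data that have the eighth bit set."""
    return Counter(b for b in data if b >= 0x80)


def missing_high_bytes(source: bytes, dumped: bytes) -> Counter:
    """Return the 8-bit bytes of source that do not show up in dumped.

    Counter subtraction keeps only positive counts, so an empty result
    means every 8-bit byte of the source survived the conversion.
    """
    return high_bytes(source) - high_bytes(dumped)


if __name__ == "__main__":
    # Assumed sample: one Greek word encoded as ISO-8859-7.
    word = "\u03ba\u03b5\u03af\u03bc\u03b5\u03bd\u03bf".encode("iso8859_7")
    page = b"<p>" + word + b"</p>"

    clean_dump = word + b"\n"        # what an 8-bit clean dump keeps
    approx_dump = b"ke'imeno\n"      # a 7-bit approximation

    print("lost in clean dump: ",
          sum(missing_high_bytes(page, clean_dump).values()))
    print("lost in approx dump:",
          sum(missing_high_bytes(page, approx_dump).values()))
```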