From owner-freebsd-ports-bugs@FreeBSD.ORG Mon Apr 7 10:40:02 2008
Delivered-To: freebsd-ports-bugs@hub.freebsd.org
Resent-Date: Mon, 7 Apr 2008 10:40:01 GMT
Resent-Message-Id: <200804071040.m37Ae1gW046338@freefall.freebsd.org>
Resent-From: FreeBSD-gnats-submit@FreeBSD.org (GNATS Filer)
Resent-To: freebsd-ports-bugs@FreeBSD.org
Resent-Reply-To: FreeBSD-gnats-submit@FreeBSD.org, Alexander Zagrebin
Message-Id: <200804071037.m37AbMSX045678@www.freebsd.org>
Date: Mon, 7 Apr 2008 10:37:22 GMT
From: Alexander Zagrebin
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-3.1
Cc:
Subject: ports/122524: www/links1 uses 7-bit us-ascii codepage only when using "-dump"
X-BeenThere: freebsd-ports-bugs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Ports bug reports
X-List-Received-Date: Mon, 07 Apr 2008 10:40:03 -0000

>Number:         122524
>Category:       ports
>Synopsis:       www/links1 uses 7-bit us-ascii codepage only when using "-dump"
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    freebsd-ports-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Mon Apr 07 10:40:01 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Alexander Zagrebin
>Release:        7.0-RELEASE
>Organization:
-
>Environment:
>Description:
When running in interactive mode, links 0.98 (www/links1) works fine.
But when it is used to dump an HTML page to stdout (links -dump ...),
it always assumes us-ascii (7-bit) encoding for the output.
This causes problems when the HTML page uses a non-us-ascii encoding.
For example:

1. Some programs (mail/mutt, misc/mc, etc.) can use "links -dump ..."
   as an HTML-to-text converter. When the HTML uses a non-us-ascii
   encoding, the output is unreadable in most cases.
2. The FreeBSD Documentation Project uses links to convert HTML
   documentation to a plain-text version. As a result, the plain-text
   documentation for ru_RU.KOI8-R, for example, is unreadable.
>How-To-Repeat:
Convert an HTML source containing 8-bit (or UTF-8) characters with
"links -dump source.html" and compare the result with
"links source.html". The output from -dump will contain us-ascii
characters only.
>Fix:
I have added a -dump-codepage command-line parameter (see the patch).
It sets the output codepage used when links runs in "dump" mode.
I use the koi8-r encoding, and with this patch applied I can run links as
"links -dump -dump-codepage koi8-r source.html".

Patch attached with submission follows:

--- default.c.orig	2008-04-06 14:50:26.000000000 +0400
+++ default.c	2008-04-06 15:02:21.000000000 +0400
@@ -651,6 +651,19 @@
 	}
 }
 
+unsigned char *dump_codepage_rd(struct option *o, unsigned char *c)
+{
+	unsigned char *token;
+	int i;
+
+	if (!(token = get_token(&c))) return "Missing argument";
+	i = get_cp_index(token);
+	mem_free(token);
+	if (i == -1) return "Unknown codepage";
+	dump_codepage = i;
+	return NULL;
+}
+
 unsigned char *gen_cmd(struct option *o, unsigned char ***argv, int *argc)
 {
 	unsigned char *r;
@@ -783,6 +796,9 @@
   Write a plain-text version of the given HTML document to\n\
   stdout.\n\
 \n\
+ -dump-codepage \n\
+  Output codepage to be used for -dump\n\
+\n\
  -width \n\
   Size of screen in characters, used in combination with -dump\n\
 \n\
@@ -840,6 +856,7 @@
 int base_session = 0;
 int dmp = 0;
 int force_html = 0;
+int dump_codepage = 0;
 
 int async_lookup = 1;
 int download_utime = 0;
@@ -896,6 +913,7 @@
 	1, force_html_cmd, NULL, NULL, 0, 0, NULL, NULL, "force-html",
 	1, dump_cmd, NULL, NULL, D_DUMP, 0, NULL, NULL, "dump",
 	1, dump_cmd, NULL, NULL, D_SOURCE, 0, NULL, NULL, "source",
+	1, gen_cmd, dump_codepage_rd, NULL, 0, 0, NULL, NULL, "dump-codepage",
 	1, gen_cmd, num_rd, num_wr, 0, 1, &async_lookup, "async_dns", "async-dns",
 	1, gen_cmd, num_rd, num_wr, 0, 1, &download_utime, "download_utime", "download-utime",
 	1, gen_cmd, num_rd, num_wr, 1, 16, &max_connections, "max_connections", "max-connections",
--- links.h.orig	2002-06-29 21:44:25.000000000 +0400
+++ links.h	2008-04-06 14:30:10.000000000 +0400
@@ -2003,6 +2003,7 @@
 extern int no_connect;
 extern int base_session;
 extern int force_html;
+extern int dump_codepage;
 
 #define D_DUMP		1
 #define D_SOURCE	2
--- main.c.orig	2002-06-29 21:44:25.000000000 +0400
+++ main.c	2008-04-06 14:48:55.000000000 +0400
@@ -201,7 +201,7 @@
 	o.xw = screen_width;
 	o.yw = 25;
 	o.col = 0;
-	o.cp = 0;
+	o.cp = dump_codepage;
 	ds2do(&dds, &o);
 	o.plain = 0;
 	o.frames = 0;

>Release-Note:
>Audit-Trail:
>Unformatted:
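For readers without the links source at hand, the following is a minimal standalone sketch of the lookup-and-validate pattern that dump_codepage_rd in the patch follows: resolve a codepage name to a table index, return an error string on failure, and store the index only on success. The table cp_names and the helpers lookup_cp and set_dump_codepage are hypothetical stand-ins invented for this illustration. They are not part of links, which uses its own get_token and get_cp_index, and the exact-match comparison here is an assumption rather than a description of how links matches names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codepage table; index 0 plays the role of 7-bit us-ascii. */
const char *const cp_names[] = { "us-ascii", "iso-8859-1", "koi8-r", "utf-8" };

/* Selected output codepage; defaults to index 0, as in the patch. */
int dump_codepage = 0;

/* Return the table index for name, or -1 if the name is not in the table. */
int lookup_cp(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof cp_names / sizeof cp_names[0]; i++)
		if (strcmp(name, cp_names[i]) == 0)
			return (int)i;
	return -1;
}

/*
 * Validate arg and, only on success, update dump_codepage.
 * Returns NULL on success or a static error string on failure.
 */
const char *set_dump_codepage(const char *arg)
{
	int i;

	if (arg == NULL || *arg == '\0') return "Missing argument";
	i = lookup_cp(arg);
	if (i == -1) return "Unknown codepage";
	dump_codepage = i;
	return NULL;
}
```

In this sketch an unknown name leaves the previously selected codepage untouched, which matches the order of the checks in dump_codepage_rd.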