Date: Wed, 18 Sep 2002 19:35:12 +0300 (EEST)
From: Alexandr Kovalenko <never@nevermind.kiev.ua>
To: FreeBSD-gnats-submit@FreeBSD.org
Subject: ports/42931: New port: textproc/enca: detects file encoding
Message-ID: <200209181635.g8IGZCdt087457@mile.nevermind.kiev.ua>
>Number:         42931
>Category:       ports
>Synopsis:       New port: textproc/enca: detects file encoding
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-ports
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Wed Sep 18 09:40:02 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Alexandr Kovalenko
>Release:        FreeBSD 4.7-RC i386
>Organization:
Net.Style
>Environment:
System: FreeBSD mile.nevermind.kiev.ua 4.7-RC FreeBSD 4.7-RC #0: Wed Sep 18 12:04:53 EEST 2002 root@mile.nevermind.kiev.ua:/usr/obj/usr/src/sys/mile i386
>Description:
WWW: http://www.physics.muni.cz/~yeti/software/enca.shtml

Enca is an Extremely Naive Charset Analyser. It detects the encoding of text files and can also convert them to other encodings.

Enca currently can determine the 8-bit charsets of Belarusian, Czech, Polish, Russian, Slovak and Ukrainian texts, and also some multibyte encodings independently of language (provided the text is in a European language).
The main features include:

  * recognises the following 8-bit charsets:
      o Belarusian: CP1251, IBM866, ISO-8859-5, KOI8-UNI, maccyr, IBM855
      o Czech: ISO-8859-2, KEYBCS2, IBM852, macce, KOI-8_CS_2, CP1250
      o Polish: ISO-8859-2, IBM852, macce, ISO-8859-13, ISO-8859-16, CP1250, baltic
      o Russian: KOI8-R, IBM866, CP1251, ISO-8859-5, maccyr
      o Slovak: CP1250, KEYBCS2, IBM852, macce, KOI-8_CS_2, ISO-8859-2
      o Ukrainian: CP1251, IBM855, ISO-8859-5, KOI8-U, maccyr, CP1125
  * recognises several multibyte encodings: UCS-2, UCS-4, UTF-8, UTF-7 and TeX accents
  * recognises all common EOL types, byte orders and also Quoted-printable
  * can report charset names according to various conventions (or programs) as well as human-readable descriptions; accepts all common charset aliases
  * works with multiple files and can act as an intelligent filter
  * converts files using a built-in converter, the GNU recode library, UNIX98 iconv functions or an external converter specified on the command line (e.g. cstocs, GNU recode)
  * has a special ambiguous mode for very short texts
  * can filter out binary parts of a file and/or box-drawing characters before guessing, so it can determine the encoding of fairly messy files
  * uses various tricks to resolve hard-to-decide cases, such as distinguishing between ISO-8859-2 and CP1250

>How-To-Repeat:
N/A
>Fix:

# This is a shell archive.  Save it in a file, remove anything before
# this line, and then unpack it by entering "sh file".  Note, it may
# create directories; files and directories will be owned by you and
# have default permissions.
#
# This archive contains:
#
#	enca
#	enca/files
#	enca/files/patch-lib::encnames.c
#	enca/pkg-comment
#	enca/pkg-descr
#	enca/pkg-plist
#
echo c - enca
mkdir -p enca > /dev/null 2>&1
echo c - enca/files
mkdir -p enca/files > /dev/null 2>&1
echo x - enca/files/patch-lib::encnames.c
sed 's/^X//' >enca/files/patch-lib::encnames.c << 'END-of-enca/files/patch-lib::encnames.c'
X--- lib/encnames.c.orig	Sun Aug 18 13:05:20 2002
X+++ lib/encnames.c	Wed Sep 18 17:36:39 2002
X@@ -25,7 +25,7 @@
X 
X #include "enca.h"
X #include "internal.h"
X-#include "encodings.h"
X+#include "tools/encodings.h"
X 
X #define NCHARSETS (sizeof(CHARSET_INFO)/sizeof(EncaCharsetInfo))
X #define NALIASES (sizeof(ALIAS_LIST)/sizeof(char *))
END-of-enca/files/patch-lib::encnames.c
echo x - enca/pkg-comment
sed 's/^X//' >enca/pkg-comment << 'END-of-enca/pkg-comment'
XDetects encoding of text files
END-of-enca/pkg-comment
echo x - enca/pkg-descr
sed 's/^X//' >enca/pkg-descr << 'END-of-enca/pkg-descr'
XEnca currently can determine 8bit charsets of Belarussian, Czech, Polish,
XRussian, Slovak and Ukrainian texts and also some multibyte encodings,
Xindependently on language (provided it's some European language).
X
XWWW: http://www.physics.muni.cz/~yeti/software/enca.shtml
X
X- Alexandr "Nevermind" Kovalenko
Xnever@nevermind.kiev.ua
END-of-enca/pkg-descr
echo x - enca/pkg-plist
sed 's/^X//' >enca/pkg-plist << 'END-of-enca/pkg-plist'
Xbin/b-cstocs
Xbin/b-map
Xbin/b-recode
Xbin/enca
Xbin/enconv
Xinclude/enca.h
Xlib/libenca.so.1
Xlib/libenca.so
Xlib/libenca.la
Xlib/libenca.a
Xshare/doc/enca/libenca/c1197.html
Xshare/doc/enca/libenca/c4.html
Xshare/doc/enca/libenca/index.html
Xshare/doc/enca/libenca/libenca-analyser.html
Xshare/doc/enca/libenca/libenca-auxiliary-functions.html
Xshare/doc/enca/libenca/libenca-charsets-and-surfaces.html
Xshare/doc/enca/libenca/libenca-internal-functions.html
Xshare/doc/enca/libenca/libenca-typedefs-and-constants.html
Xshare/doc/enca/libenca/index.sgml
X@dirrm share/doc/enca/libenca
X@dirrm share/doc/enca
END-of-enca/pkg-plist
exit

>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-ports" in the body of the message