From: Brad Knowles
To: Eric Anderson, Ross Lippert
Cc: joseph@randomnetworks.com, freebsd-doc@freebsd.org, freebsd-chat@freebsd.org
Subject: Re: Beta FreeBSD search engine
Date: Tue, 9 Jul 2002 22:16:42 +0200
In-Reply-To: <3D2B43EF.955661FC@centtech.com>

At 3:13 PM -0500 2002/07/09, Eric Anderson wrote:

> Ok, all good thoughts.. One question:
>
> How can I determine a language for a page by looking at it?

You need dictionaries of words in various languages, then you do a
sort | uniq of all words in the document and compare it against the
language dictionaries. The language dictionary with the highest number
of hits is most likely to be the one in which the document is written.

-- 
Brad Knowles

"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
    -Benjamin Franklin, Historical Review of Pennsylvania.
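
The dictionary comparison described in the reply can be sketched roughly as follows. This is only an illustration, not code from the thread: the tiny word lists, the `DICTIONARIES` table, and the function names `unique_words` and `guess_language` are all made up for the example, and a real setup would load full per-language word lists instead.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
# A real setup would load a full dictionary file per language.
DICTIONARIES = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "german": {"der", "die", "und", "ist", "das", "nicht", "ein", "zu"},
    "french": {"le", "la", "et", "est", "les", "des", "un", "une"},
}


def unique_words(text):
    """Return the distinct lowercase words in text (the `sort | uniq` step)."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def guess_language(text, dictionaries=DICTIONARIES):
    """Count dictionary hits per language and return (best, hits).

    best is None when no word matched any dictionary.
    """
    words = unique_words(text)
    hits = {lang: len(words & vocab) for lang, vocab in dictionaries.items()}
    best = max(hits, key=hits.get)
    return (best if hits[best] > 0 else None), hits


if __name__ == "__main__":
    language, hits = guess_language("Der Hund ist nicht in das Haus")
    print(language, hits)
```

Counting raw hits favours languages with larger dictionaries, so with real word lists it may be worth dividing each count by the number of unique words in the document, or restricting every dictionary to its most common words.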