Date: Tue, 9 Jul 2002 22:16:42 +0200
From: Brad Knowles <brad.knowles@skynet.be>
To: Eric Anderson <anderson@centtech.com>, Ross Lippert <ripper@eskimo.com>
Cc: joseph@randomnetworks.com, freebsd-doc@freebsd.org, freebsd-chat@freebsd.org
Subject: Re: Beta FreeBSD search engine
Message-ID: <a05111b3cb950f4f23d0e@[10.0.1.15]>
In-Reply-To: <3D2B43EF.955661FC@centtech.com>
References: <200207091944.MAA05507@eskimo.com> <3D2B43EF.955661FC@centtech.com>

At 3:13 PM -0500 2002/07/09, Eric Anderson wrote:
> Ok, all good thoughts.. One question:
>
> How can I determine a language for a page by looking at it?
You need dictionaries of words in various languages. Then you do
a sort | uniq of all the words in the document and compare the result
against each language dictionary. The dictionary with the highest
number of hits is most likely the language in which the document is
written.
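
A rough sketch of that idea in Python, in case it helps. Everything
named here is made up for illustration: the tiny DICTIONARIES table
stands in for real per-language word lists, and unique_words() and
guess_language() are hypothetical helpers, not part of any existing
tool. The set of distinct words plays the role of the sort | uniq step.

```python
import re

# Hypothetical toy dictionaries (an assumption for this sketch); a real
# setup would load full word lists, for example one file per language.
DICTIONARIES = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "french": {"le", "la", "les", "et", "est", "un", "une", "pas"},
}


def unique_words(text):
    """Lowercase the text and return its distinct words (the sort | uniq step)."""
    # [^\W\d_]+ matches runs of letters, including accented ones.
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def guess_language(text, dictionaries=DICTIONARIES):
    """Return (best_language, hits_per_language); best is None if nothing matched."""
    words = unique_words(text)
    # Count how many distinct document words appear in each dictionary.
    hits = {lang: len(words & vocab) for lang, vocab in dictionaries.items()}
    best = max(hits, key=hits.get)
    return (best if hits[best] > 0 else None), hits


if __name__ == "__main__":
    language, hits = guess_language("The cat is in the house and it is happy")
    print(language, hits)
```

With dictionaries this small, short pages will often tie or score zero,
so in practice you would want much larger word lists and probably a
minimum hit count before trusting the answer.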
--
Brad Knowles, <brad.knowles@skynet.be>
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
