Date:      Tue, 17 Sep 2002 14:28:57 -0700
From:      Mike Thompson <mike@atomz.com>
To:        Michael Lucas <mwlucas@blackhelicopters.org>
Cc:        Michael Lucas <mwlucas@FreeBSD.ORG>, freebsd-doc@FreeBSD.ORG, fran@atomz.com
Subject:   Re: Improved searching for FreeBSD.org web site
Message-ID:  <4.3.2.7.2.20020917131047.00ad6ee0@pop.atomz.com>
In-Reply-To: <20020909081703.A41574@blackhelicopters.org>
References:  <4.3.2.7.2.20020906155438.00e4a2d0@pop.atomz.com> <4.3.2.7.2.20020822220010.00ab2540@pop.atomz.com> <4.3.2.7.2.20020822220010.00ab2540@pop.atomz.com> <20020905123434.A15857@blackhelicopters.org> <4.3.2.7.2.20020906155438.00e4a2d0@pop.atomz.com>

Hi Michael,

I wanted to catch up with you regarding the implementation of Atomz Search 
for the FreeBSD.org web site.  Between my responsibilities at Atomz and my 
family with small children, I'm finally getting some time I can devote to 
getting things up and running.  I apologize that things have been moving a 
little slower than I intended.

At this time I have a few basic questions and some other comments that I'm 
hoping to get your feedback on.

1. Categorized searches across FreeBSD web site

With Atomz Search, it would be possible to implement a search form that 
searches either the entire web site or a more narrowly defined section 
chosen from a drop-down list.  An example of an Atomz customer doing this 
is at http://www.oreilly.com/.  Is this something that would be desirable 
for the FreeBSD.org web site?  If so, which sections should visitors be 
able to narrow a search to?  Some sections that would make sense are:

         Entire Web Site
         Handbook
         FAQs
         News
         Ports

2. Searching the list archives

The mailing lists will prove to be a bit of a challenge to properly 
index.  The reason is that it will likely take our search engine several 
days to crawl and index the full email archives of nearly a million email 
documents.  It may make more sense to store a local copy of the email 
archives at Atomz and then only crawl the latest archives from the actual 
FreeBSD.org web servers.  Our search engine will properly redirect the 
search result URLs to the proper content on the real FreeBSD web 
site.  This way we can crawl and index the local copy of the email archives 
at fast network speeds and lessen the load our crawler would place on the 
FreeBSD servers and the getmsg.cgi script.  I'll need to investigate this a 
little further.
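
The redirect idea above can be sketched as a simple prefix rewrite from 
the mirrored copy back to the public site.  This is only an illustration: 
both base URLs below are made-up placeholders, not the real archive 
layout or anything Atomz Search actually uses.

```python
# Both prefixes are hypothetical placeholders for illustration only.
MIRROR_BASE = "http://archive-mirror.example/freebsd/"
PUBLIC_BASE = "http://www.FreeBSD.org/example-archive-path/"


def to_public_url(mirror_url: str) -> str:
    """Map a URL crawled from the local mirror to its public equivalent."""
    if not mirror_url.startswith(MIRROR_BASE):
        raise ValueError(f"not a mirror URL: {mirror_url}")
    # Keep the path below the mirror root and graft it onto the public root.
    return PUBLIC_BASE + mirror_url[len(MIRROR_BASE):]
```

Search results would then display the rewritten URL, so visitors always 
land on the FreeBSD.org servers rather than on the mirror.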

Also, it would be very beneficial if the getmsg.cgi script for viewing 
email messages were modified to put the following meta tags into the HTML 
header for each email message:

<meta name="list" content="freebsd-questions">
<meta name="subject" content="4.5 install problem">
<meta name="date" content="Wed, 6 Feb 2002 10:09:51 -0500">
<meta name="from" content="Jeff Aitken <jaitken@aitken.com>">

Atomz Search can then perform more useful searches using this meta 
information. For instance, searches could be limited to specific lists, 
to specific words in the subject, to a specific date range, or to 
messages from a specific person.
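
As a rough sketch of how such tags could be produced from a message's 
headers, the Python below builds the four meta tags.  It is not the 
getmsg.cgi code; the function name `meta_tags` and its `list_name` 
parameter are assumptions made for the example.  It also HTML-escapes 
the values, which is the safer choice for addresses containing angle 
brackets.

```python
from email import message_from_string
from html import escape


def meta_tags(raw_message: str, list_name: str) -> str:
    """Build <meta> tags for one archived message (hypothetical helper)."""
    msg = message_from_string(raw_message)
    fields = [
        ("list", list_name),
        ("subject", msg.get("Subject", "")),
        ("date", msg.get("Date", "")),
        ("from", msg.get("From", "")),
    ]
    # Escape <, >, & and quotes so header text cannot break the attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(str(value), quote=True)}">'
        for name, value in fields
    )
```

With escaping applied, the "from" value is emitted as 
`Jeff Aitken &lt;jaitken@aitken.com&gt;` rather than with raw angle 
brackets.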

If I made the changes to the getmsg.cgi script, is this something that I 
would be able to get checked into the CVS tree so this meta information 
appears in future versions of the email archives?

3. HTTP Error 403 - Forbidden on certain FreeBSD.org URLs

Our search engine is having problems crawling URLs such as the following 
description of the ncftpd port.  The server returns an HTTP 403 (Forbidden) 
error for it:

http://www.FreeBSD.org/cgi/url.cgi?ports/ftp/ncftpd/pkg-descr

Are there known aspects of the FreeBSD.org web site that prevent robots 
from crawling certain portions of the web site?  I'll investigate this 
myself from this end, but I wanted to bounce this off of you first to see 
if you knew something about this.
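
One way to narrow this down is to test the URL against the site's 
robots.txt rules offline.  The sketch below uses Python's standard 
`urllib.robotparser`.  The robots.txt text and the crawler name 
"ExampleCrawler" are invented for the example and are not the real 
FreeBSD.org rules or the Atomz user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, NOT the real FreeBSD.org file.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /cgi/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


URL = "http://www.FreeBSD.org/cgi/url.cgi?ports/ftp/ncftpd/pkg-descr"
print(allowed_by_robots(EXAMPLE_ROBOTS, "ExampleCrawler", URL))
```

robots.txt is only advisory, so it cannot by itself produce a 403.  If 
the real rules allow the URL and the server still answers 403, the block 
is more likely enforced on the server side, for example by user agent or 
client address.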

That's where I'm at for now.  Hopefully I'll have something to show pretty 
quickly.

Thanks,

Mike Thompson
CTO/Co-Founder Atomz
mike@atomz.com

At 08:17 AM 9/9/02 -0400, Michael Lucas wrote:
>Mike,
>
>This sounds perfect.  If you can do your demo with thirty minutes of
>work, I'm fairly certain that what you can do in three days will blow
>our socks off.  :-)
>
>Thank you again!  Do let doc@FreeBSD.org know when you're ready for us
>to test your hard work.
>
>Regards,
>Michael
>
>On Fri, Sep 06, 2002 at 05:27:27PM -0700, Mike Thompson wrote:
> > Hi Michael,
> >
> > I'm finally getting through enough things this week to put my attention
> > towards a search engine for the FreeBSD.org web site.  As I mentioned in
> > my previous email, it would not be a problem with me if the doc team
> > would like to present multiple search engines to the web site visitors --
> > however, in my biased opinion I'm sure that many will like using the Atomz
> > search :-).  Also, we can certainly host the entire search interface if
> > that would work best for the documentation team as well.
> >
> > I would suggest we proceed in the following manner:
> >
> > 1.  Let me work with my IT team to create an area where we can host the
> > FreeBSD.org search page here at Atomz -- probably a search page similar to
> > the demo I sent in my first email at the following URL:
> >
> >          http://freebsd.atomz.com/
> >
> > 2.  I'll then take a stab at creating a complete search implementation for
> > the entire FreeBSD.org web site.  The demo I sent was just something I
> > threw together in about 30 minutes, but with some time and effort I think
> > an even better job can be done.  For instance, I would want to include the
> > entire mail archives in the search database.
> >
> > 3.  Give the FreeBSD documentation project a chance to look things over
> > and make comments or suggestions on improving search even further.  Also
> > test everything to make sure it works as desired.
> >
> > 4.  Work to link the Atomz Search page for the FreeBSD.org web site as
> > appropriate.
> >
> > Overall, I don't think this whole process should take more than two to
> > three weeks to complete.
> >
> > How does this sound with you?  At this time you will have a Search engine
> > for the FreeBSD.org web site that uses the same technology as used by such
> > companies as Macromedia, CBS News, Sharp Electronics, Palm Computing,
> > Handspring and many, many others.
> >
> > Mike Thompson
> > mike@atomz.com
> >
> >
> > At 12:34 PM 9/5/02 -0400, Michael Lucas wrote:
> > >Hello,
> > >
> > >We've discussed your offer at some length.  First, we're extremely
> > >appreciative of this offer.  Many of us have checked out the
> > >demonstration you prepared, and are quite impressed.  Thank you very
> > >much for presenting this to us!
> > >
> > >The -doc team would like to add the Atomz search engine capability to
> > >our existing search system, and give our users the choice of which
> > >engine to use.  Would this be acceptable to you?  If so, please do let
> > >us know how you'd like to proceed.  By far, the simplest thing for us
> > >would be to provide a link to you and let you provide the complete
> > >search interface, but that might not be what you're offering, so do
> > >let us know.
> > >
> > >In any event, we're glad that you find FreeBSD useful, and we're quite
> > >pleased that you wish to give back to the community.
> > >
> > >Regards,
> > >Michael Lucas
> > >FreeBSD Documentation Project/Donations Liaison Officer
> > >
> > >
> > >On Thu, Aug 22, 2002 at 10:19:22PM -0700, Mike Thompson wrote:
> > > > Hello,
> > > >
> > > > I don't know if you have heard of us, but Atomz is one of the leading
> > > > providers of hosted web-native search and content management
> > > > applications delivered over the Internet.  Since our founding in 1998,
> > > > our services have been powered by FreeBSD and we now have over 55,000
> > > > web sites using Atomz hosted services.  Companies that use Atomz hosted
> > > > services include Macromedia, Palm, Handspring, Maxtor, CBS News, Sony
> > > > Playstation, Princess Cruises, Kohler, U.S. Customs Department, Johnson
> > > > & Johnson Lifescan, Red Herring Magazine, and the list goes on and on.
> > > >
> > > > I wanted to offer the use of the Atomz search service for the
> > > > FreeBSD.org web site for free.  I believe that this could improve the
> > > > overall quality of the FreeBSD.org web site and make it easier for
> > > > people to find information out about our favorite operating system --
> > > > FreeBSD.  I put together a quick example of Atomz Search for the
> > > > FreeBSD.org web site at the following URL:
> > > >
> > > > http://mike.demo.atomz.com/freebsd/
> > > >
> > > > This example is searching about 2600 documents on the FreeBSD.org web
> > > > site.  With a little more work we could search the entire email
> > > > archives in the same manner and the foreign language sections of the
> > > > site.
> > > >
> > > > Because of the value at Atomz we have found in FreeBSD, I would make
> > > > our services available to the FreeBSD.org web site for free.  We
> > > > normally sell our search service for $15,000 to $100,000 depending on
> > > > the customer requirements so I hope that you can see that this is a
> > > > very powerful service.  I would also offer technical assistance from
> > > > our company to assist with making sure FreeBSD.org is getting a top
> > > > quality implementation of our web service.  I would also be willing to
> > > > offer our content management system, Atomz Publish, as well if you
> > > > think that would be useful to the FreeBSD organization.
> > > >
> > > > In any case, I hope that this offer seems attractive.  If you have
> > > > questions or would like to discuss this further, please contact me at
> > > > mike@atomz.com or by phone at 650-244-5602.
> > > >
> > > > Thank you for helping to support the creation of a wonderful operating
> > > > system.
> > > >
> > > > Mike Thompson
> > > > CTO/Co-Founder Atomz
> > > > mike@atomz.com
> > > >
> > > >
> > > > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > > > with "unsubscribe freebsd-doc" in the body of the message
> > >
> > >--
> > >Michael Lucas           mwlucas@FreeBSD.org, mwlucas@BlackHelicopters.org
> > >http://www.oreillynet.com/pub/q/Big_Scary_Daemons
> > >
> > >            Absolute BSD:   http://www.AbsoluteBSD.com/
>
>--
>Michael Lucas           mwlucas@FreeBSD.org, mwlucas@BlackHelicopters.org
>http://www.oreillynet.com/pub/q/Big_Scary_Daemons
>
>            Absolute BSD:   http://www.AbsoluteBSD.com/


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-doc" in the body of the message



