Date:      Fri, 29 Aug 1997 17:43:22 +0200
From:      Stefan Bethke <stefan@promo.de>
To:        John Fieber <jfieber@indiana.edu>
Cc:        "Jordan K. Hubbard" <jkh@time.cdrom.com>, www@FreeBSD.ORG
Subject:   Re: Something I've always wanted to see with the mailing list search
Message-ID:  <l03102802b02c8925d358@[194.45.188.81]>
In-Reply-To:  <Pine.BSF.3.96.970829075012.341E-100000@fallout.campusview.indiana.edu>
References:  <l03102801b02c54a7fe9c@[194.45.188.81]>

At 15:52 +0200 on 29.08.1997, John Fieber wrote:
>For a good discussion of the issues, see:
>
>  David D. Lewis and Kimberly A Knowles (1997) Threading
>  electronic mail: a preliminary study.  Information Processing &
>  Management, 33(2):209-217.
>

Is this a book? If so, do you have an ISBN or something? Or do you know of
any online resource?

>It turns out that breaking messages down into quoted and
>unquoted chunks, indexing them separately, and using vector space
>similarity measures (what freeWAIS uses) for retrieval is more
>accurate in retrieving what a human would consider to be a
>message thread than following subject lines, in-reply-to or
>references fields. In the absence of those fields, it is really
>the only way to discover a thread.

You might be right. But given the amount of spare time I can put into this
project, I'll stick to In-Reply-To:/References: and Subject:. Hopefully,
the code will be modular enough that this can be changed later.

>As for constructing threads at index time, this may be best for
>efficiency but extra care must be given to how threads are
>represented.  For example, an "in-reply-to" linked tree may
>contain several distinct, but related threads.  It should be
>possible to get at the sub-threads individually, as well as the
>larger thread.  This means that any message may have multiple
>thread membership, either directly or indirectly via some thread
>record with pointers to parent/child threads.  Ultimately, I
>would hope for thread discovery at search time rather than
>indexing time because it offers much more flexibility in tweaking
>various dimensions of the thread concept--broadening or narrowing
>the boundaries, building threads that cross boundaries between
>in-reply-to message trees, etc....

For the first step, it won't be a linked tree but a list: given a message,
the in-reply-to id is used to look up the thread id of the referenced
message, thus assigning all follow-ups the same thread id. Yes, this
leaves room for improvement :-)
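
Roughly, that first step could look like the sketch below. It is only an
illustration, not the actual code: the function name and the
(message id, in-reply-to id) tuple layout are made up, and it assumes
messages are indexed in arrival order, so a parent is normally seen before
its replies. A message whose parent is unknown simply starts a new thread.

```python
from typing import Dict, Iterable, Optional, Tuple


def assign_thread_ids(
    messages: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, int]:
    """Map each message id to a thread id (hypothetical helper).

    `messages` yields (message_id, in_reply_to) pairs in arrival order.
    """
    thread_of: Dict[str, int] = {}
    next_thread_id = 0
    for message_id, in_reply_to in messages:
        if in_reply_to is not None and in_reply_to in thread_of:
            # Follow-up: inherit the thread id of the referenced message.
            thread_of[message_id] = thread_of[in_reply_to]
        else:
            # No parent, or parent not indexed yet: start a new thread.
            thread_of[message_id] = next_thread_id
            next_thread_id += 1
    return thread_of
```

Note that every follow-up ends up in the same flat list as its root, so
sub-threads inside one in-reply-to tree are not distinguished at all.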

How do you determine the border between two threads that are linked to the
same ancestors? My general feeling is that I'd rather look through 100
messages to find the one I want than let an "intelligent" system present
only one to me, and that one not being the one I'm looking for.


Cheers,
Stefan



--
Stefan Bethke
Promo Datentechnik      |  Tel. +49-40-851744-0
+ Systemberatung GmbH   |  Fax. +49-40-851744-44
Eduardstrasse 46-48     |  e-mail: stefan@Promo.DE
D-20257 Hamburg         |  http://www.Promo.DE/
