From owner-freebsd-multimedia  Fri Dec 12 12:33:10 1997
Return-Path: <owner-freebsd-multimedia>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id MAA22229
          for multimedia-outgoing; Fri, 12 Dec 1997 12:33:10 -0800 (PST)
          (envelope-from owner-freebsd-multimedia)
Received: from cerberus.partsnow.com (gatekeeper.partsnow.com [207.155.26.98])
          by hub.freebsd.org (8.8.7/8.8.7) with ESMTP id MAA22218
          for <multimedia@freebsd.org>; Fri, 12 Dec 1997 12:33:04 -0800 (PST)
          (envelope-from don@partsnow.com)
Received: (from bin@localhost) by cerberus.partsnow.com (8.8.5/8.6.9) id MAA08775; Fri, 12 Dec 1997 12:32:41 -0800 (PST)
X-Authentication-Warning: cerberus.partsnow.com: bin set sender to <don@partsnow.com> using -f
Received: from wildeweb(192.168.100.10) by cerberus.partsnow.com via smap (V2.0)
	id xma008772; Fri, 12 Dec 97 12:32:13 -0800
Message-ID: <34919F14.1295DF6D@partsnow.com>
Date: Fri, 12 Dec 1997 12:31:16 -0800
From: Don Wilde <don@partsnow.com>
Organization: Soligen, Incorporated
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Amancio Hasty <hasty@rah.star-gate.com>, don@partsnow.com
CC: Multi Media <multimedia@freebsd.org>
Subject: Re: remote controls
References: <199712121824.KAA02267@rah.star-gate.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-multimedia@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

-- Amancio Hasty wrote:
> 
> Yep, I know about that hole in our current multimedia offerings and
> I am still keeping an eye on how to get voice-recognition into
> FreeBSD.
> 
The voice code was actually very simple, and ran on the 8-bit
microcontroller. Basically, you pressed a button to get it to listen, it
sent out a TV MUTE command, then it sampled 5 frequency bands until
sound went away or it ran out of DRAM, then it did a quick FFT and
compared against its known samples. The code was further constrained
such that it would only expect a limited subset of the words at any
given point in the programming sequence, and that it had to be trained
for each of four possible voices.
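
A rough sketch of that match step, in Python rather than anything like
the original microcontroller code. Everything specific here is an
assumption for illustration: the five band frequencies, the function
names (band_energy, features, recognize), and the use of a Goertzel
filter per band in place of the FFT described above. The "allowed"
argument stands in for the limited subset of words expected at a given
point in the sequence.

```python
import math

# Assumed band centre frequencies in Hz; the values are illustrative only.
BANDS_HZ = [300, 600, 1200, 2400, 3600]


def band_energy(samples, rate, freq):
    """Power of one frequency in a block of samples (Goertzel filter)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def features(samples, rate):
    """Five band energies, normalised so loudness does not matter."""
    energies = [band_energy(samples, rate, f) for f in BANDS_HZ]
    total = sum(energies) or 1.0  # avoid dividing by zero on silence
    return [e / total for e in energies]


def recognize(samples, rate, templates, allowed):
    """Return the allowed word whose trained template is nearest."""
    feat = features(samples, rate)
    best_word, best_dist = None, float("inf")
    for word in allowed:
        dist = sum((a - b) ** 2 for a, b in zip(feat, templates[word]))
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word
```

Training would amount to storing features() of each spoken word in
"templates", one such table per voice. Passing a short "allowed" list
keeps the comparison cheap and cuts down on false matches.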

Our Pentiums ought to be able to perform software filtering on a disk
file stream in near-real-time, certainly well enough for discrete speech
command recognition, even without such tricks as MMX.

It depends on what you want. If you expect the world, i.e. full
connected-speech, speaker-independent recognition of umpteen thousands
of words, ainna gonna happen. Kurzweil and Dragon have been fighting for
years with megadollars of R&D backing to get to the point where such
systems are even minimally usable. Command recognition with an
attention-button trigger is fairly simple. Even command-word trigger,
where the input is always scanning to hear its trigger word, is not
exorbitantly processor-intensive. Could be done :)
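
To make the trigger idea concrete, here is a minimal sketch of the
capture loop, assuming audio arrives as fixed-size frames of float
samples. The names (frame_energy, capture_after_trigger), the silence
threshold, and the frame limit are hypothetical. The is_trigger
callback is a placeholder for either the attention button or a
trigger-word matcher.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def capture_after_trigger(frames, is_trigger,
                          silence_threshold=0.01, max_frames=50):
    """Yield one list of frames for each utterance following a trigger.

    Capture ends when the sound goes away or the buffer limit
    (max_frames, standing in for running out of memory) is reached.
    """
    armed = False
    utterance = []
    for frame in frames:
        if not armed:
            # Idle: do nothing but watch for the trigger.
            armed = is_trigger(frame)
            continue
        quiet = frame_energy(frame) < silence_threshold
        if not quiet:
            utterance.append(frame)
        if utterance and (quiet or len(utterance) >= max_frames):
            yield utterance
            armed, utterance = False, []
    if utterance:
        # Stream ended mid-utterance; hand over what was captured.
        yield utterance
```

Each yielded utterance would then go to the matching step. The idle
branch is the only part that runs all the time, which is why an
always-listening trigger need not cost much.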

I'll tell you, though, Amancio, we're sure building _expensive_ TVs
here...!


  oooOOO O O O o * * *  *   *   *
 o     ___       _________ _________ ________ _________ _________ ___==_
 V_=_=_DW ===--- Don Wilde [don@PartsNow.com] [http://www.PartsNow.com ]
/oo0000oo-oo--oo-ooo---ooo-ooo---ooo-ooo--ooo-ooo---ooo-ooo---ooo-oo--oo
