Date:      Wed, 4 Aug 2004 09:49:54 -0700
From:      Brooks Davis <brooks@one-eyed-alien.net>
To:        Kathy Quinlan <kat-free@kaqelectronics.dyndns.org>
Cc:        freebsd-hardware@freebsd.org
Subject:   Re: Big Problem
Message-ID:  <20040804164954.GB10063@Odin.AC.HMC.Edu>
In-Reply-To: <4110C9C7.6080506@kaqelectronics.dyndns.org>
References:  <4110C9C7.6080506@kaqelectronics.dyndns.org>


On Wed, Aug 04, 2004 at 07:34:31PM +0800, Kathy Quinlan wrote:
> Hi Guys and Gals,
> 
> First off, I am not a troll; this is a serious email. I cannot go into 
> too many fine points as I am bound by an NDA.
> 
> The problem:
> 
> I need to hold a text file in RAM; in the foreseeable future the file 
> could be up to 10TB in size.
> 
> My Options:
> 
> Design a computer (probably multiple AMD 64's) to handle 10TB of memory 
> (+ a few extra Gb of ram for system overhead) and hold the file in one 
> physical computer system.
> 
> Build a server farm and have each server hold a portion, e.g. 4GB per 
> server (250 servers, plus a few extra for system overhead).

That only gets you to 1TB; at 4GB per box, 10TB takes 2500 servers...

> The reason the file needs to be in RAM is that I need speed of search 
> for patterns in the data (less than 1 second to pull out relevant chunks)
>
> I am sure I have missed some options, right now I am just kicking ideas 
> around, the software will be based on FreeBSD with some major 
> modifications to address the large amount of ram (probably set it up as 
> a virtual drive with one file)

Depending on your budget, I'd either give Cray or SGI a call, or build
a cluster of AMD64 machines.

You can get 16GB in a 1U chassis, so that cuts your requirements to
around 700 machines; call it 18 racks, not counting the networking
gear.  You will not be able to use that as a striped RAM disk that a
single machine searches.  First, there's no way you'll maintain any
kind of uptime if you do that: with 5600 DIMMs, you'll lose at least
one a week, probably more.  Second, even assuming you could process
one 64-bit word per cycle with enough bandwidth to feed it, a single
~2GHz CPU would still need about 625 seconds to get through 10TB
(10^13 bytes / 8 bytes per word / 2x10^9 cycles per second).  What you
will need to do is build a distributed application that a) runs the
processing on each machine and b) provides a mechanism for fault
tolerance in the face of machine failures.
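
To make that scatter/gather shape concrete, here is a minimal Python
sketch.  The hostnames, the search_shard() stand-in, and the two-second
deadline are all invented for illustration; a real deployment would
make an RPC to each box.  The point is that every node scans only the
shard it holds in RAM, and the coordinator writes off nodes that miss
the deadline instead of failing the whole query:

# Minimal scatter/gather sketch.  NODES, search_shard(), and DEADLINE
# are hypothetical stand-ins, not anything from this thread.
import concurrent.futures

NODES = ["node%03d.example.net" % i for i in range(700)]  # ~16GB shard each
DEADLINE = 2.0  # seconds before a node is written off as failed

def search_shard(node, pattern):
    # Stand-in for an RPC: each node scans only the shard held in its
    # own RAM, so the scan parallelizes across all 700 boxes.
    shard = "...the slice of the 10TB file held in RAM on " + node + "..."
    return [(node, i) for i in range(len(shard))
            if shard.startswith(pattern, i)]

def distributed_search(pattern):
    hits, failed = [], []
    with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
        futures = {pool.submit(search_shard, n, pattern): n for n in NODES}
        done, not_done = concurrent.futures.wait(futures, timeout=DEADLINE)
        for fut in done:
            try:
                hits.extend(fut.result())
            except Exception:
                failed.append(futures[fut])   # node crashed mid-query
        for fut in not_done:                  # node too slow or down:
            fut.cancel()                      # report it, don't abort
            failed.append(futures[fut])
    return hits, failed

if __name__ == "__main__":
    matches, down = distributed_search("RAM")
    print("%d matches; %d nodes unavailable" % (len(matches), len(down)))

In a real cluster you'd also keep each shard replicated on a second
machine, so a written-off node costs you a retry rather than a hole in
the results; that is the sort of thing the Google material mentioned
below covers.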

You would do well to read up on the techniques Google uses to manage
unreliable systems and provide high-performance search.

-- Brooks

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4

