Date:      Fri, 20 Sep 1996 00:50:58 -0700
From:      "Michael L. VanLoon -- HeadCandy.com" <michaelv@MindBender.serv.net>
To:        Joe Greco <jgreco@brasil.moneng.mei.com>
Cc:        henrich@crh.cl.msu.edu, freebsd-isp@freebsd.org
Subject:   Re: News server... 
Message-ID:  <199609200751.AAA14502@MindBender.serv.net>
In-Reply-To: Your message of Thu, 19 Sep 96 14:13:52 -0500. <199609191913.OAA11376@brasil.moneng.mei.com> 


[...]
>> being read and written simultaneously).  I would be "done" by now if I
>> had more time...  Is there anything specific you would like to see in
>> something like this?

>We would all be "done" by now if we had more time.

That was my point. :-)  There are lots of things I intended to have
done by now.  Real time doesn't seem to follow virtual time very well,
however...

>If you really want to be faithful to news, it would have to have
>25000 hierarchically stacked directories...  and it is hard to
>do it the way news does it...  basically the two big "fan outs"
>are at /news and /news/alt, each of which generally hold hundreds
>of subdirectories...

Well, I was thinking a little more like this:

There would be three stages -- populate, run, and cleanup.

Populate would create a specified number of directories to a specified
depth, but randomly (although you could set the seed).  It would also
create a specified number of files of random size at the same time
it's doing directories.  It would do these at the same time to perturb
the FFS behavior so that it hopefully wouldn't operate in a completely
"perfect" fashion (i.e., fragment a few cylinder grouped directories
to mirror more "real life" used filesystems).  This would mainly apply
to testing on a clean drive, of course.
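
A minimal sketch of the populate stage described above, in Python. The
function name, its parameters, and the file naming scheme are all
assumptions made for illustration, not part of any existing tool. It
interleaves directory and file creation from one seeded generator, so
the same seed reproduces the same tree.

```python
import os
import random
import tempfile

def populate(root, n_dirs, max_depth, n_files, max_size, seed=0):
    """Create n_dirs directories and n_files files under root, interleaved.

    max_depth must be at least 1. All names here are hypothetical.
    """
    rng = random.Random(seed)          # private generator, so runs repeat
    dirs = [(root, 0)]                 # (path, depth) of every directory so far
    files = []
    # Shuffle both kinds of work together so creations are interleaved.
    jobs = ["dir"] * n_dirs + ["file"] * n_files
    rng.shuffle(jobs)
    for i, job in enumerate(jobs):
        if job == "dir":
            # Only parents above the depth limit may take a new child.
            parent, depth = rng.choice([d for d in dirs if d[1] < max_depth])
            path = os.path.join(parent, "d%06d" % i)
            os.mkdir(path)
            dirs.append((path, depth + 1))
        else:
            parent, _ = rng.choice(dirs)
            path = os.path.join(parent, "f%06d" % i)
            with open(path, "wb") as f:
                f.write(b"x" * rng.randint(1, max_size))
            files.append(path)
    return [d for d, _ in dirs], files

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        made_dirs, made_files = populate(top, 20, 4, 100, 16384, seed=1996)
        print(len(made_dirs), "directories,", len(made_files), "files")
```

Cleanup would then amount to removing everything under root.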

Run would spawn a specified number of writer processes and a specified
number of reader processes.  The "root" process would talk to them all
via full-duplex pipes.  It would keep track of the known hierarchy of
directories and files, and would issue the commands to each of the
readers and writers as they came up for another task.  (This would
really be cool with a single process and multiple non-blocking
threads, but....)  The readers/writers would just wait for a task,
execute it, send the result back, and wait for the next task.
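
The root/worker arrangement above could be sketched roughly as follows
in Python, assuming a Unix-like system where the "fork" start method is
available. The names worker and run_root are hypothetical, and the
"execution" of a task is a placeholder (it just returns the length of
the argument) where a real worker would read or write a file.

```python
import multiprocessing as mp
from multiprocessing.connection import wait

def worker(conn):
    """Wait for a task, execute it, send the result back, repeat."""
    while True:
        task = conn.recv()
        if task is None:               # sentinel: no more work
            break
        kind, arg = task
        # Placeholder for the real read/write/delete operation.
        conn.send((kind, arg, len(arg)))
    conn.close()

def run_root(tasks, n_workers=3):
    """Hand tasks to whichever worker comes up for another one."""
    ctx = mp.get_context("fork")       # assumption: Unix-like system
    pipes, procs = [], []
    for _ in range(n_workers):
        parent_end, child_end = ctx.Pipe(duplex=True)   # full-duplex pipe
        proc = ctx.Process(target=worker, args=(child_end,))
        proc.start()
        child_end.close()              # the child keeps its own copy
        pipes.append(parent_end)
        procs.append(proc)

    pending = iter(tasks)
    busy, results = [], []
    for conn in pipes:                 # give every worker a first task
        task = next(pending, None)
        if task is None:
            break
        conn.send(task)
        busy.append(conn)
    while busy:
        for conn in wait(busy):        # block until some worker answers
            results.append(conn.recv())
            task = next(pending, None)
            if task is None:
                busy.remove(conn)
            else:
                conn.send(task)
    for conn in pipes:                 # shut everything down
        conn.send(None)
        conn.close()
    for proc in procs:
        proc.join()
    return results

if __name__ == "__main__":
    print(run_root([("read", "alt/test/1"), ("write", "comp/unix/2")]))
```

Because only the root issues commands, the order in which tasks are
handed out is fixed by the task list, although the order in which
results come back can still vary from run to run.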

File sizes would be random, but from a weighted scale (so you could
specify any mix you like).  I would also like to put the task ratios
on a weighted scale.  For the writer, you would have 1) write
a file, 2) create a new directory, 3) delete a file.  For the readers,
of course, you would just read a specified file from a specified
location, after a specified "random" sleep.
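
One way the weighted scales could look, sketched in Python. The task
names, the size buckets, and every weight below are made-up values for
illustration only; the point is that both the task mix and the file
size mix are drawn from weighted tables using one seeded generator.

```python
import random

# Hypothetical weights: (value, relative weight).
WRITER_TASKS = [("write_file", 70), ("make_dir", 5), ("delete_file", 25)]
SIZE_BUCKETS = [((512, 2048), 50), ((2048, 16384), 40), ((16384, 262144), 10)]

def pick_weighted(rng, table):
    """Pick one value from a list of (value, weight) pairs."""
    values = [v for v, _ in table]
    weights = [w for _, w in table]
    return rng.choices(values, weights=weights, k=1)[0]

def next_writer_task(rng):
    """Return (task, size); size is None unless the task writes a file."""
    task = pick_weighted(rng, WRITER_TASKS)
    if task == "write_file":
        low, high = pick_weighted(rng, SIZE_BUCKETS)
        return task, rng.randint(low, high)
    return task, None

if __name__ == "__main__":
    rng = random.Random(1996)
    for _ in range(5):
        print(next_writer_task(rng))
```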

I figured you could tell it to run until: 1) a certain amount of time
had elapsed, 2) a certain number of files/directories had been read,
written, and/or created, and/or 3) the disk becomes less than X% full.
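
The three stop conditions could be checked with something like the
Python sketch below. The function and parameter names are assumptions.
The disk condition is implemented literally as worded above (stop once
the filesystem is less than X% full); if the opposite direction is
wanted, the comparison would simply be flipped.

```python
import shutil

def percent_full(path):
    """Percentage of the filesystem holding path that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def should_stop(elapsed, ops_done, pct_full,
                max_seconds=None, max_ops=None, min_pct_full=None):
    """Return the name of the first limit reached, or None to keep going.

    A limit left as None is not checked.
    """
    if max_seconds is not None and elapsed >= max_seconds:
        return "time"
    if max_ops is not None and ops_done >= max_ops:
        return "ops"
    # Literal reading of the condition: disk is less than X% full.
    if min_pct_full is not None and pct_full < min_pct_full:
        return "disk"
    return None

if __name__ == "__main__":
    print(should_stop(elapsed=12.0, ops_done=500, pct_full=percent_full("."),
                      max_seconds=60, max_ops=400))
```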

Of course, cleanup would just undo the whole mess.

>It would also be beneficial to weight different directories with different
>numbers of articles to be "held"... hmmm  argh does that ever get complex!

I was hoping the random number generator would just do this for me.

>1) It would have to be repeatable.  Use a seeded random function generator.
>   While I am willing to run a test long enough to judge "average"
>   behavior, there is no reason I can think of not to use a seeded random
>   generator.

Yes, I agree.  My only concern is the multiple asynchronous processes.
But, since the "root" process issues all commands, I figure the order
should stay fairly predictable.

>2) Multiple(!!) readers, single writer.

Actually, I intended that you specify the number of each.

>   Single writer adds file "N+1"
>   to "random" directory, some size between 1K-16K, at the rate of
>   five per second.

I had intended that the writer(s) would be able to write as fast as
they could get CPU and disk.  That would be the closest thing to a
full incoming feed, don't you think?

>   Multiple readers read random articles out of
>   random directories, but may have a tendency to read two or three
>   articles out of the same directory.  This might be something to make
>   a configurable parameter... i.e. 50% chance the next article comes
>   out of same directory.  This is particularly important due to the
>   fact that behavior on a reader machine and on a feeder machine is
>   very different.  Actually it would really be good to provide that
>   same capability for the writer too.  I have seen some behavior that
>   I am trying to explain, related to writers and pre-sorted article
>   lists.

Hmmm...  Interesting.  I hadn't thought of this, but it's worth
thinking about.
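
For what it's worth, the "chance the next article comes out of the same
directory" idea could be a single parameter, as in this Python sketch.
The names next_article and p_same are hypothetical, and tree is assumed
to be a plain mapping from directory name to a list of article names.

```python
import random

def next_article(rng, tree, last_dir, p_same=0.5):
    """Pick (directory, article), staying in last_dir with chance p_same."""
    if last_dir is not None and rng.random() < p_same:
        directory = last_dir
    else:
        # A fresh pick may still land on last_dir by chance.
        directory = rng.choice(sorted(tree))
    return directory, rng.choice(tree[directory])

if __name__ == "__main__":
    tree = {"alt.test": ["1", "2", "3"], "comp.unix": ["1", "2"]}
    rng = random.Random(1996)
    current = None
    for _ in range(5):
        current, article = next_article(rng, tree, current)
        print(current, article)
```

The same function would serve the writer, with p_same set high to
imitate a pre-sorted article list.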

>3) Maybe an "expire" process that walks through and removes oldest
>   article(s) in a directory.

I figured that "expire" would be represented by the "delete a file"
task in the writer, with the frequency controlled by the weighted ratio.

>4) The minimum, average, and maximum times to perform a specific type
>   of operation.  I see four operations:
>   a) article write in different dir
>   b) article write in same dir
>   c) article read in different dir
>   d) article read in same dir

I think you want to break it down into tasks for the reader, and tasks
for the writer.  The question is whether to have a separate "expirer",
or to just let that be another task for the writer.  Speaking of
which, it might not be a bad idea to add a "stat" before each delete
operation.

One thing I want to measure accurately, in addition to "speed", is the
latency from the time each command is issued to the time control is
returned to that process to do something else.
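
A rough Python sketch of how per-operation latency might be collected,
giving the minimum, average, and maximum for each type of operation.
The class name and the operation labels are assumptions; the timing
covers the interval from issuing the call to getting control back.

```python
import time
from collections import defaultdict

class LatencyStats:
    """Collect call latencies per operation type."""

    def __init__(self):
        self.samples = defaultdict(list)

    def timed(self, op, func, *args):
        """Run func(*args), record how long it took under label op."""
        start = time.perf_counter()
        result = func(*args)
        self.samples[op].append(time.perf_counter() - start)
        return result

    def summary(self):
        """Map each op to (count, minimum, average, maximum) in seconds."""
        return {op: (len(s), min(s), sum(s) / len(s), max(s))
                for op, s in self.samples.items()}

if __name__ == "__main__":
    stats = LatencyStats()
    for _ in range(3):
        stats.timed("read_same_dir", time.sleep, 0.01)
    for op, (count, low, avg, high) in stats.summary().items():
        print("%s: n=%d min=%.4f avg=%.4f max=%.4f" % (op, count, low, avg, high))
```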

>5) Number of readers configurable.  Potentially to a fairly large number
>   (worst case saturation testing).

This thing is nothing, if not configurable. :-)

>Michael, I do not expect your tool to do all these things, but if it
>did, it would answer a lot of "fine tuning" questions I personally have
>(but have not taken the time to find the answers to because a tool did not
>exist).  I could have written one but I do not place THAT much value on
>squeezing another 10-15% out of the system.

I'm a benchmark freak.  I just love to know exactly how things
interact and behave.  There is really no "logical" reason for me to
write this.  I just want to know... :-)

-----------------------------------------------------------------------------
  Michael L. VanLoon                           michaelv@MindBender.serv.net
        --<  Free your mind and your machine -- NetBSD free un*x  >--
    NetBSD working ports: 386+PC, Mac 68k, Amiga, Atari 68k, HP300, Sun3,
        Sun4/4c/4m, DEC MIPS, DEC Alpha, PC532, VAX, MVME68k, arm32...
    NetBSD ports in progress: PICA, others...
-----------------------------------------------------------------------------


