From owner-freebsd-questions Sun Feb 1 12:56:00 1998
Return-Path: <owner-freebsd-questions@FreeBSD.ORG>
Received: (from majordom@localhost)
	by hub.freebsd.org (8.8.8/8.8.8) id MAA05051
	for questions-outgoing; Sun, 1 Feb 1998 12:56:00 -0800 (PST)
	(envelope-from owner-freebsd-questions@FreeBSD.ORG)
Received: from base486.home.org (root@imdave.pr.mcs.net [205.164.3.77])
	by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id MAA05043
	for ; Sun, 1 Feb 1998 12:55:52 -0800 (PST)
	(envelope-from imdave@mcs.net)
Received: (from imdave@localhost)
	by base486.home.org (8.8.8/8.8.8) id OAA02532;
	Sun, 1 Feb 1998 14:52:17 -0600 (CST)
Date: Sun, 1 Feb 1998 14:52:17 -0600 (CST)
From: Dave Bodenstab <imdave@mcs.net>
Message-Id: <199802012052.OAA02532@base486.home.org>
To: grobin@accessv.com
Subject: Re: Setting Max open files for database CGI
Cc: questions@FreeBSD.ORG
Sender: owner-freebsd-questions@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG
X-To-Unsubscribe: mail to majordomo@FreeBSD.org "unsubscribe questions"

Here comes my $.02...

> From: Geoffrey Robinson
>
> I'm working on a CGI in C that has to be fast enough to run at *least*
> 40 times a second on a P-200, so it has to be coded for speed above all
> else.  The program is basically a database that deals with client data.
> Originally the design was centered around two data files: a fixed-width-
> record binary hash file and a fixed-width-record unsorted binary file.
> The hash file was to contain client data that would be looked up by the
> query string.  The client records in the hash file would contain
> pointers to one or more secondary records in the unsorted data file.
> This was the fastest system I could think of, but when I brought file
> locking into the equation it created a bottleneck (only one instance of
> the CGI can access the file at a time).  I'm aware that I can lock
> specific ranges of bytes in a file, but my experience is limited, so I
> can't let this get too complicated or it'll never get done.

Read fcntl(2) carefully -- if you've already implemented file locking,
then you're already 75% of the way there.

> I happened to stumble on a train of thought that led me to the idea of
> using a separate file for each record instead of two files containing
> all the records.  That would further speed things up (but not much) by
> eliminating the hash search: the query string becomes the file name of
> the main client record, and the client record contains the file names
> of the secondary records.  The files would still have to be locked when
> in use, but other instances could operate on other records at the same
> time.  There will never be more than a few hundred records/files at a
> time in the entire database.
>
> I think this would run faster than the other system and would be easier
> to write.  What I want to know is: do the experienced programmers in
> this group agree with me, or am I making a big mistake?  Are there any
> hidden performance penalties involved in the model?

The additional disk accesses for the directory search during open and
the initial read to get the data into the buffer cache would likely make
your application much slower.  Your P200 can probably do thousands of
hash searches in the time it takes to do a single disk I/O.  Go back and
use file locking with fcntl's F_GETLK/F_SETLK/F_SETLKW calls -- there's
a rough sketch below.

If your database is fairly small compared to your memory size, you might
even consider loading it entirely into shared memory.  That eliminates
all disk I/O.  You could then use semaphores for your locking.
(Admittedly, figuring out how to use semop(2) can take some time -- see
the second fragment below.)
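Something along these lines is all the fcntl(2) record locking takes.
It's a sketch, not tested code: RECSIZE and the record numbering are
invented here, so map them onto your real hash-file layout.

	/*
	 * Lock or unlock one fixed-width record with fcntl(2).
	 * F_SETLKW blocks until the lock is granted; use F_SETLK
	 * instead if you'd rather get EAGAIN and retry.
	 */
	#include <sys/types.h>
	#include <fcntl.h>
	#include <unistd.h>

	#define RECSIZE 128	/* made-up record width -- use yours */

	int
	lock_record(int fd, off_t recno, short type)
	{
		struct flock fl;

		fl.l_type = type;		/* F_RDLCK, F_WRLCK or F_UNLCK */
		fl.l_whence = SEEK_SET;
		fl.l_start = recno * RECSIZE;	/* just this record... */
		fl.l_len = RECSIZE;		/* ...not the whole file */
		return (fcntl(fd, F_SETLKW, &fl));
	}

Call lock_record(fd, n, F_WRLCK) before touching record n and
lock_record(fd, n, F_UNLCK) when done; two CGI instances only collide
when they go after the same record.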
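And for the shared-memory route, the semop(2) part boils down to a
single semaphore used as a mutex.  Again just a sketch -- the shared
segment, the semaphore creation and all error handling are omitted:

	/*
	 * One-time setup elsewhere:
	 *	semid = semget(key, 1, IPC_CREAT | 0600);
	 * then initialize the semaphore to 1 via semctl(..., SETVAL, ...).
	 */
	#include <sys/types.h>
	#include <sys/ipc.h>
	#include <sys/sem.h>

	static struct sembuf acquire = { 0, -1, SEM_UNDO };	/* wait & take */
	static struct sembuf release = { 0,  1, SEM_UNDO };	/* give back */

	int
	db_lock(int semid)
	{
		return (semop(semid, &acquire, 1));
	}

	int
	db_unlock(int semid)
	{
		return (semop(semid, &release, 1));
	}

SEM_UNDO makes the kernel back the operation out if a CGI instance dies
while holding the lock, which matters when the web server can kill your
processes at any time.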
You could then use a daemon process to monitor activity and dump any
changes back to disk.  If your CGI programs must wait until any database
changes have been committed to disk, then the disk I/O will be the
limiting factor in your throughput.

> Anyway, about open files.  I wrote a test that attempts to open the
> same file 1000 times.  When run as root the program gets to about 800
> before I get a kernel error, which is more than I will ever need.  But
> when I run the same program as a regular user it gets to 60 before the
> open fails.  I spent about an hour in /etc/login.conf trying to get it
> to go higher (at least 100) but I can't.  How do I set it higher?
> Also, does the openfiles=n line in /etc/login.conf refer to the max
> number of files a single instance of a program can have open, or is it
> the max number of files all the processes running under a particular
> user can have open?

Forget about the separate-files idea. ;-)  You lose performance due to
both the additional disk accesses and the resulting misses in the buffer
cache the kernel maintains for disk blocks.  Even if your transaction
rate keeps the data blocks in the buffer cache, the I/O from the inode
updates would be an additional load on the system.  (It may be a load
anyway, even with two files, with each CGI instance opening and closing
the files.)

Dave Bodenstab
imdave@mcs.net
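P.S.  Since it's the Subject line: openfiles in /etc/login.conf sets the
per-process descriptor limit (it becomes RLIMIT_NOFILE), not a total
across all of a user's processes, so every CGI instance gets its own
allotment.  Splice something like this into the login class your CGI
user belongs to (the number is only an example):

	default:\
		:openfiles=128:\
		... the rest of your existing entry ...

Then, if I remember right, you must rebuild the database with
`cap_mkdb /etc/login.conf' and log in again before the new limit takes
effect.  The wall you hit at ~800 opens as root is most likely the
system-wide open-file table (kern.maxfiles) filling up, which is a
separate knob.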