Date: Wed, 13 Aug 2008 18:56:13 +0200
From: cpghost <cpghost@cordula.ws>
To: Laszlo Nagy <gandalf@shopzeus.com>
Cc: freebsd-questions@freebsd.org
Subject: Re: Max. number of opened files, efficiency
Message-ID: <20080813165613.GB18638@epia-2.farid-hajji.net>
In-Reply-To: <48A2EBD7.9000903@shopzeus.com>
References: <48A2EBD7.9000903@shopzeus.com>
On Wed, Aug 13, 2008 at 04:12:39PM +0200, Laszlo Nagy wrote:
> How many files can I open under FreeBSD at the same time?

% sysctl -a | grep maxfiles
kern.maxfiles: 7880
kern.maxfilesperproc: 7092

But remember that you're already using a few hundred file descriptors,
so usually you won't have more than 6800 or so open files for your
application... unless you crank up those values (in /etc/sysctl.conf,
IIRC).

Your shell may also limit the number of open files (cf. openfiles below):

% limits
Resource limits (current):
  cputime              infinity secs
  filesize             infinity kB
  datasize               524288 kB
  stacksize                65536 kB
  coredumpsize         infinity kB
  memoryuse            infinity kB
  memorylocked         infinity kB
  maxprocesses             3546
  openfiles                7092
  sbsize               infinity bytes
  vmemoryuse           infinity kB

> Problem: I'm making a pivot table, and when I drill down into the
> facts, I would like to create a new temporary file for each possible
> dimension value. In most cases there will be fewer than 1000 dimension
> values. I tried to open 1000 temporary files and could do so within
> one second.
>
> But how efficient is that? What happens when I open 1000 temporary
> files and write data into them randomly, 10 million times (avg. 10,000
> write operations per file)? Will this be handled efficiently by the
> OS? Is efficiency affected by the underlying filesystem?

Wouldn't it be more efficient to use a DBM file (anydbm, bsddb),
indexed by dimension, for this?

You may also want to consider numpy and some modules in scipy for this
kind of computation: IIRC they have functions to efficiently store and
read back binary data to/from files. And numpy's ndarray has a nice
slice syntax too.

> I also tried to create 10,000 temporary files, but performance
> dropped.
>
> Example in Python:
>
> import tempfile
> import time
>
> N = 10000
> start = time.time()
> files = [tempfile.TemporaryFile() for i in range(N)]
> stop = time.time()
> print "created %s files/second" % int(N / (stop - start))
>
> On my computer this program prints "3814 files/second" for N=1000,
> and "1561 files/second" for N=10000.
>
> Thanks,
>
> Laszlo

Regards,
-cpghost.

-- 
Cordula's Web. http://www.cordula.ws/
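P.S. If you do decide to crank up those values, kern.maxfiles and
kern.maxfilesperproc are plain runtime sysctls, so lines like the
following in /etc/sysctl.conf should do it. The numbers here are only
illustrative, not a recommendation; size them to your workload:

kern.maxfiles=65536
kern.maxfilesperproc=32768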
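And here's a minimal sketch of the DBM idea, using Python 2's anydbm
module. The file name, key scheme and record format are made-up
placeholders, just to show the shape of it:

import anydbm

# One DBM file replaces the ~1000 temp files: one key per
# dimension value, serialized rows appended to that key's value.
db = anydbm.open('pivot.db', 'c')    # 'c' = create file if missing
key = 'dim-42'                       # hypothetical dimension key
row = '10.5;2008-08-13;foo\n'        # hypothetical serialized fact
try:
    db[key] = db[key] + row          # append to existing bucket
except KeyError:
    db[key] = row                    # first row for this dimension
db.close()

Note that each append rewrites the whole value, so this works best
while the per-dimension buckets stay reasonably small.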
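For the numpy route, ndarray.tofile() and numpy.fromfile() give you
fast raw binary I/O. Again just a sketch, with an arbitrary file name
and sample data:

import numpy

a = numpy.arange(1000, dtype=numpy.float64)   # some sample data
a.tofile('dim-values.bin')                    # raw binary dump, no header
b = numpy.fromfile('dim-values.bin', dtype=numpy.float64)
print b[10:20]                                # the slice syntax I mentioned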