From owner-freebsd-performance@FreeBSD.ORG Sun Feb 19 04:06:56 2006
Date: Sat, 18 Feb 2006 23:06:57 -0500
From: Mark Bucciarelli
To: freebsd-performance@freebsd.org
Subject: stat speed
Message-ID: <20060219040656.GT2756@rabbit>

I'm curious how fast stat is.

I generated a list of 200,000 file names

# find / | head -200000 > files.statspeed

then ran a million iterations of randomly picking a file name and
stating it (see attached program). The run times were pretty
consistent:

187,422 stats/second
189,059
189,567
189,894

Are these numbers a meaningful measure of stat speed on my particular
machine? If not, how can I improve the test?

m

Content-Disposition: attachment; filename="statspeed.c"

// statspeed.c
//
// Mark Bucciarelli
// 2006-02-18
//
// Read in a long list of file names.
// Randomly pick one, stat it, randomly pick another, stat
// it, etc.
//
// Use a big list of files to try and avoid file system caching.
//

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/time.h>

// $ find / | head -200000 > files.statspeed
#define fname "files.statspeed"
#define maxfiles 200000
#define maxchars (maxfiles * FILENAME_MAX) // FILENAME_MAX is 1024 on FreeBSD
#define statcalln 1000000
#define timeval2double(tv) ((double)(tv).tv_sec + (double)(tv).tv_usec / 1000000.0)

static char space[ maxchars ];
static char *files[ maxfiles ];

int
main( void )
{
    struct timeval t1, t2;
    double t1d, t2d;
    FILE *f;

    gettimeofday( &t1, NULL );

    // load list of file names
    f = fopen( fname, "r" );
    if ( !f ) {
        printf( "Couldn't open file '%s'.\n", fname );
        return 1;
    }
    char *p = space;
    long charn = 0;
    long filen = 0;
    while ( (charn + FILENAME_MAX) < maxchars && filen < maxfiles
            && fgets( p, FILENAME_MAX, f ) ) {
        size_t len = strlen( p );
        if ( len > 0 && p[len-1] == '\n' )
            p[--len] = '\0';
        files[filen++] = p;
        // step past the terminating NUL; otherwise the next fgets()
        // overwrites it and glues this name onto the next one
        p += len + 1;
        charn += len + 1;
    }
    fclose( f );

    // msg if too many file names
    if ( (charn + FILENAME_MAX) >= maxchars ) {
        printf( "%s has too many characters (%ld > %d)\n",
                fname, charn, maxchars );
        return 1;
    }
    if ( filen > maxfiles ) {
        printf( "%s has too many files (%ld > %d)\n",
                fname, filen, maxfiles );
        return 1;
    }
    if ( filen == 0 ) {
        printf( "%s contains no file names.\n", fname );
        return 1;
    }

    gettimeofday( &t2, NULL );
    t1d = timeval2double( t1 );
    t2d = timeval2double( t2 );
    printf( "%5.2f seconds for setup\n", t2d - t1d );

    // stat files
    double r;
    long filei;
    long i;
    long failed = 0;
    struct stat sb;
    gettimeofday( &t1, NULL );
    for ( i = 0; i < statcalln; i++ ) {
        // divide by RAND_MAX + 1 so r < 1 and filei stays in range
        r = ( (double) rand() ) / ( (double) RAND_MAX + 1.0 );
        filei = (long) ( r * filen );
        if ( stat( files[filei], &sb ) != 0 )
            failed++;
    }
    gettimeofday( &t2, NULL );
    t1d = timeval2double( t1 );
    t2d = timeval2double( t2 );

    // output results
    printf( "%5.2f seconds for %d stat calls (%ld failed)\n",
            t2d - t1d, statcalln, failed );
    if ( t2d - t1d > 0 )
        printf( "%ld calls/second\n", (long) ( statcalln / ( t2d - t1d ) ) );
    else
        printf( "Cain't touch this, da da da dum, da DUM, da DUM cain't touch this.\n" );
    return 0;
}

From owner-freebsd-performance@FreeBSD.ORG Sun Feb 19 04:17:51 2006
Date: Sat, 18 Feb 2006 23:17:52 -0500
From: Mark Bucciarelli
To: freebsd-performance@freebsd.org
Subject: Re: stat speed
Message-ID: <20060219041752.GU2756@rabbit>
In-Reply-To: <20060219040656.GT2756@rabbit>

On Sat, Feb 18, 2006 at 11:06:57PM -0500, Mark Bucciarelli wrote:
> I'm curious how fast stat is.
>
> I generated a list of 200,000 file names
>
> # find / | head -200000 > files.statspeed
>
> then ran a million iterations of randomly picking a file name and
> stating it (see attached program).

Hmmm, 200,000 files, 1,000,000 iterations. On avg, each file is hit
five times. Uhh, that's not a good way to avoid caching. Doh.

Wow, caching is pretty amazing. I just reran the program, this time
using 500,000 file paths and only stat'ing 10,000 of them.

The first run was 99,059/second, the second was 188,239.

So I guess 100,000/second is about right on my system w/o cache.

m

From owner-freebsd-performance@FreeBSD.ORG Sun Feb 19 04:35:27 2006
Date: Sat, 18 Feb 2006 22:35:24 -0600
From: Eric Anderson
To: freebsd-performance@freebsd.org
Subject: Re: stat speed
Message-ID: <43F7F58C.3020908@centtech.com>
In-Reply-To: <20060219041752.GU2756@rabbit>

Mark Bucciarelli wrote:
> On Sat, Feb 18, 2006 at 11:06:57PM -0500, Mark Bucciarelli wrote:
>
>> I'm curious how fast stat is.
>>
>> I generated a list of 200,000 file names
>>
>> # find / | head -200000 > files.statspeed
>>
>> then ran a million iterations of randomly picking a file name and
>> stating it (see attached program).
>>
>
> Hmmm, 200,000 files, 1,000,000 iterations. On avg, each file is hit
> five times. Uhh, that's not a good way to avoid caching. Doh.
>
> Wow, caching is pretty amazing. I just reran the program, this time
> using 500,000 file paths and only stat'ing 10,000 of them.
>
> The first run was 99,059/second, the second was 188,239.
>
> So I guess 100,000/second is about right on my system w/o cache.
>

I'm also wondering if by using find, and getting a list of
files/directories in the default order, you might be seeing some
results that aren't really completely random. What I mean is, your
find is traversing the tree, probably digging through directories
based on inode number or last modified time (can't recall which), but
either way, it's possible your list consisted of clumps of files/dirs
in the same cylinder groups, especially since you grabbed the first
500k files, instead of picking a random file from the entire list of
files on the filesystem, and building a list from that random
plucking.
This is all speculative, but if you had lots of files in a directory,
those could be clumped in a few cylinder groups and therefore you
might see higher numbers than sampling from the entire disk (since the
speed is probably mostly dominated by disk seeks, I believe).

What exactly are you trying to determine?

Eric

--
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------

From owner-freebsd-performance@FreeBSD.ORG Sun Feb 19 04:39:48 2006
Date: Sat, 18 Feb 2006 22:39:45 -0600
From: Eric Anderson
To: freebsd-performance@freebsd.org
Subject: Re: stat speed
Message-ID: <43F7F691.6030303@centtech.com>
In-Reply-To: <43F7F58C.3020908@centtech.com>

Eric Anderson wrote:
> [...]
> What exactly are you trying to determine?

You are also timing the rand() function.
I suggest randomizing the list first, then stating the files in the
randomized list.

Eric
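Eric's two suggestions, drawing the sample from the whole list instead of the first N lines of find output, and randomizing the list before the timed loop so rand() is not measured, can be combined roughly as below. This is a minimal sketch, not code from the thread: the helper names uniform_below, load_sample, shuffle and time_stats are hypothetical, reservoir sampling (Algorithm R) is one technique swapped in for picking uniformly from a list whose length is not known in advance, and it assumes the POSIX random(), strdup() and gettimeofday() calls are available.

```c
/* Hypothetical helpers for a stat() benchmark; not code from the thread. */
#define _XOPEN_SOURCE 700   /* expose random(), strdup(), gettimeofday() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>

/* Unbiased integer in [0, n). random() returns values in
 * [0, 2^31 - 1]; draws at or above the largest multiple of n are
 * rejected so that every result is equally likely. */
static unsigned long uniform_below(unsigned long n)
{
    const unsigned long range = 1UL << 31;
    unsigned long limit, r;

    assert(n > 0 && n <= range);
    limit = range - range % n;
    do {
        r = (unsigned long)random();
    } while (r >= limit);
    return r % n;
}

/* Reservoir sampling (Algorithm R): read newline-separated names
 * from `in` and keep a uniform random sample of at most k of them,
 * without knowing the total count in advance. Returns how many were
 * kept; the caller frees each entry. */
static long load_sample(FILE *in, char **sample, long k)
{
    char line[FILENAME_MAX + 2];
    long seen = 0, kept = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';       /* drop the newline */
        seen++;
        if (kept < k) {
            if ((sample[kept] = strdup(line)) == NULL)
                break;                          /* out of memory */
            kept++;
        } else {
            /* line number `seen` takes a slot with probability k/seen */
            unsigned long j = uniform_below((unsigned long)seen);
            if (j < (unsigned long)k) {
                char *copy = strdup(line);
                if (copy == NULL)
                    break;
                free(sample[j]);
                sample[j] = copy;
            }
        }
    }
    return kept;
}

/* Fisher-Yates shuffle, run once before timing so that random()
 * is not inside the measured loop. */
static void shuffle(char **a, long n)
{
    long i;

    for (i = n - 1; i > 0; i--) {
        long j = (long)uniform_below((unsigned long)i + 1);
        char *tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Call stat(2) on names[0..n-1] in order and return the elapsed
 * wall-clock seconds; failed calls are counted in *failed. */
static double time_stats(char **names, long n, long *failed)
{
    struct timeval t1, t2;
    struct stat sb;
    long i;

    *failed = 0;
    gettimeofday(&t1, NULL);
    for (i = 0; i < n; i++)
        if (stat(names[i], &sb) != 0)
            (*failed)++;
    gettimeofday(&t2, NULL);
    return (double)(t2.tv_sec - t1.tv_sec)
         + (double)(t2.tv_usec - t1.tv_usec) / 1e6;
}
```

A driver would seed with srandom(), feed the full output of find / to load_sample() with k set to, say, 10,000, call shuffle() on the kept names, then pass them to time_stats() and report n divided by the elapsed seconds. Since each sampled name is a distinct line of the input, one pass over the sample avoids the repeat hits Mark noticed, although a first run can still warm the cache for a second one, and uniform sampling alone does not rule out the cylinder-group clumping Eric describes.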