Date: Wed, 27 Dec 2006 20:00:07 +1300
From: Mark Kirkwood
To: David Xu
Cc: freebsd-performance@freebsd.org, bde@zeta.org.au
Subject: Re: Cached file read performance
Message-id: <459219F7.7080104@paradise.net.nz>
In-reply-to: <458B3E0C.6090104@freebsd.org>
References: <458B3651.8090601@paradise.net.nz> <458B3E0C.6090104@freebsd.org>

David Xu wrote:
> Mark Kirkwood wrote:
>>
>> (snippage) I used the attached program to read a cached
>> 781MB file sequentially and randomly with a specified block size (see
>> below). The conclusion I came to was that our (i.e FreeBSD) cached
>> read performance (particularly for smaller block sizes) could perhaps
>> be improved...
>
> I suspect in such a test, memory copying speed will be a key factor,
> I don't have number to back up my idea, but I think Linux has lots
> of tweaks, such as using MMX instruction to copy data.
>
>

Thought it would be good to test this too: I used the small program
(see below) to test memcpy'ing 50000 8192-byte chunks on both systems:

Gentoo (2.6.18):

$ ./memcpytest 8192 50000
50000 malloc in 0.5367s
50000 memcpy of 8192 byte blocks in 1.0887s

FreeBSD (6.2-PRE Nov 27):

$ ./memcpytest 8192 50000
50000 malloc in 0.1469s
50000 memcpy of 8192 byte blocks in 1.3599s

So we are a little slower (a factor of about 1.25) on the memcpy. I guess
this contributes a *little* to us being slower with the cached reads, but
it is consistent with Bruce's finding that the cache design itself is the
major factor!

One nice point to notice is that our malloc is substantially faster
(~3.7 times for this test).

These results were readily repeatable with minimal variation.

Cheers

Mark

P.S.: I've included the program in-line below, as the attachment stripper
for this list seems to be real aggressive :-)

----------------------------------------------------------------------

/*
 * memcpytest.c: Attempt to measure performance of memcpy.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

int
main(int argc, char **argv)
{
    int             blocksz;     /* The block size to copy. */
    int             numblocks;   /* How many copies to do. */
    char           *buf;         /* Input buffer. */
    typedef struct _BlockArray {
        char       *blockentry;
    } BlockArray;
    BlockArray     *blockarray;  /* Array of block entry ptrs. */
    int             i;
    struct timeval  starttp, endtp, elapsedtp;
    double          elapsed;

    if (argc != 3) {
        printf("usage %s blocksize num_blocks\n", argv[0]);
        exit(1);
    } else {
        blocksz = atoi(argv[1]);
        numblocks = atoi(argv[2]);
    }

    /* Reject zero, negative or non-numeric sizes before allocating. */
    if (blocksz <= 0 || numblocks <= 0) {
        printf("blocksize and num_blocks must be positive integers\n");
        exit(1);
    }

    /* Start timing setup. */
    gettimeofday(&starttp, NULL);

    /* Allocate source buffer and initialize to something trivial. */
    buf = (char *) malloc(blocksz);
    if (buf == NULL) {
        printf("out of memory initializing source buffer!\n");
        exit(2);
    }
    memset(buf, 1, blocksz);

    /*
     * Allocate destination array of buffer pointers.  Size by the
     * element type (*blockarray), not by the pointer itself.
     */
    blockarray = malloc(numblocks * sizeof(*blockarray));
    if (blockarray == NULL) {
        printf("out of memory initializing destination array!\n");
        exit(2);
    }

    /* Allocate a block entry for each element of blockarray. */
    for (i = 0; i < numblocks; i++) {
        blockarray[i].blockentry = malloc(blocksz);
        if (blockarray[i].blockentry == NULL) {
            printf("out of memory initializing destination array contents!\n");
            exit(2);
        }
    }

    gettimeofday(&endtp, NULL);
    timersub(&endtp, &starttp, &elapsedtp);
    elapsed = (double)elapsedtp.tv_sec + (double)elapsedtp.tv_usec/1000000.0;
    printf("%d malloc in %.4fs\n", numblocks, elapsed);

    /* Start timing copy now. */
    gettimeofday(&starttp, NULL);

    /* Perform the copy. */
    for (i = 0; i < numblocks; i++) {
        memcpy((void *)blockarray[i].blockentry, (void *)buf, blocksz);
    }

    gettimeofday(&endtp, NULL);
    timersub(&endtp, &starttp, &elapsedtp);
    elapsed = (double)elapsedtp.tv_sec + (double)elapsedtp.tv_usec/1000000.0;
    printf("%d memcpy of %d byte blocks in %.4fs\n", numblocks, blocksz,
        elapsed);

    /* Release everything that was allocated. */
    free(buf);
    for (i = 0; i < numblocks; i++) {
        free(blockarray[i].blockentry);
    }
    free(blockarray);

    exit(0);
}
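
For anyone wanting to compare runs from different machines, the timings
reported above can be turned into a throughput figure and a ratio with a
couple of small helpers. This is only a sketch and not part of
memcpytest.c: the function names copy_mib_per_sec, time_ratio and
print_copy_summary are made up for illustration, and MiB here is assumed
to mean 1024 * 1024 bytes.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not in memcpytest.c): throughput in MiB/s for
 * nblocks copies of blocksz bytes that took `seconds` in total.
 * Assumes seconds > 0.
 */
double
copy_mib_per_sec(long blocksz, long nblocks, double seconds)
{
    double bytes = (double)blocksz * (double)nblocks;

    return bytes / (1024.0 * 1024.0) / seconds;
}

/*
 * Hypothetical helper: how many times longer the slower run took
 * than the faster one.  Assumes faster_seconds > 0.
 */
double
time_ratio(double slower_seconds, double faster_seconds)
{
    return slower_seconds / faster_seconds;
}

/* Print one summary line for a run, labelled with the system name. */
void
print_copy_summary(const char *label, long blocksz, long nblocks,
    double seconds)
{
    printf("%s: %.1f MiB/s\n", label,
        copy_mib_per_sec(blocksz, nblocks, seconds));
}
```

With the numbers from the two runs above, time_ratio(1.3599, 1.0887)
is where the memcpy factor comes from and time_ratio(0.5367, 0.1469)
is the malloc one; copy_mib_per_sec(8192, 50000, 1.0887) and
copy_mib_per_sec(8192, 50000, 1.3599) work out to roughly 359 and
287 MiB/s for Gentoo and FreeBSD respectively.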