From: Andrey Zonov
Date: Wed, 04 Apr 2012 13:39:09 +0400
To: Konstantin Belousov
Cc: alc@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: problems with mmap() and disk caching
Message-ID: <4F7C16BD.3010703@zonov.org>
In-Reply-To: <4F7C1620.6040703@zonov.org>

I forgot to attach my test program.

On 04.04.2012 13:36, Andrey Zonov wrote:
> On 04.04.2012 11:17, Konstantin Belousov wrote:
>>
>> Calling madvise(MADV_RANDOM) fixes the issue, because the code to
>> deactivate/cache the pages is turned off. On the other hand, it also
>> turns off read-ahead for faulting, and the first loop becomes eternally
>> long.
>
> Now it takes 5 times longer. Anyway, thanks for explanation.
>
>>
>> Doing MADV_WILLNEED does not fix the problem indeed, since willneed
>> reactivates the pages of the object at the time of call. To use
>> MADV_WILLNEED, you would need to call it between faults/memcpy.
>>
>
> I played with it, but no luck so far.
>
>>>
>>> I've also never seen super pages, how to make them work?
>> They just work, at least for me. Look at the output of procstat -v
>> after enough loops finished to not cause disk activity.
>>
>
> The problem was in my test program. I fixed it, now I see super pages
> but I'm still not satisfied. There are several tests below:
>
> 1. With madvise(MADV_RANDOM) I see almost all super pages:
> $ ./mmap /mnt/random-1024 5
> mmap: 1 pass took: 26.438535 (none: 0; res: 262144; super: 511; other: 0)
> mmap: 2 pass took: 0.187311 (none: 0; res: 262144; super: 511; other: 0)
> mmap: 3 pass took: 0.184953 (none: 0; res: 262144; super: 511; other: 0)
> mmap: 4 pass took: 0.186007 (none: 0; res: 262144; super: 511; other: 0)
> mmap: 5 pass took: 0.185790 (none: 0; res: 262144; super: 511; other: 0)
>
> Should it be 512?
>
> 2. Without madvise(MADV_RANDOM):
> $ ./mmap /mnt/random-1024 50
> mmap: 1 pass took: 7.629745 (none: 262112; res: 32; super: 0; other: 0)
> mmap: 2 pass took: 7.301720 (none: 261202; res: 942; super: 0; other: 0)
> mmap: 3 pass took: 7.261416 (none: 260226; res: 1918; super: 1; other: 0)
> [skip]
> mmap: 49 pass took: 0.155368 (none: 0; res: 262144; super: 323; other: 0)
> mmap: 50 pass took: 0.155438 (none: 0; res: 262144; super: 323; other: 0)
>
> Only 323 pages.
>
> 3. If I just re-run test I don't see super pages with any size of "block".
>
> $ ./mmap /mnt/random-1024 5 $((1<<30))
> mmap: 1 pass took: 1.013939 (none: 0; res: 262144; super: 0; other: 0)
> mmap: 2 pass took: 0.267082 (none: 0; res: 262144; super: 0; other: 0)
> mmap: 3 pass took: 0.270711 (none: 0; res: 262144; super: 0; other: 0)
> mmap: 4 pass took: 0.268940 (none: 0; res: 262144; super: 0; other: 0)
> mmap: 5 pass took: 0.269634 (none: 0; res: 262144; super: 0; other: 0)
>
> 4. If I activate madvise(MADV_WILLNEED) in the copy loop and re-run test
> then I see super pages only if I use "block" greater than 2 MB.
>
> $ ./mmap /mnt/random-1024 1 $((1<<21))
> mmap: 1 pass took: 0.299722 (none: 0; res: 262144; super: 0; other: 0)
> $ ./mmap /mnt/random-1024 1 $((1<<22))
> mmap: 1 pass took: 0.271828 (none: 0; res: 262144; super: 170; other: 0)
> $ ./mmap /mnt/random-1024 1 $((1<<23))
> mmap: 1 pass took: 0.333188 (none: 0; res: 262144; super: 258; other: 0)
> $ ./mmap /mnt/random-1024 1 $((1<<24))
> mmap: 1 pass took: 0.339250 (none: 0; res: 262144; super: 303; other: 0)
> $ ./mmap /mnt/random-1024 1 $((1<<25))
> mmap: 1 pass took: 0.418812 (none: 0; res: 262144; super: 324; other: 0)
> $ ./mmap /mnt/random-1024 1 $((1<<26))
> mmap: 1 pass took: 0.360892 (none: 0; res: 262144; super: 335; other: 0)
> $ ./mmap /mnt/random-1024 1 $((1<<27))
> mmap: 1 pass took: 0.401122 (none: 0; res: 262144; super: 342; other: 0)
> $ ./mmap /mnt/random-1024 1 $((1<<28))
> mmap: 1 pass took: 0.478764 (none: 0; res: 262144; super: 345; other: 0)
> $ ./mmap /mnt/random-1024 1 $((1<<29))
> mmap: 1 pass took: 0.607266 (none: 0; res: 262144; super: 346; other: 0)
> $ ./mmap /mnt/random-1024 1 $((1<<30))
> mmap: 1 pass took: 0.901269 (none: 0; res: 262144; super: 347; other: 0)
>
> 5. If I activate madvise(MADV_WILLNEED) immediately after mmap() then I
> see some number of super pages (the number from test #2).
>
> $ ./mmap /mnt/random-1024 5
> mmap: 1 pass took: 0.178666 (none: 0; res: 262144; super: 323; other: 0)
> mmap: 2 pass took: 0.158889 (none: 0; res: 262144; super: 323; other: 0)
> mmap: 3 pass took: 0.157229 (none: 0; res: 262144; super: 323; other: 0)
> mmap: 4 pass took: 0.156895 (none: 0; res: 262144; super: 323; other: 0)
> mmap: 5 pass took: 0.162938 (none: 0; res: 262144; super: 323; other: 0)
>
> 6. If I read file manually before test then I don't see super pages with
> any size of "block" and madvise(MADV_WILLNEED) doesn't help.
>
> $ ./mmap /mnt/random-1024 5 $((1<<30))
> mmap: 1 pass took: 0.996767 (none: 0; res: 262144; super: 0; other: 0)
> mmap: 2 pass took: 0.311129 (none: 0; res: 262144; super: 0; other: 0)
> mmap: 3 pass took: 0.317430 (none: 0; res: 262144; super: 0; other: 0)
> mmap: 4 pass took: 0.314437 (none: 0; res: 262144; super: 0; other: 0)
> mmap: 5 pass took: 0.310757 (none: 0; res: 262144; super: 0; other: 0)
>
>

-- 
Andrey Zonov

Content-Disposition: attachment; filename="mmap.c"

/*_
 * Andrey Zonov (c) 2011
 */

/*
 * The header names did not survive in the archived copy; the set below is
 * an assumption that covers every call used in this file.
 */
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>

#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
	int i;
	int fd;
	int num;
	int block;
	int pagesize;
	size_t size;
	size_t len;
	size_t npages;
	size_t none, incore, super, other;
	char *ptr;
	char *ptrp;
	char *tmp;
	char *vec;
	char *vecp;
	struct stat sb;
	struct timeval tp, tp1, tp2;

	if (argc < 2 || argc > 4)
		errx(1, "usage: mmap <file> [num] [block]");

	fd = open(argv[1], O_RDONLY);
	if (fd == -1)
		err(1, "open()");

	/* Number of passes over the mapping. */
	num = 1;
	if (argc >= 3)
		num = atoi(argv[2]);

	/* Bytes copied per memcpy(); defaults to one page. */
	pagesize = getpagesize();
	block = pagesize;
	if (argc == 4)
		block = atoi(argv[3]);
	if (block <= 0)
		errx(1, "block must be a positive number");

	if (fstat(fd, &sb) == -1)
		err(1, "fstat()");
	size = sb.st_size;
	/* mincore() reports one byte per page, including a partial last page. */
	npages = (size + pagesize - 1) / pagesize;

#if 0
	if (posix_fadvise(fd, (off_t)0, (off_t)0, POSIX_FADV_WILLNEED) == -1)
		err(1, "posix_fadvise()");
#endif

	ptr = mmap(NULL, size, PROT_READ, /*MAP_PREFAULT_READ |*/ MAP_PRIVATE,
	    fd, (off_t)0);
	if (ptr == MAP_FAILED)
		err(1, "mmap()");

#if 0
	if (madvise(ptr, size, MADV_RANDOM) == -1)
		err(1, "madvise()");
#endif

#if 0
	/* Turn on super pages */
	if (madvise(ptr, size, MADV_WILLNEED) == -1)
		err(1, "madvise()");
#endif

	tmp = calloc(1, block);
	if (tmp == NULL)
		err(1, "calloc()");
	vec = calloc(1, npages);
	if (vec == NULL)
		err(1, "calloc()");

	for (i = 0; i < num; i++) {
		gettimeofday(&tp1, NULL);
		for (ptrp = ptr; (size_t)(ptrp - ptr) < size; ptrp += block) {
			/* Do not read past the end of the mapping. */
			len = size - (size_t)(ptrp - ptr);
			if (len > (size_t)block)
				len = block;
#if 0
			if (madvise(ptrp, len, MADV_WILLNEED) == -1)
				err(1, "madvise()");
#endif
			memcpy(tmp, ptrp, len);
		}
		gettimeofday(&tp2, NULL);
		timersub(&tp2, &tp1, &tp);

		if (mincore(ptr, size, vec) == -1)
			err(1, "mincore()");

		/* Classify every page of the mapping by its mincore() flags. */
		none = incore = super = other = 0;
		for (vecp = vec; (size_t)(vecp - vec) < npages; vecp++) {
			if (*vecp == 0)
				none++;
			else if (*vecp & MINCORE_INCORE)
				incore++;
			else
				other++;
			if (*vecp & MINCORE_SUPER)
				super++;
		}

		/* "super" is printed in 2 MB units, rounded down. */
		warnx("%2d pass took: %3ld.%06ld (none: %6zu; res: %6zu; "
		    "super: %6zu; other: %6zu)",
		    i + 1, (long)tp.tv_sec, (long)tp.tv_usec,
		    none, incore, super / (2048/4) /* 2 MB / 4 KB */, other);
	}

	free(vec);
	free(tmp);
	if (munmap(ptr, size) == -1)
		err(1, "munmap()");
	close(fd);

	exit(0);
}
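
A note on the "super" column: mmap.c prints super / (2048/4) using integer division, so the figure is rounded down and any flagged 4 KB pages beyond the last whole multiple of 512 are not shown. The sketch below is one way to report that remainder as well. It assumes a hypothetical helper, count_super(), which is not part of the attached program, and it takes the flag bit and the pages-per-superpage value from the caller rather than assuming any particular platform value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration; not part of the attached mmap.c. */
struct super_count {
	size_t full;		/* flagged pages / pages_per_super, rounded down */
	size_t leftover;	/* flagged pages beyond the last whole multiple */
};

/*
 * Count the entries of a mincore()-style vector that carry super_flag and
 * split the total into whole superpages' worth plus a remainder.
 * super_flag and pages_per_super are supplied by the caller (on the system
 * in this thread they would be MINCORE_SUPER and 2 MB / 4 KB = 512).
 */
static struct super_count
count_super(const char *vec, size_t npages, int super_flag,
    size_t pages_per_super)
{
	struct super_count sc = { 0, 0 };
	size_t flagged = 0;
	size_t i;

	assert(pages_per_super > 0);

	for (i = 0; i < npages; i++)
		if (vec[i] & super_flag)
			flagged++;

	sc.full = flagged / pages_per_super;
	sc.leftover = flagged % pages_per_super;
	return (sc);
}
```

With the 1 GB file from the tests (262144 base pages), printing both fields would show whether a result of 511 means exactly 511 * 512 flagged pages (leftover of 0) or some other count that the division rounded down. It does not say which 2 MB region is missing the flag; that would need the position of the unflagged entries in the vector.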