Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 26 Jun 2008 19:34:55 -0700
From:      Marcus Reid <marcus@blazingdot.com>
To:        Matthew Dillon <dillon@apollo.backplane.com>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Performance of madvise / msync
Message-ID:  <20080627023455.GA34022@blazingdot.com>
In-Reply-To: <200806270048.m5R0mDIU024172@apollo.backplane.com>
References:  <20080626234455.GA77263@blazingdot.com> <200806270048.m5R0mDIU024172@apollo.backplane.com>

next in thread | previous in thread | raw e-mail | index | archive | help
On Thu, Jun 26, 2008 at 05:48:13PM -0700, Matthew Dillon wrote:
> :   65074 python   0.000006 CALL  madvise(0x287c5000,0x70,_MADV_WILLNEED)
> :   65074 python   0.027455 RET   madvise 0
> :   65074 python   0.000058 CALL  madvise(0x287c5000,0x1c20,_MADV_WILLNEED)
> :   65074 python   0.016904 RET   madvise 0
> :   65074 python   0.000179 CALL  madvise(0x287c6000,0x1950,_MADV_WILLNEED)
> :   65074 python   0.008629 RET   madvise 0
> :   65074 python   0.000040 CALL  madvise(0x287c8000,0x8,_MADV_WILLNEED)
> :   65074 python   0.004173 RET   madvise 0
> :...
> :   65074 python   0.006084 CALL  msync(0x287c5000,0x453b7618,MS_ASYNC)
> :   65074 python   0.106284 RET   msync 0
> :...
> :As you can see, it's quite a bit faster.
> :
> :I know that msync is necessary under Linux but obsolete under FreeBSD, but
> :it's still funny that it takes a tenth of a second to return even with
> :MS_ASYNC specified.
> :
> :Also, why is it that the madvise() calls take so much longer when the
> :program does a couple of its own madvise() calls?  Was madvise() never
> :intended to be run so frequently and is therefore a little slower than
> :it could be?
> :
> :Here's the diff between the code for the first kdump above and the
> :second one.
> 
>      Those times are way way too large, even with other running threads
>      in the system.  madvise() should not take that long unless it is
>      being forced to wait on a busied page, and neither should msync().
>      madvise() doesn't even do any I/O (or shouldn't anyhow).
> 
>      Try removing just the msync() but keep the madvise() calls and see
>      if the madvise() calls continue to take horrendous amounts of time.
>      Then try the vise-versa.

Ok, first off, I'm noticing that of the 4 other files that this
is doing the same operations on, sized 569, 940, 116 and 116mB,
all of the msync() and madvise() calls are nice and fast.  It's
only with the 1161524760 byte file that msync is much, much
slower.  It's not linear -- it hits a wall somewhere between
940 and 1161 million bytes.

With madvise() and without msync(), there are high numbers of
faults, which matches the number of disk io operations.  It
goes through cycles, every once in a while stalling while about
60MB of data is dumped to disk at 20MB/s or so (buffers flushing?)
At the beginning of each cycle it's fast, with 140 faults/s or so,
and slows as the number of faults climbs to 180/s or so before
stalling and flusing again.  It never gets _really_ slow though.

   36286 python   0.000016 NAMI  "rg2.rrd"
   36286 python   0.000025 RET   open 7
   36286 python   0.000009 CALL  fstat(0x7,0xbfbfe428)
   36286 python   0.000014 RET   fstat 0
   36286 python   0.000010 CALL  mmap(0,0x453b7618,PROT_READ|PROT_WRITE,MAP_SHARED,0x7,0,0)
   36286 python   0.000020 RET   mmap 679235584/0x287c5000
   36286 python   0.000010 CALL  madvise(0x287c5000,0x453b7618,_MADV_RANDOM)
   36286 python   0.000010 RET   madvise 0
   36286 python   0.000009 CALL  madvise(0x287c5000,0x70,_MADV_WILLNEED)
   36286 python   0.000067 RET   madvise 0
   36286 python   0.000016 CALL  madvise(0x287c5000,0x1c20,_MADV_WILLNEED)
   36286 python   0.000015 RET   madvise 0
   36286 python   0.000019 CALL  madvise(0x287c6000,0x1950,_MADV_WILLNEED)
   36286 python   0.000013 RET   madvise 0
   36286 python   0.000010 CALL  madvise(0x287c8000,0x8,_MADV_WILLNEED)
   36286 python   0.000010 RET   madvise 0
   36286 python   0.000012 CALL  gettimeofday(0xbfbfe554,0)
   36286 python   0.000010 RET   gettimeofday 0
   36286 python   0.000014 CALL  fcntl(0x7,<invalid=12>,0xbfbfe478)
   36286 python   0.000021 RET   fcntl 0
   36286 python   0.040061 CALL  munmap(0x287c5000,0x453b7618)
   36286 python   0.000324 RET   munmap 0
   36286 python   0.000016 CALL  close(0x7)
   36286 python   0.000044 RET   close 0
   36286 python   0.000113 CALL  __sysctl(0xbfbfe388,0x2,0xbfbfe394,0xbfbfe398,0,0)
   36286 python   0.000018 RET   __sysctl 0

With msync() and without madvise(), things are very slow, and
there are no faults, just writes.

   61609 python   0.000016 NAMI  "rg2.rrd"
   61609 python   0.000024 RET   open 7
   61609 python   0.000010 CALL  fstat(0x7,0xbfbfe428)
   61609 python   0.000013 RET   fstat 0
   61609 python   0.000010 CALL  mmap(0,0x453b7618,PROT_READ|PROT_WRITE,MAP_SHARED,0x7,0,0)
   61609 python   0.000021 RET   mmap 679235584/0x287c5000
   61609 python   0.066603 CALL  madvise(0x287c5000,0x1c20,_MADV_WILLNEED)
   61609 python   0.000057 RET   madvise 0
   61609 python   0.000009 CALL  madvise(0x287c6000,0x1950,_MADV_WILLNEED)
   61609 python   0.000010 RET   madvise 0
   61609 python   0.000009 CALL  madvise(0x287c8000,0x8,_MADV_WILLNEED)
   61609 python   0.000009 RET   madvise 0
   61609 python   0.000010 CALL  gettimeofday(0xbfbfe554,0)
   61609 python   0.000018 RET   gettimeofday 0
   61609 python   0.000014 CALL  fcntl(0x7,<invalid=12>,0xbfbfe478)
   61609 python   0.000026 RET   fcntl 0
   61609 python   0.004044 CALL  msync(0x287c5000,0x453b7618,MS_ASYNC)
   61609 python   0.117374 RET   msync 0
   61609 python   0.000294 CALL  munmap(0x287c5000,0x453b7618)
   61609 python   0.000356 RET   munmap 0
   61609 python   0.000016 CALL  close(0x7)
   61609 python   0.000049 RET   close 0
   61609 python   0.000125 CALL  __sysctl(0xbfbfe388,0x2,0xbfbfe394,0xbfbfe398,0,0)
   61609 python   0.000018 RET   __sysctl 0

>      It kinda feels like a prior msync() is initiating physical I/O on
>      pages and a later mmap/madvise or page fault is being forced to
>      wait on the prior pages for the I/O to finish.

>      The size_t argument to msync() (0x453b7618) is highly questionable.
>      It could be ktrace reporting the wrong value, but maybe not.

That's the size of rg2.rrd.  It's 1161524760 bytes long.

>      On any sort of random writing test, particularly if multiple threads
>      are involved, specifying a size that large could result in very large
>      latencies.

Looks like the source of my problem is very slow msync() on the
file when the file is over a certain size.  It's still fastest
without either madvise or msync.

Thanks for your time,

Marcus



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20080627023455.GA34022>