Date:      Fri, 15 Jun 2018 09:57:17 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        bob prohaska <fbsd@www.zefox.net>
Cc:        Mark Millard <marklmi@yahoo.com>, "freebsd-arm@freebsd.org" <freebsd-arm@freebsd.org>,  "Rodney W. Grimes" <freebsd-rwg@pdx.rh.cn85.dnsmgr.net>
Subject:   Re: GPT vs MBR for swap devices
Message-ID:  <CANCZdfoCA=E=Sh2X6H=Fi-TBkhiTdyzXAkjXr4usa8ie6%2Buo4g@mail.gmail.com>
In-Reply-To: <20180615154334.GA39777@www.zefox.net>
References:  <20180614175622.GC35161@www.zefox.net> <201806142110.w5ELAL0N046840@pdx.rh.CN85.dnsmgr.net> <20180615035225.GA37370@www.zefox.net> <CANCZdfoNasSpvEN-y3bzsDfWT=_atfp62AKvdpwK8bUQKi=bgA@mail.gmail.com> <20180615051527.GB37370@www.zefox.net> <834EA7A6-B567-436F-96B2-0C75FACA3FF9@yahoo.com> <20180615154334.GA39777@www.zefox.net>

On Fri, Jun 15, 2018 at 9:43 AM, bob prohaska <fbsd@www.zefox.net> wrote:

> On Thu, Jun 14, 2018 at 11:37:48PM -0700, Mark Millard wrote:
> >
> > When I look at:
> >
> > # vmstat -c -w 5
> > procs  memory       page                    disks     faults         cpu
> > r b w  avm   fre   flt  re  pi  po    fr   sr da0 ad0   in    sy    cs us sy id
> > 1 0 0 416M  224M  1647   1   0   0  1856  142   0   0  144  1791  1024  4  2 94
> > 0 0 0 416M  224M     9   0   0   0     0    1   0   0    4    85   116  0  0 100
> > 0 0 0 416M  224M    12   0   0   0     0    1   0   0    2    93   113  0  0 100
> > 0 0 0 416M  224M     9   0   0   0     2    1   1   0    4    64   121  0  0 100
> > . . .
> >
> > and "man vmstat" I do not see any column that is the swap space
> > usage (nor any combination of columns to do such a calculation
> > from).
> >
> > I do not expect that vmstat reports what you are likely/primarily
> > looking for.
> >
> > An example is "avm", for which the man page reports:
> >
> >              . . . Note that the entire
> >              memory object's size is considered mapped even if only a subset
> >              of the object's pages are currently mapped.  This statistic is
> >              not related to the active page queue which is used to track real
> >              memory.
> >
> > The free list size ("fre") is not sufficient either.
> >
>
> That seems astonishing. I imagined that among those columns _had_ to be
> reads from and writes to the swap partitions.
>
> It looks as if
> top -d 1000 | grep Swap
> produces a running list of swap usage, but one must guess how many
> times to iterate:
>
> bob@www:/usr/src % top -d 1000 | grep Swap
> Swap: 3072M Total, 30M Used, 3041M Free
> Swap: 3072M Total, 30M Used, 3041M Free
> Swap: 3072M Total, 30M Used, 3041M Free
> Swap: 3072M Total, 30M Used, 3041M Free
> Swap: 3072M Total, 30M Used, 3041M Free
> .......
>
> Replacing the "1000" with "0" or "infinite" triggers
> a syntax error. Is there a special parameter that makes top run till
> it's killed, as in interactive mode? I didn't recognize any hint in the
> man page.
>
> Thanks for reading!
>

Right, this is why I was suggesting gstat. It directly measures the
read/write performance of the device, along with some latency numbers, and
it will give the kind of data I'm looking for. vmstat won't, and top won't.
I don't care how much swap is used or free. I care about I/O performance to
the swap partition, because what I suspect is the FTL in the USB thumb
drive. Total swap usage is probably irrelevant to the issue at hand: the OOM
isn't triggering because we're filling swap, but because we can't get pages
out to the swap device fast enough to satisfy the memory shortages.
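
The distinction between swap capacity and swap throughput can be put as
simple arithmetic. The following is only a toy model, not how the FreeBSD
pager actually accounts for memory: the function name and all numbers are
made up for illustration. It shows that when the demand for pages exceeds the
rate at which the device accepts swap-outs, free memory runs out in seconds
no matter how much swap space is still unused.

```python
def seconds_until_exhausted(free_mib, demand_mib_s, swapout_mib_s):
    """Toy model: time until free memory hits zero.

    free_mib       -- free memory available now (hypothetical figure)
    demand_mib_s   -- rate at which processes need new pages
    swapout_mib_s  -- rate at which the swap device accepts page-outs

    Returns None when the device keeps up with demand.
    """
    deficit = demand_mib_s - swapout_mib_s
    if deficit <= 0:
        return None  # swap-out is fast enough; no shortage builds up
    return free_mib / deficit


# Hypothetical comparison: same demand, a device that keeps up versus
# one whose writes have stalled to 0.5 MiB/s.
fast_device = seconds_until_exhausted(200, 20, 25)
stalled_device = seconds_until_exhausted(200, 20, 0.5)
```

Note that total swap size does not appear anywhere in the calculation, which
is the point being made above.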

As for why it would affect the USB drive and not SD cards, I can only say
that USB drives tend to be first to market with bigger capacities. This has
traditionally made them less well tuned for anything other than large, long
sequential reads or writes that aren't mixed. That's even more true of them
than of SD or uSD cards, which tend to do better than USB drives at that
workload. It's the FTL that's the issue, not the NAND itself. The FTL is the
software that translates the log-style device you have to have for flash to
work to the LBA-style devices that people attach to systems. If it can't
cope with a mixed workload, or needs to do too much garbage collection or
too many read/modify/write operations due to its poor quality / tuning, that
will show up as long delays. USB flash also tends to suck more with
BIO_DELETE than others, though the swapper doesn't do that, so that's one
fewer wildcard we need to look at.
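
To make the read/modify/write cost concrete, here is a minimal sketch of the
worst case for a naive FTL that rewrites a whole erase block for every host
write. The function name and the block sizes are assumptions chosen for
illustration; real FTLs are more sophisticated and their geometry is usually
not published.

```python
def rmw_amplification(write_bytes, erase_block_bytes):
    """Physical bytes rewritten per host byte for a naive FTL.

    Assumes the write starts on an erase-block boundary and that every
    erase block it touches is read, modified and rewritten in full.
    """
    # Ceiling division: number of erase blocks the write touches.
    blocks_touched = -(-write_bytes // erase_block_bytes)
    return blocks_touched * erase_block_bytes / write_bytes


KIB = 1024
MIB = 1024 * KIB

# Hypothetical geometry: 4 MiB erase blocks.
small_random = rmw_amplification(4 * KIB, 4 * MIB)      # one 4 KiB page-out
large_sequential = rmw_amplification(8 * MIB, 4 * MIB)  # one long streaming write
```

Under these assumptions the small write costs 1024 times the data the host
asked for, while the large aligned write costs nothing extra, which is why a
device tuned only for long sequential transfers can stall badly on swap
traffic.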

gstat -Bd -I 10s -f <regexp for your swap partition>  > gstat-swap-data.dat

would be how I'd recommend collecting it. This file may get kinda big
depending how long it takes to trigger the weird state. I'm hoping that if
you put this on a known good device, we'll power through the issues. We
might not get perfect correlation with this, but the data should show all
kinds of crazy before the system drives off the cliff if I'm right, so we
don't need perfect data.
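
Once the file exists, something like the following could pick out the samples
where write latency goes crazy. It is a rough sketch, not a tested tool: it
assumes each batch sample contains a header line beginning with "L(q)" that
includes "w/s", "ms/w", "%busy" and "Name" columns, and the sample text and
the 100 ms threshold are made up for illustration.

```python
def parse_gstat(lines):
    """Collect per-sample write statistics from gstat batch output.

    Column positions are taken from the header line rather than
    hard-coded, since the set of columns depends on the flags used.
    """
    header = None
    rows = []
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0] == "dT:":
            continue  # blank line or per-sample timing line
        if fields[0] == "L(q)":
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # not a data row we understand
        try:
            rows.append({
                "name": fields[header.index("Name")],
                "w_per_s": float(fields[header.index("w/s")]),
                "ms_per_w": float(fields[header.index("ms/w")]),
                "busy": float(fields[header.index("%busy")]),
            })
        except ValueError:
            continue  # missing column or non-numeric field
    return rows


def slow_writes(rows, threshold_ms=100.0):
    """Rows whose average write latency is at or above the threshold."""
    return [r for r in rows if r["ms_per_w"] >= threshold_ms]


# Made-up sample in the assumed layout, for illustration only.
SAMPLE = """\
dT: 10.002s  w: 10.000s  filter: da0p2
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    0      9      2     40    1.1      7    220    4.6      0      0    0.0    3.0  da0p2
dT: 10.001s  w: 10.000s  filter: da0p2
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
    3      5      1     12    3.2      4    128  812.4      0      0    0.0   41.3  da0p2
"""

rows = parse_gstat(SAMPLE.splitlines())
suspects = slow_writes(rows)

# To run against real data:
#   with open("gstat-swap-data.dat") as f:
#       suspects = slow_writes(parse_gstat(f))
```

If the header in the collected file differs from the layout assumed here, the
column lookups need adjusting before the numbers mean anything.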

There are some higher-fidelity numbers we can get from the I/O scheduler
with dynamic scheduling compiled in, but I don't think we'll need those.

Warner


