Date:      Thu, 20 Dec 2012 20:02:08 -0700
From:      Warner Losh <imp@bsdimp.com>
To:        Ian Lepore <freebsd@damnhippie.dyndns.org>
Cc:        freebsd-arm@freebsd.org
Subject:   Re: nand performance
Message-ID:  <E695F8F2-021D-4CB5-A5A3-848401815E1C@bsdimp.com>
In-Reply-To: <1356051045.1198.329.camel@revolution.hippie.lan>
References:  <1355964085.1198.255.camel@revolution.hippie.lan> <20121220200728.GK1563@funkthat.com> <1356051045.1198.329.camel@revolution.hippie.lan>

On Dec 20, 2012, at 5:50 PM, Ian Lepore wrote:

> On Thu, 2012-12-20 at 12:07 -0800, John-Mark Gurney wrote:
>> Ian Lepore wrote this message on Wed, Dec 19, 2012 at 17:41 -0700:
>>> I've been working to get nandfs going on a low-end Atmel arm system.
>>> Performance is horrible.  Last weekend I got my nand-based DreamPlug
>>> unbricked and got nandfs working on it too.  Performance is horrible.
>>>
>>> By that I'm referring not to the slow nature of the nand chips
>>> themselves, but to the fact that accessing them locks out userland
>>> processes, sometimes for many seconds at a time.  The problem is real
>>> easy to see, just format and populate a nandfs filesystem, then do
>>> something like this
>>>
>>>  mount -r -t nandfs /dev/gnand0s.root /mnt
>>>  nice +20 find /mnt -type f | xargs -J% cat % > /dev/null
>>>
>>> and then try to type in another terminal -- sometimes what you're typing
>>> doesn't get echoed for 10+ seconds at a time.
>>>
>>> The problem is that the "I/O" on a nand chip is really just the cpu
>>> copying from one memory interface to another, a byte at a time, and it
>>> must also use busy-wait loops to wait for chip-ready and status info.
>>> This is being done by high-priority kernel threads, so everything else
>>> is locked out.
>>>
>>> It seems to me that this is about the same situation as classic ATA PIO
>>> mode, but PIO doesn't make a system that unresponsive.
>>>
>>> I'm curious what techniques are used to mitigate performance problems
>>> for ATA PIO modes, and whether we can do something similar for nand.  I
>>> poked around a bit in dev/ata but the PIO code I saw (which surely
>>> wasn't the whole picture) just used a bus_space_read_multi().  Can
>>> someone clue me in as to how ATA manages to do PIO without usurping the
>>> whole system?
>>
>> Looks like the problem is all the DELAY calls in dev/nand/nand_generic.c..
>> DELAY is a busy wait not letting the cpu do anything else...  The bad one
>> is probably generic_erase_block as it looks like the default is 3ms,
>> plenty of time to let other code run...  If it could be interrupt driven,
>> that'd be best...
>>
>> I can't find the interface that would allow sub-hz sleeping, but there is
>> tsleep that could be used for some of the larger sleeps...  But switching
>> to interrupts + wakeup would be best...
>>
>
> Yeah, the DELAY() calls were actually not working for me (I think I'm
> the first to test this stuff with an ONFI type chip), and I've replaced
> them all with loops that poll for ready status, which at least minimizes
> the wait time, but it's still a busy-loop.  Real-world times for the
> chips I'm working with are 30uS to open a page for read, ~270uS to write
> a page, and ~750uS to erase a block.

You're the first one to use it with Intel or Micron NAND?  I find that
kinda hard to believe given their ubiquity...

But those times look about right for 3xnm parts...  With newer parts,
according to published specifications, those times get longer.  Expect
them to double over the next year (meaning through Intel/Micron's 20nm
parts now rolling out). Other NAND vendors have similar published specs,
or there's much public information about this.

> But whether busy-looping for status or busy-looping polling a clock for
> DELAY, or transferring a byte at a time for the actual IO, it's all the
> same... it's cpu and memory bus cycles that are happening in a
> high-priority kernel thread.

But usually the transfer goes quickly (a few microseconds with dedicated
hardware) compared to the waiting (tens or hundreds of microseconds).
The RM9200 doesn't have dedicated NAND hardware, so byte-banging the
data to the device is the only choice...

It looks like you'll also have to coordinate it with a number of GPIO
pins, which is good...  That means you'll be able to have an interrupt
service the state change of the GPIO pins (well, you may need to augment
the current lame AT91 gpio support that I wrote to allow for this).
But the NAND subsystem looks like it needs some support to do that...

> The interface between the low-level controller and the nand layer
> doesn't allow for interrupt handling right now.  Not all hardware
> designs would allow for using interrupts, but mine does, so reworking
> things to allow its use would help some.  Well, it would help for writes
> and erases.  The 180mhz ARM I'm working with doesn't get much done in
> 30uS, so reads wouldn't get any better.   Reads are all I really care
> about, since the product in the field will have a read-only filesystem,
> and firmware updates are infrequent and it's okay if they're a bit slow.

Any idea what the interrupt and scheduling delay runs these days on the
RM9200?  It has been forever since I tried to measure it. You may be
able to signal a waiting process rather than using DELAY to busy wait
for things.  But that likely means a thread of some sort to defer the
work once the chip returns done.  Read might get better, from a system
load point of view, but maybe not from a performance point of view.
While 30us isn't a lot, you may find that your console performance goes
to hell with that long a block...

Warner


