Date: Thu, 20 Dec 2012 20:02:08 -0700
From: Warner Losh <imp@bsdimp.com>
To: Ian Lepore <freebsd@damnhippie.dyndns.org>
Cc: freebsd-arm@freebsd.org
Subject: Re: nand performance
Message-ID: <E695F8F2-021D-4CB5-A5A3-848401815E1C@bsdimp.com>
In-Reply-To: <1356051045.1198.329.camel@revolution.hippie.lan>
References: <1355964085.1198.255.camel@revolution.hippie.lan> <20121220200728.GK1563@funkthat.com> <1356051045.1198.329.camel@revolution.hippie.lan>

On Dec 20, 2012, at 5:50 PM, Ian Lepore wrote:

> On Thu, 2012-12-20 at 12:07 -0800, John-Mark Gurney wrote:
>> Ian Lepore wrote this message on Wed, Dec 19, 2012 at 17:41 -0700:
>>> I've been working to get nandfs going on a low-end Atmel arm system.
>>> Performance is horrible. Last weekend I got my nand-based DreamPlug
>>> unbricked and got nandfs working on it too. Performance is horrible.
>>>
>>> By that I'm referring not to the slow nature of the nand chips
>>> themselves, but to the fact that accessing them locks out userland
>>> processes, sometimes for many seconds at a time. The problem is real
>>> easy to see, just format and populate a nandfs filesystem, then do
>>> something like this
>>>
>>>     mount -r -t nandfs /dev/gnand0s.root /mnt
>>>     nice +20 find /mnt -type f | xargs -J% cat % > /dev/null
>>>
>>> and then try to type in another terminal -- sometimes what you're typing
>>> doesn't get echoed for 10+ seconds at a time.
>>>
>>> The problem is that the "I/O" on a nand chip is really just the cpu
>>> copying from one memory interface to another, a byte at a time, and it
>>> must also use busy-wait loops to wait for chip-ready and status info.
>>> This is being done by high-priority kernel threads, so everything else
>>> is locked out.
>>>
>>> It seems to me that this is about the same situation as classic ATA PIO
>>> mode, but PIO doesn't make a system that unresponsive.
>>>
>>> I'm curious what techniques are used to mitigate performance problems
>>> for ATA PIO modes, and whether we can do something similar for nand. I
>>> poked around a bit in dev/ata but the PIO code I saw (which surely
>>> wasn't the whole picture) just used a bus_space_read_multi(). Can
>>> someone clue me in as to how ATA manages to do PIO without usurping the
>>> whole system?
>>
>> Looks like the problem is all the DELAY calls in dev/nand/nand_generic.c..
>> DELAY is a busy wait not letting the cpu do anything else...
>> The bad one is probably generic_erase_block as it looks like the
>> default is 3ms, plenty of time to let other code run... If it could be
>> interrupt driven, that'd be best...
>>
>> I can't find the interface that would allow sub-hz sleeping, but there is
>> tsleep that could be used for some of the larger sleeps... But switching
>> to interrupts + wakeup would be best...
>>
>
> Yeah, the DELAY() calls were actually not working for me (I think I'm
> the first to test this stuff with an ONFI type chip), and I've replaced
> them all with loops that poll for ready status, which at least minimizes
> the wait time, but it's still a busy-loop. Real-world times for the
> chips I'm working with are 30uS to open a page for read, ~270uS to write
> a page, and ~750uS to erase a block.

You're the first one to use it with Intel or Micron NAND? I find that
kinda hard to believe given their ubiquity...

But those times look about right for 3xnm parts... With newer parts,
according to published specifications, those times get longer. Expect
them to double over the next year (meaning through Intel/Micron's 20nm
parts now rolling out). Other NAND vendors publish similar specs, and
there's plenty of public information about this.

> But whether busy-looping for status or busy-looping polling a clock for
> DELAY, or transferring a byte at a time for the actual IO, it's all the
> same... it's cpu and memory bus cycles that are happening in a
> high-priority kernel thread.

But usually the transfer goes quickly (a few microseconds with dedicated
hardware) compared to the waiting (tens or hundreds of microseconds).
The RM9200 doesn't have dedicated NAND hardware, so byte-banging the
data to the device is the only choice...

It looks like you'll also have to coordinate it with a number of GPIO
pins, which is good...
That means you'll be able to have an interrupt service the state change
of the GPIO pins (well, you may need to augment the current lame AT91
gpio support that I wrote to allow for this). But the NAND subsystem
looks like it needs some support to do that...

> The interface between the low-level controller and the nand layer
> doesn't allow for interrupt handling right now. Not all hardware
> designs would allow for using interrupts, but mine does, so reworking
> things to allow its use would help some. Well, it would help for writes
> and erases. The 180mhz ARM I'm working with doesn't get much done in
> 30uS, so reads wouldn't get any better. Reads are all I really care
> about, since the product in the field will have a read-only filesystem,
> and firmware updates are infrequent and it's okay if they're a bit slow.

Any idea what the interrupt and scheduling delay runs these days on the
RM9200? It has been forever since I tried to measure it. You may be
able to signal a waiting process rather than using DELAY to busy wait
for things. But that likely means a thread of some sort to defer the
work once the chip returns done. Reads might get better from a system
load point of view, but maybe not from a performance point of view.
While 30us isn't a lot, you may find that your console performance goes
to hell with that long a block...

Warner
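
As a rough illustration of the poll-versus-sleep trade-off discussed in this thread, here is a small userland C model. It is not driver code and uses no FreeBSD kernel API. The names choose_wait_mode and cpu_freed_us are hypothetical, and the 50us wakeup latency in the usage note is purely an assumption, since the real interrupt and scheduling delay on the RM9200 is left as an open question above. The 30us, 270us and 750us figures are the chip timings quoted in the thread.

```c
#include <stdint.h>

/* How a driver could wait for the chip to go ready (hypothetical names). */
enum wait_mode { WAIT_POLL, WAIT_SLEEP };

/*
 * Pick a wait strategy for one NAND operation.
 *   op_us:     expected time the chip stays busy, in microseconds
 *   wakeup_us: ASSUMED interrupt + scheduling latency, in microseconds
 * Sleeping only pays off when the chip stays busy for longer than it
 * takes to get the waiting thread running again; otherwise keep polling.
 */
static enum wait_mode
choose_wait_mode(uint32_t op_us, uint32_t wakeup_us)
{
	return (op_us > wakeup_us) ? WAIT_SLEEP : WAIT_POLL;
}

/*
 * Rough estimate of the CPU time (microseconds) handed back to other
 * threads for one operation, compared with busy-waiting the whole time.
 * This simple model charges the full wakeup latency as overhead.
 */
static uint32_t
cpu_freed_us(uint32_t op_us, uint32_t wakeup_us)
{
	if (choose_wait_mode(op_us, wakeup_us) == WAIT_POLL)
		return 0;
	return op_us - wakeup_us;
}
```

With an assumed 50us wakeup latency, choose_wait_mode(30, 50) keeps polling for page reads, while the 270us write and 750us erase would sleep, and cpu_freed_us reports about 220us and 700us returned to other threads per operation. That matches the point made above that interrupts would help writes and erases but not reads.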