From owner-freebsd-arm@FreeBSD.ORG  Fri Dec 21 03:02:18 2012
Return-Path: <owner-freebsd-arm@FreeBSD.ORG>
Delivered-To: freebsd-arm@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id D311B79E
 for <freebsd-arm@freebsd.org>; Fri, 21 Dec 2012 03:02:18 +0000 (UTC)
 (envelope-from imp@bsdimp.com)
Received: from mail-ia0-f169.google.com (mail-ia0-f169.google.com
 [209.85.210.169])
 by mx1.freebsd.org (Postfix) with ESMTP id 8B4768FC17
 for <freebsd-arm@freebsd.org>; Fri, 21 Dec 2012 03:02:18 +0000 (UTC)
Received: by mail-ia0-f169.google.com with SMTP id r4so3601695iaj.0
 for <freebsd-arm@freebsd.org>; Thu, 20 Dec 2012 19:02:12 -0800 (PST)
X-Received: by 10.50.91.230 with SMTP id ch6mr7534163igb.92.1356058931925;
 Thu, 20 Dec 2012 19:02:11 -0800 (PST)
Received: from 53.imp.bsdimp.com
 (50-78-194-198-static.hfc.comcastbusiness.net. [50.78.194.198])
 by mx.google.com with ESMTPS id ez8sm8483198igb.17.2012.12.20.19.02.10
 (version=TLSv1/SSLv3 cipher=OTHER);
 Thu, 20 Dec 2012 19:02:10 -0800 (PST)
Sender: Warner Losh <wlosh@bsdimp.com>
Subject: Re: nand performance
Mime-Version: 1.0 (Apple Message framework v1085)
Content-Type: text/plain; charset=us-ascii
From: Warner Losh <imp@bsdimp.com>
In-Reply-To: <1356051045.1198.329.camel@revolution.hippie.lan>
Date: Thu, 20 Dec 2012 20:02:08 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <E695F8F2-021D-4CB5-A5A3-848401815E1C@bsdimp.com>
References: <1355964085.1198.255.camel@revolution.hippie.lan>
 <20121220200728.GK1563@funkthat.com>
 <1356051045.1198.329.camel@revolution.hippie.lan>
To: Ian Lepore <freebsd@damnhippie.dyndns.org>
X-Mailer: Apple Mail (2.1085)
Cc: freebsd-arm@freebsd.org
X-BeenThere: freebsd-arm@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Porting FreeBSD to the StrongARM Processor <freebsd-arm.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arm>
List-Post: <mailto:freebsd-arm@freebsd.org>
List-Help: <mailto:freebsd-arm-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-arm>,
 <mailto:freebsd-arm-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Dec 2012 03:02:18 -0000


On Dec 20, 2012, at 5:50 PM, Ian Lepore wrote:

> On Thu, 2012-12-20 at 12:07 -0800, John-Mark Gurney wrote:
>> Ian Lepore wrote this message on Wed, Dec 19, 2012 at 17:41 -0700:
>>> I've been working to get nandfs going on a low-end Atmel arm system.
>>> Performance is horrible.  Last weekend I got my nand-based DreamPlug
>>> unbricked and got nandfs working on it too.  Performance is horrible.
>>>
>>> By that I'm referring not to the slow nature of the nand chips
>>> themselves, but to the fact that accessing them locks out userland
>>> processes, sometimes for many seconds at a time.  The problem is real
>>> easy to see, just format and populate a nandfs filesystem, then do
>>> something like this:
>>>
>>>  mount -r -t nandfs /dev/gnand0s.root /mnt
>>>  nice +20 find /mnt -type f | xargs -J% cat % > /dev/null
>>>
>>> and then try to type in another terminal -- sometimes what you're typing
>>> doesn't get echoed for 10+ seconds at a time.
>>>=20
>>> The problem is that the "I/O" on a nand chip is really just the cpu
>>> copying from one memory interface to another, a byte at a time, and it
>>> must also use busy-wait loops to wait for chip-ready and status info.
>>> This is being done by high-priority kernel threads, so everything else
>>> is locked out.
>>>
>>> It seems to me that this is about the same situation as classic ATA PIO
>>> mode, but PIO doesn't make a system that unresponsive.
>>>
>>> I'm curious what techniques are used to mitigate performance problems
>>> for ATA PIO modes, and whether we can do something similar for nand.  I
>>> poked around a bit in dev/ata but the PIO code I saw (which surely
>>> wasn't the whole picture) just used a bus_space_read_multi().  Can
>>> someone clue me in as to how ATA manages to do PIO without usurping the
>>> whole system?
>>
>> Looks like the problem is all the DELAY calls in dev/nand/nand_generic.c..
>> DELAY is a busy wait not letting the cpu do anything else...  The bad one
>> is probably generic_erase_block as it looks like the default is 3ms,
>> plenty of time to let other code run...  If it could be interrupt driven,
>> that'd be best...
>>
>> I can't find the interface that would allow sub-hz sleeping, but there is
>> tsleep that could be used for some of the larger sleeps...  But switching
>> to interrupts + wakeup would be best...
>>
>
> Yeah, the DELAY() calls were actually not working for me (I think I'm
> the first to test this stuff with an ONFI type chip), and I've replaced
> them all with loops that poll for ready status, which at least minimizes
> the wait time, but it's still a busy-loop.  Real-world times for the
> chips I'm working with are 30uS to open a page for read, ~270uS to write
> a page, and ~750uS to erase a block.
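
A minimal userland sketch of the kind of ready-status polling loop Ian describes, for illustration only. It is not the nand(4) code. The chip is simulated by the hypothetical sim_chip/sim_read_status pair, and the status layout (RDY in bit 6, FAIL in bit 0, as in the ONFI READ STATUS byte) is an assumption. The timeout is counted in status reads rather than microseconds. The loop returns as soon as the chip is ready, which is why it beats a fixed DELAY(), but until then the CPU does nothing except re-read the status byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAND_STATUS_RDY  0x40   /* assumed: ONFI READ STATUS bit 6 */
#define NAND_STATUS_FAIL 0x01   /* assumed: ONFI READ STATUS bit 0 */

/* Hypothetical simulated chip: reports busy for busy_polls status reads. */
struct sim_chip {
	unsigned busy_polls;     /* status reads left before RDY is reported */
	uint8_t  final_status;   /* status byte returned once the chip is ready */
};

static uint8_t
sim_read_status(struct sim_chip *chip)
{
	if (chip->busy_polls > 0) {
		chip->busy_polls--;
		return 0;            /* RDY clear: operation still in progress */
	}
	return chip->final_status;
}

/*
 * Busy-poll the status byte until RDY is set, giving up after max_polls
 * reads.  Returns 0 on success, -1 on timeout, -2 if FAIL is set.  The
 * number of status reads made is stored in *polls_used.
 */
static int
nand_wait_ready(struct sim_chip *chip, unsigned max_polls, unsigned *polls_used)
{
	assert(chip != NULL && polls_used != NULL);
	for (unsigned i = 0; i < max_polls; i++) {
		uint8_t st = sim_read_status(chip);   /* one bus access per pass */
		if (st & NAND_STATUS_RDY) {
			*polls_used = i + 1;
			return (st & NAND_STATUS_FAIL) ? -2 : 0;
		}
	}
	*polls_used = max_polls;
	return -1;
}
```

A chip that stays busy for 5 reads is reported ready on the 6th read. A chip that never finishes returns -1 after max_polls reads instead of hanging the thread.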

You're the first one to use it with Intel or Micron NAND?  I find that
kinda hard to believe given their ubiquity...

But those times look about right for 3xnm parts...  With newer parts,
according to published specifications, those times get longer.  Expect
them to double over the next year (meaning through Intel/Micron's 20nm
parts now rolling out).  Other NAND vendors have similar published specs,
and there's much public information about this.

> But whether busy-looping for status or busy-looping polling a clock for
> DELAY, or transferring a byte at a time for the actual IO, it's all the
> same... it's cpu and memory bus cycles that are happening in a
> high-priority kernel thread.

But usually the transfer goes quickly (a few microseconds with dedicated
hardware) compared to the waiting (tens or hundreds of microseconds).
The RM9200 doesn't have dedicated NAND hardware, so byte-banging the
data to the device is the only choice...
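
Without a NAND controller, a page transfer means the CPU reads one data register once for every byte. The sketch below shows that pattern in userland C. It follows what bus_space_read_multi_1() does (repeated reads of a single location, as in the ATA PIO code Ian mentions), but it uses a plain volatile pointer, and the name pio_read_multi_1 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from a single data register into buf, one register
 * access per byte.  The register address never advances; each read pops
 * the next byte out of the device.  (Hypothetical userland stand-in for
 * a bus_space_read_multi_1() call.)
 */
void
pio_read_multi_1(const volatile uint8_t *data_reg, uint8_t *buf, size_t len)
{
	assert(data_reg != NULL);
	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
		buf[i] = *data_reg;   /* volatile: every iteration is a real read */
}
```

For a page of, say, 2048 bytes (an assumed page size), that is 2048 separate bus reads, all executed by whichever thread issued the I/O.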

It looks like you'll also have to coordinate it with a number of GPIO
pins, which is good...  That means you'll be able to have an interrupt
service the state change of the GPIO pins (well, you may need to augment
the current lame AT91 GPIO support that I wrote to allow for this).
But the NAND subsystem looks like it needs some support to do that...
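
The interrupt-plus-wakeup scheme suggested in this thread replaces the spin with a sleep that ends when the ready/busy interrupt fires. In the kernel, the I/O path would sleep on a channel with something like msleep(9), and the GPIO interrupt handler would call wakeup(9) on that channel. The sketch below is only a userland analogue: it uses a pthread condition variable, with a thread standing in for the chip. All names (nand_waitq, nand_rb_intr, nand_wait_ready_sleep, sim_chip_thread) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-chip wait state shared by the "interrupt" and the waiter. */
struct nand_waitq {
	pthread_mutex_t mtx;
	pthread_cond_t  cv;
	bool            ready;   /* set by the interrupt, consumed by the waiter */
};

/* Stand-in for the GPIO interrupt handler on the chip's ready/busy pin. */
void
nand_rb_intr(struct nand_waitq *wq)
{
	pthread_mutex_lock(&wq->mtx);
	wq->ready = true;
	pthread_cond_signal(&wq->cv);               /* kernel analogue: wakeup() */
	pthread_mutex_unlock(&wq->mtx);
}

/* Block until the chip signals ready; the CPU is free while we wait. */
void
nand_wait_ready_sleep(struct nand_waitq *wq)
{
	pthread_mutex_lock(&wq->mtx);
	while (!wq->ready)                          /* recheck: wakeups can be spurious */
		pthread_cond_wait(&wq->cv, &wq->mtx);   /* kernel analogue: msleep() */
	wq->ready = false;                          /* consume the event */
	pthread_mutex_unlock(&wq->mtx);
}

/* Simulated chip: raises the "interrupt" after about 750 us (an erase). */
void *
sim_chip_thread(void *arg)
{
	struct timespec ts = { 0, 750 * 1000L };    /* 750 us in nanoseconds */
	assert(arg != NULL);
	nanosleep(&ts, NULL);
	nand_rb_intr(arg);
	return NULL;
}
```

The ready flag is set under the mutex, so an interrupt that arrives before the waiter goes to sleep is not lost. A kernel version needs the same care, for example by checking the flag and calling msleep() with the driver mutex held.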

> The interface between the low-level controller and the nand layer
> doesn't allow for interrupt handling right now.  Not all hardware
> designs would allow for using interrupts, but mine does, so reworking
> things to allow its use would help some.  Well, it would help for writes
> and erases.  The 180MHz ARM I'm working with doesn't get much done in
> 30uS, so reads wouldn't get any better.   Reads are all I really care
> about, since the product in the field will have a read-only filesystem,
> and firmware updates are infrequent and it's okay if they're a bit slow.
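
The sketch below shows one way the low-level controller interface could gain an optional interrupt path. Every controller exports a polling hook, which any design can provide. Only designs with a usable ready/busy interrupt also fill in an interrupt-driven wait hook, and the nand layer uses whichever exists. The struct, the enum, and the sim_* hooks are hypothetical and do not describe the existing nand(4) interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks a controller driver could export to the nand layer. */
struct nand_ctrl_ops {
	int (*is_ready)(void *ctx);        /* poll: nonzero once the chip is ready */
	int (*wait_ready_intr)(void *ctx); /* sleep on R/B interrupt; NULL if none */
};

enum nand_wait_result { NAND_WAIT_INTR, NAND_WAIT_POLL, NAND_WAIT_TIMEOUT };

/*
 * Use the interrupt path when the controller provides one; otherwise (or if
 * it reports an error) fall back to busy-polling, bounded by max_polls.
 */
enum nand_wait_result
nand_wait(const struct nand_ctrl_ops *ops, void *ctx, unsigned max_polls)
{
	assert(ops != NULL && ops->is_ready != NULL);
	if (ops->wait_ready_intr != NULL && ops->wait_ready_intr(ctx) == 0)
		return NAND_WAIT_INTR;
	for (unsigned i = 0; i < max_polls; i++)
		if (ops->is_ready(ctx))
			return NAND_WAIT_POLL;
	return NAND_WAIT_TIMEOUT;
}

/* Simulated controller for trying the dispatch out: ready after n polls. */
struct sim_ctrl { unsigned busy_polls; };

static int
sim_is_ready(void *ctx)
{
	struct sim_ctrl *sc = ctx;
	if (sc->busy_polls == 0)
		return 1;
	sc->busy_polls--;
	return 0;
}

static int
sim_wait_ready_intr(void *ctx)
{
	struct sim_ctrl *sc = ctx;
	sc->busy_polls = 0;   /* pretend we slept until the R/B interrupt fired */
	return 0;
}
```

With polling kept as the fallback, boards without a usable interrupt keep working unchanged. As Ian notes, the interrupt path mainly helps writes and erases; a 30uS read wait may be too short for it to win.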

Any idea what the interrupt and scheduling delay runs these days on the
RM9200?  It has been forever since I tried to measure it.  You may be
able to signal a waiting process rather than using DELAY to busy wait
for things.  But that likely means a thread of some sort to defer the
work once the chip returns done.  Read might get better, from a system
load point of view, but maybe not from a performance point of view.
While 30us isn't a lot, you may find that your console performance goes
to hell with that long a block...
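
The trade-off can be put into rough numbers. Spinning for t microseconds on a CPU clocked at f MHz burns t * f cycles. A tick-based timeout such as the one tsleep(9) takes (suggested earlier in the thread) can't express a wait shorter than one tick. The helpers below are hypothetical, and hz=1000 is an assumed value. The interrupt and scheduling latency Warner asks about is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles spent busy-waiting for wait_us at cpu_mhz (1 MHz = 1 cycle per us). */
uint64_t
spin_cycles(uint64_t wait_us, uint64_t cpu_mhz)
{
	return wait_us * cpu_mhz;
}

/*
 * Ticks needed to cover wait_us at the given hz, rounded up, so any nonzero
 * wait needs at least one full tick.  In FreeBSD's tsleep(9) a timeout of 0
 * means "no timeout", so a caller must never pass 0 for a real wait.
 */
uint64_t
usec_to_ticks(uint64_t wait_us, uint64_t hz)
{
	assert(hz > 0);
	return (wait_us * hz + 999999) / 1000000;
}
```

For Ian's 180MHz part, spin_cycles(30, 180) is 5,400 cycles, spin_cycles(270, 180) is 48,600, and spin_cycles(750, 180) is 135,000. With hz=1000, usec_to_ticks() rounds each of those waits up to 1 tick (1,000 us), and the 3ms erase default in nand_generic.c comes out at 3 ticks. A timeout-based sleep therefore only fits the longest waits; the shorter ones need a wakeup from the ready/busy interrupt instead.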

Warner