From: Warner Losh
To: Ian Lepore
Cc: freebsd-arm@freebsd.org
Subject: Re: nand performance
Date: Thu, 20 Dec 2012 20:02:08 -0700

On Dec 20, 2012, at 5:50 PM, Ian Lepore wrote:

> On Thu, 2012-12-20 at 12:07 -0800, John-Mark Gurney wrote:
>> Ian Lepore wrote this message on Wed, Dec 19, 2012 at 17:41 -0700:
>>> I've been working to get nandfs going on a low-end Atmel arm system.
>>> Performance is horrible.  Last weekend I got my nand-based DreamPlug
>>> unbricked and got nandfs working on it too.  Performance is horrible.
>>>
>>> By that I'm referring not to the slow nature of the nand chips
>>> themselves, but to the fact that accessing them locks out userland
>>> processes, sometimes for many seconds at a time.
>>> The problem is real easy to see, just format and populate a nandfs
>>> filesystem, then do something like this
>>>
>>>   mount -r -t nandfs /dev/gnand0s.root /mnt
>>>   nice +20 find /mnt -type f | xargs -J% cat % > /dev/null
>>>
>>> and then try to type in another terminal -- sometimes what you're typing
>>> doesn't get echoed for 10+ seconds at a time.
>>>
>>> The problem is that the "I/O" on a nand chip is really just the cpu
>>> copying from one memory interface to another, a byte at a time, and it
>>> must also use busy-wait loops to wait for chip-ready and status info.
>>> This is being done by high-priority kernel threads, so everything else
>>> is locked out.
>>>
>>> It seems to me that this is about the same situation as classic ATA PIO
>>> mode, but PIO doesn't make a system that unresponsive.
>>>
>>> I'm curious what techniques are used to mitigate performance problems
>>> for ATA PIO modes, and whether we can do something similar for nand.  I
>>> poked around a bit in dev/ata but the PIO code I saw (which surely
>>> wasn't the whole picture) just used a bus_space_read_multi().  Can
>>> someone clue me in as to how ATA manages to do PIO without usurping the
>>> whole system?
>>
>> Looks like the problem is all the DELAY calls in dev/nand/nand_generic.c..
>> DELAY is a busy wait not letting the cpu do anything else...  The bad one
>> is probably generic_erase_block as it looks like the default is 3ms,
>> plenty of time to let other code run...  If it could be interrupt driven,
>> that'd be best...
>>
>> I can't find the interface that would allow sub-hz sleeping, but there is
>> tsleep that could be used for some of the larger sleeps...  But switching
>> to interrupts + wakeup would be best...
>>
>
> Yeah, the DELAY() calls were actually not working for me (I think I'm
> the first to test this stuff with an ONFI type chip), and I've replaced
> them all with loops that poll for ready status, which at least minimizes
> the wait time, but it's still a busy-loop.  Real-world times for the
> chips I'm working with are 30uS to open a page for read, ~270uS to write
> a page, and ~750uS to erase a block.

You're the first one to use it with Intel or Micron NAND?  I find that
kinda hard to believe given their ubiquity...

But those times look about right for 3xnm parts...  With newer parts,
according to published specifications, those times get longer.  Expect
them to double over the next year (meaning through Intel/Micron's 20nm
parts now rolling out).  Other NAND vendors have similar published specs,
or there's much public information about this.

> But whether busy-looping for status or busy-looping polling a clock for
> DELAY, or transferring a byte at a time for the actual IO, it's all the
> same... it's cpu and memory bus cycles that are happening in a
> high-priority kernel thread.

But usually the transfer goes quickly (a few microseconds with dedicated
hardware) compared to the waiting (tens or hundreds of microseconds).
The RM9200 doesn't have dedicated NAND hardware, so byte-banging the
data to the device is the only choice...

It looks like you'll also have to coordinate it with a number of GPIO
pins, which is good...  That means you'll be able to have an interrupt
service the state change of the GPIO pins (well, you may need to augment
the current lame AT91 gpio support that I wrote to allow for this).
But the NAND subsystem looks like it needs some support to do that...

> The interface between the low-level controller and the nand layer
> doesn't allow for interrupt handling right now.
> Not all hardware designs would allow for using interrupts, but mine
> does, so reworking things to allow its use would help some.  Well, it
> would help for writes and erases.  The 180MHz ARM I'm working with
> doesn't get much done in 30uS, so reads wouldn't get any better.  Reads
> are all I really care about, since the product in the field will have a
> read-only filesystem, and firmware updates are infrequent and it's okay
> if they're a bit slow.

Any idea what the interrupt and scheduling delay runs these days on the
RM9200?  It has been forever since I tried to measure it.  You may be
able to signal a waiting process rather than using DELAY to busy wait
for things.  But that likely means a thread of some sort to defer the
work once the chip returns done.  Read might get better, from a system
load point of view, but maybe not from a performance point of view.
While 30us isn't a lot, you may find that your console performance goes
to hell with that long a block...

Warner
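
To put rough numbers on the poll-versus-sleep trade-off discussed in
this thread, here is a small userland C sketch.  It converts the wait
times quoted above (30us read, ~270us program, ~750us erase) into cycles
on a 180MHz CPU and picks a wait strategy.  The 50us
interrupt-plus-reschedule cost and the 2x margin are assumptions for
illustration only (the real figure on the RM9200 is unmeasured, as noted
above), and the function names are hypothetical, not part of the FreeBSD
nand code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* CPU clock quoted in the thread: a 180 MHz ARM. */
#define CPU_HZ			180000000ULL

/* Chip wait times quoted in the thread, in microseconds. */
#define T_READ_US		30ULL
#define T_PROG_US		270ULL
#define T_ERASE_US		750ULL

/*
 * ASSUMPTION: cost of taking the ready interrupt and rescheduling the
 * waiting thread.  This number is made up for the example.
 */
#define ASSUMED_SWITCH_US	50ULL

typedef enum { WAIT_POLL, WAIT_SLEEP } wait_mode_t;

/* CPU cycles that elapse (and are burned if busy-waiting) in `us`. */
uint64_t
cycles_for_us(uint64_t us)
{
	return ((CPU_HZ / 1000000ULL) * us);
}

/*
 * Sleep only when the expected wait is more than twice the switch cost;
 * otherwise the sleep/wakeup overhead eats the benefit and polling wins.
 * The factor of 2 is an arbitrary margin chosen for this sketch.
 */
wait_mode_t
choose_wait(uint64_t wait_us, uint64_t switch_us)
{
	assert(switch_us > 0);
	return ((wait_us > 2 * switch_us) ? WAIT_SLEEP : WAIT_POLL);
}

/* Print one line of the budget for a named operation. */
void
print_budget(const char *op, uint64_t wait_us)
{
	printf("%-6s %4llu us  %7llu cycles  %s\n", op,
	    (unsigned long long)wait_us,
	    (unsigned long long)cycles_for_us(wait_us),
	    choose_wait(wait_us, ASSUMED_SWITCH_US) == WAIT_SLEEP ?
	    "sleep" : "poll");
}
```

Calling print_budget("read", T_READ_US), and likewise for program and
erase, prints one row per operation.  Under these assumed numbers reads
stay polled while programs and erases are worth sleeping through, which
matches the expectation above that interrupts would help writes and
erases more than reads.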