From: dieterbsd@engineer.com
To: mav@freebsd.org
Cc: freebsd-hackers@freebsd.org, freebsd-drivers@freebsd.org
Date: Thu, 14 Apr 2011 14:14:49 -0400
Subject: (no subject)
Message-Id: <8CDC8E6FA136231-29B0-2128@web-mmc-m04.sysops.aol.com>
List-Id: Writing device drivers for FreeBSD

[ Email attempt #3 and counting... ]

Alexander Motin wrote:
>> Warner Losh wrote:
>>> I don't suppose that your driver could cause the hardware to
>>> interrupt after a little time? That would be more resource friendly...
>>> Otherwise, 1ms is long enough that a msleep or tsleep would likely
>>> work quite nicely.
>>
>> It's not his driver, it's mine. Actually, unlike AHCI, this hardware
>> even has interrupt for ready transition (second, biggest of sleeps).
>> But it is not used in present situation.
>>
>>> On Apr 11, 2011, at 1:43 PM, dieterbsd@engineer.com wrote:
>>>>>> FreeBSD 8.2 amd64 uniprocessor
>>>>>>
>>>>>> kernel: siisch1: DISCONNECT requested
>>>>>> kernel: siisch1: SIIS reset...
>>>>>> kernel: siisch1: siis_sata_connect() calling DELAY(1000)
>>>>>> last message repeated 59 times
>>>>>> kernel: siisch1: SATA connect time=60ms status=00000123
>>>>>> kernel: siisch1: SIIS reset done: devices=00000001
>>>>>> kernel: siisch1: DISCONNECT requested
>>>>>> kernel: siisch1: SIIS reset...
>>>>>> kernel: siisch1: siis_sata_connect() calling DELAY(1000)
>>>>>> last message repeated 58 times
>>>>>> kernel: siisch1: SATA connect time=59ms status=00000123
>>>>>> ...
>>>>>> kernel: siisch0: siis_wait_ready() calling DELAY(1000)
>>>>>> last message repeated 1300 times
>>>>>> kernel: siisch0: port is not ready (timeout 10000ms) status = 001f2000
>>>>>> Meanwhile, *everything* comes to a screeching halt. Device
>>>>>> drivers are locked out, and thus incoming data is lost.
>>>>>> Losing incoming data is unacceptable.
>>>>>>
>>>>>> Need an alternative to DELAY() that does not lock out
>>>>>> other device drivers. There must be a way to reset one
>>>>>> bit of hardware without locking down the entire machine.
>>>> Hans Petter Selasky writes:
>>>>> An alternative to DELAY() is the simplest solution.
>>>>> You probably need to do some redesign in the SCSI layer to find a
>>>>> better solution.
>>>> I keep coming back to the idea that a device driver for one
>>>> controller should not have to lock out *all* the hardware.
>>>> RS-232 locks out Ethernet. Disk drivers lock out Ethernet.
>>>> And so on. Why? Is there some fundamental reason that this
>>>> *has* to be? I thought the conversion from spl() to mutex()
>>>> was supposed to fix this?
>>>>
>>>> I'm making progress on my project converting printf(9) calls
>>>> to log(9), and fixing some bugs along the way. Eventually I'll
>>>> have patches to submit. But this is really a workaround, not
>>>> a fix to the underlying problem.
>>>>
>>>> Redesigning the SCSI layer sounds like a job for someone who took
>>>> a lot more CS classes than I did. /dev/brain returns ENOCLUE. :-(
>>
>> CAM is not completely innocent in this situation indeed. CAM defines
>> XPT_RESET_BUS request as synchronous. It is not queued, and called
>> under the SIM mutex lock. I don't think lock can be safely dropped in
>> the middle there.
>>
>> Now I think that I could try to move readiness waiting out of the
>> siis_reset() to do it asynchronously. I'll think about it.
>
> I've fixed this problem for ahci(4) in HEAD, there should be no sleeps
> longer than 100ms now (typical 1-2ms).
>
> With siis(4) the situation is different. There by default should be no
> sleeps longer than 100ms (typical 1-2ms). Longer sleep means that
> either controller is not responding, or it can't establish link to
> device it sees. I've reduced waiting timeout from 10s to 1s. It should
> improve situation a bit, but I would look for the original problem
> cause. Have you done something specific to trigger it? Are your
> drive/cables OK?

Thank you for your prompt attention to this problem; it is very much
appreciated. (Losing data sucks.) However, 100 ms is still way too long.
(Assuming ms = milliseconds.) Even 1 millisecond is dangerous: if
Ethernet is locked out for approximately 4 milliseconds, data loss is
guaranteed. I'd like to see something more like 100 microseconds worst
case (for TCP).

The data source is a closed-source, closed-hardware black box with a
very small output buffer, and it cannot be changed. In some cases it
insists on using UDP rather than TCP, so dropping even a single packet
screws up the data.

I have cranked the TCP and UDP receive buffer sizes way up, I'm reading
the ports at rtprio into a large buffer locked into main memory, and so
on. Most of the time it works. But if a device driver takes too long,
incoming Ethernet packets do not get serviced in time, and I lose data.

A device driver doing printf(9) to the RS-232 console is too slow;
changing printf(9) to log(9) works around this. If a disk controller,
port multiplier, or disk has a hiccup, I lose data. siis(4) is the
current problem, but IIRC I've had problems with ahci(4) and ata(4) in
the past. I'm currently using all three drivers.

Is there any way I can keep the Ethernet from being locked out by other
drivers?
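
To put rough numbers on the lockout concern, here is a minimal sketch of the arithmetic relating lockout time to unserviced data. The function names are hypothetical, and the link rates and buffer sizes passed to them are illustrative assumptions, not measurements from the setup described above. The calculation assumes the sender saturates the link for the whole lockout, so it gives a worst case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes that arrive on a link running at link_bps (bits per second)
 * while the receiver is locked out for lockout_us microseconds.
 * Assumes the link is saturated for the whole interval (worst case).
 * Integer division truncates toward zero.
 */
uint64_t
bytes_during_lockout(uint64_t link_bps, uint64_t lockout_us)
{
	assert(link_bps > 0);
	/* bits/s * us -> bits * 1e6; divide by 8 (bytes) and 1e6 (us per s). */
	return (link_bps * lockout_us) / (8ULL * 1000000ULL);
}

/*
 * Longest lockout, in microseconds, that buffer_bytes of buffering can
 * absorb at link_bps before data is dropped. Hypothetical helper: the
 * real limit also depends on where in the path the buffer sits.
 */
uint64_t
max_lockout_us(uint64_t buffer_bytes, uint64_t link_bps)
{
	assert(link_bps > 0);
	return (buffer_bytes * 8ULL * 1000000ULL) / link_bps;
}
```

For example, on an assumed 100 Mbit/s link, bytes_during_lockout(100000000, 4000) gives 50000 bytes for a 4 ms lockout, and the 10000 ms timeout shown in the siis_wait_ready() log would correspond to 125000000 bytes at the same assumed rate.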