From owner-freebsd-drivers@FreeBSD.ORG  Thu Apr 14 18:15:23 2011
Return-Path: <owner-freebsd-drivers@FreeBSD.ORG>
Delivered-To: freebsd-drivers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 7E91F1065676;
	Thu, 14 Apr 2011 18:15:23 +0000 (UTC)
	(envelope-from dieterbsd@engineer.com)
Received: from imr-da03.mx.aol.com (imr-da03.mx.aol.com [205.188.105.145])
	by mx1.freebsd.org (Postfix) with ESMTP id 2A7298FC14;
	Thu, 14 Apr 2011 18:15:22 +0000 (UTC)
Received: from imo-ma04.mx.aol.com (imo-ma04.mx.aol.com [64.12.78.139])
	by imr-da03.mx.aol.com (8.14.1/8.14.1) with ESMTP id p3EIEtY4007964;
	Thu, 14 Apr 2011 14:14:55 -0400
Received: from dieterbsd@engineer.com
	by imo-ma04.mx.aol.com  (mail_out_v42.9.) id n.fca.f31f65d (44669);
	Thu, 14 Apr 2011 14:14:51 -0400 (EDT)
Received: from smtprly-dd01.mx.aol.com (smtprly-dd01.mx.aol.com
	[205.188.84.129]) by cia-mc01.mx.aol.com (v129.9) with ESMTP id
	MAILCIAMC018-d3e64da7399933d; Thu, 14 Apr 2011 14:14:51 -0400
Received: from web-mmc-m04 (web-mmc-m04.sim.aol.com [64.12.224.137]) by
	smtprly-dd01.mx.aol.com (v129.9) with ESMTP id
	MAILSMTPRLYDD012-d3e64da7399933d; Thu, 14 Apr 2011 14:14:49 -0400
To: mav@freebsd.org
Content-Transfer-Encoding: quoted-printable
Date: Thu, 14 Apr 2011 14:14:49 -0400
X-AOL-IP: 67.206.162.44
X-MB-Message-Source: WebUI
Received: from 67.206.162.44 by web-mmc-m04.sysops.aol.com (64.12.224.137)
	with HTTP (WebMailUI); Thu, 14 Apr 2011 14:14:49 -0400
MIME-Version: 1.0
From: dieterbsd@engineer.com
X-MB-Message-Type: User
Content-Type: text/plain; charset="us-ascii"; format=flowed
X-Mailer: Mail.com Webmail 33540-STANDARD
Message-Id: <8CDC8E6FA136231-29B0-2128@web-mmc-m04.sysops.aol.com>
X-Spam-Flag: NO
X-AOL-SENDER: dieterbsd@engineer.com
Cc: freebsd-hackers@freebsd.org, freebsd-drivers@freebsd.org
Subject: (no subject)
X-BeenThere: freebsd-drivers@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Writing device drivers for FreeBSD <freebsd-drivers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-drivers>, 
	<mailto:freebsd-drivers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-drivers>
List-Post: <mailto:freebsd-drivers@freebsd.org>
List-Help: <mailto:freebsd-drivers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-drivers>,
	<mailto:freebsd-drivers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 14 Apr 2011 18:15:23 -0000

[ Email attempt #3 and counting... ]

Alexander Motin wrote:
>> Warner Losh wrote:
>>> I don't suppose that your driver could cause the hardware to
>>> interrupt after a little time?  That would be more resource friendly...
>>> Otherwise, 1ms is long enough that a msleep or tsleep would likely
>>> work quite nicely.
>>
>> It's not his driver, it's mine. Actually, unlike AHCI, this hardware
>> even has interrupt for ready transition (second, biggest of sleeps).
>> But it is not used in present situation.
>>
>>> On Apr 11, 2011, at 1:43 PM, dieterbsd@engineer.com wrote:
>>>>>> FreeBSD 8.2  amd64  uniprocessor
>>>>>>
>>>>>> kernel: siisch1: DISCONNECT requested
>>>>>> kernel: siisch1: SIIS reset...
>>>>>> kernel: siisch1: siis_sata_connect() calling DELAY(1000)
>>>>>> last message repeated 59 times
>>>>>> kernel: siisch1: SATA connect time=60ms status=00000123
>>>>>> kernel: siisch1: SIIS reset done: devices=00000001
>>>>>> kernel: siisch1: DISCONNECT requested
>>>>>> kernel: siisch1: SIIS reset...
>>>>>> kernel: siisch1: siis_sata_connect() calling DELAY(1000)
>>>>>> last message repeated 58 times
>>>>>> kernel: siisch1: SATA connect time=59ms status=00000123
>>>>>> ...
>>>>>> kernel: siisch0: siis_wait_ready() calling DELAY(1000)
>>>>>> last message repeated 1300 times
>>>>>> kernel: siisch0: port is not ready (timeout 10000ms) status = 001f2000
>>>>>> Meanwhile, *everything* comes to a screeching halt.  Device
>>>>>> drivers are locked out, and thus incoming data is lost.
>>>>>> Losing incoming data is unacceptable.
>>>>>>
>>>>>> Need an alternative to DELAY() that does not lock out
>>>>>> other device drivers.  There must be a way to reset one
>>>>>> bit of hardware without locking down the entire machine.
>>>> Hans Petter Selasky writes:
>>>>> An alternative to DELAY() is the simplest solution. You probably
>>>>> need to do some redesign in the SCSI layer to find a better solution.
>>>> I keep coming back to the idea that a device driver for one
>>>> controller should not have to lock out *all* the hardware.
>>>> RS-232 locks out Ethernet.  Disk drivers lock out Ethernet.
>>>> And so on.  Why?  Is there some fundamental reason that this
>>>> *has* to be?  I thought the conversion from spl() to mutex()
>>>> was supposed to fix this?
>>>>
>>>> I'm making progress on my project converting printf(9) calls
>>>> to log(9), and fixing some bugs along the way.  Eventually I'll
>>>> have patches to submit.  But this is really a workaround, not
>>>> a fix to the underlying problem.
>>>>
>>>> Redesigning the SCSI layer sounds like a job for someone who took
>>>> a lot more CS classes than I did.  /dev/brain returns ENOCLUE.  :-(
>>
>> CAM is not completely innocent in this situation indeed. CAM defines
>> XPT_RESET_BUS request as synchronous. It is not queued, and called
>> under the SIM mutex lock. I don't think lock can be safely dropped in
>> the middle there.
>>
>> Now I think that I could try to move readiness waiting out of the
>> siis_reset() to do it asynchronously. I'll think about it.
>
> I've fixed this problem for ahci(4) in HEAD, there should be no sleeps
> longer than 100ms now (typical 1-2ms).
>
> With siis(4) the situation is different. There by default should be no
> sleeps longer than 100ms (typical 1-2ms). Longer sleep means that either
> controller is not responding, or it can't establish link to device it
> sees. I've reduced waiting timeout from 10s to 1s. It should improve
> situation a bit, but I would look for the original problem cause. Have
> you done something specific to trigger it? Are your drive/cables OK?

Thank you for your prompt attention to this problem, it is very much
appreciated.  (losing data sucks)

However, 100 ms is still way too long.  (assuming ms = milliseconds)
Even 1 millisecond is dangerous: if Ethernet is locked out for approx 4
milliseconds, data loss is guaranteed.  I'd like to see something more
like 100 microseconds worst case (for TCP).  The data comes from a
closed source, closed hardware black box that has a very small output
buffer and cannot be changed.  In some cases it insists on using UDP
rather than TCP, so dropping even a single packet screws up the data.
I have cranked the TCP and UDP receive buffer sizes way up, I'm reading
the ports at rtprio into a large buffer locked into main memory, etc.
etc.  Most of the time it works.
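
To illustrate the receive-buffer part of that setup, here is a rough
Python sketch (not the actual code in use here).  The helper names
open_udp_receiver() and max_stall_seconds(), the loopback address, the
256 KB request and the 10 MB/s data rate are all made-up values for
illustration only.  It asks the kernel for a large SO_RCVBUF, reads back
what was actually granted, and estimates how long the reading process
could be stalled before that buffer overflows.

```python
import socket


def open_udp_receiver(port, rcvbuf_bytes):
    """Open a UDP socket and ask the kernel for a large receive buffer.

    The kernel may clamp or round the request, so the granted size is
    read back with getsockopt() instead of being assumed.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind(("127.0.0.1", port))  # loopback only, for the sketch
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted


def max_stall_seconds(buffer_bytes, bytes_per_second):
    """Longest stall of the reader that the buffer can absorb."""
    return buffer_bytes / bytes_per_second


if __name__ == "__main__":
    # Port 0 lets the kernel pick any free port.
    sock, granted = open_udp_receiver(0, 256 * 1024)
    # Hypothetical incoming rate of 10 MB/s (an assumption, not measured).
    print("granted bytes:", granted)
    print("tolerable stall (s):", max_stall_seconds(granted, 10_000_000))
    sock.close()
```

A bigger socket buffer only covers stalls of the reading process.  As
far as I can tell it does nothing for packets that are lost before they
ever reach the socket, which is the case described below.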

But if a device driver takes too long, incoming Ethernet packets do
not get serviced in time, and I lose data.  A device driver doing
printf(9) to the RS-232 console is too slow.  Changing printf to
log(9) works around this.  If a disk controller, port multiplier,
or disk has a hiccup, I lose data.  Siis(4) is the current problem,
but IIRC I've had problems from ahci(4) and ata(4) in the past.
I'm currently using all three drivers.

Is there any way I can keep the Ethernet from being locked out
by other drivers?