Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 25 Dec 2012 21:51:14 -0500
From:      "Dieter BSD" <dieterbsd@engineer.com>
To:        freebsd-hackers@freebsd.org
Subject:   Re: FreeBSD for serious performance?
Message-ID:  <20121226025115.91860@gmx.com>

next in thread | raw e-mail | index | archive | help
> Which device drivers?  We can't fix problems we don't know about.

ata(4) completely hung the system for 19 minutes (at which point
I manually intervened, see the PR), probably an infinite loop.

http://www.freebsd.org/cgi/query-pr.cgi?pr=170675

Siis(4) and ahci(4) have also caused data loss, presumably by
blocking interrupts for too long.

Improving these drivers would be wonderful. But better yet,
can we please find a way to fix the underlying problem?

When a device driver handles an interrupt, it needs to block
further interrupts while it modifies its data structures. Otherwise
another interrupt coming in might cause it to mangle the data.
Right? But! Why does it need to block interrupts for everything?
Why does a disk driver need to block interrupts from Ethernet?
Why does Ethernet need to block Firewire? Why does Firewire
need to block USB? And so on. Can't the disk driver block just
its own interrupts and leave the other devices alone?

That way, when some device driver writer puts in DELAY(TOO_LONG),
at least the other devices will still work.

Alternately, why couldn't the data structures be protected with
a mutex? Then the drivers shouldn't have to block even themselves.

Alternately, why can't drivers have a polling option?
Yes, the extra overhead of polling sucks, but losing incoming
data sucks a lot more. I am not suggesting that polling should
be the default, just an option for those who need it.

Alternately, <some method I haven't thought of>

Current machines can have multiple disks, multiple Ethernets,
multiple pretty-much-any-device, multiple CPUs, etc. etc.
We have SMP kernel to juggle those multiple CPUs. But we still
have this absurd bottleneck where the device drivers bring
everything to a screaching halt every time an interrupt happens.
And if the driver has a bug, or thinks there is a problem and
decides to keep DELAY()ing over and over, the entire machine
just locks up and stays locked up, often forever.

It isn't just me. I have seen quite a few threads where other
people are having the same problem.

This needs to be fixed.

(Fixing this is at *least* a Usenix paper.)



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20121226025115.91860>