Date: Tue, 25 Dec 2012 21:51:14 -0500
From: "Dieter BSD" <dieterbsd@engineer.com>
To: freebsd-hackers@freebsd.org
Subject: Re: FreeBSD for serious performance?
Message-ID: <20121226025115.91860@gmx.com>

> Which device drivers? We can't fix problems we don't know about.

ata(4) completely hung the system for 19 minutes (at which point I manually intervened, see the PR), probably an infinite loop.
http://www.freebsd.org/cgi/query-pr.cgi?pr=170675

siis(4) and ahci(4) have also caused data loss, presumably by blocking interrupts for too long.

Improving these drivers would be wonderful. But better yet, can we please find a way to fix the underlying problem?

When a device driver handles an interrupt, it needs to block further interrupts while it modifies its data structures. Otherwise another interrupt coming in might cause it to mangle the data. Right?

But! Why does it need to block interrupts for everything? Why does a disk driver need to block interrupts from Ethernet? Why does Ethernet need to block Firewire? Why does Firewire need to block USB? And so on. Can't the disk driver block just its own interrupts and leave the other devices alone? That way, when some device driver writer puts in DELAY(TOO_LONG), at least the other devices will still work.

Alternately, why couldn't the data structures be protected with a mutex? Then the drivers shouldn't have to block even themselves.

Alternately, why can't drivers have a polling option? Yes, the extra overhead of polling sucks, but losing incoming data sucks a lot more. I am not suggesting that polling should be the default, just an option for those who need it.

Alternately, <some method I haven't thought of>

Current machines can have multiple disks, multiple Ethernets, multiple pretty-much-any-device, multiple CPUs, etc. etc. We have an SMP kernel to juggle those multiple CPUs. But we still have this absurd bottleneck where the device drivers bring everything to a screeching halt every time an interrupt happens. And if the driver has a bug, or thinks there is a problem and decides to keep DELAY()ing over and over, the entire machine just locks up and stays locked up, often forever.

It isn't just me. I have seen quite a few threads where other people are having the same problem.

This needs to be fixed. (Fixing this is at *least* a Usenix paper.)
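
To make the mutex alternative concrete, here is a minimal userland sketch using POSIX threads. It is not FreeBSD kernel code and says nothing about how ata(4), siis(4) or ahci(4) are actually written. The names fake_softc, fake_intr and net_events_while_disk_stuck are made up for illustration. Each pretend device has its own lock, so a "disk" handler that is stuck holding its lock does not stop the "net" device from taking events.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-device state; not a real FreeBSD structure. */
struct fake_softc {
    pthread_mutex_t lock;   /* protects only this device's fields */
    unsigned long   events; /* simulated interrupts handled so far */
};

/* Stand-in for an interrupt handler: it locks only its own device. */
static void fake_intr(struct fake_softc *sc)
{
    pthread_mutex_lock(&sc->lock);
    sc->events++;
    pthread_mutex_unlock(&sc->lock);
}

struct worker_arg {
    struct fake_softc *sc;
    int                count;
};

/* Thread body: deliver `count` simulated interrupts to one device. */
static void *worker(void *p)
{
    struct worker_arg *a = p;

    for (int i = 0; i < a->count; i++)
        fake_intr(a->sc);
    return NULL;
}

/*
 * Hold the disk lock (as a stuck disk handler would) while two threads
 * each deliver `count` simulated interrupts to the net device.
 * Returns the net device's event total.
 */
static unsigned long net_events_while_disk_stuck(int count)
{
    struct fake_softc disk = { PTHREAD_MUTEX_INITIALIZER, 0 };
    struct fake_softc net  = { PTHREAD_MUTEX_INITIALIZER, 0 };
    struct worker_arg arg  = { &net, count };
    pthread_t t[2];

    pthread_mutex_lock(&disk.lock);     /* disk handler is "stuck" */
    for (int i = 0; i < 2; i++) {
        int rc = pthread_create(&t[i], NULL, worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    pthread_mutex_unlock(&disk.lock);

    return net.events;
}
```

Built with -pthread, a call such as net_events_while_disk_stuck(1000) returns 2000: both net threads finish even though the disk lock is held the whole time. A real driver has extra constraints this toy ignores, such as which kinds of lock may be taken in interrupt context.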