From owner-freebsd-smp  Thu Dec  5 19:03:42 1996
Return-Path: 
Received: (from root@localhost) by freefall.freebsd.org (8.8.4/8.8.4) id TAA21925 for smp-outgoing; Thu, 5 Dec 1996 19:03:42 -0800 (PST)
Received: from spinner.DIALix.COM (root@spinner.DIALix.COM [192.203.228.67]) by freefall.freebsd.org (8.8.4/8.8.4) with ESMTP id TAA21860 for ; Thu, 5 Dec 1996 19:03:31 -0800 (PST)
Received: from spinner.DIALix.COM (peter@localhost.DIALix.oz.au [127.0.0.1]) by spinner.DIALix.COM (8.8.4/8.8.4) with ESMTP id LAA12276; Fri, 6 Dec 1996 11:02:55 +0800 (WST)
Message-Id: <199612060302.LAA12276@spinner.DIALix.COM>
To: Chris Csanady 
cc: freebsd-smp@freebsd.org
Subject: Re: make locking more generic? 
In-reply-to: Your message of "Thu, 05 Dec 1996 20:30:20 CST." <199612060230.UAA00386@friley216.res.iastate.edu> 
Date: Fri, 06 Dec 1996 11:02:54 +0800
From: Peter Wemm 
Sender: owner-smp@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Chris Csanady wrote:
> 
> >Kevin Van Maren wrote:
> >> Yes, the reason you need finer grained locking is because the
> >> interrupts *should* go to the other processor.  If one
> >> processor is handling an interrupt and another int comes
> >> in, the other CPU should be able to handle it.  This
> >> would finally give parallel I/O!  Linux doesn't do this,
> >> and they do very poorly when not every process is CPU bound.
> >>
> >> Kevin
> >>
> >> ps: This will most likely mean fixing device drivers as well.
> >
> >Yes, it will most likely be one of two options for each driver..  We will
> >have to modify it to do fine-grain locking (this is a major problem for
> >the network cards due to the mbuf design), or have some way of running
> 
> s/design/stupidity/
> 
> Chris

Well, there are several things to consider before writing it off as
"stupid"; it's bloody quick compared to something like STREAMS, which is
much more MP-safe.

With the present "design", an mbuf is allocated in the network soft
interrupt layer, and that mbuf is passed all the way up to the socket
buffers, right through the ip and tcp layers, without being queued
anywhere.  The netisr handler literally "runs" the upward-direction
tcp/ip "engine".  It's pretty much the same in the other direction,
though from memory one direction has a single queueing stage and the
other doesn't.

STREAMS, on the other hand, has "modules", each with an incoming and an
outgoing queue.  The network driver allocates an mblk/dblk and queues it
in its upstream neighbor's queue.  The upstream module's service routine
is run, which dequeues it, processes it, and enqueues it on its own
upstream side... and so on, right up to the stream head.  On the plus
side, there's a lot of potential for putting in hooks for fine-grain
locking on each queue and data block, but the overheads are incredible.

I'm not familiar with Linux's design; I think it's a simplified (read:
the crap has been stripped out and optimised) version of the mbuf
cluster model, where large buffers are allocated and passed around.  I
do not know whether they queue on entry to each logical "component" of
the protocol stack, but I suspect not, since they are after speed.
Calling it a simplified version of mbuf clusters is probably not going
to be popular, but that's what a casual glance suggested to me.

We have two major types of mbufs: "standard" small ones, 128 bytes long
with about 106 bytes of data space or so, and "clusters", where the mbuf
points to a 2K or 4K page of data.
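For illustration, a rough sketch of the shape of those two flavours.
This is NOT the real <sys/mbuf.h>; the field names and the exact header
overhead here are made up, only the 128-byte/2K sizes come from the
description above.

/*
 * Sketch only: the layout of a "standard" mbuf vs. a "cluster" mbuf.
 * Field names and header overhead are illustrative, not the real ones.
 */
#define MSIZE     128           /* total size of a small mbuf */
#define MCLBYTES  2048          /* size of an external "cluster" page */

struct m_hdr_sketch {
        struct mbuf_sketch *mh_next;    /* next mbuf in the chain */
        char               *mh_data;    /* points at the active data */
        int                 mh_len;     /* amount of data in this mbuf */
        short               mh_type;    /* data, packet header, ... */
        short               mh_flags;   /* e.g. "data lives in a cluster" */
};

struct mbuf_sketch {
        struct m_hdr_sketch m_hdr;
        union {
                /* "standard" mbuf: the ~100-odd bytes of data live inline */
                char  m_dat[MSIZE - sizeof(struct m_hdr_sketch)];
                /* "cluster" mbuf: the data is an external 2K (or 4K) page */
                char *m_ext;
        } m_un;
};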
I think the Linux model is like the latter (the cluster style), where
there is either a separate header and data chunk, or the header is at
the start of the data "page".  If this is the case, their system
probably won't lend itself to multi-threading any better than ours.

I suspect that we're going to be stuck with a giant "networking lock" to
cover everything from the soft network interrupts, through the mbuf
code, through to the protocol engine and the high-level socket buffers.
There may be room for a decoupling layer between the network cards and
the network "protocol engine" as such, and the same at the top end.
This may allow us to get away with running the soft net processing for
the cards in parallel with the network "engine" itself.  It will require
a locked queueing stage to get data from one to the other.

Cheers,
-Peter
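A minimal sketch of what that locked queueing stage might look like,
written in user-space terms: the pthread mutex and every name below are
stand-ins for whatever kernel primitives end up being used, not any
existing FreeBSD API.  The point is only that the hand-off queue is the
single thing both sides lock; the per-card soft interrupts and the
protocol engine never touch each other's data structures directly.

#include <stddef.h>
#include <pthread.h>

struct pkt {
        struct pkt *p_next;
        /* ... mbuf chain, receiving interface, etc. ... */
};

struct handoff_queue {
        struct pkt      *hq_head;
        struct pkt      *hq_tail;
        pthread_mutex_t  hq_lock;       /* stand-in for a kernel lock */
};

/* One queue per hand-off point, e.g. for inbound traffic. */
struct handoff_queue rxq = { NULL, NULL, PTHREAD_MUTEX_INITIALIZER };

/* Producer side: called from the per-card soft interrupt path. */
void
handoff_enqueue(struct handoff_queue *q, struct pkt *p)
{
        p->p_next = NULL;
        pthread_mutex_lock(&q->hq_lock);
        if (q->hq_tail != NULL)
                q->hq_tail->p_next = p;
        else
                q->hq_head = p;
        q->hq_tail = p;
        pthread_mutex_unlock(&q->hq_lock);
}

/* Consumer side: called from the protocol engine; returns NULL if empty. */
struct pkt *
handoff_dequeue(struct handoff_queue *q)
{
        struct pkt *p;

        pthread_mutex_lock(&q->hq_lock);
        p = q->hq_head;
        if (p != NULL) {
                q->hq_head = p->p_next;
                if (q->hq_head == NULL)
                        q->hq_tail = NULL;
        }
        pthread_mutex_unlock(&q->hq_lock);
        return (p);
}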