From: Max Laier
Organization: FreeBSD
To: freebsd-arch@freebsd.org
Cc: Robert Watson, freeebsd-net@freebsd.org
Date: Sun, 30 Jul 2006 16:59:10 +0200
Subject: Re: Changes in the network interface queueing handoff model
Message-Id: <200607301659.16323.max@love2party.net>
In-Reply-To: <20060730141642.D16341@fledge.watson.org>
List-Id: Discussion related to FreeBSD architecture

On Sunday 30 July 2006 16:04, Robert Watson wrote:
> One of the ideas that I, Scott Long, and a few others have been bouncing
> around for some time is a restructuring of the network interface packet
> transmission API to reduce the number of locking operations and allow
> network device drivers increased control of the queueing behavior.  Right
> now, it works something like the following:
>
> - When a network protocol wants to transmit, it calls the ifnet's link
>   layer output routine via ifp->if_output() with the ifnet pointer, packet,
>   destination address information, and route information.
>
> - The link layer (e.g., ether_output() + ether_output_frame()) encapsulates
>   the packet as necessary, performs a link layer address translation (such
>   as ARP), and hands off to the ifnet driver via a call to IFQ_HANDOFF(),
>   which accepts the ifnet pointer and packet.
>
> - The ifnet layer enqueues the packet in the ifnet send queue
>   (ifp->if_snd), and then looks at the driver's IFF_DRV_OACTIVE flag to
>   determine if it needs to "start" output by the driver.  If the driver is
>   already active, it doesn't, and otherwise, it does.
>
> - The driver dequeues the packet from ifp->if_snd, performs any driver
>   encapsulation and wrapping, and notifies the hardware.  In modern
>   hardware, this consists of hooking the data of the packet up to the
>   descriptor ring and notifying the hardware to pick it up via DMA.  In
>   older hardware, the driver would perform a series of I/O operations to
>   send the entire packet directly to the card via a system bus.
>
> Why change this?  A few reasons:
>
> - The ifnet layer send queue is becoming less useful over time.
>   Most modern hardware has a significant number of slots in its transmit
>   descriptor ring, tuned for the performance of the hardware, etc, which is
>   the effective transmit queue in practice.
>   The additional queue depth doesn't increase throughput substantially
>   (if at all) but does consume memory.
>
> - On extremely fast hardware (with respect to CPU speed), the queue remains
>   essentially empty, so we pay the cost of enqueueing and dequeuing a
>   packet from an empty queue.
>
> - The ifnet send queue is a separately locked object from the device
>   driver, meaning that for a single enqueue/dequeue pair, we pay an extra
>   four lock operations (two for insert, two for remove) per packet.
>
> - For synthetic link layer drivers, such as if_vlan, which have no need for
>   queueing at all, the cost of queueing is eliminated.
>
> - IFF_DRV_OACTIVE is no longer inspected by the link layer, only by the
>   driver, which helps eliminate a latent race condition involving use of
>   the flag.
>
> The proposed change is simple: right now, one or more enqueue operations
> occur, then a call to ifp->if_start() is made to notify the driver that it
> may need to do something (if the ACTIVE flag isn't set).  In the new world
> order, the driver is directly passed the mbuf, and may then choose to queue
> it or otherwise handle it as it sees fit.  The immediate practical benefit
> is clear: if the queueing at the ifnet layer is unnecessary, it is entirely
> avoided, skipping enqueue, dequeue, and four mutex operations.  This
> applies immediately for VLAN processing, but also means that for modern
> gigabit cards, the hardware queue (which will be used anyway) is the only
> queue necessary.
>
> There are a few downsides, of course:
>
> - For older hardware without its own queueing, the queue is still required
>   -- not only that, but we've now introduced an unconditional function
>   pointer invocation, which on older hardware has a more significant
>   relative cost than it has on more recent CPUs.
>
> - If drivers still require or use a queue, they must now synchronize access
>   to the queue.
>   The obvious choices are to use the ifq lock (and restore the above four
>   lock operations), or to use the driver mutex (and risk higher
>   contention).  Right now, if the driver is busy (driver mutex held) then
>   an enqueue is still possible, but with this change and a single mutex
>   protecting the send queue and driver, that is no longer possible.
>
> Attached is a patch that maintains the current if_start, but adds
> if_startmbuf.  If a device driver implements if_startmbuf and the global
> sysctl net.startmbuf_enabled is set to 1, then the if_startmbuf path in the
> driver will be used.  Otherwise, if_start is used.  I have modified the
> if_em driver to implement if_startmbuf also.  If there is no packet backlog
> in the if_snd queue, it directly places the packet in the transmit
> descriptor ring.  If there is a backlog, it uses the if_snd queue protected
> by driver mutex, rather than a separate ifq mutex.
>
> In some basic local micro-benchmarks, I saw a 5% improvement in UDP 0-byte
> payload PPS on UP, and a 10% improvement on SMP.  I saw a 1.7% performance
> improvement in the bulk serving of 1k files over HTTP.  These are only
> micro-benchmarks, and reflect a configuration in which the CPU is unable to
> keep up with the output rate of the 1gbps ethernet card in the device, so
> reductions in host CPU usage are immediately visible in increased output as
> the CPU is able to better keep up with the network hardware.  Other
> configurations are also of interest, especially ones in which the network
> device is unable to keep up with the CPU, resulting in more queueing.
>
> Conceptual review as well as benchmarking, etc, would be most welcome.

This begs the question: What about ALTQ?

If we maintain the fallback mechanism in _handoff, we can just add
ALTQ_IS_ENABLED() to the test.  Otherwise every driver's startmbuf function
would have to take care of ALTQ itself, which is not preferable.
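
To make the fallback idea concrete, here is a rough userland sketch.  It is
not the attached patch and not kernel code.  Only the ideas of if_start,
if_startmbuf, if_snd, IFF_DRV_OACTIVE, ALTQ_IS_ENABLED() and
net.startmbuf_enabled come from the discussion above; the struct layouts, the
boolean fields standing in for the flag and macro, and every mock_* name are
assumptions made up for illustration, and all locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-ins; NOT the real FreeBSD definitions. */
struct mbuf {
	struct mbuf *m_nextpkt;
};

struct ifnet {
	bool altq_enabled;	/* stands in for ALTQ_IS_ENABLED(&ifp->if_snd) */
	bool drv_oactive;	/* stands in for IFF_DRV_OACTIVE */
	struct mbuf *snd_head;	/* stands in for ifp->if_snd */
	struct mbuf *snd_tail;
	int snd_len;
	void (*if_start)(struct ifnet *);
	int (*if_startmbuf)(struct ifnet *, struct mbuf *); /* NULL if absent */
};

/* Stands in for the sysctl net.startmbuf_enabled. */
static bool startmbuf_enabled = true;

/* Counters so the mock driver entry points can be observed. */
static int direct_calls;
static int start_calls;

/* Hypothetical driver: takes the packet directly, no ifnet queue. */
static int
mock_drv_startmbuf(struct ifnet *ifp, struct mbuf *m)
{
	(void)ifp;
	(void)m;
	direct_calls++;
	return (0);
}

/* Hypothetical driver: classic "go look at if_snd" notification. */
static void
mock_drv_start(struct ifnet *ifp)
{
	(void)ifp;
	start_calls++;
}

/* Append a packet to the mock send queue (no locking in this sketch). */
static void
mock_enqueue(struct ifnet *ifp, struct mbuf *m)
{
	m->m_nextpkt = NULL;
	if (ifp->snd_tail == NULL)
		ifp->snd_head = m;
	else
		ifp->snd_tail->m_nextpkt = m;
	ifp->snd_tail = m;
	ifp->snd_len++;
}

/*
 * Hypothetical handoff: use the direct path only when the driver offers
 * it, the sysctl allows it, and ALTQ is not active on this interface.
 * Otherwise fall back to enqueue + if_start so that queueing disciplines
 * still see every packet.
 */
static int
mock_handoff(struct ifnet *ifp, struct mbuf *m)
{
	assert(ifp != NULL && m != NULL);
	if (startmbuf_enabled && ifp->if_startmbuf != NULL &&
	    !ifp->altq_enabled)
		return (ifp->if_startmbuf(ifp, m));
	mock_enqueue(ifp, m);
	if (!ifp->drv_oactive && ifp->if_start != NULL)
		ifp->if_start(ifp);
	return (0);
}
```

With ALTQ off, mock_handoff() hands the packet straight to the driver and
the queue stays empty; with ALTQ on, the packet lands in the queue and the
driver is merely kicked via if_start.  The point is that the ALTQ test lives
in one place instead of in every driver's startmbuf routine.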

I strongly agree with your comment about how messed up ifq_*/if_* in if_var.h
are - and I'm afraid that's partly my fault for bringing in ALTQ.

--
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News