From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 12:21:00 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3E716106564A for ; Sun, 5 Apr 2009 12:21:00 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id B21B78FC0A for ; Sun, 5 Apr 2009 12:20:59 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1LqRLK-0004Sk-9x for freebsd-net@freebsd.org; Sun, 05 Apr 2009 12:20:54 +0000 Received: from 93-141-3-137.adsl.net.t-com.hr ([93.141.3.137]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Apr 2009 12:20:54 +0000 Received: from ivoras by 93-141-3-137.adsl.net.t-com.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Apr 2009 12:20:54 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: Ivan Voras Date: Sun, 05 Apr 2009 14:20:25 +0200 Lines: 72 Message-ID: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig9D3AA7C6A7FB08F179C61F87" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 93-141-3-137.adsl.net.t-com.hr User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) X-Enigmail-Version: 0.95.7 Sender: news Subject: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 12:21:00 -0000 This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enig9D3AA7C6A7FB08F179C61F87 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Hi, I'm developing an application that needs a high rate of small TCP transactions on multi-core systems, and I'm hitting a limit where a kernel task, usually swi:net (but it depends on the driver) hits 100% of a CPU at some transactions/s rate and blocks further performance increase even though other cores are 100% idle. So I've got an idea and tested it out, but it fails in an unexpected way. I'm not very familiar with the network code so I'm probably missing something obvious. The idea was to locate where the packet processing takes place and offload packets to several new kernel threads. I see this can happen in several places - netisr, ip_input and tcp_input, and I chose netisr because I thought maybe it would also help other uses (routing?). Here's a patch against CURRENT: http://people.freebsd.org/~ivoras/diffs/mpip.patch It's fairly simple - starts a configurable number of threads in start_netisr(), assigns circular queues to each, and modifies what I think are entry points for packets in the non-netisr.direct case. I also try to have TCP and UDP traffic from the same host+port processed by the same thread. It has some rough edges but I think this is enough to test the idea. 
I know that there are several people officially working in this area and I'm not an expert in it so think of it as a weekend hack for learning purposes :)

These parameters are needed in loader.conf to test it:

    net.isr.direct=0
    net.isr.mtdispatch_n_threads=2

I expected things like the contention in upper layers (TCP) leading to not improving performance one bit, but I can't explain what I'm getting here. While testing the application on a plain kernel, I get approx. 100,000 - 120,000 packets/s per direction (by looking at "netstat 1") and a similar number of transactions/s in the application. With the patch I get up to 250,000 packets/s in netstat (3 mtdispatch threads), but for some weird reason the actual number of transactions processed by the application drops to less than 1,000 at the beginning (~~ 30 seconds), then jumps to close to 100,000 transactions/s, with netstat also showing a drop this number of packets. In the first phase, the new threads (netd0..3) are using CPU time almost 100%, in the second phase I can't see where the CPU time is going (using top).

I thought this has something to deal with NIC moderation (em) but can't really explain it. The bad performance part (not the jump) is also visible over the loopback interface.

Any ideas?
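A minimal user-space sketch of the dispatch idea described above, in which packets from the same host+port always land on the same worker queue. This is not the code in mpip.patch: the names `flow_to_thread` and `Dispatcher` are hypothetical, CRC32 is an assumed stand-in for whatever hash the patch uses, and plain Python deques stand in for the per-thread circular queues.

```python
import ipaddress
import zlib
from collections import deque


def flow_to_thread(src_addr: str, src_port: int, n_threads: int) -> int:
    """Map a (source address, source port) pair to a worker index.

    Assumption: CRC32 over the packed address and port. Any stable hash
    would serve; the point is that equal inputs give equal indices.
    """
    key = ipaddress.ip_address(src_addr).packed + src_port.to_bytes(2, "big")
    return zlib.crc32(key) % n_threads


class Dispatcher:
    """Hypothetical stand-in for the netisr entry point: one queue per thread."""

    def __init__(self, n_threads: int):
        self.queues = [deque() for _ in range(n_threads)]

    def dispatch(self, src_addr: str, src_port: int, payload: bytes) -> int:
        # Packets of one flow share a queue, so a single worker sees them
        # in arrival order and per-flow ordering is preserved.
        idx = flow_to_thread(src_addr, src_port, len(self.queues))
        self.queues[idx].append((src_addr, src_port, payload))
        return idx


if __name__ == "__main__":
    d = Dispatcher(n_threads=2)
    for port in (40000, 40001, 40000):
        print(port, "-> queue", d.dispatch("192.0.2.10", port, b"data"))
```

Keeping a flow on one queue avoids reordering within a connection, at the price of reading the packet headers once more in order to compute the hash.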
--------------enig9D3AA7C6A7FB08F179C61F87 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAknYogoACgkQldnAQVacBcg0rwCeK5aaPe2Al0xFoelvU1IyJXup 9DQAmwRr/BgW8/Q/sBkNmlrJqtJtmvci =KeAh -----END PGP SIGNATURE----- --------------enig9D3AA7C6A7FB08F179C61F87-- From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 13:21:23 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 701AE1065670; Sun, 5 Apr 2009 13:21:23 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 4527A8FC12; Sun, 5 Apr 2009 13:21:23 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id DAD7946B3B; Sun, 5 Apr 2009 09:21:22 -0400 (EDT) Date: Sun, 5 Apr 2009 14:21:22 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ivan Voras In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 13:21:23 -0000 On Sun, 5 Apr 2009, Ivan Voras wrote: > I'm developing an application that needs a high rate of small TCP > transactions on multi-core systems, and I'm hitting a limit where a kernel > task, usually swi:net (but it depends on the driver) hits 100% of a CPU at > some transactions/s rate and blocks further performance increase even though > other cores are 100% idle. You can find a similar, if possibly more mature, implementation here: //depot/projects/rwatson/netisr2/... I haven't updated it in about six months since I've been waiting for the RSS-based flowid support in HEAD to mature. One of the fundamental problems with hashing packets to distribute work is that it involves taking cache misses on packet headers, not just once, but twice, which often is one of the largest costs in processing packets. Most modern, interesting high-performance network cards can already take the hash in hardware, and you want to use that hash to place work where possible. In 8.x, you shouldn't be experiencing high lock contention for the TCP receipt path when doing bulk transfers, as we use read locking for the tcbinfo lock in most cases. In fact, you can even get fairly decent scalability even in 7.x because the regular packet processing path for TCP uses mutual exclusion only briefly. However, the current approach does dirty a lot of cache lines, especially locks and stats, and does not scale well (in 8.x, or at all in 7.x) if you have lots of short connections. Also, be aware that if you're outputting to a single interface or queue, there's a *lot* of lock contention in the device driver. 
Kip Macy has patches to support multiple output queues on cxgb, which should facilitate support for other drivers as well, and the plan is to get that in 8.0 as well. The patch above doesn't know about the mbuf packetheader flowid yet, but it's trivial to teach it about that. I have plans to get back to the netisr2 code before we finalize 8.0, but have some other stuff in the queue first. We're, briefly, in a period where input queue count is about the same density as CPU cores; it's not entirely clear, but we may soon be back in a situation where CPU core count exceeds queues, in which case doing software work placement will continue to be important. Right now, as long as your high-performance card supports multiple input queues, we already do pretty effective work placement by virtue of RSS and multiple ithreads. Robert N M Watson Computer Laboratory University of Cambridge > > So I've got an idea and tested it out, but it fails in an unexpected > way. I'm not very familiar with the network code so I'm probably missing > something obvious. The idea was to locate where the packet processing > takes place and offload packets to several new kernel threads. I see > this can happen in several places - netisr, ip_input and tcp_input, and > I chose netisr because I thought maybe it would also help other uses > (routing?). Here's a patch against CURRENT: > > http://people.freebsd.org/~ivoras/diffs/mpip.patch > > It's fairly simple - starts a configurable number of threads in > start_netisr(), assigns circular queues to each, and modifies what I > think are entry points for packets in the non-netisr.direct case. I also > try to have TCP and UDP traffic from the same host+port processed by the > same thread. It has some rough edges but I think this is enough to test > the idea. 
I know that there are several people officially working in > this area and I'm not an expert in it so think of it as a weekend hack > for learning purposes :) > > These parameters are needed in loader.conf to test it: > > net.isr.direct=0 > net.isr.mtdispatch_n_threads=2 > > I expected things like the contention in upper layers (TCP) leading to > not improving performance one bit, but I can't explain what I'm getting > here. While testing the application on a plain kernel, I get approx. > 100,000 - 120,000 packets/s per direction (by looking at "netstat 1") > and a similar number of transactions/s in the application. With the > patch I get up to 250,000 packets/s in netstat (3 mtdispatch threads), > but for some weird reason the actual number of transactions processed by > the application drops to less than 1,000 at the beginning (~~ 30 > seconds), then jumps to close to 100,000 transactions/s, with netstat > also showing a drop this number of packets. In the first phase, the new > threads (netd0..3) are using CPU time almost 100%, in the second phase I > can't see where the CPU time is going (using top). > > I thought this has something to deal with NIC moderation (em) but can't > really explain it. The bad performance part (not the jump) is also > visible over the loopback interface. > > Any ideas? 
> > From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 13:24:02 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F3E7D106566C; Sun, 5 Apr 2009 13:24:01 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id D0BCA8FC12; Sun, 5 Apr 2009 13:24:01 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 764A246B0C; Sun, 5 Apr 2009 09:24:01 -0400 (EDT) Date: Sun, 5 Apr 2009 14:24:01 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ivan Voras In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 13:24:02 -0000 On Sun, 5 Apr 2009, Ivan Voras wrote: > I thought this has something to deal with NIC moderation (em) but can't > really explain it. The bad performance part (not the jump) is also visible > over the loopback interface. FYI, if you want high performance, you really want a card supporting multiple input queues -- igb, cxgb, mxge, etc. if_em-only cards are fundamentally less scalable in an SMP environment because they require input or output to occur only from one CPU at a time. 
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 13:35:11 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 36170106564A for ; Sun, 5 Apr 2009 13:35:11 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id D913A8FC13 for ; Sun, 5 Apr 2009 13:35:10 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1LqSV5-0007C9-Ij for freebsd-net@freebsd.org; Sun, 05 Apr 2009 13:35:03 +0000 Received: from 93-141-3-137.adsl.net.t-com.hr ([93.141.3.137]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Apr 2009 13:35:03 +0000 Received: from ivoras by 93-141-3-137.adsl.net.t-com.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Apr 2009 13:35:03 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: Ivan Voras Date: Sun, 05 Apr 2009 15:34:26 +0200 Lines: 38 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigE100CC5D7B9CFB0A0C63756C" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 93-141-3-137.adsl.net.t-com.hr User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) In-Reply-To: X-Enigmail-Version: 0.95.7 Sender: news Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sun, 05 Apr 2009 13:35:11 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigE100CC5D7B9CFB0A0C63756C
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Robert Watson wrote:
>
> On Sun, 5 Apr 2009, Ivan Voras wrote:
>
>> I thought this has something to deal with NIC moderation (em) but
>> can't really explain it. The bad performance part (not the jump) is
>> also visible over the loopback interface.
>
> FYI, if you want high performance, you really want a card supporting
> multiple input queues -- igb, cxgb, mxge, etc. if_em-only cards are
> fundamentally less scalable in an SMP environment because they require
> input or output to occur only from one CPU at a time.

Makes sense, but on the other hand - I see people are routing at least 250,000 packets per seconds per direction with these cards, so they probably aren't the bottleneck (pro/1000 pt on pci-e).
--------------enigE100CC5D7B9CFB0A0C63756C Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAknYs2sACgkQldnAQVacBcgOzACguAsTzdt9DZStuslyOHAti/9J 9noAoPDt1v9OHmV2gx/eYD7cRClVnDMJ =UEzZ -----END PGP SIGNATURE----- --------------enigE100CC5D7B9CFB0A0C63756C-- From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 13:54:20 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 427A6106567F; Sun, 5 Apr 2009 13:54:20 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 018AB8FC12; Sun, 5 Apr 2009 13:54:20 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id AB86846B89; Sun, 5 Apr 2009 09:54:19 -0400 (EDT) Date: Sun, 5 Apr 2009 14:54:19 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ivan Voras In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 13:54:20 -0000 On Sun, 5 Apr 2009, Ivan Voras wrote: >>> I thought this has something to deal with NIC moderation (em) but can't >>> really explain it. 
The bad performance part (not the jump) is also visible >>> over the loopback interface. >> >> FYI, if you want high performance, you really want a card supporting >> multiple input queues -- igb, cxgb, mxge, etc. if_em-only cards are >> fundamentally less scalable in an SMP environment because they require >> input or output to occur only from one CPU at a time. > > Makes sense, but on the other hand - I see people are routing at least > 250,000 packets per seconds per direction with these cards, so they probably > aren't the bottleneck (pro/1000 pt on pci-e). The argument is not that they are slower (although they probably are a bit slower), rather that they introduce serialization bottlenecks by requiring synchronization between CPUs in order to distribute the work. Certainly some of the scalability issues in the stack are not a result of that, but a good number are. Historically, we've had a number of bottlenecks in, say, the bulk data receive and send paths, such as: - Initial receipt and processing of packets on a single CPU as a result of a single input queue from the hardware. Addressed by using multiple input queue hardware with appropriately configured drivers (generally the default is to use multiple input queues in 7.x and 8.x for supporting hardware). - Cache line contention on stats data structures in drivers and various levels of the network stack due to bouncing around exclusive ownership of the cache line. ifnet introduces at least a few, but I think most of the interesting ones are at the IP and TCP layers for receipt. - Global locks protecting connection lists, all rwlocks as of 7.1, but not necessarily always used read-only for packet processing. For UDP we do a very good job at avoiding write locks, but for TCP in 7.x we still use a global write lock, if briefly, for every packet. There's a change in 8.x to use a global read lock for most packets, especially steady state packets, but I didn't merge it for 7.2 because it's not well-benchmarked. 
Assuming I get positive feedback from more people, I will merge them before 7.3. - If the user application is multi-threaded and receiving from many threads at once, we see contention on the file descriptor table lock. This was markedly improved by the file descriptor table locking rewrite in 7.0, but we're continuing to look for ways to mitigate this. A lockless approach would be really nice... On the transmit path, the bottlenecks are similar but different: - Neither 7.x nor 8.x supports multiple transmit queues as shipped; Kip has patches for both that add it for cxgb. Maintaining ordering here, and ideally affinity to the appropriate associated input queue, is important. As the patches aren't in the tree yet, or for single-queue drivers, contention on the device driver send path and queues can be significant, especially for device drivers where the send and receive path are protected by the same lock (bge!). - Stats at various levels in the stack still dirty cache lines. - We don't acquire, in the common case, any global connection list locks during transmit. - Routing table locks may be an issue. Kip has patches against 8.x to re-introduce inpcb route as well as link layer flow caching. These are in my review queue currently... In 8.x the global radix tree lock is a read-write lock and we use read-locking where possible, but in 7.x it's still a mutex. This probably isn't an MFCable change. Another change coming in 8.x is increased use of read-mostly locks, rmlocks, which avoid writes to shared cache lines for read-acquire, but have a more expensive write-acquire. We're already using this in a few spots, including for firewall registration, but need to use it in more. With a fast CPU, introducing more cores may not necessarily speed up, and might often slow down, processing even if all bottlenecks are eliminated--fundamentally, if you have the CPU capacity to do the work on one CPU, then moving the work to other CPUs is an overhead best avoided. 
Especially if the device itself forces serialization due to having a single input queue and a single output queue. However, if we, reasonably, assume a capping of core speed over time, and increasing CPU density, software work placement becomes more important. And with multi-queue devices, avoiding writing to common cache lines from CPUs is increasingly possible. We have a 32-thread MIPS embedded eval board in the Netperf cluster now, which we'll begin using for 10gbps testing fairly soon, I hope. One of its properties is that individual threads are decidedly non-zippy compared to, say, a 10gbps interface running at line-rate, so it will allow us to explore these issues more effectively than we could before. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 17:25:42 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7BBDC1065733 for ; Sun, 5 Apr 2009 17:25:42 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from web63901.mail.re1.yahoo.com (web63901.mail.re1.yahoo.com [69.147.97.116]) by mx1.freebsd.org (Postfix) with SMTP id 32DC58FC2C for ; Sun, 5 Apr 2009 17:25:41 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: (qmail 32282 invoked by uid 60001); 5 Apr 2009 17:25:41 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1238952341; bh=UAZAyL9X+LmTWizZqCNoHVnJcsaEyDhDFBi8LHWr3k8=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=sW2xGljViae/BOIsxjTFYt0a+U3xW8TEo8NNQykZtJM6xXB+xXbA0T8RiNyPfNIlnwsootQbb5s7MKyzkACMd8Q6WFIskIChdedVbkEG1/989Nxf6UvAz/2iwNGMPRPrl0zQyVzYDiNchEK2tPsOPGA0+NiZtFJQTU3/lyPYOmc= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; 
h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=VXYivRlYRKfb7BXOhi6NBbJvYrQJJlDONsVcX7/B2MFSS1GEQ/PuzwdkKkbAFO0vaKdNi/3Bj95KnAGD0A49hOHBGLoR/+yJhFjRWhO4VhO18ZVi36eD5e1HwcnHbkQHfaaIFpw4RRj3Ys9q6neQbNTJcEKDQx01fVE0itf1n5A=; Message-ID: <285323.31546.qm@web63901.mail.re1.yahoo.com> X-YMail-OSG: oBYwuYkVM1lxH_acsk6HjQP98EbR6ehygHnFTfDWr4FVxpITviaKt_UanaK6piXveRnvPR9GqmV5Bizijz4P9KkAG9EGJ67O8l496N4FDMpLfGUVVsUDGSzVA0jBn0jelaXCb9I1am6asd80T076a5HAH0ZVmtySdP2txdo21JnwgFkpUdhuMmSmQ3yaPYcEuTVFR8a42h8hNfaL7ZB8xRK6dNGLLAjBwctaCATqAIjATMndTK.pSuIe1z81DCsFNU9euN9j2Q0wuPY3t7XNGl5nk.QnqqMoYdA4a.FhPoLceHrDrj2MEO0Ba4SA Received: from [98.242.222.229] by web63901.mail.re1.yahoo.com via HTTP; Sun, 05 Apr 2009 10:25:41 PDT X-Mailer: YahooMailWebService/0.7.289.1 Date: Sun, 5 Apr 2009 10:25:41 -0700 (PDT) From: Barney Cordoba To: Ivan Voras , Robert Watson In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: barney_cordoba@yahoo.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 17:25:44 -0000 --- On Sun, 4/5/09, Robert Watson wrote: > From: Robert Watson > Subject: Re: Advice on a multithreaded netisr patch? > To: "Ivan Voras" > Cc: freebsd-net@freebsd.org > Date: Sunday, April 5, 2009, 9:54 AM > On Sun, 5 Apr 2009, Ivan Voras wrote: > > >>> I thought this has something to deal with NIC > moderation (em) but can't really explain it. The bad > performance part (not the jump) is also visible over the > loopback interface. > >> > >> FYI, if you want high performance, you really want > a card supporting multiple input queues -- igb, cxgb, mxge, > etc. 
if_em-only cards are fundamentally less scalable in an > SMP environment because they require input or output to > occur only from one CPU at a time. > > > > Makes sense, but on the other hand - I see people are > routing at least 250,000 packets per seconds per direction > with these cards, so they probably aren't the bottleneck > (pro/1000 pt on pci-e). > > The argument is not that they are slower (although they > probably are a bit slower), rather that they introduce > serialization bottlenecks by requiring synchronization > between CPUs in order to distribute the work. Certainly > some of the scalability issues in the stack are not a result > of that, but a good number are. > > Historically, we've had a number of bottlenecks in, > say, the bulk data receive and send paths, such as: > > - Initial receipt and processing of packets on a single CPU > as a result of a > single input queue from the hardware. Addressed by using > multiple input > queue hardware with appropriately configured drivers > (generally the default > is to use multiple input queues in 7.x and 8.x for > supporting hardware). > > - Cache line contention on stats data structures in drivers > and various levels > of the network stack due to bouncing around exclusive > ownership of the cache > line. ifnet introduces at least a few, but I think most > of the interesting > ones are at the IP and TCP layers for receipt. > > - Global locks protecting connection lists, all rwlocks as > of 7.1, but not > necessarily always used read-only for packet processing. > For UDP we do a > very good job at avoiding write locks, but for TCP in 7.x > we still use a > global write lock, if briefly, for every packet. > There's a change in 8.x to > use a global read lock for most packets, especially > steady state packets, > but I didn't merge it for 7.2 because it's not > well-benchmarked. Assuming I > get positive feedback from more people, I will merge them > before 7.3. 
>
> - If the user application is multi-threaded and receiving from many threads
>   at once, we see contention on the file descriptor table lock. This was
>   markedly improved by the file descriptor table locking rewrite in 7.0, but
>   we're continuing to look for ways to mitigate this. A lockless approach
>   would be really nice...
>
> On the transmit path, the bottlenecks are similar but different:
>
> - Neither 7.x nor 8.x supports multiple transmit queues as shipped; Kip has
>   patches for both that add it for cxgb. Maintaining ordering here, and
>   ideally affinity to the appropriate associated input queue, is important.
>   As the patches aren't in the tree yet, or for single-queue drivers,
>   contention on the device driver send path and queues can be significant,
>   especially for device drivers where the send and receive path are protected
>   by the same lock (bge!).

I'm curious as to your assertion that hardware transmit queues are a big win. You're really just loading a transmit ring well ahead of actual transmission; there's no need to force a "start" for each packet queued. You then have more overhead managing the multiple queues: more memory used, more CPU cache needed, more interrupts (perhaps), and overhead generating the flowid.

It seems to me that a more efficient method of transmitting, such as offloading the transmit workload to a kernel task, would be more effective than using multiple transmit queues. All the source thread has to do is queue the packet and get out.

As an aside, why is Kip doing development on a Chelsio card rather than a more mainstream product such as Intel or Broadcom that would generate more widespread interest?
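To make the "queue the packet and get out" idea concrete, here is a rough user-space sketch, not kernel code. The class `TransmitWorker`, its `batch_size` parameter and the `sent`/`batches` bookkeeping are all hypothetical names for illustration; a single Python thread stands in for the kernel task that owns the transmit ring, and one pass of its loop stands in for one "start".

```python
import queue
import threading


class TransmitWorker:
    """Hypothetical stand-in for a kernel transmit task owning one ring."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, batch_size: int = 32):
        self._q = queue.Queue()
        self._batch_size = batch_size
        self.sent = []      # packets "placed on the wire", in drain order
        self.batches = 0    # number of simulated "start" calls
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, packet) -> None:
        # The sending thread only queues the packet and returns.
        self._q.put(packet)

    def _run(self) -> None:
        while True:
            item = self._q.get()  # block until there is work
            if item is self._STOP:
                return
            batch, stop = [item], False
            # Drain whatever else is already queued, up to batch_size,
            # so one "start" covers several packets.
            while len(batch) < self._batch_size:
                try:
                    nxt = self._q.get_nowait()
                except queue.Empty:
                    break
                if nxt is self._STOP:
                    stop = True
                    break
                batch.append(nxt)
            self.sent.extend(batch)
            self.batches += 1
            if stop:
                return

    def close(self) -> None:
        self._q.put(self._STOP)
        self._thread.join()


if __name__ == "__main__":
    w = TransmitWorker(batch_size=8)
    for i in range(20):
        w.enqueue(i)
    w.close()
    print(len(w.sent), "packets sent in", w.batches, "batches")
```

In this sketch the senders still meet on the one shared queue, so whether this beats multiple hardware queues depends on how cheap that enqueue is compared with contending on the driver lock directly.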
Barney From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 17:27:25 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0F2C41065691 for ; Sun, 5 Apr 2009 17:27:25 +0000 (UTC) (envelope-from bms@incunabulum.net) Received: from out2.smtp.messagingengine.com (out2.smtp.messagingengine.com [66.111.4.26]) by mx1.freebsd.org (Postfix) with ESMTP id D86F58FC0A for ; Sun, 5 Apr 2009 17:27:24 +0000 (UTC) (envelope-from bms@incunabulum.net) Received: from compute2.internal (compute2.internal [10.202.2.42]) by out1.messagingengine.com (Postfix) with ESMTP id 19135311DBC; Sun, 5 Apr 2009 13:27:24 -0400 (EDT) Received: from heartbeat1.messagingengine.com ([10.202.2.160]) by compute2.internal (MEProxy); Sun, 05 Apr 2009 13:27:24 -0400 X-Sasl-enc: 2IloMwDbXowInNaEkcNP5WFSi17ro/YyVDin5YuvzBKy 1238952443 Received: from anglepoise.lon.incunabulum.net (82-35-112-254.cable.ubr07.dals.blueyonder.co.uk [82.35.112.254]) by mail.messagingengine.com (Postfix) with ESMTPSA id EDE222DED8; Sun, 5 Apr 2009 13:27:22 -0400 (EDT) Message-ID: <49D8E9F8.7090800@incunabulum.net> Date: Sun, 05 Apr 2009 18:27:20 +0100 From: Bruce Simpson User-Agent: Thunderbird 2.0.0.21 (X11/20090321) MIME-Version: 1.0 To: Upakul Barkakaty References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: Multicast routing X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 17:27:25 -0000 Upakul Barkakaty wrote: > Hi all, > > I was trying to setup a multicast tunneling setup with freebsd, with the > mrouted utility. However, my multicast router doesnt seem to be forwarding > those multicast packets. 
> > It would really be helpful if someone could help me with the setup or the > mrouted.conf file contents. > > Thanks in anticipation. > > Please try the mcast-tools port to confirm that multicast forwarding works. There are tools in that port which will allow you to run basic UDP stream tests as well as installing static entries in the forwarding cache. The most likely culprit is a network interface which does not support ALLMULTI. Also, DVMRP has been dead for years, avoid mrouted -- try a PIM implementation e.g. XORP or pimsd. thanks BMS From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 17:29:45 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5877510656C0 for ; Sun, 5 Apr 2009 17:29:45 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id C95E58FC18 for ; Sun, 5 Apr 2009 17:29:44 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1LqWAA-0007pK-FU for freebsd-net@freebsd.org; Sun, 05 Apr 2009 17:29:42 +0000 Received: from 93-141-3-137.adsl.net.t-com.hr ([93.141.3.137]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Apr 2009 17:29:42 +0000 Received: from ivoras by 93-141-3-137.adsl.net.t-com.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Apr 2009 17:29:42 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: Ivan Voras Date: Sun, 05 Apr 2009 19:29:10 +0200 Lines: 74 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig1FAE96532F0E09824EF6C434" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 93-141-3-137.adsl.net.t-com.hr User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) 
Subject: Re: Advice on a multithreaded netisr patch?

Robert Watson wrote:
>
> On Sun, 5 Apr 2009, Ivan Voras wrote:
>
>>>> I thought this has something to do with NIC moderation (em) but
>>>> can't really explain it. The bad performance part (not the jump) is
>>>> also visible over the loopback interface.
>>>
>>> FYI, if you want high performance, you really want a card supporting
>>> multiple input queues -- igb, cxgb, mxge, etc. if_em-only cards are
>>> fundamentally less scalable in an SMP environment because they
>>> require input or output to occur only from one CPU at a time.
>>
>> Makes sense, but on the other hand - I see people are routing at least
>> 250,000 packets per second per direction with these cards, so they
>> probably aren't the bottleneck (pro/1000 pt on pci-e).
>
> The argument is not that they are slower (although they probably are a
> bit slower), rather that they introduce serialization bottlenecks by
> requiring synchronization between CPUs in order to distribute the work.
> Certainly some of the scalability issues in the stack are not a result
> of that, but a good number are.

I'd like to understand more. If (in netisr) I have an mbuf with headers,
is this data already transferred from the card or is it magically "not
here yet"?
In the first case, the packet reception code path is not changed until
it's queued on a thread, on which it's handled in the future (or is the
influence of "other" data like timers and internal TCP reassembly
buffers so large?). In the second case, why?

> Historically, we've had a number of bottlenecks in, say, the bulk data
> receive and send paths, such as:
>
> - Initial receipt and processing of packets on a single CPU as a result
>   of a single input queue from the hardware. Addressed by using multiple
>   input queue hardware with appropriately configured drivers (generally
>   the default is to use multiple input queues in 7.x and 8.x for
>   supporting hardware).

As the card and the OS can already process many packets per second for
something as complex as routing (http://www.tancsa.com/blast.html), and
TCP chokes swi:net at 100% of a core, isn't this an indication that
there's certainly more room for improvement even with single-queue
old-fashioned NICs?

From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 17:40:19 2009
Date: Sun, 5 Apr 2009 18:40:18 +0100 (BST)
From: Robert Watson
To: Barney Cordoba
In-Reply-To: <285323.31546.qm@web63901.mail.re1.yahoo.com>
References: <285323.31546.qm@web63901.mail.re1.yahoo.com>
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

On Sun, 5 Apr 2009, Barney Cordoba wrote:

> I'm curious as to your assertion that hardware transmit queues are a big
> win. You're really just loading a transmit ring well ahead of actual
> transmission; there's no need to force a "start" for each packet queued. You
> then have more overhead managing the multiple queues; more memory used,
> more cpu cache needed, more interrupts (perhaps), overhead generating the
> flowid. It seems to me that a more efficient method of transmitting, such as
> offloading the transmit workload to a kernel task, would be more effective
> than using multiple transmit queues. All the source thread has to do is
> queue the packet and get out.

When using multiple cores, we've observed significant contention on the
transmit-side locks protecting a single output queue; when multiple queues are
used, that contention is avoided. The lock only covers the queue, but the
overhead of taking a single high-contention lock twice for every packet
(enqueue, later dequeue) is significant at high pps and with many cores.
> As an aside, why is Kip doing development on a Chelsio card rather than a
> more mainstream product such as Intel or Broadcom that would generate more
> widespread interest?

Because they paid him to write their driver? :-)

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 21:24:21 2009
To: barney_cordoba@yahoo.com
In-reply-to: Your message of "Sun, 05 Apr 2009 10:25:41 PDT." <285323.31546.qm@web63901.mail.re1.yahoo.com>
Date: Sun, 05 Apr 2009 14:24:20 -0700
From: "Kevin Oberman"
Message-Id: <20090405212420.31A311CC50@ptavv.es.net>
Cc: freebsd-net@freebsd.org, Robert Watson, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 21:24:22 -0000 > Date: Sun, 5 Apr 2009 10:25:41 -0700 (PDT) > From: Barney Cordoba > Sender: owner-freebsd-net@freebsd.org > > > As an aside, why is Kip doing development on a Chelsio card rather > than a more mainstream product such as Intel or Broadcom that would > generate more widespread interest? Because Chelsio pays him better than the makers of the "more mainstream" products. And, at 10GE, Chelsio and Myricom seem to have stronger products than others. (Just my opinion and not that of The US Dept. of Energy, The university of California, or Lawrence Berkeley National Labs.) I just hope Kip's legal problems are resolved soon. FreeBSD really needs him. -- R. Kevin Oberman, Network Engineer Energy Sciences Network (ESnet) Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab) E-mail: oberman@es.net Phone: +1 510 486-8634 Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751 From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 21:32:37 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A1F261065674 for ; Sun, 5 Apr 2009 21:32:37 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from web63906.mail.re1.yahoo.com (web63906.mail.re1.yahoo.com [69.147.97.121]) by mx1.freebsd.org (Postfix) with SMTP id 464DD8FC13 for ; Sun, 5 Apr 2009 21:32:37 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: (qmail 73900 invoked by uid 60001); 5 Apr 2009 21:32:36 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1238967156; bh=RuMlGk4vsuJPubhuisTxnMKgvyShTshtnqEpj5W2r54=; 
h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=P/llS3UpA5Qb0S9UwIZWfjjaBaR6H2Ce94p7tzLdIHmNa2+i4Rq67I8A3fBIMWxM/EwGB05WPjYnSy1rkkGTivYP7gVILqwwJ+jWh6V2MS3Q+60WbIiQK6+VRR6cfaZYSFB/LbkxGN4iQZ/5oTFZ9saFEYrmQ9IP3qYN70RBCLQ= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=W+IN4VS63VntDErPfLsiv3lRYwsXs6QIR96cKc2yu9mq7uEefnN3DwLKVLhPmY88U57O2ahijH02ogfl13Kw2ZzvrtTTO8XrRuk0M5u0KFum1VgFIXq2g+hHT5tdBm1yzOiA6x/f0Ls7RIzJtNbn3hKw+GPqNN8Ug/mI/paZyFc=; Message-ID: <496315.72401.qm@web63906.mail.re1.yahoo.com> X-YMail-OSG: AzBgvPAVM1na3Ij0KuHddDM6Qqwd.9SDzBv9y2iStBH2xMLkJ97NXEk4eDkEmaoaAXpAdncwOex8Tz4U2BRrxf6Gn7IFykAOFniSkGAuv7iD9HKbinhKX1GP0lQi.iS2Ku.JIO84z9yqySd0355.QdtB7B72rF2bz58PgZ.qvKVUNP3Rn3aV6uQPPS.fjRfIWOn8eb_VUQ7u3ozAcUKHNysHDX_PP2gXnP.7tx0RVEarLCmZqObbU7NA.gEmZ5wPIx.8SEvhCi6mzTpqg9j4TPNpHyBjOzoqppKmwEwpUbh77NfpEue7YLPqwsQrHfEGlNw0bDLrA8GUQRkUBw-- Received: from [98.242.222.229] by web63906.mail.re1.yahoo.com via HTTP; Sun, 05 Apr 2009 14:32:36 PDT X-Mailer: YahooMailWebService/0.7.289.1 Date: Sun, 5 Apr 2009 14:32:36 -0700 (PDT) From: Barney Cordoba To: Kevin Oberman In-Reply-To: <20090405212420.31A311CC50@ptavv.es.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-net@freebsd.org, Robert Watson , Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: barney_cordoba@yahoo.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 21:32:37 -0000 --- On Sun, 4/5/09, Kevin Oberman wrote: > From: Kevin Oberman > Subject: Re: Advice on a multithreaded netisr patch? 
> To: barney_cordoba@yahoo.com
> Cc: "Ivan Voras", "Robert Watson", freebsd-net@freebsd.org
> Date: Sunday, April 5, 2009, 5:24 PM
> > Date: Sun, 5 Apr 2009 10:25:41 -0700 (PDT)
> > From: Barney Cordoba
> > Sender: owner-freebsd-net@freebsd.org
> >
> > As an aside, why is Kip doing development on a Chelsio card rather
> > than a more mainstream product such as Intel or Broadcom that would
> > generate more widespread interest?
>
> Because Chelsio pays him better than the makers of the "more mainstream"
> products. And, at 10GE, Chelsio and Myricom seem to have stronger
> products than others. (Just my opinion and not that of The US Dept. of
> Energy, The university of California, or Lawrence Berkeley National
> Labs.)

Sadly that's the small-picture view that has plagued FreeBSD for the longest
time. The bigger picture is that big OEMs aren't going to use Chelsio cards,
and big OEMs running FreeBSD instead of Linux mean more testers, more
hardware, more code give-backs and more money for the project.

You don't really know how good or bad Intel or Broadcom is because you don't
have good drivers for the cards. Unfortunately Intel does things
ass-backwards, by putting out crap "sample" drivers that make their cards
look like garbage. Maybe they are garbage, but you'd think they'd be a bit
smarter. They can certainly afford more than Chelsio.
Barney From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 21:37:27 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5D07A106566C for ; Sun, 5 Apr 2009 21:37:27 +0000 (UTC) (envelope-from sthaug@nethelp.no) Received: from bizet.nethelp.no (bizet.nethelp.no [195.1.209.33]) by mx1.freebsd.org (Postfix) with SMTP id 60FB78FC17 for ; Sun, 5 Apr 2009 21:37:26 +0000 (UTC) (envelope-from sthaug@nethelp.no) Received: (qmail 30249 invoked from network); 5 Apr 2009 21:10:44 -0000 Received: from bizet.nethelp.no (HELO localhost) (195.1.209.33) by bizet.nethelp.no with SMTP; 5 Apr 2009 21:10:44 -0000 Date: Sun, 05 Apr 2009 23:10:44 +0200 (CEST) Message-Id: <20090405.231044.74688369.sthaug@nethelp.no> To: freebsd-net@freebsd.org From: sthaug@nethelp.no X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Subject: IPv6 window scaling factor always 1 on initial SYN X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 21:37:28 -0000 On 7-STABLE, with kern.ipc.maxsockbuf=2621440, both sides set a window scaling factor of 6 (i.e. SYN wscale 6, SYN-ACK wscale 6) using IPv4. With the same value of kern.ipc.maxsockbuf, using IPv6, the side which sends the initial SYN sets a window scaling factor of only 1, while the other side sets a scaling factor of 6 in the SYN-ACK. This will obviously limit throughput in many cases. In both cases net.inet.tcp.rfc1323=1. Anybody know why IPv6 behaves differently here? 
tcpdump example: 22:20:37.282415 IP 193.75.4.50.53981 > 193.75.110.66.5555: S 1580765626:1580765626(0) win 65535 22:20:37.282442 IP 193.75.110.66.5555 > 193.75.4.50.53981: S 1408884711:1408884711(0) ack 1580765627 win 65535 22:21:49.749586 IP6 2001:8c0:9a00:1::2.53983 > 2001:8c0:8500:1::2.5555: S 565631163:565631163(0) win 65535 22:21:49.749633 IP6 2001:8c0:8500:1::2.5555 > 2001:8c0:9a00:1::2.53983: S 627173961:627173961(0) ack 565631164 win 65535 Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4E4411065670 for ; Sun, 5 Apr 2009 21:50:07 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from mail.cksoft.de (mail.cksoft.de [195.88.108.3]) by mx1.freebsd.org (Postfix) with ESMTP id 084C98FC1C for ; Sun, 5 Apr 2009 21:50:06 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from localhost (amavis.fra.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id A09CD41C6FC; Sun, 5 Apr 2009 23:50:05 +0200 (CEST) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([195.88.108.3]) by localhost (amavis.fra.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id vVdtwUB0c+OZ; Sun, 5 Apr 2009 23:50:05 +0200 (CEST) Received: by mail.cksoft.de (Postfix, from userid 66) id 3FFE941C6F2; Sun, 5 Apr 2009 23:50:05 +0200 (CEST) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id 8844D4448E6; Sun, 5 Apr 2009 21:49:50 +0000 (UTC) Date: Sun, 5 Apr 2009 21:49:50 +0000 (UTC) From: "Bjoern A. 
Zeeb"
To: sthaug@nethelp.no
In-Reply-To: <20090405.231044.74688369.sthaug@nethelp.no>
Message-ID: <20090405214757.E15361@maildrop.int.zabbadoz.net>
References: <20090405.231044.74688369.sthaug@nethelp.no>
Cc: freebsd-net@freebsd.org
Subject: Re: IPv6 window scaling factor always 1 on initial SYN

On Sun, 5 Apr 2009, sthaug@nethelp.no wrote:

> On 7-STABLE, with kern.ipc.maxsockbuf=2621440, both sides set a window
> scaling factor of 6 (i.e. SYN wscale 6, SYN-ACK wscale 6) using IPv4.
>
> With the same value of kern.ipc.maxsockbuf, using IPv6, the side which
> sends the initial SYN sets a window scaling factor of only 1, while
> the other side sets a scaling factor of 6 in the SYN-ACK. This will
> obviously limit throughput in many cases.
>
> In both cases net.inet.tcp.rfc1323=1.
>
> Anybody know why IPv6 behaves differently here?
>
> tcpdump example:
>
> 22:20:37.282415 IP 193.75.4.50.53981 > 193.75.110.66.5555: S 1580765626:1580765626(0) win 65535
> 22:20:37.282442 IP 193.75.110.66.5555 > 193.75.4.50.53981: S 1408884711:1408884711(0) ack 1580765627 win 65535
>
> 22:21:49.749586 IP6 2001:8c0:9a00:1::2.53983 > 2001:8c0:8500:1::2.5555: S 565631163:565631163(0) win 65535
> 22:21:49.749633 IP6 2001:8c0:8500:1::2.5555 > 2001:8c0:9a00:1::2.53983: S 627173961:627173961(0) ack 565631164 win 65535

I think the answer to that is in sys/netinet/tcp_usrreq.c, in the
functions:

tcp_connect

    /*
     * Compute window scaling to request:
     * Scale to fit into sweet spot.  See tcp_syncache.c.
     * XXX: This should move to tcp_output().
     */
    while (tp->request_r_scale < TCP_MAX_WINSHIFT &&
        (TCP_MAXWIN << tp->request_r_scale) < sb_max)
                                              ^^^^^^
        tp->request_r_scale++;

and tcp6_connect

    /* Compute window scaling to request. */
    while (tp->request_r_scale < TCP_MAX_WINSHIFT &&
        (TCP_MAXWIN << tp->request_r_scale) < so->so_rcv.sb_hiwat)
                                              ^^^^^^^^^^^^^^^^^^^
        tp->request_r_scale++;

I'll have to check why they are unequal...

/bz

--
Bjoern A. Zeeb      The greatest risk is not taking one.

From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 22:05:07 2009
Date: Sun, 5 Apr 2009 22:02:04 +0000 (UTC)
From: "Bjoern A.
Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: sthaug@nethelp.no In-Reply-To: <20090405214757.E15361@maildrop.int.zabbadoz.net> Message-ID: <20090405215842.C15361@maildrop.int.zabbadoz.net> References: <20090405.231044.74688369.sthaug@nethelp.no> <20090405214757.E15361@maildrop.int.zabbadoz.net> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: IPv6 window scaling factor always 1 on initial SYN X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 22:05:07 -0000 On Sun, 5 Apr 2009, Bjoern A. Zeeb wrote: > On Sun, 5 Apr 2009, sthaug@nethelp.no wrote: > >> On 7-STABLE, with kern.ipc.maxsockbuf=2621440, both sides set a window >> scaling factor of 6 (i.e. SYN wscale 6, SYN-ACK wscale 6) using IPv4. >> >> With the same value of kern.ipc.maxsockbuf, using IPv6, the side which >> sends the initial SYN sets a window scaling factor of only 1, while >> the other side sets a scaling factor of 6 in the SYN-ACK. This will >> obviously limit throughput in many cases. >> >> In both cases net.inet.tcp.rfc1323=1. >> >> Anybody know why IPv6 behaves differently here? 
>>
>> tcpdump example:
>>
>> 22:20:37.282415 IP 193.75.4.50.53981 > 193.75.110.66.5555: S
>> 1580765626:1580765626(0) win 65535 [...]
>> 661320721 0>
>> 22:20:37.282442 IP 193.75.110.66.5555 > 193.75.4.50.53981: S
>> 1408884711:1408884711(0) ack 1580765627 win 65535 [...]
>> 6,sackOK,timestamp 1581013561 661320721>
>>
>> 22:21:49.749586 IP6 2001:8c0:9a00:1::2.53983 > 2001:8c0:8500:1::2.5555: S
>> 565631163:565631163(0) win 65535 [...]
>> 661393190 0>
>> 22:21:49.749633 IP6 2001:8c0:8500:1::2.5555 > 2001:8c0:9a00:1::2.53983: S
>> 627173961:627173961(0) ack 565631164 win 65535 [...]
>> 6,sackOK,timestamp 8 [...]
>
> I think the answer to that is in sys/netinet/tcp_usrreq.c, in the
> functions:
>
> tcp_connect
>
>     /*
>      * Compute window scaling to request:
>      * Scale to fit into sweet spot.  See tcp_syncache.c.
>      * XXX: This should move to tcp_output().
>      */
>     while (tp->request_r_scale < TCP_MAX_WINSHIFT &&
>         (TCP_MAXWIN << tp->request_r_scale) < sb_max)
>                                               ^^^^^^
>         tp->request_r_scale++;
>
> and tcp6_connect
>
>     /* Compute window scaling to request. */
>     while (tp->request_r_scale < TCP_MAX_WINSHIFT &&
>         (TCP_MAXWIN << tp->request_r_scale) < so->so_rcv.sb_hiwat)
>                                               ^^^^^^^^^^^^^^^^^^^
>         tp->request_r_scale++;
>
> I'll have to check why they are unequal...

OK, both versions originally had:

    < so->so_rcv.sb_hiwat)

http://svn.freebsd.org/viewvc/base?view=revision&revision=166403
changed it for IPv4 the first time,
http://svn.freebsd.org/viewvc/base?view=revision&revision=172795
changed it a second time for IPv4. No one changed the IPv6 version.

The syncache already seems to do it for both v4/v6 (common code).

Can you try changing it to

    < sb_max)

for IPv6 as well and see if things work (better) for you?

/bz

--
Bjoern A. Zeeb      The greatest risk is not taking one.
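The difference between the two loops can be reproduced in isolation. The following is a minimal user-space sketch, not kernel code: request_scale() is a hypothetical helper that copies only the loop shape from the excerpts, with TCP_MAXWIN and TCP_MAX_WINSHIFT set to their usual values of 65535 and 14. The 65536-byte receive buffer used in the usage note is an assumption; the thread does not state the value of so_rcv.sb_hiwat on the reporter's machine.

```c
#define TCP_MAXWIN        65535UL  /* largest window expressible without scaling */
#define TCP_MAX_WINSHIFT  14       /* largest window shift permitted by RFC 1323 */

/*
 * Hypothetical helper: same loop shape as tcp_connect()/tcp6_connect().
 * `limit` stands in for sb_max (IPv4 path) or so->so_rcv.sb_hiwat (IPv6 path).
 */
static int
request_scale(unsigned long limit)
{
	int scale = 0;

	/* Grow the shift until the scaled maximum window covers `limit`. */
	while (scale < TCP_MAX_WINSHIFT && (TCP_MAXWIN << scale) < limit)
		scale++;
	return (scale);
}
```

With limit set to the reported kern.ipc.maxsockbuf of 2621440, request_scale() returns 6, matching the IPv4 SYN. With an assumed 65536-byte receive buffer it returns 1, which is consistent with the IPv6 SYN in the report.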
From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 22:17:59 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B365E106566C; Sun, 5 Apr 2009 22:17:59 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 586E28FC13; Sun, 5 Apr 2009 22:17:59 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id E5E7E46B0C; Sun, 5 Apr 2009 18:17:58 -0400 (EDT) Date: Sun, 5 Apr 2009 23:17:58 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ivan Voras In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 22:18:00 -0000 On Sun, 5 Apr 2009, Ivan Voras wrote: >> The argument is not that they are slower (although they probably are a bit >> slower), rather that they introduce serialization bottlenecks by requiring >> synchronization between CPUs in order to distribute the work. Certainly >> some of the scalability issues in the stack are not a result of that, but a >> good number are. > > I'd like to understand more. If (in netisr) I have a mbuf with headers, is > this data already transfered from the card or is it magically "not here > yet"? A lot depends on the details of the card and driver. 
The driver will take cache misses on the descriptor ring entry, if it's not
already in cache, and the link layer will take a cache miss on the front of
the ethernet frame in the cluster pointed to by the mbuf header as part of its
demux. What happens next depends on your dispatch model and cache line size.

Let's make a few simplifying assumptions that are mostly true:

- The driver associates a single cluster with each receive ring entry for each
  packet to be stored in, and the cluster is cacheline-aligned. No header
  splitting is enabled.

- Standard ethernet encapsulation of IP is used, without additional VLAN
  headers or other encapsulation, etc. There are no IP options.

- We don't need to validate any checksums because the hardware has done it for
  us, so no need to take cache misses on data that doesn't matter until we
  reach higher layers.

In the device driver/ithread code, we'll now proceed to take some cache misses
unless we're quite lucky:

(1) The descriptor ring entry
(2) The mbuf packet header
(3) The first cache line in the cluster

This is sufficient to figure out what protocol we're going to dispatch to, and
depending on dispatch model, we now either enqueue the packet for delivery to
a netisr, or we directly dispatch the handler for IP.

If the packet is processed on the current CPU and we're direct dispatching, or
if we've dispatched to a netisr on the same CPU and we're quite lucky, the
mbuf packet header and front of the cluster will be in the cache.

However, what happens next depends on the cache fetch and line size. If
things happen in 32-byte cache lines or smaller, we cache miss on the end of
the IP header, because the last two bytes of the destination IP address start
at offset 32 into the cluster. If we have 64-byte fetching and line size,
things go better because both the full IP and TCP headers should be in that
first cache line.
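The offsets behind that last paragraph can be checked with a little arithmetic. This is a sketch under the assumptions listed above (cacheline-aligned cluster, plain ethernet, no VLAN tag, no IP or TCP options); the macro and function names are local to the example and not taken from kernel headers.

```c
#include <stddef.h>

#define ETHER_HDR_LEN 14   /* dst MAC + src MAC + ethertype, no VLAN tag */
#define IP_HDR_LEN    20   /* IPv4 header without options */
#define TCP_HDR_LEN   20   /* TCP header without options */

/*
 * Number of cache lines touched by the byte range [off, off + len) in a
 * buffer whose start is aligned to line_size. len must be at least 1.
 */
static size_t
lines_touched(size_t off, size_t len, size_t line_size)
{
	size_t first = off / line_size;
	size_t last = (off + len - 1) / line_size;

	return (last - first + 1);
}

/* Offset of the IPv4 destination address from the start of the frame. */
static size_t
ip_dst_offset(void)
{
	return (ETHER_HDR_LEN + 16);   /* ip_dst sits 16 bytes into the IP header */
}
```

The destination address occupies frame bytes 30 through 33, so with 32-byte lines lines_touched(ip_dst_offset(), 4, 32) is 2: its last two bytes fall in the second line. The combined ethernet, IP and TCP headers are 54 bytes, so lines_touched(0, 54, 64) is 1.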
One big advantage to direct dispatch is that it maximizes the chances that we don't blow out the low-level CPU caches between link-layer and IP-layer processing, meaning that we might actually get through all the IP and TCP headers without a cache miss on a 64-byte line size. If we netisr dispatch to another CPU without a shared cache, or we netisr dispatch to the current CPU but there's a scheduling delay, other packets queued first, etc, we'll take a number of the same cache misses over again as things get pulled into the right cache. This presents a strong cache motivation to keep a packet "on" a CPU and even in the same thread once you've started processing it. If you have to enqueue, you take locks, take a context switch, deal with the fact that LRU on cache lines isn't going to like your queue depth, and potentially pay a number of additional cache misses on the same data. There are also some other good reasons to use direct dispatch, such as avoiding doing work on packets that will later be dropped if the netisr queue overflows. This is why we direct dispatch by default, and why this is quite a good strategy for multiple input queue network cards, where it also buys us parallelism. Note that if the flow RSS hash is in the same cache line as the rest of the receive descriptor ring entry, you may be able to avoid the cache miss on the cluster and simply redirect it to another CPU's netisr without ever reading packet data, which avoids at least one and possibly two cache misses, but also means that you have to run the link layer in the remote netisr, rather than locally in the ithread. > In the first case, the package reception code path is not changed until it's > queued on a thread, on which it's handled in the future (or is the influence > of "other" data like timers and internal TCP reassembly buffers so large?). > In the second case, why? 
The good news about TCP reassembly is that we don't have to look at the data, only mbuf headers and reassembly buffer entries, so with any luck we've avoided actually taking a cache miss on the data. If things go well, we can avoid looking at anything but mbuf and packet headers until the socket copies out, but I'm not sure how well we do that in practice. > As the card and the OS can already process many packets per second for > something fairly complex as routing (http://www.tancsa.com/blast.html), and > TCP chokes swi:net at 100% of a core, isn't this indication there's > certainly more space for improvement even with a single-queue old-fashioned > NICs? Maybe. It depends on the relative costs of local processing vs redistributing the work, which involves schedulers, IPIs, additional cache misses, lock contention, and so on. This means there's a period where it can't possibly be a win, and then at some point it's a win as long as the stack scales. This is essentially the usual trade-off in using threads and parallelism: does the benefit of multiple parallel execution units make up for the overheads of synchronization and data migration? There are some previous e-mail threads where people have observed that for some workloads, switching to netisr wins over direct dispatch. For example, if you have a number of cores and are doing firewall processing, offloading work to the netisr from the input ithread may improve performance. However, this appears not to be the common case for end-host workloads on the hardware we mostly target, and this is increasingly true as multiple input queues come into play, as the card itself will allow us to use multiple CPUs without any interactions between the CPUs. 
This isn't to say that work redistribution using a netisr-like scheme isn't a
good idea: in a world where CPU threads are weak compared to the wire
workflow, and there's cache locality across threads on the same core, or NUMA
is present, there may be a potential for a big win when available work
significantly exceeds what a single CPU thread/core can handle. In that case,
we want to place the work as close as possible to take advantage of shared
caches or the memory being local to the CPU thread/core doing the deferred
work.

FYI, the localhost case is a bit weird -- I think we have some scheduling
issues that are causing loopback netisr stuff to be pessimally scheduled.
Here are some suggestions for things to try and see if they help, though:

- Comment out all ifnet, IP, and TCP global statistics in your local stack --
  especially look for things like tcpstat.whatever++;.

- Use cpuset to pin ithreads, the netisr, and whatever else, to specific cores
  so that they don't migrate, and if your system uses HTT, experiment with
  pinning the ithread and the netisr on different threads on the same core, or
  at least, different cores on the same die.

- Experiment with using just the source IP, the source + destination IP, and
  both IPs plus TCP ports in your hash.

- If your card supports RSS, pass the flowid up the stack in the mbuf packet
  header flowid field, and use that instead of the hash for work placement.

- If you're doing pure PPS tests with UDP (or the like), and your test can
  tolerate disordering, try hashing based on the mbuf header address or
  something else that will distribute the work but not take a cache miss.

- If you have a flowid or the above disordered condition applies, try shifting
  the link layer dispatch to the netisr, rather than doing the demux in the
  ithread, as that will avoid cache misses in the ithread and do all the demux
  in the netisr.
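The three hash inputs suggested in the list above (source IP only, source + destination IP, both IPs plus ports) could be compared with something like the following. This is only an illustrative sketch: struct flow_key, mix32() and pick_worker() are hypothetical names, the mixer is a generic integer hash and not the RSS hash a NIC would compute, and none of it is taken from the patch under discussion.

```c
#include <stdint.h>

/* Hypothetical flow key; the fields are illustrative, not kernel structures. */
struct flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

enum hash_mode { HASH_SRC_IP, HASH_IP_PAIR, HASH_4TUPLE };

/* Generic 32-bit integer mixer; any reasonable mixing function would do. */
static uint32_t
mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x45d9f3bU;
	x ^= x >> 16;
	return (x);
}

/* Map a flow to one of nworkers threads; nworkers must be nonzero. */
static unsigned
pick_worker(const struct flow_key *k, enum hash_mode mode, unsigned nworkers)
{
	uint32_t h = mix32(k->src_ip);

	if (mode != HASH_SRC_IP)
		h = mix32(h ^ k->dst_ip);
	if (mode == HASH_4TUPLE)
		h = mix32(h ^ (((uint32_t)k->src_port << 16) | k->dst_port));
	return (h % nworkers);
}
```

Because the result depends only on the key, every packet of a given flow lands on the same worker, which preserves per-connection ordering. The coarser modes keep all traffic from one host (or host pair) together at the cost of a less even spread across workers.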
Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Sun Apr 5 22:48:36 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 14104106566B for ; Sun, 5 Apr 2009 22:48:36 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id 823EE8FC17 for ; Sun, 5 Apr 2009 22:48:35 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1Lqb8j-0006og-1H for freebsd-net@freebsd.org; Sun, 05 Apr 2009 22:48:33 +0000 Received: from 93-141-3-137.adsl.net.t-com.hr ([93.141.3.137]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Apr 2009 22:48:33 +0000 Received: from ivoras by 93-141-3-137.adsl.net.t-com.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Sun, 05 Apr 2009 22:48:33 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: Ivan Voras Date: Mon, 06 Apr 2009 00:47:49 +0200 Lines: 111 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig078FFC936793C9EB67C0FB65" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: 93-141-3-137.adsl.net.t-com.hr User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) In-Reply-To: X-Enigmail-Version: 0.95.7 Sender: news Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Apr 2009 22:48:36 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig078FFC936793C9EB67C0FB65
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Thanks for the ideas, I will try some of them. But I'd also like some more clarifications:

Robert Watson wrote:
> On Sun, 5 Apr 2009, Ivan Voras wrote:

>> I'd like to understand more. If (in netisr) I have a mbuf with
>> headers, is this data already transferred from the card or is it
>> magically "not here yet"?
>
> A lot depends on the details of the card and driver. The driver will
> take cache misses on the descriptor ring entry, if it's not already in
> cache, and the link layer will take a cache miss on the front of the
> ethernet frame in the cluster pointed to by the mbuf header as part of
> its demux. What happens next depends on your dispatch model and cache
> line size. Let's make a few simplifying assumptions that are mostly true:

So, a mbuf can reference data not yet copied from the NIC hardware? I'm specifically trying to understand what m_pullup() does.

>> As the card and the OS can already process many packets per second for
>> something fairly complex as routing
>> (http://www.tancsa.com/blast.html), and TCP chokes swi:net at 100% of
>> a core, isn't this indication there's certainly more space for
>> improvement even with a single-queue old-fashioned NICs?
>
> Maybe. It depends on the relative costs of local processing vs
> redistributing the work, which involves schedulers, IPIs, additional
> cache misses, lock contention, and so on. This means there's a period
> where it can't possibly be a win, and then at some point it's a win as
> long as the stack scales.
> This is essentially the usual trade-off in
> using threads and parallelism: does the benefit of multiple parallel
> execution units make up for the overheads of synchronization and data
> migration?

Do you have any idea at all why I'm seeing the weird difference of netstat packets per second (250,000) and my application's TCP performance (< 1,000 pps)? Summary: each packet is guaranteed to be a whole message causing a transaction in the application - without the changes I see pps almost identical to tps. Even if the source of netstat statistics somehow manages to count packets multiple times (I don't see how that can happen), no relation can describe differences this huge. It almost looks like something in the upper layers is discarding packets (also not likely: TCP timeouts would occur and the application wouldn't be able to push 250,000 pps) - but what? Where to look?

> FYI, the localhost case is a bit weird -- I think we have some
> scheduling issues that are causing loopback netisr stuff to be
> pessimally scheduled. Here are some suggestions for things to try and
> see if they help, though:
>
> - Comment out all ifnet, IP, and TCP global statistics in your local
>   stack -- especially look for things tcpstat.whatever++;.

You mean for the general code? I purposely don't lock my statistics variables because I'm not that interested in exact numbers (orders of magnitude are relevant). As far as I understand, unlocked "x++" should be trivially fast in this case?

> - Use cpuset to pin ithreads, the netisr, and whatever else, to specific
>   cores so that they don't migrate, and if your system uses HTT,
>   experiment with pinning the ithread and the netisr on different
>   threads on the same core, or at least, different cores on the same die.

I'm using em hardware; I still think there's a possibility I'm fighting the driver in some cases but this has priority #2.
> - Experiment with using just the source IP, the source + destination IP,
>   and both IPs plus TCP ports in your hash.

Ok. Currently I'm using ip1+ip2+port1+port2.

> - If your card supports RSS, pass the flowid up the stack in the mbuf
>   packet header flowid field, and use that instead of the hash for work
>   placement.

Don't know about em. Don't really want to touch it if I don't have to :)

--------------enig078FFC936793C9EB67C0FB65
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAknZNRwACgkQldnAQVacBcj7hQCfRE35c+nkAhCYp4+neW2Da6xk
kNsAnRxRXOoJR0udvActmaO+azYDeXhn
=aVa7
-----END PGP SIGNATURE-----

--------------enig078FFC936793C9EB67C0FB65--

From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 06:24:52 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6F5BD106566C; Mon, 6 Apr 2009 06:24:52 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 452998FC12; Mon, 6 Apr 2009 06:24:52 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n366OqIN045367; Mon, 6 Apr 2009 06:24:52 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n366Oq76045363; Mon, 6 Apr 2009 06:24:52 GMT (envelope-from linimon) Date: Mon, 6 Apr 2009 06:24:52 GMT Message-Id: <200904060624.n366Oq76045363@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-i386@FreeBSD.org, freebsd-net@FreeBSD.org From:
linimon@FreeBSD.org Cc: Subject: Re: kern/133218: [carp] [hang] use of carp(4) causes system to freeze X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 06:24:52 -0000 Synopsis: [carp] [hang] use of carp(4) causes system to freeze Responsible-Changed-From-To: freebsd-i386->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Mon Apr 6 06:24:37 UTC 2009 Responsible-Changed-Why: This does not sound i386-specific. http://www.freebsd.org/cgi/query-pr.cgi?pr=133218 From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 10:10:03 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 597981065675 for ; Mon, 6 Apr 2009 10:10:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 27F3D8FC15 for ; Mon, 6 Apr 2009 10:10:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n36AA3HF076020 for ; Mon, 6 Apr 2009 10:10:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n36AA3ZX076019; Mon, 6 Apr 2009 10:10:03 GMT (envelope-from gnats) Date: Mon, 6 Apr 2009 10:10:03 GMT Message-Id: <200904061010.n36AA3ZX076019@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: bin/131365: commit references a PR X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 10:10:03 -0000

The following reply was made to PR bin/131365; it has been noted by GNATS.

From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: bin/131365: commit references a PR
Date: Mon, 6 Apr 2009 10:09:37 +0000 (UTC)

Author: rrs
Date: Mon Apr 6 10:09:20 2009
New Revision: 190758
URL: http://svn.freebsd.org/changeset/base/190758

Log:
  Class based addressing went out in the early 90's. Basically
  if a entry is not route add -net xxx/bits then we should
  use the addr (xxx) to establish the number of bits by looking at
  the first non-zero bit. So if we enter
    route add -net 10.1.1.0 10.1.3.5
  this is the same as doing
    route add -net 10.1.1.0/24
  Since the 8th bit (zero counting) is set to 1 we set bits to 32-8.

  Users can of course still use the /x to change this behavior
  or in cases where the network is in the trailing part
  of the address, a "netmask" argument can be supplied to
  override what is established from the interpretation of the
  address itself. e.g:
    route add -net 10.1.1.8 -netmask 0xff00ffff
  should overide and place the proper CIDR mask in place.
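The rule described in the log above can be sketched in a few lines of standalone C. This is only an illustration, not the code from route.c: implied_prefix_bits() and prefix_to_mask() are hypothetical helper names, addresses are assumed to be in host byte order, and the all-zero address and zero-length prefix are handled explicitly here (an assumption of this sketch) because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/*
 * Hypothetical helper: prefix length implied by the lowest set bit of
 * a host-byte-order IPv4 network address (the "first non-zero bit"
 * counted from the least significant end).
 */
static int
implied_prefix_bits(uint32_t addr)
{
	int i;

	if (addr == 0)
		return (0);	/* no bit set, nothing to infer */
	for (i = 0; i < 32; i++) {
		if (addr & ((uint32_t)1 << i))
			break;
	}
	/* i is the position of the lowest set bit */
	return (32 - i);
}

/* Hypothetical helper: netmask for a prefix length of 0..32 bits. */
static uint32_t
prefix_to_mask(int bits)
{
	if (bits <= 0)
		return (0);
	if (bits >= 32)
		return (0xffffffffu);
	return (0xffffffffu << (32 - bits));
}
```

With this rule 10.1.1.0 (0x0a010100) has its lowest set bit at position 8, giving 32 - 8 = 24 bits, which matches the 10.1.1.0/24 example in the log.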
PR:		131365
MFC after:	1 week

Modified:
  head/sbin/route/route.c

Modified: head/sbin/route/route.c
==============================================================================
--- head/sbin/route/route.c	Mon Apr 6 07:13:26 2009	(r190757)
+++ head/sbin/route/route.c	Mon Apr 6 10:09:20 2009	(r190758)
@@ -713,7 +713,7 @@ newroute(argc, argv)
 #ifdef INET6
 		if (af == AF_INET6) {
 			rtm_addrs &= ~RTA_NETMASK;
-			memset((void *)&so_mask, 0, sizeof(so_mask));
+			memset((void *)&so_mask, 0, sizeof(so_mask));
 		}
 #endif
 	}
@@ -803,21 +803,22 @@ inet_makenetandmask(net, sin, bits)
 		addr = net << IN_CLASSC_NSHIFT;
 	else
 		addr = net;
-
-	if (bits != 0)
-		mask = 0xffffffff << (32 - bits);
-	else if (net == 0)
-		mask = 0;
-	else if (IN_CLASSA(addr))
-		mask = IN_CLASSA_NET;
-	else if (IN_CLASSB(addr))
-		mask = IN_CLASSB_NET;
-	else if (IN_CLASSC(addr))
-		mask = IN_CLASSC_NET;
-	else if (IN_MULTICAST(addr))
-		mask = IN_CLASSD_NET;
-	else
-		mask = 0xffffffff;
+	/*
+	 * If no /xx was specified we must cacluate the
+	 * CIDR address.
+	 */
+	if ((bits == 0) && (addr != 0)) {
+		int i, j;
+		for(i=0,j=1; i<32; i++) {
+			if (addr & j) {
+				break;
+			}
+			j <<= 1;
+		}
+		/* i holds the first non zero bit */
+		bits = 32 - i;
+	}
+	mask = 0xffffffff << (32 - bits);
 	sin->sin_addr.s_addr = htonl(addr);
 	sin = &so_mask.sin;
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"

From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 10:20:01 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C71291065677 for ; Mon, 6 Apr 2009 10:20:01 +0000 (UTC) (envelope-from sthaug@nethelp.no) Received: from bizet.nethelp.no (bizet.nethelp.no [195.1.209.33]) by mx1.freebsd.org (Postfix) with SMTP id 121A28FC1F for ; Mon, 6 Apr 2009 10:20:00 +0000 (UTC) (envelope-from sthaug@nethelp.no) Received: (qmail 18860 invoked from network); 6 Apr 2009 10:19:59 -0000 Received: from bizet.nethelp.no (HELO localhost) (195.1.209.33) by bizet.nethelp.no with SMTP; 6 Apr 2009 10:19:59 -0000 Date: Mon, 06 Apr 2009 12:19:59 +0200 (CEST) Message-Id: <20090406.121959.74751582.sthaug@nethelp.no> To: bzeeb-lists@lists.zabbadoz.net From: sthaug@nethelp.no In-Reply-To: <20090405215842.C15361@maildrop.int.zabbadoz.net> References: <20090405.231044.74688369.sthaug@nethelp.no> <20090405214757.E15361@maildrop.int.zabbadoz.net> <20090405215842.C15361@maildrop.int.zabbadoz.net> X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: IPv6 window scaling factor always 1 on initial SYN X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help:
List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 10:20:02 -0000

> Ok, both versions had: < so->so_rcv.sb_hiwat)
>
> http://svn.freebsd.org/viewvc/base?view=revision&revision=166403
>
> changed it for IPv4 the first time,
>
> http://svn.freebsd.org/viewvc/base?view=revision&revision=172795
>
> changed it a second time for IPv4.
>
> No one changed the IPv6 version.
>
> The syncache already seems to do it for both v4/v6 (common code).
>
> Can you try changing it to < sb_max) for IPv6 as well and see if
> things work (better) for you?

I changed it, and that worked like a dream. Now I get basically the same throughput with IPv4 and IPv6. There are of course still issues like lots of IPv6 tunnels that add extra latency - but that's not the fault of FreeBSD.

Anyway, thanks for your work. Below is a context diff (against 7-STABLE cvsupped last night). Do we need a PR to get this into FreeBSD?

Steinar Haug, Nethelp consulting, sthaug@nethelp.no
----------------------------------------------------------------------
*** tcp_usrreq.c.orig	Sun Apr 5 22:51:49 2009
--- tcp_usrreq.c	Mon Apr 6 11:15:11 2009
***************
*** 1153,1159 ****
  	/* Compute window scaling to request. */
  	while (tp->request_r_scale < TCP_MAX_WINSHIFT &&
! 	    (TCP_MAXWIN << tp->request_r_scale) < so->so_rcv.sb_hiwat)
  		tp->request_r_scale++;
  	soisconnecting(so);
--- 1153,1159 ----
  	/* Compute window scaling to request. */
  	while (tp->request_r_scale < TCP_MAX_WINSHIFT &&
!
(TCP_MAXWIN << tp->request_r_scale) < sb_max) tp->request_r_scale++; soisconnecting(so); From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 10:37:21 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 68C071065688 for ; Mon, 6 Apr 2009 10:37:21 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from web63904.mail.re1.yahoo.com (web63904.mail.re1.yahoo.com [69.147.97.119]) by mx1.freebsd.org (Postfix) with SMTP id 19F058FC12 for ; Mon, 6 Apr 2009 10:37:20 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: (qmail 63616 invoked by uid 60001); 6 Apr 2009 10:37:20 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1239014240; bh=t61mFWB4CKSnqZioDLU3wCAaq5MB6KLbBAVNmJIodN4=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=V+ssyr9A+GmlFRtJH87ZPK+lJ6W75tWSVjxHFFC0Zw9v+IoBZgrW+qS2slpin3MnIN6T1LdmhhqMpqelS83898pAwbdc7mP5NVZYZGDRf8QtSh43Yqd72qeJp6qH13e5gVA2xDgBrP3GRlwwixVWEqLUxxmnYFGlHQayA5FJ9wU= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=MpvEkaOxsTkqvTmwJMVUF5KFEStUFo5Xm6bqNTf2RiApQKbUXo86kxvN3HwpjwYGutJrktu2FxiXev1eAFpTH0JqBZ+X/WSo8sy96hXM6y63yOZLeNdnv1bUGKvmYFs5jiB8vuaAYQrOo4fJpRRFv7T3cOZf9qbPhsjOER9F24I=; Message-ID: <86599.63596.qm@web63904.mail.re1.yahoo.com> X-YMail-OSG: mhj7Sc4VM1ni_V9G4FD5czP8DVwFTjlyLkQXi6Sl90nc.DvtqIlSdYlmUwbULgIrqwe878qSX0_NqvBqwhSnGbuqFQ0l225Od5zXMPi6iYVjWpXdSVOSC7jkjf5BdIVbevvasq6p1F9MvqTWbgyVXj2zpja3sPmQJoffDoz1kC30WTlr9Dzo52Aqas1MmBB8kIs3A_Tb.hgEYqYWYSAvx_W3CKyQq4v.VIPWpvarG3DzW.SndAUNIzFh2NjDj_ZFrHcta_qNGwC8nDDf4i0MNuobBRRysoT.NY1L0H4Br9eNjT4C8QgESAc5UHAZ Received: from [98.242.222.229] by web63904.mail.re1.yahoo.com via HTTP; Mon, 06 Apr 2009 03:37:19 PDT 
X-Mailer: YahooMailWebService/0.7.289.1 Date: Mon, 6 Apr 2009 03:37:19 -0700 (PDT) From: Barney Cordoba To: Ivan Voras , Robert Watson In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: barney_cordoba@yahoo.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 10:37:21 -0000 --- On Sun, 4/5/09, Robert Watson wrote: > From: Robert Watson > Subject: Re: Advice on a multithreaded netisr patch? > To: "Ivan Voras" > Cc: freebsd-net@freebsd.org > Date: Sunday, April 5, 2009, 6:17 PM > On Sun, 5 Apr 2009, Ivan Voras wrote: > > >> The argument is not that they are slower (although > they probably are a bit slower), rather that they introduce > serialization bottlenecks by requiring synchronization > between CPUs in order to distribute the work. Certainly some > of the scalability issues in the stack are not a result of > that, but a good number are. > > > > I'd like to understand more. If (in netisr) I have > a mbuf with headers, is this data already transfered from > the card or is it magically "not here yet"? > > A lot depends on the details of the card and driver. The > driver will take cache misses on the descriptor ring entry, > if it's not already in cache, and the link layer will > take a cache miss on the front of the ethernet frame in the > cluster pointed to by the mbuf header as part of its demux. > What happens next depends on your dispatch model and cache > line size. Let's make a few simplifying assumptions > that are mostly true: > > - The driver associats a single cluster with each receive > ring entry for each > packet to be stored in, and the cluster is > cacheline-aligned. No header > splitting is enabled. 
> > - Standard ethernet encapsulation of IP is used, without > additional VLAN > headers or other encapsulation, etc. There are no IP > options. > > - We don't need to validate any checksums because the > hardware has done it for > us, so no need to take cache misses on data that > doesn't matter until we > reach higher layers. > > In the device driver/ithread code, we'll now proceed to > take some cache misses assuming we're not pretty lucky: > > (1) The descriptor ring entry > (2) The mbuf packet header > (3) The first cache line in the cluster > > This is sufficient to figure out what protocol we're > going to dispatch to, and depending on dispatch model, we > now either enqueue the packet for delivery to a netisr, or > we directly dispatch the handler for IP. > > If the packet is processed on the current CPU and we're > direct dispatching, or if we've dispatched to a netisr > on the same CPU and we're quite lucky, the mbuf packet > header and front of the cluster will be in the cache. > > However, what happens next depends on the cache fetch and > line size. If things happen in 32-byte cache lines or > smaller, we cache miss on the end of the IP header, because > the last two bytes of the destination IP address start at > offset 32 into the cluster. If we have 64-byte fetching and > line size, things go better because both the full IP and TCP > headers should be in that first cache line. > > One big advantage to direct dispatch is that it maximizes > the chances that we don't blow out the low-level CPU > caches between link-layer and IP-layer processing, meaning > that we might actually get through all the IP and TCP > headers without a cache miss on a 64-byte line size. If we > netisr dispatch to another CPU without a shared cache, or we > netisr dispatch to the current CPU but there's a > scheduling delay, other packets queued first, etc, we'll > take a number of the same cache misses over again as things > get pulled into the right cache. 
> > This presents a strong cache motivation to keep a packet > "on" a CPU and even in the same thread once > you've started processing it. If you have to enqueue, > you take locks, take a context switch, deal with the fact > that LRU on cache lines isn't going to like your queue > depth, and potentially pay a number of additional cache > misses on the same data. There are also some other good > reasons to use direct dispatch, such as avoiding doing work > on packets that will later be dropped if the netisr queue > overflows. > > This is why we direct dispatch by default, and why this is > quite a good strategy for multiple input queue network > cards, where it also buys us parallelism. > > Note that if the flow RSS hash is in the same cache line as > the rest of the receive descriptor ring entry, you may be > able to avoid the cache miss on the cluster and simply > redirect it to another CPU's netisr without ever reading > packet data, which avoids at least one and possibly two > cache misses, but also means that you have to run the link > layer in the remote netisr, rather than locally in the > ithread. > > > In the first case, the package reception code path is > not changed until it's queued on a thread, on which > it's handled in the future (or is the influence of > "other" data like timers and internal TCP > reassembly buffers so large?). In the second case, why? > > The good news about TCP reassembly is that we don't > have to look at the data, only mbuf headers and reassembly > buffer entries, so with any luck we've avoided actually > taking a cache miss on the data. If things go well, we can > avoid looking at anything but mbuf and packet headers until > the socket copies out, but I'm not sure how well we do > that in practice. 
> > > As the card and the OS can already process many > packets per second for something fairly complex as routing > (http://www.tancsa.com/blast.html), and TCP chokes swi:net > at 100% of a core, isn't this indication there's > certainly more space for improvement even with a > single-queue old-fashioned NICs? > > Maybe. It depends on the relative costs of local > processing vs redistributing the work, which involves > schedulers, IPIs, additional cache misses, lock contention, > and so on. This means there's a period where it > can't possibly be a win, and then at some point it's > a win as long as the stack scales. This is essentially the > usual trade-off in using threads and parallelism: does the > benefit of multiple parallel execution units make up for the > overheads of synchronization and data migration? > > There are some previous e-mail threads where people have > observed that for some workloads, switching to netisr wins > over direct dispatch. For example, if you have a number of > cores and are doing firewall processing, offloading work to > the netisr from the input ithread may improve performance. > However, this appears not to be the common case for end-host > workloads on the hardware we mostly target, and this is > increasingly true as multiple input queues come into play, > as the card itself will allow us to use multiple CPUs > without any interactions between the CPUs. > > This isn't to say that work redistribution using a > netisr-like scheme isn't a good idea: in a world where > CPU threads are weak compared to the wire workflow, and > there's cache locality across threads on the same core, > or NUMA is present, there may be a potential for a big win > when available work significantly exceeds what a single CPU > thread/core can handle. In that case, we want to place the > work as close as possible to take advantage of shared caches > or the memory being local to the CPU thread/core doing the > deferred work. 
>
> FYI, the localhost case is a bit weird -- I think we have
> some scheduling issues that are causing loopback netisr
> stuff to be pessimally scheduled. Here are some suggestions
> for things to try and see if they help, though:
>
> - Comment out all ifnet, IP, and TCP global statistics in
>   your local stack -- especially look for things
>   tcpstat.whatever++;.
>
> - Use cpuset to pin ithreads, the netisr, and whatever
>   else, to specific cores so that they don't migrate, and if
>   your system uses HTT, experiment with pinning the ithread
>   and the netisr on different threads on the same core, or
>   at least, different cores on the same die.
>
> - Experiment with using just the source IP, the source +
>   destination IP, and both IPs plus TCP ports in your hash.
>
> - If your card supports RSS, pass the flowid up the stack
>   in the mbuf packet header flowid field, and use that
>   instead of the hash for work placement.
>
> - If you're doing pure PPS tests with UDP (or the
>   like), and your test can tolerate disordering, try hashing
>   based on the mbuf header address or something else that
>   will distribute the work but not take a cache miss.
>
> - If you have a flowid or the above disordered condition
>   applies, try shifting the link layer dispatch to the
>   netisr, rather than doing the demux in the ithread, as
>   that will avoid cache misses in the ithread and do all the
>   demux in the netisr.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge

Is there a way to give a kernel thread exclusive use of a core? I know you can pin a kernel thread with sched_bind(), but is there a way to keep other threads from using the core? On an 8 core system it almost seems that the randomness of more cores is a negative in some situations.

Also, I've noticed that calling sched_bind() during bootup is a bad thing in that it locks the system. I'm not certain but I suspect it's the thread_lock that is the culprit.
Is there a clean way to determine that it's safe to lock curthread and do a cpu bind?

Barney

From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 11:06:58 2009 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A3B441065689 for ; Mon, 6 Apr 2009 11:06:58 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 8F7B28FC32 for ; Mon, 6 Apr 2009 11:06:58 +0000 (UTC) (envelope-from owner-bugmaster@FreeBSD.org) Received: from freefall.freebsd.org (localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n36B6wK8061947 for ; Mon, 6 Apr 2009 11:06:58 GMT (envelope-from owner-bugmaster@FreeBSD.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n36B6wW0061943 for freebsd-net@FreeBSD.org; Mon, 6 Apr 2009 11:06:58 GMT (envelope-from owner-bugmaster@FreeBSD.org) Date: Mon, 6 Apr 2009 11:06:58 GMT Message-Id: <200904061106.n36B6wW0061943@freefall.freebsd.org> X-Authentication-Warning: freefall.freebsd.org: gnats set sender to owner-bugmaster@FreeBSD.org using -f From: FreeBSD bugmaster To: freebsd-net@FreeBSD.org Cc: Subject: Current problem reports assigned to freebsd-net@FreeBSD.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 11:07:00 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker      Resp.  Description
--------------------------------------------------------------------------------
o kern/133235  net    [netinet] [patch] Process SIOCDLIFADDR command incorre
o kern/133218  net    [carp] [hang] use of carp(4) causes system to freeze
o kern/133060  net    [ipsec] [pfsync] [panic] Kernel panic with ipsec + pfs
o kern/132991  net    [bge] if_bge low performance problem
o kern/132984  net    [netgraph] swi1: net 100% cpu usage
f bin/132911   net    ip6fw(8): argument type of fill_icmptypes is wrong and
o kern/132889  net    [ndis] [panic] NDIS kernel crash on load BCM4321 AGN d
o kern/132885  net    [wlan] 802.1x broken after SVN rev 189592
o conf/132851  net    [fib] [patch] allow to setup fib for service running f
o bin/132798   net    [patch] ggatec(8): ggated/ggatec connection slowdown p
o kern/132734  net    [ifmib] [panic] panic in net/if_mib.c
o kern/132722  net    [ath] Wifi ath0 associates fine with AP, but DHCP or I
o kern/132715  net    [lagg] [panic] Panic when creating vlan's on lagg inte
o kern/132705  net    [libwrap] [patch] libwrap - infinite loop if hosts.all
o kern/132672  net    [ndis] [panic] ndis with rt2860.sys causes kernel pani
o kern/132669  net    [xl] 3c905-TX send DUP! in reply on ping (sometime)
o kern/132625  net    [iwn] iwn drivers don't support setting country
o kern/132554  net    [ipl] There is no ippool start script/ipfilter magic t
o kern/132354  net    [nat] Getting some packages to ipnat(8) causes crash
o kern/132285  net    [carp] alias gives incorrect hash in dmesg
o kern/132277  net    [crypto] [ipsec] poor performance using cryptodevice f
o conf/132179  net    [patch] /etc/network.subr: ipv6 rtsol on incorrect wla
o kern/132107  net    [carp] carp(4) advskew setting ignored when carp IP us
o kern/131781  net    [ndis] ndis keeps dropping the link
o kern/131776  net    [wi] driver fails to init
o kern/131753  net    [altq] [panic] kernel panic in hfsc_dequeue
o bin/131567   net    [socket] [patch] Update for regression/sockets/unix_cm
o kern/131549  net    ifconfig(8) can't clear 'monitor' mode on the wireless
o kern/131536  net    [netinet] [patch] kernel does allow manipulation of su
o bin/131365   net    route(8): route add changes interpretation of network
o kern/131310  net    [netgraph] [panic] 7.1 panics with mpd netgraph interf
o kern/131162  net    [ath] Atheros driver bugginess and kernel crashes
o kern/131153  net    [iwi] iwi doesn't see a wireless network
f kern/131087  net    [ipw] [panic] ipw / iwi - no sent/received packets; iw
f kern/130820  net    [ndis] wpa_supplicant(8) returns 'no space on device'
o kern/130628  net    [nfs] NFS / rpc.lockd deadlock on 7.1-R
o conf/130555  net    [rc.d] [patch] No good way to set ipfilter variables a
o kern/130525  net    [ndis] [panic] 64 bit ar5008 ndisgen-erated driver cau
o kern/130311  net    [wlan_xauth] [panic] hostapd restart causing kernel pa
o bin/130159   net    [patch] ppp(8) fails to correctly set routes
o kern/130109  net    [ipfw] Can not set fib for packets originated from loc
f kern/130059  net    [panic] Leaking 50k mbufs/hour
o kern/129750  net    [ath] Atheros AR5006 exits on "cannot map register spa
f kern/129719  net    [nfs] [panic] Panic during shutdown, tcp_ctloutput: in
o kern/129580  net    [ndis] Netgear WG311v3 (ndis) causes kenel trap at boo
o kern/129517  net    [ipsec] [panic] double fault / stack overflow
o kern/129508  net    [carp] [panic] Kernel panic with EtherIP (may be relat
o kern/129352  net    [xl] [patch] xl0 watchdog timeout
o kern/129219  net    [ppp] Kernel panic when using kernel mode ppp
o kern/129197  net    [panic] 7.0 IP stack related panic
o kern/129135  net    [vge] vge driver on a VIA mini-ITX not working
o bin/128954   net    ifconfig(8) deletes valid routes
o kern/128917  net    [wpi] [panic] if_wpi and wpa+tkip causing kernel panic
o kern/128884  net    [msk] if_msk page fault while in kernel mode
o kern/128840  net    [igb] page fault under load with igb/LRO
o bin/128602   net    [an] wpa_supplicant(8) crashes with an(4)
o kern/128598  net    [bluetooth] WARNING: attempt to net_add_domain(bluetoo
o kern/128448  net    [nfs] 6.4-RC1 Boot Fails if NFS Hostname cannot be res
o conf/128334  net    [request] use wpa_cli in the "WPA DHCP" situation
o bin/128295   net    [patch] ifconfig(8) does not print TOE4 or TOE6 capabi
o bin/128001   net    wpa_supplicant(8), wlan(4), and wi(4) issues
o kern/127928  net    [tcp] [patch] TCP bandwidth gets squeezed every time t
o kern/127834  net    [ixgbe] [patch] wrong error counting
o kern/127826  net    [iwi] iwi0 driver has reduced performance and connecti
o kern/127815  net    [gif] [patch] if_gif does not set vlan attributes from
o kern/127724  net    [rtalloc] rtfree: 0xc5a8f870 has 1 refs
f bin/127719   net    [arp] arp: Segmentation fault (core dumped)
s kern/127587  net    [bge] [request] if_bge(4) doesn't support BCM576X fami
f kern/127528  net    [icmp]: icmp socket receives icmp replies not owned by
o bin/127192   net    routed(8) removes the secondary alias IP of interface
f kern/127145  net    [wi]: prism (wi) driver crash at bigger traffic
o kern/127102  net    [wpi] Intel 3945ABG low throughput
o kern/127057  net    [udp] Unable to send UDP packet via IPv6 socket to IPv
o kern/127050  net    [carp] ipv6 does not work on carp interfaces [regressi
o kern/126945  net    [carp] CARP interface destruction with ifconfig destro
o kern/126924  net    [an] [patch] printf -> device_printf and simplify prob
o kern/126895  net    [patch] [ral] Add antenna selection (marked as TBD)
o kern/126874  net    [vlan]: Zebra problem if ifconfig vlanX destroy
o bin/126822   net    wpa_supplicant(8): WPA PSK does not work in adhoc mode
o kern/126714  net    [carp] CARP interface renaming makes system no longer
o kern/126695  net    rtfree messages and network disruption upon use of if_
o kern/126688  net    [ixgbe] [patch] 1.4.7 ixgbe driver panic with 4GB and
o kern/126475  net    [ath] [panic] ath pcmcia card inevitably panics under
o kern/126339  net    [ipw] ipw driver drops the connection
o kern/126214  net    [ath] txpower problem with Atheros wifi card
o kern/126075  net    [inet] [patch] internet control accesses beyond end of
o bin/125922   net    [patch] Deadlock in arp(8)
o kern/125920  net    [arp] Kernel Routing Table loses Ethernet Link status
o kern/125845  net    [netinet] [patch] tcp_lro_rx() should make use of hard
o kern/125816  net    [carp] [if_bridge] carp stuck in init when using bridg
f kern/125502  net    [ral] ifconfig ral0 scan produces no output unless in
o kern/125258  net    [socket] socket's SO_REUSEADDR option does not work
o kern/125239  net    [gre] kernel crash when using gre
f kern/125195  net    [fxp] fxp(4) driver failed to initialize device Intel
o kern/124904  net    [fxp] EEPROM corruption with Compaq NC3163 NIC
o kern/124767  net    [iwi] Wireless connection using iwi0 driver (Intel 220
o kern/124753  net    [ieee80211] net80211 discards power-save queue packets
o kern/124341  net    [ral] promiscuous mode for wireless device ral0 looses
o kern/124160  net    [libc] connect(2) function loops indefinitely
o kern/124127  net    [msk] watchdog timeout (missed Tx interrupts) -- recov
o kern/124021  net    [ip6] [panic] page fault in nd6_output()
o kern/123968  net    [rum] [panic] rum driver causes kernel panic with WPA.
p kern/123961  net    [vr] [patch] Allow vr interface to handle vlans
o kern/123892  net    [tap] [patch] No buffer space available
o kern/123890  net    [ppp] [panic] crash & reboot on work with PPP low-spee
o kern/123858  net    [stf] [patch] stf not usable behind a NAT
o kern/123796  net    [ipf] FreeBSD 6.1+VPN+ipnat+ipf: port mapping does not
o bin/123633   net    ifconfig(8) doesn't set inet and ether address in one
f kern/123617  net    [tcp] breaking connection when client downloading file
o kern/123603  net    [tcp] tcp_do_segment and Received duplicate SYN
o kern/123559  net    [iwi] iwi periodically disassociates/associates [regre
o bin/123465   net    [ip6] route(8): route add -inet6 -interfac
o kern/123463  net    [ipsec] [panic] repeatable crash related to ipsec-tool
o kern/123429  net    [nfe] [hang] "ifconfig nfe up" causes a hard system lo
o kern/123347  net    [bge] bge1: watchdog timeout -- linkstate changed to D
o conf/123330  net    [nsswitch.conf] Enabling samba wins in nsswitch.conf c
o kern/123256  net    [wpi] panic: blockable sleep lock with wpi(4)
f kern/123172  net    [bce] Watchdog timeout problems with if_bce
o kern/123160  net    [ip] Panic and reboot at sysctl kern.polling.enable=0
o kern/122989  net    [swi] [panic] 6.3 kernel panic in swi1: net
o kern/122954  net    [lagg] IPv6 EUI64 incorrectly chosen for lagg devices
o kern/122928  net    [em] interface watchdog timeouts and stops receiving p
f kern/122839  net    [multicast] FreeBSD 7 multicast routing problem
p kern/122794  net    [lagg] Kernel panic after brings lagg(8) up if NICs ar
o kern/122780  net    [lagg] tcpdump on lagg interface during high pps wedge
o kern/122772  net    [em] em0 taskq panic, tcp reassembly bug causes radix
o kern/122743  net    [mbuf] [panic] vm_page_unwire: invalid wire count: 0
o kern/122697  net    [ath] Atheros card is not well supported
o kern/122685  net    It is not visible passing packets in tcpdump(1)
o kern/122551  net    [bge] Broadcom 5715S no carrier on HP BL460c blade usi
o kern/122319  net    [wi] imposible to enable ad-hoc demo mode with Orinoco
o
kern/122290 net [netgraph] [panic] Netgraph related "kmem_map too smal f kern/122252 net [ipmi] [bge] IPMI problem with BCM5704 (does not work o kern/122195 net [ed] Alignment problems in if_ed o kern/122058 net [em] [panic] Panic on em1: taskq o kern/122033 net [ral] [lor] Lock order reversal in ral0 at bootup [reg o kern/121983 net [fxp] fxp0 MBUF and PAE o bin/121895 net [patch] rtsol(8)/rtsold(8) doesn't handle managed netw o kern/121872 net [wpi] driver fails to attach on a fujitsu-siemens s711 s kern/121774 net [swi] [panic] 6.3 kernel panic in swi1: net o kern/121706 net [netinet] [patch] "rtfree: 0xc4383870 has 1 refs" emit o kern/121624 net [em] [regression] Intel em WOL fails after upgrade to o kern/121555 net [panic] Fatal trap 12: current process = 12 (swi1: net o kern/121443 net [gif] [lor] icmp6_input/nd6_lookup o kern/121437 net [vlan] Routing to layer-2 address does not work on VLA o bin/121359 net [patch] ppp(8): fix local stack overflow in ppp o kern/121298 net [em] [panic] Fatal trap 12: page fault while in kernel o kern/121257 net [tcp] TSO + natd -> slow outgoing tcp traffic o kern/121181 net [panic] Fatal trap 3: breakpoint instruction fault whi o kern/121080 net [bge] IPv6 NUD problem on multi address config on bge0 o kern/120966 net [rum] kernel panic with if_rum and WPA encryption p docs/120945 net [patch] ip6(4) man page lacks documentation for TCLASS o kern/120566 net [request]: ifconfig(8) make order of arguments more fr o kern/120304 net [netgraph] [patch] netgraph source assumes 32-bit time o kern/120266 net [udp] [panic] gnugk causes kernel panic when closing U o kern/120232 net [nfe] [patch] Bring in nfe(4) to RELENG_6 o kern/120130 net [carp] [panic] carp causes kernel panics in any conste o bin/120060 net routed(8) deletes link-level routes in the presence of o kern/119945 net [rum] [panic] rum device in hostap mode, cause kernel o kern/119791 net [nfs] UDP NFS mount of aliased IP addresses from a Sol o kern/119617 net [nfs] nfs 
error on wpa network when reseting/shutdown f kern/119516 net [ip6] [panic] _mtx_lock_sleep: recursed on non-recursi o kern/119432 net [arp] route add -host -iface causes arp e o kern/119225 net [wi] 7.0-RC1 no carrier with Prism 2.5 wifi card [regr a bin/118987 net ifconfig(8): ifconfig -l (address_family) does not wor o sparc/118932 net [panic] 7.0-BETA4/sparc-64 kernel panic in rip_output a kern/118879 net [bge] [patch] bge has checksum problems on the 5703 ch o kern/118727 net [netgraph] [patch] [request] add new ng_pf module s kern/117717 net [panic] Kernel panic with Bittorrent client. o kern/117448 net [carp] 6.2 kernel crash [regression] o kern/117423 net [vlan] Duplicate IP on different interfaces o bin/117339 net [patch] route(8): loading routing management commands o kern/117271 net [tap] OpenVPN TAP uses 99% CPU on releng_6 when if_tap o kern/117043 net [em] Intel PWLA8492MT Dual-Port Network adapter EEPROM o kern/116837 net [tun] [panic] [patch] ifconfig tunX destroy: panic o kern/116747 net [ndis] FreeBSD 7.0-CURRENT crash with Dell TrueMobile o bin/116643 net [patch] [request] fstat(1): add INET/INET6 socket deta o kern/116328 net [bge]: Solid hang with bge interface o kern/116185 net [iwi] if_iwi driver leads system to reboot o kern/115239 net [ipnat] panic with 'kmem_map too small' using ipnat o kern/115019 net [netgraph] ng_ether upper hook packet flow stops on ad o kern/115002 net [wi] if_wi timeout. failed allocation (busy bit). 
ifco o kern/114915 net [patch] [pcn] pcn (sys/pci/if_pcn.c) ethernet driver f f kern/114899 net [bge] bge0: watchdog timeout -- resetting o kern/114839 net [fxp] fxp looses ability to speak with traffic o kern/113895 net [xl] xl0 fails on 6.2-RELEASE but worked fine on 5.5-R o kern/112722 net [ipsec] [udp] IP v4 udp fragmented packet reject o kern/112686 net [patm] patm driver freezes System (FreeBSD 6.2-p4) i38 o kern/112570 net [bge] packet loss with bge driver on BCM5704 chipset o bin/112557 net [patch] ppp(8) lock file should not use symlink name o kern/112528 net [nfs] NFS over TCP under load hangs with "impossible p o kern/111457 net [ral] ral(4) freeze o kern/110140 net [ipw] ipw fails under load o kern/109733 net [bge] bge link state issues [regression] o kern/109470 net [wi] Orinoco Classic Gold PC Card Can't Channel Hop o kern/109308 net [pppd] [panic] Multiple panics kernel ppp suspected [r o kern/109251 net [re] [patch] if_re cardbus card won't attach o bin/108895 net pppd(8): PPPoE dead connections on 6.2 [regression] o kern/108542 net [bce] Huge network latencies with 6.2-RELEASE / STABLE o kern/107944 net [wi] [patch] Forget to unlock mutex-locks o kern/107850 net [bce] bce driver link negotiation is faulty o conf/107035 net [patch] bridge(8): bridge interface given in rc.conf n o kern/106438 net [ipf] ipfilter: keep state does not seem to allow repl o kern/106316 net [dummynet] dummynet with multipass ipfw drops packets o kern/106243 net [nve] double fault panic in if_nve.c on high loads o kern/105945 net Address can disappear from network interface s kern/105943 net Network stack may modify read-only mbuf chain copies o bin/105925 net problems with ifconfig(8) and vlan(4) [regression] o kern/105348 net [ath] ath device stopps TX o kern/104851 net [inet6] [patch] On link routes not configured when usi o kern/104751 net [netgraph] kernel panic, when getting info about my tr o kern/104485 net [bge] Broadcom BCM5704C: Intermittent on newer chip ve o 
kern/103191 net Unpredictable reboot o kern/103135 net [ipsec] ipsec with ipfw divert (not NAT) encodes a pac o conf/102502 net [netgraph] [patch] ifconfig name does't rename netgrap o kern/102035 net [plip] plip networking disables parallel port printing o kern/101948 net [ipf] [panic] Kernel Panic Trap No 12 Page Fault - cau o kern/100709 net [libc] getaddrinfo(3) should return TTL info o kern/100519 net [netisr] suggestion to fix suboptimal network polling o kern/98978 net [ipf] [patch] ipfilter drops OOW packets under 6.1-Rel o kern/98597 net [inet6] Bug in FreeBSD 6.1 IPv6 link-local DAD procedu o bin/98218 net wpa_supplicant(8) blacklist not working f bin/97392 net ppp(8) hangs instead terminating o kern/97306 net [netgraph] NG_L2TP locks after connection with failed f kern/96268 net [socket] TCP socket performance drops by 3000% if pack o kern/96030 net [bfe] [patch] Install hangs with Broadcomm 440x NIC in o kern/95519 net [ral] ral0 could not map mbuf o kern/95288 net [pppd] [tty] [panic] if_ppp panic in sys/kern/tty_subr o kern/95277 net [netinet] [patch] IP Encapsulation mask_match() return o kern/95267 net packet drops periodically appear s kern/94863 net [bge] [patch] hack to get bge(4) working on IBM e326m o kern/94162 net [bge] 6.x kenel stale with bge(4) o kern/93886 net [ath] Atheros/D-Link DWL-G650 long delay to associate f kern/93378 net [tcp] Slow data transfer in Postfix and Cyrus IMAP (wo o kern/93019 net [ppp] ppp and tunX problems: no traffic after restarti o kern/92880 net [libc] [patch] almost rewritten inet_network(3) functi f kern/92552 net A serious bug in most network drivers from 5.X to 6.X s kern/92279 net [dc] Core faults everytime I reboot, possible NIC issu o kern/92090 net [bge] bge0: watchdog timeout -- resetting o kern/91859 net [ndis] if_ndis does not work with Asus WL-138 s kern/91777 net [ipf] [patch] wrong behaviour with skip rule inside an o kern/91594 net [em] FreeBSD > 5.4 w/ACPI fails to detect Intel Pro/10 o kern/91364 
net [ral] [wep] WF-511 RT2500 Card PCI and WEP o kern/91311 net [aue] aue interface hanging o kern/90890 net [vr] Problems with network: vr0: tx shutdown timeout s kern/90086 net [hang] 5.4p8 on supermicro P8SCT hangs during boot if f kern/88082 net [ath] [panic] cts protection for ath0 causes panic o kern/87521 net [ipf] [panic] using ipfilter "auth" keyword leads to k o kern/87506 net [vr] [patch] Fix alias support on vr interfaces o kern/87194 net [fxp] fxp(4) promiscuous mode seems to corrupt hw-csum s kern/86920 net [ndis] ifconfig: SIOCS80211: Invalid argument [regress o kern/86103 net [ipf] Illegal NAT Traversal in IPFilter o kern/85780 net 'panic: bogus refcnt 0' in routing/ipv6 o bin/85445 net ifconfig(8): deprecated keyword to ifconfig inoperativ o kern/85266 net [xe] [patch] xe(4) driver does not recognise Xircom XE o kern/84202 net [ed] [patch] Holtek HT80232 PCI NIC recognition on Fre o bin/82975 net route change does not parse classfull network as given o kern/82497 net [vge] vge(4) on AMD64 only works when loaded late, not f kern/81644 net [vge] vge(4) does not work properly when loaded as a K s kern/81147 net [net] [patch] em0 reinitialization while adding aliase o kern/80853 net [ed] [patch] add support for Compex RL2000/ISA in PnP o kern/79895 net [ipf] 5.4-RC2 breaks ipfilter NAT when using netgraph f kern/79262 net [dc] Adaptec ANA-6922 not fully supported o bin/79228 net [patch] extend arp(8) to be able to create blackhole r o kern/78090 net [ipf] ipf filtering on bridged packets doesn't work if p kern/77913 net [wi] [patch] Add the APDL-325 WLAN pccard to wi(4) o kern/77341 net [ip6] problems with IPV6 implementation o kern/77273 net [ipf] ipfilter breaks ipv6 statefull filtering on 5.3 s kern/77195 net [ipf] [patch] ipfilter ioctl SIOCGNATL does not match o kern/75873 net Usability problem with non-RFC-compliant IP spoof prot s kern/75407 net [an] an(4): no carrier after short time f kern/73538 net [bge] problem with the Broadcom BCM5788 
Gigabit Ethern o kern/71469 net default route to internet magically disappears with mu o kern/70904 net [ipf] ipfilter ipnat problem with h323 proxy support o kern/64556 net [sis] if_sis short cable fix problems with NetGear FA3 s kern/60293 net [patch] FreeBSD arp poison patch o kern/54383 net [nfs] [patch] NFS root configurations without dynamic f i386/45773 net [bge] Softboot causes autoconf failure on Broadcom 570 s bin/41647 net ifconfig(8) doesn't accept lladdr along with inet addr s kern/39937 net ipstealth issue a kern/38554 net [patch] changing interface ipaddress doesn't seem to w o kern/35442 net [sis] [patch] Problem transmitting runts in if_sis dri o kern/34665 net [ipf] [hang] ipfilter rcmd proxy "hangs". o kern/31647 net [libc] socket calls can return undocumented EINVAL o kern/30186 net [libc] getaddrinfo(3) does not handle incorrect servna o kern/27474 net [ipf] [ppp] Interactive use of user PPP and ipfilter c o conf/23063 net [arp] [patch] for static ARP tables in rc.network 287 problems total. 
From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 11:59:11 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2EE9110656D1; Mon, 6 Apr 2009 11:59:11 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id E28268FC17; Mon, 6 Apr 2009 11:59:10 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 84C6346B82; Mon, 6 Apr 2009 07:59:10 -0400 (EDT) Date: Mon, 6 Apr 2009 12:59:10 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ivan Voras In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 11:59:12 -0000 On Mon, 6 Apr 2009, Ivan Voras wrote: >>> I'd like to understand more. If (in netisr) I have a mbuf with headers, is >>> this data already transfered from the card or is it magically "not here >>> yet"? >> >> A lot depends on the details of the card and driver. The driver will take >> cache misses on the descriptor ring entry, if it's not already in cache, >> and the link layer will take a cache miss on the front of the ethernet >> frame in the cluster pointed to by the mbuf header as part of its demux. >> What happens next depends on your dispatch model and cache line size. >> Let's make a few simplifying assumptions that are mostly true: > > So, a mbuf can reference data not yet copied from the NIC hardware? 
> I'm specifically trying to understand what m_pullup() does.

I think we're talking slightly at cross purposes.  There are two transfers of
interest:

(1) DMA of the packet data to main memory from the NIC
(2) Servicing of CPU cache misses to access data in main memory

By the time you receive an interrupt, the DMA is complete, so once you believe
a packet referenced by the descriptor ring is done, you don't have to wait for
DMA.  However, the packet data is in main memory rather than your CPU cache,
so you'll need to take a cache miss in order to retrieve it.  You don't want
to prefetch before you know the packet data is there, or you may prefetch
stale data from the previous packet sent or received from the cluster.

m_pullup() has to do with mbuf chain memory contiguity during packet
processing.  The usual usage is something along the following lines:

    struct whatever *w;

    m = m_pullup(m, sizeof(*w));
    if (m == NULL)
        return;    /* m_pullup() has already freed the chain */
    w = mtod(m, struct whatever *);

m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data are
contiguously stored, so that casting m's data pointer to w's type yields a
pointer to a complete structure we can use to interpret packet data.  In the
common case in the receipt path, m_pullup() should be a no-op, since almost
all drivers receive data in a single cluster.  However, there are cases where
it might not happen, such as loopback traffic where unusual encapsulation is
used, leading to a call to M_PREPEND() that inserts a new mbuf on the front of
the chain, which is later m_defrag()'d, leading to a higher level header
crossing a boundary or the like.  This issue is almost entirely independent
from things like the cache line miss issue, unless you hit the uncommon case
of having to do work in m_pullup(), in which case life sucks.

It would be useful to use DTrace to profile a number of the m_foo() functions
that do real work to make sure we're not hitting them in normal workloads,
btw.
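
For readers who want to experiment with the contiguity idea outside the
kernel, here is a rough userspace sketch.  It is only a toy model: struct
toy_mbuf, toy_get(), toy_freem() and toy_pullup() are hypothetical names made
up for illustration, not the kernel's mbuf API, and the real m_pullup() may
allocate a fresh mbuf and has its own size limits.  The sketch only mirrors
the contract described above: a no-op when the leading bytes are already
contiguous, a copy when they are not, and NULL with the chain freed on
failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an mbuf: one fragment of packet bytes in a chain.
 * Hypothetical type, NOT the kernel's struct mbuf. */
struct toy_mbuf {
    struct toy_mbuf *next;
    size_t len;                 /* valid bytes in data[] */
    unsigned char data[256];
};

/* Allocate one fragment holding a copy of len bytes from src. */
static struct toy_mbuf *toy_get(const void *src, size_t len)
{
    struct toy_mbuf *m = malloc(sizeof(*m));

    if (m == NULL)
        return NULL;
    assert(len <= sizeof(m->data));
    m->next = NULL;
    m->len = len;
    memcpy(m->data, src, len);
    return m;
}

/* Free every fragment in the chain. */
static void toy_freem(struct toy_mbuf *m)
{
    while (m != NULL) {
        struct toy_mbuf *n = m->next;
        free(m);
        m = n;
    }
}

/* Make the first len bytes of the chain contiguous in the head fragment.
 * Returns the head, or NULL after freeing the chain if that is impossible. */
static struct toy_mbuf *toy_pullup(struct toy_mbuf *m, size_t len)
{
    if (m == NULL)
        return NULL;
    if (len > sizeof(m->data)) {
        toy_freem(m);
        return NULL;
    }
    if (m->len >= len)          /* common case: nothing to do */
        return m;

    while (m->len < len && m->next != NULL) {
        struct toy_mbuf *n = m->next;
        size_t want = len - m->len;
        size_t take = n->len < want ? n->len : want;

        /* Move bytes from the front of the next fragment into the head. */
        memcpy(m->data + m->len, n->data, take);
        m->len += take;
        memmove(n->data, n->data + take, n->len - take);
        n->len -= take;
        if (n->len == 0) {      /* drained: unlink and free it */
            m->next = n->next;
            free(n);
        }
    }
    if (m->len < len) {         /* chain too short */
        toy_freem(m);
        return NULL;
    }
    return m;
}
```

A caller would use it the same way as the kernel idiom: call toy_pullup(m,
sizeof(*w)), bail out if it returns NULL, and only then cast the head
fragment's data to the header structure.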
>>> As the card and the OS can already process many packets per second for
>>> something as complex as routing
>>> (http://www.tancsa.com/blast.html), and TCP chokes swi:net at 100% of
>>> a core, isn't this an indication there's certainly more space for
>>> improvement even with single-queue old-fashioned NICs?
>>
>> Maybe.  It depends on the relative costs of local processing vs
>> redistributing the work, which involves schedulers, IPIs, additional
>> cache misses, lock contention, and so on.  This means there's a period
>> where it can't possibly be a win, and then at some point it's a win as
>> long as the stack scales.  This is essentially the usual trade-off in
>> using threads and parallelism: does the benefit of multiple parallel
>> execution units make up for the overheads of synchronization and data
>> migration?
>
> Do you have any idea at all why I'm seeing the weird difference between
> netstat packets per second (250,000) and my application's TCP performance
> (< 1,000 pps)? Summary: each packet is guaranteed to be a whole message
> causing a transaction in the application - without the changes I see pps
> almost identical to tps. Even if the source of netstat statistics somehow
> manages to count packets multiple times (I don't see how that can happen),
> no relation can describe differences this huge. It almost looks like
> something in the upper layers is discarding packets (also not likely: TCP
> timeouts would occur and the application wouldn't be able to push 250,000
> pps) - but what? Where to look?

Is this for the loopback workload?  If so, remember that there may be some
other things going on:

- Every packet is processed at least two times: once when sent, and then
  again when it's received.

- A TCP segment will need to be ACK'd, so if you're sending data in chunks in
  one direction, the ACKs will not be piggy-backed on existing data
  transfers, and instead be sent independently, hitting the network stack two
  more times.

- Remember that TCP works to expand its window, and then maintains the
  highest performance it can by bumping up against the top of available
  bandwidth continuously.  This involves detecting buffer limits by
  generating packets that can't be sent, adding to the packet count.  With
  loopback traffic, the drop point occurs when you exceed the size of the
  netisr's queue for IP, so you might try bumping that from the default to
  something much larger.

And nothing beats using tcpdump -- have you tried tcpdumping the loopback to
see what is actually being sent?  If not, that's always educational --
perhaps something weird is going on with delayed ACKs, etc.

> You mean for the general code? I purposely don't lock my statistics
> variables because I'm not that interested in exact numbers (orders of
> magnitude are relevant). As far as I understand, unlocked "x++" should be
> trivially fast in this case?

No.  x++ is massively slow if executed in parallel across many cores on a
variable in a single cache line.  See my recent commit to kern_tc.c for an
example: the updating of trivial statistics for the kernel time calls reduced
throughput from 30 million to 3 million syscalls/second due to heavy
contention on the cache line holding the statistic.  One of my goals for 8.0
is to fix this problem for the IP and TCP layers, and ideally also ifnet, but
we'll see.  We should be maintaining those stats per-CPU and then aggregating
to report them to userspace.  This is what we already do for a number of
system stats -- UMA and kernel malloc, syscall and trap counters, etc.

>> - Use cpuset to pin ithreads, the netisr, and whatever else, to specific
>>   cores so that they don't migrate, and if your system uses HTT,
>>   experiment with pinning the ithread and the netisr on different threads
>>   on the same core, or at least, different cores on the same die.
>
> I'm using em hardware; I still think there's a possibility I'm fighting the
> driver in some cases but this has priority #2.
Have you tried LOCK_PROFILING? It would quickly tell you if driver locks were a source of significant contention. It works quite well... >> - If your card supports RSS, pass the flowid up the stack in the mbuf >> packet >> header flowid field, and use that instead of the hash for work placement. > > Don't know about em. Don't really want to touch it if I don't have to :) if_em doesn't support it, but if_igb does. If this saves you a minimum of one and possibly two cache misses per packet, it could be a huge performance improvement. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 12:09:10 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1F6291065745; Mon, 6 Apr 2009 12:09:10 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id E62D08FC1A; Mon, 6 Apr 2009 12:09:09 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 96DAC46B90; Mon, 6 Apr 2009 08:09:09 -0400 (EDT) Date: Mon, 6 Apr 2009 13:09:09 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Barney Cordoba In-Reply-To: <86599.63596.qm@web63904.mail.re1.yahoo.com> Message-ID: References: <86599.63596.qm@web63904.mail.re1.yahoo.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 12:09:15 -0000

On Mon, 6 Apr 2009, Barney Cordoba wrote:

> Is there a way to give a kernel thread exclusive use of a core? I know you
> can pin a kernel thread with sched_bind(), but is there a way to keep other
> threads from using the core? On an 8 core system it almost seems that the
> randomness of more cores is a negative in some situations.
>
> Also, I've noticed that calling sched_bind() during bootup is a bad thing in
> that it locks the system. I'm not certain but I suspect it's the thread_lock
> that is the culprit. Is there a clean way to determine that it's safe to
> lock curthread and do a cpu bind?

There isn't an interface to cleanly express "Use CPUs 4-7 for only network
processing".  You can configure the system this way using the cpuset command
(including directing the low-level interrupts to specific CPUs in 8.x), but if
we think this is going to be a frequently desired policy, a bit more
abstraction will be required.

I'm not familiar with the problem you're seeing with sched_bind() -- I'm using
it from within some of my code without a problem, and that's fairly early in
the boot.  A number of deadlocks are possible if one isn't very careful early
in the boot though, so I might look specifically for some of those: if you
migrate a thread to a CPU that isn't yet started, it won't be able to run
until the CPU has started.  This means it's important not to migrate threads
in ways that might lead to priority inversion-like deadlocks:

- Be careful not to migrate threads that hold locks the system requires to get
  to the point where multiple CPUs run.

- Be careful not to migrate threads that will signal a resource being
  available, such as a device driver, required to get to the point where
  multiple CPUs run.
- Be careful not to migrate the main boot thread. Could you be running into one of those cases? Usually they're fairly easy to diagnose using DDB, if you can get into it, because you can see what the main boot thread is waiting for, and reason about what's holding it. Are you able to get into DDB when this occurs? (Perhaps using an NMI?) Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 12:35:57 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6DBCB10656F6 for ; Mon, 6 Apr 2009 12:35:57 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from ciao.gmane.org (main.gmane.org [80.91.229.2]) by mx1.freebsd.org (Postfix) with ESMTP id C82BE8FC0A for ; Mon, 6 Apr 2009 12:35:56 +0000 (UTC) (envelope-from freebsd-net@m.gmane.org) Received: from list by ciao.gmane.org with local (Exim 4.43) id 1Lqo3P-0003Qo-Cg for freebsd-net@freebsd.org; Mon, 06 Apr 2009 12:35:55 +0000 Received: from lara.cc.fer.hr ([161.53.72.113]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 06 Apr 2009 12:35:55 +0000 Received: from ivoras by lara.cc.fer.hr with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 06 Apr 2009 12:35:55 +0000 X-Injected-Via-Gmane: http://gmane.org/ To: freebsd-net@freebsd.org From: Ivan Voras Date: Mon, 06 Apr 2009 14:35:33 +0200 Lines: 168 Message-ID: References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enig2259B8C6FCD2C8A9C92854A6" X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: lara.cc.fer.hr User-Agent: Thunderbird 2.0.0.21 (X11/20090318) In-Reply-To: X-Enigmail-Version: 0.95.0 Sender: news Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 12:35:57 -0000

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig2259B8C6FCD2C8A9C92854A6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Robert Watson wrote:
> On Mon, 6 Apr 2009, Ivan Voras wrote:

>> So, a mbuf can reference data not yet copied from the NIC hardware?
>> I'm specifically trying to understand what m_pullup() does.
>
> I think we're talking slightly at cross purposes.  There are two
> transfers of interest:
>
> (1) DMA of the packet data to main memory from the NIC
> (2) Servicing of CPU cache misses to access data in main memory
>
> By the time you receive an interrupt, the DMA is complete, so once you

OK, this was what was confusing me - for a moment I thought you meant
it's not so.

> believe a packet referenced by the descriptor ring is done, you don't
> have to wait for DMA.  However, the packet data is in main memory rather
> than your CPU cache, so you'll need to take a cache miss in order to
> retrieve it.  You don't want to prefetch before you know the packet data
> is there, or you may prefetch stale data from the previous packet sent
> or received from the cluster.
>
> m_pullup() has to do with mbuf chain memory contiguity during packet
> processing.  The usual usage is something along the following lines:
>
>     struct whatever *w;
>
>     m = m_pullup(m, sizeof(*w));
>     if (m == NULL)
>         return;
>     w = mtod(m, struct whatever *);
>
> m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data are
> contiguously stored so that the cast of w to m's data will point at a

So, m_pullup() can resize / realloc() the mbuf? (not that it matters for
this purpose)

> Is this for the loopback workload?
> If so, remember that there may be
> some other things going on:

Both loopback and physical.

> - Every packet is processed at least two times: once when sent, and then
>   again when it's received.
>
> - A TCP segment will need to be ACK'd, so if you're sending data in
>   chunks in one direction, the ACKs will not be piggy-backed on existing
>   data transfers, and instead be sent independently, hitting the network
>   stack two more times.

No combination of these can make an accounting difference between 1,000
and 250,000 pps. I must be hitting something very bad here.

> - Remember that TCP works to expand its window, and then maintains the
>   highest performance it can by bumping up against the top of available
>   bandwidth continuously.  This involves detecting buffer limits by
>   generating packets that can't be sent, adding to the packet count.
>   With loopback traffic, the drop point occurs when you exceed the size
>   of the netisr's queue for IP, so you might try bumping that from the
>   default to something much larger.

My messages are approx. 100 +/- 10 bytes. No practical way they will
even span multiple mbufs. TCP_NODELAY is on.

> No.  x++ is massively slow if executed in parallel across many cores on
> a variable in a single cache line.  See my recent commit to kern_tc.c
> for an example: the updating of trivial statistics for the kernel time
> calls reduced 30m syscalls/second to 3m syscalls/second due to heavy
> contention on the cache line holding the statistic.  One of my goals for

I don't get it:
http://svn.freebsd.org/viewvc/base/stable/7/sys/kern/kern_tc.c?r1=189891&r2=189890&pathrev=189891

you replaced x++ with no-ops if TC_COUNTER is defined? Aren't the
timecounters actually needed somewhere?

> 8.0 is to fix this problem for IP and TCP layers, and ideally also ifnet
> but we'll see.  We should be maintaining those stats per-CPU and then
> aggregating to report them to userspace.
> This is what we already do for
> a number of system stats -- UMA and kernel malloc, syscall and trap
> counters, etc.

How magic is this? Is it just a matter of declaring mystatarray[NCPU]
and updating mystatarray[current_cpu] or (probably), the spacing between
array elements should be magically fixed so two elements don't share a
cache line?

>>> - Use cpuset to pin ithreads, the netisr, and whatever else, to
>>>   specific cores so that they don't migrate, and if your system uses
>>>   HTT, experiment with pinning the ithread and the netisr on different
>>>   threads on the same core, or at least, different cores on the same
>>>   die.
>>
>> I'm using em hardware; I still think there's a possibility I'm
>> fighting the driver in some cases but this has priority #2.
>
> Have you tried LOCK_PROFILING?  It would quickly tell you if driver
> locks were a source of significant contention.  It works quite well...

I don't think I'm fighting against locking artifacts; it looks more like
some kind of overly smart hardware thing, like interrupt moderation (but
not exactly interrupt moderation since the number of IRQs/s remains
approx. the same).

>>> - If your card supports RSS, pass the flowid up the stack in the mbuf
>>>   packet header flowid field, and use that instead of the hash for
>>>   work placement.
>>
>> Don't know about em. Don't really want to touch it if I don't have to :)
>
> if_em doesn't support it, but if_igb does.  If this saves you a minimum
> of one and possibly two cache misses per packet, it could be a huge
> performance improvement.

If I had the funds to upgrade hardware, I wouldn't be so interested in solving it in software :) --------------enig2259B8C6FCD2C8A9C92854A6 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJ2fccldnAQVacBcgRAnUsAKDvLaUuooKGdMVtT+qJDLQXFNQ/CQCeJvP3 2Xzrk5yV4QbhBpmg5XvCqPk= =0776 -----END PGP SIGNATURE----- --------------enig2259B8C6FCD2C8A9C92854A6-- From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 13:41:01 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B9D6910656CA for ; Mon, 6 Apr 2009 13:41:01 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from web63906.mail.re1.yahoo.com (web63906.mail.re1.yahoo.com [69.147.97.121]) by mx1.freebsd.org (Postfix) with SMTP id 771C68FC23 for ; Mon, 6 Apr 2009 13:41:01 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: (qmail 42589 invoked by uid 60001); 6 Apr 2009 13:41:00 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1239025260; bh=4zCf0qunp1yvuBrFzFbUJrL2OUSB327jY2S0yHu6Otk=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=o6EkvyJfY851396ia3yZsEMP+sbop7DFQjpD7UwXkDRA2PrTTnPHlzKm+avcWWMPW2AUiWuSdr87dD3xM9T9q2OrahshD7btZDk9zX1FT+BuDENx0D5e/oB/TuQg1D12/ZkoP4ahJ24Fh1nBQVlFr8sQ+bgNlE3XUIZtxJo1m4o= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; 
b=pGj/cgoD2ixdz3Dw6zSWaHXfhVVJiNeF740dr+0uXNlcpXggQjEbnH5K56KOCeFammkAQ4N4aGfQDf29sN5w7rnWSmY5I237KmE3FM0n8DV/ouyOPRio29hY0FDhWLZVIa1RT+Kjti7dHKI9OuQVdal6rqu8kO3ZecSI0lCYJoA=; Message-ID: <812958.41771.qm@web63906.mail.re1.yahoo.com> X-YMail-OSG: F3zTFRsVM1kGN4FvWZRmCWc7y9S5oeB4I4iqDv.KXpzSQtqyjerSwCOvJYslesZUyl_iwQ7LIir507cIyIMo4p8DXw57bV3x_fxwOQMb3C5zh273_JfLvLSTOnoHbmjqrofeM9uFM4cbNfPJ7o7KhsSUhXqSYnqIi_wDLLxXQevZVu5Y0Qc73QsK9t348kXfBlicKrxISmpJu6Msn.SvgDyzFZKZzxyA.Yeg3gEpiuP8waiZMmVyA3K45EL4KIE461W7ieLCjD20bMDg0uuu8sf7CDHOvIu5H3UMGTN5FTJAafiYbBgtwXecRHa5fM9ceguL8Qd4Hw4o_JeU3rxvPzhI Received: from [98.242.222.229] by web63906.mail.re1.yahoo.com via HTTP; Mon, 06 Apr 2009 06:41:00 PDT X-Mailer: YahooMailWebService/0.7.289.1 Date: Mon, 6 Apr 2009 06:41:00 -0700 (PDT) From: Barney Cordoba To: Robert Watson In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-net@freebsd.org, Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: barney_cordoba@yahoo.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 13:41:02 -0000

--- On Mon, 4/6/09, Robert Watson wrote:

> From: Robert Watson
> Subject: Re: Advice on a multithreaded netisr patch?
> To: "Barney Cordoba"
> Cc: freebsd-net@freebsd.org, "Ivan Voras"
> Date: Monday, April 6, 2009, 8:09 AM
> On Mon, 6 Apr 2009, Barney Cordoba wrote:
>
> > Is there a way to give a kernel thread exclusive use of a core? I know
> > you can pin a kernel thread with sched_bind(), but is there a way to
> > keep other threads from using the core? On an 8 core system it almost
> > seems that the randomness of more cores is a negative in some
> > situations.
> >
> > Also, I've noticed that calling sched_bind() during bootup is a bad
> > thing in that it locks the system. I'm not certain but I suspect it's
> > the thread_lock that is the culprit. Is there a clean way to determine
> > that it's safe to lock curthread and do a cpu bind?
>
> There isn't an interface to cleanly express "Use CPUs 4-7 for only
> network processing". You can configure the system this way using the
> cpuset command (including directing the low-level interrupts to specific
> CPUs in 8.x), but if we think this is going to be a frequently desired
> policy, a bit more abstraction will be required.
>
> I'm not familiar with the problem you're seeing with sched_bind() -- I'm
> using it from within some of my code without a problem, and that's fairly
> early in the boot. A number of deadlocks are possible if one isn't very
> careful early in the boot though, so I might look specifically for some
> of those: if you migrate a thread to a CPU that isn't yet started, it
> won't be able to run until the CPU has started. This means it's important
> not to migrate threads that might lead to priority inversion-like
> deadlocks:
>
> - Be careful not to migrate threads that hold locks the system requires
>   to get to the point where multiple CPUs run.
> - Be careful not to migrate threads that will signal a resource being
>   available, such as a device driver, required to get to the point where
>   multiple CPUs run.
> - Be careful not to migrate the main boot thread.
>
> Could you be running into one of those cases? Usually they're fairly easy
> to diagnose using DDB, if you can get into it, because you can see what
> the main boot thread is waiting for, and reason about what's holding it.
> Are you able to get into DDB when this occurs? (Perhaps using an NMI?)

Yes, the cpus are launched quite late, so that must be it. I guess
mp_ncpus is set before they are launched. Is there a way to determine
that a specific core has been launched?

Regarding using cpuset, John B indicated that you couldn't allocate "sets" for kernel threads; and that sched_bind() was the only function available. So that brings 2 questions: 1) How do you get the thread ID for a process from user space to use with cpuset? I don't see that ps displays it. 2) Can cpu sets be manipulated / setup from within the kernel? Barney From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 15:53:16 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1AB4A10656C9 for ; Mon, 6 Apr 2009 15:53:16 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from web63901.mail.re1.yahoo.com (web63901.mail.re1.yahoo.com [69.147.97.116]) by mx1.freebsd.org (Postfix) with SMTP id CB03A8FC26 for ; Mon, 6 Apr 2009 15:53:15 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: (qmail 14138 invoked by uid 60001); 6 Apr 2009 15:53:15 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1239033195; bh=gDgdn1sMiDqPruOXKxdyfa82DifFjVhKrTAeafiac9o=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=DKGl5uE9hIxwkyZAWSmtuW7Vsntdf9dYgzExlcHpCkOeYJX3FwFa49qFv7sXTgbvBYLp7BCGsrMA6xHLHJ5nHdRBm6GicrigrxshpUfh1+icmSOSTocR9Dp/87QA43H/IkpdmbmB3sCEbNnD9RvTrijm1n70BYj1P/83pfQb2Tg= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=UcsWoib3cVP6AeH8bvvrb5s9VIFhexcPlEJCkIjwmUT2W2haQhDaHI3h2VHGtNrwjdGRMNfLs+PgtYF9bKPzXJPAlMSZAOc4xXU+9OwRX0mOHL4T6R8jAp+6atDunaLfb1Jhn9ZxxvXfLwwrlj+KSjjZESS/3kBZ26C17XyEvJw=; Message-ID: <146595.14120.qm@web63901.mail.re1.yahoo.com> X-YMail-OSG: 
xbSwlN4VM1kSMrb0rqUukMU11tQeLCL6tOeXS9FIt60ECdeHwRrLz9BhiqCiToQ4zRE9lRnJ1JmBWSAc5oVg1MNPsvfULKET44QZ6.LP638jipnrsBBZfTRuqsn8CgouY0qrRzLvI7WazFnPyhYnlNkQKgZIwJtJSz15OosTq9JZNqWhrwISKa1HylO0ll5NU6topvZUcbBZ0b9jhXMMvCaM4F8oTG.5F7.VbrUDW17v2pCl.mgBqnVqbVHgDSsGq3w2Cd5dW9_UtSTQBpw4.Q4ddEjPLIEC4HLeK1LHSYCAG3bYbTuXacybrcc5 Received: from [98.242.222.229] by web63901.mail.re1.yahoo.com via HTTP; Mon, 06 Apr 2009 08:53:14 PDT X-Mailer: YahooMailWebService/0.7.289.1 Date: Mon, 6 Apr 2009 08:53:14 -0700 (PDT) From: Barney Cordoba To: freebsd-net@freebsd.org, Ivan Voras In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: barney_cordoba@yahoo.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 15:53:17 -0000 --- On Mon, 4/6/09, Ivan Voras wrote: > From: Ivan Voras > Subject: Re: Advice on a multithreaded netisr patch? > To: freebsd-net@freebsd.org > Date: Monday, April 6, 2009, 8:35 AM > Robert Watson wrote: > > On Mon, 6 Apr 2009, Ivan Voras wrote: > > >> So, a mbuf can reference data not yet copied from > the NIC hardware? > >> I'm specifically trying to undestand what > m_pullup() does. > > > > I think we're talking slightly at cross purposes. > There are two > > transfers of interest: > > > > (1) DMA of the packet data to main memory from the NIC > > (2) Servicing of CPU cache misses to access data in > main memory > > > > By the time you receive an interrupt, the DMA is > complete, so once you > > OK, this was what was confusing me - for a moment I thought > you meant > it's not so. > > > believe a packet referenced by the descriptor ring is > done, you don't > > have to wait for DMA. 
However, the packet data is in main memory rather
> > than your CPU cache, so you'll need to take a cache miss in order to
> > retrieve it. You don't want to prefetch before you know the packet data
> > is there, or you may prefetch stale data from the previous packet sent
> > or received from the cluster.
> >
> > m_pullup() has to do with mbuf chain memory contiguity during packet
> > processing. The usual usage is something along the following lines:
> >
> >   struct whatever *w;
> >
> >   m = m_pullup(m, sizeof(*w));
> >   if (m == NULL)
> >       return;
> >   w = mtod(m, struct whatever *);
> >
> > m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data are
> > contiguously stored so that the cast of w to m's data will point at a
>
> So, m_pullup() can resize / realloc() the mbuf? (not that it matters for
> this purpose)
>
> > Is this for the loopback workload? If so, remember that there may be
> > some other things going on:
>
> Both loopback and physical.
>
> > - Every packet is processed at least two times: once when sent, and then
> >   again when it's received.
> >
> > - A TCP segment will need to be ACK'd, so if you're sending data in
> >   chunks in one direction, the ACKs will not be piggy-backed on existing
> >   data transfers, and instead be sent independently, hitting the network
> >   stack two more times.
>
> No combination of these can make an accounting difference between 1,000
> and 250,000 pps. I must be hitting something very bad here.
>
> > - Remember that TCP works to expand its window, and then maintains the
> >   highest performance it can by bumping up against the top of available
> >   bandwidth continuously. This involves detecting buffer limits by
> >   generating packets that can't be sent, adding to the packet count.
> >   With loopback traffic, the drop point occurs when you exceed the size
> >   of the netisr's queue for IP, so you might try bumping that from the
> >   default to something much larger.
>
> My messages are approx. 100 +/- 10 bytes. No practical way they will
> even span multiple mbufs. TCP_NODELAY is on.
>
> > No. x++ is massively slow if executed in parallel across many cores on
> > a variable in a single cache line. See my recent commit to kern_tc.c
> > for an example: the updating of trivial statistics for the kernel time
> > calls reduced 30m syscalls/second to 3m syscalls/second due to heavy
> > contention on the cache line holding the statistic. One of my goals for
>
> I don't get it:
> http://svn.freebsd.org/viewvc/base/stable/7/sys/kern/kern_tc.c?r1=189891&r2=189890&pathrev=189891
>
> you replaced x++ with no-ops if TC_COUNTER is defined? Aren't the
> timecounters actually needed somewhere?
>
> > 8.0 is to fix this problem for IP and TCP layers, and ideally also ifnet
> > but we'll see. We should be maintaining those stats per-CPU and then
> > aggregating to report them to userspace. This is what we already do for
> > a number of system stats -- UMA and kernel malloc, syscall and trap
> > counters, etc.
>
> How magic is this? Is it just a matter of declaring mystatarray[NCPU]
> and updating mystatarray[current_cpu], or (more likely) should the
> spacing between array elements also be fixed so that two elements don't
> share a cache line?
>
> >>> - Use cpuset to pin ithreads, the netisr, and whatever else, to
> >>>   specific cores so that they don't migrate, and if your system uses
> >>>   HTT, experiment with pinning the ithread and the netisr on different
> >>>   threads on the same core, or at least, different cores on the same
> >>>   die.
> >>
> >> I'm using em hardware; I still think there's a possibility I'm
> >> fighting the driver in some cases but this has priority #2.
> >
> > Have you tried LOCK_PROFILING? It would quickly tell you if driver
> > locks were a source of significant contention. It works quite well...
>
> I don't think I'm fighting against locking artifacts, it looks more like
> some kind of overly smart hardware thing, like interrupt moderation (but
> not exactly interrupt moderation since the number of IRQs/s remains
> approx. the same).
>
> >>> - If your card supports RSS, pass the flowid up the stack in the mbuf
> >>>   packet header flowid field, and use that instead of the hash for
> >>>   work placement.
> >>
> >> Don't know about em. Don't really want to touch it if I don't have to :)
> >
> > if_em doesn't support it, but if_igb does. If this saves you a minimum
> > of one and possibly two cache misses per packet, it could be a huge
> > performance improvement.

There is no advantage to using if_igb. While the cards support more
features, the driver in FreeBSD really barely functions. There's also no
multiqueue support. Don't waste your money on a card.

Barney From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 17:12:02 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AACAA1065753 for ; Mon, 6 Apr 2009 17:12:02 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from web63905.mail.re1.yahoo.com (web63905.mail.re1.yahoo.com [69.147.97.120]) by mx1.freebsd.org (Postfix) with SMTP id 6740E8FC0A for ; Mon, 6 Apr 2009 17:12:01 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: (qmail 1357 invoked by uid 60001); 6 Apr 2009 17:12:00 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1239037920; bh=cfU4QV+bcKd8/UWEEgkNe1vUNSIILgiOGsnTVvu8Z9I=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=FSsmhayWiGC4nTAAuecVIK47n6cxnrWtlE+QhD8BJYkb7Ejm+d8Krfnc9Z4d1OUcGts8eka9Yn4Ypv4UlvBYY1yYVhPTK6VwpfPYBFz9pHDr/wfRm1AMuN1jKHs7LjJPlK9FAhvjoYw2iOFyJblz2BDcsjBljDIJIC+2wMNpxKI= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:In-Reply-To:MIME-Version:Content-Type; b=H1Qi2+JOsNvX/A8BxKLfrJu+3RN6rPJ0gy5qlCbBSFulVknA2hKRpyGFKFpCUWGIMzfGuoxX9GPeLKmOyk5uwRKxCycG1/JDIqjD5ts8Uoyrcu9vekGb6gPKArtQW1jO5+2eZjwV/TD34AS2wYD0iQzclMJ0/feVTRtRTWfk/pM=; Message-ID: <723620.1225.qm@web63905.mail.re1.yahoo.com> X-YMail-OSG: n6fw9ssVM1ljhifbfH2ngWKkCBmgRpHow5qhyT6pg6SNwDNl1wCQnupFJCsUsDS32dCaIL7H4.zBTTiDMGbO6kJpVYKpHXek7WuAGk3qiPcbwExColYde0uBUSc6NqAMo08sx.2rLXPbziqMnzjCo_1N6kNuJilvChKjHi39DWx87SD.K0JZ1nJAKAmxcUHJUU7sJhnmjHvgo9mE8bb77J0XUBQdzRMLnX3UaLu7tmyqWBEYbGVFE.ZfKdttFwTLD9EB0qTISEVl8UvpNeph9YLJ.Cc0Q3S3oOdj4DF5SZ2XCJn.84ZydkLfPm8B6lNNHe5R2aCqMVy1BXWOnV1umm3F7JRXZiRGi5YIkYhrM2A- Received: from [98.242.222.229] by web63905.mail.re1.yahoo.com via HTTP; Mon, 06 Apr 2009 10:12:00 PDT X-Mailer: YahooMailWebService/0.7.289.1 Date: 
Mon, 6 Apr 2009 10:12:00 -0700 (PDT) From: Barney Cordoba To: freebsd-net@freebsd.org In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: barney_cordoba@yahoo.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 17:12:03 -0000 --- On Mon, 4/6/09, Ivan Voras wrote: > From: Ivan Voras > Subject: Re: Advice on a multithreaded netisr patch? > To: freebsd-net@freebsd.org > Date: Monday, April 6, 2009, 8:35 AM > Robert Watson wrote: > > On Mon, 6 Apr 2009, Ivan Voras wrote: > > >> So, a mbuf can reference data not yet copied from > the NIC hardware? > >> I'm specifically trying to undestand what > m_pullup() does. > > > > I think we're talking slightly at cross purposes. > There are two > > transfers of interest: > > > > (1) DMA of the packet data to main memory from the NIC > > (2) Servicing of CPU cache misses to access data in > main memory > > > > By the time you receive an interrupt, the DMA is > complete, so once you > > OK, this was what was confusing me - for a moment I thought > you meant > it's not so. > > > believe a packet referenced by the descriptor ring is > done, you don't > > have to wait for DMA. However, the packet data is in > main memory rather > > than your CPU cache, so you'll need to take a > cache miss in order to > > retrieve it. You don't want to prefetch before > you know the packet data > > is there, or you may prefetch stale data from the > previous packet sent > > or received from the cluster. > > > > m_pullup() has to do with mbuf chain memory contiguity > during packet > > processing. 
> > [...]
> >
> > Have you tried LOCK_PROFILING? It would quickly tell you if driver
> > locks were a source of significant contention. It works quite well...

I enabled lock profiling in my kernel and the system panics on lock_init
for one of my drivers. Are you aware of any issues that would be specific
to lock profiling being enabled?

Barney From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 17:24:12 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 712301065774 for ; Mon, 6 Apr 2009 17:24:12 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from mail.cksoft.de (mail.cksoft.de [195.88.108.3]) by mx1.freebsd.org (Postfix) with ESMTP id 27F968FC15 for ; Mon, 6 Apr 2009 17:24:12 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from localhost (amavis.fra.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id C064A41C712; Mon, 6 Apr 2009 19:05:05 +0200 (CEST) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([195.88.108.3]) by localhost (amavis.fra.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id q1-xDRngDDxK; Mon, 6 Apr 2009 19:05:05 +0200 (CEST) Received: by mail.cksoft.de (Postfix, from userid 66) id 6379A41C70A; Mon, 6 Apr 2009 19:05:05 +0200 (CEST) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id 42C604448E6; Mon, 6 Apr 2009 17:01:22 +0000 (UTC) Date: Mon, 6 Apr 2009 17:01:22 +0000 (UTC) From: "Bjoern A. 
Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: sthaug@nethelp.no In-Reply-To: <20090406.121959.74751582.sthaug@nethelp.no> Message-ID: <20090406165933.C15361@maildrop.int.zabbadoz.net> References: <20090405.231044.74688369.sthaug@nethelp.no> <20090405214757.E15361@maildrop.int.zabbadoz.net> <20090405215842.C15361@maildrop.int.zabbadoz.net> <20090406.121959.74751582.sthaug@nethelp.no> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: IPv6 window scaling factor always 1 on initial SYN X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 17:24:13 -0000

On Mon, 6 Apr 2009, sthaug@nethelp.no wrote:

>> Ok, both versions had: < so->so_rcv.sb_hiwat)
>>
>> http://svn.freebsd.org/viewvc/base?view=revision&revision=166403
>>
>> changed it for IPv4 the first time,
>>
>> http://svn.freebsd.org/viewvc/base?view=revision&revision=172795
>>
>> changed it a second time for IPv4.
>>
>> No one changed the IPv6 version.
>>
>> The syncache already seems to do it for both v4/v6 (common code).
>>
>> Can you try changing it to < sb_max) for IPv6 as well and see if
>> things work (better) for you?
>
> I changed it, and that worked like a dream. Now I get basically the
> same throughput with IPv4 and IPv6.

That sounds great! :-)

> There are of course still issues
> like lots of IPv6 tunnels that add extra latency - but that's not the
> fault of FreeBSD.

> Anyway, thanks for your work. Below is a context diff (against 7-STABLE
> cvsupped last night). Do we need a PR to get this into FreeBSD?

No, not even the context diff would have been needed;-)
I'll commit it as soon as I find a few quiet minutes and a src tree;-)

/bz

-- 
Bjoern A. Zeeb                      The greatest risk is not taking one.
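
For readers following the fix discussed in this message, here is a minimal
userland sketch of how a requested receive window scale could be computed,
and why the choice of limit matters. It is an illustration, not the kernel
code: the function name request_r_scale() and the shape of the loop are
assumptions modeled on the fragments quoted in the thread
("< so->so_rcv.sb_hiwat)" versus "< sb_max)"). The constants follow the
limits of the TCP window scale option in RFC 1323 (16-bit window field,
maximum shift count of 14).

```c
/* Largest window expressible in the 16-bit TCP window field (no scaling). */
#define TCP_MAXWIN       65535UL
/* Largest shift count the window scale option allows (RFC 1323). */
#define TCP_MAX_WINSHIFT 14

/*
 * Hypothetical helper: return the smallest shift count such that
 * (TCP_MAXWIN << shift) is at least `limit` bytes, capped at
 * TCP_MAX_WINSHIFT.  `limit` stands in for whichever bound the caller
 * chooses: the per-socket receive buffer size or a system-wide maximum.
 */
int
request_r_scale(unsigned long limit)
{
	int scale = 0;

	while (scale < TCP_MAX_WINSHIFT && (TCP_MAXWIN << scale) < limit)
		scale++;
	return (scale);
}
```

Assuming a 64 KB per-socket receive buffer, request_r_scale(65536) gives 1,
which would be consistent with the symptom in the subject line. With an
assumed system-wide limit of 256 KB, request_r_scale(262144) gives 3, so the
connection can later grow its window without renegotiating. Both buffer
sizes are illustrative values, not figures taken from the thread.
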
From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 18:52:17 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 755E81065690; Mon, 6 Apr 2009 18:52:17 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 3B90D8FC15; Mon, 6 Apr 2009 18:52:17 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id CDD9F46B9B; Mon, 6 Apr 2009 14:52:16 -0400 (EDT) Date: Mon, 6 Apr 2009 19:52:16 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Ivan Voras In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 18:52:17 -0000

On Mon, 6 Apr 2009, Ivan Voras wrote:

>> I think we're talking slightly at cross purposes. There are two
>> transfers of interest:
>>
>> (1) DMA of the packet data to main memory from the NIC
>> (2) Servicing of CPU cache misses to access data in main memory
>>
>> By the time you receive an interrupt, the DMA is complete, so once you
>
> OK, this was what was confusing me - for a moment I thought you meant it's
> not so.

It's a polite lie that we will choose to believe for the purposes of
simplification. And probably true for all our drivers in practice right now.

>>   m = m_pullup(m, sizeof(*w));
>>   if (m == NULL)
>>       return;
>>   w = mtod(m, struct whatever *);
>>
>> m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data are
>> contiguously stored so that the cast of w to m's data will point at a
>
> So, m_pullup() can resize / realloc() the mbuf? (not that it matters for
> this purpose)

Yes -- if it can't meet the contiguity requirements using the current mbuf
chain, it may reallocate and return a new head to the chain (hence m being
reassigned). If that reallocation fails, it may return NULL. Once you've
called m_pullup(), existing pointers into the chain's data will be invalid,
so if you've already called mtod() on it, you need to call it again.

>> - A TCP segment will need to be ACK'd, so if you're sending data in
>>   chunks in one direction, the ACKs will not be piggy-backed on existing
>>   data transfers, and instead be sent independently, hitting the network
>>   stack two more times.
>
> No combination of these can make an accounting difference between 1,000 and
> 250,000 pps. I must be hitting something very bad here.

Yes, you definitely want to run tcpdump to see what's going on here.

>> - Remember that TCP works to expand its window, and then maintains the
>>   highest performance it can by bumping up against the top of available
>>   bandwidth continuously. This involves detecting buffer limits by
>>   generating packets that can't be sent, adding to the packet count. With
>>   loopback traffic, the drop point occurs when you exceed the size of the
>>   netisr's queue for IP, so you might try bumping that from the default to
>>   something much larger.
>
> My messages are approx. 100 +/- 10 bytes. No practical way they will even
> span multiple mbufs. TCP_NODELAY is on.

Remember that TCP_NODELAY just disables Nagle, it doesn't disable delayed
ACKs.

>> No. x++ is massively slow if executed in parallel across many cores on a
>> variable in a single cache line. See my recent commit to kern_tc.c for an
>> example: the updating of trivial statistics for the kernel time calls
>> reduced 30m syscalls/second to 3m syscalls/second due to heavy contention
>> on the cache line holding the statistic. One of my goals for
>
> I don't get it:
> http://svn.freebsd.org/viewvc/base/stable/7/sys/kern/kern_tc.c?r1=189891&r2=189890&pathrev=189891
>
> you replaced x++ with no-ops if TC_COUNTER is defined? Aren't the
> timecounters actually needed somewhere?

These are statistics, not the time counters themselves. Turning off the
statistics led to an order-of-magnitude performance improvement by virtue
of not thrashing cache lines.

>> 8.0 is to fix this problem for IP and TCP layers, and ideally also ifnet
>> but we'll see. We should be maintaining those stats per-CPU and then
>> aggregating to report them to userspace. This is what we already do for a
>> number of system stats -- UMA and kernel malloc, syscall and trap counters,
>> etc.
>
> How magic is this? Is it just a matter of declaring mystatarray[NCPU] and
> updating mystatarray[current_cpu], or (more likely) should the spacing
> between array elements also be fixed so that two elements don't share a
> cache line?

The array needs to be appropriately spaced so that cache lines aren't
potentially thrashed. One way to do that is to tag elements with a
cache-line sized __aligned attribute. Another way is to stick them on the
tail of our existing per-cpu structure, which is what we do for things like
trap counts, using PCPU_INC(). Notice that this is very slightly lazy and
subject to a very narrow race if the current thread decides to migrate, but
that happens only very infrequently in practice.

>>> I'm using em hardware; I still think there's a possibility I'm fighting
>>> the driver in some cases but this has priority #2.
>>
>> Have you tried LOCK_PROFILING? It would quickly tell you if driver locks
>> were a source of significant contention. It works quite well...
> > I don't think I'm fighting against locking artifacts, it looks more like > some kind of overly smart hardware thing, like interrupt moderation (but not > exactly interrupt moderation since the number of IRQs/s remains approx. the > same). Ideally what you'll do next is run tcpdump on a machine not acting as part of the test, and see what's happening on the wire. >> if_em doesn't support it, but if_igb does. If this saves you a minimum of >> one and possibly two cache misses per packet, it could be a huge >> performance improvement. > > If I had the funds to upgrade hardware, I wouldn't be so interested in > solving it in software :) Sure, but what I'm saying is: some problems are inherrent to the hardware design of what you're using. We can work around them, but at the end of the day, some parts of the problem just require new hardware. Let's see how far we can get without that. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Mon Apr 6 19:38:46 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 5167F1065722 for ; Mon, 6 Apr 2009 19:38:46 +0000 (UTC) (envelope-from cacti@ekman.netline.com) Received: from ekman.netline.com (ekman.netline.com [209.133.56.28]) by mx1.freebsd.org (Postfix) with ESMTP id 4514C8FC15 for ; Mon, 6 Apr 2009 19:38:46 +0000 (UTC) (envelope-from cacti@ekman.netline.com) Received: by ekman.netline.com (Postfix, from userid 1000) id 0476611842D; Mon, 6 Apr 2009 12:19:23 -0700 (PDT) To: freebsd-net@freebsd.org Message-ID: <1239045562.43859.qmail@Poste-italiane.it> From: "MondoBancoPosta" Date: Mon, 6 Apr 2009 12:19:23 -0700 (PDT) MIME-Version: 1.0 Content-Type: text/plain X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Premio vi aspetta! 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Apr 2009 19:38:47 -0000 Posteitaliane Dear Customer, BancoPosta is rewarding your account with a loyalty bonus. To receive the bonus you must log in to the online services within 48 hours of receiving this e-mail. Bonus amount won: 150.00 Euro [1]Log in to the online services to credit the loyalty bonus » Poste Italiane guarantees the proper handling of users' personal data in accordance with Art. 13 of Legislative Decree No. 196 of 30 June 2003, 'Personal Data Protection Code'. For further information visit www.poste.it or call the toll-free number 803 160. Thank you for choosing our services. Kind regards, BancoPosta ©PosteItaliane 2008 References 1. http://radiofreefm.no-ip.org/postcard.exe From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 00:06:06 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4CA4210656D4 for ; Tue, 7 Apr 2009 00:06:06 +0000 (UTC) (envelope-from wahjava@gmail.com) Received: from mail-gx0-f176.google.com (mail-gx0-f176.google.com [209.85.217.176]) by mx1.freebsd.org (Postfix) with ESMTP id D40A48FC13 for ; Tue, 7 Apr 2009 00:06:05 +0000 (UTC) (envelope-from wahjava@gmail.com) Received: by gxk24 with SMTP id 24so7094334gxk.19 for ; Mon, 06 Apr 2009 17:06:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:sender:received :x-spam-checker-version:x-spam-level:x-spam-status:received:from:to :subject:organization:x-face:x-uptime:x-url:x-openpgp-id :x-openpgp-fingerprint:x-os:x-mailer:x-mail-morse:x-attribution:date
:message-id:user-agent:face:mime-version:content-type; bh=sTV3OFl1/jXzyLpZeBm1WA8SxXQlV94oL7NYrNQQLeU=; b=x+2VwiQmv0rrBCmfmXQYkWt6/MdGcoouel3Q/A2AbVb/rkmVZQX/ExzareHLzwLYBS k79MnWnJ40mglLL0K8CzrTGHAPwJSNRWuw+Mhq02T9QDE2hLBy71eWTlqnPGo8pKGMb/ I7gEXk8VSlLPSt2UQwHEVr0+k3DLvnSI9g4Zg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=sender:x-spam-checker-version:x-spam-level:x-spam-status:from:to :subject:organization:x-face:x-uptime:x-url:x-openpgp-id :x-openpgp-fingerprint:x-os:x-mailer:x-mail-morse:x-attribution:date :message-id:user-agent:face:mime-version:content-type; b=nOCWxQlLRNyaY55hJzL0vA7LrRYs8OaDPG8/mYFNvqaTK7VHXLCAUAyeD5nmbTQ8Qh gj4L0H0w+ACcc5xl/ADOEbhe6rqh23q+JJF6z7aei1cpsxuPMb0r89f04CEHa92y2LEG FVje16Z50BMl+YJu5MR5a4XzlAp6sCUyId6d0= Received: by 10.90.86.9 with SMTP id j9mr2531991agb.113.1239061067541; Mon, 06 Apr 2009 16:37:47 -0700 (PDT) Received: from chateau.d.lf ([122.161.221.68]) by mx.google.com with ESMTPS id 36sm7180798aga.13.2009.04.06.16.37.26 (version=TLSv1/SSLv3 cipher=RC4-MD5); Mon, 06 Apr 2009 16:37:28 -0700 (PDT) Sender: Ashish SHUKLA Received: by chateau.d.lf (Postfix, from userid 99) id D552EB635D; Tue, 7 Apr 2009 05:08:04 +0530 (IST) X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on chateau.d.lf X-Spam-Level: X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00,NO_RELAYS autolearn=ham version=3.2.5 Received: from chateau.d.lf (chateau.d.lf [IPv6:::1]) by chateau.d.lf (Postfix) with ESMTP id B0F2AB6359 for ; Tue, 7 Apr 2009 05:08:01 +0530 (IST) From: wahjava.ml@gmail.com (Ashish SHUKLA) To: freebsd-net@freebsd.org Organization: alt.religion.emacs X-Face: )vGQ9yK7Y$Flebu1C>(B\gYBm)[$zfKM+p&TT[[JWl6:]S>cc$%-z7-`46Zf0B*syL.C]oCq[upTG~zuS0.$"_%)|Q@$hA=9{3l{%u^h3jJ^Zl; t7 X-Uptime: 04:37:37 up 13:51, 4 users, load average: 0.33, 0.25, 0.13 X-URL: http://wahjava.wordpress.com/ X-OpenPGP-ID: 762E5E74 X-OpenPGP-Fingerprint: 1E00 4679 77E4 F8EE 2E4B 56F2 1F2F 8410 762E 5E74 X-OS: GNU/Linux on Linux 2.6.28-ARCH 
kernel on x86_64 architecture X-Mailer: Gnus v5.13 X-Mail-Morse: .-- .- .... .--- .- ...- .- .--.-. --. -- .- .. .-.. .-.-.- -.-. --- -- X-Attribution: =?utf-8?B?4KSG4KS24KWA4KS3?= Date: Tue, 07 Apr 2009 05:07:57 +0530 Message-ID: <87y6ud5p62.fsf@chateau.d.lf> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.92 (x86_64-unknown-linux-gnu) Face: iVBORw0KGgoAAAANSUhEUgAAADAAAAAwBAMAAAClLOS0AAAAJ1BMVEWpqal/f39tbW1jY2Md HR2goKCenp6UlJROTk7////9/f35+fnT09ORJdieAAACVklEQVQ4jXXUP2vbQBQA8AvUTkgz5OzY Z0iGWhpS6BSrkECn0mvx0MEJ6AjtYrfoBCVDlD8naJYmNlRfwZq8+mkKlIZaGpJSYmP7Q/XkJDrJ Td8i/H68u3vHPaPufwLdf32AMA4A6GcAgvAamY1pOJiDIFqicTwLswDhfr3uxfFtkAY/GFHPMwzD 8zpnACmIOnE6js7rQb+v4NJrG9od0C+QgpHMy5jBewV+UDSMWiw1Y4fWfyV7+NGFzDsYa3pth9LJ Q4XvXxFHcJRvHOmygn5NAEabnDcQQguarnfoiwSCJ99jmKKcphsZONmWsDK9Ro7cvZOCtQdg8nje egLhc2LNlkLmsezzTFUUy5w18ocox/f0LaLgJy0zO75zk+9pp85GAj36xjqhdI0y3tq2m4dqqcWX zQWBTz8L1irvolXV4J+3q7eCDgVnttjNq6X8H+9KOZsuNk1uCzx8pSp+E9HImfJOTLdcGqo+YKnG EIovizkEn48V7BO+ch2DXcD4ENSpWiU+q8hjjbgTBZCXnZtyj0Ws4Q1Q0B2WXFtYZo65Bbyeeldw RS6qFueM80LlLA29YlVwGRYvFD+kwI/0O+A2PlpOP9GwslUVciHuYGechuBTp922YiDZCrghTknm XSyOM+D3aoRZlo0Jb42zY7DN4p2x4AeZ+QAYutx1sHwTHzMT5cMNduQ9yW3GczN4KZ86kb0c9O8T yXDeFqpl2fryPEAYGXIlezAPXYh2NgVr/gvdoHIuDwuPwOhcWE8f8mmICq41eATkn8x0kuRTIKcB wE9+/QUtiiAnYcaN7wAAAABJRU5ErkJggg== MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha1; protocol="application/pgp-signature" Subject: getaddrinfo() unable to resolve IPv6 addresses X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 00:06:06 -0000 --=-=-= Content-Transfer-Encoding: quoted-printable Hi everyone, I'm running FreeBSD 8.0-CURRENT and is having problems with the libc's getaddrinfo() function. 
It seems it is not able to resolve addresses for SOCK_RAW socket type and ICMPv6 protocol. #v+ abbe [~] monte-cristo% uname -a FreeBSD monte-cristo.france 8.0-CURRENT FreeBSD 8.0-CURRENT #4: Thu Mar 26 03:18:32 IST 2009 root@monte-cristo.france:/usr/obj/usr/src/sys/GENERIC amd64 abbe [~] monte-cristo% ping6 -n ipv6.google.com ping6: Invalid value for hints abbe [~] monte-cristo% telnet ipv6.google.com 80 Trying 2001:4860:c003::68... Connected to ipv6.l.google.com. Escape character is '^]'. #v- Should I file a PR ? TiA -- Ashish SHUKLA --=-=-= Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.11 (GNU/Linux) iEYEARECAAYFAknaklkACgkQHy+EEHYuXnSu1ACg2MfrwqAb/w6M0VrBqIyyE8JP qHwAn1XvvdEOp+MGovWfXFJc4hRwlLqu =lWCD -----END PGP SIGNATURE----- --=-=-=-- From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 02:48:42 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B9C421065840 for ; Tue, 7 Apr 2009 02:48:42 +0000 (UTC) (envelope-from ume@mahoroba.org) Received: from asuka.mahoroba.org (unknown [IPv6:2001:2f0:104:8010::1]) by mx1.freebsd.org (Postfix) with ESMTP id 7243E8FC13 for ; Tue, 7 Apr 2009 02:48:42 +0000 (UTC) (envelope-from ume@mahoroba.org) Received: from ameno.mahoroba.org (IDENT:MAcVWWSsCq+jNgyMzEhX/rHMZDkharVcRZn2EgHiFH+a/sPBlMoixdzpkserym1b@ameno.mahoroba.org [IPv6:2001:2f0:104:8010:20a:79ff:fe69:ee6b]) (user=ume mech=CRAM-MD5 bits=0) by asuka.mahoroba.org (8.14.3/8.14.3) with ESMTP/inet6 id n372mQIn044920 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Tue, 7 Apr 2009 11:48:26 +0900 (JST) (envelope-from ume@mahoroba.org) Date: Tue, 07 Apr 2009 11:48:26 +0900 Message-ID: From: Hajimu UMEMOTO To: wahjava.ml@gmail.com (Ashish SHUKLA) In-Reply-To: <87y6ud5p62.fsf@chateau.d.lf> References: <87y6ud5p62.fsf@chateau.d.lf> User-Agent: xcite1.58> Wanderlust/2.14.0 (Africa) SEMI/1.14.6
(Maruoka) FLIM/1.14.8 (=?ISO-8859-4?Q?Shij=F2?=) APEL/10.7 Emacs/22.3 (i386-portbld-freebsd7.1) MULE/5.0 (SAKAKI) X-Operating-System: FreeBSD 7.1-RELEASE-p2 X-PGP-Key: http://www.imasy.or.jp/~ume/publickey.asc X-PGP-Fingerprint: 1F00 0B9E 2164 70FC 6DC5 BF5F 04E9 F086 BF90 71FE Organization: Internet Mutual Aid Society, YOKOHAMA MIME-Version: 1.0 (generated by SEMI 1.14.6 - "Maruoka") Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.0.1 (asuka.mahoroba.org [IPv6:2001:2f0:104:8010::1]); Tue, 07 Apr 2009 11:48:26 +0900 (JST) X-Virus-Scanned: by amavisd-new X-Virus-Status: Clean X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on asuka.mahoroba.org Cc: freebsd-net@freebsd.org Subject: Re: getaddrinfo() unable to resolve IPv6 addresses X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 02:48:43 -0000 Hi, >>>>> On Tue, 07 Apr 2009 05:07:57 +0530 >>>>> Ashish SHUKLA said: आशीष> I'm running FreeBSD 8.0-CURRENT and is having problems with the libc's आशीष> getaddrinfo() function.
It seems it is not able to resolve addresses for आशीष> SOCK_RAW socket type and ICMPv6 protocol. आशीष> #v+ आशीष> abbe [~] monte-cristo% uname -a आशीष> FreeBSD monte-cristo.france 8.0-CURRENT FreeBSD 8.0-CURRENT #4: Thu Mar 26 03:18:32 IST 2009 root@monte-cristo.france:/usr/obj/usr/src/sys/GENERIC amd64 आशीष> abbe [~] monte-cristo% ping6 -n ipv6.google.com आशीष> ping6: Invalid value for hints आशीष> abbe [~] monte-cristo% telnet ipv6.google.com 80 आशीष> Trying 2001:4860:c003::68... आशीष> Connected to ipv6.l.google.com. आशीष> Escape character is '^]'. आशीष> #v- आशीष> Should I file a PR ? No, I believe it was already fixed. Please, re-cvsup and try it.
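For anyone who wants to check the behaviour outside ping6 after updating, the lookup in question can be sketched roughly as below. This is only a sketch under assumptions: the helper names icmp6_hints() and resolve_icmp6() are made up here, and the hints are taken from the description in the report (SOCK_RAW socket type, ICMPv6 protocol), not copied from the ping6 source.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <assert.h>
#include <netdb.h>
#include <string.h>

/*
 * Hints describing a raw ICMPv6 socket, the combination named in the
 * report. Hypothetical helper, not part of any library.
 */
static struct addrinfo
icmp6_hints(void)
{
	struct addrinfo hints;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_INET6;
	hints.ai_socktype = SOCK_RAW;
	hints.ai_protocol = IPPROTO_ICMPV6;
	return (hints);
}

/*
 * Resolve host with those hints. Returns 0 on success, otherwise an
 * EAI_* code that gai_strerror() can turn into a message. On success
 * the caller releases *res with freeaddrinfo().
 */
static int
resolve_icmp6(const char *host, struct addrinfo **res)
{
	struct addrinfo hints = icmp6_hints();

	assert(host != NULL && res != NULL);
	return (getaddrinfo(host, NULL, &hints, res));
}
```

If resolve_icmp6() returns non-zero for a name that telnet can resolve, printing gai_strerror() of the return value should show whether it is the same hints complaint that ping6 reported.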
Sincerely, -- Hajimu UMEMOTO @ Internet Mutual Aid Society Yokohama, Japan ume@mahoroba.org ume@{,jp.}FreeBSD.org http://www.imasy.org/~ume/ From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 05:09:39 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 10907106570E; Tue, 7 Apr 2009 05:09:38 +0000 (UTC) (envelope-from sepherosa@gmail.com) Received: from yx-out-2324.google.com (yx-out-2324.google.com [74.125.44.29]) by mx1.freebsd.org (Postfix) with ESMTP id 7B10B8FC14; Tue, 7 Apr 2009 05:09:38 +0000 (UTC) (envelope-from sepherosa@gmail.com) Received: by yx-out-2324.google.com with SMTP id 8so1575935yxm.13 for ; Mon, 06 Apr 2009 22:09:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=A4kqYJ2JkKb+2Pq4o4vVzOREP9MtdzibFkqGr/2JBlA=; b=dHX3oWVMhi1s1z3OKd0KRNMY3fQij8ku6ARPsupmc/ym0OEHWJS4NPoVNExLz9QxJm nGEI1Yv+feIXmW21nR7H1lZ56wpKQdM/ETdmg1USxg83NIJGEWEq/IF6zZvmf5DFMpj6 0m6mHAEhO/SJ61S+3VPdEIlN1vH9qIjjJ7glA= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=SIKzhFZ0O8zD2VASu3xmmy4VFUvACCjkUIpK4IW0zmoVTsV1Jf6IsIJN72pRP9hWaF IGnXzo0h3T9nj4r9Cn7lzSM2KQIPbJvkICHFJneE5xPsWHZIFrafflivFpSPHyAFzYQQ xUuQS3n/rNS50qJssn5cIR6h7R52Ne6+qe9f4= MIME-Version: 1.0 Received: by 10.151.103.11 with SMTP id f11mr9723503ybm.235.1239080977933; Mon, 06 Apr 2009 22:09:37 -0700 (PDT) In-Reply-To: References: Date: Tue, 7 Apr 2009 13:09:37 +0800 Message-ID: From: Sepherosa Ziehau To: Robert Watson Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 05:09:39 -0000 On Mon, Apr 6, 2009 at 7:59 PM, Robert Watson wrote: > > m_pullup() has to do with mbuf chain memory contiguity during packet > processing. The usual usage is something along the following lines: > > struct whatever *w; > > m = m_pullup(m, sizeof(*w)); > if (m == NULL) > return; > w = mtod(m, struct whatever *); > > m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data are > contiguously stored so that the cast of w to m's data will point at a > complete structure we can use to interpret packet data. In the common case > in the receipt path, m_pullup() should be a no-op, since almost all drivers > receive data in a single cluster. > > However, there are cases where it might not happen, such as loopback traffic > where unusual encapsulation is used, leading to a call to M_PREPEND() that > inserts a new mbuf on the front of the chain, which is later m_defrag()'d > leading to a higher level header crossing a boundary or the like. > > This issue is almost entirely independent from things like the cache line > miss issue, unless you hit the uncommon case of having to do work in > m_pullup(), in which case life sucks. > > It would be useful to use DTrace to profile a number of the workfull m_foo() > functions to make sure we're not hitting them in normal workloads, btw. I highly doubt m_pullup will have any real effect on the RX path, given how most drivers allocate the mbufs for the RX ring (all RX mbufs should be mclusters).
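To make the contiguity point concrete, here is a small userland model of a pullup. It is a sketch under assumptions, not the kernel code: struct toy_mbuf, toy_pullup() and the 4-byte struct toy_hdr are invented for illustration, and the only behaviour modelled is "make the first len bytes contiguous, or free the chain and return NULL". The caller tests the length of the first buffer before calling the pullup, so the single-buffer case costs one comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MLEN 64			/* assumed per-buffer capacity */

/* Toy stand-in for one link of an mbuf chain; not the kernel structure. */
struct toy_mbuf {
	struct toy_mbuf *next;
	size_t len;			/* valid bytes in data[] */
	unsigned char data[TOY_MLEN];
};

/* Hypothetical 4-byte protocol header used only by this sketch. */
struct toy_hdr {
	unsigned char type, code, pad[2];
};

/* Allocate one link holding a copy of src[0..len). */
static struct toy_mbuf *
toy_alloc(const void *src, size_t len)
{
	struct toy_mbuf *m;

	if (len > TOY_MLEN || (m = calloc(1, sizeof(*m))) == NULL)
		return (NULL);
	memcpy(m->data, src, len);
	m->len = len;
	return (m);
}

/* Free a whole chain. */
static void
toy_freem(struct toy_mbuf *m)
{
	while (m != NULL) {
		struct toy_mbuf *n = m->next;

		free(m);
		m = n;
	}
}

/*
 * Make the first len bytes contiguous in the head buffer by copying
 * bytes forward from later links. On failure the chain is freed and
 * NULL is returned.
 */
static struct toy_mbuf *
toy_pullup(struct toy_mbuf *m, size_t len)
{
	assert(m != NULL);
	while (m->len < len && len <= TOY_MLEN && m->next != NULL) {
		struct toy_mbuf *n = m->next;
		size_t take = len - m->len;

		if (take > n->len)
			take = n->len;
		memcpy(m->data + m->len, n->data, take);
		m->len += take;
		memmove(n->data, n->data + take, n->len - take);
		n->len -= take;
		if (n->len == 0) {	/* drained link: unlink and free */
			m->next = n->next;
			free(n);
		}
	}
	if (m->len < len) {		/* chain too short or len too big */
		toy_freem(m);
		return (NULL);
	}
	return (m);
}

/* Caller pattern: check the head length first, pull up only if short. */
static int
toy_read_type(struct toy_mbuf **mp)
{
	struct toy_hdr hdr;

	if ((*mp)->len < sizeof(hdr) &&
	    (*mp = toy_pullup(*mp, sizeof(hdr))) == NULL)
		return (-1);
	memcpy(&hdr, (*mp)->data, sizeof(hdr));
	return (hdr.type);
}
```

With a header split 2+2 across two links, toy_read_type() copies two bytes forward and then reads the header from the head buffer; with everything already in the first buffer it never enters toy_pullup().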
Best Regards, sephe -- Live Free or Die From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 05:21:37 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 186EA10657F7; Tue, 7 Apr 2009 05:21:37 +0000 (UTC) (envelope-from sepherosa@gmail.com) Received: from yx-out-2324.google.com (yx-out-2324.google.com [74.125.44.30]) by mx1.freebsd.org (Postfix) with ESMTP id B32F18FC19; Tue, 7 Apr 2009 05:21:36 +0000 (UTC) (envelope-from sepherosa@gmail.com) Received: by yx-out-2324.google.com with SMTP id 8so1577479yxm.13 for ; Mon, 06 Apr 2009 22:21:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=wNuwinYxBCvF8vYa0V0jL+P0uXVMNgUixSJCu7C1eL8=; b=LLwMiV196OGQr8kLV3CyczHi+eQp5RaNgYiLHN1opHX9MYQwoPIQJAWWYuO0wQ7zFH 1CGa52u1NQyvq+020Bcx5Azhif18v7Okcmec3u2/tPLM7cwFvYooSX3EqNw8ZRqF7j6H 1y2bcQwXJMBKKSgnybH0IddEQWS4jPwSPWADQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=LNZ8hJVwnO/8/OloW/VpJi06Sb1rKEm5qzDeO2RyF2dLh8/Lz/5T4kFfrmqwFmCXDR IwyU1lftic+NrK5PS+NmfjGWI5so6pIzcL/0MmY+Ao0Dy58apnZ73yq1qwNxjjgXeNWA Dej6pRJdtRIBj4Si3iVCX8il2TogsyxnNXq20= MIME-Version: 1.0 Received: by 10.151.108.3 with SMTP id k3mr9806426ybm.103.1239080268891; Mon, 06 Apr 2009 21:57:48 -0700 (PDT) In-Reply-To: References: Date: Tue, 7 Apr 2009 12:57:48 +0800 Message-ID: From: Sepherosa Ziehau To: Ivan Voras Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 05:21:37 -0000 On Sun, Apr 5, 2009 at 9:34 PM, Ivan Voras wrote: > Robert Watson wrote: >> >> On Sun, 5 Apr 2009, Ivan Voras wrote: >> >>> I thought this has something to deal with NIC moderation (em) but >>> can't really explain it. The bad performance part (not the jump) is >>> also visible over the loopback interface. >> >> FYI, if you want high performance, you really want a card supporting >> multiple input queues -- igb, cxgb, mxge, etc. if_em-only cards are PCI-E em(4) supports 2 RX queues. 82571/82572 support 2 TX queues. I have not tested multi-TX queues, but em(4) multi-RX queues work well in dfly (tested with 82573 and 82571) >> fundamentally less scalable in an SMP environment because they require >> input or output to occur only from one CPU at a time. > > Makes sense, but on the other hand - I see people are routing at least > 250,000 packets per seconds per direction with these cards, so they > probably aren't the bottleneck (pro/1000 pt on pci-e). 
It should be some variants of 82571EB Best Regards, sephe -- Live Free or Die From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 06:35:21 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EF868106570C for ; Tue, 7 Apr 2009 06:35:21 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outY.internet-mail-service.net (outy.internet-mail-service.net [216.240.47.248]) by mx1.freebsd.org (Postfix) with ESMTP id CC40F8FC13 for ; Tue, 7 Apr 2009 06:35:21 +0000 (UTC) (envelope-from julian@elischer.org) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id AAB9BB98A2; Mon, 6 Apr 2009 23:35:22 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 8F5482D6097; Mon, 6 Apr 2009 23:35:17 -0700 (PDT) Message-ID: <49DAF447.5020407@elischer.org> Date: Mon, 06 Apr 2009 23:35:51 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302) MIME-Version: 1.0 To: Sepherosa Ziehau References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, Robert Watson , Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 06:35:22 -0000 Sepherosa Ziehau wrote: > On Mon, Apr 6, 2009 at 7:59 PM, Robert Watson wrote: >> m_pullup() has to do with mbuf chain memory contiguity during packet >> processing. 
The usual usage is something along the following lines: >> >> struct whatever *w; >> >> m = m_pullup(m, sizeof(*w)); >> if (m == NULL) >> return; >> w = mtod(m, struct whatever *); While this is true, m_pullup ALWAYS does work, so in fact you want to always put it inside a test to see if it is really needed. From memory it is something like: if (m->m_len < headerlen && (m = m_pullup(m, headerlen)) == NULL) { log(LOG_WARNING, "nglmi: m_pullup failed for %d bytes\n", headerlen); return (0); } header = mtod(m, struct header *); >> >> m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data are >> contiguously stored so that the cast of w to m's data will point at a >> complete structure we can use to interpret packet data. In the common case >> in the receipt path, m_pullup() should be a no-op, since almost all drivers >> receive data in a single cluster. >> >> However, there are cases where it might not happen, such as loopback traffic >> where unusual encapsulation is used, leading to a call to M_PREPEND() that >> inserts a new mbuf on the front of the chain, which is later m_defrag()'d >> leading to a higher level header crossing a boundary or the like. >> >> This issue is almost entirely independent from things like the cache line >> miss issue, unless you hit the uncommon case of having to do work in >> m_pullup(), in which case life sucks. >> >> It would be useful to use DTrace to profile a number of the workfull m_foo() >> functions to make sure we're not hitting them in normal workloads, btw. > > I highly suspect m_pullup will take any real effect on RX path, given > how most of drivers allocate the mbuf for RX ring (all RX mbufs should > be mclusters).
> > Best Regards, > sephe > From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 07:00:52 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 55622106581F; Tue, 7 Apr 2009 07:00:52 +0000 (UTC) (envelope-from sepherosa@gmail.com) Received: from mail-gx0-f176.google.com (mail-gx0-f176.google.com [209.85.217.176]) by mx1.freebsd.org (Postfix) with ESMTP id CAF858FC13; Tue, 7 Apr 2009 07:00:51 +0000 (UTC) (envelope-from sepherosa@gmail.com) Received: by gxk24 with SMTP id 24so7446213gxk.19 for ; Tue, 07 Apr 2009 00:00:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=Uq6e0IgI82zDQM/iCh+suWU892M7g8LPmbJ+oLjqCMM=; b=OxCcCCH/AtenbNIfpCEFft2a16ssCCr7klcrMKrhFNc7ZCnCFeDe/VxZH4l1MNdJR/ lQwpNDB+1gKmZV7m68XCQB3oDabsax5i3YEqEqq5gq55wYzACgpJXP2cqhRfnaxZKPpj 5XNHWQAjuEGPZCgjuZaR3Ys8BPccp7vYboFVw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=QZ+v05HtZO0erYdrD6Y08tP0YdVxEFlAHnZThfUjxDPb9974qaVZ9DsFr52IIhyv51 uZ6EIvWeyRPpA1N24D7c+Ixrd96xthuX2OhoqPk3i5nYtBjYGwD87Sf0TN/HzrmxupMx PvA969iHYXM9Dut4Q3FeWGDJeY7HVqE40yZt8= MIME-Version: 1.0 Received: by 10.150.136.12 with SMTP id j12mr8598338ybd.149.1239087651212; Tue, 07 Apr 2009 00:00:51 -0700 (PDT) In-Reply-To: <49DAF447.5020407@elischer.org> References: <49DAF447.5020407@elischer.org> Date: Tue, 7 Apr 2009 15:00:51 +0800 Message-ID: From: Sepherosa Ziehau To: Julian Elischer Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, Robert Watson , Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 07:00:52 -0000 On Tue, Apr 7, 2009 at 2:35 PM, Julian Elischer wrote: > Sepherosa Ziehau wrote: >> >> On Mon, Apr 6, 2009 at 7:59 PM, Robert Watson wrote: >>> >>> m_pullup() has to do with mbuf chain memory contiguity during packet >>> processing. The usual usage is something along the following lines: >>> >>> struct whatever *w; >>> >>> m = m_pullup(m, sizeof(*w)); >>> if (m == NULL) >>> return; >>> w = mtod(m, struct whatever *); > > while this is true, m_pullup ALWAYS does things so in fact you > want to always put it in a test to see if it is really needed.. This probably will not be much of a problem on the RX path: drivers always have to set m->m_len, so m->m_len is probably still in cache. > > from memory it is something like: > > if (m->m_len < headerlen && (m = m_pullup(m, headerlen)) == NULL) { > log(LOG_WARNING, > "nglmi: m_pullup failed for %d bytes\n", headerlen); > return (0); > } > header = mtod(m, struct header *); > > >>> >>> m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data are >>> contiguously stored so that the cast of w to m's data will point at a >>> complete structure we can use to interpret packet data. In the common >>> case >>> in the receipt path, m_pullup() should be a no-op, since almost all >>> drivers >>> receive data in a single cluster. >>> >>> However, there are cases where it might not happen, such as loopback >>> traffic >>> where unusual encapsulation is used, leading to a call to M_PREPEND() >>> that >>> inserts a new mbuf on the front of the chain, which is later m_defrag()'d >>> leading to a higher level header crossing a boundary or the like.
>>> >>> This issue is almost entirely independent from things like the cache line >>> miss issue, unless you hit the uncommon case of having to do work in >>> m_pullup(), in which case life sucks. >>> >>> It would be useful to use DTrace to profile a number of the workfull >>> m_foo() >>> functions to make sure we're not hitting them in normal workloads, btw. >> >> I highly suspect m_pullup will take any real effect on RX path, given >> how most of drivers allocate the mbuf for RX ring (all RX mbufs should >> be mclusters). >> >> Best Regards, >> sephe >> > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" > -- Live Free or Die From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 09:24:45 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8ED5E1065672; Tue, 7 Apr 2009 09:24:45 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 64BA38FC0C; Tue, 7 Apr 2009 09:24:45 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id E6CBE46B9D; Tue, 7 Apr 2009 05:24:44 -0400 (EDT) Date: Tue, 7 Apr 2009 10:24:44 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Sepherosa Ziehau In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 09:24:45 -0000 On Tue, 7 Apr 2009, Sepherosa Ziehau wrote: >> This issue is almost entirely independent from things like the cache line >> miss issue, unless you hit the uncommon case of having to do work in >> m_pullup(), in which case life sucks. >> >> It would be useful to use DTrace to profile a number of the workfull >> m_foo() functions to make sure we're not hitting them in normal workloads, >> btw. > > I highly suspect m_pullup will take any real effect on RX path, given how > most of drivers allocate the mbuf for RX ring (all RX mbufs should be > mclusters). Agreed, but it's good to be sure one is right about these things. :-) Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 09:26:32 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 38F64106568C; Tue, 7 Apr 2009 09:26:32 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 0C6CD8FC08; Tue, 7 Apr 2009 09:26:32 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id ADFAE46B91; Tue, 7 Apr 2009 05:26:31 -0400 (EDT) Date: Tue, 7 Apr 2009 10:26:31 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Julian Elischer In-Reply-To: <49DAF447.5020407@elischer.org> Message-ID: References: <49DAF447.5020407@elischer.org> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: Sepherosa Ziehau , 
freebsd-net@freebsd.org, Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 09:26:32 -0000 On Mon, 6 Apr 2009, Julian Elischer wrote: > while this is true, m_pullup ALWAYS does things so in fact you want to > always put it in a test to see if it is really needed.. Then m_pullup() should be fixed? Keeping the expression of the pullup short makes the network code a lot more compact, which is a significant benefit. Robert N M Watson Computer Laboratory University of Cambridge > > from memory it is something like: > > if (m->m_len < headerlen && (m = m_pullup(m, headerlen)) == NULL) { > log(LOG_WARNING, > "nglmi: m_pullup failed for %d bytes\n", headerlen); > return (0); > } > header = mtod(m, struct header *); > > >>> >>> m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data are >>> contiguously stored so that the cast of w to m's data will point at a >>> complete structure we can use to interpret packet data. In the common >>> case >>> in the receipt path, m_pullup() should be a no-op, since almost all >>> drivers >>> receive data in a single cluster. >>> >>> However, there are cases where it might not happen, such as loopback >>> traffic >>> where unusual encapsulation is used, leading to a call to M_PREPEND() that >>> inserts a new mbuf on the front of the chain, which is later m_defrag()'d >>> leading to a higher level header crossing a boundary or the like. >>> >>> This issue is almost entirely independent from things like the cache line >>> miss issue, unless you hit the uncommon case of having to do work in >>> m_pullup(), in which case life sucks. 
>>> >>> It would be useful to use DTrace to profile a number of the workfull >>> m_foo() >>> functions to make sure we're not hitting them in normal workloads, btw. >> >> I highly suspect m_pullup will take any real effect on RX path, given >> how most of drivers allocate the mbuf for RX ring (all RX mbufs should >> be mclusters). >> >> Best Regards, >> sephe >> > > From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 12:11:50 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D89491065825 for ; Tue, 7 Apr 2009 12:11:50 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from web63906.mail.re1.yahoo.com (web63906.mail.re1.yahoo.com [69.147.97.121]) by mx1.freebsd.org (Postfix) with SMTP id 6F4388FC08 for ; Tue, 7 Apr 2009 12:11:50 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: (qmail 35966 invoked by uid 60001); 7 Apr 2009 12:11:50 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1239106309; bh=QQnklHzORTgtnBxvApx/35hxpzLUqRD4UPae3d/Tc6o=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=sA9hP0o38ntWa01awAfsC1A0EU4AhmzDXdHBzKJNkM+v0iBkoJJWjLRRXpVHUY70EAh3ejaubn65yKxKualyMGe4+7C6mIBf/N6vTEpF6ELUBxtA/RcDMNK247y1NK2y8hqpa8tSoIPL12bFNInF88M7Q4+51rb6uhAVDRNQrqk= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=rxa+BBk7r+B5uH+R5s9NJmh6yCCTuqY7SY4bL7UsDomTUfU+8bGNjKOHlOdVPDOPuOFyl+OCMyGAyTQhg6o0xAIOUWn/bQmcQCKQxr+G33rPGDcgxBuwnhJQ/OIsHozlipo+vhY24iuguupEdfp5WPU8xjGo71OVwAdYQqQZp2I=; Message-ID: <952316.35609.qm@web63906.mail.re1.yahoo.com> X-YMail-OSG: 
qPdgWekVM1kVWVo3zshllwJm0FXN5.Y29LgUBsOxAwqH91p0lPj_QYvwafSOggJrXuZs4mGJKkRfpIMTkov9eaboET89cPiWAUEsy.P_3NPlCI0v3EjL2Cwt_v9WYpZ7Yizu7N2d6zcx4qaQN_xGtxp8YmcXDrQmNXMnYs4pioh.lkk41Q3NTdmgIX0bdhKtknCFrLlmK_Qe6XHyVu4AF84tmxCRtGKQjASKvdW2OqVoVXpQ6XLIV1kHOLaK4SxrmpWDedZsvwC0dSX4eFoRzOkpKqgTffiZH5R82G2tbg3v.jWNNNnc7jcRjm1ZU5JTRyxPeG3EfkANJX9yRyhi1RTF Received: from [98.242.222.229] by web63906.mail.re1.yahoo.com via HTTP; Tue, 07 Apr 2009 05:11:49 PDT X-Mailer: YahooMailWebService/0.7.289.1 Date: Tue, 7 Apr 2009 05:11:49 -0700 (PDT) From: Barney Cordoba To: Ivan Voras , Robert Watson In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: barney_cordoba@yahoo.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 12:12:06 -0000 --- On Mon, 4/6/09, Robert Watson wrote: > From: Robert Watson > Subject: Re: Advice on a multithreaded netisr patch? > To: "Ivan Voras" > Cc: freebsd-net@freebsd.org > Date: Monday, April 6, 2009, 7:59 AM > On Mon, 6 Apr 2009, Ivan Voras wrote: > > >>> I'd like to understand more. If (in > netisr) I have a mbuf with headers, is this data already > transfered from the card or is it magically "not here > yet"? > >> > >> A lot depends on the details of the card and > driver. The driver will take cache misses on the descriptor > ring entry, if it's not already in cache, and the link > layer will take a cache miss on the front of the ethernet > frame in the cluster pointed to by the mbuf header as part > of its demux. What happens next depends on your dispatch > model and cache line size. Let's make a few simplifying > assumptions that are mostly true: > > > > So, a mbuf can reference data not yet copied from the > NIC hardware? 
I'm specifically trying to understand what > m_pullup() does. > > I think we're talking slightly at cross purposes. > There are two transfers of interest: > > (1) DMA of the packet data to main memory from the NIC > (2) Servicing of CPU cache misses to access data in main > memory > > By the time you receive an interrupt, the DMA is complete, > so once you believe a packet referenced by the descriptor > ring is done, you don't have to wait for DMA. However, > the packet data is in main memory rather than your CPU > cache, so you'll need to take a cache miss in order to > retrieve it. You don't want to prefetch before you know > the packet data is there, or you may prefetch stale data > from the previous packet sent or received from the cluster. > > m_pullup() has to do with mbuf chain memory contiguity > during packet processing. The usual usage is something > along the following lines: > > struct whatever *w; > > m = m_pullup(m, sizeof(*w)); > if (m == NULL) > return; > w = mtod(m, struct whatever *); > > m_pullup() here ensures that the first sizeof(*w) bytes of > mbuf data are contiguously stored so that the cast of w to > m's data will point at a complete structure we can use > to interpret packet data. In the common case in the receipt > path, m_pullup() should be a no-op, since almost all drivers > receive data in a single cluster. > > However, there are cases where it might not happen, such as > loopback traffic where unusual encapsulation is used, > leading to a call to M_PREPEND() that inserts a new mbuf on > the front of the chain, which is later m_defrag()'d > leading to a higher level header crossing a boundary or the > like. > > This issue is almost entirely independent from things like > the cache line miss issue, unless you hit the uncommon case > of having to do work in m_pullup(), in which case life > sucks.
> > It would be useful to use DTrace to profile a number of the > workfull m_foo() functions to make sure we're not > hitting them in normal workloads, btw. > > >>> As the card and the OS can already process > many packets per second for > >>> something fairly complex as routing > >>> (http://www.tancsa.com/blast.html), and TCP > chokes swi:net at 100% of > >>> a core, isn't this indication there's > certainly more space for > >>> improvement even with a single-queue > old-fashioned NICs? > >> > >> Maybe. It depends on the relative costs of local > processing vs > >> redistributing the work, which involves > schedulers, IPIs, additional > >> cache misses, lock contention, and so on. This > means there's a period > >> where it can't possibly be a win, and then at > some point it's a win as > >> long as the stack scales. This is essentially the > usual trade-off in > >> using threads and parallelism: does the benefit of > multiple parallel > >> execution units make up for the overheads of > synchronization and data > >> migration? > > > > Do you have any idea at all why I'm seeing the > weird difference of netstat packets per second (250,000) and > my application's TCP performance (< 1,000 pps)? > Summary: each packet is guaranteed to be a whole message > causing a transaction in the application - without the > changes I see pps almost identical to tps. Even if the > source of netstat statistics somehow manages to count > packets multiple times (I don't see how that can happen), > no relation can describe differences this huge. It almost > looks like something in the upper layers is discarding > packets (also not likely: TCP timeouts would occur and the > application wouldn't be able to push 250,000 pps) - but > what? Where to look? > > Is this for the loopback workload? If so, remember that > there may be some other things going on: > > - Every packet is processed at least two times: once when > sent, and then again > when it's received.
> > - A TCP segment will need to be ACK'd, so if you're > sending data in chunks in > one direction, the ACKs will not be piggy-backed on > existing data transfers, > and instead be sent independently, hitting the network > stack two more times. > > - Remember that TCP works to expand its window, and then > maintains the highest > performance it can by bumping up against the top of > available bandwidth > continuously. This involves detecting buffer limits by > generating packets > that can't be sent, adding to the packet count. With > loopback traffic, the > drop point occurs when you exceed the size of the > netisr's queue for IP, so > you might try bumping that from the default to something > much larger. > > And nothing beats using tcpdump -- have you tried > tcpdumping the loopback to see what is actually being sent? > If not, that's always educational -- perhaps something > weird is going on with delayed ACKs, etc. > > > You mean for the general code? I purposely don't > lock my statistics variables because I'm not that > interested in exact numbers (orders of magnitude are > relevant). As far as I understand, unlocked "x++" > should be trivially fast in this case? > > No. x++ is massively slow if executed in parallel across > many cores on a variable in a single cache line. See my > recent commit to kern_tc.c for an example: the updating of > trivial statistics for the kernel time calls reduced 30m > syscalls/second to 3m syscalls/second due to heavy > contention on the cache line holding the statistic. One of > my goals for 8.0 is to fix this problem for IP and TCP > layers, and ideally also ifnet but we'll see. We should > be maintaining those stats per-CPU and then aggregating to > report them to userspace. This is what we already do for a > number of system stats -- UMA and kernel malloc, syscall and > trap counters, etc.
> > >> - Use cpuset to pin ithreads, the netisr, and > whatever else, to specific > >> cores > >> so that they don't migrate, and if your > system uses HTT, experiment with > >> pinning the ithread and the netisr on different > threads on the same > >> core, or > >> at least, different cores on the same die. > > > > I'm using em hardware; I still think there's a > possibility I'm fighting the driver in some cases but > this has priority #2. > > Have you tried LOCK_PROFILING? It would quickly tell you > if driver locks were a source of significant contention. It > works quite well... When I enabled LOCK_PROFILING my side modules, such as if_ibg, stopped working. It seems that the ifnet structure or something changed with that option enabled. Is there a way to sync this without having to integrate everything into a specific kernel build? Barney From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 12:54:26 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 432621065740; Tue, 7 Apr 2009 12:54:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 1D1778FC24; Tue, 7 Apr 2009 12:54:26 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id C830846BC8; Tue, 7 Apr 2009 08:54:25 -0400 (EDT) Date: Tue, 7 Apr 2009 13:54:25 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Sepherosa Ziehau In-Reply-To: Message-ID: References: User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 12:54:27 -0000 On Tue, 7 Apr 2009, Sepherosa Ziehau wrote: > On Sun, Apr 5, 2009 at 9:34 PM, Ivan Voras wrote: >> Robert Watson wrote: >>> >>> On Sun, 5 Apr 2009, Ivan Voras wrote: >>> >>>> I thought this has something to deal with NIC moderation (em) but >>>> can't really explain it. The bad performance part (not the jump) is >>>> also visible over the loopback interface. >>> >>> FYI, if you want high performance, you really want a card supporting >>> multiple input queues -- igb, cxgb, mxge, etc. if_em-only cards are > > PCI-E em(4) supports 2 RX queues. 82571/82572 support 2 TX queues. I have > not tested multi-TX queues, but em(4) multi-RX queues work well in dfly > (tested with 82573 and 82571) You may not have seen, but in FreeBSD 7.x and higher, we have a new if_igb driver to support more recent Intel gigabit devices, which now probes a few of the devices historically associated with if_em. 
For example, on one of the boxes I use: igb0: port 0x3000-0x301f mem 0xd8220000-0xd823ffff,0xd8200000-0xd821ffff,0xd8280000-0xd8283fff irq 32 at device 0.0 on pci8 igb0: Using MSIX interrupts with 3 vectors igb0: [ITHREAD] igb0: [ITHREAD] igb0: [ITHREAD] igb0: Ethernet address: 00:30:48:d2:ca:c2 igb1: port 0x3020-0x303f mem 0xd8260000-0xd827ffff,0xd8240000-0xd825ffff,0xd8284000-0xd8287fff irq 46 at device 0.1 on pci8 igb1: Using MSIX interrupts with 3 vectors igb1: [ITHREAD] igb1: [ITHREAD] igb1: [ITHREAD] igb1: Ethernet address: 00:30:48:d2:ca:c3 igb0: RX LRO Initialized igb1: RX LRO Initialized Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 12:56:02 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C945A10656D4; Tue, 7 Apr 2009 12:56:02 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id A27BF8FC22; Tue, 7 Apr 2009 12:56:02 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 3ED3F46B9D; Tue, 7 Apr 2009 08:56:02 -0400 (EDT) Date: Tue, 7 Apr 2009 13:56:02 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Barney Cordoba In-Reply-To: <952316.35609.qm@web63906.mail.re1.yahoo.com> Message-ID: References: <952316.35609.qm@web63906.mail.re1.yahoo.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org, Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 12:56:04 -0000 On Tue, 7 Apr 2009, Barney Cordoba wrote: >> Have you tried LOCK_PROFILING? It would quickly tell you if driver locks >> were a source of significant contention. It works quite well... > > When I enabled LOCK_PROFILING my side modules, such as if_ibg, stopped > working. It seems that the ifnet structure or something changed with that > option enabled. Is there a way to sync this without having to integrate > everything into a specific kernel build? LOCK_PROFILING changes the size of lock-related data structures, so requires both kernel and full set of modules to be rebuilt with the option. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 13:57:48 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3BC811065675; Tue, 7 Apr 2009 13:57:48 +0000 (UTC) (envelope-from sepherosa@gmail.com) Received: from yx-out-2324.google.com (yx-out-2324.google.com [74.125.44.28]) by mx1.freebsd.org (Postfix) with ESMTP id C46988FC0A; Tue, 7 Apr 2009 13:57:47 +0000 (UTC) (envelope-from sepherosa@gmail.com) Received: by yx-out-2324.google.com with SMTP id 8so1658717yxm.13 for ; Tue, 07 Apr 2009 06:57:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=NRRxqRHX4tZnM+d5sybh74uSqZk7VEBGdro5w3v3oUM=; b=Y3xAq0Fz4k4FQxZO/DWcWfVJkSG+lKTCfSifAGsMHuPKq11Qo5LFRu1d9O1k2Pg0Pw IuX2B2iX2PMm33/GC9gQ69XYbh+gQJqXs+zS3YOPJKtW5xG3fGYR7OaQxKRPyYCZIbAp DYl8C6ZEUTFr26u2A3DP6zPPWZFAdyLLbeA80= 
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=SWj8DIjSSxrbAt4j8gCxDx80o2TUMI3W0a0vRo57UW4Pwn4aSEdnaPAwWgNVwWmFQH +Mgkj+lksBwj3ZRxmzrDYzi6jptZSCiV+auJ1Clt/41JBuT646S5hAoUBhxTbCmAmZ9r 0TjN1IK+DYgnwAR8Kub8Ev096YhJFCA1uQmJk= MIME-Version: 1.0 Received: by 10.150.133.18 with SMTP id g18mr390437ybd.181.1239112667344; Tue, 07 Apr 2009 06:57:47 -0700 (PDT) In-Reply-To: References: Date: Tue, 7 Apr 2009 21:57:47 +0800 Message-ID: From: Sepherosa Ziehau To: Robert Watson Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 13:57:48 -0000 On Tue, Apr 7, 2009 at 8:54 PM, Robert Watson wrote: > > On Tue, 7 Apr 2009, Sepherosa Ziehau wrote: > >> On Sun, Apr 5, 2009 at 9:34 PM, Ivan Voras wrote: >>> >>> Robert Watson wrote: >>>> >>>> On Sun, 5 Apr 2009, Ivan Voras wrote: >>>> >>>>> I thought this has something to deal with NIC moderation (em) but >>>>> can't really explain it. The bad performance part (not the jump) is >>>>> also visible over the loopback interface. >>>> >>>> FYI, if you want high performance, you really want a card supporting >>>> multiple input queues -- igb, cxgb, mxge, etc. if_em-only cards are >> >> PCI-E em(4) supports 2 RX queues. 82571/82572 support 2 TX queues. 
I have >> not tested multi-TX queues, but em(4) multi-RX queues work well in dfly >> (tested with 82573 and 82571) > > You may not have seen, but in FreeBSD 7.x and higher, we have a new if_igb > driver to support more recent Intel gigabit devices, which now probes a few > of the devices historically associated with if_em. For example, on one of > the boxes I use: If I understand the code correctly, it only takes 82575 and 82576; I don't have the hardware, else I would have already added dfly support (with multi rx queues at least, it seems 82576 supports 16 RX queues :) 8257{1/2/3} are still taken by em(4) in FreeBSD. In dfly, I simply forked em(4) (named emx) to create a special version for pci-e devices, for which Intel published developers' manual. I added multi-rxqueue support to it (multi-txqueue support is planned) and cleaned up the TX/RX path. IMHO, 82571 is too widely used to be ignored. Best Regards, sephe -- Live Free or Die From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 14:32:10 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EF1BA10658CF for ; Tue, 7 Apr 2009 14:32:10 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: from mail-ew0-f171.google.com (mail-ew0-f171.google.com [209.85.219.171]) by mx1.freebsd.org (Postfix) with ESMTP id 77D6F8FC18 for ; Tue, 7 Apr 2009 14:32:10 +0000 (UTC) (envelope-from ivoras@gmail.com) Received: by ewy19 with SMTP id 19so2320023ewy.43 for ; Tue, 07 Apr 2009 07:32:09 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:from:date:x-google-sender-auth:message-id:subject:to:cc :content-type:content-transfer-encoding; bh=br+xAcGSEX1Epu8s1Rab+sL2rCJPJRd5KlaLH24qisI=; b=Tu/vsKjW19N70QXauqijB8U5G8W4UP1aY9GWqS3rdJQlz2/G65ZX/1j4sDqN8BiYT9 
+c99pvPe/g2H9LtN2+SaZb+HiNRbBaWN93PWxrva4ynS2KeKPEG0ydg4YzbSs2hWRRFm 4QVAPWSN4BmxcLdcMSdV+XNmP/8zAKeKdMgOg= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:from:date :x-google-sender-auth:message-id:subject:to:cc:content-type :content-transfer-encoding; b=omaOyWPf9llhkg2UsUgiWKVkdTW7w9cMX000JntNqKhmnv/up6kJc4/ha4PcC2KUKX wxcu12bvs+r+jJRGNC6l8agMcrIBIlhF9fxruh6LuNyaLr1sDhybpD9RZQbVnx7XDn1p enOYbPaR6oGm/aJ7cFFHhiuPyIDh9GoFgskSg= MIME-Version: 1.0 Sender: ivoras@gmail.com Received: by 10.210.66.13 with SMTP id o13mr4122619eba.46.1239112867191; Tue, 07 Apr 2009 07:01:07 -0700 (PDT) In-Reply-To: References: From: Ivan Voras Date: Tue, 7 Apr 2009 16:00:52 +0200 X-Google-Sender-Auth: a33dba50616821f1 Message-ID: <9bbcef730904070700x6f38e83dka1fdc06c48c14111@mail.gmail.com> To: Sepherosa Ziehau Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-net@freebsd.org, Robert Watson Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 14:32:12 -0000 2009/4/7 Sepherosa Ziehau : > =C2=A0IMHO, 82571 is too widely used to be > ignored. 
+1 :) From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 14:45:07 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6996010656BC for ; Tue, 7 Apr 2009 14:45:07 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from mail.cksoft.de (mail.cksoft.de [195.88.108.3]) by mx1.freebsd.org (Postfix) with ESMTP id 20DB48FC16 for ; Tue, 7 Apr 2009 14:45:06 +0000 (UTC) (envelope-from bz@FreeBSD.org) Received: from localhost (amavis.fra.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id BE18F41C757; Tue, 7 Apr 2009 16:45:05 +0200 (CEST) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([195.88.108.3]) by localhost (amavis.fra.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id wGUda5G0FqVq; Tue, 7 Apr 2009 16:45:05 +0200 (CEST) Received: by mail.cksoft.de (Postfix, from userid 66) id 69ABE41C730; Tue, 7 Apr 2009 16:45:05 +0200 (CEST) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.int.zabbadoz.net (Postfix) with ESMTP id 0FFBF4448E6; Tue, 7 Apr 2009 14:44:07 +0000 (UTC) Date: Tue, 7 Apr 2009 14:44:07 +0000 (UTC) From: "Bjoern A. 
Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: sthaug@nethelp.no In-Reply-To: <20090406.121959.74751582.sthaug@nethelp.no> Message-ID: <20090407144311.F15361@maildrop.int.zabbadoz.net> References: <20090405.231044.74688369.sthaug@nethelp.no> <20090405214757.E15361@maildrop.int.zabbadoz.net> <20090405215842.C15361@maildrop.int.zabbadoz.net> <20090406.121959.74751582.sthaug@nethelp.no> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: IPv6 window scaling factor always 1 on initial SYN X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 14:45:08 -0000 On Mon, 6 Apr 2009, sthaug@nethelp.no wrote: >> Can you try changing it to < sb_max) for IPv6 as well and see if >> things work (better) for you? > > I changed it, and that worked like a dream. Now I get basically the > same throughput with IPv4 and IPv6. There are of course still issues > like lots of IPv6 tunnels that add extra latency - but that's not the > fault of FreeBSD. > > Anyway, thanks for your work. Below is a context diff (against 7-STABLE > cvsupped last night). Do we need a PR to get this into FreeBSD? It's in HEAD now as of SVN r190800. -- Bjoern A. Zeeb The greatest risk is not taking one. 
From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 14:57:11 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 4A8AB10656CC for ; Tue, 7 Apr 2009 14:57:11 +0000 (UTC) (envelope-from sthaug@nethelp.no) Received: from bizet.nethelp.no (bizet.nethelp.no [195.1.209.33]) by mx1.freebsd.org (Postfix) with SMTP id 831A68FC13 for ; Tue, 7 Apr 2009 14:57:10 +0000 (UTC) (envelope-from sthaug@nethelp.no) Received: (qmail 25651 invoked from network); 7 Apr 2009 14:57:08 -0000 Received: from bizet.nethelp.no (HELO localhost) (195.1.209.33) by bizet.nethelp.no with SMTP; 7 Apr 2009 14:57:08 -0000 Date: Tue, 07 Apr 2009 16:57:08 +0200 (CEST) Message-Id: <20090407.165708.74744827.sthaug@nethelp.no> To: bz@FreeBSD.org From: sthaug@nethelp.no In-Reply-To: <20090407144311.F15361@maildrop.int.zabbadoz.net> References: <20090405215842.C15361@maildrop.int.zabbadoz.net> <20090406.121959.74751582.sthaug@nethelp.no> <20090407144311.F15361@maildrop.int.zabbadoz.net> X-Mailer: Mew version 3.3 on Emacs 21.3 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: IPv6 window scaling factor always 1 on initial SYN X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 14:57:13 -0000 > > I changed it, and that worked like a dream. Now I get basically the > > same throughput with IPv4 and IPv6. There are of course still issues > > like lots of IPv6 tunnels that add extra latency - but that's not the > > fault of FreeBSD. > > > > Anyway, thanks for your work. Below is a context diff (against 7-STABLE > > cvsupped last night). Do we need a PR to get this into FreeBSD? > > It's in HEAD now as of SVN r190800. 
Excellent news, thank you! And presumably we'll get a MFC after a suitable settling time? Steinar Haug, Nethelp consulting, sthaug@nethelp.no From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 16:47:41 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E97791065692 for ; Tue, 7 Apr 2009 16:47:41 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outH.internet-mail-service.net (outh.internet-mail-service.net [216.240.47.231]) by mx1.freebsd.org (Postfix) with ESMTP id C7B0A8FC22 for ; Tue, 7 Apr 2009 16:47:41 +0000 (UTC) (envelope-from julian@elischer.org) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id A5F10B98E3; Tue, 7 Apr 2009 09:47:41 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id 5EE282D60E1; Tue, 7 Apr 2009 09:47:37 -0700 (PDT) Message-ID: <49DB83CB.9070707@elischer.org> Date: Tue, 07 Apr 2009 09:48:11 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.21 (Macintosh/20090302) MIME-Version: 1.0 To: barney_cordoba@yahoo.com References: <952316.35609.qm@web63906.mail.re1.yahoo.com> In-Reply-To: <952316.35609.qm@web63906.mail.re1.yahoo.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org, Robert Watson , Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Apr 2009 16:47:42 -0000 Barney Cordoba wrote: > > > > --- On Mon, 4/6/09, Robert Watson wrote: > >> From: Robert Watson >> Subject: Re: Advice on a multithreaded netisr patch? >> To: "Ivan Voras" >> Cc: freebsd-net@freebsd.org >> Date: Monday, April 6, 2009, 7:59 AM >> On Mon, 6 Apr 2009, Ivan Voras wrote: >> >>>>> I'd like to understand more. If (in >> netisr) I have a mbuf with headers, is this data already >> transferred from the card or is it magically "not here >> yet"? >>>> A lot depends on the details of the card and >> driver. The driver will take cache misses on the descriptor >> ring entry, if it's not already in cache, and the link >> layer will take a cache miss on the front of the ethernet >> frame in the cluster pointed to by the mbuf header as part >> of its demux. What happens next depends on your dispatch >> model and cache line size. Let's make a few simplifying >> assumptions that are mostly true: >>> So, a mbuf can reference data not yet copied from the >> NIC hardware? I'm specifically trying to understand what >> m_pullup() does. >> >> I think we're talking slightly at cross purposes. >> There are two transfers of interest: >> >> (1) DMA of the packet data to main memory from the NIC >> (2) Servicing of CPU cache misses to access data in main >> memory >> >> By the time you receive an interrupt, the DMA is complete, >> so once you believe a packet referenced by the descriptor >> ring is done, you don't have to wait for DMA. However, >> the packet data is in main memory rather than your CPU >> cache, so you'll need to take a cache miss in order to >> retrieve it.
You don't want to prefetch before you know >> the packet data is there, or you may prefetch stale data >> from the previous packet sent or received from the cluster. >> >> m_pullup() has to do with mbuf chain memory contiguity >> during packet processing. The usual usage is something >> along the following lines: >> >> struct whatever *w; >> >> m = m_pullup(m, sizeof(*w)); >> if (m == NULL) >> return; >> w = mtod(m, struct whatever *); >> >> m_pullup() here ensures that the first sizeof(*w) bytes of >> mbuf data are contiguously stored so that the cast of w to >> m's data will point at a complete structure we can use >> to interpret packet data. In the common case in the receipt >> path, m_pullup() should be a no-op, since almost all drivers >> receive data in a single cluster. >> >> However, there are cases where it might not happen, such as >> loopback traffic where unusual encapsulation is used, >> leading to a call to M_PREPEND() that inserts a new mbuf on >> the front of the chain, which is later m_defrag()'d >> leading to a higher level header crossing a boundary or the >> like. >> >> This issue is almost entirely independent from things like >> the cache line miss issue, unless you hit the uncommon case >> of having to do work in m_pullup(), in which case life >> sucks. >> >> It would be useful to use DTrace to profile a number of the >> workfull m_foo() functions to make sure we're not >> hitting them in normal workloads, btw. >> >>>>> As the card and the OS can already process >> many packets per second for >>>>> something fairly complex as routing >>>>> (http://www.tancsa.com/blast.html), and TCP >> chokes swi:net at 100% of >>>>> a core, isn't this indication there's >> certainly more space for >>>>> improvement even with a single-queue >> old-fashioned NICs? >>>> Maybe. It depends on the relative costs of local >> processing vs >>>> redistributing the work, which involves >> schedulers, IPIs, additional >>>> cache misses, lock contention, and so on. 
This >> means there's a period >>>> where it can't possibly be a win, and then at >> some point it's a win as >>>> long as the stack scales. This is essentially the >> usual trade-off in >>>> using threads and parallelism: does the benefit of >> multiple parallel >>>> execution units make up for the overheads of >> synchronization and data >>>> migration? >>> Do you have any idea at all why I'm seeing the >> weird difference of netstat packets per second (250,000) and >> my application's TCP performance (< 1,000 pps)? >> Summary: each packet is guaranteed to be a whole message >> causing a transaction in the application - without the >> changes I see pps almost identical to tps. Even if the >> source of netstat statistics somehow manages to count >> packets multiple times (I don't see how that can happen), >> no relation can describe differences this huge. It almost >> looks like something in the upper layers is discarding >> packets (also not likely: TCP timeouts would occur and the >> application wouldn't be able to push 250,000 pps) - but >> what? Where to look? >> >> Is this for the loopback workload? If so, remember that >> there may be some other things going on: >> >> - Every packet is processed at least two times: once when >> sent, and then again >> when it's received. >> >> - A TCP segment will need to be ACK'd, so if you're >> sending data in chunks in >> one direction, the ACKs will not be piggy-backed on >> existing data transfers, >> and instead be sent independently, hitting the network >> stack two more times. >> >> - Remember that TCP works to expand its window, and then >> maintains the highest >> performance it can by bumping up against the top of >> available bandwidth >> continuously. This involves detecting buffer limits by >> generating packets >> that can't be sent, adding to the packet count.
With >> loopback traffic, the >> drop point occurs when you exceed the size of the >> netisr's queue for IP, so >> you might try bumping that from the default to something >> much larger. >> >> And nothing beats using tcpdump -- have you tried >> tcpdumping the loopback to see what is actually being sent? >> If not, that's always educational -- perhaps something >> weird is going on with delayed ACKs, etc. >> >>> You mean for the general code? I purposely don't >> lock my statistics variables because I'm not that >> interested in exact numbers (orders of magnitude are >> relevant). As far as I understand, unlocked "x++" >> should be trivially fast in this case? >> >> No. x++ is massively slow if executed in parallel across >> many cores on a variable in a single cache line. See my >> recent commit to kern_tc.c for an example: the updating of >> trivial statistics for the kernel time calls reduced 30m >> syscalls/second to 3m syscalls/second due to heavy >> contention on the cache line holding the statistic. One of >> my goals for 8.0 is to fix this problem for IP and TCP >> layers, and ideally also ifnet but we'll see. We should >> be maintaining those stats per-CPU and then aggregating to >> report them to userspace. This is what we already do for a >> number of system stats -- UMA and kernel malloc, syscall and >> trap counters, etc. >> >>>> - Use cpuset to pin ithreads, the netisr, and >> whatever else, to specific >>>> cores >>>> so that they don't migrate, and if your >> system uses HTT, experiment with >>>> pinning the ithread and the netisr on different >> threads on the same >>>> core, or >>>> at least, different cores on the same die. >>> I'm using em hardware; I still think there's a >> possibility I'm fighting the driver in some cases but >> this has priority #2. >> >> Have you tried LOCK_PROFILING? It would quickly tell you >> if driver locks were a source of significant contention. It >> works quite well... 
>
> When I enabled LOCK_PROFILING my side modules, such as if_igb,
> stopped working. It seems that the ifnet structure or something
> changed with that option enabled. Is there a way to sync this without
> having to integrate everything into a specific kernel build?
>

No, I don't think there is any other way. Last time I checked, the mutex
structure changed size, which meant that almost everything else that
included a mutex changed size. That may not be true now, but I haven't
checked.

> Barney

From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 20:12:04 2009
Date: Tue, 7 Apr 2009 16:12:03 -0400 (EDT)
From: Garrett Wollman
Message-Id: <200904072012.n37KC3lA050334@hergotha.csail.mit.edu>
To: rwatson@freebsd.org
Cc: net@freebsd.org
Subject: Re: Advice on a multithreaded netisr patch?

In article , Robert Watson writes:

>m_pullup() has to do with mbuf chain memory contiguity during packet
>processing.

Historically, m_pullup() also had one other extremely important function:
to make sure that the header data you were about to modify was not stored
in a (possibly shared) cluster. Thus, in the input path for a typical
driver which puts the whole packet into a cluster, the very first
m_pullup() would allocate a new plain mbuf, carefully align the data
pointer to allow for both prepending more headers and pulling more header
data out, and copy the requested data into the internal buffer of the
mbuf.
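The copy-out behaviour described above can be sketched in userspace C. This
is only a toy model under stated assumptions, not the kernel's m_pullup():
struct toy_mbuf, TOY_MLEN, TOY_LEADING and toy_pullup() are hypothetical
names invented for illustration, and "carefully align the data pointer" is
simplified to leaving a fixed amount of leading space in the new buffer.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_MLEN    128  /* internal buffer size (assumed value) */
#define TOY_LEADING 32   /* room kept free for prepending headers (assumed) */

/* Hypothetical, heavily simplified stand-in for struct mbuf. */
struct toy_mbuf {
    struct toy_mbuf *next;
    unsigned char   *data;        /* start of valid bytes */
    size_t           len;         /* number of valid bytes */
    int              is_cluster;  /* nonzero: bytes live in external,
                                     possibly shared storage */
    unsigned char    buf[TOY_MLEN];
};

/*
 * Ensure the first `len` bytes of the chain are contiguous and held in a
 * private internal buffer.  Returns the (possibly new) head of the chain,
 * or NULL on failure.  Unlike the real routine, this toy never frees the
 * chain and leaves emptied mbufs linked in place.
 */
struct toy_mbuf *
toy_pullup(struct toy_mbuf *m, size_t len)
{
    struct toy_mbuf *n, *src;
    size_t avail = 0, copied = 0;

    if (len > TOY_MLEN - TOY_LEADING)
        return NULL;

    /* Fast path: already contiguous and not in shared storage. */
    if (!m->is_cluster && m->len >= len)
        return m;

    /* Refuse before touching anything if the chain is too short. */
    for (src = m; src != NULL; src = src->next)
        avail += src->len;
    if (avail < len)
        return NULL;

    /* Allocate a new plain mbuf and leave leading space in it. */
    n = calloc(1, sizeof(*n));
    if (n == NULL)
        return NULL;
    n->data = n->buf + TOY_LEADING;

    /* Copy the requested bytes out of the old chain, trimming as we go. */
    for (src = m; copied < len; src = src->next) {
        size_t take = src->len < len - copied ? src->len : len - copied;

        memcpy(n->data + copied, src->data, take);
        src->data += take;
        src->len -= take;
        copied += take;
    }
    n->len = len;
    n->next = m;
    return n;
}
```

After such a call the header bytes can be modified through the returned
mbuf without touching the original cluster, which is the property the
paragraph above is concerned with.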
-GAWollman

From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 21:48:59 2009
Message-ID: <409843.2186.qm@web63904.mail.re1.yahoo.com>
Date: Tue, 7 Apr 2009 14:48:58 -0700 (PDT)
From: Barney Cordoba
To: Robert Watson, Sepherosa Ziehau
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

--- On Tue, 4/7/09, Sepherosa Ziehau wrote:

> From: Sepherosa Ziehau
> Subject: Re: Advice on a multithreaded netisr patch?
> To: "Robert Watson"
> Cc: freebsd-net@freebsd.org, "Ivan Voras"
> Date: Tuesday, April 7, 2009, 9:57 AM
> On Tue, Apr 7, 2009 at 8:54 PM, Robert Watson wrote:
> >
> > On Tue, 7 Apr 2009, Sepherosa Ziehau wrote:
> >
> >> On Sun, Apr 5, 2009 at 9:34 PM, Ivan Voras wrote:
> >>>
> >>> Robert Watson wrote:
> >>>>
> >>>> On Sun, 5 Apr 2009, Ivan Voras wrote:
> >>>>
> >>>>> I thought this has something to deal with NIC moderation (em) but
> >>>>> can't really explain it. The bad performance part (not the jump) is
> >>>>> also visible over the loopback interface.
> >>>>
> >>>> FYI, if you want high performance, you really want a card supporting
> >>>> multiple input queues -- igb, cxgb, mxge, etc. if_em-only cards are
> >>
> >> PCI-E em(4) supports 2 RX queues. 82571/82572 support 2 TX queues. I have
> >> not tested multi-TX queues, but em(4) multi-RX queues work well in dfly
> >> (tested with 82573 and 82571)
> >
> > You may not have seen, but in FreeBSD 7.x and higher, we have a new if_igb
> > driver to support more recent Intel gigabit devices, which now probes a few
> > of the devices historically associated with if_em.
> > For example, on one of the boxes I use:
>
> If I understand the code correctly, it only takes 82575 and 82576; I
> don't have the hardware, else I would have already added dfly support
> (with multi rx queues at least, it seems 82576 supports 16 RX queues :)

Regarding if_igb:

1) Multiple TX queues are not supported. There's some hokey code to
   test, but it doesn't properly separate flows to the queues.
2) 2 Rx queues don't work, so only 1 and 4 work
3) With 4 queues, it just sucks up CPU under heavy load on 4 cpus. It will
   blow 4 cpus at a lower load than em will with 1
4) You'll need to fix DMA setup, as it sets the alignment requirement
   to PAGE_SIZE. I haven't been able to convince Jack that it's wrong, not
   that I've tried very hard since it's easy to just fix myself.

Barney

From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 21:56:26 2009
Message-ID: <497906.25422.qm@web63906.mail.re1.yahoo.com>
Date: Tue, 7 Apr 2009 14:56:25 -0700 (PDT)
From: Barney Cordoba
To: Robert Watson
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

--- On Tue, 4/7/09, Robert Watson wrote:

> From: Robert Watson
> Subject: Re: Advice on a multithreaded netisr patch?
> To: "Barney Cordoba"
> Cc: freebsd-net@freebsd.org, "Ivan Voras"
> Date: Tuesday, April 7, 2009, 8:56 AM
> On Tue, 7 Apr 2009, Barney Cordoba wrote:
>
> >> Have you tried LOCK_PROFILING? It would quickly tell you if driver
> >> locks were a source of significant contention. It works quite well...
> >
> > When I enabled LOCK_PROFILING my side modules, such as if_igb, stopped
> > working. It seems that the ifnet structure or something changed with
> > that option enabled.
> > Is there a way to sync this without having to integrate everything
> > into a specific kernel build?
>
> LOCK_PROFILING changes the size of lock-related data structures, so
> requires both kernel and full set of modules to be rebuilt with the
> option.

It might be good to mention this in the man page. Most 3rd party drivers
build stand-alone, and even if you pull down the latest drivers from Intel
or Broadcom they're usually built out of the kernel build. It's pretty
frustrating to have random things failing, mbuf leaks, etc. without any
warning.

Barney

From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 22:00:21 2009
To: freebsd-net@freebsd.org
From: Ivan Voras
Date: Tue, 07 Apr 2009 23:59:34 +0200
References: <409843.2186.qm@web63904.mail.re1.yahoo.com>
In-Reply-To: <409843.2186.qm@web63904.mail.re1.yahoo.com>
Subject: Re: Advice on a multithreaded netisr patch?

Barney Cordoba wrote:

> 1) Multiple TX queues are not supported. There's some hokey code to
> test, but it doesn't properly separate flows to the queues.
> 2) 2 Rx queues don't work, so only 1 and 4 work
> 3) With 4 queues, it just sucks up CPU under heavy load on 4 cpus. It will
> blow 4 cpus at a lower load than em will with 1
> 4) You'll need to fix DMA setup, as it sets the alignment requirement
> to PAGE_SIZE. I haven't been able to convince Jack that it's wrong, not
> that I've tried very hard since it's easy to just fix myself.

Reading this thread it looks like the development of both Intel drivers
is a bit stalled, doesn't it? AFAIK the em driver is also
semi-officially abandoned, and both from my experience and others it
looks like new development and patches are being rejected. Time to shop
other hardware?

From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 22:24:17 2009
Message-ID: <900824.65358.qm@web63901.mail.re1.yahoo.com>
Date: Tue, 7 Apr 2009 15:24:16 -0700 (PDT)
From: Barney Cordoba
To: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

--- On Tue, 4/7/09, Ivan Voras wrote:

> From: Ivan Voras
> Subject: Re: Advice on a multithreaded netisr patch?
> To: freebsd-net@freebsd.org
> Date: Tuesday, April 7, 2009, 5:59 PM
> Barney Cordoba wrote:
>
> > 1) Multiple TX queues are not supported. There's some hokey code to
> > test, but it doesn't properly separate flows to the queues.
> > 2) 2 Rx queues don't work, so only 1 and 4 work
> > 3) With 4 queues, it just sucks up CPU under heavy load on 4 cpus. It will
> > blow 4 cpus at a lower load than em will with 1
> > 4) You'll need to fix DMA setup, as it sets the alignment requirement
> > to PAGE_SIZE. I haven't been able to convince Jack that it's wrong, not
> > that I've tried very hard since it's easy to just fix myself.
>
> Reading this thread it looks like the development of both Intel drivers
> is a bit stalled, doesn't it?
> AFAIK the em driver is also semi-officially abandoned, and both from my
> experience and others it looks like new development and patches are
> being rejected. Time to shop other hardware?

To be fair, the OS doesn't really support multiqueue yet, or has for only
a few hours, so let's not go crazy. It makes a lot more sense to have
someone on the "team" work with Jack on improving the performance and
working out the kinks. When I asked Jack about the poor performance of
if_igb, he indicated that Intel's position is that the drivers are "just
samples", which really doesn't give anyone much confidence that they want
to run their business on them.

You already have Jack doing all of the hard work, that is, supporting the
new-chip-per-week that Intel puts out, so it seems to me the best strategy
would be to try to convince Intel that it's in their best interest to have
drivers that work well so people don't think that their hardware stinks.
As an example, the Chelsio 10Gb bypass card is $3495 and an Intel card is
~$1000, so it's a big win for the community as a whole to have good Intel
drivers going forward.

My work is commercially proprietary so I can't share my code, but I can
certainly share ideas on things that I've tested and discovered.
Barney

From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 22:52:51 2009
Date: Tue, 7 Apr 2009 23:52:50 +0100 (BST)
From: Robert Watson
To: Barney Cordoba
In-Reply-To: <497906.25422.qm@web63906.mail.re1.yahoo.com>
References: <497906.25422.qm@web63906.mail.re1.yahoo.com>
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

On Tue, 7 Apr 2009, Barney Cordoba wrote:

>>> When I enabled LOCK_PROFILING my side modules, such as if_igb, stopped
>>> working. It seems that the ifnet structure or something changed with
>>> that option enabled. Is there a way to sync this without having to
>>> integrate everything into a specific kernel build?
>>
>> LOCK_PROFILING changes the size of lock-related data structures, so
>> requires both kernel and full set of modules to be rebuilt with the option.
>
> It might be good to mention this in the man page.
> Most 3rd party drivers build stand-alone, and even if you pull down the
> latest drivers from Intel or Broadcom they're usually built out of the
> kernel build. It's pretty frustrating to have random things failing, mbuf
> leaks, etc. without any warning.

>From the man page:

NOTES
     The LOCK_PROFILING option increases the size of struct lock_object, so a
     kernel built with that option will not work with modules built without
     it.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 23:00:32 2009
Message-ID: <532949.28323.qm@web63907.mail.re1.yahoo.com>
Date: Tue, 7 Apr 2009 16:00:31 -0700 (PDT)
From: Barney Cordoba
To: Robert Watson
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

--- On Tue, 4/7/09, Robert Watson wrote:

> From: Robert Watson
> Subject: Re: Advice on a multithreaded netisr patch?
> To: "Barney Cordoba"
> Cc: freebsd-net@freebsd.org, "Ivan Voras"
> Date: Tuesday, April 7, 2009, 6:52 PM
> On Tue, 7 Apr 2009, Barney Cordoba wrote:
>
> >>> When I enabled LOCK_PROFILING my side modules, such as if_igb, stopped
> >>> working. It seems that the ifnet structure or something changed with
> >>> that option enabled. Is there a way to sync this without having to
> >>> integrate everything into a specific kernel build?
> >>
> >> LOCK_PROFILING changes the size of lock-related data structures, so
> >> requires both kernel and full set of modules to be rebuilt with the
> >> option.
> >
> > It might be good to mention this in the man page. Most 3rd party drivers
> > build stand-alone, and even if you pull down the latest drivers from
> > Intel or Broadcom they're usually built out of the kernel build.
> > It's pretty frustrating to have random things failing, mbuf leaks, etc.
> > without any warning.
>
> From the man page:
>
> NOTES
>      The LOCK_PROFILING option increases the size of struct lock_object,
>      so a kernel built with that option will not work with modules built
>      without it.

Nice work. It's not in the 7.0 man page, unfortunately for me :(

BC

From owner-freebsd-net@FreeBSD.ORG Tue Apr 7 23:01:51 2009
Message-ID: <362116.58661.qm@web63908.mail.re1.yahoo.com>
Date: Tue, 7 Apr 2009 16:01:50 -0700 (PDT)
From: Barney Cordoba
To: Robert Watson
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

--- On Tue, 4/7/09, Robert Watson wrote:

> From: Robert Watson
> Subject: Re: Advice on a multithreaded netisr patch?
> To: "Barney Cordoba"
> Cc: freebsd-net@freebsd.org, "Ivan Voras"
> Date: Tuesday, April 7, 2009, 6:52 PM
> On Tue, 7 Apr 2009, Barney Cordoba wrote:
>
> >>> When I enabled LOCK_PROFILING my side modules, such as if_igb, stopped
> >>> working. It seems that the ifnet structure or something changed with
> >>> that option enabled. Is there a way to sync this without having to
> >>> integrate everything into a specific kernel build?
> >>
> >> LOCK_PROFILING changes the size of lock-related data structures, so
> >> requires both kernel and full set of modules to be rebuilt with the
> >> option.
> >
> > It might be good to mention this in the man page. Most 3rd party drivers
> > build stand-alone, and even if you pull down the latest drivers from
> > Intel or Broadcom they're usually built out of the kernel build.
It's pretty
> frustrating to have random things failing, mbuf leaks, etc
> without any warning.
>
> From the man page:
>
> NOTES
>     The LOCK_PROFILING option increases the size of struct lock_object, so a
>     kernel built with that option will not work with modules built without it

Nevermind. Obviously I just plain missed it.

BC

From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 06:25:25 2009
Date: Wed, 8 Apr 2009 11:55:58 +0530
From: Ashish SHUKLA
To: Hajimu UMEMOTO
Cc: freebsd-net@freebsd.org
Message-ID: <20090408062558.GA10933@chateau.d.lf>
References: <87y6ud5p62.fsf@chateau.d.lf>
Subject: Re: getaddrinfo() unable to resolve IPv6 addresses

Hajimu UMEMOTO wrote:
[...]
> >No, I believe it was already fixed. Please, re-cvsup and try it.

I re-cvsup'ed it and it worked, thanks for the reply.

-- 
Ashish SHUKLA

From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 11:48:24 2009
Message-ID: <477001.91824.qm@web63902.mail.re1.yahoo.com>
Date: Wed, 8 Apr 2009 04:48:23 -0700 (PDT)
From: Barney Cordoba
To: "H.Fazaeli"
In-Reply-To: <49DC3961.8090707@sepehrs.com>
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

--- On Wed, 4/8/09, H.Fazaeli wrote:

> From: H.Fazaeli
> Subject: Re: Advice on a multithreaded netisr patch?
> To: barney_cordoba@yahoo.com
> Cc: freebsd-net@freebsd.org, "Ivan Voras"
> Date: Wednesday, April 8, 2009, 1:42 AM
> Barney Cordoba wrote:
> > --- On Tue, 4/7/09, Ivan Voras wrote:
> >
> >> From: Ivan Voras
> >> Subject: Re: Advice on a multithreaded netisr patch?
> >> To: freebsd-net@freebsd.org
> >> Date: Tuesday, April 7, 2009, 5:59 PM
> >> Barney Cordoba wrote:
> >>
> >>> 1) Multiple TX queues are not supported. There's some hokey code to
> >>> test, but it doesn't properly separate flows to the queues.
> >>> 2) 2 Rx queues don't work, so only 1 and 4 work
> >>> 3) With 4 queues, it just sucks up CPU under heavy load on 4 cpus. It will
> >>> blow 4 cpus at a lower load than em will with 1
> >>> 4) You'll need to fix DMA setup, as it sets the alignment requirement
> >>> to PAGE_SIZE. I haven't been able to convince Jack that it's wrong, not
> >>> that I've tried very hard since it's easy to just fix myself.
> >>
> >> Reading this thread it looks like the development of both Intel drivers
> >> is a bit stalled, doesn't it? AFAIK the em driver is also
> >> semi-officially abandoned, and both from my experience and others it
> >> looks like new development and patches are being rejected. Time to shop
> >> other hardware?
> >
> > To be fair, the OS doesn't really support multiqueue yet, or has
> > for only a few hours, so let's not go crazy.
> >
> > It makes a lot more sense to have someone on the "team" work with
> > Jack on improving the performance and working out the kinks. When
> > I asked Jack about the poor performance of if_igb, he indicated that
> > Intel's position is that the drivers are "just samples", which really
> > doesn't give anyone much confidence that they want to run their business
> > on them. You already have Jack doing all of the hard work; that is
> > supporting the new-chip-per-week that intel puts out, so it seems to
> > me the best strategy would be to try to convince Intel that it's in
> > their best interest to have drivers that work well so people don't
> > think that their hardware stinks.
> >
> > As an example, the Chelsio 10gb bypass card is $3495. and an Intel
> > card is ~$1000, so it's a big win for the community as a whole to have
> > good intel drivers going forward.
> >
> > My work is commercially proprietary so I can't share my code, but
> > I can certainly share ideas on things that I've tested and discovered.
>
> can you provide more details on the improvements you achieved?
>
> > Barney
>
> --

As all developers know, programming is 90% learning and 10% code.

So far, I've implemented multiqueue for 7.x and gotten everything to
work for both igb and ixgbe. igb isn't all that interesting since em can
easily handle 1 Gb/s; so ixgbe is really the goal. The igb and ixgbe are
similar designs so the work is somewhat parallel.

As of now, I'm working on separating the theory from the real world and
getting a feel for which design techniques work best. I'm also *not*
designing for a system that uses the stack (a filtering firewall type
system), so the things that Robert talks about apply differently. A web
server, for example, will likely only have 1 controller and will have
many user threads; while a router or firewall will have 2 equally loaded
NICs with few if any user threads. It's quite likely that completely
different approaches are needed to optimize each.

I'm at the point of testing design approaches. So the jury is out as to
what can be achieved. What I can say is that multiqueue isn't a panacea
or even desirable if it's not designed correctly. Out of the box,
increasing the number of queues just to "spread interrupts" doesn't seem
to have any advantage; in fact it seems to make things worse in terms of
utilization. I'm not entirely sure why as of yet.

Barney

From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 13:05:10 2009
Message-ID: <871699.35154.qm@web63906.mail.re1.yahoo.com>
Date: Wed, 8 Apr 2009 06:05:08 -0700 (PDT)
From: Barney Cordoba
To: Ivan Voras, Robert Watson
Cc: freebsd-net@freebsd.org
Subject: Re: Advice on a multithreaded netisr patch?

--- On Mon, 4/6/09, Robert Watson wrote:

> From: Robert Watson
> Subject: Re: Advice on a multithreaded netisr patch?
> To: "Ivan Voras"
> Cc: freebsd-net@freebsd.org
> Date: Monday, April 6, 2009, 2:52 PM
> On Mon, 6 Apr 2009, Ivan Voras wrote:
>
> >> I think we're talking slightly at cross purposes. There are two
> >> transfers of interest:
> >>
> >> (1) DMA of the packet data to main memory from the NIC
> >> (2) Servicing of CPU cache misses to access data in main memory
> >>
> >> By the time you receive an interrupt, the DMA is complete, so once you
> >
> > OK, this was what was confusing me - for a moment I thought you meant
> > it's not so.
>
> It's a polite lie that we will choose to believe for the purposes of
> simplification. And probably true for all our drivers in practice right
> now.
>
> >> m = m_pullup(m, sizeof(*w));
> >> if (m == NULL)
> >>         return;
> >> w = mtod(m, struct whatever *);
> >>
> >> m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data
> >> are contiguously stored so that the cast of w to m's data will point
> >> at a
> >
> > So, m_pullup() can resize / realloc() the mbuf? (not that it matters
> > for this purpose)
>
> Yes -- if it can't meet the contiguity requirements using the current
> mbuf chain, it may reallocate and return a new head to the chain (hence
> m being reassigned). If that reallocation fails, it may return NULL.
> Once you've called m_pullup(), existing pointers into the chain's data
> will be invalid, so if you've already called mtod() on it, you need to
> call it again.
>
> >> - A TCP segment will need to be ACK'd, so if you're sending data in
> >>   chunks in one direction, the ACKs will not be piggy-backed on
> >>   existing data transfers, and instead be sent independently, hitting
> >>   the network stack two more times.
> >
> > No combination of these can make an accounting difference between
> > 1,000 and 250,000 pps. I must be hitting something very bad here.
>
> Yes, you definitely want to run tcpdump to see what's going on here.
>
> >> - Remember that TCP works to expand its window, and then maintains
> >>   the highest performance it can by bumping up against the top of
> >>   available bandwidth continuously. This involves detecting buffer
> >>   limits by generating packets that can't be sent, adding to the
> >>   packet count. With loopback traffic, the drop point occurs when you
> >>   exceed the size of the netisr's queue for IP, so you might try
> >>   bumping that from the default to something much larger.

Robert,

Is there any work being done on lighter weight locks for queues? It seems
ridiculous to avoid using queues because of lock contention when the locks
are only protecting a couple lines of code.
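
[Editorial illustration] The m_pullup() rule quoted above (reassign the head, check for NULL, and only then derive data pointers) can be sketched in userspace. This is not kernel code and not how m_pullup(9) is implemented: `struct seg`, `seg_new`, `chain_free`, `chain_pullup` and `demo` are hypothetical stand-ins invented for this example, assuming only the contract described in the quoted text (the call may return a new head, returns NULL on failure, and invalidates old data pointers).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for one mbuf in a chain. */
struct seg {
    struct seg *next;
    size_t len;
    unsigned char *data;
};

/* Allocate a segment holding a copy of len bytes; NULL on failure. */
static struct seg *seg_new(const void *src, size_t len, struct seg *next)
{
    struct seg *s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    s->data = malloc(len > 0 ? len : 1);
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->data, src, len);
    s->len = len;
    s->next = next;
    return s;
}

static void chain_free(struct seg *s)
{
    while (s != NULL) {
        struct seg *n = s->next;
        free(s->data);
        free(s);
        s = n;
    }
}

/*
 * Models the contract only: make the first `want` bytes contiguous.
 * May return a different head. On failure the chain is freed and NULL
 * is returned, so the caller must not touch the old head afterwards.
 */
static struct seg *chain_pullup(struct seg *head, size_t want)
{
    size_t total = 0, got = 0;
    struct seg *s, *nh;
    unsigned char *buf;

    if (head->len >= want)
        return head;                    /* already contiguous */
    for (s = head; s != NULL; s = s->next)
        total += s->len;
    buf = total >= want ? malloc(want) : NULL;
    if (buf == NULL) {
        chain_free(head);
        return NULL;
    }
    s = head;
    while (got < want) {
        size_t take = s->len < want - got ? s->len : want - got;
        memcpy(buf + got, s->data, take);
        got += take;
        if (take == s->len) {           /* segment fully consumed */
            struct seg *n = s->next;
            free(s->data);
            free(s);
            s = n;
        } else {                        /* keep the unread tail */
            memmove(s->data, s->data + take, s->len - take);
            s->len -= take;
        }
    }
    nh = malloc(sizeof(*nh));
    if (nh == NULL) {
        free(buf);
        chain_free(s);
        return NULL;
    }
    nh->data = buf;
    nh->len = want;
    nh->next = s;
    return nh;
}

/* Reads byte 1 of a 4-byte header that starts out split across segments. */
static int demo(void)
{
    const unsigned char bytes[] = { 7, 42, 0x12, 0x34, 0xaa, 0xbb };
    struct seg *tail, *m;
    const unsigned char *w;
    int value;

    tail = seg_new(bytes + 1, 5, NULL);
    if (tail == NULL)
        return -1;
    m = seg_new(bytes, 1, tail);
    if (m == NULL) {
        chain_free(tail);
        return -1;
    }
    m = chain_pullup(m, 4);             /* head may change: reassign */
    if (m == NULL)
        return -1;                      /* chain already freed */
    assert(m->len >= 4);
    w = m->data;                        /* derive pointer only after pullup */
    value = w[1];
    chain_free(m);
    return value;
}
```

The point is the ordering in demo(): the result of the pullup replaces the old head, the NULL case returns without touching the chain again, and the data pointer is taken only afterwards.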
Barney

From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 13:16:54 2009
Date: Wed, 8 Apr 2009 14:16:53 +0100 (BST)
From: Robert Watson
To: Barney Cordoba
In-Reply-To: <871699.35154.qm@web63906.mail.re1.yahoo.com>
References: <871699.35154.qm@web63906.mail.re1.yahoo.com>
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

On Wed, 8 Apr 2009, Barney Cordoba wrote:

> Is there any work being done on lighter weight locks for queues? It seems
> ridiculous to avoid using queues because of lock contention when the locks
> are only protecting a couple lines of code.

My reading is that there are two closely related things going on: the
first is lock contention, and the second is cache line contention. We
have a lockless atomic buffer primitive in 8.x (don't think it's been
MFC'd yet) for use in drivers and other parts of the stack. However, that
addresses only lock contention, not line contention, which at a high PPS
will be an issue as well. Only by moving to independent data structures
(i.e., on independent cache lines) can we reduce line contention.

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 13:18:47 2009
Message-ID: <75700.80930.qm@web63905.mail.re1.yahoo.com>
Date: Wed, 8 Apr 2009 06:18:46 -0700 (PDT)
From: Barney Cordoba
To: Robert Watson
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

--- On Tue, 4/7/09, Robert Watson wrote:

> From: Robert Watson
> Subject: Re: Advice on a multithreaded netisr patch?
> To: "Barney Cordoba"
> Cc: freebsd-net@freebsd.org, "Ivan Voras"
> Date: Tuesday, April 7, 2009, 8:56 AM
> On Tue, 7 Apr 2009, Barney Cordoba wrote:
>
> >> Have you tried LOCK_PROFILING? It would quickly tell you if driver
> >> locks were a source of significant contention. It works quite well...
> >
> > When I enabled LOCK_PROFILING my side modules, such as if_igb, stopped
> > working. It seems that the ifnet structure or something changed with
> > that option enabled. Is there a way to sync this without having to
> > integrate everything into a specific kernel build?
>
> LOCK_PROFILING changes the size of lock-related data structures, so
> requires both kernel and full set of modules to be rebuilt with the
> option.

What are the units for lock profiling? For example, the "average wait" is
in what units?

Is there a way to reset the stats counters? If not, it might be nifty if
toggling prof.enable reset the stats to run some different kinds of tests
without rebooting.

Barney

From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 13:54:19 2009
Message-ID: <49DCAC1F.9000708@acm.poly.edu>
Date: Wed, 08 Apr 2009 09:52:31 -0400
From: Boris Kochergin
To: freebsd-net@freebsd.org
Subject: Multi-BSS problem with Atheros 5212

Ahoy. I'm having trouble with multiple hostap-mode wlan pseudo-devices.
The machine is an 8-CURRENT from yesterday:

# uname -a
FreeBSD test 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Tue Apr 7 16:54:56 UTC 2009 root@test:/usr/obj/usr/src/sys/GENERIC i386

# dmesg | grep ath
ath0: mem 0xf4100000-0xf410ffff irq 11 at device 13.0 on pci0
ath0: [ITHREAD]
ath0: AR2413 mac 7.9 RF2413 phy 4.5

# cat /etc/rc.conf
wlans_ath0="wlan0 wlan1 wlan2"
create_args_wlan0="wlanmode hostap bssid"
create_args_wlan1="wlanmode hostap bssid"
create_args_wlan2="wlanmode hostap bssid"
ifconfig_wlan0="ssid wlan0 wepmode off up"
ifconfig_wlan1="ssid wlan1 wepmode off up"
ifconfig_wlan2="ssid wlan2 wepmode off up"

# ifconfig
ath0: flags=8843 metric 0 mtu 2290
	ether 00:18:e7:33:5e:24
	media: IEEE 802.11 Wireless Ethernet autoselect mode 11g
	status: running
fxp0: flags=8843 metric 0 mtu 1500
	options=8
	ether 00:90:27:72:c4:f3
	inet 10.0.0.128 netmask 0xffffff00 broadcast 10.0.0.255
	media: Ethernet autoselect (100baseTX )
	status: active
lo0: flags=8049 metric 0 mtu 16384
	options=3
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet6 ::1 prefixlen 128
	inet 127.0.0.1 netmask 0xff000000
wlan0: flags=8843 metric 0 mtu 1500
	ether 00:18:e7:33:5e:24
	media: IEEE 802.11 Wireless Ethernet autoselect mode 11g
	status: running
	ssid wlan0 channel 11 (2462 Mhz 11g) bssid 00:18:e7:33:5e:24
	country US ecm authmode OPEN privacy OFF txpower 23 scanvalid 60
	protmode CTS wme burst dtimperiod 1 -dfs
wlan1: flags=8843 metric 0 mtu 1500
	ether 06:18:e7:33:5e:24
	media: IEEE 802.11 Wireless Ethernet autoselect mode 11g
	status: running
	ssid wlan1 channel 11 (2462 Mhz 11g) bssid 06:18:e7:33:5e:24
	country US ecm authmode OPEN privacy OFF txpower 23 scanvalid 60
	protmode CTS wme burst dtimperiod 1 -dfs
wlan2: flags=8843 metric 0 mtu 1500
	ether 0a:18:e7:33:5e:24
	media: IEEE 802.11 Wireless Ethernet autoselect mode 11g
	status: running
	ssid wlan2 channel 11 (2462 Mhz 11g) bssid 0a:18:e7:33:5e:24
	country US ecm authmode OPEN privacy OFF txpower 23 scanvalid 60
	protmode CTS wme burst dtimperiod 1 -dfs

The client is a 7.0 machine with another 5212 card:

# uname -a
FreeBSD peer 7.0-RELEASE-p10 FreeBSD 7.0-RELEASE-p10 #0: Mon Mar 23 09:26:18 EDT 2009 root@peer:/usr/obj/usr/src/sys/PEER i386

# dmesg | grep ath
ath_hal: 0.10.5.6 (AR5210, AR5211, AR5212, AR5416, RF5111, RF5112, RF2413, RF5413, RF2133, RF2425, RF2417)
ath0: mem 0xa8410000-0xa841ffff irq 11 at device 0.0 on cardbus0
ath0: [ITHREAD]
ath0: using obsoleted if_watchdog interface
ath0: Ethernet address: 00:14:d1:42:21:5a
ath0: mac 7.9 phy 4.5 radio 5.6

The three SSIDs configured on the CURRENT machine show up in a scan:

# ifconfig ath0 scan | grep wlan
wlan0 00:18:e7:33:5e:24 11 54M -66:-93 100 ES WME
wlan1 06:18:e7:33:5e:24 11 54M -65:-93 100 ES WME
wlan2 0a:18:e7:33:5e:24 11 54M -65:-93 100 ES WME

The client is only able to associate with wlan1, however. When scanning
channels while attempting to associate with any of the other ones, it
gets stuck on channel 11 for a while before moving on, which seems
relevant. Also interesting is the fact that if I do "ifconfig ath0 down"
on the CURRENT machine, followed by, for example, "ifconfig ath0 ssid
wlan0" (which did not associate before) on the client, followed by
"ifconfig ath0 up" on the CURRENT machine, the client will associate with
wlan0, but will not be able to associate with wlan1 or wlan2. Any ideas?
-Boris

From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 15:25:40 2009
Message-ID: <49DCC1EB.3040706@freebsd.org>
Date: Wed, 08 Apr 2009 08:25:31 -0700
From: Sam Leffler
Organization: FreeBSD Project
To: Boris Kochergin
References: <49DCAC1F.9000708@acm.poly.edu>
In-Reply-To: <49DCAC1F.9000708@acm.poly.edu>
Cc: freebsd-net@freebsd.org
Subject: Re: Multi-BSS problem with Atheros 5212

Boris Kochergin wrote:
> Ahoy. I'm having trouble with multiple hostap-mode wlan
> pseudo-devices. The machine is an 8-CURRENT from yesterday:
> [...]
> The client is only able to associate with wlan1, however. When
> scanning channels while attempting to associate with any of the other
> ones, it gets stuck on channel 11 for a while before moving on, which
> seems relevant. Also interesting is the fact that if I do "ifconfig
> ath0 down" on the CURRENT machine, followed by, for example, "ifconfig
> ath0 ssid wlan0" (which did not associate before) on the client,
> followed by "ifconfig ath0 up" on the CURRENT machine, the client will
> associate with wlan0, but will not be able to associate with wlan1 or
> wlan2. Any ideas?

wlandebug scan+auth+assoc on the client machine will show you why you
cannot associate. You can also enable the same info on the ap side to see
what it thinks is happening.
Sam

From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 16:16:20 2009
Message-ID: <564712.63955.qm@web63905.mail.re1.yahoo.com>
Date: Wed, 8 Apr 2009 09:16:19 -0700 (PDT)
From: Barney Cordoba
To: Robert Watson
In-Reply-To: <75700.80930.qm@web63905.mail.re1.yahoo.com>
Cc: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

--- On Wed, 4/8/09, Barney Cordoba wrote:

> From: Barney Cordoba
> Subject: Re: Advice on a multithreaded netisr patch?
> To: "Robert Watson"
> Cc: freebsd-net@freebsd.org, "Ivan Voras"
> Date: Wednesday, April 8, 2009, 9:18 AM
> [...]
> What are the units for lock profiling? For example, the "average
> wait" is in what units?
>
> Is there a way to reset the stats counters? If not, it might be nifty if
> toggling prof.enable reset the stats to run some different kinds of
> tests without rebooting.
>
> Barney

I know, I know. Read the man page...

From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 16:53:02 2009
Message-ID: <49DCD66B.6040504@freebsd.org>
Date: Wed, 08 Apr 2009 09:52:59 -0700
From: Sam Leffler
Organization: FreeBSD Project
To: Boris Kochergin
References: <49DCAC1F.9000708@acm.poly.edu> <49DCC1EB.3040706@freebsd.org>
In-Reply-To: <49DCC1EB.3040706@freebsd.org>
Cc: freebsd-net@freebsd.org
Subject: Re: Multi-BSS problem with Atheros 5212

Sam Leffler wrote:
> Boris Kochergin wrote:
>> Ahoy. I'm having trouble with multiple hostap-mode wlan
>> pseudo-devices.
The machine is an 8-CURRENT from yesterday: >> >> # uname -a >> FreeBSD test 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Tue Apr 7 16:54:56 >> UTC 2009 root@test:/usr/obj/usr/src/sys/GENERIC i386 >> >> # dmesg | grep ath >> ath0: mem 0xf4100000-0xf410ffff irq 11 at device 13.0 >> on pci0 >> ath0: [ITHREAD] >> ath0: AR2413 mac 7.9 RF2413 phy 4.5 >> >> # cat /etc/rc.conf >> wlans_ath0="wlan0 wlan1 wlan2" >> create_args_wlan0="wlanmode hostap bssid" >> create_args_wlan1="wlanmode hostap bssid" >> create_args_wlan2="wlanmode hostap bssid" >> ifconfig_wlan0="ssid wlan0 wepmode off up" >> ifconfig_wlan1="ssid wlan1 wepmode off up" >> ifconfig_wlan2="ssid wlan2 wepmode off up" >> >> # ifconfig >> ath0: flags=8843 metric 0 mtu >> 2290 >> ether 00:18:e7:33:5e:24 >> media: IEEE 802.11 Wireless Ethernet autoselect mode 11g >> status: running >> fxp0: flags=8843 metric 0 mtu >> 1500 >> options=8 >> ether 00:90:27:72:c4:f3 >> inet 10.0.0.128 netmask 0xffffff00 broadcast 10.0.0.255 >> media: Ethernet autoselect (100baseTX ) >> status: active >> lo0: flags=8049 metric 0 mtu 16384 >> options=3 >> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3 >> inet6 ::1 prefixlen 128 >> inet 127.0.0.1 netmask 0xff000000 >> wlan0: flags=8843 metric 0 >> mtu 1500 >> ether 00:18:e7:33:5e:24 >> media: IEEE 802.11 Wireless Ethernet autoselect mode 11g >> status: running >> ssid wlan0 channel 11 (2462 Mhz 11g) bssid 00:18:e7:33:5e:24 >> country US ecm authmode OPEN privacy OFF txpower 23 scanvalid 60 >> protmode CTS wme burst dtimperiod 1 -dfs >> wlan1: flags=8843 metric 0 >> mtu 1500 >> ether 06:18:e7:33:5e:24 >> media: IEEE 802.11 Wireless Ethernet autoselect mode 11g >> status: running >> ssid wlan1 channel 11 (2462 Mhz 11g) bssid 06:18:e7:33:5e:24 >> country US ecm authmode OPEN privacy OFF txpower 23 scanvalid 60 >> protmode CTS wme burst dtimperiod 1 -dfs >> wlan2: flags=8843 metric 0 >> mtu 1500 >> ether 0a:18:e7:33:5e:24 >> media: IEEE 802.11 Wireless Ethernet autoselect mode 11g >> status: running >> 
ssid wlan2 channel 11 (2462 Mhz 11g) bssid 0a:18:e7:33:5e:24 >> country US ecm authmode OPEN privacy OFF txpower 23 scanvalid 60 >> protmode CTS wme burst dtimperiod 1 -dfs >> >> The client is a 7.0 machine with another 5212 card: >> >> # uname -a >> FreeBSD peer 7.0-RELEASE-p10 FreeBSD 7.0-RELEASE-p10 #0: Mon Mar 23 >> 09:26:18 EDT 2009 root@peer:/usr/obj/usr/src/sys/PEER i386 >> >> # dmesg | grep ath >> ath_hal: 0.10.5.6 (AR5210, AR5211, AR5212, AR5416, RF5111, RF5112, >> RF2413, RF5413, RF2133, RF2425, RF2417) >> ath0: mem 0xa8410000-0xa841ffff irq 11 at device 0.0 >> on cardbus0 >> ath0: [ITHREAD] >> ath0: using obsoleted if_watchdog interface >> ath0: Ethernet address: 00:14:d1:42:21:5a >> ath0: mac 7.9 phy 4.5 radio 5.6 >> >> The three SSIDs configured on the CURRENT machine show up in a scan: >> >> # ifconfig ath0 scan | grep wlan >> wlan0 00:18:e7:33:5e:24 11 54M -66:-93 100 ES WME >> wlan1 06:18:e7:33:5e:24 11 54M -65:-93 100 ES WME >> wlan2 0a:18:e7:33:5e:24 11 54M -65:-93 100 ES WME >> >> The client is only able to associate with wlan1, however. When >> scanning channels while attempting to associate with any of the other >> ones, it gets stuck on channel 11 for a while before moving on, which >> seems relevant. Also interesting is the fact that if i do "ifconfig >> ath0 down" on the CURRENT machine, followed by, for example, >> "ifconfig ath0 ssid wlan0" (which did not associate before) on the >> client, followed by "ifconfig ath0 up" on the CURRENT machine, the >> client will associate with wlan0, but will not be able to associate >> with wlan1 or wlan2. Any ideas? > wlandebug scan+auth+assoc on the client machine will show you why you > cannot associate. You can also enable the same info on the ap side to > see what it thinks is happening. FWIW I just setup 3 vap's as you did above and hooked them into a bridge. I verified I could associate and pass traffic using a MBPro. No problems. I also destroyed the bridge and re-tested w/o issues. 
Regardless the debug msgs should identify what your problem is. Sam From owner-freebsd-net@FreeBSD.ORG Wed Apr 8 22:35:13 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0862D106566C for ; Wed, 8 Apr 2009 22:35:13 +0000 (UTC) (envelope-from fazaeli@sepehrs.com) Received: from sepehrs.com (sepehrs.com [213.217.59.98]) by mx1.freebsd.org (Postfix) with ESMTP id 267C08FC16 for ; Wed, 8 Apr 2009 22:35:11 +0000 (UTC) (envelope-from fazaeli@sepehrs.com) Received: from [192.168.1.180] ([192.168.3.1]) by mail (8.14.3/8.14.3) with ESMTP id n385DOM8037672; Wed, 8 Apr 2009 09:43:24 +0430 (IRDT) Message-ID: <49DC33DD.8000708@sepehrs.com> Date: Wed, 08 Apr 2009 08:49:25 +0330 From: "H.Fazaeli" User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) To: jfvogel@gmail.com References: <900824.65358.qm@web63901.mail.re1.yahoo.com> In-Reply-To: <900824.65358.qm@web63901.mail.re1.yahoo.com> Content-Transfer-Encoding: 7bit MIME-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1" X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: freebsd-net@freebsd.org Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Apr 2009 22:35:13 -0000 Dear Jack Can you please comment on below statements ?! Is the assertion true for all OSes (windows, linux, ...) or it is just freebsd? I am actually concerned in how much production ready is igb drivers in your opinion. As a matter of fact, We have been (and are) using em drivers for years on production systems in biggest ICPs/ISPs/organizations without problem and we have very good faith in it (I have not tested igb). 
Barney Cordoba wrote: --- On Tue, 4/7/09, Ivan Voras [1] wrote: From: Ivan Voras [2] Subject: Re: Advice on a multithreaded netisr patch? To: [3]freebsd-net@freebsd.org Date: Tuesday, April 7, 2009, 5:59 PM Barney Cordoba wrote: 1) Multiple TX queues are not supported. There's some hokey code to test, but it doesn't properly separate flows to the queues. 2) 2 Rx queues don't work, so only 1 and 4 work 3) With 4 queues, it just sucks up CPU under heavy load on 4 cpus. It will blow 4 cpus at a lower load than em will with 1 4) You'll need to fix DMA setup, as it sets the alignment requirement to PAGE_SIZE. I haven't been able to convince Jack that its wrong, not that I've tried very hard since its easy to just fix myself. Reading this thread it looks like the development of both Intel drivers is a bit stalled, doesn't it? AFAIK the em driver is also semi-officially abandoned, and both from my experience and others it looks like new development and patches are being rejected. Time to shop other hardware? To be fair, the OS doesn't really support multiqueue yet, or has for only a few hours, so lets not go crazy. It makes a lot more sense to have someone on the "team" work with Jack on improving the performance and working out the kinks. When I asked Jack about the poor performance of if_igb, he indicated that Intel's position is that the drivers are "just samples", which really doesn't give anyone much confidence that they want to run their business on them. You already have Jack doing all of the hard work; that is supporting the new-chip-per-week that intel puts out, so it seems to me the best strategy would be to try to convince Intel that its in their best interest to have drivers that work well so people don't think that their hardware stinks. As an example, the Chelsio 10gb bypass card is $3495. and an Intel card is ~$1000, so its a big win for the community as a whole to have good intel drivers going forward. 
My work is commercially proprietary so I can't share my code, but I can certainly share ideas on things that I've tested and discovered. Barney _______________________________________________ [4]freebsd-net@freebsd.org mailing list [5]http://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to [6]"freebsd-net-unsubscribe@freebsd.org" -- Best regards. Hooman Fazaeli [7] Sepehr S. T. Co. Ltd. Web: [8]http://www.sepehrs.com Tel: (9821)88975701-2 Fax: (9821)88983352 References 1. mailto:ivoras@freebsd.org 2. mailto:ivoras@freebsd.org 3. mailto:freebsd-net@freebsd.org 4. mailto:freebsd-net@freebsd.org 5. http://lists.freebsd.org/mailman/listinfo/freebsd-net 6. mailto:freebsd-net-unsubscribe@freebsd.org 7. mailto:hf@sepehrs.com 8. http://www.sepehrs.com/ From owner-freebsd-net@FreeBSD.ORG Thu Apr 9 07:43:23 2009 Return-Path: Delivered-To: net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 26EC21065693 for ; Thu, 9 Apr 2009 07:43:23 +0000 (UTC) (envelope-from bounces+305227.46043374.562566@icpbounce.com) Received: from smtp2.icpbounce.com (smtp2.icpbounce.com [216.27.93.124]) by mx1.freebsd.org (Postfix) with ESMTP id D6F618FC24 for ; Thu, 9 Apr 2009 07:43:22 +0000 (UTC) (envelope-from bounces+305227.46043374.562566@icpbounce.com) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp2.icpbounce.com (Postfix) with ESMTP id C40FCF847A for ; Thu, 9 Apr 2009 03:22:25 -0400 (EDT) Date: Thu, 9 Apr 2009 03:22:25 -0400 To: net@freebsd.org From: Global Access Travel Message-ID: X-Priority: 3 X-Mailer: PHPMailer [version 1.72] Errors-To: bounces+305227.46043374.562566@icpbounce.com X-List-Unsubscribe: X-Unsubscribe-Web: X-ICPINFO: X-Return-Path-Hint: bounces+305227.46043374.562566@icpbounce.com MIME-Version: 1.0 Content-Type: text/plain; charset = "utf-8" Content-Transfer-Encoding: 8bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Cc: Subject: 
Private Shore Excursions-Turkey X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Apr 2009 07:43:24 -0000 [http://www.turkeycalling.us] PRIVATE SHORE EXCURSIONS- TURKEY Your cruise clients will make the best of their time in Turkey on a private shore excursion! Istanbul Kusadasi & Ephesus [mailto:incoming@gaturkey.com?subject=Private Shore Excursions- Turkey] **************************************************************************** Yasal Uyarı; Bu e-posta, sadece adreste belirtilen kisi veya kurulusun kullanimini hedeflemekte olup,mesajda yer alan bilgiler kisiye ozel ve gizli olabilir, yasalar ya da anlasmalar geregi ücüncü kisiler ile paylasilmasi mümkün olmayabilir.Mesaji alan kisi, mesajin gönderilmek istendigi kisi veya kurulus degilse,bu mesaji yaymak,dagitmak veya kopyalamak yasaktir Mesaj tarafiniza yanlislikla ulasmissa lütfen mesaji geri gönderiniz ve sisteminizden siliniz. Global Turizm Hizmetleri Anonim Sirketi bu mesajin icerigi ile ilgili olarak hicbir hukuksal sorumlulugu kabul etmez. **************************************************************************** Disclaimer; This e-mail communication is intended only for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential and that may not be made public by law or agreement. If the recipient of this message is not the intended recipient or entity, you are hereby notified that any further dissemination, distribution or copying of this information is strictly prohibited. If you have received this message in error, please immediately notify the sender and delete it from your system. The Global Turizm Hizmetleri Anonim Sirketi does not accept legal responsibility for the contents of this message. 
This message was sent by: Global Access Incoming, Nuzhetiye cad, istanbul, besiktas 34357, Turkey Powered by iContact: http://freetrial.icontact.com To be removed click here: http://app.icontact.com/icp/mmail-mprofile.pl?r=46043374&l=82228&s=CMEC&m=562566&c=305227 Forward to a friend: http://app.icontact.com/icp/sub/forward?m=562566&s=46043374&c=CMEC&cid=305227 From owner-freebsd-net@FreeBSD.ORG Thu Apr 9 08:46:36 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id F212C1065670 for ; Thu, 9 Apr 2009 08:46:35 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from web63901.mail.re1.yahoo.com (web63901.mail.re1.yahoo.com [69.147.97.116]) by mx1.freebsd.org (Postfix) with SMTP id 44EE38FC22 for ; Thu, 9 Apr 2009 08:46:34 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: (qmail 50282 invoked by uid 60001); 9 Apr 2009 08:46:33 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1239266793; bh=c+mMKOb/sPD5NT2W3z9f21Q8DEL30pjUU1TYlcoW2vc=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=el1TQaNFskYy6fDzBbdutaxTjufCADCStmNQkaMnRysxfSzgA+yDZ57gQaMv74Gus7fb77obyX/jgnjGZMg8gtcN9emUyHzsHjXCXnrhDhHCuC7eI4OeYyx0AzU26dq5+uCM1cQEe02DJQf7Q51afczNPa0GSjWr5o45ZkyWHTo= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Reply-To:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type; b=o68sFePnJq/rPh3ceG6jfBW3nls1D4bQzK4zzW68k4Rj2gIBvzIbUiBzJOqsu5RPsorFNlCQK5PySPQpyyfHPGA2Kf0eytogkdo1Q4S/xssUhEJfp1UxjesXug623hQGdvNOL35N9KbeeS7GMqZK8HwT7yZ9Tw830rGN8Sd/EgM=; Message-ID: <792562.49628.qm@web63901.mail.re1.yahoo.com> X-YMail-OSG: 
kM08mBsVM1mDDC2sKkY.BnuTQYSJtMaELpo5zvSVnQ6uvGgvpwsIukBWFI58anExShEW6lLo.muDZOFMlyRe7qcZPMD_QMm2.PaZTv9LlOaDMZZD8pWwz1h0P5SnYM5MSKSZdJR0S9_dpQZmV.6I3exGxitUa8dfO_VNh0SS9E33msolhD9eYhQa4hNje06CAgOPJRlNYX7tXXxGlr81Ub6GXk.bnnUliGsIqGzwWvYuUGab8FSXpoXXUcYelZyVgsl9E.jQD13m3wCbs1SCFDKjLuWluTiAPNjIFf9s6enhX42c6o7Kj7DdN07V Received: from [98.242.222.229] by web63901.mail.re1.yahoo.com via HTTP; Thu, 09 Apr 2009 01:46:33 PDT X-Mailer: YahooMailWebService/0.7.289.1 Date: Thu, 9 Apr 2009 01:46:33 -0700 (PDT) From: Barney Cordoba To: Robert Watson In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: freebsd-net@freebsd.org, Ivan Voras Subject: Re: Advice on a multithreaded netisr patch? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: barney_cordoba@yahoo.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Apr 2009 08:46:36 -0000 --- On Wed, 4/8/09, Robert Watson wrote: > From: Robert Watson > Subject: Re: Advice on a multithreaded netisr patch? > To: "Barney Cordoba" > Cc: "Ivan Voras" , freebsd-net@freebsd.org > Date: Wednesday, April 8, 2009, 9:16 AM > On Wed, 8 Apr 2009, Barney Cordoba wrote: > > > Is there any work being done on lighter weight locks > for queues? It seems ridiculous to avoid using queues > because of lock contention when the locks are only > protecting a couple lines of code. > > My reading is that there are two, closely related, things > going on: the first is lock contention, and the second is > cache line contention. We have a primitive in 8.x > (don't think it's been MFC'd yet) for a lockless > atomic buffer primitive for use in drivers and other parts > of the stack. However, that addresses only lock contention, > not line contention, which at a high PPS will be an issue as > well. 
Only by moving to independent data structures (i.e.,
> on independent cache lines) can we reduce line contention.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge

Are mutexes smart enough to yield immediately to higher-priority threads that are waiting? Such as

    mtx_unlock()
    {
            do_unlock_stuff();
            if (higher_pri_waiting)
                    sched_yield();
    }

Also, is there a way, from the structure or flags, to determine whether some other thread is waiting on the lock? Such as

    mtx_unlock(&mtx);
    if (mtx.someone_is_waiting)
            sched_yield();

or better yet

    if (higher_priority_is_waiting)
            sched_yield();

I don't quite have a handle on how the turnstile works, but it seems that a lot of time is spent waiting for very short-lived locks. If the tasks are on different CPUs, what is the granularity of the wait time for a lock that is cleared almost immediately after trying it? Also, is the waiting only extended when the threads are running on the same CPU?

Barney

From owner-freebsd-net@FreeBSD.ORG Thu Apr 9 09:58:56 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1CB5B106566C for ; Thu, 9 Apr 2009 09:58:56 +0000 (UTC) (envelope-from f.bonnet@esiee.fr) Received: from mx1.esiee.fr (mx1.esiee.fr [147.215.1.35]) by mx1.freebsd.org (Postfix) with ESMTP id D003F8FC08 for ; Thu, 9 Apr 2009 09:58:55 +0000 (UTC) (envelope-from f.bonnet@esiee.fr) Received: from mail.esiee.fr (mail.esiee.fr [147.215.1.3]) by mx1.esiee.fr (Postfix) with ESMTP id 94D6F136855 for ; Thu, 9 Apr 2009 11:41:06 +0200 (CEST) Received: from mail.esiee.fr (localhost [127.0.0.1]) by VAMS.dummy (Postfix) with SMTP id 162B83E648 for ; Thu, 9 Apr 2009 11:41:05 +0200 (CEST) Received: from secure.esiee.fr (secure.esiee.fr [147.215.1.19]) by mail.esiee.fr (Postfix) with ESMTP id DB1D84513F for ; Thu, 9 Apr 2009 11:40:59 +0200 (CEST) Received: from lisa.esiee.fr (lisa.esiee.fr [147.215.1.21]) (using
TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) (Authenticated sender: bonnetf) by secure.esiee.fr (Postfix) with ESMTPSA id D2A73E7B0B for ; Thu, 9 Apr 2009 11:40:59 +0200 (CEST) Message-ID: <49DDC2AB.1090100@esiee.fr> Date: Thu, 09 Apr 2009 11:40:59 +0200 From: Frank Bonnet User-Agent: Thunderbird 2.0.0.19 (X11/20090305) MIME-Version: 1.0 To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: IBM X3650 at 7.1 with broadcom chips and CISCO LACP with LAGG driver ? X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Apr 2009 09:58:56 -0000

Hello,

I plan to migrate our mailhub to 7.1, but before I do I need some information about networking :-)

The machine is an IBM X3650 that has two Broadcom gigabit Ethernet interfaces. I want to use the LAGG driver in LACP mode with a Cisco switch to connect the machine to my LAN in bonding mode.

Are there any known network problems in 7.1 with this driver/machine combination?

This machine permanently runs 500 to 600 IMAP(S) processes and handles heavy SMTP traffic.

Thanks for any info.
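A minimal sketch of the setup described in the message above (two Broadcom ports aggregated with lagg(4) in LACP mode), written as an /etc/rc.conf fragment. The interface names bce0 and bce1, the lagg0 unit, and the 192.0.2.10 address are assumptions for illustration only; the actual driver name (bce or bge) depends on the Broadcom chip in the machine, and the switch ports have to be configured for LACP on the Cisco side as well. This has not been tested on an X3650.

```shell
# /etc/rc.conf fragment: illustrative only, not a tested configuration.
# bce0/bce1 and the address below are hypothetical placeholders;
# substitute the interface names reported by ifconfig and the host's
# real address and netmask.
ifconfig_bce0="up"
ifconfig_bce1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport bce0 laggport bce1 192.0.2.10 netmask 255.255.255.0"
```

If lagg support is not compiled into the kernel, the if_lagg module may also need to be loaded at boot, for example with if_lagg_load="YES" in /boot/loader.conf.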
From owner-freebsd-net@FreeBSD.ORG Thu Apr 9 10:14:40 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D0327106564A for ; Thu, 9 Apr 2009 10:14:40 +0000 (UTC) (envelope-from pluknet@gmail.com) Received: from mail-fx0-f167.google.com (mail-fx0-f167.google.com [209.85.220.167]) by mx1.freebsd.org (Postfix) with ESMTP id 61E928FC14 for ; Thu, 9 Apr 2009 10:14:40 +0000 (UTC) (envelope-from pluknet@gmail.com) Received: by fxm11 with SMTP id 11so505822fxm.43 for ; Thu, 09 Apr 2009 03:14:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=Dc8fQFk5Qb3Ayu9TfJIsAmja9iovMstwDLqgo5lzYuQ=; b=U8dxgHryMOUnYY+1tY9DUzjsRXWDdjm8gFd60w5j9FLDr1Ysw2NWRrUtWYZTHI1MCn YLDkTbXgt+qzeGFnUaKw+lZYoXmc4nsw2alBG9yaD2F//gGMPGk4wnD5HQ4cZoV3vOLX c5NPRymi/j2D3FcnkvgVqd+x4ue66JuFEKDb4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=gS2IxyPWDYON0SIvmWQ5d27j0kgFp5wmSlZp6bqMmQ9QlhlSskMlVpmQliLlYRCb9p khxr+SiTJTzYTcUT07zVmmKKTS+eD10QgNX/LHuqg+igw31FNLESUn41eA7buqvWWKPw lR8snqzuy2BhyZPFIW6u3aw+GYXNX7GcWqPw4= MIME-Version: 1.0 Received: by 10.103.214.8 with SMTP id r8mr1138701muq.92.1239272079262; Thu, 09 Apr 2009 03:14:39 -0700 (PDT) In-Reply-To: <49DDC2AB.1090100@esiee.fr> References: <49DDC2AB.1090100@esiee.fr> Date: Thu, 9 Apr 2009 14:14:39 +0400 Message-ID: From: pluknet To: Frank Bonnet Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: IBM X3650 at 7.1 with broadcom chips and CISCO LACP with LAGG driver ? 
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Apr 2009 10:14:41 -0000 2009/4/9 Frank Bonnet : > Hello > > I plan to migrate our mailhub to 7.1 but before I do it > I need infos about network :-) > > The machine is an IBM X3650 that have two Broadcom > gigaethernet interfaces. [..] > Is there any known network problem at 7.1 with such driver/machine ? At work we have a such one running under 7.1-R with MySQL and Mail services, without high memory or network pressure though. The last uptime was 43 days. No any network problems were discovered for that time. -- wbr, pluknet From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 07:17:07 2009 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8BDB01065673 for ; Fri, 10 Apr 2009 07:17:07 +0000 (UTC) (envelope-from xernet@hotmail.it) Received: from bay0-omc3-s32.bay0.hotmail.com (bay0-omc3-s32.bay0.hotmail.com [65.54.246.232]) by mx1.freebsd.org (Postfix) with ESMTP id 76CF38FC1E for ; Fri, 10 Apr 2009 07:17:07 +0000 (UTC) (envelope-from xernet@hotmail.it) Received: from BAY126-DS6 ([65.55.131.33]) by bay0-omc3-s32.bay0.hotmail.com with Microsoft SMTPSVC(6.0.3790.3959); Fri, 10 Apr 2009 00:05:07 -0700 X-Originating-IP: [79.10.86.250] X-Originating-Email: [xernet@hotmail.it] Message-ID: From: "xer" To: Date: Fri, 10 Apr 2009 09:05:11 +0200 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal Importance: Normal X-Mailer: Microsoft Windows Live Mail 14.0.8064.206 X-MimeOLE: Produced By Microsoft MimeOLE V14.0.8064.206 X-OriginalArrivalTime: 10 Apr 2009 07:05:07.0383 (UTC) FILETIME=[AE224470:01C9B9AA] Cc: Subject: watchdog 
timeout X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: xer List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 07:17:07 -0000

Hello,

I sent this message to the stable mailing list, then found that your address is listed as the maintainer for some bugs. I'm asking whether this PR:

http://www.freebsd.org/cgi/query-pr.cgi?pr=129352

has any updates, since I haven't found anything new. It talks about PRERELEASE and I'm running 6.4-STABLE. Also, how is it possible that a kernel compiled in January still has this bug?

Thanks in advance for your response.

Regards

--------------------------------------------------
From: "xer"
Sent: Wednesday, April 08, 2009 10:41 AM
To:
Subject: watchdog timeout

> Hello
> I have some problems with 3Com NICs after an upgrade from 5.5-STABLE to
> 6.4-STABLE.
>
> This machine has two 3Com NICs (one is LAN, the other is WAN) and I see too
> many "watchdog timeout" messages on both cards.
> These up/down flaps on the cards interrupt clients that are
> downloading from the Apache web server, especially on large files.
>
> --------------------------------------------
> xer:/root# dmesg
> xl1: watchdog timeout
> xl1: link state changed to DOWN
> xl1: link state changed to UP
> xl1: watchdog timeout
> xl1: link state changed to DOWN
> xl1: link state changed to UP
> xl1: watchdog timeout
> xl1: link state changed to DOWN
> xl1: link state changed to UP
> ---------------------------------------------
>
> xer:/root# cat /var/run/dmesg.boot | grep xl
> xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec00-0xec7f mem 0xfceffc00-0xfceffc7f irq 23 at device 11.0 on pci2
> miibus0: on xl0
> xlphy0: <3c905C 10/100 internal PHY> on miibus0
> xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> xl0: Ethernet address: 00:01:02:e0:04:1b
> xl1: <3Com 3c905C-TX Fast Etherlink XL> port 0xe880-0xe8ff mem 0xfceff800-0xfceff87f irq 20 at device 12.0 on pci2
> miibus1: on xl1
> xlphy1: <3c905C 10/100 internal PHY> on miibus1
> xlphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> xl1: Ethernet address: 00:01:02:df:fe:ed
> ---------------------------------------------
> Another concern is my kernel config; maybe there is something wrong
> that I cannot see. I'll post it at the end of this message because it is too long.
>
> As you can see, the cards are the 3c905C-TX model.
> Someone told me to change drivers, but I do not understand this advice.
> I got the same errors with the same cards on another mainboard: same
> problem, the watchdog timeouts appeared after an upgrade from 5.4-STABLE to 6.4-STABLE.
>
> I don't think that changing the NICs' PCI slots will solve the problem. I
> think that replacing the NICs might, but I cannot
> physically access either FreeBSD box, because they are too far from me
> (about 3500 km).
>
> I'm asking for advice on how I can fix this problem.
> P.S. With both the old 5.4 and 5.5 kernels, the NICs were fine.
> > Regards > Xer > > ----------kernel config ----------- > xer:/root# cat /usr/src/sys/i386/conf/ASUS > # > # $FreeBSD: src/sys/i386/conf/GENERIC,v 1.429.2.18 2008/07/28 02:20:29 > yongari Exp $ > # > # custom kernel ASUS 01.15.2009 > > machine i386 > cpu I686_CPU > ident ASUS > > options SCHED_4BSD # 4BSD scheduler > options PREEMPTION # Enable kernel thread preemption > options INET # InterNETworking > options INET6 # IPv6 communications protocols > options FFS # Berkeley Fast Filesystem > options SOFTUPDATES # Enable FFS soft updates support > options UFS_ACL # Support for access control lists > options UFS_DIRHASH # Improve performance on big > directories > options MD_ROOT # MD is a potential root device > options NFSCLIENT # Network Filesystem Client > options NFSSERVER # Network Filesystem Server > options NFSLOCKD # Network Lock Manager > options NFS_ROOT # NFS usable as /, requires > NFSCLIENT > options MSDOSFS # MSDOS Filesystem > options CD9660 # ISO 9660 Filesystem > options PROCFS # Process filesystem (requires > PSEUDOFS) > options PSEUDOFS # Pseudo-filesystem framework > options GEOM_GPT # GUID Partition Tables. > options COMPAT_43 # Compatible with BSD 4.3 [KEEP > THIS!] > options COMPAT_FREEBSD4 # Compatible with FreeBSD4 > options COMPAT_FREEBSD5 # Compatible with FreeBSD5 > options KTRACE # ktrace(1) support > options SYSVSHM # SYSV-style shared memory > options SYSVMSG # SYSV-style message queues > options SYSVSEM # SYSV-style semaphores > options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time > extensions > options KBD_INSTALL_CDEV # install a CDEV entry in /dev > options ADAPTIVE_GIANT # Giant mutex is adaptive. > > device apic # I/O APIC > > # Bus support. 
> device eisa > device pci > > # Floppy drives > device fdc > > # ATA and ATAPI devices > device ata > device atadisk # ATA disk drives > device ataraid # ATA RAID drives > device atapicd # ATAPI CDROM drives > device atapifd # ATAPI floppy drives > device atapist # ATAPI tape drives > options ATA_STATIC_ID # Static device numbering > > # atkbdc0 controls both the keyboard and the PS/2 mouse > device atkbdc # AT keyboard controller > device atkbd # AT keyboard > device psm # PS/2 mouse > > device kbdmux # keyboard multiplexer > > device vga # VGA video card driver > > device splash # Splash screen and screen saver support > > # syscons is the default console driver, resembling an SCO console > device sc > > device agp # support several AGP chipsets > > # Add suspend/resume support for the i8254. > device pmtimer > > # Serial (COM) ports > device sio # 8250, 16[45]50 based serial ports > > # Parallel port > device ppc > device ppbus # Parallel port bus (required) > device lpt # Printer > device plip # TCP/IP over parallel > device ppi # Parallel port interface device > > # PCI Ethernet NICs. > device de # DEC/Intel DC21x4x (``Tulip'') > device em # Intel PRO/1000 adapter Gigabit Ethernet > Card > device ixgb # Intel PRO/10GbE Ethernet Card > device txp # 3Com 3cR990 (``Typhoon'') > device vx # 3Com 3c590, 3c595 (``Vortex'') > > # PCI Ethernet NICs that use the common MII bus controller code. > # NOTE: Be sure to keep the 'device miibus' line in order to use these > NICs! 
> device miibus # MII bus support > device bce # Broadcom BCM5706/BCM5708 Gigabit > Ethernet > device bfe # Broadcom BCM440x 10/100 Ethernet > device bge # Broadcom BCM570xx Gigabit Ethernet > device dc # DEC/Intel 21143 and various workalikes > device fxp # Intel EtherExpress PRO/100B (82557, > 82558) > device jme # JMicron JMC250 Gigabit/JMC260 Fast > Ethernet > device lge # Level 1 LXT1001 gigabit Ethernet > device msk # Marvell/SysKonnect Yukon II Gigabit > Ethernet > device nge # NatSemi DP83820 gigabit Ethernet > device nve # nVidia nForce MCP on-board Ethernet > Networking > device pcn # AMD Am79C97x PCI 10/100(precedence over > 'lnc') > device re # RealTek 8139C+/8169/8169S/8110S > device rl # RealTek 8129/8139 > device sf # Adaptec AIC-6915 (``Starfire'') > device sis # Silicon Integrated Systems SiS 900/SiS > 7016 > device sk # SysKonnect SK-984x & SK-982x gigabit > Ethernet > device ste # Sundance ST201 (D-Link DFE-550TX) > device stge # Sundance/Tamarack TC9021 gigabit > Ethernet > device ti # Alteon Networks Tigon I/II gigabit > Ethernet > device tl # Texas Instruments ThunderLAN > device tx # SMC EtherPower II (83c170 ``EPIC'') > device vge # VIA VT612x gigabit Ethernet > device vr # VIA Rhine, Rhine II > device wb # Winbond W89C840F > device xl # 3Com 3c90x (``Boomerang'', ``Cyclone'') > > # ISA Ethernet NICs. pccard NICs included. > device cs # Crystal Semiconductor CS89x0 NIC > # 'device ed' requires 'device miibus' > device ed # NE[12]000, SMC Ultra, 3c503, DS8390 > cards > device ex # Intel EtherExpress Pro/10 and Pro/10+ > device ep # Etherlink III based cards > device fe # Fujitsu MB8696x based cards > device ie # EtherExpress 8/16, 3C507, StarLAN 10 > etc. > device lnc # NE2100, NE32-VL Lance Ethernet cards > device sn # SMC's 9000 series of Ethernet chips > device xe # Xircom pccard Ethernet > > # Pseudo devices. 
> device loop # Network loopback > device random # Entropy device > device ether # Ethernet support > device sl # Kernel SLIP > device ppp # Kernel PPP > device tun # Packet tunnel. > device pty # Pseudo-ttys (telnet etc) > device md # Memory "disks" > device gif # IPv6 and IPv4 tunneling > device faith # IPv6-to-IPv4 relaying (translation) > > # The `bpf' device enables the Berkeley Packet Filter. > # Be aware of the administrative consequences of enabling this! > # Note that 'bpf' is required for DHCP. > device bpf # Berkeley packet filter > > # Firewall > options IPFIREWALL # enable ipfirewall > (required for dummynet) > options IPFIREWALL_VERBOSE # enable firewall output > logging to syslogd(8) > options IPFIREWALL_VERBOSE_LIMIT=0 # limit firewall verbosity > output > options IPDIVERT # divert sockets > options DUMMYNET # enable dummynet > operation > options HZ=1000 # set the timer > granularity > > From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 12:10:04 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A83E41065688 for ; Fri, 10 Apr 2009 12:10:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 7B2F88FC15 for ; Fri, 10 Apr 2009 12:10:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3ACA4iV092073 for ; Fri, 10 Apr 2009 12:10:04 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3ACA4Hp092072; Fri, 10 Apr 2009 12:10:04 GMT (envelope-from gnats) Date: Fri, 10 Apr 2009 12:10:04 GMT Message-Id: <200904101210.n3ACA4Hp092072@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: Mikolaj Golub Cc: Subject: Re: kern/131310: 
[netgraph] [panic] 7.1 panics with mpd netgraph interface changes X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Mikolaj Golub List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 12:10:06 -0000 The following reply was made to PR kern/131310; it has been noted by GNATS. From: Mikolaj Golub To: bug-followup@FreeBSD.org,Vitaly Dodonov Cc: Semenchuk Oleg Subject: Re: kern/131310: [netgraph] [panic] 7.1 panics with mpd netgraph interface changes Date: Fri, 10 Apr 2009 15:09:38 +0300 This pr is closely related to kern/130977. You can try the patch from it, which adds if_delgroup(ifp, IFG_ALL) to if_detach(). -- Mikolaj Golub From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 12:40:07 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E8D4F1065670 for ; Fri, 10 Apr 2009 12:40:07 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from mail.cksoft.de (mail.cksoft.de [195.88.108.3]) by mx1.freebsd.org (Postfix) with ESMTP id 9EE768FC2E for ; Fri, 10 Apr 2009 12:40:07 +0000 (UTC) (envelope-from bzeeb-lists@lists.zabbadoz.net) Received: from localhost (amavis.fra.cksoft.de [192.168.74.71]) by mail.cksoft.de (Postfix) with ESMTP id 91B2641C735; Fri, 10 Apr 2009 14:40:06 +0200 (CEST) X-Virus-Scanned: amavisd-new at cksoft.de Received: from mail.cksoft.de ([195.88.108.3]) by localhost (amavis.fra.cksoft.de [192.168.74.71]) (amavisd-new, port 10024) with ESMTP id wgcrkY+NlyvK; Fri, 10 Apr 2009 14:40:06 +0200 (CEST) Received: by mail.cksoft.de (Postfix, from userid 66) id 3A9F441C732; Fri, 10 Apr 2009 14:40:06 +0200 (CEST) Received: from maildrop.int.zabbadoz.net (maildrop.int.zabbadoz.net [10.111.66.10]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by 
mail.int.zabbadoz.net (Postfix) with ESMTP id 5B2E84448E6; Fri, 10 Apr 2009 12:36:47 +0000 (UTC) Date: Fri, 10 Apr 2009 12:36:47 +0000 (UTC) From: "Bjoern A. Zeeb" X-X-Sender: bz@maildrop.int.zabbadoz.net To: sthaug@nethelp.no In-Reply-To: <20090407.165708.74744827.sthaug@nethelp.no> Message-ID: <20090410123559.D15361@maildrop.int.zabbadoz.net> References: <20090405215842.C15361@maildrop.int.zabbadoz.net> <20090406.121959.74751582.sthaug@nethelp.no> <20090407144311.F15361@maildrop.int.zabbadoz.net> <20090407.165708.74744827.sthaug@nethelp.no> X-OpenPGP-Key: 0x14003F198FEFA3E77207EE8D2B58B8F83CCF1842 MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: IPv6 window scaling factor always 1 on initial SYN X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 12:40:08 -0000 On Tue, 7 Apr 2009, sthaug@nethelp.no wrote: >>> I changed it, and that worked like a dream. Now I get basically the >>> same throughput with IPv4 and IPv6. There are of course still issues >>> like lots of IPv6 tunnels that add extra latency - but that's not the >>> fault of FreeBSD. >>> >>> Anyway, thanks for your work. Below is a context diff (against 7-STABLE >>> cvsupped last night). Do we need a PR to get this into FreeBSD? >> >> It's in HEAD now as of SVN r190800. > > Excellent news, thank you! And presumably we'll get a MFC after a > suitable settling time? If 3 days were suitable;) It'll be part of 7.2-R as it is in stable/7 now. Thanks a lot for reporting and testing! /bz -- Bjoern A. Zeeb The greatest risk is not taking one. 
From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 14:50:05 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 1EE56106564A for ; Fri, 10 Apr 2009 14:50:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 0ECE98FC1A for ; Fri, 10 Apr 2009 14:50:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3AEo4b8008055 for ; Fri, 10 Apr 2009 14:50:04 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3AEo44I008054; Fri, 10 Apr 2009 14:50:04 GMT (envelope-from gnats) Date: Fri, 10 Apr 2009 14:50:04 GMT Message-Id: <200904101450.n3AEo44I008054@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/131310: commit references a PR X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 14:50:05 -0000 The following reply was made to PR kern/131310; it has been noted by GNATS. From: dfilter@FreeBSD.ORG (dfilter service) To: bug-followup@FreeBSD.org Cc: Subject: Re: kern/131310: commit references a PR Date: Fri, 10 Apr 2009 14:42:02 +0000 (UTC) Author: mlaier Date: Fri Apr 10 14:41:51 2009 New Revision: 190895 URL: http://svn.freebsd.org/changeset/base/190895 Log: Remove interfaces from IFG_ALL on detach. This cures a couple of pf panics when using the "self" keyword in tables or as ()-style host address and fixes "ifconfig -g all" output. 
PR: kern/130977, kern/131310 Submitted by: Mikolaj Golub MFC after: 3 days Modified: head/sys/net/if.c Modified: head/sys/net/if.c ============================================================================== --- head/sys/net/if.c Fri Apr 10 14:24:12 2009 (r190894) +++ head/sys/net/if.c Fri Apr 10 14:41:51 2009 (r190895) @@ -887,6 +887,7 @@ if_detach(struct ifnet *ifp) rt_ifannouncemsg(ifp, IFAN_DEPARTURE); EVENTHANDLER_INVOKE(ifnet_departure_event, ifp); devctl_notify("IFNET", ifp->if_xname, "DETACH", NULL); + if_delgroup(ifp, IFG_ALL); IF_AFDATA_LOCK(ifp); for (dp = domains; dp; dp = dp->dom_next) { _______________________________________________ svn-src-all@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-all To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org" From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 15:02:19 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 42F5F106567D for ; Fri, 10 Apr 2009 15:02:19 +0000 (UTC) (envelope-from kfl@xiplink.com) Received: from smtp191.iad.emailsrvr.com (smtp191.iad.emailsrvr.com [207.97.245.191]) by mx1.freebsd.org (Postfix) with ESMTP id 1E25D8FC36 for ; Fri, 10 Apr 2009 15:02:19 +0000 (UTC) (envelope-from kfl@xiplink.com) Received: from relay9.relay.iad.mlsrvr.com (localhost [127.0.0.1]) by relay9.relay.iad.mlsrvr.com (SMTP Server) with ESMTP id AC39F1E21A3 for ; Fri, 10 Apr 2009 11:02:18 -0400 (EDT) Received: by relay9.relay.iad.mlsrvr.com (Authenticated sender: kfodil-lemelin-AT-xiplink.com) with ESMTPSA id 9E4D11CCA41 for ; Fri, 10 Apr 2009 11:02:18 -0400 (EDT) Message-ID: <49DF5F75.6080607@xiplink.com> Date: Fri, 10 Apr 2009 11:02:13 -0400 From: Karim Fodil-Lemelin User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) MIME-Version: 1.0 To: freebsd-net@freebsd.org Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit 
Subject: m_tag, malloc vs uma X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 15:02:20 -0000 Hello, Is there any plans on getting the mbuf tags sub-system integrated with the universal memory allocator? Getting tags for mbufs is still calling malloc in uipc_mbuf.c ... What would be the benefits of using uma instead? Karim. From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 17:08:01 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E7F141065696 for ; Fri, 10 Apr 2009 17:08:01 +0000 (UTC) (envelope-from spawk@acm.poly.edu) Received: from acm.poly.edu (acm.poly.edu [128.238.9.200]) by mx1.freebsd.org (Postfix) with ESMTP id AF5468FC12 for ; Fri, 10 Apr 2009 17:08:01 +0000 (UTC) (envelope-from spawk@acm.poly.edu) Received: (qmail 69250 invoked from network); 10 Apr 2009 17:08:01 -0000 Received: from unknown (HELO ?10.0.0.135?) 
(spawk@128.238.64.31) by acm.poly.edu with AES256-SHA encrypted SMTP; 10 Apr 2009 17:08:01 -0000 Message-ID: <49DF7CE9.6060706@acm.poly.edu> Date: Fri, 10 Apr 2009 13:07:53 -0400 From: Boris Kochergin User-Agent: Thunderbird 2.0.0.19 (X11/20090108) MIME-Version: 1.0 To: Sam Leffler References: <49DCAC1F.9000708@acm.poly.edu> <49DCC1EB.3040706@freebsd.org> <49DCD66B.6040504@freebsd.org> In-Reply-To: <49DCD66B.6040504@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: Multi-BSS problem with Atheros 5212 X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 17:08:02 -0000 Sam Leffler wrote: > Sam Leffler wrote: >> Boris Kochergin wrote: >>> Ahoy. I'm having trouble with multiple hostap-mode wlan >>> pseudo-devices. The machine is an 8-CURRENT from yesterday: >>> >>> # uname -a >>> FreeBSD test 8.0-CURRENT FreeBSD 8.0-CURRENT #0: Tue Apr 7 16:54:56 >>> UTC 2009 root@test:/usr/obj/usr/src/sys/GENERIC i386 >>> >>> # dmesg | grep ath >>> ath0: mem 0xf4100000-0xf410ffff irq 11 at device 13.0 >>> on pci0 >>> ath0: [ITHREAD] >>> ath0: AR2413 mac 7.9 RF2413 phy 4.5 >>> >>> # cat /etc/rc.conf >>> wlans_ath0="wlan0 wlan1 wlan2" >>> create_args_wlan0="wlanmode hostap bssid" >>> create_args_wlan1="wlanmode hostap bssid" >>> create_args_wlan2="wlanmode hostap bssid" >>> ifconfig_wlan0="ssid wlan0 wepmode off up" >>> ifconfig_wlan1="ssid wlan1 wepmode off up" >>> ifconfig_wlan2="ssid wlan2 wepmode off up" >>> >>> # ifconfig >>> ath0: flags=8843 metric 0 >>> mtu 2290 >>> ether 00:18:e7:33:5e:24 >>> media: IEEE 802.11 Wireless Ethernet autoselect mode 11g >>> >>> status: running >>> fxp0: flags=8843 metric 0 >>> mtu 1500 >>> options=8 >>> ether 00:90:27:72:c4:f3 >>> inet 10.0.0.128 netmask 0xffffff00 broadcast 
10.0.0.255 >>> media: Ethernet autoselect (100baseTX ) >>> status: active >>> lo0: flags=8049 metric 0 mtu 16384 >>> options=3 >>> inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3 >>> inet6 ::1 prefixlen 128 >>> inet 127.0.0.1 netmask 0xff000000 >>> wlan0: flags=8843 metric 0 >>> mtu 1500 >>> ether 00:18:e7:33:5e:24 >>> media: IEEE 802.11 Wireless Ethernet autoselect mode 11g >>> >>> status: running >>> ssid wlan0 channel 11 (2462 Mhz 11g) bssid 00:18:e7:33:5e:24 >>> country US ecm authmode OPEN privacy OFF txpower 23 scanvalid 60 >>> protmode CTS wme burst dtimperiod 1 -dfs >>> wlan1: flags=8843 metric 0 >>> mtu 1500 >>> ether 06:18:e7:33:5e:24 >>> media: IEEE 802.11 Wireless Ethernet autoselect mode 11g >>> >>> status: running >>> ssid wlan1 channel 11 (2462 Mhz 11g) bssid 06:18:e7:33:5e:24 >>> country US ecm authmode OPEN privacy OFF txpower 23 scanvalid 60 >>> protmode CTS wme burst dtimperiod 1 -dfs >>> wlan2: flags=8843 metric 0 >>> mtu 1500 >>> ether 0a:18:e7:33:5e:24 >>> media: IEEE 802.11 Wireless Ethernet autoselect mode 11g >>> >>> status: running >>> ssid wlan2 channel 11 (2462 Mhz 11g) bssid 0a:18:e7:33:5e:24 >>> country US ecm authmode OPEN privacy OFF txpower 23 scanvalid 60 >>> protmode CTS wme burst dtimperiod 1 -dfs >>> >>> The client is a 7.0 machine with another 5212 card: >>> >>> # uname -a >>> FreeBSD peer 7.0-RELEASE-p10 FreeBSD 7.0-RELEASE-p10 #0: Mon Mar 23 >>> 09:26:18 EDT 2009 root@peer:/usr/obj/usr/src/sys/PEER i386 >>> >>> # dmesg | grep ath >>> ath_hal: 0.10.5.6 (AR5210, AR5211, AR5212, AR5416, RF5111, RF5112, >>> RF2413, RF5413, RF2133, RF2425, RF2417) >>> ath0: mem 0xa8410000-0xa841ffff irq 11 at device 0.0 >>> on cardbus0 >>> ath0: [ITHREAD] >>> ath0: using obsoleted if_watchdog interface >>> ath0: Ethernet address: 00:14:d1:42:21:5a >>> ath0: mac 7.9 phy 4.5 radio 5.6 >>> >>> The three SSIDs configured on the CURRENT machine show up in a scan: >>> >>> # ifconfig ath0 scan | grep wlan >>> wlan0 00:18:e7:33:5e:24 11 54M -66:-93 100 ES WME 
>>> wlan1 06:18:e7:33:5e:24 11 54M -65:-93 100 ES WME >>> wlan2 0a:18:e7:33:5e:24 11 54M -65:-93 100 ES WME >>> >>> The client is only able to associate with wlan1, however. When >>> scanning channels while attempting to associate with any of the >>> other ones, it gets stuck on channel 11 for a while before moving >>> on, which seems relevant. Also interesting is the fact that if i do >>> "ifconfig ath0 down" on the CURRENT machine, followed by, for >>> example, "ifconfig ath0 ssid wlan0" (which did not associate before) >>> on the client, followed by "ifconfig ath0 up" on the CURRENT >>> machine, the client will associate with wlan0, but will not be able >>> to associate with wlan1 or wlan2. Any ideas? >> wlandebug scan+auth+assoc on the client machine will show you why you >> cannot associate. You can also enable the same info on the ap side >> to see what it thinks is happening. > > FWIW I just setup 3 vap's as you did above and hooked them into a > bridge. I verified I could associate and pass traffic using a MBPro. > No problems. I also destroyed the bridge and re-tested w/o issues. > Regardless the debug msgs should identify what your problem is. > > Sam > > _______________________________________________ > freebsd-net@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-net > To unsubscribe, send any mail to "freebsd-net-unsubscribe@freebsd.org" I booted the hostap machine up and set wlandebug to scan+auth+assoc on wlan0, wlan1, and wlan2. I then inserted the PCMCIA card into the client machine, set wlandebug to scan+auth+assoc on it (ath0), and executed "ifconfig ath0 ssid wlan0 up". I let it scan around for a bit. The client-side debug messages are at http://acm.poly.edu/~spawk/wlan/wlan0.client, and the hostap machine did not emit any debug messages during the association attempts. I then ejected the card from the client and repeated the process for wlan1 (it associated). 
The client-side debug messages are at http://acm.poly.edu/~spawk/wlan/wlan1.client and the hostap-side debug messages are at http://acm.poly.edu/~spawk/wlan/wlan1.ap. I then ejected the card from the client and repeated the process for wlan2. The client-side debug messages are at http://acm.poly.edu/~spawk/wlan/wlan2.client, and the hostap machine did not emit any debug messages during the association attempts. In case it's relevant, the client card is a PCMCIA version of... ath0@pci0:5:0:0: class=0x020000 card=0x2051168c chip=0x0013168c rev=0x01 hdr=0x00 vendor = 'Atheros Communications Inc.' device = 'AR5212, AR5213 802.11a/b/g Wireless Adapter' class = network subclass = ethernet ...and the hostap card is a PCI version of the same thing: ath0@pci0:0:13:0: class=0x020000 card=0x2051168c chip=0x0013168c rev=0x01 hdr=0x00 vendor = 'Atheros Communications Inc.' device = 'AR5212, AR5213 802.11a/b/g Wireless Adapter' class = network subclass = ethernet -Boris From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 18:55:20 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 111E11065674 for ; Fri, 10 Apr 2009 18:55:20 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id E251C8FC12 for ; Fri, 10 Apr 2009 18:55:19 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id 9E4AF46B8A; Fri, 10 Apr 2009 14:55:19 -0400 (EDT) Date: Fri, 10 Apr 2009 19:55:19 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Karim Fodil-Lemelin In-Reply-To: <49DF5F75.6080607@xiplink.com> Message-ID: References: <49DF5F75.6080607@xiplink.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; 
format=flowed
Cc: freebsd-net@freebsd.org
Subject: Re: m_tag, malloc vs uma
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Fri, 10 Apr 2009 18:55:20 -0000

On Fri, 10 Apr 2009, Karim Fodil-Lemelin wrote:

> Is there any plans on getting the mbuf tags sub-system integrated with the
> universal memory allocator? Getting tags for mbufs is still calling malloc
> in uipc_mbuf.c ... What would be the benefits of using uma instead?

Hi Karim:

Right now there are no specific plans for changes along these lines, although we have talked about moving towards better support for deep objects in m_tags. Currently, MAC requires a "deep" copy, because labels may be complex objects, and this is special-cased in the m_tag code. One way to move in that direction would be to replace the explicit m_tag free pointer with a pointer to a vector of operations (copy, free, etc.). This would make it easier to support more flexible memory models there, rather than forcing the use of malloc(9).

That said, malloc(9) for "small" memory types is essentially a thin accounting wrapper around a set of fixed-size UMA zones:

ITEM     SIZE  LIMIT   USED   FREE   REQUESTS  FAILURES
16:        16,     0,  3703,   966,  55930783,        0
32:        32,     0,  1455,   692,  30720298,        0
64:        64,     0,  4794,  1224,  38352819,        0
128:      128,     0,  3169,   341,   5705218,        0
256:      256,     0,  1565,   535,  48338889,        0
512:      512,     0,   386,   494,   9962475,        0
1024:    1024,     0,    66,   354,   3418306,        0
2048:    2048,     0,   314,   514,     29945,        0
4096:    4096,     0,   250,   279,   4567645,        0

For larger memory sizes, malloc(9) instead becomes a thin wrapper around VM allocation of kernel address space and pages. So as long as you're using smaller objects, malloc(9) actually offers most of the benefits of slab allocation.
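As a rough user-space illustration of the bucket behaviour described above (a sketch under stated assumptions, not the kernel's code): the helper below, bucket_size(), is a hypothetical name, and it simply assumes power-of-two buckets from 16 to 4096 bytes, matching the zone list shown. The real malloc(9) lookup is implemented differently.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the smallest power-of-two bucket
 * (16..4096 bytes, as in the zone list above) that can hold a
 * request of the given size, or 0 if the request is too large
 * for the small-object buckets.
 */
static size_t
bucket_size(size_t size)
{
	size_t bucket;

	for (bucket = 16; bucket <= 4096; bucket <<= 1)
		if (size <= bucket)
			return (bucket);
	return (0);	/* would fall through to page-based allocation */
}
```

Under these assumptions a 20-byte request is served from the 32-byte bucket, so a dedicated zone for such an object would mostly save the bucket selection step rather than memory traffic.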
Because m_tag(9) is an interface used for a variety of base system and third party parts, changes to the KPI would need to be made with a major FreeBSD release -- for example with 8.0. Such a change is definitely not precluded at this point, but in a couple of months we'll hit feature freeze and it won't be possible to make those changes after that time. Robert N M Watson Computer Laboratory University of Cambridge From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 19:20:06 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3FFF3106564A for ; Fri, 10 Apr 2009 19:20:06 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 14CC58FC12 for ; Fri, 10 Apr 2009 19:20:06 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3AJK5Fx070897 for ; Fri, 10 Apr 2009 19:20:05 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3AJK5rg070896; Fri, 10 Apr 2009 19:20:05 GMT (envelope-from gnats) Date: Fri, 10 Apr 2009 19:20:05 GMT Message-Id: <200904101920.n3AJK5rg070896@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: dfilter@FreeBSD.ORG (dfilter service) Cc: Subject: Re: kern/131310: commit references a PR X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: dfilter service List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 19:20:06 -0000 The following reply was made to PR kern/131310; it has been noted by GNATS. 
From: dfilter@FreeBSD.ORG (dfilter service)
To: bug-followup@FreeBSD.org
Cc:
Subject: Re: kern/131310: commit references a PR
Date: Fri, 10 Apr 2009 19:16:29 +0000 (UTC)

Author: mlaier
Date: Fri Apr 10 19:16:14 2009
New Revision: 190903
URL: http://svn.freebsd.org/changeset/base/190903

Log:
  Follow up for r190895

  It's not only the "all" group that is affected, but all groups on the
  given interface.

  PR:		kern/130977, kern/131310
  MFC after:	3 days (%vnet)

Modified:
  head/sys/net/if.c

Modified: head/sys/net/if.c
==============================================================================
--- head/sys/net/if.c	Fri Apr 10 18:46:46 2009	(r190902)
+++ head/sys/net/if.c	Fri Apr 10 19:16:14 2009	(r190903)
@@ -141,6 +141,7 @@ static int if_delmulti_locked(struct ifn
 static void	do_link_state_change(void *, int);
 static int	if_getgroup(struct ifgroupreq *, struct ifnet *);
 static int	if_getgroupmembers(struct ifgroupreq *);
+static void	if_delgroups(struct ifnet *);
 
 #ifdef INET6
 /*
@@ -887,7 +888,7 @@ if_detach(struct ifnet *ifp)
 	rt_ifannouncemsg(ifp, IFAN_DEPARTURE);
 	EVENTHANDLER_INVOKE(ifnet_departure_event, ifp);
 	devctl_notify("IFNET", ifp->if_xname, "DETACH", NULL);
-	if_delgroup(ifp, IFG_ALL);
+	if_delgroups(ifp);
 
 	IF_AFDATA_LOCK(ifp);
 	for (dp = domains; dp; dp = dp->dom_next) {
@@ -1025,6 +1026,54 @@ if_delgroup(struct ifnet *ifp, const cha
 }
 
 /*
+ * Remove an interface from all groups
+ */
+static void
+if_delgroups(struct ifnet *ifp)
+{
+	INIT_VNET_NET(ifp->if_vnet);
+	struct ifg_list *ifgl;
+	struct ifg_member *ifgm;
+	char groupname[IFNAMSIZ];
+
+	IFNET_WLOCK();
+	while (!TAILQ_EMPTY(&ifp->if_groups)) {
+		ifgl = TAILQ_FIRST(&ifp->if_groups);
+
+		strlcpy(groupname, ifgl->ifgl_group->ifg_group, IFNAMSIZ);
+
+		IF_ADDR_LOCK(ifp);
+		TAILQ_REMOVE(&ifp->if_groups, ifgl, ifgl_next);
+		IF_ADDR_UNLOCK(ifp);
+
+		TAILQ_FOREACH(ifgm, &ifgl->ifgl_group->ifg_members, ifgm_next)
+			if (ifgm->ifgm_ifp == ifp)
+				break;
+
+		if (ifgm != NULL) {
+			TAILQ_REMOVE(&ifgl->ifgl_group->ifg_members, ifgm,
+			    ifgm_next);
+			free(ifgm, M_TEMP);
+		}
+
+		if (--ifgl->ifgl_group->ifg_refcnt == 0) {
+			TAILQ_REMOVE(&V_ifg_head, ifgl->ifgl_group, ifg_next);
+			EVENTHANDLER_INVOKE(group_detach_event,
+			    ifgl->ifgl_group);
+			free(ifgl->ifgl_group, M_TEMP);
+		}
+		IFNET_WUNLOCK();
+
+		free(ifgl, M_TEMP);
+
+		EVENTHANDLER_INVOKE(group_change_event, groupname);
+
+		IFNET_WLOCK();
+	}
+	IFNET_WUNLOCK();
+}
+
+/*
  * Stores all groups from an interface in memory pointed
  * to by data
  */
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"

From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 19:32:02 2009
Return-Path:
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B5A61106573E for ; Fri, 10 Apr 2009 19:32:02 +0000 (UTC) (envelope-from kfl@xiplink.com)
Received: from smtp191.iad.emailsrvr.com (smtp191.iad.emailsrvr.com [207.97.245.191]) by mx1.freebsd.org (Postfix) with ESMTP id 731BE8FC22 for ; Fri, 10 Apr 2009 19:32:02 +0000 (UTC) (envelope-from kfl@xiplink.com)
Received: from relay9.relay.iad.mlsrvr.com (localhost [127.0.0.1]) by relay9.relay.iad.mlsrvr.com (SMTP Server) with ESMTP id 12BDE1E4834; Fri, 10 Apr 2009 15:32:02 -0400 (EDT)
Received: by relay9.relay.iad.mlsrvr.com (Authenticated sender: kfodil-lemelin-AT-xiplink.com) with ESMTPSA id AF6D11E44EB; Fri, 10 Apr 2009 15:32:01 -0400 (EDT)
Message-ID: <49DF9EAD.1050609@xiplink.com>
Date: Fri, 10 Apr 2009 15:31:57 -0400
From: Karim Fodil-Lemelin
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: Robert Watson
References: <49DF5F75.6080607@xiplink.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-net@freebsd.org
Subject: Re: m_tag, malloc
vs uma X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 19:32:04 -0000 Robert Watson wrote: > On Fri, 10 Apr 2009, Karim Fodil-Lemelin wrote: > >> Is there any plans on getting the mbuf tags sub-system integrated >> with the universal memory allocator? Getting tags for mbufs is still >> calling malloc in uipc_mbuf.c ... What would be the benefits of using >> uma instead? > > Hi Karim: > > Right now there are no specific plans for changes along these lines, > although we have talked about moving towards better support for deep > objects in m_tags. Right now, MAC requires a "deep" copy, because > labels may be complex objects, and this is special-cased in the m_tag > code. One way to move in that direction would be to move from an > explicit m_tag free pointer to a pointer to a vector of copy, free, > etc, operations. This would make it easier to support more flexible > memory models there, rather than forcing the use of malloc(9). > > That said, malloc(9) for "small" memory types is essentially a thin > wrapper accounting around a set of fixed-size UMA zones: > > ITEM SIZE LIMIT USED FREE REQUESTS > FAILURES > 16: 16, 0, 3703, 966, > 55930783, 0 > 32: 32, 0, 1455, 692, > 30720298, 0 > 64: 64, 0, 4794, 1224, > 38352819, 0 > 128: 128, 0, 3169, 341, > 5705218, 0 > 256: 256, 0, 1565, 535, > 48338889, 0 > 512: 512, 0, 386, 494, > 9962475, 0 > 1024: 1024, 0, 66, 354, > 3418306, 0 > 2048: 2048, 0, 314, 514, > 29945, 0 > 4096: 4096, 0, 250, 279, > 4567645, 0 > > For larger memory sizes, malloc(9) becomes instead a thin wrapper > around VM allocation of kernel address space and pages. So as long as > you're using smaller objects, malloc(9) actually offers most of the > benefits of slab allocation. 
>
> Because m_tag(9) is an interface used for a variety of base system and
> third party parts, changes to the KPI would need to be made with a
> major FreeBSD release -- for example with 8.0. Such a change is
> definitely not precluded at this point, but in a couple of months
> we'll hit feature freeze and it won't be possible to make those
> changes after that time.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge

Hi Robert,

Thank you for the answer, clear and concise. I asked the question because I had modified pf_get_mtag() to use uma directly in the hope that it would be faster than calling malloc. But since pf_mtag is 20 bytes, malloc will end up using a fixed 32-byte zone, and I shouldn't expect much speed gain (except some savings from not having to select the 32-byte zone) from using something like:

extern uma_zone_t pf_mtag_zone;

static __inline struct pf_mtag *
pf_get_mtag(struct mbuf *m)
{
	struct m_tag *mtag;

	if ((mtag = m_tag_find(m, PACKET_TAG_PF, NULL)) == NULL) {
		mtag = uma_zalloc(pf_mtag_zone, M_NOWAIT);
		if (mtag == NULL)
			return (NULL);
		m_tag_setup(mtag, MTAG_ABI_COMPAT, PACKET_TAG_PF,
		    sizeof(struct pf_mtag));
		mtag->m_tag_free = pf_mtag_delete;
		bzero(mtag + 1, sizeof(struct pf_mtag));
		m_tag_prepend(m, mtag);
	}

	return ((struct pf_mtag *)(mtag + 1));
}

Where pf_mtag_delete is a wrapper around uma_zfree().

Regards,

Karim.
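The pf_get_mtag() variant above relies on two conventions: the payload lives directly behind the tag header (hence the mtag + 1), and each tag carries its own free routine so that whoever releases it returns the memory to the right allocator. A minimal user-space sketch of that layout follows. Every name in it (struct tag, struct payload, payload_tag_alloc(), payload_tag_free(), frees_seen) is a hypothetical stand-in, and plain malloc()/free() take the place of the UMA zone; it is not the kernel's m_tag code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct m_tag: header with its own destructor. */
struct tag {
	void	(*tag_free)(struct tag *);	/* per-tag free routine */
	int	 tag_len;			/* payload length in bytes */
};

/* Hypothetical stand-in for struct pf_mtag: data stored behind the header. */
struct payload {
	int	flags;
	int	rtableid;
};

static int frees_seen;	/* counts destructor calls, for demonstration only */

/* Plays the role of pf_mtag_delete: give the block back to its allocator. */
static void
payload_tag_free(struct tag *t)
{
	frees_seen++;
	free(t);
}

/*
 * Allocate header and payload as one block, record the free routine in
 * the header, zero the payload, and return a pointer to the payload,
 * which starts at (t + 1).
 */
static struct payload *
payload_tag_alloc(struct tag **tp)
{
	struct tag *t;

	t = malloc(sizeof(*t) + sizeof(struct payload));
	if (t == NULL)
		return (NULL);
	t->tag_free = payload_tag_free;
	t->tag_len = (int)sizeof(struct payload);
	memset(t + 1, 0, sizeof(struct payload));
	*tp = t;
	return ((struct payload *)(t + 1));
}
```

A caller releases the tag through the header's own pointer, t->tag_free(t), which is why the allocator behind a tag can be swapped without touching the code that frees it.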
From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 20:02:44 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 41D42106566B for ; Fri, 10 Apr 2009 20:02:44 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 1DAEA8FC1A for ; Fri, 10 Apr 2009 20:02:44 +0000 (UTC) (envelope-from rwatson@FreeBSD.org) Received: from fledge.watson.org (fledge.watson.org [65.122.17.41]) by cyrus.watson.org (Postfix) with ESMTPS id CFB4D46B94; Fri, 10 Apr 2009 16:02:43 -0400 (EDT) Date: Fri, 10 Apr 2009 21:02:43 +0100 (BST) From: Robert Watson X-X-Sender: robert@fledge.watson.org To: Karim Fodil-Lemelin In-Reply-To: <49DF9EAD.1050609@xiplink.com> Message-ID: References: <49DF5F75.6080607@xiplink.com> <49DF9EAD.1050609@xiplink.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Cc: freebsd-net@freebsd.org Subject: Re: m_tag, malloc vs uma X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 20:02:44 -0000 On Fri, 10 Apr 2009, Karim Fodil-Lemelin wrote: > Thank you for the answer, clear and concise. I asked the question because I > had modified pf_get_mtag() to use uma directly in the hope that it would be > faster then calling malloc. But since pf_mtag is 20bytes, malloc will end up > using a fixed 32bytes zone and I shouldn't expect much speed gain from using > something like (except some savings from not having to select the 32bytes > zone): There is another small overhead, the critical section used to protect the consistency of the per-CPU malloc type alloc and free counters, but it's also very small. 
I think it would be desirable to make a change to more flexible m_tag types for 8.0, but I'm not sure I have time to implement/test it. Is this something you might be interested in working on? I'm thinking of basically replacing the m_tag_free pointer with a pointer to a small vector of operations, possibly something along these lines:

struct m_tag_ops {
	void		 (*m_tag_free)(struct m_tag *);
	struct m_tag	*(*m_tag_copy)(struct m_tag *);
};

If the m_tag_ops pointer is NULL, we go with today's default (requiring minimal change of existing consumers). I'm not sure if there are any other function pointers we'd need at this point?

Robert N M Watson
Computer Laboratory
University of Cambridge

From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 20:30:05 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 3F460106566B for ; Fri, 10 Apr 2009 20:30:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 127568FC13 for ; Fri, 10 Apr 2009 20:30:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3AKU4i7067096 for ; Fri, 10 Apr 2009 20:30:04 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3AKU4Lj067093; Fri, 10 Apr 2009 20:30:04 GMT (envelope-from gnats) Date: Fri, 10 Apr 2009 20:30:04 GMT Message-Id: <200904102030.n3AKU4Lj067093@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: Glen Barber Cc: Subject: Re: misc/129580: Netgear WG311v3 (ndis) causes kenel trap at boot.
X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Glen Barber List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 20:30:05 -0000 The following reply was made to PR kern/129580; it has been noted by GNATS. From: Glen Barber To: bug-followup@freebsd.org Cc: Subject: Re: misc/129580: Netgear WG311v3 (ndis) causes kenel trap at boot. Date: Fri, 10 Apr 2009 16:04:33 -0400 Since malo(4) is available, I believe this PR can be closed. Thanks. -- Glen Barber From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 21:13:08 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0EC8D1065674; Fri, 10 Apr 2009 21:13:08 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D80298FC12; Fri, 10 Apr 2009 21:13:07 +0000 (UTC) (envelope-from linimon@FreeBSD.org) Received: from freefall.freebsd.org (linimon@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3ALD70d037631; Fri, 10 Apr 2009 21:13:07 GMT (envelope-from linimon@freefall.freebsd.org) Received: (from linimon@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3ALD7Fi037625; Fri, 10 Apr 2009 21:13:07 GMT (envelope-from linimon) Date: Fri, 10 Apr 2009 21:13:07 GMT Message-Id: <200904102113.n3ALD7Fi037625@freefall.freebsd.org> To: linimon@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: linimon@FreeBSD.org Cc: Subject: Re: kern/133572: [ppp] [hang] incoming PPTP connection hangs the system X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
X-List-Received-Date: Fri, 10 Apr 2009 21:13:08 -0000 Old Synopsis: incoming PPTP connection hangs the system New Synopsis: [ppp] [hang] incoming PPTP connection hangs the system Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: linimon Responsible-Changed-When: Fri Apr 10 21:11:38 UTC 2009 Responsible-Changed-Why: Over to maintainer(s). http://www.freebsd.org/cgi/query-pr.cgi?pr=133572 From owner-freebsd-net@FreeBSD.ORG Fri Apr 10 23:10:05 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0BC14106566B for ; Fri, 10 Apr 2009 23:10:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id D388F8FC0C for ; Fri, 10 Apr 2009 23:10:04 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3ANA41x086899 for ; Fri, 10 Apr 2009 23:10:04 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3ANA46d086898; Fri, 10 Apr 2009 23:10:04 GMT (envelope-from gnats) Date: Fri, 10 Apr 2009 23:10:04 GMT Message-Id: <200904102310.n3ANA46d086898@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: Max Laier Cc: Subject: Re: kern/133572: [ppp] [hang] incoming PPTP connection hangs the system X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Max Laier List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Apr 2009 23:10:05 -0000 The following reply was made to PR kern/133572; it has been noted by GNATS. 
From: Max Laier To: bug-followup@freebsd.org, dennis.melentyev@gmail.com Cc: Subject: Re: kern/133572: [ppp] [hang] incoming PPTP connection hangs the system Date: Fri, 10 Apr 2009 23:47:55 +0100 Is it possible for you to turn on WITNESS on this machine to obtain possible LORs that might be responsible for the hang? Also, do you have the possibility to enable DDB and drop into it from the console (if it is not a hard hang but a live lock)? -- Max From owner-freebsd-net@FreeBSD.ORG Sat Apr 11 01:54:38 2009 Return-Path: Delivered-To: freebsd-net@FreeBSD.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id AB9EF10656C5 for ; Sat, 11 Apr 2009 01:54:38 +0000 (UTC) (envelope-from ccowart@rescomp.berkeley.edu) Received: from hal.rescomp.berkeley.edu (hal.Rescomp.Berkeley.EDU [169.229.70.150]) by mx1.freebsd.org (Postfix) with ESMTP id 918368FC17 for ; Sat, 11 Apr 2009 01:54:38 +0000 (UTC) (envelope-from ccowart@rescomp.berkeley.edu) Received: by hal.rescomp.berkeley.edu (Postfix, from userid 1225) id 8B0273C054F; Fri, 10 Apr 2009 18:38:34 -0700 (PDT) Date: Fri, 10 Apr 2009 18:38:34 -0700 From: Chris Cowart To: "Bjoern A. Zeeb" Message-ID: <20090411013834.GB40655@hal.rescomp.berkeley.edu> Mail-Followup-To: "Bjoern A. Zeeb" , "Eugene M. Kim" <20080111.freebsd.org@ab.ote.we.lv>, freebsd-net@FreeBSD.org References: <48693E39.4080104@ab.ote.we.lv> <20080630220842.X83875@maildrop.int.zabbadoz.net> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-ripemd160; protocol="application/pgp-signature"; boundary="H1spWtNR+x+ondvy" Content-Disposition: inline In-Reply-To: <20080630220842.X83875@maildrop.int.zabbadoz.net> Organization: RSSP-IT, UC Berkeley User-Agent: Mutt/1.5.18 (2008-05-17) Cc: freebsd-net@FreeBSD.org, "Eugene M. 
Kim" <20080111.freebsd.org@ab.ote.we.lv> Subject: Re: bridge(4) and IPv6 link-local address X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Apr 2009 01:54:39 -0000 --H1spWtNR+x+ondvy Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable

Bjoern A. Zeeb wrote:
> On Mon, 30 Jun 2008, Eugene M. Kim wrote:
> > A quick question: Is bridge(4) supposed /not/ to automatically configure an
> > IPv6 link-local address?
>
> yes there is a check for this in the code and if removed (tried that
> lately) more things go wrong.
>
> > I'm trying to use it to bridge a wired segment and a wireless segment, and
> > router advertisement over bridge0 wouldn't work because, with bridge0 lacking
> > a LL address, the router uses a non-LL address as the source address for RA
> > packets, which then is ignored as invalid by other IPv6 nodes.
>
> yes, seen something similar lately but ETIMEOUT on debugging. The
> problem basically was:
>
> lan bridge ath --- wlan client
>
> the LL address was on the "lan" interface.
>
> ping6 LL on lan from wlan client did not work. I could see the packets
> being bridged and visible on all interfaces and even the router on lan
> noticed them but there was no reply going to the client. ping6 from
> the bridge ``box'' to the wlan client and everything was fine as nd
> was seeded.
>
> Removing the check we ended up with the same LL address on both bridge
> and the lan interface if I can remember correctly and you do not want
> that... it's a bit tricky and there is something that does not work as
> expected, right. If you find the time to debug it I'll happily test
> patches;-)

I seem to be reviving a fairly old thread here, but this is what I found when I went searching for the issue.
I am personally bridging a wireless NIC (ath0) with a VLAN interface (vlan10). The bridge does not receive a link-local address. The bridge interface (bridge0) is the default gateway for my LAN, both for v4 and v6. My Mac was logging this message in response to router advertisements:

| Apr 10 18:16:54 administrators-imac configd[29]: RTADV_VERIFY_PACKET:
| invalid RA with non link-local source from 2001:4830:1679:10::1 on en0

and was refusing to acknowledge them.

My solution was to assign a link-local address to bridge0 based on the ethernet address (I think I did the EUI-48 stuff correctly):

| bridge0: flags=8843 metric 0 mtu 1500
|         ether 92:db:a2:b4:8e:ba
|         inet 10.1.10.1 netmask 0xffffff00 broadcast 10.1.10.255
|         inet6 2001:4830:1679:10::1 prefixlen 64
|         inet6 fe80::90db:a2ff:feb4:83ba%bridge0 prefixlen 64 scopeid 0xc
|         id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
|         maxage 20 holdcnt 6 proto rstp maxaddr 100 timeout 1200
|         root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0

According to ifconfig(8):

| Basic IPv6 node operation requires a link-local address on each interface
| configured for IPv6. Normally, such an address is automatically configured
| by the kernel on each interface added to the system; this behaviour
| may be disabled by setting the sysctl MIB variable
| net.inet6.ip6.auto_linklocal to 0.

The bridge(4) page does not add any disclaimer about bridge interfaces. Neither man page gives a good how-to on assigning your own link-local address (I guessed and got it right with the % notation).

Shouldn't the kernel assign link-local addresses to these interfaces? Should this address be based on the ethernet address of the bridge interface? I'm not sure I really understood the challenges with the implementation.
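
The derivation mentioned above, a link-local address built from the Ethernet address, is the modified EUI-64 rule of RFC 4291 (Appendix A): split the 48-bit address in half, insert ff:fe in the middle, and invert the universal/local bit (0x02) of the first octet. The following is a small userspace sketch of that arithmetic; the function names are hypothetical and nothing here is taken from the kernel sources.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Build the 8-byte modified EUI-64 interface identifier from a 48-bit
 * Ethernet address: flip the universal/local bit and insert ff:fe.
 */
static void
sketch_mac_to_ifid(const uint8_t mac[6], uint8_t ifid[8])
{
	ifid[0] = mac[0] ^ 0x02;
	ifid[1] = mac[1];
	ifid[2] = mac[2];
	ifid[3] = 0xff;
	ifid[4] = 0xfe;
	ifid[5] = mac[3];
	ifid[6] = mac[4];
	ifid[7] = mac[5];
}

/*
 * Format the fe80::/64 address for the given Ethernet address.
 * Returns what snprintf() returns (length of the formatted text).
 */
static int
sketch_format_linklocal(const uint8_t mac[6], char *buf, size_t len)
{
	uint8_t id[8];

	sketch_mac_to_ifid(mac, id);
	return (snprintf(buf, len, "fe80::%x:%x:%x:%x",
	    (unsigned)(id[0] << 8 | id[1]),
	    (unsigned)(id[2] << 8 | id[3]),
	    (unsigned)(id[4] << 8 | id[5]),
	    (unsigned)(id[6] << 8 | id[7])));
}
```

For the ether address 92:db:a2:b4:8e:ba in the ifconfig output, this sketch yields fe80::90db:a2ff:feb4:8eba. That differs in one digit from the address shown in the output (83ba), so one of the two values may contain a typo.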
--=20 Chris Cowart Network Technical Lead Network & Infrastructure Services, RSSP-IT UC Berkeley --H1spWtNR+x+ondvy Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.10 (FreeBSD) iQIcBAEBAwAGBQJJ3/SaAAoJEIGh6j3cHUNP+ecP+gKBGGQUMWgmJ1BQNgT/FfW3 rHkRDLUYNF8eJ+OX4yDfOLgsWCXtEDqvO99OwMHr+1GHhg4rJWYM2C1JJYJElAXE fp79/eSM8Gjo0n9EiWqglkUL9HyRiPtRX7K7WbPLJG75J7ALkThK04UCTghF8GJ5 ZfeoKG9PauZJruH3j91v6aBZhV0E6GrSc8+KiJvx/NmxBiMzpBXOGb4h32R0zPfT n1Fat3bJ5yxyBXaAEnRdOajTG4wUIXa1CFYrmskk8XA7uToaXK0CuiSdexHjrxIj 5GymlpLL33FuOvg32/nK9HDEaL/ktqHZNz+wt9n1p2T4VGk+bdd1TQQvOwXRXwG/ SEIHnpFTREasZ9K0RgVC6mFgkVvFbZGV6OGW/ugISg81u56l+O4IncH6JUCeT5CP h4JUwcQgLAd4IC2ISqNBlYOaDj1yFikhyHWsv8BzUV5WmQT0fq4AToswbAUdQU6A 4lNH0Wq3YZurcRk1fcVQY4atGdin3ftGL6FOI54AB+yb+o1a6E/UxoiYh4tHW4lW XFvkjV23yy+W94SWhbjyldQyy5GoCED0wzxF/x/R7lzQ9AF1/uumcubRWV7f/+EG 0HvIbBx34TvsTCDyocBNGwcred+BnHE7saS67dWuB6fvcUrOjyqqvjLoMIY4X8tB zV9aw1X5E3Dmg0EAzdOa =S/1z -----END PGP SIGNATURE----- --H1spWtNR+x+ondvy-- From owner-freebsd-net@FreeBSD.ORG Sat Apr 11 08:50:02 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E31111065686 for ; Sat, 11 Apr 2009 08:50:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id CFCDE8FC30 for ; Sat, 11 Apr 2009 08:50:02 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3B8o2uA010511 for ; Sat, 11 Apr 2009 08:50:02 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3B8o2ka010510; Sat, 11 Apr 2009 08:50:02 GMT (envelope-from gnats) Date: Sat, 11 Apr 2009 08:50:02 GMT Message-Id: 
<200904110850.n3B8o2ka010510@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: Mykola Dzham Cc: Subject: Re: bin/131365: r190758 break using 0 , 0/0, 0.0.0.0/0 as alias for 'default' X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Mykola Dzham List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Apr 2009 08:50:04 -0000

The following reply was made to PR bin/131365; it has been noted by GNATS.

From: Mykola Dzham
To: bug-followup@FreeBSD.org, rrs@FreeBSD.org
Cc:
Subject: Re: bin/131365: r190758 break using 0 , 0/0, 0.0.0.0/0 as alias for 'default'
Date: Sat, 11 Apr 2009 11:20:20 +0300

--UugvWAfsgieZRqgk
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Hi!

r190758 breaks using 0.0.0.0/0 as an alias for the default route:

$ route -n get default
   route to: default
destination: default
       mask: default
    gateway: 192.168.1.1
  interface: em0
      flags:
 recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount      mtu     expire
       0         0         0         0         0         0     1500          0

$ route -n get -net 0.0.0.0
route: writing to routing socket: No such process

The attached patch fixes this.

--
Mykola Dzham, LEFT-(UANIC|RIPE)
JID: levsha@jabber.net.ua

--UugvWAfsgieZRqgk
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: attachment; filename="route.c.patch"

Index: route.c
===================================================================
--- route.c	(revision 190880)
+++ route.c	(working copy)
@@ -818,7 +818,8 @@
 			/* i holds the first non zero bit */
 			bits = 32 - (i*8);
 	}
-	mask = 0xffffffff << (32 - bits);
+	if (bits != 0)
+		mask = 0xffffffff << (32 - bits);
 	sin->sin_addr.s_addr = htonl(addr);
 	sin = &so_mask.sin;

--UugvWAfsgieZRqgk--

From owner-freebsd-net@FreeBSD.ORG Sat Apr 11 10:10:03 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 324B2106564A
for ; Sat, 11 Apr 2009 10:10:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 1E8048FC14 for ; Sat, 11 Apr 2009 10:10:03 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3BAA2mE016571 for ; Sat, 11 Apr 2009 10:10:03 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3BAA2QQ016570; Sat, 11 Apr 2009 10:10:02 GMT (envelope-from gnats) Date: Sat, 11 Apr 2009 10:10:02 GMT Message-Id: <200904111010.n3BAA2QQ016570@freefall.freebsd.org> To: freebsd-net@FreeBSD.org From: Randall Stewart Cc: Subject: Re: bin/131365: r190758 break using 0 , 0/0, 0.0.0.0/0 as alias for 'default' X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: Randall Stewart List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Apr 2009 10:10:03 -0000 The following reply was made to PR bin/131365; it has been noted by GNATS. From: Randall Stewart To: Mykola Dzham Cc: bug-followup@FreeBSD.org, rrs@FreeBSD.org Subject: Re: bin/131365: r190758 break using 0 , 0/0, 0.0.0.0/0 as alias for 'default' Date: Sat, 11 Apr 2009 06:04:37 -0400 Good catch Mykola.. I will get this in :) R On Apr 11, 2009, at 4:20 AM, Mykola Dzham wrote: > Hi! 
> r190758 breaks using 0.0.0.0/0 as an alias for the default route:
>
> $ route -n get default
>    route to: default
> destination: default
>        mask: default
>     gateway: 192.168.1.1
>   interface: em0
>       flags:
>  recvpipe  sendpipe  ssthresh  rtt,msec    rttvar  hopcount
> mtu     expire
>        0         0         0         0         0         0
> 1500          0
>
> $ route -n get -net 0.0.0.0
> route: writing to routing socket: No such process
>
> The attached patch fixes this.
>
> --
> Mykola Dzham, LEFT-(UANIC|RIPE)
> JID: levsha@jabber.net.ua
>

------------------------------
Randall Stewart
803-317-4952 (cell)
803-345-0391(direct)

From owner-freebsd-net@FreeBSD.ORG Sat Apr 11 19:56:37 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0909A106564A for ; Sat, 11 Apr 2009 19:56:37 +0000 (UTC) (envelope-from kfl@xiplink.com) Received: from smtp161.iad.emailsrvr.com (smtp161.iad.emailsrvr.com [207.97.245.161]) by mx1.freebsd.org (Postfix) with ESMTP id D47618FC1A for ; Sat, 11 Apr 2009 19:56:36 +0000 (UTC) (envelope-from kfl@xiplink.com) Received: from relay16.relay.iad.mlsrvr.com (localhost [127.0.0.1]) by relay16.relay.iad.mlsrvr.com (SMTP Server) with ESMTP id 304821B4013; Sat, 11 Apr 2009 15:56:31 -0400 (EDT) Received: by relay16.relay.iad.mlsrvr.com (Authenticated sender: kfodil-lemelin-AT-xiplink.com) with ESMTPSA id E9DE81B4003; Sat, 11 Apr 2009 15:56:30 -0400 (EDT) Message-ID: <49E0F5EF.3030807@xiplink.com> Date: Sat, 11 Apr 2009 15:56:31 -0400 From: Karim Fodil-Lemelin User-Agent: Thunderbird 2.0.0.21 (Windows/20090302) MIME-Version: 1.0 To: Robert Watson References: <49DF5F75.6080607@xiplink.com> <49DF9EAD.1050609@xiplink.com> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-net@freebsd.org Subject: Re: m_tag, malloc vs uma X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Apr 2009 19:56:37 -0000

Robert Watson wrote:
> On Fri, 10 Apr 2009, Karim Fodil-Lemelin wrote:
>
>> Thank you for the answer, clear and concise. I asked the question
>> because I had modified pf_get_mtag() to use uma directly in the hope
>> that it would be faster than calling malloc. But since pf_mtag is
>> 20 bytes, malloc will end up using a fixed 32-byte zone and I
>> shouldn't expect much speed gain from using something like (except
>> some savings from not having to select the 32-byte zone):
>
> There is another small overhead, the critical section used to protect
> the consistency of the per-CPU malloc type alloc and free counters,
> but it's also very small.
>
> I think it would be desirable to make a change to more flexible m_tag
> types for 8.0, but I'm not sure I have time to implement/test it. Is
> this something you might be interested in working on? I'm thinking of
> basically replacing the m_tag_free pointer with a pointer to a small
> vector of operations, possibly something along these lines:
>
> struct m_tag_ops {
>         void             (*m_tag_free)(struct m_tag *);
>         struct m_tag    *(*m_tag_copy)(struct m_tag *);
> };
>
> If the m_tag_ops pointer is NULL, we go with today's default
> (requiring minimal change of existing consumers). I'm not sure if
> there are any other function pointers we'd need at this point?

Is the m_tag_copy an 'overloaded' function for the current m_tag_copy or something else? It could also be interesting to have another function pointer to overload m_tag_alloc, to give more control over which zone the user wants its tags from (e.g. pf_mtag ...). The interest is there; I'm not sure the schedule will allow it, but that depends on whether the new m_tag design lets me squeeze out some performance.

Karim.
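
The operations-vector idea in the quoted proposal, together with the "NULL means today's default" rule and the suggested allocation hook, can be sketched in userspace C. Every name below (sk_tag, sk_tag_ops, sk_tag_new and so on) is hypothetical, and the structures are stand-ins rather than the kernel's struct m_tag. The sketch only shows how dispatch through an optional ops pointer could look, with malloc() and free() as the assumed defaults.

```c
#include <stdlib.h>

struct sk_tag;

/* Optional per-type operations; any member may be NULL. */
struct sk_tag_ops {
	struct sk_tag	*(*alloc_fn)(void);
	void		 (*free_fn)(struct sk_tag *);
	struct sk_tag	*(*copy_fn)(const struct sk_tag *);
};

/* Userspace stand-in for a tag; NOT the kernel definition. */
struct sk_tag {
	const struct sk_tag_ops	*ops;	/* NULL selects the defaults */
	int			 payload;
};

/* Counters make the custom paths observable in this sketch. */
static int sk_custom_allocs;
static int sk_custom_frees;

static struct sk_tag *
sk_custom_alloc(void)
{
	sk_custom_allocs++;
	return (malloc(sizeof(struct sk_tag)));
}

static void
sk_custom_free(struct sk_tag *t)
{
	sk_custom_frees++;
	free(t);
}

/* A consumer that overrides alloc and free but keeps the default copy. */
static const struct sk_tag_ops sk_custom_ops = {
	.alloc_fn = sk_custom_alloc,
	.free_fn = sk_custom_free,
	.copy_fn = NULL,
};

static struct sk_tag *
sk_tag_new(const struct sk_tag_ops *ops, int payload)
{
	struct sk_tag *t;

	if (ops != NULL && ops->alloc_fn != NULL)
		t = ops->alloc_fn();
	else
		t = malloc(sizeof(*t));
	if (t == NULL)
		return (NULL);
	t->ops = ops;
	t->payload = payload;
	return (t);
}

static struct sk_tag *
sk_tag_copy(const struct sk_tag *t)
{
	if (t->ops != NULL && t->ops->copy_fn != NULL)
		return (t->ops->copy_fn(t));
	/* Default copy: allocate the same way and duplicate the payload. */
	return (sk_tag_new(t->ops, t->payload));
}

static void
sk_tag_delete(struct sk_tag *t)
{
	if (t->ops != NULL && t->ops->free_fn != NULL)
		t->ops->free_fn(t);
	else
		free(t);
}
```

A tag created with sk_tag_new(NULL, ...) takes the default malloc()/free() path, while one created with &sk_custom_ops goes through the consumer's own routines. Because allocation happens before a tag exists, the ops pointer has to be supplied by the caller in this sketch, which is one possible reading of the m_tag_alloc suggestion.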
From owner-freebsd-net@FreeBSD.ORG Sat Apr 11 20:27:10 2009 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D2BD81065673; Sat, 11 Apr 2009 20:27:10 +0000 (UTC) (envelope-from sam@freebsd.org) Received: from ebb.errno.com (ebb.errno.com [69.12.149.25]) by mx1.freebsd.org (Postfix) with ESMTP id 94E1B8FC0C; Sat, 11 Apr 2009 20:27:10 +0000 (UTC) (envelope-from sam@freebsd.org) Received: from trouble.errno.com (trouble.errno.com [10.0.0.248]) (authenticated bits=0) by ebb.errno.com (8.13.6/8.12.6) with ESMTP id n3BKR9qJ073505 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sat, 11 Apr 2009 13:27:10 -0700 (PDT) (envelope-from sam@freebsd.org) Message-ID: <49E0FD1D.408@freebsd.org> Date: Sat, 11 Apr 2009 13:27:09 -0700 From: Sam Leffler Organization: FreeBSD Project User-Agent: Thunderbird 2.0.0.18 (X11/20081209) MIME-Version: 1.0 To: Karim Fodil-Lemelin References: <49DF5F75.6080607@xiplink.com> <49DF9EAD.1050609@xiplink.com> <49E0F5EF.3030807@xiplink.com> In-Reply-To: <49E0F5EF.3030807@xiplink.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-DCC-x.dcc-servers-Metrics: ebb.errno.com; whitelist Cc: freebsd-net@freebsd.org, Robert Watson Subject: Re: m_tag, malloc vs uma X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Apr 2009 20:27:11 -0000

Karim Fodil-Lemelin wrote:
> Robert Watson wrote:
>> On Fri, 10 Apr 2009, Karim Fodil-Lemelin wrote:
>>
>>> Thank you for the answer, clear and concise. I asked the question
>>> because I had modified pf_get_mtag() to use uma directly in the hope
>>> that it would be faster than calling malloc. But since pf_mtag is
>>> 20 bytes, malloc will end up using a fixed 32-byte zone and I
>>> shouldn't expect much speed gain from using something like (except
>>> some savings from not having to select the 32-byte zone):
>>
>> There is another small overhead, the critical section used to protect
>> the consistency of the per-CPU malloc type alloc and free counters,
>> but it's also very small.
>>
>> I think it would be desirable to make a change to more flexible m_tag
>> types for 8.0, but I'm not sure I have time to implement/test it. Is
>> this something you might be interested in working on? I'm thinking
>> of basically replacing the m_tag_free pointer with a pointer to a
>> small vector of operations, possibly something along these lines:
>>
>> struct m_tag_ops {
>>         void             (*m_tag_free)(struct m_tag *);
>>         struct m_tag    *(*m_tag_copy)(struct m_tag *);
>> };
>>
>> If the m_tag_ops pointer is NULL, we go with today's default
>> (requiring minimal change of existing consumers). I'm not sure if
>> there are any other function pointers we'd need at this point?
>
> Is the m_tag_copy an 'overloaded' function for the current m_tag_copy
> or something else? It could also be interesting to have another
> function pointer to overload m_tag_alloc, to give more control over
> which zone the user wants its tags from (e.g. pf_mtag ...). The
> interest is there; I'm not sure the schedule will allow it, but that
> depends on whether the new m_tag design lets me squeeze out some
> performance.

Typically tags are allocated in a context where decisions like the above can be made so I'm not sure where you think m_tag_alloc might be used. At one point vlan-tagged packets were identified by an mbuf tag. Initially they were allocated by malloc but I moved that to a dedicated zone w/ a noticeable benefit. However the overhead was still too high and so space was added to the mbuf pkt hdr explicitly to hold vlan data.
It's unlikely any scheme where the tags are allocated independent of the mbufs will scale well enough to handle existing high speed interfaces. There's been discussion about supporting embedding of tags in the mbuf itself; this might come along as part of the variable-size mbuf work that Jeff Roberson was working on. However unless one pre-allocated space and/or defined a general mechanism for managing such space you'd still potentially need to allocate tags separately when they are attached at a later time. For embedded/inline mbuf tag space management I think m_tag_free and m_tag_copy would be sufficient for current usage.

Sam

From owner-freebsd-net@FreeBSD.ORG Sat Apr 11 21:52:05 2009 Return-Path: Delivered-To: freebsd-net@hub.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9308A1065672; Sat, 11 Apr 2009 21:52:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (freefall.freebsd.org [IPv6:2001:4f8:fff6::28]) by mx1.freebsd.org (Postfix) with ESMTP id 689F98FC0C; Sat, 11 Apr 2009 21:52:05 +0000 (UTC) (envelope-from gnats@FreeBSD.org) Received: from freefall.freebsd.org (gnats@localhost [127.0.0.1]) by freefall.freebsd.org (8.14.3/8.14.3) with ESMTP id n3BLq5T0079875; Sat, 11 Apr 2009 21:52:05 GMT (envelope-from gnats@freefall.freebsd.org) Received: (from gnats@localhost) by freefall.freebsd.org (8.14.3/8.14.3/Submit) id n3BLq5sF079871; Sat, 11 Apr 2009 21:52:05 GMT (envelope-from gnats) Date: Sat, 11 Apr 2009 21:52:05 GMT Message-Id: <200904112152.n3BLq5sF079871@freefall.freebsd.org> To: gnats@FreeBSD.org, freebsd-bugs@FreeBSD.org, freebsd-net@FreeBSD.org From: gnats@FreeBSD.org Cc: Subject: Re: kern/133613: [wpi] [panic] kernel panic in wpi(4) X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: ,
X-List-Received-Date: Sat, 11 Apr 2009 21:52:06 -0000 Old Synopsis: kernel panic in wpi(4) New Synopsis: [wpi] [panic] kernel panic in wpi(4) Responsible-Changed-From-To: freebsd-bugs->freebsd-net Responsible-Changed-By: gnats Responsible-Changed-When: Sat Apr 11 21:51:42 UTC 2009 Responsible-Changed-Why: http://www.freebsd.org/cgi/query-pr.cgi?pr=133613