From: Barney Cordoba <barney_cordoba@yahoo.com>
Date: Mon, 6 Apr 2009 08:53:14 -0700 (PDT)
To: freebsd-net@freebsd.org, Ivan Voras
Subject: Re: Advice on a multithreaded netisr patch?

--- On Mon, 4/6/09, Ivan Voras wrote:

> From: Ivan Voras
> Subject: Re: Advice on a multithreaded netisr patch?
> To: freebsd-net@freebsd.org
> Date: Monday, April 6, 2009, 8:35 AM
>
> Robert Watson wrote:
> > On Mon, 6 Apr 2009, Ivan Voras wrote:
> >
> >> So, a mbuf can reference data not yet copied from the NIC hardware?
> >> I'm specifically trying to understand what m_pullup() does.
> >
> > I think we're talking slightly at cross purposes. There are two
> > transfers of interest:
> >
> > (1) DMA of the packet data to main memory from the NIC
> > (2) Servicing of CPU cache misses to access data in main memory
> >
> > By the time you receive an interrupt, the DMA is complete, so once you
>
> OK, this was what was confusing me - for a moment I thought you meant
> it's not so.
>
> > believe a packet referenced by the descriptor ring is done, you don't
> > have to wait for DMA. However, the packet data is in main memory rather
> > than your CPU cache, so you'll need to take a cache miss in order to
> > retrieve it. You don't want to prefetch before you know the packet data
> > is there, or you may prefetch stale data from the previous packet sent
> > or received from the cluster.
> >
> > m_pullup() has to do with mbuf chain memory contiguity during packet
> > processing.
> > The usual usage is something along the following lines:
> >
> >     struct whatever *w;
> >
> >     m = m_pullup(m, sizeof(*w));
> >     if (m == NULL)
> >             return;
> >     w = mtod(m, struct whatever *);
> >
> > m_pullup() here ensures that the first sizeof(*w) bytes of mbuf data are
> > contiguously stored so that the cast of w to m's data will point at a
>
> So, m_pullup() can resize / realloc() the mbuf? (Not that it matters for
> this purpose.)
>
> > Is this for the loopback workload? If so, remember that there may be
> > some other things going on:
>
> Both loopback and physical.
>
> > - Every packet is processed at least two times: once when sent, and then
> >   again when it's received.
> >
> > - A TCP segment will need to be ACK'd, so if you're sending data in
> >   chunks in one direction, the ACKs will not be piggy-backed on existing
> >   data transfers, and instead be sent independently, hitting the network
> >   stack two more times.
>
> No combination of these can make an accounting difference between 1,000
> and 250,000 pps. I must be hitting something very bad here.
>
> > - Remember that TCP works to expand its window, and then maintains the
> >   highest performance it can by bumping up against the top of available
> >   bandwidth continuously. This involves detecting buffer limits by
> >   generating packets that can't be sent, adding to the packet count.
> >   With loopback traffic, the drop point occurs when you exceed the size
> >   of the netisr's queue for IP, so you might try bumping that from the
> >   default to something much larger.
>
> My messages are approx. 100 +/- 10 bytes. No practical way they will
> even span multiple mbufs. TCP_NODELAY is on.
>
> > No. x++ is massively slow if executed in parallel across many cores on
> > a variable in a single cache line.
> > See my recent commit to kern_tc.c for an example: the updating of
> > trivial statistics for the kernel time calls reduced 30m syscalls/second
> > to 3m syscalls/second due to heavy contention on the cache line holding
> > the statistic. One of my goals for
>
> I don't get it:
> http://svn.freebsd.org/viewvc/base/stable/7/sys/kern/kern_tc.c?r1=189891&r2=189890&pathrev=189891
>
> You replaced x++ with no-ops if TC_COUNTER is defined? Aren't the
> timecounters actually needed somewhere?
>
> > 8.0 is to fix this problem for IP and TCP layers, and ideally also ifnet
> > but we'll see. We should be maintaining those stats per-CPU and then
> > aggregating to report them to userspace. This is what we already do for
> > a number of system stats -- UMA and kernel malloc, syscall and trap
> > counters, etc.
>
> How magic is this? Is it just a matter of declaring mystatarray[NCPU]
> and updating mystat[current_cpu], or (probably) should the spacing
> between array elements be fixed so that two elements don't share a
> cache line?
>
> >>> - Use cpuset to pin ithreads, the netisr, and whatever else, to
> >>>   specific cores so that they don't migrate, and if your system uses
> >>>   HTT, experiment with pinning the ithread and the netisr on different
> >>>   threads on the same core, or at least, different cores on the same
> >>>   die.
> >>
> >> I'm using em hardware; I still think there's a possibility I'm
> >> fighting the driver in some cases but this has priority #2.
> >
> > Have you tried LOCK_PROFILING? It would quickly tell you if driver
> > locks were a source of significant contention. It works quite well...
>
> I don't think I'm fighting against locking artifacts; it looks more like
> some kind of overly smart hardware thing, like interrupt moderation (but
> not exactly interrupt moderation, since the number of IRQs/s remains
> approx. the same).
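Ivan's question about per-CPU statistics can be illustrated with a minimal user-space sketch in C. It assumes 64-byte cache lines and a fixed, hypothetical NCPU of 8; the names pcpu_counter, pkt_count, pcpu_counter_inc() and pcpu_counter_fetch() are made up for illustration and are not FreeBSD kernel APIs. Each CPU's slot is aligned to its own cache line, so increments on different CPUs never write to the same line, and a reader sums all slots only when the value is reported.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines (typical on x86). */
#define CACHE_LINE 64
/* Hypothetical fixed CPU count, for illustration only. */
#define NCPU 8

/* One counter per CPU. The alignment forces each slot onto its own cache
 * line, so CPUs bumping their own slot never contend for the same line
 * (no false sharing). */
struct pcpu_counter {
    alignas(CACHE_LINE) uint64_t value;
};

static_assert(sizeof(struct pcpu_counter) == CACHE_LINE,
              "each per-CPU slot must occupy exactly one cache line");

static struct pcpu_counter pkt_count[NCPU];

/* Fast path: run on CPU `cpu`, touching only that CPU's line. In a kernel,
 * the thread must not migrate between picking `cpu` and the increment
 * (e.g. by using critical_enter()/critical_exit() or sched_pin()). */
static void pcpu_counter_inc(int cpu)
{
    pkt_count[cpu].value++;
}

/* Slow path: aggregate all slots when userspace asks for the statistic.
 * The sum is a snapshot; concurrent increments may or may not be included. */
static uint64_t pcpu_counter_fetch(void)
{
    uint64_t sum = 0;
    for (int i = 0; i < NCPU; i++)
        sum += pkt_count[i].value;
    return sum;
}
```

The padding is the answer to the "spacing" part of the question: a plain uint64_t array would pack eight counters into one 64-byte line and bring back exactly the contention Robert describes for kern_tc.c.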
>
> >>> - If your card supports RSS, pass the flowid up the stack in the mbuf
> >>>   packet header flowid field, and use that instead of the hash for
> >>>   work placement.
> >>
> >> Don't know about em. Don't really want to touch it if I don't have to :)
> >
> > if_em doesn't support it, but if_igb does. If this saves you a minimum
> > of one and possibly two cache misses per packet, it could be a huge
> > performance improvement.

There is no advantage to using if_igb. While the cards support more
features, the FreeBSD driver barely functions, and there's also no
multiqueue support in it. Don't waste your money on one of these cards.

Barney
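Robert's flowid suggestion can be sketched in C as follows. The sketch assumes a hypothetical, simplified struct pkt_meta instead of the real mbuf packet header, and uses FNV-1a as a stand-in software hash (the hash the netisr patch actually uses is not shown in the thread); sw_flow_hash() and pick_worker() are likewise illustrative names. When the driver supplies a flowid, the worker is chosen from metadata alone; otherwise the stack must read the IP and TCP headers out of packet data to hash the 4-tuple, which is where the extra cache misses come from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified packet metadata; NOT the real struct mbuf.
 * The address/port fields stand in for values that would otherwise have
 * to be read from the packet's IP/TCP headers. */
struct pkt_meta {
    bool     has_flowid;  /* set by an RSS-capable driver */
    uint32_t flowid;      /* hardware-computed RSS hash */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Software fallback: FNV-1a over the 4-tuple, one byte at a time. */
static uint32_t sw_flow_hash(const struct pkt_meta *m)
{
    uint32_t words[3] = { m->src_ip, m->dst_ip,
                          ((uint32_t)m->src_port << 16) | m->dst_port };
    uint32_t h = 2166136261u;             /* FNV offset basis */
    for (int w = 0; w < 3; w++)
        for (int b = 0; b < 4; b++) {
            h ^= (words[w] >> (8 * b)) & 0xffu;
            h *= 16777619u;               /* FNV prime */
        }
    return h;
}

/* Choose a netisr worker: prefer the driver-supplied flowid so no packet
 * data has to be touched; fall back to hashing the headers otherwise. */
static unsigned pick_worker(const struct pkt_meta *m, unsigned nworkers)
{
    assert(nworkers > 0);
    uint32_t h = m->has_flowid ? m->flowid : sw_flow_hash(m);
    return h % nworkers;
}
```

Either path maps every packet of a flow to the same worker, which preserves per-flow ordering; the flowid path just avoids reading packet data to get there.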