From owner-freebsd-performance@FreeBSD.ORG Sun May 9 14:10:11 2010
Date: Sun, 9 May 2010 06:43:29 -0700 (PDT)
From: Barney Cordoba
To: Vincent Hoffman, Murat Balaban
Cc: freebsd-net@freebsd.org, freebsd-performance@freebsd.org, grarpamp
Subject: Re: Intel 10Gb
Message-ID: <473112.87657.qm@web63906.mail.re1.yahoo.com>
In-Reply-To: <1273323582.3304.31.camel@efe>

--- On Sat, 5/8/10, Murat Balaban wrote:

> From: Murat Balaban
> Subject: Re: Intel 10Gb
> To: "Vincent Hoffman"
> Cc: freebsd-net@freebsd.org, freebsd-performance@freebsd.org, "grarpamp"
> Date: Saturday, May 8, 2010, 8:59 AM
>
> Much of the FreeBSD networking stack has been made parallel in order to
> cope with high packet rates at 10 Gig/sec operation.
>
> I've seen good numbers (near 10 Gig) in my tests involving TCP/UDP
> send/receive. (latest Intel driver).
>
> As far as BPF is concerned, above statement does not hold true,
> since there is some work that needs to be done here in terms
> of BPF locking and parallelism. My tests show that there
> is a high lock contention around "bpf interface lock", resulting
> in input errors at high packet rates and with many bpf devices.
>
> I believe GSoC 2010 project, Multiqueue BPF, is a milestone for this:
> http://www.freebsd.org/projects/ideas/ideas.html#p-multiqbpf
>
> I'm also working on this problem myself and will post a diff whenever
> I have something usable.
>
> --
> Murat
> http://www.enderunix.org/murat/
>
> On Sat, 2010-05-08 at 10:01 +0100, Vincent Hoffman wrote:
> > Looks a little like
> > http://lists.freebsd.org/pipermail/svn-src-all/2010-May/023679.html
> > but for intel. cool.
> >
> > Vince
> > On 07/05/2010 23:01, grarpamp wrote:
> > > Just wondering in general these days how close FreeBSD is to
> > > full 10Gb rates at various packet sizes from minimum ethernet
> > > frame to max jumbo 65k++. For things like BPF, ipfw/pf, routing,
> > > switching, etc.
> > > http://www.ntop.org/blog/?p=86
> > > _______________________________________________

Blah, Blah, Blah. Let's see some real numbers on real networks under
real loads. Until then, you've got nothing.

BC

From owner-freebsd-performance@FreeBSD.ORG Sun May 9 17:12:27 2010
Date: Sun, 9 May 2010 10:12:25 -0700
From: Jack Vogel
To: Barney Cordoba
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org, grarpamp, Vincent Hoffman
Subject: Re: Intel 10Gb
In-Reply-To: <473112.87657.qm@web63906.mail.re1.yahoo.com>
References: <1273323582.3304.31.camel@efe> <473112.87657.qm@web63906.mail.re1.yahoo.com>

On Sun, May 9, 2010 at 6:43 AM, Barney Cordoba wrote:

> --- On Sat, 5/8/10, Murat Balaban wrote:
>
> > From: Murat Balaban
> > Subject: Re: Intel 10Gb
> > To: "Vincent Hoffman"
> > Cc: freebsd-net@freebsd.org, freebsd-performance@freebsd.org, "grarpamp"
> > Date: Saturday, May 8, 2010, 8:59 AM
> >
> > Much of the FreeBSD networking stack has been made parallel in order to
> > cope with high packet rates at 10 Gig/sec operation.
> >
> > I've seen good numbers (near 10 Gig) in my tests involving TCP/UDP
> > send/receive. (latest Intel driver).
> >
> > As far as BPF is concerned, above statement does not hold true,
> > since there is some work that needs to be done here in terms
> > of BPF locking and parallelism. My tests show that there
> > is a high lock contention around "bpf interface lock", resulting
> > in input errors at high packet rates and with many bpf devices.
> >
> > I believe GSoC 2010 project, Multiqueue BPF, is a milestone for this:
> > http://www.freebsd.org/projects/ideas/ideas.html#p-multiqbpf
> >
> > I'm also working on this problem myself and will post a diff whenever
> > I have something usable.
> >
> >
> > --
> > Murat
> > http://www.enderunix.org/murat/
> >
> > On Sat, 2010-05-08 at 10:01 +0100, Vincent Hoffman wrote:
> > > Looks a little like
> > > http://lists.freebsd.org/pipermail/svn-src-all/2010-May/023679.html
> > > but for intel. cool.
> > >
> > > Vince
> > > On 07/05/2010 23:01, grarpamp wrote:
> > > > Just wondering in general these days how close FreeBSD is to
> > > > full 10Gb rates at various packet sizes from minimum ethernet
> > > > frame to max jumbo 65k++. For things like BPF, ipfw/pf, routing,
> > > > switching, etc.
> > > > http://www.ntop.org/blog/?p=86
> > > > _______________________________________________
>
> Blah, Blah, Blah. Let's see some real numbers on real networks under
> real loads. Until then, you've got nothing.
>
> BC

Blah blah blah, you're one to talk, do you EVER do anything but criticize
others? Nothing is right.

Jack

From owner-freebsd-performance@FreeBSD.ORG Tue May 11 12:59:51 2010
Date: Tue, 11 May 2010 05:59:50 -0700 (PDT)
From: Barney Cordoba
To: Jack Vogel
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org, grarpamp, Vincent Hoffman
Subject: Re: Intel 10Gb
Message-ID: <980105.10457.qm@web63906.mail.re1.yahoo.com>

--- On Sun, 5/9/10, Jack Vogel wrote:

> From: Jack Vogel
> Subject: Re: Intel 10Gb
> To: "Barney Cordoba"
> Cc: "Murat Balaban", freebsd-net@freebsd.org, freebsd-performance@freebsd.org, "grarpamp", "Vincent Hoffman"
> Date: Sunday, May 9, 2010, 1:12 PM
>
> On Sun, May 9, 2010 at 6:43 AM, Barney Cordoba wrote:
>
> > --- On Sat, 5/8/10, Murat Balaban wrote:
> >
> > > From: Murat Balaban
> > > Subject: Re: Intel 10Gb
> > > To: "Vincent Hoffman"
> > > Cc: freebsd-net@freebsd.org, freebsd-performance@freebsd.org, "grarpamp"
> > > Date: Saturday, May 8, 2010, 8:59 AM
> > >
> > > Much of the FreeBSD networking stack has been made parallel in order to
> > > cope with high packet rates at 10 Gig/sec operation.
> > >
> > > I've seen good numbers (near 10 Gig) in my tests involving TCP/UDP
> > > send/receive. (latest Intel driver).
> > >
> > > As far as BPF is concerned, above statement does not hold true,
> > > since there is some work that needs to be done here in terms
> > > of BPF locking and parallelism. My tests show that there
> > > is a high lock contention around "bpf interface lock", resulting
> > > in input errors at high packet rates and with many bpf devices.
> > >
> > > I believe GSoC 2010 project, Multiqueue BPF, is a milestone for this:
> > > http://www.freebsd.org/projects/ideas/ideas.html#p-multiqbpf
> > >
> > > I'm also working on this problem myself and will post a diff whenever
> > > I have something usable.
> > >
> > > --
> > > Murat
> > > http://www.enderunix.org/murat/
> > >
> > > On Sat, 2010-05-08 at 10:01 +0100, Vincent Hoffman wrote:
> > > > Looks a little like
> > > > http://lists.freebsd.org/pipermail/svn-src-all/2010-May/023679.html
> > > > but for intel. cool.
> > > >
> > > > Vince
> > > > On 07/05/2010 23:01, grarpamp wrote:
> > > > > Just wondering in general these days how close FreeBSD is to
> > > > > full 10Gb rates at various packet sizes from minimum ethernet
> > > > > frame to max jumbo 65k++. For things like BPF, ipfw/pf, routing,
> > > > > switching, etc.
> > > > > http://www.ntop.org/blog/?p=86
> > > > > _______________________________________________
> >
> > Blah, Blah, Blah. Let's see some real numbers on real networks under
> > real loads. Until then, you've got nothing.
> >
> > BC
>
> Blah blah blah, you're one to talk, do you EVER do anything but
> criticize others? Nothing is right.
>
> Jack

Those who expect pats on the back for not getting the job done have no
chance of succeeding. Without criticism you only have delusion.

I'm not criticizing the work, even though it's worthy of criticism. I'm
criticizing touting successes without any real-world evidence to support
the claim.

BC

From owner-freebsd-performance@FreeBSD.ORG Tue May 11 13:51:10 2010
Date: Tue, 11 May 2010 09:51:03 -0400
From: Andrew Gallatin
To: Murat Balaban
Cc: freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: Intel 10Gb
Message-ID: <20100511135103.GA29403@grapeape2.cs.duke.edu>
References: <4BE52856.3000601@unsane.co.uk> <1273323582.3304.31.camel@efe>
In-Reply-To: <1273323582.3304.31.camel@efe>

Murat Balaban [murat@enderunix.org] wrote:
>
> Much of the FreeBSD networking stack has been made parallel in order to
> cope with high packet rates at 10 Gig/sec operation.
>
> I've seen good numbers (near 10 Gig) in my tests involving TCP/UDP
> send/receive. (latest Intel driver).
>
> As far as BPF is concerned, above statement does not hold true,
> since there is some work that needs to be done here in terms
> of BPF locking and parallelism. My tests show that there
> is a high lock contention around "bpf interface lock", resulting
> in input errors at high packet rates and with many bpf devices.

If you're interested in 10GbE packet sniffing at line rate on the
cheap, have a look at the Myri10GE "sniffer" interface. This is a
special software package that takes a normal mxge(4) NIC, and replaces
the driver/firmware with a "myri_snf" driver/firmware which is
optimized for packet sniffing.

Using this driver/firmware combo, we can receive minimal packets at
line rate (14.8Mpps) to userspace. You can even access this using a
libpcap interface.
The trick is that the fast paths are OS-bypass, and don't suffer from
OS overheads, like lock contention. See
http://www.myri.com/scs/SNF/doc/index.html for details.

Best Regards,

Drew

From owner-freebsd-performance@FreeBSD.ORG Tue May 11 14:17:53 2010
Date: Tue, 11 May 2010 09:58:38 -0400
From: Patrick Klos
To: freebsd-performance@freebsd.org
Subject: Intel 82599 with non-Intel SFP+'s?
Message-ID: <4BE9628E.9030708@klos.com>

Hello,

I am building a packet capture box based on the Intel 82599 controller in a
FreeBSD box. I purchased the Intel Ethernet X520 cards and Finisar SFP+'s,
but apparently the 82599 does not support non-Intel SFP+'s?
The code in the driver checks for the SFP vendor if a bit in the device
capabilities is not set:

    ixgbe_get_device_caps(hw, &enforce_sfp);
    if (!(enforce_sfp & IXGBE_DEVICE_CAPS_ALLOW_ANY_SFP)) {
        // check if the PHY is Intel only
    }

Any idea how to set the IXGBE_DEVICE_CAPS_ALLOW_ANY_SFP bit in the
hardware? Is it even settable? Why does the 82599 care? I can't find any
reference to it in the 82599 datasheet.

Thanks,

Patrick Klos

From owner-freebsd-performance@FreeBSD.ORG Tue May 11 17:29:47 2010
Date: Tue, 11 May 2010 10:29:40 -0700
From: Jack Vogel
To: Patrick Klos
Cc: freebsd-performance@freebsd.org
Subject: Re: Intel 82599 with non-Intel SFP+'s?
In-Reply-To: <4BE9628E.9030708@klos.com>
References: <4BE9628E.9030708@klos.com>

Intel can only support a finite set of hardware, it is NOT a matter of it
being some "Intel" part, it's a matter of some SFPs that are out there
DO NOT WORK, so engineering here was able to delimit, validate, and thus
certify a specific set of SFPs, the software check is there to make sure
that you use something we can know works.

Jack

On Tue, May 11, 2010 at 6:58 AM, Patrick Klos wrote:

> Hello,
>
> I am building a packet capture box based on the Intel 82599 controller in a
> FreeBSD box. I purchased the Intel Ethernet X520 cards and Finisar SFP+'s,
> but apparently the 82599 does not support non-Intel SFP+'s? The code in the
> driver checks for the SFP vendor if a bit in the device capabilities is not
> set:
>
>     ixgbe_get_device_caps(hw, &enforce_sfp);
>     if (!(enforce_sfp & IXGBE_DEVICE_CAPS_ALLOW_ANY_SFP)) {
>         // check if the PHY is Intel only
>     }
>
> Any idea how to set the IXGBE_DEVICE_CAPS_ALLOW_ANY_SFP bit in the
> hardware? Is it even settable? Why does the 82599 care? I can't find any
> reference to it in the 82599 datasheet.
>
> Thanks,
>
> Patrick Klos
>
> _______________________________________________
> freebsd-performance@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to "freebsd-performance-unsubscribe@freebsd.org"

From owner-freebsd-performance@FreeBSD.ORG Tue May 11 19:02:09 2010
Date: Tue, 11 May 2010 15:01:59 -0400
From: Patrick Klos
To: Jack Vogel
Cc: freebsd-performance@freebsd.org
Subject: Re: Intel 82599 with non-Intel SFP+'s?
Message-ID: <4BE9A9A7.4000004@klos.com>
References: <4BE9628E.9030708@klos.com>

Jack Vogel wrote:
> Intel can only support a finite set of hardware, it is NOT a matter of
> it being some "Intel" part, it's a matter of some SFPs that are out
> there DO NOT WORK, so engineering here was able to delimit, validate,
> and thus certify a specific set of SFPs, the software check is there
> to make sure that you use something we can know works.

Thanks for the reply Jack,

The code for the 82599 is specific in that it checks for (and allows to
be used) ONLY Intel SFP+'s. The 82598 is a little more flexible in that
it supports 4 vendors (including Finisar). Any idea why 4 SFP+ vendors
are supported on the 82598, but not the 82599?

Also, the very existence of a definition for
IXGBE_DEVICE_CAPS_ALLOW_ANY_SFP (and code to check it) implies that the
82599 can be configured to support "any" SFP, although I can't find any
reference to the bit or capability in the 82599 datasheet. Is that a
possible "future" feature?

Lastly (for now), can support for an additional SFP+ (like the Finisar)
be *added* to the 82599 driver, or is there something that would prevent
that?

Thanks again!
Patrick

From owner-freebsd-performance@FreeBSD.ORG Fri May 14 14:07:54 2010
Date: Fri, 14 May 2010 10:07:37 -0400
From: Andrew Gallatin
To: Alexander Sack
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: Intel 10Gb
Message-ID: <4BED5929.5020302@cs.duke.edu>
References: <4BE52856.3000601@unsane.co.uk> <1273323582.3304.31.camel@efe> <20100511135103.GA29403@grapeape2.cs.duke.edu>

Alexander Sack wrote:
<...>
>> Using this driver/firmware combo, we can receive minimal packets at
>> line rate (14.8Mpps) to userspace. You can even access this using a
>> libpcap interface. The trick is that the fast paths are OS-bypass,
>> and don't suffer from OS overheads, like lock contention. See
>> http://www.myri.com/scs/SNF/doc/index.html for details.
>
> But your timestamps will be atrocious at 10G speeds. Myricom doesn't
> timestamp packets AFAIK. If you want reliable timestamps you need to
> look at companies like Endace, Napatech, etc.

I see your old help ticket in our system. Yes, our timestamping
is not as good as a dedicated capture card with a GPS reference,
but it is good enough for most people.

> PS I am not sure but Intel also supports writing packets directly in
> cache (yet I thought the 82599 driver actually does a prefetch anyway
> which had me confused on why that helps)

You're talking about DCA. We support DCA as well (and I suspect some
other 10G NICs do too). There are a few barriers to using DCA on
FreeBSD, not least of which is that FreeBSD doesn't currently have the
infrastructure to support it (no IOATDMA or DCA drivers).

DCA is also problematic because support from system/motherboard
vendors is very spotty. The vendor must provide the correct tag table
in BIOS such that the tags match the CPU/core numbering in the system.
Many motherboard vendors don't bother with this, and you cannot enable
DCA on a lot of systems, even though the underlying chipset supports
DCA. I've done hacks to force-enable it in the past, with mixed
results. The problem is that DCA depends on having the correct tag
table, so that packets can be prefetched into the correct CPU's cache.
If the tag table is incorrect, DCA is a big pessimization, because it
blows the cache in other CPUs.

That said, I would *love* it if FreeBSD grew ioatdma/dca support.
Jack, does Intel have any interest in porting DCA support to FreeBSD?

Drew

From owner-freebsd-performance@FreeBSD.ORG Fri May 14 15:41:30 2010
Date: Fri, 14 May 2010 11:41:15 -0400
From: Andrew Gallatin
To: Alexander Sack
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: Intel 10Gb
Message-ID: <4BED6F1B.7070602@cs.duke.edu>
References: <4BE52856.3000601@unsane.co.uk> <1273323582.3304.31.camel@efe> <20100511135103.GA29403@grapeape2.cs.duke.edu> <4BED5929.5020302@cs.duke.edu>

Alexander Sack wrote:
> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin wrote:
>> Alexander Sack wrote:
>> <...>
>>>> Using this driver/firmware combo, we can receive minimal packets at
>>>> line rate (14.8Mpps) to userspace. You can even access this using a
>>>> libpcap interface. The trick is that the fast paths are OS-bypass,
>>>> and don't suffer from OS overheads, like lock contention. See
>>>> http://www.myri.com/scs/SNF/doc/index.html for details.
>>> But your timestamps will be atrocious at 10G speeds. Myricom doesn't
>>> timestamp packets AFAIK. If you want reliable timestamps you need to
>>> look at companies like Endace, Napatech, etc.
>> I see your old help ticket in our system. Yes, our timestamping
>> is not as good as a dedicated capture card with a GPS reference,
>> but it is good enough for most people.
>
> I was told btw that it doesn't timestamp at ALL. I am assuming NOW
> that is incorrect.

I think you might have misunderstood how we do timestamping.
I definitely don't understand it, and I work there ;) I do know that
there is a NIC component of it (e.g., it is not 100% done in the host).
I also realize that it is not as good as something that is 1PPS GPS
based.

> Define *most* people.

I may have a skewed view of the market, but it seems like some
people care deeply about accurate timestamps, and others
(mostly doing deep packet inspection) care only within a few
milliseconds, or even seconds.

> I am not knocking the Myricom card. In fact I so wish you guys would
> just add the ability to latch to a 1PPS for timestamping and it would
> be perfect.
>
> We use I think an older version of the card internally for replay.
> It's a great multi-purpose card.
>
> However with IPG at 10G in the nanoseconds, anyone trying to do OWDs
> or RTT will find it difficult compared to an Endace or Napatech card.
> > Btw, I was referring to bpf(4) specifically, so please don't take my > comments as a knock against it. > >>> PS I am not sure but Intel also supports writing packets directly in >>> cache (yet I thought the 82599 driver actually does a prefetch anyway >>> which had me confused on why that helps) >> You're talking about DCA. We support DCA as well (and I suspect some >> other 10G NICs do to). There are a few barriers to using DCA on >> FreeBSD, not least of which is that FreeBSD doesn't currently have the >> infrastructure to support it (no IOATDMA or DCA drivers). > > Right. > >> DCA is also problematic because support from system/motherboard >> vendors is very spotty. The vendor must provide the correct tag table >> in BIOS such that the tags match the CPU/core numbering in the system. >> Many motherboard vendors don't bother with this, and you cannot enable >> DCA on a lot of systems, even though the underlying chipset supports >> DCA. I've done hacks to force-enable it in the past, with mixed >> results. The problem is that DCA depends on having the correct tag >> table, so that packets can be prefetched into the correct CPU's cache. >> If the tag table is incorrect, DCA is a big pessimization, because it >> blows the cache in other CPUs. > > Right. > >> That said, I would *love* it if FreeBSD grew ioatdma/dca support. >> Jack, does Intel have any interest in porting DCA support to FreeBSD? > > Question for Jack or Drew, what DOES FreeBSD have to do to support > DCA? I thought DCA was something you just enable on the NIC chipset > and if the system is IOATDMA aware, it just works. Is that not right > (assuming cache tags are correct and accessible)? i.e. I thought this > was hardware black magic than anything specific the OS has to do. 
IOATDMA and DCA are sort of unfairly joined for two reasons: the DCA
control stuff is implemented as part of the IOATDMA PCIe device, and
IOATDMA is a great usage model for DCA, since you'd want the DMAs
that it does to be prefetched.

To use DCA you need:

- A DCA driver to talk to the IOATDMA/DCA PCIe device, and obtain the
  tag table
- An interface that a client device (e.g., a NIC driver) can use to
  obtain either the tag table, or at least the correct tag for the CPU
  that the interrupt handler is bound to.

The basic support in a NIC driver boils down to something like:

nic_interrupt_handler()
{
        if (sc->dca.enabled && (curcpu != sc->dca.last_cpu)) {
                /* handler moved to another CPU: retarget the DCA tag */
                sc->dca.last_cpu = curcpu;
                tag = dca_get_tag(curcpu);
                WRITE_REG(sc, DCA_TAG, tag);
        }
}

Drew

From owner-freebsd-performance@FreeBSD.ORG Fri May 14 16:22:30 2010
Message-ID: <4BED78B5.8000906@cs.duke.edu>
Date: Fri, 14 May 2010 12:22:13 -0400
From: Andrew Gallatin
To: Alexander Sack
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: Intel 10Gb

Alexander Sack wrote:
>> To use DCA you need:
>>
>> - A DCA driver to talk to the IOATDMA/DCA pcie device, and obtain the tag
>>   table
>> - An interface that a client device (eg, NIC driver) can use to obtain
>>   either the tag table, or at least the correct tag for the CPU
>>   that the interrupt handler is bound to. The basic support in
>>   a NIC driver boils down to something like:
>>
>> nic_interrupt_handler()
>> {
>>         if (sc->dca.enabled && (curcpu != sc->dca.last_cpu)) {
>>                 sc->dca.last_cpu = curcpu;
>>                 tag = dca_get_tag(curcpu);
>>                 WRITE_REG(sc, DCA_TAG, tag);
>>         }
>> }
>
> Drew, at least in the Intel documentation, it seems the NIC uses the
> LAPIC id to tell the PCIe TLPs where to put inbound NIC I/O (in the
> TLP the DCA info is stored) to the appropriate core's cache. i.e. the
> heuristic you gave above is more granular than what I think Intel

The pseudo-code above was intended to be the MSI-X interrupt handler
for a single queue, not some dispatcher for multiple queues. Sorry
that wasn't clear. So yes, the DCA tag value may be different per
queue.

> does. I could be wrong, maybe Jack can chime in and correct me. But
> it seems with Intel chipsets it is a per queue parameter which allows
> you to bind a core cache's to a queue via DCA. The added piece to
> this for at least bpf(4) consumers is to have bpf(4) subscribe to
> these queues AND to allow an interface for libpcap applications to
> know where what queue is on what core and THEN bind to it.

Yes, everything associated with a queue must be bound to the same
core (or at least to cores which share a cache).

Drew

From owner-freebsd-performance@FreeBSD.ORG Fri May 14 14:01:17 2010
Date: Fri, 14 May 2010 09:32:08 -0400
From: Alexander Sack
To: Andrew Gallatin
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: Intel 10Gb

On Tue, May 11, 2010 at 9:51 AM, Andrew Gallatin wrote:
> Murat Balaban [murat@enderunix.org] wrote:
>>
>> Much of the FreeBSD networking stack has been made parallel in order to
>> cope with high packet rates at 10 Gig/sec operation.
>>
>> I've seen good numbers (near 10 Gig) in my tests involving TCP/UDP
>> send/receive. (latest Intel driver).
>>
>> As far as BPF is concerned, above statement does not hold true,
>> since there is some work that needs to be done here in terms
>> of BPF locking and parallelism. My tests show that there
>> is a high lock contention around "bpf interface lock", resulting
>> in input errors at high packet rates and with many bpf devices.
>
> If you're interested in 10GbE packet sniffing at line rate on the
> cheap, have a look at the Myri10GE "sniffer" interface. This is a
> special software package that takes a normal mxge(4) NIC, and replaces
> the driver/firmware with a "myri_snf" driver/firmware which is
> optimized for packet sniffing.
>
> Using this driver/firmware combo, we can receive minimal packets at
> line rate (14.8Mpps) to userspace. You can even access this using a
> libpcap interface. The trick is that the fast paths are OS-bypass,
> and don't suffer from OS overheads, like lock contention. See
> http://www.myri.com/scs/SNF/doc/index.html for details.

But your timestamps will be atrocious at 10G speeds. Myricom doesn't
timestamp packets AFAIK. If you want reliable timestamps you need to
look at companies like Endace, Napatech, etc.

We do a lot of packet capture and work on bpf(4) all the time. My
biggest concern for reliable 10G packet capture is timestamps. The
call to microtime() up in catchpacket() is not going to cut it (it
barely cuts it for GigE line rate speeds).

I'd be interested in doing the multi-queue bpf(4) myself (perhaps I
should ask? I don't know if non-summer-of-code folks are allowed?).

I believe the goal is not so much throughput but cache affinity. It
would be nice if, say, the listener application (libpcap) could bind
itself to the same core that the driver's queue is receiving packets
on, so everything from catching to post-processing works with a very
warm cache (theoretically). I think that's the idea.

It would also allow multiple applications to subscribe to potentially
different queues that are doing some form of load balancing. Again,
Intel's 82599 chipset supports flow based queues (albeit the size of
the flow table is limited).

Note, zero-copy bpf(4) is your friend in all use cases at 10G speeds!
:)

-aps

PS I am not sure but Intel also supports writing packets directly in
cache (yet I thought the 82599 driver actually does a prefetch anyway
which had me confused on why that helps)

From owner-freebsd-performance@FreeBSD.ORG Fri May 14 17:01:28 2010
Date: Fri, 14 May 2010 10:01:24 -0700
From: Jack Vogel
To: Alexander Sack
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org, Andrew Gallatin
Subject: Re: Intel 10Gb

On Fri, May 14, 2010 at 8:18 AM, Alexander Sack wrote:
> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin
> wrote:
> > Alexander Sack wrote:
> > <...>
> >>> Using this driver/firmware combo, we can receive minimal packets at
> >>> line rate (14.8Mpps) to userspace. You can even access this using a
> >>> libpcap interface. The trick is that the fast paths are OS-bypass,
> >>> and don't suffer from OS overheads, like lock contention. See
> >>> http://www.myri.com/scs/SNF/doc/index.html for details.
> >>
> >> But your timestamps will be atrocious at 10G speeds. Myricom doesn't
> >> timestamp packets AFAIK. If you want reliable timestamps you need to
> >> look at companies like Endace, Napatech, etc.
> >
> > I see your old help ticket in our system. Yes, our timestamping
> > is not as good as a dedicated capture card with a GPS reference,
> > but it is good enough for most people.
>
> I was told btw that it doesn't timestamp at ALL. I am assuming NOW
> that is incorrect.
>
> Define *most* people.
>
> I am not knocking the Myricom card. In fact I so wish you guys would
> just add the ability to latch to a 1PPS for timestamping and it would
> be perfect.
>
> We use I think an older version of the card internally for replay.
> Its a great multi-purpose card.
>
> However with IPG at 10G in the nanoseconds, anyone trying to do OWDs
> or RTT will find it difficult compared to an Endace or Napatech card.
>
> Btw, I was referring to bpf(4) specifically, so please don't take my
> comments as a knock against it.
>
> >> PS I am not sure but Intel also supports writing packets directly in
> >> cache (yet I thought the 82599 driver actually does a prefetch anyway
> >> which had me confused on why that helps)
> >
> > You're talking about DCA. We support DCA as well (and I suspect some
> > other 10G NICs do to). There are a few barriers to using DCA on
> > FreeBSD, not least of which is that FreeBSD doesn't currently have the
> > infrastructure to support it (no IOATDMA or DCA drivers).
>
> Right.
>
> > DCA is also problematic because support from system/motherboard
> > vendors is very spotty. The vendor must provide the correct tag table
> > in BIOS such that the tags match the CPU/core numbering in the system.
> > Many motherboard vendors don't bother with this, and you cannot enable
> > DCA on a lot of systems, even though the underlying chipset supports
> > DCA. I've done hacks to force-enable it in the past, with mixed
> > results. The problem is that DCA depends on having the correct tag
> > table, so that packets can be prefetched into the correct CPU's cache.
> > If the tag table is incorrect, DCA is a big pessimization, because it
> > blows the cache in other CPUs.
>
> Right.
>
> > That said, I would *love* it if FreeBSD grew ioatdma/dca support.
> > Jack, does Intel have any interest in porting DCA support to FreeBSD?
>
> Question for Jack or Drew, what DOES FreeBSD have to do to support
> DCA? I thought DCA was something you just enable on the NIC chipset
> and if the system is IOATDMA aware, it just works. Is that not right
> (assuming cache tags are correct and accessible)? i.e. I thought this
> was hardware black magic than anything specific the OS has to do.
>

OK, let me see if I can clarify some of this.

First, there IS an I/OAT driver that I did for FreeBSD like 3 or 4
years ago, in the timeframe that we put the feature out. However, at
that time all it was good for was the DMA aspect of things, and
Prafulla used it to accelerate the stack copies. Interest did not seem
that great so I put the code aside; it's now badly dated and needs to
be brought up to date, since there are a few different versions of the
hardware now.

At one point maybe a year back I started to take the code apart
thinking I would JUST do DCA. That got back-burnered due to other
higher priority issues, but it's still an item in my queue. I also had
a nibble of interest in using the DMA engine, so perhaps I should not
go down the road of just doing the DCA support in the I/OAT part of
the driver. The question is how to make the infrastructure work.

To answer Alexander's question, DCA support is NOT in the NIC, it's in
the chipset. That's why the I/OAT driver was done as a separate
driver, with the NIC as the user of the info. It's been a while since
I was into the code, but if memory serves the I/OAT driver just
enables the support in the chipset, and then the NIC driver configures
its engine to use it.

DCA and DMA were supported in Linux in the same driver, perhaps
because the chipset features were easily handled together; I'm not
sure :)

Fabien's data earlier in this thread suggested that a strategically
placed prefetch did you more good than DCA did, if I recall. What do
you all think of that?

As far as I'm concerned right now I am willing to resurrect the
driver, clean it up and make the features available. We can see how
valuable they are after that. How does that sound?
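
A rough user-space sketch of that split, purely illustrative: one
piece stands in for the I/OAT/DCA driver that owns the tag table, and
a per-queue hook in the NIC driver (in the spirit of the pseudo-code
earlier in the thread) reprograms the tag only when its interrupt
handler has moved to a different CPU. Every name here (dca_tag_table,
dca_get_tag, struct rx_queue, rxq_update_dca) and every tag value is
made up for the example; none of it is an existing FreeBSD interface,
and a real driver would write a hardware register rather than a struct
field.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 4

/*
 * Stand-in for the tag table that a DCA provider driver would obtain
 * from the chipset/BIOS.  The values are invented for illustration.
 */
static const uint8_t dca_tag_table[MAX_CPUS] = { 0x10, 0x12, 0x20, 0x22 };

/* Hypothetical provider interface: DCA tag for a CPU, 0 if out of range. */
static uint8_t
dca_get_tag(int cpu)
{
	if (cpu < 0 || cpu >= MAX_CPUS)
		return (0);
	return (dca_tag_table[cpu]);
}

/* Per-queue state a NIC driver might keep (hypothetical layout). */
struct rx_queue {
	bool     dca_enabled;
	int      last_cpu;     /* CPU the tag was last set for, -1 if none */
	uint8_t  dca_tag_reg;  /* stands in for the queue's tag register */
	unsigned reg_writes;   /* number of simulated register writes */
};

/*
 * Meant to be called from one queue's interrupt handler with the CPU
 * it is currently running on.  The tag is rewritten only on a CPU
 * change, so the common case costs a single comparison.
 */
static void
rxq_update_dca(struct rx_queue *q, int curcpu)
{
	if (!q->dca_enabled || curcpu == q->last_cpu)
		return;
	q->last_cpu = curcpu;
	q->dca_tag_reg = dca_get_tag(curcpu);
	q->reg_writes++;
}
```

With last_cpu starting at -1, a first call on CPU 2 sets the tag to
0x20, a second call on CPU 2 does nothing, and a later call on CPU 1
rewrites it to 0x12. Keeping the state per queue matches the point
made in the thread that the tag may differ from queue to queue.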

Cheers,

Jack

From owner-freebsd-performance@FreeBSD.ORG Fri May 14 15:18:38 2010
Date: Fri, 14 May 2010 11:18:33 -0400
From: Alexander Sack
To: Andrew Gallatin
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: Intel 10Gb

On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin wrote:
> Alexander Sack wrote:
> <...>
>>> Using this driver/firmware combo, we can receive minimal packets at
>>> line rate (14.8Mpps) to userspace. You can even access this using a
>>> libpcap interface. The trick is that the fast paths are OS-bypass,
>>> and don't suffer from OS overheads, like lock contention. See
>>> http://www.myri.com/scs/SNF/doc/index.html for details.
>>
>> But your timestamps will be atrocious at 10G speeds. Myricom doesn't
>> timestamp packets AFAIK. If you want reliable timestamps you need to
>> look at companies like Endace, Napatech, etc.
>
> I see your old help ticket in our system. Yes, our timestamping
> is not as good as a dedicated capture card with a GPS reference,
> but it is good enough for most people.

I was told btw that it doesn't timestamp at ALL. I am assuming NOW
that is incorrect.

Define *most* people.

I am not knocking the Myricom card. In fact I so wish you guys would
just add the ability to latch to a 1PPS for timestamping and it would
be perfect.

We use I think an older version of the card internally for replay.
It's a great multi-purpose card.

However with IPG at 10G in the nanoseconds, anyone trying to do OWDs
or RTT will find it difficult compared to an Endace or Napatech card.

Btw, I was referring to bpf(4) specifically, so please don't take my
comments as a knock against it.

>> PS I am not sure but Intel also supports writing packets directly in
>> cache (yet I thought the 82599 driver actually does a prefetch anyway
>> which had me confused on why that helps)
>
> You're talking about DCA. We support DCA as well (and I suspect some
> other 10G NICs do to). There are a few barriers to using DCA on
> FreeBSD, not least of which is that FreeBSD doesn't currently have the
> infrastructure to support it (no IOATDMA or DCA drivers).

Right.

> DCA is also problematic because support from system/motherboard
> vendors is very spotty. The vendor must provide the correct tag table
> in BIOS such that the tags match the CPU/core numbering in the system.
> Many motherboard vendors don't bother with this, and you cannot enable
> DCA on a lot of systems, even though the underlying chipset supports
> DCA. I've done hacks to force-enable it in the past, with mixed
> results. The problem is that DCA depends on having the correct tag
> table, so that packets can be prefetched into the correct CPU's cache.
> If the tag table is incorrect, DCA is a big pessimization, because it
> blows the cache in other CPUs.

Right.

> That said, I would *love* it if FreeBSD grew ioatdma/dca support.
> Jack, does Intel have any interest in porting DCA support to FreeBSD?

Question for Jack or Drew, what DOES FreeBSD have to do to support
DCA? I thought DCA was something you just enable on the NIC chipset
and if the system is IOATDMA aware, it just works. Is that not right
(assuming cache tags are correct and accessible)? i.e. I thought this
was more hardware black magic than anything specific the OS has to do.

-aps

From owner-freebsd-performance@FreeBSD.ORG Fri May 14 16:13:43 2010
Date: Fri, 14 May 2010 12:13:42 -0400
From: Alexander Sack
To: Andrew Gallatin
Cc: Murat Balaban, freebsd-net@freebsd.org, freebsd-performance@freebsd.org
Subject: Re: Intel 10Gb

On Fri, May 14, 2010 at 11:41 AM, Andrew Gallatin wrote:
> Alexander Sack wrote:
>> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin
>> wrote:
>>> Alexander Sack wrote:
>>> <...>
>>>>> Using this driver/firmware combo, we can receive minimal packets at
>>>>> line rate (14.8Mpps) to userspace. You can even access this using a
>>>>> libpcap interface. The trick is that the fast paths are OS-bypass,
>>>>> and don't suffer from OS overheads, like lock contention. See
>>>>> http://www.myri.com/scs/SNF/doc/index.html for details.
>>>> But your timestamps will be atrocious at 10G speeds. Myricom doesn't
>>>> timestamp packets AFAIK. If you want reliable timestamps you need to
>>>> look at companies like Endace, Napatech, etc.
>>> I see your old help ticket in our system. Yes, our timestamping
>>> is not as good as a dedicated capture card with a GPS reference,
>>> but it is good enough for most people.
>>
>> I was told btw that it doesn't timestamp at ALL. I am assuming NOW
>> that is incorrect.
>
> I think you might have misunderstood how we do timestamping.
> I definately don't understand it, and I work there ;)

No problem. :)

> I do know that there is NIC component of it (eg, it is not 100%
> done in the host). I also realize that it is not is good as
> something that is 1PPS GPS based.

I need to grab your docs and start reading them again. I would like to
support data capture using the Myricom card. I somehow missed this. I
had thought the timestamps were software generated only.
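
For scale, the arithmetic behind the 14.8 Mpps figure quoted above,
and behind the concern about timestamp resolution at 10G, can be
worked through as below. This is only a back-of-the-envelope sketch:
the function names are invented here, and the constants are the
standard Ethernet minimum frame (64 bytes including FCS), 8 bytes of
preamble plus start-of-frame delimiter, and a 12-byte minimum
inter-packet gap.

```c
#include <stdint.h>

#define ETH_PREAMBLE_BYTES  8   /* preamble + start-of-frame delimiter */
#define ETH_MIN_IPG_BYTES  12   /* minimum inter-packet gap, in byte times */

/* Bits one frame occupies on the wire; frame_bytes includes the FCS. */
static uint64_t
wire_bits(uint32_t frame_bytes)
{
	return ((uint64_t)(frame_bytes + ETH_PREAMBLE_BYTES +
	    ETH_MIN_IPG_BYTES) * 8);
}

/* Maximum frames per second on a link of link_bps bits per second. */
static double
max_pps(double link_bps, uint32_t frame_bytes)
{
	return (link_bps / (double)wire_bits(frame_bytes));
}

/* Time from the start of one frame to the start of the next, in ns. */
static double
frame_interval_ns(double link_bps, uint32_t frame_bytes)
{
	return (1e9 / max_pps(link_bps, frame_bytes));
}

/* Duration of the minimum inter-packet gap alone, in ns. */
static double
ipg_ns(double link_bps)
{
	return ((double)(ETH_MIN_IPG_BYTES * 8) * 1e9 / link_bps);
}
```

For 64-byte frames at 10 Gbit/s this works out to 672 bits per frame
slot, roughly 14.88 Mpps, one frame about every 67.2 ns, with the gap
itself lasting 9.6 ns. At that rate close to 15 minimal frames arrive
per microsecond, so a microsecond-resolution software timestamp cannot
tell back-to-back frames apart.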
>
>> Define *most* people.
>
> I may have a skewed view of the market, but it seems like
> some people care deeply about accurate timestamps, and
> others (mostly doing deep packet inspection) care only
> within a few milliseconds, or even seconds.

In our case Andrew, the folks who are doing deep packet inspection
REQUIRE reasonable time stamps to correlate events and to generate
reasonable stats. But I hear you, if you are just looking to see the
packet data, then timestamp accuracy isn't your top priority.

>> Question for Jack or Drew, what DOES FreeBSD have to do to support
>> DCA? I thought DCA was something you just enable on the NIC chipset
>> and if the system is IOATDMA aware, it just works. Is that not right
>> (assuming cache tags are correct and accessible)? i.e. I thought this
>> was hardware black magic than anything specific the OS has to do.
>
> IOATDMA and DCA are sort of unfairly joined for two reasons: The DCA
> control stuff is implemented as part of the IOATDMA PCIe device, and
> IOATDMA is a great usage model for DCA, since you'd want the DMAs
> that it does to be prefetched.
>
> To use DCA you need:
>
> - A DCA driver to talk to the IOATDMA/DCA pcie device, and obtain the tag
>         table
> - An interface that a client device (eg, NIC driver) can use to obtain
>         either the tag table, or at least the correct tag for the CPU
>         that the interrupt handler is bound to. The basic support in
>         a NIC driver boils down to something like:
>
> nic_interrupt_handler()
> {
>   if (sc->dca.enabled && (curcpu != sc->dca.last_cpu)) {
>      sc->dca.last_cpu = curcpu;
>      tag = dca_get_tag(curcpu);
>      WRITE_REG(sc, DCA_TAG, tag);
>   }
> }

Drew, at least in the Intel documentation, it seems the NIC uses the
LAPIC id to tag its PCIe TLPs (the DCA info is stored in the TLP) so
that inbound NIC I/O lands in the appropriate core's cache. i.e.
the heuristic you gave above is more granular than what I think Intel does. I could be wrong, maybe Jack can chime in and correct me. But it seems with Intel chipsets it is a per queue parameter which allows you to bind a core cache's to a queue via DCA. The added piece to this for at least bpf(4) consumers is to have bpf(4) subscribe to these queues AND to allow an interface for libpcap applications to know where what queue is on what core and THEN bind to it. I think that is the general idea....I think! :) -aps From owner-freebsd-performance@FreeBSD.ORG Fri May 14 16:28:57 2010 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E9081106566C; Fri, 14 May 2010 16:28:57 +0000 (UTC) (envelope-from Leonid.Grossman@exar.com) Received: from owa.neterion.com (mx.neterion.com [72.1.205.142]) by mx1.freebsd.org (Postfix) with ESMTP id 9D56E8FC17; Fri, 14 May 2010 16:28:57 +0000 (UTC) X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Date: Fri, 14 May 2010 12:16:46 -0400 Message-ID: <78C9135A3D2ECE4B8162EBDCE82CAD77067570BE@nekter> In-Reply-To: <4BED6F1B.7070602@cs.duke.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Intel 10Gb thread-index: AcrzfAS6x46iHf9BR7miHTqNtyKiDAABBBOA References: <4BE52856.3000601@unsane.co.uk> <1273323582.3304.31.camel@efe> <20100511135103.GA29403@grapeape2.cs.duke.edu> <4BED5929.5020302@cs.duke.edu> <4BED6F1B.7070602@cs.duke.edu> From: "Leonid Grossman" To: "Andrew Gallatin" , "Alexander Sack" X-Mailman-Approved-At: Fri, 14 May 2010 17:03:37 +0000 Cc: Murat Balaban , freebsd-net@freebsd.org, freebsd-performance@freebsd.org Subject: RE: Intel 10Gb X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 May 2010 16:28:58 -0000

The Neterion/Exar x3100 is one of the generic 10GbE NICs that support
timestamping in hardware, along with some other packet
capture/monitoring features; here is the relevant paragraph from the
programming manual:

"Receive Frame Timestamp Feature

The x3100 has the ability to label each incoming frame with a timestamp
to allow a host entity to record the arrival time of incoming packets.
The host uses the XMAC_TIMESTAMP register to control its operation. To
enable the feature, the "EN" field must be set. Once the timestamp
feature is enabled, the FCS value of each frame will be replaced with
the value in a free-running 32-bit counter with a default period of
3.2 ns. The "USE_LINK_ID" determines if the full 32 bits of the FCS are
used for the timestamp, or if the most significant 2 bits are used to
identify which port the frame came in on, and 30 bits are used for the
timestamp. The "INTERVAL" field can be used to programmably change the
period between several values: 3.2 ns (the default), 6.4 ns, 12.8 ns,
25.6 ns, 51.2 ns, 102.4 ns, and 204.8 ns.

NOTE: To take advantage of this feature, "XMAC_CFG_PORTn.STRIP_FCS" must
be set to 0 to pass the FCS to the host."

> -----Original Message-----
> From: owner-freebsd-performance@freebsd.org
> [mailto:owner-freebsd-performance@freebsd.org] On Behalf Of Andrew Gallatin
> Sent: Friday, May 14, 2010 8:41 AM
> To: Alexander Sack
> Cc: Murat Balaban; freebsd-net@freebsd.org;
> freebsd-performance@freebsd.org
> Subject: Re: Intel 10Gb
>
> Alexander Sack wrote:
> > On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin wrote:
> >> Alexander Sack wrote:
> >> <...>
> >>>> Using this driver/firmware combo, we can receive minimal packets at
> >>>> line rate (14.8Mpps) to userspace.  You can even access this using a
> >>>> libpcap interface.  The trick is that the fast paths are OS-bypass,
> >>>> and don't suffer from OS overheads, like lock contention.  See
> >>>> http://www.myri.com/scs/SNF/doc/index.html for details.
> >>> But your timestamps will be atrocious at 10G speeds.  Myricom doesn't
> >>> timestamp packets AFAIK.  If you want reliable timestamps you need to
> >>> look at companies like Endace, Napatech, etc.
> >> I see your old help ticket in our system.  Yes, our timestamping
> >> is not as good as a dedicated capture card with a GPS reference,
> >> but it is good enough for most people.
> >
> > I was told btw that it doesn't timestamp at ALL.  I am assuming NOW
> > that is incorrect.
>
> I think you might have misunderstood how we do timestamping.
> I definitely don't understand it, and I work there ;)
> I do know that there is NIC component of it (eg, it is not 100%
> done in the host).  I also realize that it is not as good as
> something that is 1PPS GPS based.
>
> > Define *most* people.
>
> I may have a skewed view of the market, but it seems like
> some people care deeply about accurate timestamps, and
> others (mostly doing deep packet inspection) care only
> within a few milliseconds, or even seconds.
>
> > I am not knocking the Myricom card.  In fact I so wish you guys would
> > just add the ability to latch to a 1PPS for timestamping and it would
> > be perfect.
> >
> > We use I think an older version of the card internally for replay.
> > It's a great multi-purpose card.
> >
> > However with IPG at 10G in the nanoseconds, anyone trying to do OWDs
> > or RTT will find it difficult compared to an Endace or Napatech card.
> >
> > Btw, I was referring to bpf(4) specifically, so please don't take my
> > comments as a knock against it.
> >
> >>> PS I am not sure but Intel also supports writing packets directly in
> >>> cache (yet I thought the 82599 driver actually does a prefetch anyway
> >>> which had me confused on why that helps)
> >> You're talking about DCA.  We support DCA as well (and I suspect some
> >> other 10G NICs do too).  There are a few barriers to using DCA on
> >> FreeBSD, not least of which is that FreeBSD doesn't currently have the
> >> infrastructure to support it (no IOATDMA or DCA drivers).
> >
> > Right.
> >
> >> DCA is also problematic because support from system/motherboard
> >> vendors is very spotty.  The vendor must provide the correct tag table
> >> in BIOS such that the tags match the CPU/core numbering in the system.
> >> Many motherboard vendors don't bother with this, and you cannot enable
> >> DCA on a lot of systems, even though the underlying chipset supports
> >> DCA.  I've done hacks to force-enable it in the past, with mixed
> >> results.  The problem is that DCA depends on having the correct tag
> >> table, so that packets can be prefetched into the correct CPU's cache.
> >> If the tag table is incorrect, DCA is a big pessimization, because it
> >> blows the cache in other CPUs.
> >
> > Right.
> >
> >> That said, I would *love* it if FreeBSD grew ioatdma/dca support.
> >> Jack, does Intel have any interest in porting DCA support to FreeBSD?
> >
> > Question for Jack or Drew, what DOES FreeBSD have to do to support
> > DCA?  I thought DCA was something you just enable on the NIC chipset
> > and if the system is IOATDMA aware, it just works.  Is that not right
> > (assuming cache tags are correct and accessible)?  i.e. I thought this
> > was hardware black magic than anything specific the OS has to do.
>
> IOATDMA and DCA are sort of unfairly joined for two reasons: The DCA
> control stuff is implemented as part of the IOATDMA PCIe device, and
> IOATDMA is a great usage model for DCA, since you'd want the DMAs
> that it does to be prefetched.
>
> To use DCA you need:
>
> - A DCA driver to talk to the IOATDMA/DCA pcie device, and obtain the
>   tag table
> - An interface that a client device (eg, NIC driver) can use to obtain
>   either the tag table, or at least the correct tag for the CPU
>   that the interrupt handler is bound to.  The basic support in
>   a NIC driver boils down to something like:
>
> nic_interrupt_handler()
> {
>   if (sc->dca.enabled && (curcpu != sc->dca.last_cpu)) {
>      sc->dca.last_cpu = curcpu;
>      tag = dca_get_tag(curcpu);
>      WRITE_REG(sc, DCA_TAG, tag);
>   }
> }
>
> Drew
> _______________________________________________
> freebsd-performance@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-performance
> To unsubscribe, send any mail to
> "freebsd-performance-unsubscribe@freebsd.org"

From owner-freebsd-performance@FreeBSD.ORG Fri May 14 17:20:23 2010 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BB12C1065670; Fri, 14 May 2010 17:20:23 +0000 (UTC) (envelope-from pisymbol@gmail.com) Received: from mail-yw0-f181.google.com (mail-yw0-f181.google.com [209.85.211.181]) by mx1.freebsd.org (Postfix) with ESMTP id 4E7BB8FC17; Fri, 14 May 2010 17:20:22 +0000 (UTC) Received: by ywh11 with SMTP id 11so1435813ywh.7 for ; Fri, 14 May 2010 10:20:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=NqZFPlSfG/uLdkjsSIhq0IKDA7WAKPdN7ilFOtGm9+w=; b=IDgwxHdm74Kgk2FJLScPB1UoCFE04h33HrVAw/23/vR2ty56DjTwrNJoygQoqS+ifz lGwzrRZ2F9Qb/INVO+5e7O7wzf0w+a0LRYV9j3HcbcbGBoIKXI6PUPhEenws3QFKYZN2 1zHE5eIWoGnOoYo6llPcxbkAVyKpFd2TF8bXc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma;
h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=FduRR33H5djFfHR3E4R4QgkxAbntHDD+cKpq7VeKhBNFikjm66jFGaD5t4II89oXw7 /j/2zkE6yT3ojxsvprUVT/y9n1Ds8Vs5bS3+m9cAW1ItBg5MGZp8F2JgRNUgzGF9Twg3 l5sr9pTxofMtM+7cX3WuqPc5WlDyK1hW/eHd0= MIME-Version: 1.0 Received: by 10.101.181.40 with SMTP id i40mr1719251anp.193.1273857622101; Fri, 14 May 2010 10:20:22 -0700 (PDT) Received: by 10.100.58.2 with HTTP; Fri, 14 May 2010 10:20:21 -0700 (PDT) In-Reply-To: References: <4BE52856.3000601@unsane.co.uk> <1273323582.3304.31.camel@efe> <20100511135103.GA29403@grapeape2.cs.duke.edu> <4BED5929.5020302@cs.duke.edu> Date: Fri, 14 May 2010 13:20:21 -0400 Message-ID: From: Alexander Sack To: Jack Vogel Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Mailman-Approved-At: Fri, 14 May 2010 17:44:30 +0000 Cc: Murat Balaban , freebsd-net@freebsd.org, freebsd-performance@freebsd.org, Andrew Gallatin Subject: Re: Intel 10Gb X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 14 May 2010 17:20:23 -0000

On Fri, May 14, 2010 at 1:01 PM, Jack Vogel wrote:
>
>
> On Fri, May 14, 2010 at 8:18 AM, Alexander Sack wrote:
>>
>> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin
>> wrote:
>> > Alexander Sack wrote:
>> > <...>
>> >>> Using this driver/firmware combo, we can receive minimal packets at
>> >>> line rate (14.8Mpps) to userspace.  You can even access this using a
>> >>> libpcap interface.  The trick is that the fast paths are OS-bypass,
>> >>> and don't suffer from OS overheads, like lock contention.  See
>> >>> http://www.myri.com/scs/SNF/doc/index.html for details.
>> >>
>> >> But your timestamps will be atrocious at 10G speeds.  Myricom doesn't
>> >> timestamp packets AFAIK.  If you want reliable timestamps you need to
>> >> look at companies like Endace, Napatech, etc.
>> >
>> > I see your old help ticket in our system.  Yes, our timestamping
>> > is not as good as a dedicated capture card with a GPS reference,
>> > but it is good enough for most people.
>>
>> I was told btw that it doesn't timestamp at ALL.  I am assuming NOW
>> that is incorrect.
>>
>> Define *most* people.
>>
>> I am not knocking the Myricom card.  In fact I so wish you guys would
>> just add the ability to latch to a 1PPS for timestamping and it would
>> be perfect.
>>
>> We use I think an older version of the card internally for replay.
>> It's a great multi-purpose card.
>>
>> However with IPG at 10G in the nanoseconds, anyone trying to do OWDs
>> or RTT will find it difficult compared to an Endace or Napatech card.
>>
>> Btw, I was referring to bpf(4) specifically, so please don't take my
>> comments as a knock against it.
>>
>> >> PS I am not sure but Intel also supports writing packets directly in
>> >> cache (yet I thought the 82599 driver actually does a prefetch anyway
>> >> which had me confused on why that helps)
>> >
>> > You're talking about DCA.  We support DCA as well (and I suspect some
>> > other 10G NICs do too).  There are a few barriers to using DCA on
>> > FreeBSD, not least of which is that FreeBSD doesn't currently have the
>> > infrastructure to support it (no IOATDMA or DCA drivers).
>>
>> Right.
>>
>> > DCA is also problematic because support from system/motherboard
>> > vendors is very spotty.  The vendor must provide the correct tag table
>> > in BIOS such that the tags match the CPU/core numbering in the system.
>> > Many motherboard vendors don't bother with this, and you cannot enable
>> > DCA on a lot of systems, even though the underlying chipset supports
>> > DCA.  I've done hacks to force-enable it in the past, with mixed
>> > results.  The problem is that DCA depends on having the correct tag
>> > table, so that packets can be prefetched into the correct CPU's cache.
>> > If the tag table is incorrect, DCA is a big pessimization, because it
>> > blows the cache in other CPUs.
>>
>> Right.
>>
>> > That said, I would *love* it if FreeBSD grew ioatdma/dca support.
>> > Jack, does Intel have any interest in porting DCA support to FreeBSD?
>>
>> Question for Jack or Drew, what DOES FreeBSD have to do to support
>> DCA?  I thought DCA was something you just enable on the NIC chipset
>> and if the system is IOATDMA aware, it just works.  Is that not right
>> (assuming cache tags are correct and accessible)?  i.e. I thought this
>> was hardware black magic than anything specific the OS has to do.
>>
>
> OK, let me see if I can clarify some of this. First, there IS an I/OAT
> driver that I did for FreeBSD like 3 or 4 years ago, in the timeframe
> that we put the feature out. However, at that time all it was good for
> was the DMA aspect of things, and Prafulla used it to accelerate the
> stack copies; interest did not seem that great so I put the code aside,
> it's not badly dated and needs to be brought up to date due to there
> being a few different versions of the hardware now.
>
> At one point maybe a year back I started to take the code apart thinking
> I would JUST do DCA, that got back-burnered due to other higher priority
> issues, but it's still an item in my queue.
>
> I also had a nibble of an interest in using the DMA engine so perhaps I
> should not go down the road of just doing the DCA support in the I/OAT
> part of the driver. The question is how to make the infrastructure work.
>
> To answer Alexander's question, DCA support is NOT in the NIC, it's in
> the chipset, that's why the I/OAT driver was done as a separate driver,
> but the NIC was the user of the info, it's been a while since I was into
> the code but if memory serves the I/OAT driver just enables the support
> in the chipset, and then the NIC driver configures its engine to use it.

Thank you very much Jack!  :)  It was not clear to me from the docs
what lived where.  I just assumed this was "Intel NIC knows Intel
chipset" black magic!  LOL.

> DCA and DMA were supported in Linux in the same driver because
> the chipset features were easily handled together perhaps, I'm not
> sure :)

Ok!  (it was my other reference)

> Fabien's data earlier in this thread suggested that a strategically
> placed prefetch did you more good than DCA did if I recall, what
> do you all think of that?

I thought there was a thread where prefetch didn't do much for you... lol.

If you just prefetch willy-nilly, don't you run the risk of packets
landing in the caches of cores other than the one the application
reading them runs on, thereby defeating the whole purpose of the
prefetch?

> As far as I'm concerned right now I am willing to resurrect the driver,
> clean it up and make the features available, we can see how valuable
> they are after that, how does that sound??

Sounds good to me.  I'd at least put it somewhere public for people to look at.
-aps From owner-freebsd-performance@FreeBSD.ORG Sat May 15 13:23:59 2010 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 942D41065673 for ; Sat, 15 May 2010 13:23:58 +0000 (UTC) (envelope-from barney_cordoba@yahoo.com) Received: from web63908.mail.re1.yahoo.com (web63908.mail.re1.yahoo.com [69.147.97.123]) by mx1.freebsd.org (Postfix) with SMTP id 516B78FC2A for ; Sat, 15 May 2010 13:23:58 +0000 (UTC) Received: (qmail 38237 invoked by uid 60001); 15 May 2010 13:23:57 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yahoo.com; s=s1024; t=1273929837; bh=YTqbwfT9mYmWVQTgyP3CKB2UfD756kj+hl140axZ8I0=; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=KdhbsdYDgT5bHCR/Kf7Kyuf2Zy2dWBgC/YczY4KiLdmM1h+jYgUKUFTTuhpjZErBOHG2ZmewRXjbWrTVhBcYq/kXHKi5UZTJUFg04cLd5tOBVEjJqr9IlhWpeWT9Pq8AvtoLNg/kK1x/gr0WiQHbfmkrgrPzw8hS4Hw37IaGcbI= DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=s1024; d=yahoo.com; h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:Cc:In-Reply-To:MIME-Version:Content-Type:Content-Transfer-Encoding; b=Rm94iVUy/4mAI3wrKIqd2aTJkRppd0e5AgmMn8WHbdC2PlU6e7WfVkPsOQZHB45vWURlxxG5KuBQGL2LZ4WCoXM04xqXW0TP2u7ujuEXe2k4KeL/dwkTpd6RyCxzwWTVUS856b/f/bnntvCZ889bhDHAOJD8miexUccbgEolB14=; Message-ID: <620965.38211.qm@web63908.mail.re1.yahoo.com> X-YMail-OSG: E7zg2pIVM1kiA2F3RPEaNz7m.6wonVexcWJEV0UcUCsqda3 ryb9NokrA.1pw_91v8d4bACZUPqYc.kynihgL0A4Pg8bBRtx3LJNpKhArLZS _M5D2T29HvF93Dnm1nmgL7.mDY0OwPHmKImNwu0BRaEdGbRuUZh9tlZqzbf2 DUm1JpAUPFfbYVE2fTYdLdG.0CjZkXBhRLdkQO7w5eIB0u8.tzO9G6ylTT3_ YbDnrATp_K3he9d.B2GRpk_82.URPyZ3NrgleOwF8F9WPXw1riTTlzbzhMCQ .VZhAaBQeHQgfsyM1IjhNSQCFd0dNRUs- Received: from [98.203.21.152] by web63908.mail.re1.yahoo.com via HTTP; Sat, 15 May 2010 06:23:57 PDT X-Mailer: YahooMailClassic/10.1.11 YahooMailWebService/0.8.103.269680 
Date: Sat, 15 May 2010 06:23:57 -0700 (PDT) From: Barney Cordoba To: Jack Vogel , Alexander Sack In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: quoted-printable X-Mailman-Approved-At: Sat, 15 May 2010 14:12:49 +0000 Cc: Murat Balaban , freebsd-net@freebsd.org, freebsd-performance@freebsd.org, Andrew Gallatin Subject: Re: Intel 10Gb X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 May 2010 13:23:59 -0000

--- On Fri, 5/14/10, Alexander Sack wrote:

> From: Alexander Sack
> Subject: Re: Intel 10Gb
> To: "Jack Vogel"
> Cc: "Murat Balaban" , freebsd-net@freebsd.org, freebsd-performance@freebsd.org, "Andrew Gallatin"
> Date: Friday, May 14, 2010, 1:20 PM
> <...>

Of course none of this has anything to do with the original subject.
Processing a monodirectional stream is really no problem, nor does
it require any sort of special design consideration. All of this chatter
about card features is largely minutiae.

Modern processors are so fast that it's a waste of brain cells to spend
time trying to squeeze nanoseconds from packet gathering. You guys sound
the same as when you were trying to do 10 Mb/s Ethernet with ISA bus NICs.

It makes no sense to focus on optimizing tires for a car which can't break
80 mph. The entire problem is lock contention. Until you have a driver
that can scale to a point where 10 Gb/s is workable without significant
lock contention, you're just feeding a dead body.

Unless of course your goal for 10 Gb/s for FreeBSD is for it to be a really
good network monitor.

BC

From owner-freebsd-performance@FreeBSD.ORG Sat May 15 21:49:34 2010 Return-Path: Delivered-To: freebsd-performance@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 9666B1065673; Sat, 15 May 2010 21:49:34 +0000 (UTC) (envelope-from pisymbol@gmail.com) Received: from mail-gw0-f54.google.com (mail-gw0-f54.google.com [74.125.83.54]) by mx1.freebsd.org (Postfix) with ESMTP id 3A49D8FC14; Sat, 15 May 2010 21:49:33 +0000 (UTC) Received: by gwb15 with SMTP id 15so94344gwb.13 for ; Sat, 15 May 2010 14:49:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:received:in-reply-to :references:date:message-id:subject:from:to:cc:content-type :content-transfer-encoding; bh=yt36Eh76OCLtD9PEvOo1mTycfE1DLFZStVxG0/5SHnM=; b=S3W2MpT4fxRnq8sXpSvMDjLG70iVr9JZ1LAIYQaUIh6MUGtMdhjmu+MCoPgqxSnmqG LSazrIhA5QKYhVx/wIHblrkKqM4IgqE61+a56/fott1EvC9VNBiuS3F5KPR98MKdWUqc L9TMriksCWGZz5bvKCPi+f5ALCaSuF3Fbw53M= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; b=eUu955Vh2EPpQSxV3f8C566U+9Wncvkw80R9hkqxUznvDI8uQVL1wQl+Eb3KWL6goX /sQgqKQzRgHbaWWK83GqNflfGZznNprnmmjDXWXLouEbe3/241Ft7yLyASKfOc2Dy5ug UVjcSaStFSQGREh75aKeTfS099NPjoCmbuebI= MIME-Version: 1.0 Received: by 10.100.246.35 with SMTP id t35mr3960057anh.14.1273960173419; Sat, 15 May 2010 14:49:33 -0700 (PDT) Received: by 10.100.58.2 with HTTP; Sat, 15 May 2010 14:49:33 -0700 (PDT) In-Reply-To: <620965.38211.qm@web63908.mail.re1.yahoo.com> References: <620965.38211.qm@web63908.mail.re1.yahoo.com> Date: Sat, 15 May 2010 17:49:33 -0400 Message-ID: From: Alexander Sack To: Barney Cordoba Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Mailman-Approved-At: Sat, 15 May 2010 23:00:50 +0000 Cc: Murat Balaban
, freebsd-net@freebsd.org, freebsd-performance@freebsd.org, Jack Vogel , Andrew Gallatin Subject: Re: Intel 10Gb X-BeenThere: freebsd-performance@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Performance/tuning List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 15 May 2010 21:49:34 -0000 On Sat, May 15, 2010 at 9:23 AM, Barney Cordoba wrote: > > > --- On Fri, 5/14/10, Alexander Sack wrote: > >> From: Alexander Sack >> Subject: Re: Intel 10Gb >> To: "Jack Vogel" >> Cc: "Murat Balaban" , freebsd-net@freebsd.org, free= bsd-performance@freebsd.org, "Andrew Gallatin" >> Date: Friday, May 14, 2010, 1:20 PM >> On Fri, May 14, 2010 at 1:01 PM, Jack >> Vogel >> wrote: >> > >> > >> > On Fri, May 14, 2010 at 8:18 AM, Alexander Sack >> wrote: >> >> >> >> On Fri, May 14, 2010 at 10:07 AM, Andrew Gallatin >> >> >> wrote: >> >> > Alexander Sack wrote: >> >> > <...> >> >> >>> Using this driver/firmware combo, we >> can receive minimal packets at >> >> >>> line rate (14.8Mpps) to userspace. >> =A0You can even access this using a >> >> >>> libpcap interface. =A0The trick is >> that the fast paths are OS-bypass, >> >> >>> and don't suffer from OS overheads, >> like lock contention. =A0See >> >> >>> http://www.myri.com/scs/SNF/doc/index.html for >> details. >> >> >> >> >> >> But your timestamps will be atrocious at >> 10G speeds. =A0Myricom doesn't >> >> >> timestamp packets AFAIK. =A0If you want >> reliable timestamps you need to >> >> >> look at companies like Endace, Napatech, >> etc. >> >> > >> >> > I see your old help ticket in our system. >> =A0Yes, our timestamping >> >> > is not as good as a dedicated capture card >> with a GPS reference, >> >> > but it is good enough for most people. >> >> >> >> I was told btw that it doesn't timestamp at ALL. >> =A0I am assuming NOW >> >> that is incorrect. >> >> >> >> Define *most* people. >> >> >> >> I am not knocking the Myricom card. 
=A0In fact I so >> wish you guys would >> >> just add the ability to latch to a 1PPS for >> timestamping and it would >> >> be perfect. >> >> >> >> We use I think an older version of the card >> internally for replay. >> >> Its a great multi-purpose card. >> >> >> >> However with IPG at 10G in the nanoseconds, anyone >> trying to do OWDs >> >> or RTT will find it difficult compared to an >> Endace or Napatech card. >> >> >> >> Btw, I was referring to bpf(4) specifically, so >> please don't take my >> >> comments as a knock against it. >> >> >> >> >> PS I am not sure but Intel also supports >> writing packets directly in >> >> >> cache (yet I thought the 82599 driver >> actually does a prefetch anyway >> >> >> which had me confused on why that helps) >> >> > >> >> > You're talking about DCA. =A0We support DCA as >> well (and I suspect some >> >> > other 10G NICs do to). =A0There are a few >> barriers to using DCA on >> >> > FreeBSD, not least of which is that FreeBSD >> doesn't currently have the >> >> > infrastructure to support it (no IOATDMA or >> DCA drivers). >> >> >> >> Right. >> >> >> >> > DCA is also problematic because support from >> system/motherboard >> >> > vendors is very spotty. =A0The vendor must >> provide the correct tag table >> >> > in BIOS such that the tags match the CPU/core >> numbering in the system. >> >> > Many motherboard vendors don't bother with >> this, and you cannot enable >> >> > DCA on a lot of systems, even though the >> underlying chipset supports >> >> > DCA. =A0I've done hacks to force-enable it in >> the past, with mixed >> >> > results. The problem is that DCA depends on >> having the correct tag >> >> > table, so that packets can be prefetched into >> the correct CPU's cache. >> >> > If the tag table is incorrect, DCA is a big >> pessimization, because it >> >> > blows the cache in other CPUs. >> >> >> >> Right. >> >> >> >> > That said, I would *love* it if FreeBSD grew >> ioatdma/dca support. 
>> >> > Jack, does Intel have any interest in porting >> DCA support to FreeBSD? >> >> >> >> Question for Jack or Drew, what DOES FreeBSD have >> to do to support >> >> DCA? =A0I thought DCA was something you just enable >> on the NIC chipset >> >> and if the system is IOATDMA aware, it just works. >> =A0Is that not right >> >> (assuming cache tags are correct and accessible)? >> =A0i.e. I thought this >> >> was hardware black magic than anything specific >> the OS has to do. >> >> >> > >> > OK, let me see if I can clarify some of this. First, >> there IS an I/OAT >> > driver >> > that I did for FreeBSD like 3 or 4 years ago, in the >> timeframe that we put >> > the feature out. However, at that time all it was good >> for was the DMA >> > aspect >> > of things, and Prafulla used it to accelerate the >> stack copies; interest did >> > not seem that great so I put the code aside, its not >> badly dated and needs >> > to be brought up to date due to there being a few >> different versions of the >> > hardware now. >> > >> > At one point maybe a year back I started to take the >> code apart thinking >> > I would JUST do DCA, that got back-burnered due to >> other higher priority >> > issues, but its still an item in my queue. >> > >> > I also had a nibble of an interest in using the DMA >> engine so perhaps I >> > should not go down the road of just doing the DCA >> support in the I/OAT >> > part of the driver. The question is how to make the >> infrastructure work. >> > >> > To answer Alexander's question, DCA support is NOT in >> the NIC, its in >> > the chipset, that's why the I/OAT driver was done as a >> seperate driver, >> > but the NIC was the user of the info, its been a while >> since I was into >> > the code but if memory serves the I/OAT driver just >> enables the support >> > in the chipset, and then the NIC driver configures its >> engine to use it. 
>>
>> Thank you very much Jack!  :)  It was not clear from the docs what was
>> where to me.  I just assumed this was Intel NIC knew Intel chipset
>> black magic!  LOL.
>>
>> > DCA and DMA were supported in Linux in the same driver because
>> > the chipset features were easily handled together perhaps, I'm not
>> > sure :)
>>
>> Ok!  (it was my other reference)
>>
>> > Fabien's data earlier in this thread suggested that a strategically
>> > placed prefetch did you more good than DCA did if I recall, what
>> > do you all think of that?
>>
>> I thought there was a thread where prefetch didn't do much for you....lol...
>>
>> If you just prefetch willy-nilly then don't you run the risk of
>> packets hitting caches on cores outside of what the application
>> reading them is on thereby defeating the whole purpose of prefetch?
>>
>> > As far as I'm concerned right now I am willing to resurrect the driver,
>> > clean it up and make the features available, we can see how valuable
>> > they are after that, how does that sound??
>>
>> Sounds good to me.  I at least put it somewhere
>> publicly for people to look at.
>>
>> -aps
>
> Of course none of this has anything to do with the original subject.
> Processing a monodirectional stream is really no problem, nor does
> it require any sort of special design consideration. All of this chatter
> about card features is largely minutia.
>
> Modern processors are so fast that it's a waste of brain cells to spend
> time trying to squeeze nanoseconds from packet gathering. You guys sound
> the same as when you were trying to do 10Mb/s ethernet with ISA bus NICs.

It depends on what you really mean and what lock contention you are
specifically talking about.
The NIC features as well as multi-queue bpf(4) are a way to distribute
the load across multiple cores, thereby lowering total CPU overhead
(that's always good) AS WELL AS providing the ability for libpcap
consumers to post-process caught packets in cache.  Most third-party
capture cards already do just this: they are typically stream or feed
based and allow for flow-based steering to distribute the load across
cores.  Intel has only recently added this in their 10G chipsets (Jack
can correct me if I'm wrong).  All of these things help both in capture
and post-processing.

> It makes no sense to focus on optimizing tires for a car which can't break
> 80Mph. The entire problem is lock contention. Until you have a driver
> that can scale to a point where 10gb/s is workable without significant
> lock contention, you're just feeding a dead body.

Lock contention in bpf(4) or in the NIC driver or in both? :)

> Unless of course your goal for 10gb/s for FreeBSD is for it to be a really
> good network monitor.

That is exactly my goal: it would be great to see FreeBSD as a fantastic
general-purpose network monitor at 10gb/s speeds.  There are a couple of
issues, one of which is timestamping.

-aps
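
PS For anyone who wants a picture of what flow-based steering means in
practice, here is a rough sketch in C.  It is not taken from any driver
or capture card: the struct, the field names and the function names are
all made up for illustration, and FNV-1a is swapped in as the hash only
to keep the example short (hardware RSS commonly uses a Toeplitz hash
instead).  The idea is simply that both directions of a connection hash
to the same value, so one core sees the whole flow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 5-tuple flow key; the layout is illustrative only. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
};

/* Fold one 32-bit word into a 32-bit FNV-1a hash, byte by byte. */
static uint32_t fnv1a_word(uint32_t h, uint32_t w)
{
    for (int i = 0; i < 4; i++) {
        h ^= (w >> (8 * i)) & 0xffu;
        h *= 16777619u;             /* FNV-1a 32-bit prime */
    }
    return h;
}

/*
 * Symmetric flow hash: addresses and ports are ordered before hashing,
 * so A->B and B->A produce the same value.
 */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t ip_lo = k->src_ip < k->dst_ip ? k->src_ip : k->dst_ip;
    uint32_t ip_hi = k->src_ip < k->dst_ip ? k->dst_ip : k->src_ip;
    uint16_t pt_lo = k->src_port < k->dst_port ? k->src_port : k->dst_port;
    uint16_t pt_hi = k->src_port < k->dst_port ? k->dst_port : k->src_port;
    uint32_t h = 2166136261u;       /* FNV-1a 32-bit offset basis */

    h = fnv1a_word(h, ip_lo);
    h = fnv1a_word(h, ip_hi);
    h = fnv1a_word(h, ((uint32_t)pt_lo << 16) | pt_hi);
    h = fnv1a_word(h, k->proto);
    return h;
}

/* Pick the receive queue (and hence the core) that handles this flow. */
static unsigned flow_to_queue(const struct flow_key *k, unsigned nqueues)
{
    assert(nqueues > 0);
    return flow_hash(k) % nqueues;
}
```

With something like this, a capture consumer pinned to the core behind
queue N only ever touches packets for the flows that hash to N, which
is what lets the post-processing stay in that core's cache.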