From: Adrian Chadd
To: "freebsd-arch@freebsd.org", FreeBSD Net
Date: Mon, 14 Jul 2014 17:44:15 -0700
Subject: UDP/TCP versus IP frames - subtle out of order packets with hardware hashing

Hi,

Whilst digging into UDP receive side scaling on the Intel ixgbe(4) NIC, I stumbled across how it hashes fragmented IP traffic differently from non-fragmented IP traffic.

Here's how it surfaced:

* the ixgbe(4) NIC is configured to hash on both IP (2-tuple) and TCP/UDP (4-tuple);
* when a non-fragmented UDP frame comes in, it's hashed on the 4-tuple and comes into queue A;
* when a fragmented UDP frame comes in, it's hashed on the IP 2-tuple and comes into queue B.

So if there's a mix of small and large datagrams, we'll end up with some packets coming in via queue A and some via queue B. In normal operation that'll result in out of order packets.

For the RSS stuff I'm working on it means that some packets will match the PCBGROUP setup and some won't. By default UDP configures a 2-tuple hash so it expects packets to come in hashed appropriately. But that only matches for large (fragmented) frames. For small frames it'll be hashed via the 4-tuple and it won't match.

The IP reassembly code doesn't recalculate the flowid/flowtype once it's finished. It'd be nice to do that before further processing so it can be placed in the right netisr.

So there are a couple of semi-overlapping issues:

* Right now we could get TCP and UDP frames out of order. I'd like to at least have ixgbe(4) hash on the 2-tuple for UDP rather than the 4-tuple. That fixes that silly corner case. It's not likely going to show up except for things like forwarding workloads. Maybe people doing memcached work, I'm not sure.
* Whether or not to calculate the flowid/flowtype in ip_reass() (or maybe in the netisr input path, in case there's no flowid assigned) so work is better distributed;
* .. then if we do that, we could do 4-tuple UDP hashing again and we'd just recalculate for any large frames.
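
To make the queue split concrete, here is a toy Python model of the behaviour described above. It is only a sketch, not driver code: CRC32 stands in for the NIC's real RSS hash, and NUM_QUEUES, the packet dict layout and the function names are all made up for illustration.

```python
import ipaddress
import zlib

NUM_QUEUES = 8  # hypothetical number of RX queues


def _hash(*fields: bytes) -> int:
    # Stand-in hash; real hardware uses a keyed RSS hash, not CRC32.
    return zlib.crc32(b"".join(fields))


def hash_2tuple(pkt: dict) -> int:
    # Hash on source and destination IP address only.
    return _hash(ipaddress.ip_address(pkt["src"]).packed,
                 ipaddress.ip_address(pkt["dst"]).packed)


def hash_4tuple(pkt: dict) -> int:
    # Hash on addresses plus source and destination ports.
    return _hash(ipaddress.ip_address(pkt["src"]).packed,
                 ipaddress.ip_address(pkt["dst"]).packed,
                 pkt["sport"].to_bytes(2, "big"),
                 pkt["dport"].to_bytes(2, "big"))


def nic_rx_queue(pkt: dict, udp_4tuple: bool = True) -> int:
    # Model of the NIC: fragments are always hashed on the IP 2-tuple;
    # non-fragmented UDP uses the 4-tuple only if udp_4tuple is enabled.
    if pkt["fragment"] or not udp_4tuple:
        return hash_2tuple(pkt) % NUM_QUEUES
    return hash_4tuple(pkt) % NUM_QUEUES


def rehash_after_reassembly(pkt: dict) -> int:
    # Model of recalculating the hash in software once the datagram has
    # been reassembled and its ports are known.
    return hash_4tuple(pkt) % NUM_QUEUES


# Same UDP flow, one small datagram and one fragment of a large datagram.
small = {"src": "10.0.0.1", "dst": "10.0.0.2",
         "sport": 40000, "dport": 11211, "fragment": False}
frag = dict(small, fragment=True)

print("4-tuple UDP hashing:", nic_rx_queue(small), nic_rx_queue(frag))
print("2-tuple UDP hashing:", nic_rx_queue(small, udp_4tuple=False),
      nic_rx_queue(frag, udp_4tuple=False))
print("rehash after reassembly:", rehash_after_reassembly(frag))
```

With 2-tuple hashing for UDP, both packets are hashed over the same fields and so always land on the same queue. With 4-tuple hashing they are hashed over different fields and can land on different queues, unless the reassembled datagram is rehashed in software, in which case it ends up where the small datagrams of that flow already go.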

Here's what happened with Linux and ixgbe in 2010 on this topic:

http://comments.gmane.org/gmane.linux.network/166687

What do people think?

-a