From: Adrian Chadd
Date: Sat, 2 May 2015 02:03:52 -0700
Subject: Re: iwn crashes in current (r282269)
To: Poul-Henning Kamp, "freebsd-wireless@freebsd.org"
Cc: "current@freebsd.org"
In-Reply-To: <1494.1430550164@critter.freebsd.dk>
Hi,

On 2 May 2015 at 00:02, Poul-Henning Kamp wrote:
> May 2 01:01:34 critter kernel: iwn0: device timeout
> May 2 01:01:34 critter kernel: firmware: 'iwn6000g2afw' version 0: 677296 bytes loaded at 0xffffffff81f880c0
> May 2 01:01:34 critter kernel: iwn0: iwn_read_firmware: ucode rev=0x12a80601
> May 2 01:01:40 critter kernel: iwn0: iwn_tx_data: m=0xfffff80236fe8500: seqno (9550) (78) != ring index (0) !
> May 2 01:01:40 critter kernel: iwn0: iwn_intr: fatal firmware error
> May 2 01:01:40 critter kernel: iwn0: iwn_panicked: controller panicked, iv_state = 5; resetting...
> May 2 01:01:40 critter kernel: firmware: 'iwn6000g2afw' version 0: 677296 bytes loaded at 0xffffffff81f880c0
> May 2 01:01:40 critter kernel: iwn0: iwn_read_firmware: ucode rev=0x12a80601
>
> And then the machine hung.
>
> No further details, as the screen-blanker was on.

So there's something odd with iwn and sequence number allocations. What's supposed to happen here is that:

* net80211 handles sequence number allocation;
* then A-MPDU is negotiated;
* then the driver handles sequence number allocations.

The firmware requires that for 11n transmit, each frame goes into a ring slot that's seqno % 256. It's not an arbitrary slot. It'll panic otherwise, like you saw above.

Now, something's upsetting it. It may be a noisy environment leading to BAR frame transmissions and eventual tear-down of the A-MPDU state, leading to net80211 taking over sequence number allocation again. I fixed a whole bunch of those races in the ath(4) driver when I implemented 11n and found there's no locking at all going on there. :(

It could also be something inside net80211 that's advancing the sequence number space, even though A-MPDU is enabled.
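For reference, the ring-slot rule mentioned above (slot = seqno % 256) can be checked against the numbers in the quoted log. This is only a minimal sketch in C: the helper names expected_slot() and slot_matches() are made up for illustration and are not functions from iwn(4) or net80211.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* From the text: an 11n frame must land in ring slot seqno % 256. */
#define RING_SLOTS 256u

/* Hypothetical helper: the slot the firmware expects for this seqno. */
static inline uint32_t
expected_slot(uint32_t seqno)
{
    return (seqno % RING_SLOTS);
}

/* Hypothetical helper: does the ring's current index match that slot? */
static inline bool
slot_matches(uint32_t seqno, uint32_t ring_cur)
{
    return (expected_slot(seqno) == ring_cur);
}
```

With the values from the log line, expected_slot(9550) is 78 while the ring index was 0, so slot_matches(9550, 0) is false. That matches the "seqno (9550) (78) != ring index (0)" message printed just before the firmware error.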
There's only a couple of places where ni_txseqs is updated in net80211. If it were getting updated there, it should be obvious. But it does do a check to see if AMPDU is enabled and running, and none of that is consistently locked.

iwn_addba_response() sets ni_txseqs for the tid to be whatever was negotiated during the aggregation negotiation (ADDBA) and then sets the initial ring slot id to be whatever the starting sequence number is ('ssn' in *_ampdu_tx_start()). iwn_tx_data() does do sequence number allocation there.

It's possible we're seeing races where aggregation is being torn down during active transmit and the state is all mucked up.

I recall seeing issues in ath(4) where there were some packets queued between sending out the initial aggregation negotiation and it being negotiated, which meant some packets would go out with sequence numbers /after/ what was initially negotiated during ADDBA. I.e.:

* you're at seq X, and you negotiate ADDBA at seq X;
* you queue a bunch of transmit frames, seq X -> X + n;
* peer says "ADDBA acceptable, starting seq X";
* the next frame you transmit comes from seq X + n + 1, but the other peer is confused.

Here it may show up as:

* you negotiate seq X via addba;
* you queue a bunch more frames via the normal transmit path;
* you get the addba response, set initial ssn to X;
* the 'cur' pointer here in the ring is now X % 256, but the next frame you transmit is (X + n) % 256, and stuff is out of alignment.

So, would someone please help see if that's the case? That'd be really helpful. :)

-adrian
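
P.S. The second scenario above can be written down as a toy model. This is a sketch of the hypothesis only, not driver code: struct tx_state, queue_frames(), addba_response() and slot_skew() are hypothetical names, and the model assumes each frame queued between the ADDBA request and response consumes exactly one sequence number.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SLOTS 256u  /* slot rule from the text: seqno % 256 */

/* Hypothetical, simplified per-TID transmit state. */
struct tx_state {
    uint32_t next_seq;  /* next sequence number to be handed out */
    uint32_t ring_cur;  /* ring slot the next frame will be placed in */
};

/* Frames sent via the normal path while ADDBA is still pending. */
static void
queue_frames(struct tx_state *st, uint32_t n)
{
    st->next_seq += n;
}

/* ADDBA response arrives: ring pointer is set from the negotiated ssn. */
static void
addba_response(struct tx_state *st, uint32_t ssn)
{
    st->ring_cur = ssn % RING_SLOTS;
}

/*
 * Distance (in slots) between where the next frame needs to go and
 * where the ring pointer actually is. Zero means they are aligned.
 */
static uint32_t
slot_skew(const struct tx_state *st)
{
    return ((st->next_seq % RING_SLOTS) + RING_SLOTS - st->ring_cur) %
        RING_SLOTS;
}
```

Starting at X = 100, queueing n = 5 frames and then handling the response with ssn = 100 leaves ring_cur at 100 while the next frame needs slot 105, so slot_skew() returns 5. If nothing is queued in between (n = 0), the skew is 0 and the two stay aligned.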