From: Yonghyeon PYUN
Date: Thu, 9 Jan 2014 10:12:35 +0900
To: Curtis Villamizar
Cc: freebsd-stable@freebsd.org
Reply-To: pyunyh@gmail.com
Subject: Re: regression: msk0 watchdog timeout and interrupt storm
Message-ID: <20140109011235.GA2813@michelle.cdnetworks.com>
In-Reply-To: <201401080311.s083BWf9038444@maildrop2.v6ds.occnc.com>
References: <20140107084938.GA1361@michelle.cdnetworks.com> <201401080311.s083BWf9038444@maildrop2.v6ds.occnc.com>
List-Id: Production branch of FreeBSD source code

On Tue, Jan 07, 2014 at 10:11:32PM -0500, Curtis Villamizar wrote:
>
> In message <20140107084938.GA1361@michelle.cdnetworks.com>
> Yonghyeon PYUN writes:
>
> > On Mon, Jan 06, 2014 at 10:20:40AM -0500, Curtis Villamizar wrote:
> >
> > [...]
> >
> > > Here are some relevant parts of dmesg. Is there anything else you want?
> > >
> > > real memory = 2147483648 (2048 MB)
> > > avail memory = 2061438976 (1965 MB)
> > > Event timer "LAPIC" quality 400
> > > ACPI APIC Table:
> > > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> > > FreeBSD/SMP: 1 package(s) x 2 core(s)
> > > cpu0 (BSP): APIC ID: 0
> > > cpu1 (AP): APIC ID: 1
> > >
> > > pcib2: irq 19 at device 7.0 on pci0
> > > pci2: on pcib2
> > > on pci1
> > > pcib2: irq 19 at device 7.0 on pci0
> > > pci2: on pcib2
> > > mskc0: port 0xe800-0xe8ff mem
> > > 0xfebfc000-0xfebfffff irq 19 at device 0.0 on pci2
> > > msk0:
> > > on mskc0
> > > msk0: Ethernet address: c8:9c:dc:56:38:ef
> > > miibus0: on msk0
> > > e1000phy0: PHY 0 on miibus0
> > > e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX,
> > > 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master,
> > > auto, auto-flow
> > >
> >
> > Thank you for the info.
> >
> > > The computer is a Lenovo ThinkCentre (small tower) and not an uncommon
> > > machine so others are likely to run into this.
> > >
> > > > > Please let me know what I could do to help debug this.
> > > > >
> > > >
> > > > If you have more than 4GB memory, try reducing the amount of
> > > > memory (e.g. 3G) in /boot/loader.conf and let me know whether that
> > > > makes any difference for you.
> > > > Note, in order to test this you have to back out your local
> > > > changes.
> > >
> > > Only have 2 GB memory.
> > >
> >
> > Ok, that means my wild guess was not right. :-(
> >
> >
> > [...]
> >
> > > > I'm under the impression that the controller may have additional
> > > > DMA addressing limitation where TX/RX and status LEs should have
> > > > the same high DMA address. Due to the lack of documentation I'm
> > > > not sure about that. If the issue does not happen with 3GB memory,
> > > > we have to use 32bit DMA addressing.
> > >
> > > We have 2 GB memory so the problem with the original code does happen
> > > with less than 4 GB memory. Everything has the same high address of
> > > zero.
> > >
> >
> > Right.
> >
> > > Is there anything else you want me to try?
> >
> > msk(4) uses 4KB alignment for status/TX/RX rings. Your local change
> > will reduce the number of status LEs to be 1024. Stock msk(4) will
> > use 2048 entries for status LEs and you said the cons variable is
> > stuck with 1024 in this case. I have no idea how this can happen at
> > this moment.
> > Did msk(4) ever work on your box? If the answer is yes, would you
> > back out both r258780 and your local change?
>
> This host worked for a few years under FreeBSD 8.x and FreeBSD 9.x,
> most recently 9.2. I have other machines running stable_10 at about
> the 10.0.beta3 vintage. I had mostly good luck building the ports I
> use (except openoffice never seems to build).
>
> I transferred a bunch of small stuff over after upgrading to
> 10.0.beta3 on this machine but then with the big move of a tar backup
> through the GbE, it locked up consistently.
>
> I tried my patch and symptom gone.
>
> > I have a small local diff which was made after seeing r258780. But
> > I'm not sure whether it makes any difference.
>
> So it seems what you want me to do is:
>
> 1. verify whether just backing out r258780 on if_mskreg.h fixes this.
>
> 2. if so, then put back r258780 and try your patch below and see if
> that fixes it.
>

Correct.

> I think I can find some time to do this maybe immediately or at least
> very soon. After doing that I will report back. Please stand by.
>

Thank you.

> > > Curtis
> > >
> > > btw - I added someone from Marvell on the Bcc in case he wants to join
> > > in on the conversation or give us a hint in private email.
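
For readers following the DMA discussion in this thread, here is a minimal C sketch of the two properties mentioned: 4KB alignment of the status/TX/RX rings, and the suspected (undocumented, unconfirmed) requirement that all rings share the same high 32 bits of their DMA address. The helper names and the bus addresses are hypothetical and chosen only for illustration; none of this is taken from the msk(4) source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: upper 32 bits of a 64-bit bus address. */
static inline uint32_t dma_hi32(uint64_t busaddr)
{
    return (uint32_t)(busaddr >> 32);
}

/* Hypothetical helper: true if the address sits on a 4KB boundary. */
static inline bool is_4k_aligned(uint64_t busaddr)
{
    return (busaddr & UINT64_C(0xfff)) == 0;
}

/* True if every ring base has the same high 32 address bits as the first. */
static inline bool rings_share_hi32(const uint64_t *bases, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (dma_hi32(bases[i]) != dma_hi32(bases[0]))
            return false;
    }
    return true;
}

/* Self-check using made-up addresses (assumptions, not real hardware data). */
static inline void ring_selfcheck(void)
{
    /* All below 2 GB: the high word is zero for every ring. */
    const uint64_t below_2g[3] = {
        UINT64_C(0x7f000000), UINT64_C(0x7f004000), UINT64_C(0x7f008000)
    };
    /* One ring just under 4 GB, two just above: the high words differ. */
    const uint64_t straddle[3] = {
        UINT64_C(0xfffff000), UINT64_C(0x100000000), UINT64_C(0x100004000)
    };

    for (size_t i = 0; i < 3; i++) {
        assert(is_4k_aligned(below_2g[i]));
        assert(is_4k_aligned(straddle[i]));
    }
    assert(rings_share_hi32(below_2g, 3));
    assert(!rings_share_hi32(straddle, 3));
}
```

With 2 GB of memory every ring address falls into the first case, which matches the observation in the thread that the high address is zero everywhere, so the suspected high-address limitation cannot explain the hang on this machine.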