From owner-freebsd-questions@FreeBSD.ORG Wed May 30 14:27:29 2007
From: Scott Willson
Date: Wed, 30 May 2007 07:10:03 -0700
To: freebsd-questions@freebsd.org
Cc: Kris Kennaway
Subject: Re: Panic With Large Network Copy
Message-Id: <0DCCD2FD-813E-41B5-B56A-0605E7BDC2E8@butlerpress.com>
In-Reply-To: <20070529232621.GB1575@rot13.obsecurity.org>
References: <20070529232621.GB1575@rot13.obsecurity.org>

On May 29, 2007, at 4:26 PM, Kris Kennaway wrote:

> On Tue, May 29, 2007 at 03:36:49PM -0700, Scott Willson wrote:
>> I am seeing hard (often no core dump) crashes on a new AMD64 box
>> running 6.2 RELEASE. When I try to rsync 10+ GB of backup files to
>> the new box, I can reliably crash it after about 20 minutes; often
>> quicker if I do something else intensive at the same time, like
>> compile MySQL.
>> Here are the box specs:
>> ASUS M2NPV-VM motherboard
>> AMD A64 3800+ 2.4G CPU
>> 2 x 1 GB SuperTalent DDR2 667 RAM
>> 2 x 500G Samsung SATA2 drives
>> MATSHITADVD-ROM SR-8585 DVD drive (ancient)
>>
>> Most times, I don't even get a core dump. Here's one I did get:
>> panic: double fault
>> Uptime: 20m26s
>> Dumping 2014 MB (2 chunks)
>> chunk 0: 1MB (159 pages) ... ok
>> chunk 1: 2014MB (515552 pages) 1998 1982 1966 1950 1934 1918 1902
>> 1886 1870 1854 1838 1822 1806 1790 1774 1758 1742 1726 1710 1694 1678
>> 1662 1646 1630 1614 1598 1582 1566 1550 1534 1518 1502 1486 1470 1454
>> 1438 1422 1406 1390 1374 1358 1342 1326 1310 1294 1278 1262 1246 1230
>> 1214 1198 1182 1166 1150 1134 1118 1102 1086 1070 1054 1038 1022 1006
>> 990 974 958 942 926 910 894 878 862 846 830 814 798 782 766 750 734
>> 718 702 686 670 654 638 622 606 590 574 558 542 526 510 494 478 462
>> 446 430 414 398 382 366 350 334 318 302 286 270 254 238 222 206 190
>> 174 158 142 126 110 94 78 62 46 30 14
>>
>> #0 doadump () at pcpu.h:172
>> 172 __asm __volatile("movq %%gs:0,%0" : "=r" (td));
>> (kgdb) backtrace
>> #0 doadump () at pcpu.h:172
>> #1 0x0000000000000004 in ?? ()
>> #2 0xffffffff803f6093 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
>> #3 0xffffffff803f6696 in panic (fmt=0xffffff0079a08be0 "X??y") at /usr/src/sys/kern/kern_shutdown.c:565
>> #4 0xffffffff80610e70 in dblfault_handler () at /usr/src/sys/amd64/amd64/trap.c:680
>> #5 0xffffffff805fe2f2 in Xdblfault () at /usr/src/sys/amd64/amd64/exception.S:192
>> #6 0xffffffff80439844 in m_tag_delete_chain (m=0x0, t=0x0) at /usr/src/sys/kern/uipc_mbuf2.c:346
>> #7 0xffffffff803eac0d in mb_dtor_mbuf (mem=0x0, size=0, arg=0x0) at /usr/src/sys/kern/kern_mbuf.c:338
>> #8 0xffffffff80592a24 in uma_zfree_arg (zone=0x0, item=0x0, udata=0x0) at /usr/src/sys/vm/uma_core.c:2270
>> #9 0xffffffff804371f0 in m_freem (mb=0x0) at uma.h:303
>> #10 0xffffffff80634125 in nve_ospackettx (ctx=0xffffff00798aac00, id=0xffffffffb19ea6d0, success=0) at /usr/src/sys/dev/nve/if_nve.c:1551
>
> This looks like a nve driver bug to me. You may wish to try the
> nfe driver.
>
> Kris

Thanks for the suggestion, Kris. I compiled a new kernel without nve,
compiled nfe-20070512.tar.gz with the e1000phy.patch, and enabled
device polling:

e1000phy0: on miibus0
e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
nfe0: Ethernet address: 00:1a:92:cb:b2:eb
nfe0: [FAST]

No more panics, but I see a lot of error messages under load:

May 29 20:25:17 brooklyn kernel: nfe0: tx v2 error 0x6204
May 29 20:28:15 brooklyn kernel: nfe0: watchdog timeout (missed Tx interrupts) -- recovering

The only odd thing about my current setup is that the server is
sharing an old hub with other old hardware, and it looks like I've
only negotiated 10baseT:

ifconfig nfe0
nfe0: flags=8843 mtu 1500
        options=8
        inet 192.168.1.154 netmask 0xffffff00 broadcast 192.168.1.255
        ether 00:1a:92:cb:b2:eb
        media: Ethernet autoselect (10baseT/UTP )
        status: active

For now, I've installed an old spare Ethernet card, and I see no
errors with it, so I'm going to roll with that.
I'm also going to follow up with the nfe driver's maintainer in case
he's interested.

Scott
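
To compare cards or driver builds under the same rsync load, the two
kinds of nfe0 messages quoted above can be tallied from the system log.
What follows is only a rough Python sketch: the function name
summarize and both patterns are my own, modeled on nothing more than
the two log lines shown above, and other drivers or driver versions may
word their messages differently.

```python
import re
from collections import Counter

# Patterns modeled on the two kernel log lines quoted in this message.
# They are assumptions about the message format, not taken from the driver source.
TX_ERROR = re.compile(r"kernel: (\w+): tx v2 error (0x[0-9a-fA-F]+)")
WATCHDOG = re.compile(r"kernel: (\w+): watchdog timeout")


def summarize(lines):
    """Count tx errors (per error code) and watchdog timeouts per interface."""
    counts = Counter()
    for line in lines:
        match = TX_ERROR.search(line)
        if match:
            counts[(match.group(1), "tx error " + match.group(2))] += 1
            continue
        match = WATCHDOG.search(line)
        if match:
            counts[(match.group(1), "watchdog timeout")] += 1
    return counts
```

Assuming kernel messages go to /var/log/messages, passing the open
file to summarize() gives one count per interface and message type,
which makes it easy to see whether a change of card, hub, or driver
build actually reduces the errors.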