From owner-freebsd-questions  Mon Apr  2 13:53:37 2001
From: "Matthew H. North" <ctsmhn@cts.com>
To: <freebsd-questions@freebsd.org>
Subject: Reproducible kernel panics, 4.2-STABLE, various hardware
Date: Mon, 2 Apr 2001 13:57:26 -0700


Hello,

After spending hours troubleshooting this and searching the bug archive,
Google, etc. for similar problems and solutions, I'm running out of ideas.

I am responsible for maintaining a squid cache system for my company.
Currently, the system is: PIII/800MHz, 768MB RAM (1GB cache), FreeBSD
4.2-STABLE, Squid-2.3STABLE4 (compiled locally instead of using the FreeBSD
port).  The boot drive and Squid cache drives are SCSI LVD running at 80MB/s
off an Adaptec 29160N card.  Soft updates are not enabled on *any* of the
drives in this system.

Unfortunately, this system has been plagued by kernel panics for several
weeks now.  I originally assumed there was some sort of hardware problem,
but I've since replaced every last piece of hardware (MB, RAM, CPU, SCSI
card, SCSI cable, SCSI drives, video card, NICs), yet the kernel panics
persist on a fairly regular basis.

Here's the scenario:

Start with a freshly installed 4.2-STABLE O/S, and a freshly installed
2.3STABLE4 Squid distribution, on a system identical to, or very similar to,
the one described above.

Start squid and begin hitting the system with a fairly consistent load (an
average of about 2.3 million hits per day, or ~1600 hits per minute, with an
average requested object size of about 5 KB).  Let the squid cache drives
fill up.  After the cache drives have filled, wait about 48 to 72 hours.  The
kernel will panic with the output shown below.  From this point forward, the
kernel will continue to panic at about the same interval: every 48 to 72
hours, provided the cache drives are full (I have only seen panics when
*all* of the cache drives are full, although that doesn't mean it can't
happen otherwise).
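
As a rough cross-check of the load figures quoted above, here is a minimal
sketch that converts the daily hit count into per-minute and per-second
rates.  The constant and function names are made up for illustration, and
the 5 KB object size is assumed to mean 5 * 1024 bytes:

```python
# Figures quoted in the message. Reading "5 KB" as 5 * 1024 bytes is an
# assumption made for this sketch.
HITS_PER_DAY = 2_300_000
AVG_OBJECT_BYTES = 5 * 1024

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def load_summary(hits_per_day, avg_object_bytes):
    """Convert a daily hit count into request and byte rates."""
    return {
        "hits_per_minute": hits_per_day / MINUTES_PER_DAY,
        "hits_per_second": hits_per_day / SECONDS_PER_DAY,
        "bytes_per_second": hits_per_day * avg_object_bytes / SECONDS_PER_DAY,
        "bytes_per_day": hits_per_day * avg_object_bytes,
    }


summary = load_summary(HITS_PER_DAY, AVG_OBJECT_BYTES)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:,.1f}")
```

The per-minute figure comes out just under 1600, in line with the ~1600
hits per minute stated above.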

Here's the output from the kernel and DDB (dmesg output is at the bottom of
this email):

Panic Output
------------
mode = 0100644, num = 2804226, fs = /usr/local/squid/cache2
panic: ffs_valloc: dup alloc
Debugger("panic")
Stopped at      Debugger+0x34:  movb    $0,in_Debugger.396


DDB Trace Output
----------------
Debugger(c0237e03) at Debugger+0x34
panic(c0243a01,c02439e0,81a4,115934,c2fbf0d4) at panic+0x70
ffs_valloc(d8b04a40,81a4,c302d680,d89f6ca4,d89f6e00) at ffs_valloc+0xf8
ufs_makeinode(81a4,d8b04a40,d89f6ee0,d89f6ef4) at ufs_makeinode+0x57
ufs_create(d89f6e00,d89f6e74,c0181960,d89f6e00,0) at ufs_create+0x28
ufs_vnoperate(d89f6e00,0,c33094c0,d89f6f80,95) at ufs_vnoperate+0x15
vn_open(d89f6ecc,60e,1a4,d626d560,3) at vn_open+0x10c
open(d626d560,d89f6f80,80a2bb8,bfbffdd0,bfbffddc) at open+0xb8
syscall2(bfbf002f,bfbf002f,bfbf002f,bfbffddc,bfbffdd0) at syscall2+0x1f1
Xint0x80_syscall() at Xint0x80_syscall+0x25


It is always the same panic: 'ffs_valloc: dup alloc'.  However, if multiple
cache drives are being used, the 'fs = ' value varies from panic to panic;
it doesn't target any specific drive or fs.
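
For reading the numbers in the output above: the 'mode = 0100644' in the
panic line is octal, while the DDB trace prints arguments in hex.  The
sketch below uses Python's standard stat module to show that 0100644 and
the 81a4 passed to ffs_valloc are the same value, a regular file with
permissions 0644.  That this is the mode Squid uses for its cache objects is
my assumption, and the constant and function names are illustrative only:

```python
import stat

PANIC_MODE = 0o100644   # "mode = 0100644" from the panic line (octal)
TRACE_MODE = 0x81A4     # second argument of ffs_valloc in the trace (hex)
OPEN_PERMS = 0x1A4      # third argument of vn_open in the trace (hex)


def describe_mode(mode):
    """Split a Unix mode word into file type and permission bits."""
    return {
        "is_regular_file": stat.S_ISREG(mode),
        "permission_bits": stat.S_IMODE(mode),
        "ls_style": stat.filemode(mode),
    }


info = describe_mode(PANIC_MODE)

if __name__ == "__main__":
    print("same value in octal and hex:", PANIC_MODE == TRACE_MODE)
    print("permission bits match vn_open argument:",
          info["permission_bits"] == OPEN_PERMS)
    print("ls-style mode:", info["ls_style"])
```

The match only says that the file being created and the inode the kernel
found have the same type and permissions; it does not by itself explain
why that inode was offered for allocation.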

As mentioned previously, *everything* in this system has been replaced:

Adaptec 29160N card
2x fxp? NICs (Intel EtherPro 10/100)
Trident PCI VGA
RAM
Pentium III/800MHz CPU
Motherboard
OS reinstalled
Squid reinstalled
Squid cache drives erased

Yet the panics persist.  I should note at this point that user software
(in this case squid) should never be able to trigger a kernel panic.  That
said, I'm getting to the point where I must conclude one of the following:

- A particular brand or version of hardware we're using is manufactured with
a defect
- The OS has a bug

Can anyone comment on this problem?  Given that it's so easily reproducible,
I imagine someone must have seen this before.  I'm also particularly
interested in knowing what circumstances produce the panic above.  It
appears (by my own reading of the FreeBSD code) to be an error where the
kernel tries to allocate an inode that is already in use.  But under what
circumstances can that happen?  Note that all file systems check clean at
boot before this error occurs, so if file systems are being corrupted, it's
happening during *normal* operation of the system.
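
To make that interpretation concrete, here is a toy model (not the FreeBSD
code; every class and method name is invented for illustration).  It keeps
an allocation bitmap and a per-inode mode, and raises an error when the
bitmap offers an inode whose mode is still non-zero.  Under this reading,
the panic means the free map and the inode contents disagree:

```python
class DupAlloc(Exception):
    """Stands in for the kernel panic in this toy model."""


class ToyFilesystem:
    def __init__(self, n_inodes):
        self.in_use = [False] * n_inodes  # allocation bitmap, True = allocated
        self.mode = [0] * n_inodes        # per-inode mode, 0 = not in use

    def allocate(self, mode):
        """Hand out the first inode the bitmap marks as free."""
        for inum, used in enumerate(self.in_use):
            if not used:
                break
        else:
            raise OSError("no free inodes")
        # The bitmap says "free"; a non-zero mode says otherwise.
        if self.mode[inum] != 0:
            raise DupAlloc(f"mode = {self.mode[inum]:#o}, inum = {inum}")
        self.in_use[inum] = True
        self.mode[inum] = mode
        return inum

    def free(self, inum):
        """Release an inode consistently: clear both bitmap bit and mode."""
        self.in_use[inum] = False
        self.mode[inum] = 0


def demo():
    fs = ToyFilesystem(4)
    first = fs.allocate(0o100644)
    # Simulate the inconsistency: the bitmap bit is cleared but the inode
    # itself still looks live (for example, a lost or reordered update).
    fs.in_use[first] = False
    try:
        fs.allocate(0o100644)
    except DupAlloc as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(demo())
```

The model says nothing about how the two structures get out of step on the
real system; that is exactly the open question.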

- Matt

Matthew H. North
Software Engineer
CTSnet Inc., an Allegiance Telecom Company
mailto:ctsmhn@cts.com
t 858.637.3600
f 858.637.3630

dmesg output
------------
Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.2-STABLE #0: Tue Feb 27 11:51:43 PST 2001
    XXXX@XXX.XXX.com:/usr/obj/usr/src/sys/WEBPROXY
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 801823558 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (801.82-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x683  Stepping = 3
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 536870912 (524288K bytes)
avail memory = 519684096 (507504K bytes)
Preloaded elf kernel "kernel" at 0xc02d8000.
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
pcib2: <VIA 82C598MVP (Apollo MVP3) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib2
isab0: <VIA 82C596B PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
pci0: <VIA Apollo ATA controller> at 7.1
fxp0: <Intel Pro 10/100B/100+ Ethernet> port 0xd800-0xd83f mem 0xd5000000-0xd50fffff,0xd5201000-0xd5201fff irq 11 at device 15.0 on pci0
fxp0: Ethernet address XX:90:27:d1:12:XX
fxp1: <Intel Pro 10/100B/100+ Ethernet> port 0xdc00-0xdc3f mem 0xd5100000-0xd51fffff,0xd5200000-0xd5200fff irq 10 at device 17.0 on pci0
fxp1: Ethernet address XX:d0:b7:5d:d0:XX
ahc0: <Adaptec 29160 Ultra160 SCSI adapter> port 0xe000-0xe0ff mem 0xd5202000-0xd5202fff irq 12 at device 18.0 on pci0
aic7892: Wide Channel A, SCSI Id=7, 32/255 SCBs
pcib1: <Host to PCI bridge> on motherboard
pci2: <PCI bus> on pcib1
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> on isa0
sc0: VGA <16 virtual consoles, flags=0x200>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
Waiting 10 seconds for SCSI devices to settle
Mounting root from ufs:/dev/da0s1a
da1 at ahc0 bus 0 target 1 lun 0
da1: <SEAGATE ST150176LW 0002> Fixed Direct Access SCSI-2 device
da1: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da1: 47702MB (97693755 512 byte sectors: 255H 63S/T 6081C)
da2 at ahc0 bus 0 target 2 lun 0
da2: <SEAGATE ST136403LW 0002> Fixed Direct Access SCSI-2 device
da2: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da2: 34732MB (71132960 512 byte sectors: 255H 63S/T 4427C)
da3 at ahc0 bus 0 target 3 lun 0
da3: <SEAGATE ST318275LW 0001> Fixed Direct Access SCSI-2 device
da3: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da3: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST318275LW 0001> Fixed Direct Access SCSI-2 device
da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
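
For anyone checking the drive figures in the dmesg output, the sizes and
the 80MB/s transfer rate follow from the sector counts and bus parameters
printed on the same lines.  A minimal sketch (helper names are made up; the
"MB" in dmesg is taken to mean 2^20 bytes, which is an assumption that the
numbers bear out):

```python
SECTOR_BYTES = 512
MIB = 1024 * 1024

# Sector counts copied from the da0..da3 lines of the dmesg output.
DRIVE_SECTORS = {
    "da0": 35566480,
    "da1": 97693755,
    "da2": 71132960,
    "da3": 35566480,
}


def size_mib(sectors):
    """Capacity in whole MiB, truncated."""
    return sectors * SECTOR_BYTES // MIB


def bus_rate_mb_per_s(clock_mhz, width_bits):
    """Peak transfer rate: one transfer of width_bits per clock."""
    return clock_mhz * width_bits / 8


sizes = {name: size_mib(count) for name, count in DRIVE_SECTORS.items()}

if __name__ == "__main__":
    for name in sorted(sizes):
        print(f"{name}: {sizes[name]}MB")
    print(f"bus: {bus_rate_mb_per_s(40.0, 16):.3f}MB/s")
```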

