From owner-freebsd-questions Mon Apr 2 13:53:37 2001
Delivered-To: freebsd-questions@freebsd.org
From: "Matthew H. North"
Subject: Reproducible kernel panics, 4.2-STABLE, various hardware
Date: Mon, 2 Apr 2001 13:57:26 -0700

Hello,

After spending hours troubleshooting this and looking for similar
problems and solutions in the bug archive, Google, etc., I'm running
out of ideas.

I am responsible for maintaining a squid cache system for my company.
Currently, the system is: PIII/800MHz, 768MB RAM (1GB cache), FreeBSD
4.2-STABLE, Squid-2.3STABLE4 (compiled locally instead of using the
FreeBSD port). Boot drive and Squid cache drives are SCSI-LVD running
at 80MB/s off of the Adaptec 29160N card. Softupdates are not enabled
on *any* of the drives in this system.

Unfortunately this system has been plagued by kernel panics for a
number of weeks now. I originally assumed there was some sort of
hardware problem, but I've since replaced every last piece of
hardware: MB, RAM, CPU, SCSI card, SCSI cable, SCSI drives, video
card, NICs. Even so, the kernel panics persist on a fairly regular
basis.

Here's the scenario:

Start with a freshly installed 4.2-STABLE O/S, and a freshly installed
2.3STABLE4 Squid distribution, on a system identical to, or very
similar to, the one described above. Start squid and begin hitting the
system with a fairly consistent load (an average of about 2.3 million
hits per day, or ~1600 HPM, average request object size about 5kb).
Let the squid cache drives fill up. After the cache drives have
filled, wait about 48 to 72 hours. The kernel will panic with the
output shown below. From this point forward, the kernel will continue
to panic at about the same interval: every 48 to 72 hours, provided
the cache drives are full (I have yet to see a panic when not *all* of
the cache drives are full, although that doesn't mean it won't
happen).

Here's the output from the kernel and DDB (dmesg output is at the
bottom of this email):

Panic Output
------------
mode = 0100644, num = 2804226, fs = /usr/local/squid/cache2
panic: ffs_valloc: dup alloc
Debugger("panic")
Stopped at Debugger+0x34: movb $0,in_Debugger.396

DDB Trace Output
----------------
Debugger(c0237e03) at Debugger+0x34
panic(c0243a01,c02439e0,81a4,115934,c2fbf0d4) at panic+0x70
ffs_valloc(d8b04a40,81a4,c302d680,d89f6ca4,d89f6e00) at ffs_valloc+0xf8
ufs_makeinode(81a4,d8b04a40,d89f6ee0,d89f6ef4) at ufs_makeinode+0x57
ufs_create(d89f6e00,d89f6e74,c0181960,d89f6e00,0) at ufs_create+0x28
ufs_vnoperate(d89f6e00,0,c33094c0,d89f6f80,95) at ufs_vnoperate+0x15
vn_open(d89f6ecc,60e,1a4,d626d560,3) at vn_open+0x10c
open(d626d560,d89f6f80,80a2bb8,bfbffdd0,bfbffddc) at open+0xb8
syscall2(bfbf002f,bfbf002f,bfbf002f,bfbffddc,bfbffdd0) at syscall2+0x1f1
Xint0x80_syscall() at Xint0x80_syscall+0x25

It is always the same panic: 'ffs_valloc: dup alloc'. However, if
multiple cache drives are being used, the 'fs = ' will change randomly
- it doesn't target any specific drive or fs.

As mentioned previously, *everything* in this system has been
replaced:

  Adaptec 29160N card
  2x fxp? NICs (Intel EtherPro 10/100)
  Trident PCI VGA
  RAM
  Pentium III/800MHz CPU
  Motherboard
  OS reinstalled
  Squid reinstalled
  Squid cache drives erased

Yet the panics persist.

I need to comment at this point that user software (in this case
squid) should never be able to trigger kernel panics. This said, I'm
getting to the point where I must conclude one of the following:

  - A particular brand or version of hardware we're using is
    manufactured with a defect
  - The OS has a bug

Can anyone comment on this problem? Given that it's so easily
reproducible, I imagine someone must have seen this before.

I'm also particularly interested in knowing what circumstances produce
the panic above. It appears (by my own interpretation of the FreeBSD
code) to be an error where the kernel tries to allocate an
already-used i-node. But under what circumstances can that happen?

NOTE that all file systems test clean upon bootup prior to this error
occurring. So if file systems are being corrupted, it's happening
during *normal* operation of the system.

- Matt

Matthew H. North
Software Engineer
CTSnet Inc., an Allegiance Telecom Company
mailto:ctsmhn@cts.com
t 858.637.3600
f 858.637.3630

dmesg output
------------
Copyright (c) 1992-2001 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.2-STABLE #0: Tue Feb 27 11:51:43 PST 2001
    XXXX@XXX.XXX.com:/usr/obj/usr/src/sys/WEBPROXY
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 801823558 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (801.82-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x683 Stepping = 3
  Features=0x383f9ff
real memory = 536870912 (524288K bytes)
avail memory = 519684096 (507504K bytes)
Preloaded elf kernel "kernel" at 0xc02d8000.
Pentium Pro MTRR support enabled
npx0: on motherboard
npx0: INT 16 interface
pcib0: on motherboard
pci0: on pcib0
pcib2: at device 1.0 on pci0
pci1: on pcib2
isab0: at device 7.0 on pci0
isa0: on isab0
pci0: at 7.1
fxp0: port 0xd800-0xd83f mem 0xd5000000-0xd50fffff,0xd5201000-0xd5201fff irq 11 at device 15.0 on pci0
fxp0: Ethernet address XX:90:27:d1:12:XX
fxp1: port 0xdc00-0xdc3f mem 0xd5100000-0xd51fffff,0xd5200000-0xd5200fff irq 10 at device 17.0 on pci0
fxp1: Ethernet address XX:d0:b7:5d:d0:XX
ahc0: port 0xe000-0xe0ff mem 0xd5202000-0xd5202fff irq 12 at device 18.0 on pci0
aic7892: Wide Channel A, SCSI Id=7, 32/255 SCBs
pcib1: on motherboard
pci2: on pcib1
fdc0: at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: at port 0x60,0x64 on isa0
atkbd0: irq 1 on atkbdc0
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: on isa0
sc0: VGA <16 virtual consoles, flags=0x200>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
Waiting 10 seconds for SCSI devices to settle
Mounting root from ufs:/dev/da0s1a
da1 at ahc0 bus 0 target 1 lun 0
da1: Fixed Direct Access SCSI-2 device
da1: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da1: 47702MB (97693755 512 byte sectors: 255H 63S/T 6081C)
da2 at ahc0 bus 0 target 2 lun 0
da2: Fixed Direct Access SCSI-2 device
da2: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da2: 34732MB (71132960 512 byte sectors: 255H 63S/T 4427C)
da3 at ahc0 bus 0 target 3 lun 0
da3: Fixed Direct Access SCSI-2 device
da3: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da3: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da0 at ahc0 bus 0 target 0 lun 0
da0: Fixed Direct Access SCSI-2 device
da0: 80.000MB/s transfers (40.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
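
Illustration of the 'dup alloc' condition
-----------------------------------------
To make the question about the panic more concrete, here is a toy
model of my reading of the check: the allocator takes an inode that
the free map says is unused, then finds that the inode itself still
carries a non-zero mode. This is only a sketch under that assumption,
not the kernel code. ToyFilesystem, DupAllocPanic, and the four-inode
table are hypothetical names and sizes invented for the illustration;
the message format is copied from the panic output above.

```python
import stat


class DupAllocPanic(Exception):
    """Stands in for the kernel's panic("ffs_valloc: dup alloc")."""


class ToyFilesystem:
    """Hypothetical model: a free-inode map plus a table of inode modes."""

    def __init__(self, name, ninodes):
        self.name = name
        self.in_use = [False] * ninodes  # allocation map (assumed structure)
        self.mode = [0] * ninodes        # mode 0 means "inode not in use"

    def valloc(self, new_mode):
        # Pick the first inode that the map claims is free.
        for inum, used in enumerate(self.in_use):
            if not used:
                break
        else:
            raise OSError("out of inodes")
        self.in_use[inum] = True
        # Consistency check: an inode taken from the free map must have
        # mode 0. A non-zero mode means the map and the inode disagree.
        if self.mode[inum] != 0:
            raise DupAllocPanic(
                "mode = 0%o, num = %d, fs = %s"
                % (self.mode[inum], inum, self.name)
            )
        self.mode[inum] = new_mode
        return inum


fs = ToyFilesystem("/usr/local/squid/cache2", 4)
first = fs.valloc(0o100644)   # normal allocation of a regular file

# Simulate the inconsistency: the map bit is cleared while the inode
# is still live (how this could happen in practice is the open question).
fs.in_use[first] = False

try:
    fs.valloc(0o100644)
    message = None
except DupAllocPanic as exc:
    message = str(exc)

# The mode printed by the panic decodes to a regular file with
# permissions 0644, and matches the 81a4 argument in the DDB trace.
is_regular = stat.S_ISREG(0o100644)
perms = stat.filemode(0o100644)
same_as_trace = (0x81A4 == 0o100644)
```

In this model the second allocation fails only because the free map
and the inode contents disagree, which is why a file system that
tested clean at boot and later hits this check suggests the
inconsistency arose while the system was running.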