Date: Mon, 17 Apr 2000 18:54:20 -0400
From: Dave Chapeskie <dchapes@borderware.com>
To: Adrian Chadd <adrian@freebsd.org>, Matthew Dillon <dillon@apollo.backplane.com>
Cc: freebsd-fs@freebsd.org
Subject: Re: vnode_free_list corruption [patch]
Message-ID: <00Apr17.185117edt.117127@gateway.borderware.com>
In-Reply-To: <20000418042733.I59015@ewok.creative.net.au>; from Adrian Chadd on Tue, Apr 18, 2000 at 04:27:35AM +0800
References: <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au>
By the way, thanks for looking into this!

On Tue, Apr 18, 2000 at 04:27:35AM +0800, Adrian Chadd wrote:
> Ok, my take on the code is this:
>
> * with the trace given, the vnode shouldn't even be marked VDOOMED, as its
>   meant to be in use,
> * a vnode shouldn't ever reach vbusy when marked VDOOMED, as it should be
>   ref/held and so shouldn't ever be considered to be cleaned,
> * I think a KASSERT should be added in vbusy()

Since the situation is known to happen (at least I know it does :-), I
think it should be a real call to panic, as in my patch, instead of a
KASSERT that is only enabled if options INVARIANTS is used. If the
system is fixed to prevent this situation then it can always be changed
to a KASSERT (if the quick check of the flag is too slow for people).

> On my machine (400Mhz Celeron, 64mb RAM, single 4.2gig IDE disk) running
> current from a day ago, I can't reproduce the bug. Are you running with
> multiple spindles/softupdates ?

Since I was able to reproduce it on machines with very different CPUs,
memory, and disks, I didn't bother to include machine specifications.
The customers that were seeing the problem most often are running "high
end" (whatever that means) machines with SCSI disks. The machine I used
for testing was a 200 MHz Pentium machine with IDE disks. Softupdates
was never enabled on any of these systems.

Here is the dmesg output for the machine I did most of my testing on;
only partitions on wd2 were mounted during the tests. On my workstation
with 64 MB of RAM it took much longer to happen, and I had some other
processes consuming memory, so it might be easier for you to reproduce
it if you lower your available system memory to 32 MB or less (via
MAXMEM or the boot loader, of course).

Also, when I did my test, if it didn't panic within 10-15 minutes it
seemed to not panic at all. I imagine that once the number of vnodes
grows to a certain size it's just less likely to happen or something.
Most often it did panic, and within a few minutes at that.

Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 200455163 Hz
CPU: Pentium/P55C (200.46-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x543  Stepping = 3
  Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>
real memory = 33554432 (32768K bytes)
avail memory = 23916544 (23356K bytes)
Preloaded elf kernel "kernel" at 0xc0376000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc037609c.
Probing for devices on PCI bus 0:
chip0: <Intel 82439TX System Controller (MTXC)> rev 0x01 on pci0.0.0
chip1: <Intel 82371AB PCI to ISA bridge> rev 0x01 on pci0.1.0
ide_pci0: <Intel PIIX4 Bus-master IDE controller> rev 0x01 on pci0.1.1
chip2: <Intel 82371AB Power management controller> rev 0x01 on pci0.1.3
xl0: <3Com 3c905B-TX Fast Etherlink XL> rev 0x24 int a irq 9 on pci0.10.0
xl0: Ethernet address: 00:10:4b:9e:8c:b2
xl0: autoneg complete, link status good (full-duplex, 100Mbps)
xl1: <3Com 3c905B-TX Fast Etherlink XL> rev 0x24 int a irq 10 on pci0.11.0
xl1: Ethernet address: 00:10:4b:79:0a:10
xl1: autoneg complete, link status good (half-duplex, 10Mbps)
vr0: <VIA VT3043 Rhine I 10/100BaseTX> rev 0x06 int a irq 11 on pci0.12.0
vr0: Ethernet address: 00:80:c8:ec:73:51
vr0: autoneg complete, link status good (half-duplex, 100Mbps)
Probing for PnP devices:
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 irq 12 on isa
psm0: model Generic PS/2 mouse, device ID 0
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <WDC AC23200L>
wd0: 3098MB (6346368 sectors), 6296 cyls, 16 heads, 63 S/T, 512 B/S
[^^^ not used/mounted at all ^^^]
wdc1 at 0x170-0x177 irq 15 on isa
wdc1: unit 0 (wd2): <FUJITSU MPC3032AT>
wd2: 3093MB (6335280 sectors), 6704 cyls, 15 heads, 63 S/T, 512 B/S
ida: port address (0xffffffff) out of range
Vendor Specific Word = ffff
Vendor Specific Word = ffff
Vendor Specific Word = ffff
Vendor Specific Word = ffff
Vendor Specific Word = ffff
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
Intel Pentium detected, installing workaround for F00F bug
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, unlimited logging
changing root device to wd2s1a
Start pid=2 <pagedaemon>
Start pid=3 <vmdaemon>
Start pid=4 <syncer>
xl0: autoneg complete, link status good (full-duplex, 100Mbps)
xl1: autoneg complete, link status good (half-duplex, 10Mbps)
vr0: autoneg complete, link status good (half-duplex, 100Mbps)

> I'll look at the code some more over the next couple of days. Any opinions ?

I haven't had the time to look at the code since I came up with the
patch (it works for our setups, so we're reasonably happy, and I'm busy
doing other things). After reading Kirk's opinions on the matter,
though, I'd tend to agree that vbusy/vhold shouldn't be mucking with
the free list the way they do. I'd guess that either they need to be
able to check for and return an error, or else v_holdcnt should
disappear in favour of just using v_usecount. I didn't see any semantic
differences between the two (but I didn't look too hard either).

-- 
Dave Chapeskie
Senior Software Engineer
Borderware Technologies Inc.
Mississauga, Ontario, Canada

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message
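
As a footnote to the panic-versus-KASSERT point in the message, here is a minimal userland sketch of the difference. It is not the patch from this thread and not the kernel's vbusy(): every `sk_`/`SK_` name is hypothetical, the flag values are arbitrary, the free-list manipulation is elided, and `sk_panic()` only records a message where the real panic() would never return. It assumes only what the message states: an unconditional check is present in every build, while a KASSERT is compiled in only with options INVARIANTS (modelled here by an `SK_INVARIANTS` macro).

```c
#include <stddef.h>

/* Illustrative flag bits only; NOT the values from the real sys/vnode.h. */
#define SK_VFREE   0x1u	/* vnode is on the free list */
#define SK_VDOOMED 0x2u	/* vnode is being cleaned for reuse */

struct sk_vnode {
	unsigned int v_flag;
};

/*
 * Stand-in for panic(). The real one never returns; this one records the
 * message so the sketch can run as an ordinary userland program.
 */
static const char *sk_last_panic = NULL;

static void
sk_panic(const char *msg)
{
	sk_last_panic = msg;
}

/* Variant 1: unconditional check, present in every build. */
static int
sk_vbusy_panic(struct sk_vnode *vp)
{
	if (vp->v_flag & SK_VDOOMED) {
		sk_panic("vbusy: VDOOMED vnode");
		return (-1);	/* only reachable because sk_panic() returns */
	}
	vp->v_flag &= ~SK_VFREE;	/* free-list removal elided */
	return (0);
}

/* Variant 2: KASSERT-style check, compiled in only with SK_INVARIANTS. */
#ifdef SK_INVARIANTS
#define SK_KASSERT(exp, msg) do { if (!(exp)) sk_panic(msg); } while (0)
#else
#define SK_KASSERT(exp, msg) do { } while (0)
#endif

static int
sk_vbusy_kassert(struct sk_vnode *vp)
{
	SK_KASSERT((vp->v_flag & SK_VDOOMED) == 0, "vbusy: VDOOMED vnode");
	vp->v_flag &= ~SK_VFREE;	/* runs even for a doomed vnode */
	return (0);
}
```

Built without `SK_INVARIANTS`, `sk_vbusy_kassert()` lets a doomed vnode through silently and clears its free flag, whereas `sk_vbusy_panic()` stops at the check in every build.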