Date:       Mon, 17 Apr 2000 18:54:20 -0400
From:      Dave Chapeskie <dchapes@borderware.com>
To:        Adrian Chadd <adrian@freebsd.org>, Matthew Dillon <dillon@apollo.backplane.com>
Cc:        freebsd-fs@freebsd.org
Subject:   Re: vnode_free_list corruption [patch]
Message-ID:  <00Apr17.185117edt.117127@gateway.borderware.com>
In-Reply-To: <20000418042733.I59015@ewok.creative.net.au>; from Adrian Chadd on Tue, Apr 18, 2000 at 04:27:35AM +0800
References:  <00Apr14.141908edt.117140@gateway.borderware.com> <20000415023148.F34852@ewok.creative.net.au> <200004141835.LAA71253@apollo.backplane.com> <20000418042733.I59015@ewok.creative.net.au>

By the way, thanks for looking into this!

On Tue, Apr 18, 2000 at 04:27:35AM +0800, Adrian Chadd wrote:
> Ok, my take on the code is this:
> 
> * with the trace given, the vnode shouldn't even be marked VDOOMED, as its
>   meant to be in use,
> * a vnode shouldn't ever reach vbusy when marked VDOOMED, as it should be
>   ref/held and so shouldn't ever be considered to be cleaned, 
> * I think a KASSERT should be added in vbusy()

Since the situation is known to happen, at least I know it does :-),
I think it should be a real call to panic as in my patch instead of a
KASSERT that is only enabled if options INVARIANTS is used.  If the
system is fixed to prevent this situation then it can always be changed
to a KASSERT (if the quick check of the flag is too slow for people).
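
To illustrate the difference being discussed, here is a minimal userland
sketch, not the actual patch or kernel code. All `demo_` names and the flag
value are hypothetical stand-ins; the only thing it assumes about the kernel
is that KASSERT checks are compiled in only when INVARIANTS is defined.

```c
#include <stdio.h>

/* Hypothetical flag value for this sketch; not the kernel's definition. */
#define DEMO_VDOOMED 0x2

struct demo_vnode {
	unsigned v_flag;
};

static int demo_panics;	/* counts simulated panics instead of halting */

/* Userland stand-in for panic(): record the event and report it. */
static void
demo_panic(const char *msg)
{
	demo_panics++;
	fprintf(stderr, "panic: %s\n", msg);
}

/* Stand-in for KASSERT: expands to nothing unless INVARIANTS is defined. */
#ifdef INVARIANTS
#define DEMO_KASSERT(exp, msg) do { if (!(exp)) demo_panic(msg); } while (0)
#else
#define DEMO_KASSERT(exp, msg) do { } while (0)
#endif

/* Variant 1: the flag test is always compiled in. */
static void
demo_vbusy_panic(struct demo_vnode *vp)
{
	if (vp->v_flag & DEMO_VDOOMED)
		demo_panic("vbusy: vnode is doomed");
}

/* Variant 2: the test exists only in INVARIANTS builds. */
static void
demo_vbusy_kassert(struct demo_vnode *vp)
{
	(void)vp;	/* keep vp referenced when the assertion compiles away */
	DEMO_KASSERT((vp->v_flag & DEMO_VDOOMED) == 0,
	    "vbusy: vnode is doomed");
}
```

Built without -DINVARIANTS, calling demo_vbusy_kassert() on a doomed vnode
does nothing, while demo_vbusy_panic() always reports it. That is the
argument for an unconditional check while the bug is still known to occur.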


> On my machine (400Mhz Celeron, 64mb RAM, single 4.2gig IDE disk) running
> current from a day ago, I can't reproduce the bug. Are you running with
> multiple spindles/softupdates ?

Since I was able to reproduce it on machines with very different CPUs,
memory, and disks, I didn't bother to include machine specifications.
The customers that were seeing the problem most often are running "high
end" (whatever that means) machines with SCSI disks.  The machine I used
for testing was a 200 MHz Pentium machine with IDE disks.  Softupdates
was never enabled on any of these systems.


Here is the dmesg output for the machine I did most of my testing on;
only partitions on wd2 were mounted during the tests.  On my workstation
with 64MB of RAM it took much longer to happen, and I had some other
processes consuming memory, so it might be easier for you to reproduce
it if you lower your available system memory to 32 MB or less (via
MAXMEM or the boot loader, of course).

Also, when I ran my test, if it hadn't panicked within 10-15 minutes it
seemed not to panic at all.  I imagine that once the number of vnodes
grows to a certain size it's just less likely to happen or something.
Most often it would panic, and when it did it was within a few minutes.


Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 200455163 Hz
CPU: Pentium/P55C (200.46-MHz 586-class CPU)
  Origin = "GenuineIntel"  Id = 0x543  Stepping = 3
  Features=0x8001bf<FPU,VME,DE,PSE,TSC,MSR,MCE,CX8,MMX>
real memory  = 33554432 (32768K bytes)
avail memory = 23916544 (23356K bytes)
Preloaded elf kernel "kernel" at 0xc0376000.
Preloaded userconfig_script "/boot/kernel.conf" at 0xc037609c.
Probing for devices on PCI bus 0:
chip0: <Intel 82439TX System Controller (MTXC)> rev 0x01 on pci0.0.0
chip1: <Intel 82371AB PCI to ISA bridge> rev 0x01 on pci0.1.0
ide_pci0: <Intel PIIX4 Bus-master IDE controller> rev 0x01 on pci0.1.1
chip2: <Intel 82371AB Power management controller> rev 0x01 on pci0.1.3
xl0: <3Com 3c905B-TX Fast Etherlink XL> rev 0x24 int a irq 9 on pci0.10.0
xl0: Ethernet address: 00:10:4b:9e:8c:b2
xl0: autoneg complete, link status good (full-duplex, 100Mbps)
xl1: <3Com 3c905B-TX Fast Etherlink XL> rev 0x24 int a irq 10 on pci0.11.0
xl1: Ethernet address: 00:10:4b:79:0a:10
xl1: autoneg complete, link status good (half-duplex, 10Mbps)
vr0: <VIA VT3043 Rhine I 10/100BaseTX> rev 0x06 int a irq 11 on pci0.12.0
vr0: Ethernet address: 00:80:c8:ec:73:51
vr0: autoneg complete, link status good (half-duplex, 100Mbps)
Probing for PnP devices:
Probing for devices on the ISA bus:
sc0 on isa
sc0: VGA color <16 virtual consoles, flags=0x0>
atkbdc0 at 0x60-0x6f on motherboard
atkbd0 irq 1 on isa
psm0 irq 12 on isa
psm0: model Generic PS/2 mouse, device ID 0
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 at 0x2f8-0x2ff irq 3 on isa
sio1: type 16550A
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdc0 at 0x1f0-0x1f7 irq 14 on isa
wdc0: unit 0 (wd0): <WDC AC23200L>
wd0: 3098MB (6346368 sectors), 6296 cyls, 16 heads, 63 S/T, 512 B/S
[^^^ not used/mounted at all ^^^]
wdc1 at 0x170-0x177 irq 15 on isa
wdc1: unit 0 (wd2): <FUJITSU MPC3032AT>
wd2: 3093MB (6335280 sectors), 6704 cyls, 15 heads, 63 S/T, 512 B/S
ida: port address (0xffffffff) out of range
Vendor Specific Word = ffff
Vendor Specific Word = ffff
Vendor Specific Word = ffff
Vendor Specific Word = ffff
Vendor Specific Word = ffff
vga0 at 0x3b0-0x3df maddr 0xa0000 msize 131072 on isa
npx0 on motherboard
npx0: INT 16 interface
Intel Pentium detected, installing workaround for F00F bug
IP packet filtering initialized, divert enabled, rule-based forwarding disabled, unlimited logging
changing root device to wd2s1a
Start pid=2 <pagedaemon>
Start pid=3 <vmdaemon>
Start pid=4 <syncer>
xl0: autoneg complete, link status good (full-duplex, 100Mbps)
xl1: autoneg complete, link status good (half-duplex, 10Mbps)
vr0: autoneg complete, link status good (half-duplex, 100Mbps)


> I'll look at the code some more over the next couple of days. Any opinions ?

I haven't had the time to look at the code since I came up with the
patch (which works for our setups so we're reasonably happy and I'm busy
doing other things) but after reading Kirk's opinions on the matter I'd
tend to agree and think that vbusy/vhold shouldn't be mucking with the
free list the way they do.

I'd guess that either they need to be able to check for and return
an error or else v_holdcnt should disappear in favour of just using
v_usecount.  I didn't see any semantic differences between the two
(but I didn't look too hard either).
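
As a rough illustration of the first option (checking for the condition and
returning an error), here is a userland sketch. It is not the FreeBSD code:
the `sk_` names, the flag values, and the choice of EBUSY as the error are
all assumptions made for this example, and the free list is reduced to a
simple counter.

```c
#include <errno.h>

/* Hypothetical flag values for this sketch; not the kernel's definitions. */
#define SK_VFREE	0x1	/* vnode is on the simulated free list */
#define SK_VDOOMED	0x2	/* vnode is being recycled */

struct sk_vnode {
	unsigned v_flag;
	int	 v_holdcnt;
};

static int sk_freecount;	/* vnodes on the simulated free list */

/*
 * Take a hold on vp.  Unlike a void-returning hold routine, this variant
 * refuses a doomed vnode and reports an error instead of touching the
 * free list.
 */
static int
sk_vhold_checked(struct sk_vnode *vp)
{
	if (vp->v_flag & SK_VDOOMED)
		return (EBUSY);

	vp->v_holdcnt++;
	if (vp->v_flag & SK_VFREE) {
		/* Held vnodes come off the free list. */
		vp->v_flag &= ~SK_VFREE;
		sk_freecount--;
	}
	return (0);
}
```

The point of the sketch is only that the caller gets a chance to back off;
whether callers of vbusy/vhold could actually handle such an error is a
separate question.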

-- 
Dave Chapeskie
Senior Software Engineer
Borderware Technologies Inc.
Mississauga, Ontario, Canada

