Date:      Sat, 2 Mar 2002 21:38:02 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Kirk McKusick <mckusick@chez.McKusick.COM>
Cc:        Eugene Grosbein <eugen@www.svzserv.kemerovo.su>, stable@FreeBSD.ORG
Subject:   Re: 4.5-STABLE softupdates brokeness: repeated panics and lockups
Message-ID:  <200203030538.g235c2l59112@apollo.backplane.com>
References:   <20020303120121.A2197@svzserv.kemerovo.su>

    I'm adding Kirk.  Kirk, something doesn't feel right about
    interlocked_sleep().  I don't think it is supposed to be interruptible
    at the point where it was interrupted in the backtrace below.

:Urgent! Please help!
:
:My quite old 4.5-STABLE system suffered from hanging network connections.
:Turning off syncookies helped but I've read this has already been fixed
:in -STABLE so 1 March 2002 I ran cvsup and rebuilt kernel and world
:as usual. Now I state that softupdates code is BROKEN for me.
:
:That night my server crashed hard. I have options DDB and DDB_UNATTENDED,
:my kernel is built with debugging symbols and I have savecore enabled
:in /etc/rc.conf and have enough swap space and disk space in /var
:so the server should leave a core and restart after a panic. It failed to
:do that. Usually I lock the console with vlock and this prevented me from
:escaping to DDB, so I was forced to turn power off and on next morning.
:Nothing suspicious in logs besides this:
:...
:So I left the console unlocked on 2 March and today it crashed again.
:Well, that was a kernel panic and the system locked up after the
:'syncing disks...' message, not a single character was printed after '...'.
:The panic reason was 'panic: softdep_setup_allocdirect: lost block'.
:
:It was possible to escape to DDB and say 'trace' and 'panic', so
:I now have a crashdump. The last message in the log again was:
:
:Mar  3 09:57:33 <kern.crit> www /kernel: z_decompress0: inflate returned -2 ()
:
:After reboot I started to investigate and suddenly it crashed again!
:And the last message in log again was:
:
:Mar  3 10:33:51 <kern.crit> www /kernel: z_decompress0: inflate returned -2 ()
:
:Uptime was only half an hour, eh?
:
:So I decided to turn softupdates off with tunefs on all of my filesystems.
:The root filesystem had softupdates already turned off.

    Did turning softupdates off solve the problem?  If you turn it on and
    set kern.maxvnodes to 9999999 does that also solve the problem or does
    the problem re-occur?
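
    A minimal sketch of that test, assuming /home on /dev/ad0s1g (taken
    from the disk list below) is the filesystem being re-enabled and that
    it has an /etc/fstab entry.  The run() wrapper and retest_softupdates()
    are hypothetical helpers, not existing tools.  The script only prints
    the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually apply them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: re-enable softupdates on one filesystem and raise
# kern.maxvnodes.  tunefs needs the filesystem unmounted (or read-only).
retest_softupdates() {
    fs_dev="$1"   # e.g. /dev/ad0s1g (assumption, from the df output)
    mnt="$2"      # e.g. /home; must be listed in /etc/fstab
    run umount "$mnt"
    run tunefs -n enable "$fs_dev"
    run mount "$mnt"
    run sysctl -w kern.maxvnodes=9999999
}

retest_softupdates /dev/ad0s1g /home
```

    Repeat for the other filesystems as needed; the sysctl setting does
    not persist across a reboot unless it is also added to
    /etc/sysctl.conf.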

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>

:#12 0xc0248567 in atkbd_intr (kbd=0xc02f8740, arg=0x0)
:    at /home3/src/sys/dev/kbd/atkbd.c:462
:#13 0xc0277c24 in atkbd_isa_intr (arg=0xc02f8740)
:    at /home3/src/sys/isa/atkbd_isa.c:140
:#14 0xc0147fe3 in add_interrupt_randomness (vsc=0xc02ff2ac)
:    at /home3/src/sys/kern/kern_random.c:247
:#15 0xc01f2798 in softdep_disk_write_complete (bp=0xc75b3c38)
:    at /home3/src/sys/ufs/ffs/ffs_softdep.c:3248
:#16 0xc01701f2 in vfs_backgroundwritedone (bp=0xc75b3c38)
:    at /home3/src/sys/kern/vfs_bio.c:742
:#17 0xc01725c4 in biodone (bp=0xc75b3c38) at /home3/src/sys/kern/vfs_bio.c:2701
:#18 0xc023e522 in ad_interrupt (request=0xc2892200)
:    at /home3/src/sys/dev/ata/ata-disk.c:703
:#19 0xc0238aff in ata_intr (data=0xc229cb80)
:    at /home3/src/sys/dev/ata/ata-all.c:1231
:#20 0xc0147fe3 in add_interrupt_randomness (vsc=0xc02ff348)
:    at /home3/src/sys/kern/kern_random.c:247
:#21 0xc0259ab2 in vec14 ()
:#22 0xc01ef086 in interlocked_sleep (lk=0xc02bfe7c, op=1, ident=0xce1a6884, 
:    flags=17, wmesg=0xc029301f "drainvp", timo=0)
:    at /home3/src/sys/ufs/ffs/ffs_softdep.c:329
:#23 0xc01f4a4e in drain_output (vp=0xce1a6840, islocked=1)
:    at /home3/src/sys/ufs/ffs/ffs_softdep.c:4913
:#24 0xc01f3812 in softdep_fsync_mountdev (vp=0xce1a6840)
:    at /home3/src/sys/ufs/ffs/ffs_softdep.c:4056
:#25 0xc01f7b7a in ffs_fsync (ap=0xcef85c04)
:    at /home3/src/sys/ufs/ffs/ffs_vnops.c:134
:#26 0xc01f67cc in ffs_sync (mp=0xc234dc00, waitfor=2, cred=0xc0a78680, 
:    p=0xc03003a0) at vnode_if.h:558
:#27 0xc017aa47 in sync (p=0xc03003a0, uap=0x0)
:    at /home3/src/sys/kern/vfs_syscalls.c:554
:#28 0xc0149ab7 in boot (howto=256) at /home3/src/sys/kern/kern_shutdown.c:235
:#29 0xc014a149 in panic (
:    fmt=0xc0291d60 "softdep_setup_allocdirect: lost block")
:    at /home3/src/sys/kern/kern_shutdown.c:595
:#30 0xc01f0150 in softdep_setup_allocdirect (ip=0xc2a21900, lbn=0, 
:    newblkno=398160, oldblkno=394920, newsize=8192, oldsize=8192, 
:    bp=0xc758021c) at /home3/src/sys/ufs/ffs/ffs_softdep.c:1326
:#31 0xc01eb0b3 in ffs_reallocblks (ap=0xcef85dd0)
:    at /home3/src/sys/ufs/ffs/ffs_alloc.c:476
:#32 0xc0174992 in cluster_write (bp=0xc758e7a8, filesize=65536, seqcount=10)
:    at vnode_if.h:1077
:#33 0xc01f765f in ffs_write (ap=0xcef85e74)
:    at /home3/src/sys/ufs/ufs/ufs_readwrite.c:537
:#34 0xc017f972 in vn_write (fp=0xc2a31a40, uio=0xcef85ee0, cred=0xc2df7b80, 
:    flags=0, p=0xced96040) at vnode_if.h:363
:#35 0xc015908e in dofilewrite (p=0xced96040, fp=0xc2a31a40, fd=4, 
:    buf=0x8058000, nbyte=8192, offset=-1, flags=0)
:    at /home3/src/sys/sys/file.h:162
:#36 0xc0158f3f in write (p=0xced96040, uap=0xcef85f80)
:    at /home3/src/sys/kern/sys_generic.c:329
:#37 0xc0265551 in syscall2 (frame={tf_fs = -1072431057, tf_es = -1070727121, 
:      tf_ds = -1070727121, tf_edi = 134578176, tf_esi = 403821508, 
:      tf_ebp = -1077947140, tf_isp = -822583340, tf_ebx = 403764804, 
:      tf_edx = 403821508, tf_ecx = 403821508, tf_eax = 4, tf_trapno = 7, 
:      tf_err = 2, tf_eip = 403517864, tf_cs = 31, tf_eflags = 514, 
:      tf_esp = -1077947164, tf_ss = 47})
:    at /home3/src/sys/i386/i386/trap.c:1167
:#38 0xc0258615 in Xint0x80_syscall ()
:#39 0x18104bd9 in ?? ()
:#40 0x18104b56 in ?? ()
:#41 0x18101946 in ?? ()
:#42 0x180eb05a in ?? ()
:#43 0x804a67a in ?? ()
:#44 0x804affc in ?? ()
:#45 0x804bf4e in ?? ()
:#46 0x804d7a3 in ?? ()
:#47 0x80499f5 in ?? ()
:(kgdb) quit
:
:Script done on Sun Mar  3 11:07:41 2002
:
:Again, I have my 256M crashdump and will answer any questions, but
:I cannot investigate this more deeply myself; I'm not a kernel hacker.
:
:Here are my disks:
:
:/dev/ad0s1a                          49583    35145    10472    77%    /
:/dev/ad0s1g                         992239   290185   622675    32%    /home
:/dev/ad0s1h                        2822646  2072484   524351    80%    /home2
:/dev/ad0s1e                        1488663  1195741   173829    87%    /usr
:/dev/ad0s1f                         496111   361090    95333    79%    /var
:/dev/ad1s1e                        9880414  5920704  3169278    65%    /home4
:/dev/ad2s1e                        9807006  8191954   830492    91%    /home3
:
:Here is my /etc/sysctl.conf:
:
:kern.ipc.somaxconn=1024
:kern.maxfiles=10000
:net.inet.ip.portrange.hifirst=49152
:net.inet.ip.portrange.hilast=49600
:net.inet.tcp.always_keepalive=1
:net.inet.tcp.sendspace=32768
:net.inet.tcp.recvspace=32768
:net.inet.tcp.rfc1644=1
:vfs.vmiodirenable=1
:
:I have CPUTYPE=i686 in /etc/make.conf and no other optimizations.
:
:At last, here is my kernel config:
:
:# WWW kernel config
:# 2 Nov 2001
:
:machine		i386
:#cpu		I386_CPU
:#cpu		I486_CPU
:cpu		I586_CPU
:cpu		I686_CPU
:ident		WWW
:maxusers	128
:options		MAXDSIZ=(256*1024*1024)
:options		DFLDSIZ=(256*1024*1024)
:
:makeoptions	DEBUG=-g		#Build kernel with gdb(1) debug symbols
:
:#options 	MATH_EMULATE		#Support for x87 emulation
:options		CLK_CALIBRATION_LOOP
:options		CLK_USE_I8254_CALIBRATION
:options		CLK_USE_TSC_CALIBRATION
:
:options 	INET			#InterNETworking
:#options 	INET6			#IPv6 communications protocols
:options 	FFS			#Berkeley Fast Filesystem
:options 	FFS_ROOT		#FFS usable as root device [keep this!]
:options 	SOFTUPDATES		#Enable FFS soft updates support
:options 	MFS			#Memory Filesystem
:#options 	MD_ROOT			#MD is a potential root device
:options 	NFS			#Network Filesystem
:#options 	NFS_ROOT		#NFS usable as root device, NFS required
:#options 	MSDOSFS			#MSDOS Filesystem
:options 	CD9660			#ISO 9660 Filesystem
:options 	CD9660_ROOT		#CD-ROM usable as root, CD9660 required
:#options 	PROCFS			#Process filesystem
:options 	COMPAT_43		#Compatible with BSD 4.3 [KEEP THIS!]
:options 	SCSI_DELAY=15000	#Delay (in ms) before probing SCSI
:options 	UCONSOLE		#Allow users to grab the console
:options 	USERCONFIG		#boot -c editor
:options 	VISUAL_USERCONFIG	#visual boot -c editor
:options 	KTRACE			#ktrace(1) support
:options 	SYSVSHM			#SYSV-style shared memory
:options 	SYSVMSG			#SYSV-style message queues
:options 	SYSVSEM			#SYSV-style semaphores
:options		SHMMAXPGS=4096
:options 	P1003_1B		#Posix P1003_1B real-time extensions
:options 	_KPOSIX_PRIORITY_SCHEDULING
:options		ICMP_BANDLIM		#Rate limit bad replies
:options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
:options		PPP_BSDCOMP
:options		PPP_DEFLATE
:options		PPP_FILTER
:options		NSWAPDEV=4
:options		MSGBUF_SIZE=140960
:
:device		isa
:options		"AUTO_EOI_1"
:
:device		eisa
:device		pci
:
:# Floppy drives
:device		fdc0	at isa? port IO_FD1 irq 6 drq 2
:device		fd0	at fdc0 drive 0
:#device		fd1	at fdc0 drive 1
:#
:# If you have a Toshiba Libretto with its Y-E Data PCMCIA floppy,
:# don't use the above line for fdc0 but the following one:
:#device		fdc0
:
:# ATA and ATAPI devices
:#device		ata0	at isa? port IO_WD1 irq 14
:#device		ata1	at isa? port IO_WD2 irq 15
:device		ata
:device		atadisk			# ATA disk drives
:device		atapicd			# ATAPI CDROM drives
:#device		atapifd			# ATAPI floppy drives
:#device		atapist			# ATAPI tape drives
:options 	ATA_STATIC_ID		#Static device numbering
:
:# atkbdc0 controls both the keyboard and the PS/2 mouse
:device		atkbdc0	at isa? port IO_KBD
:device		atkbd0	at atkbdc? irq 1 flags 0x1
:#device		psm0	at atkbdc? irq 12
:
:device		vga0	at isa?
:options		VESA
:
:# splash screen/screen saver
:pseudo-device	splash
:
:# syscons is the default console driver, resembling an SCO console
:device		sc0	at isa? flags 0x100
:options		MAXCONS=16
:options		SC_HISTORY_SIZE=1000
:
:# Floating point support - do not disable.
:device		npx0	at nexus? port IO_NPX irq 13
:
:# Power management support (see LINT for more options)
:#device		apm0    at nexus? disable flags 0x20 # Advanced Power Management
:
:# Serial (COM) ports
:device		sio0	at isa? port IO_COM1 flags 0x10 irq 4
:device		sio1	at isa? port IO_COM2 irq 3
:#device		sio2	at isa? disable port IO_COM3 irq 5
:#device		sio3	at isa? disable port IO_COM4 irq 9
:
:# Parallel port
:device		ppc0	at isa? irq 7
:device		ppbus		# Parallel port bus (required)
:device		lpt		# Printer
:#device		plip		# TCP/IP over parallel
:device		ppi		# Parallel port interface device
:#device		vpo		# Requires scbus and da
:
:# PCI Ethernet NICs.
:#device		de		# DEC/Intel DC21x4x (``Tulip'')
:#device		txp		# 3Com 3cR990 (``Typhoon'')
:#device		vx		# 3Com 3c590, 3c595 (``Vortex'')
:
:# PCI Ethernet NICs that use the common MII bus controller code.
:# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
:device		miibus		# MII bus support
:#device		dc		# DEC/Intel 21143 and various workalikes
:device		fxp		# Intel EtherExpress PRO/100B (82557, 82558)
:#device		pcn		# AMD Am79C97x PCI 10/100 NICs
:#device		rl		# RealTek 8129/8139
:#device		sf		# Adaptec AIC-6915 (``Starfire'')
:#device		sis		# Silicon Integrated Systems SiS 900/SiS 7016
:#device		ste		# Sundance ST201 (D-Link DFE-550TX)
:#device		tl		# Texas Instruments ThunderLAN
:#device		tx		# SMC EtherPower II (83c170 ``EPIC'')
:#device		vr		# VIA Rhine, Rhine II
:#device		wb		# Winbond W89C840F
:#device		wx		# Intel Gigabit Ethernet Card (``Wiseman'')
:#device		xl		# 3Com 3c90x (``Boomerang'', ``Cyclone'')
:
:device pcm0 at isa? port ? irq 5 drq 1
:
:# Pseudo devices - the number indicates how many units to allocate.
:pseudo-device	loop		# Network loopback
:pseudo-device	ether		# Ethernet support
:#pseudo-device	sl	1	# Kernel SLIP
:pseudo-device	ppp	3	# Kernel PPP
:pseudo-device	tun		# Packet tunnel.
:pseudo-device	pty	64	# Pseudo-ttys (telnet etc)
:pseudo-device	snp	8
:pseudo-device	vn
:pseudo-device	gzip
:pseudo-device	speaker
:#pseudo-device	md		# Memory "disks"
:pseudo-device	gif		# IPv6 and IPv4 tunneling
:#pseudo-device	faith	1	# IPv6-to-IPv4 relaying (translation)
:
:# The `bpf' pseudo-device enables the Berkeley Packet Filter.
:# Be aware of the administrative consequences of enabling this!
:pseudo-device	bpf		#Berkeley packet filter
:
:options		QUOTA
:options 	IPFIREWALL
:options 	IPFIREWALL_VERBOSE
:#options 	IPFIREWALL_VERBOSE_LIMIT=100
:options 	IPDIVERT
:options 	IPFIREWALL_FORWARD
:options 	TCP_DROP_SYNFIN		#drop TCP packets with SYN+FIN
:options 	DUMMYNET
:options		NMBCLUSTERS=8192
:options 	IBCS2
:options		DDB
:options		DDB_UNATTENDED
:options		RANDOM_IP_ID
:options		UFS_DIRHASH
:options		USER_LDT
:options		UCONSOLE
:
:#end of file
:
:Feel free to request any information.
:I'd like to help resolve this ASAP.
:
:Eugene Grosbein

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message



