Date: Sat, 2 Mar 2002 21:38:02 -0800 (PST)
From: Matthew Dillon <dillon@apollo.backplane.com>
To: Kirk McKusick <mckusick@chez.McKusick.COM>
Cc: Eugene Grosbein <eugen@www.svzserv.kemerovo.su>, stable@FreeBSD.ORG
Subject: Re: 4.5-STABLE softupdates brokeness: repeated panics and lockups
Message-ID: <200203030538.g235c2l59112@apollo.backplane.com>
References: <20020303120121.A2197@svzserv.kemerovo.su>

    I'm adding Kirk.  Kirk, something doesn't feel right about
    interlocked_sleep().  I don't think it's supposed to be able to
    interrupt where it interrupted it in the backtrace below.

:Urgent! Please help!
:
:My quite old 4.5-STABLE system suffered from hanging network connections.
:Turning off syncookies helped but I've read this has already been fixed
:in -STABLE so 1 March 2002 I ran cvsup and rebuilt kernel and world
:as usual. Now I state that softupdates code is BROKEN for me.
:
:That night my server crashed hard. I have options DDB and DDB_UNATTENDED,
:my kernel is built with debugging symbols and I have savecore enabled
:in /etc/rc.conf and have enough swap space and disk space in /var
:so server should leave core and restart after panic. It failed to do that.
:Usually I lock the console with vlock and this prevented me to escape
:to DDB, I was forced to turn power off and on next morning.
:Nothing suspicious in logs besides this:
:...
:So I left console unlocked 2 March and today it crashed again.
:Well, that was kernel panic and system locked after 'syncing disks...' message,
:no one character printed after '...'. The panic reason was
:'panic: softdep_setup_allocdirect: lost block'.
:
:It was possible to escape to DDB and say 'trace' and 'panic', so
:I have got crashdump. The last message in log again was:
:
:Mar 3 09:57:33 <kern.crit> www /kernel: z_decompress0: inflate returned -2 ()
:
:After reboot I started to investigate and suddenly it crashed again!
:And the last message in log again was:
:
:Mar 3 10:33:51 <kern.crit> www /kernel: z_decompress0: inflate returned -2 ()
:
:Uptime was only half an hour, eh?
:
:So I decided to turn softupdates off with tunefs on all of my filesystems.
:The root filesystem had softupdates already turned off.

    Did turning softupdates off solve the problem?  If you turn it on
    and set kern.maxvnodes to 9999999 does that also solve the problem or
    does the problem re-occur?

					-Matt
					Matthew Dillon
					<dillon@backplane.com>

:#12 0xc0248567 in atkbd_intr (kbd=0xc02f8740, arg=0x0)
:    at /home3/src/sys/dev/kbd/atkbd.c:462
:#13 0xc0277c24 in atkbd_isa_intr (arg=0xc02f8740)
:    at /home3/src/sys/isa/atkbd_isa.c:140
:#14 0xc0147fe3 in add_interrupt_randomness (vsc=0xc02ff2ac)
:    at /home3/src/sys/kern/kern_random.c:247
:#15 0xc01f2798 in softdep_disk_write_complete (bp=0xc75b3c38)
:    at /home3/src/sys/ufs/ffs/ffs_softdep.c:3248
:#16 0xc01701f2 in vfs_backgroundwritedone (bp=0xc75b3c38)
:    at /home3/src/sys/kern/vfs_bio.c:742
:#17 0xc01725c4 in biodone (bp=0xc75b3c38) at /home3/src/sys/kern/vfs_bio.c:2701
:#18 0xc023e522 in ad_interrupt (request=0xc2892200)
:    at /home3/src/sys/dev/ata/ata-disk.c:703
:#19 0xc0238aff in ata_intr (data=0xc229cb80)
:    at /home3/src/sys/dev/ata/ata-all.c:1231
:#20 0xc0147fe3 in add_interrupt_randomness (vsc=0xc02ff348)
:    at /home3/src/sys/kern/kern_random.c:247
:#21 0xc0259ab2 in vec14 ()
:#22 0xc01ef086 in interlocked_sleep (lk=0xc02bfe7c, op=1, ident=0xce1a6884,
:    flags=17, wmesg=0xc029301f "drainvp", timo=0)
:    at /home3/src/sys/ufs/ffs/ffs_softdep.c:329
:#23 0xc01f4a4e in drain_output (vp=0xce1a6840, islocked=1)
:    at /home3/src/sys/ufs/ffs/ffs_softdep.c:4913
:#24 0xc01f3812 in softdep_fsync_mountdev (vp=0xce1a6840)
:---Type <return> to continue, or q <return> to quit---
:    at /home3/src/sys/ufs/ffs/ffs_softdep.c:4056
:#25 0xc01f7b7a in ffs_fsync (ap=0xcef85c04)
:    at /home3/src/sys/ufs/ffs/ffs_vnops.c:134
:#26 0xc01f67cc in ffs_sync (mp=0xc234dc00, waitfor=2, cred=0xc0a78680,
:    p=0xc03003a0) at vnode_if.h:558
:#27 0xc017aa47 in sync (p=0xc03003a0, uap=0x0)
:    at /home3/src/sys/kern/vfs_syscalls.c:554
:#28 0xc0149ab7 in boot (howto=256) at /home3/src/sys/kern/kern_shutdown.c:235
:#29 0xc014a149 in panic (
:    fmt=0xc0291d60 "softdep_setup_allocdirect: lost block")
:    at /home3/src/sys/kern/kern_shutdown.c:595
:#30 0xc01f0150 in softdep_setup_allocdirect (ip=0xc2a21900, lbn=0,
:    newblkno=398160, oldblkno=394920, newsize=8192, oldsize=8192,
:    bp=0xc758021c) at /home3/src/sys/ufs/ffs/ffs_softdep.c:1326
:#31 0xc01eb0b3 in ffs_reallocblks (ap=0xcef85dd0)
:    at /home3/src/sys/ufs/ffs/ffs_alloc.c:476
:#32 0xc0174992 in cluster_write (bp=0xc758e7a8, filesize=65536, seqcount=10)
:    at vnode_if.h:1077
:#33 0xc01f765f in ffs_write (ap=0xcef85e74)
:    at /home3/src/sys/ufs/ufs/ufs_readwrite.c:537
:#34 0xc017f972 in vn_write (fp=0xc2a31a40, uio=0xcef85ee0, cred=0xc2df7b80,
:    flags=0, p=0xced96040) at vnode_if.h:363
:#35 0xc015908e in dofilewrite (p=0xced96040, fp=0xc2a31a40, fd=4,
:    buf=0x8058000, nbyte=8192, offset=-1, flags=0)
:---Type <return> to continue, or q <return> to quit---
:    at /home3/src/sys/sys/file.h:162
:#36 0xc0158f3f in write (p=0xced96040, uap=0xcef85f80)
:    at /home3/src/sys/kern/sys_generic.c:329
:#37 0xc0265551 in syscall2 (frame={tf_fs = -1072431057, tf_es = -1070727121,
:      tf_ds = -1070727121, tf_edi = 134578176, tf_esi = 403821508,
:      tf_ebp = -1077947140, tf_isp = -822583340, tf_ebx = 403764804,
:      tf_edx = 403821508, tf_ecx = 403821508, tf_eax = 4, tf_trapno = 7,
:      tf_err = 2, tf_eip = 403517864, tf_cs = 31, tf_eflags = 514,
:      tf_esp = -1077947164, tf_ss = 47})
:    at /home3/src/sys/i386/i386/trap.c:1167
:#38 0xc0258615 in Xint0x80_syscall ()
:#39 0x18104bd9 in ?? ()
:#40 0x18104b56 in ?? ()
:#41 0x18101946 in ?? ()
:#42 0x180eb05a in ?? ()
:#43 0x804a67a in ?? ()
:#44 0x804affc in ?? ()
:#45 0x804bf4e in ?? ()
:#46 0x804d7a3 in ?? ()
:#47 0x80499f5 in ?? ()
:(kgdb) quit
:
:Script done on Sun Mar 3 11:07:41 2002
:
:Again, I have my 256M crashdump and will answer to any questions but
:I cannot investigate this more deeply myself, I'm not a kernel hacker.
:
:Here are my disks:
:
:/dev/ad0s1a      49583    35145    10472    77%    /
:/dev/ad0s1g     992239   290185   622675    32%    /home
:/dev/ad0s1h    2822646  2072484   524351    80%    /home2
:/dev/ad0s1e    1488663  1195741   173829    87%    /usr
:/dev/ad0s1f     496111   361090    95333    79%    /var
:/dev/ad1s1e    9880414  5920704  3169278    65%    /home4
:/dev/ad2s1e    9807006  8191954   830492    91%    /home3
:
:Here is my /etc/sysctl.conf:
:
:kern.ipc.somaxconn=1024
:kern.maxfiles=10000
:net.inet.ip.portrange.hifirst=49152
:net.inet.ip.portrange.hilast=49600
:net.inet.tcp.always_keepalive=1
:net.inet.tcp.sendspace=32768
:net.inet.tcp.recvspace=32768
:net.inet.tcp.rfc1644=1
:vfs.vmiodirenable=1
:
:I have CPUTYPE=i686 in /etc/make.conf and no other optimizations.
:
:At last, here is my kernel config:
:
:# WWW kernel config
:# 2 Nov 2001
:
:machine i386
:#cpu I386_CPU
:#cpu I486_CPU
:cpu I586_CPU
:cpu I686_CPU
:ident WWW
:maxusers 128
:options MAXDSIZ=(256*1024*1024)
:options DFLDSIZ=(256*1024*1024)
:
:makeoptions DEBUG=-g #Build kernel with gdb(1) debug symbols
:
:#options MATH_EMULATE #Support for x87 emulation
:options CLK_CALIBRATION_LOOP
:options CLK_USE_I8254_CALIBRATION
:options CLK_USE_TSC_CALIBRATION
:
:options INET #InterNETworking
:#options INET6 #IPv6 communications protocols
:options FFS #Berkeley Fast Filesystem
:options FFS_ROOT #FFS usable as root device [keep this!]
:options SOFTUPDATES #Enable FFS soft updates support
:options MFS #Memory Filesystem
:#options MD_ROOT #MD is a potential root device
:options NFS #Network Filesystem
:#options NFS_ROOT #NFS usable as root device, NFS required
:#options MSDOSFS #MSDOS Filesystem
:options CD9660 #ISO 9660 Filesystem
:options CD9660_ROOT #CD-ROM usable as root, CD9660 required
:#options PROCFS #Process filesystem
:options COMPAT_43 #Compatible with BSD 4.3 [KEEP THIS!]
:options SCSI_DELAY=15000 #Delay (in ms) before probing SCSI
:options UCONSOLE #Allow users to grab the console
:options USERCONFIG #boot -c editor
:options VISUAL_USERCONFIG #visual boot -c editor
:options KTRACE #ktrace(1) support
:options SYSVSHM #SYSV-style shared memory
:options SYSVMSG #SYSV-style message queues
:options SYSVSEM #SYSV-style semaphores
:options SHMMAXPGS=4096
:options P1003_1B #Posix P1003_1B real-time extensions
:options _KPOSIX_PRIORITY_SCHEDULING
:options ICMP_BANDLIM #Rate limit bad replies
:options KBD_INSTALL_CDEV # install a CDEV entry in /dev
:options PPP_BSDCOMP
:options PPP_DEFLATE
:options PPP_FILTER
:options NSWAPDEV=4
:options MSGBUF_SIZE=140960
:
:device isa
:options "AUTO_EOI_1"
:
:device eisa
:device pci
:
:# Floppy drives
:device fdc0 at isa? port IO_FD1 irq 6 drq 2
:device fd0 at fdc0 drive 0
:#device fd1 at fdc0 drive 1
:#
:# If you have a Toshiba Libretto with its Y-E Data PCMCIA floppy,
:# don't use the above line for fdc0 but the following one:
:#device fdc0
:
:# ATA and ATAPI devices
:#device ata0 at isa? port IO_WD1 irq 14
:#device ata1 at isa? port IO_WD2 irq 15
:device ata
:device atadisk # ATA disk drives
:device atapicd # ATAPI CDROM drives
:#device atapifd # ATAPI floppy drives
:#device atapist # ATAPI tape drives
:options ATA_STATIC_ID #Static device numbering
:
:# atkbdc0 controls both the keyboard and the PS/2 mouse
:device atkbdc0 at isa? port IO_KBD
:device atkbd0 at atkbdc? irq 1 flags 0x1
:#device psm0 at atkbdc? irq 12
:
:device vga0 at isa?
:options VESA
:
:# splash screen/screen saver
:pseudo-device splash
:
:# syscons is the default console driver, resembling an SCO console
:device sc0 at isa? flags 0x100
:options MAXCONS=16
:options SC_HISTORY_SIZE=1000
:
:# Floating point support - do not disable.
:device npx0 at nexus? port IO_NPX irq 13
:
:# Power management support (see LINT for more options)
:#device apm0 at nexus? disable flags 0x20 # Advanced Power Management
:
:# Serial (COM) ports
:device sio0 at isa? port IO_COM1 flags 0x10 irq 4
:device sio1 at isa? port IO_COM2 irq 3
:#device sio2 at isa? disable port IO_COM3 irq 5
:#device sio3 at isa? disable port IO_COM4 irq 9
:
:# Parallel port
:device ppc0 at isa? irq 7
:device ppbus # Parallel port bus (required)
:device lpt # Printer
:#device plip # TCP/IP over parallel
:device ppi # Parallel port interface device
:#device vpo # Requires scbus and da
:
:# PCI Ethernet NICs.
:#device de # DEC/Intel DC21x4x (``Tulip'')
:#device txp # 3Com 3cR990 (``Typhoon'')
:#device vx # 3Com 3c590, 3c595 (``Vortex'')
:
:# PCI Ethernet NICs that use the common MII bus controller code.
:# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
:device miibus # MII bus support
:#device dc # DEC/Intel 21143 and various workalikes
:device fxp # Intel EtherExpress PRO/100B (82557, 82558)
:#device pcn # AMD Am79C97x PCI 10/100 NICs
:#device rl # RealTek 8129/8139
:#device sf # Adaptec AIC-6915 (``Starfire'')
:#device sis # Silicon Integrated Systems SiS 900/SiS 7016
:#device ste # Sundance ST201 (D-Link DFE-550TX)
:#device tl # Texas Instruments ThunderLAN
:#device tx # SMC EtherPower II (83c170 ``EPIC'')
:#device vr # VIA Rhine, Rhine II
:#device wb # Winbond W89C840F
:#device wx # Intel Gigabit Ethernet Card (``Wiseman'')
:#device xl # 3Com 3c90x (``Boomerang'', ``Cyclone'')
:
:device pcm0 at isa? port ? irq 5 drq 1
:
:# Pseudo devices - the number indicates how many units to allocate.
:pseudo-device loop # Network loopback
:pseudo-device ether # Ethernet support
:#pseudo-device sl 1 # Kernel SLIP
:pseudo-device ppp 3 # Kernel PPP
:pseudo-device tun # Packet tunnel.
:pseudo-device pty 64 # Pseudo-ttys (telnet etc)
:pseudo-device snp 8
:pseudo-device vn
:pseudo-device gzip
:pseudo-device speaker
:#pseudo-device md # Memory "disks"
:pseudo-device gif # IPv6 and IPv4 tunneling
:#pseudo-device faith 1 # IPv6-to-IPv4 relaying (translation)
:
:# The `bpf' pseudo-device enables the Berkeley Packet Filter.
:# Be aware of the administrative consequences of enabling this!
:pseudo-device bpf #Berkeley packet filter
:
:options QUOTA
:options IPFIREWALL
:options IPFIREWALL_VERBOSE
:#options IPFIREWALL_VERBOSE_LIMIT=100
:options IPDIVERT
:options IPFIREWALL_FORWARD
:options TCP_DROP_SYNFIN #drop TCP packets with SYN+FIN
:options DUMMYNET
:options NMBCLUSTERS=8192
:options IBCS2
:options DDB
:options DDB_UNATTENDED
:options RANDOM_IP_ID
:options UFS_DIRHASH
:options USER_LDT
:options UCONSOLE
:
:#end of file
:
:Feel free to request any information.
:I'd like to help resolve this ASAP.
:
:Eugene Grosbein

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message