Date: Sun, 3 Mar 2002 12:01:21 +0700
From: Eugene Grosbein
To: stable@freebsd.org
Subject: 4.5-STABLE softupdates brokeness: repeated panics and lockups
Message-ID: <20020303120121.A2197@svzserv.kemerovo.su>

Urgent! Please help!

My fairly old 4.5-STABLE system suffered from hanging network
connections. Turning off syncookies helped, but I had read that this
was already fixed in -STABLE, so on 1 March 2002 I ran cvsup and
rebuilt the kernel and world as usual.

Now I can state that the softupdates code is BROKEN for me.

That night my server crashed hard. I have options DDB and
DDB_UNATTENDED, my kernel is built with debugging symbols, savecore
is enabled in /etc/rc.conf, and there is enough swap space and disk
space in /var, so the server should have saved a crash dump and
restarted after the panic. It failed to do so. I usually lock the
console with vlock, and this prevented me from escaping to DDB, so
the next morning I was forced to power-cycle the machine. There was
nothing suspicious in the logs besides this:

Mar 1 22:35:38 www /kernel: z_decompress0: inflate returned -2 ()

That was the last record before the crash.

So on 2 March I left the console unlocked, and today it crashed again.
This time it was a kernel panic, and the system locked up after the
'syncing disks...' message; not a single character was printed after
the '...'.
The panic message was 'panic: softdep_setup_allocdirect: lost block'.
I was able to escape to DDB and enter 'trace' and 'panic', so I got a
crash dump. Again, the last message in the log was:

Mar 3 09:57:33 www /kernel: z_decompress0: inflate returned -2 ()

After the reboot I started to investigate, and suddenly it crashed
again! And once more the last message in the log was:

Mar 3 10:33:51 www /kernel: z_decompress0: inflate returned -2 ()

The uptime was only half an hour, eh? So I decided to turn softupdates
off with tunefs on all of my filesystems. The root filesystem already
had softupdates turned off.

Here are some details from gdb:

Script started on Sun Mar 3 11:04:49 2002
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...
IdlePTD at phsyical address 0x00382000
initial pcb at physical address 0x002e8c60
panicstr: from debugger
panic messages:
---
panic: softdep_setup_allocdirect: lost block
syncing disks... panic: from debugger
Uptime: 22h19m56s

dumping to dev #ad/0x20001, offset 2560
dump ata0: resetting devices .. done
254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237
236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219
218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201
200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183
182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165
164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147
146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129
128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111
110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90
89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65
64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40
39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15
14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0  dumpsys () at /home3/src/sys/kern/kern_shutdown.c:487
487		if (dumping++) {
(kgdb) where
#0  dumpsys () at /home3/src/sys/kern/kern_shutdown.c:487
#1  0xc0149cfc in boot (howto=260) at /home3/src/sys/kern/kern_shutdown.c:316
#2  0xc014a149 in panic (fmt=0xc027dfc4 "from debugger")
    at /home3/src/sys/kern/kern_shutdown.c:595
#3  0xc0121379 in db_panic (addr=-1071285907, have_addr=0, count=-1,
    modif=0xcef857d0 "") at /home3/src/sys/ddb/db_command.c:435
#4  0xc0121317 in db_command (last_cmdp=0xc02af1b4, cmd_table=0xc02aeff4,
    aux_cmd_tablep=0xc02e3c78) at /home3/src/sys/ddb/db_command.c:333
#5  0xc01213de in db_command_loop () at /home3/src/sys/ddb/db_command.c:457
#6  0xc012358f in db_trap (type=3, code=0) at /home3/src/sys/ddb/db_trap.c:71
#7  0xc025770c in kdb_trap (type=3, code=0, regs=0xcef858d8)
    at /home3/src/sys/i386/i386/db_interface.c:158
#8  0xc0264c08 in trap (frame={tf_fs = 16, tf_es = 16, tf_ds = -822607856,
      tf_edi = 0, tf_esi = -1070620608, tf_ebp = -822585056,
      tf_isp = -822585084, tf_ebx = 134, tf_edx = -1070974097, tf_ecx = 32,
      tf_eax = 38, tf_trapno = 3, tf_err = 0, tf_eip = -1071285907, tf_cs = 8,
      tf_eflags = 582, tf_esp = -1070974113, tf_ss = -1070986775})
    at /home3/src/sys/i386/i386/trap.c:584
#9  0xc025796d in Debugger (msg=0xc02a09e9 "manual escape to debugger")
    at machine/cpufunc.h:67
#10 0xc025486a in scgetc (sc=0xc02ffca0, flags=2)
    at /home3/src/sys/dev/syscons/syscons.c:3148
#11 0xc0250fa5 in sckbdevent (thiskbd=0xc02f8740, event=0, arg=0xc02ffca0)
    at /home3/src/sys/dev/syscons/syscons.c:616
#12 0xc0248567 in atkbd_intr (kbd=0xc02f8740, arg=0x0)
    at /home3/src/sys/dev/kbd/atkbd.c:462
#13 0xc0277c24 in atkbd_isa_intr (arg=0xc02f8740)
    at /home3/src/sys/isa/atkbd_isa.c:140
#14 0xc0147fe3 in add_interrupt_randomness (vsc=0xc02ff2ac)
    at /home3/src/sys/kern/kern_random.c:247
#15 0xc01f2798 in softdep_disk_write_complete (bp=0xc75b3c38)
    at /home3/src/sys/ufs/ffs/ffs_softdep.c:3248
#16 0xc01701f2 in vfs_backgroundwritedone (bp=0xc75b3c38)
    at /home3/src/sys/kern/vfs_bio.c:742
#17 0xc01725c4 in biodone (bp=0xc75b3c38) at /home3/src/sys/kern/vfs_bio.c:2701
#18 0xc023e522 in ad_interrupt (request=0xc2892200)
    at /home3/src/sys/dev/ata/ata-disk.c:703
#19 0xc0238aff in ata_intr (data=0xc229cb80)
    at /home3/src/sys/dev/ata/ata-all.c:1231
#20 0xc0147fe3 in add_interrupt_randomness (vsc=0xc02ff348)
    at /home3/src/sys/kern/kern_random.c:247
#21 0xc0259ab2 in vec14 ()
#22 0xc01ef086 in interlocked_sleep (lk=0xc02bfe7c, op=1, ident=0xce1a6884,
    flags=17, wmesg=0xc029301f "drainvp", timo=0)
    at /home3/src/sys/ufs/ffs/ffs_softdep.c:329
#23 0xc01f4a4e in drain_output (vp=0xce1a6840, islocked=1)
    at /home3/src/sys/ufs/ffs/ffs_softdep.c:4913
#24 0xc01f3812 in softdep_fsync_mountdev (vp=0xce1a6840)
    at /home3/src/sys/ufs/ffs/ffs_softdep.c:4056
#25 0xc01f7b7a in ffs_fsync (ap=0xcef85c04)
    at /home3/src/sys/ufs/ffs/ffs_vnops.c:134
#26 0xc01f67cc in ffs_sync (mp=0xc234dc00, waitfor=2, cred=0xc0a78680,
    p=0xc03003a0) at vnode_if.h:558
#27 0xc017aa47 in sync (p=0xc03003a0, uap=0x0)
    at /home3/src/sys/kern/vfs_syscalls.c:554
#28 0xc0149ab7 in boot (howto=256) at /home3/src/sys/kern/kern_shutdown.c:235
#29 0xc014a149 in panic (fmt=0xc0291d60 "softdep_setup_allocdirect: lost block")
    at /home3/src/sys/kern/kern_shutdown.c:595
#30 0xc01f0150 in softdep_setup_allocdirect (ip=0xc2a21900, lbn=0,
    newblkno=398160, oldblkno=394920, newsize=8192, oldsize=8192,
    bp=0xc758021c) at /home3/src/sys/ufs/ffs/ffs_softdep.c:1326
#31 0xc01eb0b3 in ffs_reallocblks (ap=0xcef85dd0)
    at /home3/src/sys/ufs/ffs/ffs_alloc.c:476
#32 0xc0174992 in cluster_write (bp=0xc758e7a8, filesize=65536, seqcount=10)
    at vnode_if.h:1077
#33 0xc01f765f in ffs_write (ap=0xcef85e74)
    at /home3/src/sys/ufs/ufs/ufs_readwrite.c:537
#34 0xc017f972 in vn_write (fp=0xc2a31a40, uio=0xcef85ee0, cred=0xc2df7b80,
    flags=0, p=0xced96040) at vnode_if.h:363
#35 0xc015908e in dofilewrite (p=0xced96040, fp=0xc2a31a40, fd=4,
    buf=0x8058000, nbyte=8192, offset=-1, flags=0)
    at /home3/src/sys/sys/file.h:162
#36 0xc0158f3f in write (p=0xced96040, uap=0xcef85f80)
    at /home3/src/sys/kern/sys_generic.c:329
#37 0xc0265551 in syscall2 (frame={tf_fs = -1072431057, tf_es = -1070727121,
      tf_ds = -1070727121, tf_edi = 134578176, tf_esi = 403821508,
      tf_ebp = -1077947140, tf_isp = -822583340, tf_ebx = 403764804,
      tf_edx = 403821508, tf_ecx = 403821508, tf_eax = 4, tf_trapno = 7,
      tf_err = 2, tf_eip = 403517864, tf_cs = 31, tf_eflags = 514,
      tf_esp = -1077947164, tf_ss = 47})
    at /home3/src/sys/i386/i386/trap.c:1167
#38 0xc0258615 in Xint0x80_syscall ()
#39 0x18104bd9 in ?? ()
#40 0x18104b56 in ?? ()
#41 0x18101946 in ?? ()
#42 0x180eb05a in ?? ()
#43 0x804a67a in ?? ()
#44 0x804affc in ?? ()
#45 0x804bf4e in ?? ()
#46 0x804d7a3 in ?? ()
#47 0x80499f5 in ?? ()
(kgdb) quit
Script done on Sun Mar 3 11:07:41 2002

Again, I have my 256M crash dump and will answer any questions, but I
cannot investigate this more deeply myself; I'm not a kernel hacker.
Here are my disks:

/dev/ad0s1a     49583   35145   10472   77%   /
/dev/ad0s1g    992239  290185  622675   32%   /home
/dev/ad0s1h   2822646 2072484  524351   80%   /home2
/dev/ad0s1e   1488663 1195741  173829   87%   /usr
/dev/ad0s1f    496111  361090   95333   79%   /var
/dev/ad1s1e   9880414 5920704 3169278   65%   /home4
/dev/ad2s1e   9807006 8191954  830492   91%   /home3

Here is my /etc/sysctl.conf:

kern.ipc.somaxconn=1024
kern.maxfiles=10000
net.inet.ip.portrange.hifirst=49152
net.inet.ip.portrange.hilast=49600
net.inet.tcp.always_keepalive=1
net.inet.tcp.sendspace=32768
net.inet.tcp.recvspace=32768
net.inet.tcp.rfc1644=1
vfs.vmiodirenable=1

I have CPUTYPE=i686 in /etc/make.conf and no other optimizations.

Finally, here is my kernel config:

# WWW kernel config
# 2 Nov 2001
machine		i386
#cpu		I386_CPU
#cpu		I486_CPU
cpu		I586_CPU
cpu		I686_CPU
ident		WWW
maxusers	128
options 	MAXDSIZ=(256*1024*1024)
options 	DFLDSIZ=(256*1024*1024)
makeoptions	DEBUG=-g		#Build kernel with gdb(1) debug symbols
#options 	MATH_EMULATE		#Support for x87 emulation
options 	CLK_CALIBRATION_LOOP
options 	CLK_USE_I8254_CALIBRATION
options 	CLK_USE_TSC_CALIBRATION
options 	INET			#InterNETworking
#options 	INET6			#IPv6 communications protocols
options 	FFS			#Berkeley Fast Filesystem
options 	FFS_ROOT		#FFS usable as root device [keep this!]
options 	SOFTUPDATES		#Enable FFS soft updates support
options 	MFS			#Memory Filesystem
#options 	MD_ROOT			#MD is a potential root device
options 	NFS			#Network Filesystem
#options 	NFS_ROOT		#NFS usable as root device, NFS required
#options 	MSDOSFS			#MSDOS Filesystem
options 	CD9660			#ISO 9660 Filesystem
options 	CD9660_ROOT		#CD-ROM usable as root, CD9660 required
#options 	PROCFS			#Process filesystem
options 	COMPAT_43		#Compatible with BSD 4.3 [KEEP THIS!]
options 	SCSI_DELAY=15000	#Delay (in ms) before probing SCSI
options 	UCONSOLE		#Allow users to grab the console
options 	USERCONFIG		#boot -c editor
options 	VISUAL_USERCONFIG	#visual boot -c editor
options 	KTRACE			#ktrace(1) support
options 	SYSVSHM			#SYSV-style shared memory
options 	SYSVMSG			#SYSV-style message queues
options 	SYSVSEM			#SYSV-style semaphores
options 	SHMMAXPGS=4096
options 	P1003_1B		#Posix P1003_1B real-time extensions
options 	_KPOSIX_PRIORITY_SCHEDULING
options 	ICMP_BANDLIM		#Rate limit bad replies
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options 	PPP_BSDCOMP
options 	PPP_DEFLATE
options 	PPP_FILTER
options 	NSWAPDEV=4
options 	MSGBUF_SIZE=140960
device		isa
options 	"AUTO_EOI_1"
device		eisa
device		pci

# Floppy drives
device		fdc0	at isa? port IO_FD1 irq 6 drq 2
device		fd0	at fdc0 drive 0
#device		fd1	at fdc0 drive 1
#
# If you have a Toshiba Libretto with its Y-E Data PCMCIA floppy,
# don't use the above line for fdc0 but the following one:
#device		fdc0

# ATA and ATAPI devices
#device		ata0	at isa? port IO_WD1 irq 14
#device		ata1	at isa? port IO_WD2 irq 15
device		ata
device		atadisk			# ATA disk drives
device		atapicd			# ATAPI CDROM drives
#device		atapifd			# ATAPI floppy drives
#device		atapist			# ATAPI tape drives
options 	ATA_STATIC_ID		#Static device numbering

# atkbdc0 controls both the keyboard and the PS/2 mouse
device		atkbdc0	at isa? port IO_KBD
device		atkbd0	at atkbdc? irq 1 flags 0x1
#device		psm0	at atkbdc? irq 12
device		vga0	at isa?
options 	VESA

# splash screen/screen saver
pseudo-device	splash

# syscons is the default console driver, resembling an SCO console
device		sc0	at isa? flags 0x100
options 	MAXCONS=16
options 	SC_HISTORY_SIZE=1000

# Floating point support - do not disable.
device		npx0	at nexus? port IO_NPX irq 13

# Power management support (see LINT for more options)
#device		apm0	at nexus? disable flags 0x20 # Advanced Power Management

# Serial (COM) ports
device		sio0	at isa? port IO_COM1 flags 0x10 irq 4
device		sio1	at isa? port IO_COM2 irq 3
#device		sio2	at isa? disable port IO_COM3 irq 5
#device		sio3	at isa? disable port IO_COM4 irq 9

# Parallel port
device		ppc0	at isa? irq 7
device		ppbus		# Parallel port bus (required)
device		lpt		# Printer
#device		plip		# TCP/IP over parallel
device		ppi		# Parallel port interface device
#device		vpo		# Requires scbus and da

# PCI Ethernet NICs.
#device		de		# DEC/Intel DC21x4x (``Tulip'')
#device		txp		# 3Com 3cR990 (``Typhoon'')
#device		vx		# 3Com 3c590, 3c595 (``Vortex'')

# PCI Ethernet NICs that use the common MII bus controller code.
# NOTE: Be sure to keep the 'device miibus' line in order to use these NICs!
device		miibus		# MII bus support
#device		dc		# DEC/Intel 21143 and various workalikes
device		fxp		# Intel EtherExpress PRO/100B (82557, 82558)
#device		pcn		# AMD Am79C97x PCI 10/100 NICs
#device		rl		# RealTek 8129/8139
#device		sf		# Adaptec AIC-6915 (``Starfire'')
#device		sis		# Silicon Integrated Systems SiS 900/SiS 7016
#device		ste		# Sundance ST201 (D-Link DFE-550TX)
#device		tl		# Texas Instruments ThunderLAN
#device		tx		# SMC EtherPower II (83c170 ``EPIC'')
#device		vr		# VIA Rhine, Rhine II
#device		wb		# Winbond W89C840F
#device		wx		# Intel Gigabit Ethernet Card (``Wiseman'')
#device		xl		# 3Com 3c90x (``Boomerang'', ``Cyclone'')
device		pcm0	at isa? port ? irq 5 drq 1

# Pseudo devices - the number indicates how many units to allocate.
pseudo-device	loop		# Network loopback
pseudo-device	ether		# Ethernet support
#pseudo-device	sl	1	# Kernel SLIP
pseudo-device	ppp	3	# Kernel PPP
pseudo-device	tun		# Packet tunnel.
pseudo-device	pty	64	# Pseudo-ttys (telnet etc)
pseudo-device	snp	8
pseudo-device	vn
pseudo-device	gzip
pseudo-device	speaker
#pseudo-device	md		# Memory "disks"
pseudo-device	gif		# IPv6 and IPv4 tunneling
#pseudo-device	faith	1	# IPv6-to-IPv4 relaying (translation)

# The `bpf' pseudo-device enables the Berkeley Packet Filter.
# Be aware of the administrative consequences of enabling this!
pseudo-device	bpf		#Berkeley packet filter
options 	QUOTA
options 	IPFIREWALL
options 	IPFIREWALL_VERBOSE
#options 	IPFIREWALL_VERBOSE_LIMIT=100
options 	IPDIVERT
options 	IPFIREWALL_FORWARD
options 	TCP_DROP_SYNFIN		#drop TCP packets with SYN+FIN
options 	DUMMYNET
options 	NMBCLUSTERS=8192
options 	IBCS2
options 	DDB
options 	DDB_UNATTENDED
options 	RANDOM_IP_ID
options 	UFS_DIRHASH
options 	USER_LDT
options 	UCONSOLE
#end of file

Feel free to request any information. I'd like to help resolve this ASAP.

Eugene Grosbein