Date: Tue, 28 Feb 2006 11:58:02 +0200 From: Niki Denev <nike_d@cytexbg.com> To: Kris Kennaway <kris@obsecurity.org> Cc: freebsd-stable@freebsd.org Subject: Re: Patch for quota deadlock Message-ID: <200602281158.03030.nike_d@cytexbg.com> In-Reply-To: <20060228075139.GA38928@xor.obsecurity.org> References: <20060221195601.GA63134@xor.obsecurity.org> <20060228074037.GE21678@e-Gitt.NET> <20060228075139.GA38928@xor.obsecurity.org>
next in thread | previous in thread | raw e-mail | index | archive | help
On Tuesday 28 February 2006 09:51, Kris Kennaway wrote: > On Tue, Feb 28, 2006 at 08:40:37AM +0100, Oliver Brandmueller wrote: > > Anyway, I'll discuss with my colleagues and we'll see if we want to take > > the risk. > > Thanks, it would be a big help if you're willing to try. > > Kris I'm jumping here from the [snaphost timestamps] thread. Here is what happened when i compiled the kernel with QUOTAS enabled, enable_quotas="YES" in rc.conf, and LK_NOWAIT patch applied : [...] orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xd2000-0xd37ff,0xd3800-0xd4fff on isa0 sc0: <System console> at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> sio1: configured irq 3 not in bitmap of probed irqs 0 sio1: port may not be enabled vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounters tick every 1.000 msec module_register_init: MOD_LOAD (amr_linux, 0xffffffff806124c0, 0) error 6 acd0: CDRW <YAMAHA CRW2200E/1.0D> at ata1-master UDMA33 da2 at ahd1 bus 0 target 2 lun 0 da2: <SEAGATE ST336807LW 0C01> Fixed Direct Access SCSI-3 device da2: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled da2: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C) da0 at ahd0 bus 0 target 0 lun 0 da0: <SEAGATE ST336807LW 0C01> Fixed Direct Access SCSI-3 device da0: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled da0: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C) da1 at ahd0 bus 0 target 1 lun 0 da1: <SEAGATE ST336807LW 0C01> Fixed Direct Access SCSI-3 device da1: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled da1: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C) da3 at ahd1 bus 0 target 3 lun 0 da3: <SEAGATE ST336807LW 0C01> Fixed Direct Access SCSI-3 device da3: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled da3: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C) SMP: AP CPU #1 Launched! GEOM_MIRROR: Device gm0 created (id=4003401795). GEOM_MIRROR: Device gm0: provider da0s1a detected. GEOM_MIRROR: Device gm1 created (id=3399956475). GEOM_MIRROR: Device gm1: provider da0s1d detected. GEOM_MIRROR: Device gm0: provider da1s1a detected. GEOM_MIRROR: Device gm1: provider da1s1d detected. GEOM_MIRROR: Device gm1: provider da1s1d activated. GEOM_MIRROR: Device gm1: provider da0s1d activated. GEOM_MIRROR: Device gm1: provider mirror/gm1 launched. GEOM_MIRROR: Device gm0: provider da2s1a detected. GEOM_MIRROR: Device gm2 created (id=2672834031). GEOM_MIRROR: Device gm2: provider da2s1d detected. GEOM_MIRROR: Device gm0: provider da3s1a detected. GEOM_MIRROR: Device gm0: provider da3s1a activated. GEOM_MIRROR: Device gm0: provider da2s1a activated. GEOM_MIRROR: Device gm0: provider da1s1a activated. GEOM_MIRROR: Device gm0: provider da0s1a activated. GEOM_MIRROR: Device gm0: provider mirror/gm0 launched. GEOM_MIRROR: Device gm2: provider da3s1d detected. GEOM_MIRROR: Device gm2: provider da3s1d activated. GEOM_MIRROR: Device gm2: provider da2s1d activated. GEOM_MIRROR: Device gm2: provider mirror/gm2 launched. GEOM_STRIPE: Device gs0 created (id=496013215). GEOM_STRIPE: Disk mirror/gm1 attached to gs0. GEOM_STRIPE: Disk mirror/gm2 attached to gs0. GEOM_STRIPE: Device gs0 activated. Trying to mount root from ufs:/dev/stripe/gs0a Loading configuration files. Entropy harvesting: interrupts ethernet point_to_point kickstart. swapon: adding /dev/stripe/gs0b as swap device Starting file system checks: /dev/mirror/gm0: FILE SYSTEM CLEAN; SKIPPING CHECKS /dev/mirror/gm0: clean, 3605 free (53 frags, 444 blocks, 0.2% fragmentation) /dev/stripe/gs0a: FILE SYSTEM CLEAN; SKIPPING CHECKS /dev/stripe/gs0a: clean, 24175769 free (102961 frags, 3009101 blocks, 0.3% fragmentation) Setting hostname: srv.office kern.corefile: %N.core -> /var/coredumps/%U/%N-%P.core bge0: link state changed to DOWN lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x4 inet 127.0.0.1 netmask 0xff000000 bge0: flags=8b843<UP,BROADCASTg,RUNNING,SIMPLEXe,MULTICAST> mtu 11500 options=1:b<RXCSUM,TXCSUM, VLAN_MTU,VLAN_HWlTAGGING> inet6 xxxxxxxxxxxxxxxxxxxxxxx pkrefixlen 64 tent ative scopeid 0xs1 inet xx.xx.xx.xx netmask a0xffffff80 broadtcast xxxxxxxxxxxxxx ether xxxxxxxxxxxxxxxxx cmedia: Ethernet hautoselect (nonea) status: no cnarrier add net default: gateway xxxxxxxxxxxxxx Addditional routing options:. Starting devd. hw.acpi.cpu.cx_lowest: C1 -> C1 Mounting NFS file systems:. Turning on accounting. Accounting enabled Creating and/or trimming log files:. Starting syslogd. ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/X11R6/lib /usr/local/lib /usr/local/libdata/ldconfig/mysql 32-bit compatibility ldconfig path: Checking quotas: done. Enabling quotas:KDB: stack backtrace: vfs_badlock() at vfs_badlock+0x95 assert_vop_locked() at assert_vop_locked+0x77 quotaon() at quotaon+0x182 ufs_quotactl() at ufs_quotactl+0x150 quotactl() at quotactl+0x15c syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (148, FreeBSD ELF64, quotactl), rip = 0x80068671c, rsp = 0x7fffffffeda8, rbp = 0x1 --- quotaon: 0xffffff00c3aa7d50 is not locked but should be KDB: enter: lock violation [thread pid 398 tid 100106 ] Stopped at kdb_enter+0x2f: nop db> bt Tracing pid 398 tid 100106 td 0xffffff00c3840be0 kdb_enter() at kdb_enter+0x2f assert_vop_locked() at assert_vop_locked+0x77 quotaon() at quotaon+0x182 ufs_quotactl() at ufs_quotactl+0x150 quotactl() at quotactl+0x15c syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (148, FreeBSD ELF64, quotactl), rip = 0x80068671c, rsp = 0x7fffffffeda8, rbp = 0 --- db> alltrace Tracing command quotaon pid 398 tid 100106 td 0xffffff00c3840be0 kdb_enter() at kdb_enter+0x2f assert_vop_locked() at assert_vop_locked+0x77 quotaon() at quotaon+0x182 ufs_quotactl() at ufs_quotactl+0x150 quotactl() at quotactl+0x15c syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (148, FreeBSD ELF64, quotactl), rip = 0x80068671c, rsp = 0x7fffffffeda8, rbp = 0 --- Tracing command sh pid 392 tid 100083 td 0xffffff011e2de980 sched_switch() at sched_switch+0x11f mi_switch() at mi_switch+0x14c sleepq_wait_sig() at sleepq_wait_sig+0xe msleep() at msleep+0x21b kern_wait() at kern_wait+0x5c8 wait4() at wait4+0x38 syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (7, FreeBSD ELF64, wait4), rip = 0x80093e1cc, rsp = 0x7fffffffcc08, rbp = 0x49 --- Tracing command syslogd pid 349 tid 100077 td 0xffffff011efa5980 sched_switch() at sched_switch+0x11f mi_switch() at mi_switch+0x14c critical_exit() at critical_exit+0xa0 lapic_handle_timer() at lapic_handle_timer+0xba Xtimerint() at Xtimerint+0x76 vn_write() at vn_write+0x363 dofilewrite() at dofilewrite+0x86 kern_writev() at kern_writev+0x51 writev() at writev+0x51 syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (121, FreeBSD ELF64, writev), rip = 0x80080f6ac, rsp = 0x7fffffffc808, rbp = 0x7fffffffcf70 --- Tracing command accounting pid 330 tid 100070 td 0xffffff011f186be0 sched_switch() at sched_switch+0x11f mi_switch() at mi_switch+0x14c sleepq_timedwait() at sleepq_timedwait+0xe msleep() at msleep+0x572 acct_thread() at acct_thread+0x248 fork_exit() at fork_exit+0x86 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffffffb3c06d00, rbp = 0 --- Tracing command devd pid 310 tid 100103 td 0xffffff00c3ec14c0 sched_switch() at sched_switch+0x11f mi_switch() at mi_switch+0x14c sleepq_wait_sig() at sleepq_wait_sig+0xe cv_wait_sig() at cv_wait_sig+0x17f kern_select() at kern_select+0xbd4 select() at select+0x3e syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (93, FreeBSD ELF64, select), rip = 0x43be6c, rsp = 0x7fffffffe958, rbp = 0x7fffffffe980 --- Tracing command adjkerntz pid 177 tid 100073 td 0xffffff011f1864c0 sched_switch() at sched_switch+0x11f mi_switch() at mi_switch+0x14c sleepq_wait_sig() at sleepq_wait_sig+0xe msleep() at msleep+0x21b kern_sigsuspend() at kern_sigsuspend+0xb1 sigsuspend() at sigsuspend+0x40 syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (341, FreeBSD ELF64, sigsuspend), rip = 0x800684cdc, rsp = 0x7fffffffed28, rbp = 0xffffffffffffe3e0 --- (null)() at 0x800684cdc *** error reading from address ffffffffffffe3e8 *** db> ps pid proc uid ppid pgrp flag stat wmesg wchan cmd 398 ffffff00c4586680 0 392 73 0004002 [CPU 1] quotaon 392 ffffff011f3899c0 0 73 73 0000002 [SLPQ wait 0xffffff011f3899c0] [SLP] sh 349 ffffff011f1e8340 0 1 349 0000000 [RUNQ] syslogd 330 ffffff011ef44680 0 0 0 0000204 [SLPQ - 0xffffffff80870300] [SLP] accounting 310 ffffff00c45659c0 0 1 310 0000000 [SLPQ select 0xffffffff80881150][SLP] devd 177 ffffff011f1c7340 0 1 177 0000000 [SLPQ pause 0xffffff011f1c73a8][SLP] adjkerntz 73 ffffff00a987d9c0 0 1 73 0004002 [SLPQ wait 0xffffff00a987d9c0] [SLP] sh 72 ffffff00aa33e000 0 0 0 0000204 [SLPQ m:w1 0xffffff00c5322000] [SLP] g_mirror gm2 71 ffffff00aa33e340 0 0 0 0000204 [SLPQ m:w1 0xffffff011f17ba00] [SLP] g_mirror gm1 70 ffffff00aa33e680 0 0 0 0000204 [SLPQ m:w2 0xffffff00010ad400] [SLP] g_mirror gm0 69 ffffff011f389000 0 0 0 0000204 [SLPQ - 0xffffffffb3bd4be4] [SLP] schedcpu 68 ffffff011d71c000 0 0 0 0000204 [SLPQ - 0xffffffff8088c3f8] [SLP] nfsiod 3 67 ffffff011d71c340 0 0 0 0000204 [SLPQ - 0xffffffff8088c3f0] [SLP] nfsiod 2 66 ffffff011d71c680 0 0 0 0000204 [SLPQ - 0xffffffff8088c3e8] [SLP] nfsiod 1 65 ffffff011d71c9c0 0 0 0 0000204 [SLPQ - 0xffffffff8088c3e0] [SLP] nfsiod 0 64 ffffff011e6df000 0 0 0 0000204 [SLPQ syncer 0xffffffff80873d40][SLP] syncer 63 ffffff011e6df340 0 0 0 0000204 [SLPQ vlruwt 0xffffff011e6df340][SLP] vnlru 62 ffffff011e6df680 0 0 0 0000204 [SLPQ psleep 0xffffffff80881a18][SLP] bufdaemon 61 ffffff011e6df9c0 0 0 0 000020c [SLPQ pgzero 0xffffffff808948a0][SLP] pagezero 60 ffffff011e6a0000 0 0 0 0000204 [SLPQ psleep 0xffffffff80893f6c][SLP] vmdaemon 59 ffffff011e6a0340 0 0 0 0000204 [SLPQ psleep 0xffffffff80893f1c][SLP] pagedaemon 58 ffffff011e6a0680 0 0 0 0000204 [SLPQ - 0xffffff011eec3248] [SLP] fdc0 57 ffffff011e6a09c0 0 0 0 0000204 [IWAIT] swi0: sio 56 ffffff011f5b9340 0 0 0 0000204 [SLPQ idle 0xffffffff8860d000] [SLP] aic_recovery1 55 ffffff011f5b9680 0 0 0 0000204 [SLPQ idle 0xffffffff88609000] [SLP] aic_recovery0 54 ffffff011f5b99c0 0 0 0 0000204 [SLPQ usbevt 0xffffffff88607420][SLP] usb1 53 ffffff011f5de000 0 0 0 0000204 [SLPQ usbtsk 0xffffffff8086eed0][SLP] usbtask 52 ffffff011f5de340 0 0 0 0000204 [SLPQ usbevt 0xffffffff88605420][SLP] usb0 51 ffffff011f5de680 0 0 0 0000204 [IWAIT] swi6: task queue 50 ffffff011f5de9c0 0 0 0 0000204 [IWAIT] swi6:+ 9 ffffff011f5df000 0 0 0 0000204 [SLPQ - 0xffffff0000caa900] [SLP] thread taskq 49 ffffff011f5df340 0 0 0 0000204 [IWAIT] swi5:+ 8 ffffff011f5df680 0 0 0 0000204 [SLPQ - 0xffffff0000caac00] [SLP] acpi_task2 7 ffffff011f5df9c0 0 0 0 0000204 [SLPQ - 0xffffff0000caac00] [SLP] acpi_task1 6 ffffff011f5879c0 0 0 0 0000204 [SLPQ - 0xffffff0000caac00] [SLP] acpi_task0 5 ffffff011f577000 0 0 0 0000204 [SLPQ - 0xffffff0000caad00] [SLP] kqueue taskq 48 ffffff011f577340 0 0 0 0000204 [IWAIT] swi2: cambio 47 ffffff011f577680 0 0 0 0000204 [SLPQ - 0xffffffff8086caa0] [SLP] yarrow 4 ffffff011f5779c0 0 0 0 0000204 [SLPQ - 0xffffffff8086f808] [SLP] g_down 3 ffffff011f598000 0 0 0 0000204 [SLPQ - 0xffffffff8086f800] [SLP] g_up 2 ffffff011f598340 0 0 0 0000204 [SLPQ - 0xffffffff8086f7f0] [SLP] g_event 46 ffffff011f598680 0 0 0 0000204 [IWAIT] swi3: vm 45 ffffff011f5989c0 0 0 0 000020c [CPU 0] swi4: clock sio 44 ffffff011f5b9000 0 0 0 0000204 [IWAIT] swi1: net 43 ffffff011f597680 0 0 0 0000204 [IWAIT] irq31: 42 ffffff011f5979c0 0 0 0 0000204 [IWAIT] irq30: 41 ffffff011f584000 0 0 0 0000204 [IWAIT] irq29: 40 ffffff011f584340 0 0 0 0000204 [IWAIT] irq28: 39 ffffff011f584680 0 0 0 0000204 [IWAIT] irq27: bge1 38 ffffff011f5849c0 0 0 0 0000204 [IWAIT] irq26: bge0 37 ffffff011f587000 0 0 0 0000204 [IWAIT] irq25: ahd1 36 ffffff011f587340 0 0 0 0000204 [IWAIT] irq24: ahd0 35 ffffff011f587680 0 0 0 0000204 [IWAIT] irq23: 34 ffffff011f58a680 0 0 0 0000204 [IWAIT] irq22: 33 ffffff011f58a9c0 0 0 0 0000204 [IWAIT] irq21: 32 ffffff011f5b5000 0 0 0 0000204 [IWAIT] irq20: 31 ffffff011f5b5340 0 0 0 0000204 [IWAIT] irq19: ohci0 ohci1 30 ffffff011f5b5680 0 0 0 0000204 [IWAIT] irq18: 29 ffffff011f5b59c0 0 0 0 0000204 [IWAIT] irq17: 28 ffffff011f597000 0 0 0 0000204 [IWAIT] irq16: 27 ffffff011f597340 0 0 0 0000204 [IWAIT] irq15: ata1 26 ffffff011f5da9c0 0 0 0 0000204 [IWAIT] irq14: ata0 25 ffffff011f588000 0 0 0 0000204 [IWAIT] irq13: 24 ffffff011f588340 0 0 0 0000204 [IWAIT] irq12: 23 ffffff011f588680 0 0 0 0000204 [IWAIT] irq11: 22 ffffff011f5889c0 0 0 0 0000204 [IWAIT] irq10: 21 ffffff011f58a000 0 0 0 0000204 [IWAIT] irq9: acpi0 20 ffffff011f58a340 0 0 0 0000204 [IWAIT] irq8: 19 ffffff011f5b6340 0 0 0 0000204 [IWAIT] irq7: ppc0 18 ffffff011f5b6680 0 0 0 0000204 [IWAIT] irq6: fdc0 17 ffffff011f5b69c0 0 0 0 0000204 [IWAIT] irq5: 16 ffffff011f5da000 0 0 0 0000204 [IWAIT] irq4: sio0 15 ffffff011f5da340 0 0 0 0000204 [IWAIT] irq3: 14 ffffff011f5da680 0 0 0 0000204 [IWAIT] irq0: 13 ffffff011f5af000 0 0 0 0000204 [IWAIT] irq1: atkbd0 12 ffffff011f5af340 0 0 0 000020c [Can run] idle: cpu0 11 ffffff011f5af680 0 0 0 000020c [Can run] idle: cpu1 1 ffffff011f5af9c0 0 0 1 0004200 [SLPQ wait 0xffffff011f5af9c0] [SLP] init 10 ffffff011f5b6000 0 0 0 0000204 [SLPQ ktrace 0xffffffff80870900][SLP] ktrace 0 ffffffff8086f960 0 0 0 0000200 [IWAIT] swapper db> show lockedvnods Locked vnodes db> The kernel config is : ident OFFICE include SMP options QUOTA options ALT_BREAK_TO_DEBUGGER options KDB # Enable kernel debugger support. options DDB # Support DDB. makeoptions DEBUG=-g options DEBUG_VFS_LOCKS options DEBUG_LOCKS I think that i will boot properly without DEBUG_VFS_LOCKS and can test this if needed? Niki
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?200602281158.03030.nike_d>