Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 01 Oct 2003 19:09:07 +0000
From:      "Vinod R. Kashyap" <vinodrk@hotmail.com>
To:        freebsd-fs@freebsd.org, fs@freebsd.org
Subject:   clean/dirty queue mixup
Message-ID:  <BAY2-F17yupGKy3hrbn00004847@hotmail.com>

next in thread | raw e-mail | index | archive | help

[-- Attachment #1 --]

I am running some file I/O (multiple 'cp -R x/* y') on a kernel built out of
FreeBSD 4.8 RELEASE sources (with SMP and APIC_IO turned on),
when I see the following panic, anywhere between 0.5 - 15 hours after start:
panic:  softdep_deallocate_dependencies: dangling deps
mp_lock = 01000001; cpuid = 1; lapic_id = 06000000
(the I/O is running on a system with a dual Xeon motherboard)

I recompiled the kernel with INVARIANTS & INVARIANT_SUPPORT on,
and now see the following panic:
panic:  getnewbuf:  inconsistant queue 2 bp 0xcc94b564
mp_lock = 01000001; cpuid = 1;  lapic_id = 06000000

In getnewbuf (in kern/vfs_bio.c), bufqueues[QUEUE_EMPTYKVA] seems to have
run out of buffers, and so, a buffer is gotten out of 
bufqueues[QUEUE_CLEAN].
The problem is that this buffer has its b_qindex set to 3 (QUEUE_DIRTY), 
instead
of 2 (QUEUE_CLEAN).  It also has b_flags set to B_DELWRI, and b_xflags set 
to
BX_VNDIRTY. bp->b_dep is non-NULL, and points to a work item with
wk_type=0x5 (D_INDIRDEP) and wk_state=0x8001 (VONWORKLIST | ATTACHED).
I noticed that at least the first 5 buffers in the CLEAN queue had their 
b_qindex set
to 3 (DIRTY).

I noticed something that could potentially be causing this (I might be way 
off here...):
When a low level device driver calls biodone(), if bp->b_dep is non-NULL,
(*bioops.io_complete)/softdep_disk_write_complete gets called, which, in 
case
the work item is of type D_INDIRDEP, unconditionally puts it back into the 
b_dep
queue, and marks it dirty.  On return from this call, it looks like biodone 
goes ahead
and frees the buffer with the dirty flags still set, and b_dep non-NULL.  Is 
this the
cause by any chance?  Can anyone please help?

I have attached some more information from gdb.  I can send the core-dump to
anyone who might be interested.


Thanks,

Vinod.

_________________________________________________________________
Contact brides & grooms FREE! http://www.shaadi.com/ptnr.php?ptnr=hmltag 
Only on www.shaadi.com. Register now!

[-- Attachment #2 --]
FreeBSD4# gdb -k /kernel
GNU gdb 4.18 (FreeBSD)
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd"...Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 2627 in elfstab_build_psymtabs
Deprecated bfd_read called at /usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line 933 in fill_symbuf

(kgdb) exec-file kernel.3
(kgdb) core-file vmcore.3
SMP 2 cpus
IdlePTD at phsyical address 0x0055f000
initial pcb at physical address 0x00487ee0
panicstr: (fmt null)

panic messages:
---
panic: getnewbuf: inconsistant queue 2 bp 0xcc94b564
mp_lock = 01000001; cpuid = 1; lapic.id = 06000000
panic: (fmt null)

mp_lock = 01000002; cpuid = 1; lapic.id = 06000000
boot() called on cpu#1
Uptime: 13h25m14s

dumping to dev #ad/0x20001, offset 1015968
dump ata0: resetting devices .. done
511 510 509 508 507 506 505 504 503 502 501 500 499 498 497 496 495 494 493 492 491 490 489 488 487 486 485 484 483 482 481 480 479 478 477 476 475 474 473 472 471 470 469 468 467 466 465 464 463 462 461 460 459 458 457 456 455 454 453 452 451 450 449 448 447 446 445 444 443 442 441 440 439 438 437 436 435 434 433 432 431 430 429 428 427 426 425 424 423 422 421 420 419 418 417 416 415 414 413 412 411 410 409 408 407 406 405 404 403 402 401 400 399 398 397 396 395 394 393 392 391 390 389 388 387 386 385 384 383 382 381 380 379 378 377 376 375 374 373 372 371 370 369 368 367 366 365 364 363 362 361 360 359 358 357 356 355 354 353 352 351 350 349 348 347 346 345 344 343 342 341 340 339 338 337 336 335 334 333 332 331 330 329 328 327 326 325 324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307 306 305 304 303 302 301 300 299 298 297 296 295 294 293 292 291 290 289 288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271 270 269 268 267 266 265 264 263 262 261 260 259 258 257 256 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
---
#0  dumpsys () at ../../kern/kern_shutdown.c:487
487             if (dumping++) {
(kgdb) where
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc0219757 in boot (howto=260) at ../../kern/kern_shutdown.c:316
#2  0xc0219bc9 in panic (fmt=0x0) at ../../kern/kern_shutdown.c:595
#3  0xc014d735 in db_fncall (dummy1=0, dummy2=0, dummy3=0,
    dummy4=0xdb4a2b1c "`µIÀ &GÀdÄ\001hÉ 7Àí\0377ÀþÔ\001k`µIÀ\r")
    at ../../ddb/db_command.c:534
#4  0xc014d559 in db_command (last_cmdp=0xc042c424, cmd_table=0xc042c264,
    aux_cmd_tablep=0xc046fb98) at ../../ddb/db_command.c:333
#5  0xc014d61e in db_command_loop () at ../../ddb/db_command.c:457
#6  0xc014f75b in db_trap (type=3, code=0) at ../../ddb/db_trap.c:71
#7  0xc03823a1 in kdb_trap (type=3, code=0, regs=0xdb4a2c2c)
    at ../../i386/i386/db_interface.c:158
#8  0xc0399098 in trap (frame={tf_fs = 24, tf_es = 16, tf_ds = -615907312, tf_edi = 2,
      tf_esi = 256, tf_ebp = -615895948, tf_isp = -615895976, tf_ebx = -1069628416,
      tf_edx = -1744946277, tf_ecx = 0, tf_eax = 18, tf_trapno = 3, tf_err = 0,
      tf_eip = -1070061984, tf_cs = 8, tf_eflags = 582, tf_esp = -1069445309,
      tf_ss = -1069644758}) at ../../i386/i386/trap.c:592
#9  0xc0382660 in Debugger (msg=0xc03e842a "panic") at machine/cpufunc.h:67
#10 0xc0219bc0 in panic (fmt=0xc03ec400 "getnewbuf: inconsistant queue %d bp %p")
    at ../../kern/kern_shutdown.c:593
#11 0xc0240dd4 in getnewbuf (slpflag=0, slptimeo=0, size=2048, maxsize=16384)
    at ../../kern/vfs_bio.c:1593
#12 0xc0241a0a in getblk (vp=0xdb7a3c80, blkno=0, size=2048, slpflag=0, slptimeo=0)
    at ../../kern/vfs_bio.c:2239
#13 0xc031f95a in ffs_balloc (ap=0xdb4a2e20) at ../../ufs/ffs/ffs_balloc.c:172
#14 0xc0329431 in ffs_write (ap=0xdb4a2e60) at vnode_if.h:1056
#15 0xc024ff88 in vn_write (fp=0xc23b8640, uio=0xdb4a2ed0, cred=0xc256bb00, flags=0,
    p=0xdb3a9100) at vnode_if.h:363
#16 0xc0228dad in dofilewrite (p=0xdb3a9100, fp=0xc23b8640, fd=4, buf=0x28058000,
    nbyte=34, offset=-1, flags=0) at ../../sys/file.h:163
#17 0xc0228c66 in write (p=0xdb3a9100, uap=0xdb4a2f80) at ../../kern/sys_generic.c:329
---Type <return> to continue, or q <return> to quit---
#18 0xc0399b11 in syscall2 (frame={tf_fs = 671416367, tf_es = 671416367,
      tf_ds = -1078001617, tf_edi = 671449088, tf_esi = 671449088,
      tf_ebp = -1077937612, tf_isp = -615895084, tf_ebx = 34, tf_edx = 134674508,
      tf_ecx = 0, tf_eax = 4, tf_trapno = 7, tf_err = 2, tf_eip = 134561984,
      tf_cs = 31, tf_eflags = 659, tf_esp = -1077937672, tf_ss = 47})
    at ../../i386/i386/trap.c:1175
#19 0xc03832fb in Xint0x80_syscall ()
cannot read proc at 0
(kgdb) f 11
#11 0xc0240dd4 in getnewbuf (slpflag=0, slptimeo=0, size=2048, maxsize=16384)
    at ../../kern/vfs_bio.c:1593
1593                    KASSERT(bp->b_qindex == qindex, ("getnewbuf: inconsistant queue %d bp %p", qindex, bp));
(kgdb) p bp
$1 = (struct buf *) 0xcc94b564
(kgdb) p qindex
$2 = 2
(kgdb) p nqindex
$3 = 2
(kgdb) p bp->b_qindex
$4 = 3
(kgdb) p bp->b_flags
$5 = 537006752
(kgdb) p/x bp->b_flags
$6 = 0x200212a0
(kgdb) p bp->b_dep
$7 = {lh_first = 0xc2682420}
(kgdb) x/4xw bp->b_dep.lh_first
0xc2682420:     0x00000000      0xcc94b6a4      0x80010005      0x00000000
(kgdb) p bp->b_xflags
$8 = 1 '\001'
(kgdb) p/x bp->b_xflags
$9 = 0x1
(kgdb) p nbp
$10 = (struct buf *) 0xcc8aaee0
(kgdb) p nbp->q_index
There is no member named q_index.
(kgdb) p nbp->b_qindex
$11 = 3
(kgdb) p bufqueues[0]
$12 = {tqh_first = 0x0, tqh_last = 0xc044cd20}
(kgdb) p bufqueues[1]
$13 = {tqh_first = 0x0, tqh_last = 0xc044cd28}
(kgdb) p bufqueues[2]
$14 = {tqh_first = 0xcc94b564, tqh_last = 0xcc93f1a0}
(kgdb) p bufqueues[3]
$15 = {tqh_first = 0xcc82b300, tqh_last = 0xcc91c7dc}
(kgdb) p bufqueues[4]
$16 = {tqh_first = 0x0, tqh_last = 0xc044cd40}
(kgdb) p bufqueues[5]
$17 = {tqh_first = 0xcc95e628, tqh_last = 0xcc961dfc}
(kgdb) p bufqueues[5].tqh_first->b_freelist
$18 = {tqe_next = 0xcc95e774, tqe_prev = 0xc044cd48}
(kgdb) p bufqueues[5].tqh_first->b_freelist->tqe_next->b_freelist
$19 = {tqe_next = 0xcc95e8c0, tqe_prev = 0xcc95e638}
(kgdb) p bufqueues[5].tqh_first->b_freelist.b_qindex
There is no member named b_qindex.
(kgdb) p bufqueues[5].tqh_first->b_freelist->tqe_next->b_qindex
$20 = 5
(kgdb) p bufqueues[5].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_qindex
$21 = 5
(kgdb) p bufqueues[2].tqh_first->b_freelist
$22 = {tqe_next = 0xcc8aaee0, tqe_prev = 0xc044cd30}
(kgdb) p bufqueues[2].tqh_first->b_freelist->tqe_next->b_qindex
$23 = 3
(kgdb) p bufqueues[2].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_qindex
$24 = 3
(kgdb) p bufqueues[2].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_freelist->tqe_next->b_qindex
$25 = 3
(kgdb) p bufqueues[2].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_freelist->tqe_next->b_flags
$26 = 537002656
(kgdb) p bufqueues[2].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_flags
$27 = 70304
(kgdb) p bufqueues[3].tqh_first->b_freelist
$28 = {tqe_next = 0xcc8e9194, tqe_prev = 0xc044cd38}
(kgdb) p bufqueues[3].tqh_first->b_freelist->tqe_next->b_qindex
$29 = 3
(kgdb) p bufqueues[3].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_flags
$30 = 537002144
(kgdb) p bufqueues[3].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_qindex
$31 = 3
(kgdb) p bufqueues[2].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_freelist->tqe_next->b_qindex
$32 = 3
(kgdb) p bp->b_freelist
$33 = {tqe_next = 0xcc8aaee0, tqe_prev = 0xc044cd30}
(kgdb) p bufqueues[5]
$34 = {tqh_first = 0xcc95e628, tqh_last = 0xcc961dfc}
(kgdb) p bufqueues[5].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_freelist->tqe_next
$35 = (struct buf *) 0xcc95ea0c
(kgdb) p bp->b_act
$36 = {tqe_next = 0x0, tqe_prev = 0xc24a4008}
(kgdb) p nbp->b_act
$37 = {tqe_next = 0x0, tqe_prev = 0xc24a4008}
(kgdb) p bufqueues[5].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_freelist->tqe_next->b_freelist->tqe_next
$38 = (struct buf *) 0xcc95eb58
(kgdb) p bufqueues[5].tqh_first->b_freelist->tqe_next->b_freelist->tqe_next->b_freelist->tqe_next->b_freelist->tqe_next->b_freelist->tqe_next
$39 = (struct buf *) 0xcc95eca4
(kgdb) p defrag
$40 = 0
(kgdb) p bufspace
$41 = 63275008
(kgdb) p lobufspace
$42 = 63275008
(kgdb) p/x
$43 = 0x3c58000
(kgdb) p nbp->b_dep.lh_first
$44 = (struct worklist *) 0xc25e17e0
(kgdb) p/4wx nbp->b_dep.lh_first
Size letters are meaningless in "print" command.
(kgdb) x/4wx nbp->b_dep.lh_first
0xc25e17e0:     0x00000000      0xcc8ab020      0x80010005      0x00000000
(kgdb) p/x numdirtybuffers
$45 = 0x26a
(kgdb) p bp->b_vp
$46 = (struct vnode *) 0xdbcf1200
(kgdb) p/x bp->b_vp->v_flag
$47 = 0x202000
(kgdb) p bp-b_vp->v_dirtyblkhd
No symbol "b_vp" in current context.
(kgdb) p bp->b_vp->v_dirtyblkhd
$48 = {tqh_first = 0xcc94b564, tqh_last = 0xcc94b56c}
(kgdb) p bp->b_vp->v_dirtyblkhd->tqh_first
$49 = (struct buf *) 0xcc94b564
(kgdb) p bp
$50 = (struct buf *) 0xcc94b564
(kgdb) p bp->b_lblkno
$51 = -12
(kgdb) p bp->b_vnbufs
$52 = {tqe_next = 0x0, tqe_prev = 0xdbcf1234}
(kgdb) p/x reassignbufsortgood
$53 = 0x61459c
(kgdb) p/x reassignbufmethod
$54 = 0x1
(kgdb) p bp->b_vp->v_dirtyblkhd->tqh_last
$55 = (struct buf **) 0xcc94b56c
(kgdb) p bp->b_vp->v_dirtyblkhd->tqh_first->vn_bufs
There is no member named vn_bufs.
(kgdb) p bp->b_vp->v_dirtyblkhd->tqh_first->b_vnbufs
$56 = {tqe_next = 0x0, tqe_prev = 0xdbcf1234}
(kgdb) p/x mp_lock
$57 = 0x1000002
(kgdb) p bp->b_dev
$58 = 0xc2398880
(kgdb) p bp->b_dev->si_drv1
$59 = (void *) 0xc25c5000
(kgdb) p/4wx 0xc25c5000
Size letters are meaningless in "print" command.
(kgdb) x/10wx 0xc25c5000
0xc25c5000:     0xc24c6c00      0xc2398980      0xc24a4000      0xc24a4430
0xc25c5010:     0x00000000      0x00000000      0xc2582880      0xc2398980
0xc25c5020:     0xc25cc000      0x00000000
(kgdb) x/40wx 0xc24a4410
0xc24a4410:     0xc25c3000      0xc25c4c00      0xc25c4800      0xc25c4400
0xc24a4420:     0xc25c4000      0xc25c5c00      0xc25c5800      0xc25c5400
0xc24a4430:     0x12a19eb0      0x00004c01      0x000000ff      0x0000003f
0xc24a4440:     0x00000000      0x00000000      0x00000fff      0xc24c6c00
0xc24a4450:     0x00000000      0x00000000      0x00000000      0x00000000
0xc24a4460:     0x00000000      0x00000000      0x00000000      0x00000000
0xc24a4470:     0x12a19eb0      0x00004c01      0x000000ff      0x0000003f
0xc24a4480:     0x00000001      0x00000002      0x00000fff      0xc24c6b80
0xc24a4490:     0x00000000      0x00000000      0x00000000      0x00000000
0xc24a44a0:     0x00000000      0x00000000      0x00000000      0x00000000
(kgdb) p/x bp->b_error
$60 = 0x0
(kgdb) p/x bp->b_act
$61 = {tqe_next = 0x0, tqe_prev = 0xc24a4008}
(kgdb) x/40wx 0xc24a4000
0xc24a4000:     0xc2598c00      0xc25bc20c      0x00000000      0xc24a4008
0xc24a4010:     0x00b86b67      0x00000000      0x00000000      0x00000000
0xc24a4020:     0xc24a401c      0xc25be400      0xc25b460c      0x00000000
0xc24a4030:     0xc24a402c      0xc24a6c00      0xc24a6800      0xc24a6400
0xc24a4040:     0xc24a6000      0xc2587c00      0xc2587800      0xc2587400
0xc24a4050:     0xc2587000      0xc2588c00      0xc2588800      0xc2588400
0xc24a4060:     0xc2588000      0xc2589c00      0xc2589800      0xc2589400
0xc24a4070:     0xc2589000      0xc258ac00      0xc258a800      0xc258a400
0xc24a4080:     0xc258a000      0xc258bc00      0xc258b800      0xc258b400
0xc24a4090:     0xc258b000      0xc258cc00      0xc258c800      0xc258c400
(kgdb) p bp->b_dep
$62 = {lh_first = 0xc2682420}
(kgdb) p bp->b_dep.lh_first
$63 = (struct worklist *) 0xc2682420
(kgdb) p nbp->b_dep.lh_first
$64 = (struct worklist *) 0xc25e17e0
(kgdb) x/4wx 0xc2682420
0xc2682420:     0x00000000      0xcc94b6a4      0x80010005      0x00000000
(kgdb) p bp->b_npages
$65 = 4
(kgdb) p stat_indir_blk_ptrs
$66 = 115289
(kgdb) p bp->b_kvasize
$67 = 16384
(kgdb) p mp_lock
$68 = 16777218
(kgdb) p/x mp_lock
$69 = 0x1000002
(kgdb) p/x cpuid
No symbol "cpuid" in current context.
(kgdb) p/x lapic_id
Cannot access memory at address 0xffbff020.
(kgdb) x/4wx lapic_id
Cannot access memory at address 0xffbff020.
(kgdb) p/x cpu_id
$70 = 0xf27
(kgdb)


Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?BAY2-F17yupGKy3hrbn00004847>