Date:      Fri, 2 Mar 2007 18:11:45 GMT
From:      Andrew<andrew+pr2@supernews.net>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/109762: deadlock in g_down -> ahd_action -> contigmalloc
Message-ID:  <200703021811.l22IBjDp012463@www.freebsd.org>
Resent-Message-ID: <200703021820.l22IK4BA053426@freefall.freebsd.org>

>Number:         109762
>Category:       kern
>Synopsis:       deadlock in g_down -> ahd_action -> contigmalloc
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Mar 02 18:20:04 GMT 2007
>Closed-Date:
>Last-Modified:
>Originator:     Andrew
>Release:        FreeBSD 6.2-20070202
>Organization:
Critical Path, Inc
>Environment:
FreeBSD volcano.supernews.net 6.2-20070202 FreeBSD 6.2-20070202 #0: Fri Feb  2 16:29:10 UTC 2007     root@supernews.net:/usr/obj/usr/src/sys/SUPERNEWS  i386

>Description:
The system hung during heavy file write activity (copying a large file between filesystems). The cause of the hang was the g_down thread being stuck, as shown in this trace:

Tracing pid 4 tid 100016 td 0xa8303a80
sched_switch(a8303a80,0,1) at sched_switch+0x14b
mi_switch(1,0,a8303a80,ca4249c4,a0515704,...) at mi_switch+0x1ba
sleepq_switch(bc3927f0) at sleepq_switch+0x87
sleepq_wait(bc3927f0,0,a8303a80,44,bc3927f0,...) at sleepq_wait+0x5c
msleep(bc3927f0,a073c7a0,44,a06f43d5,0) at msleep+0x269
bwait(bc3927f0,44,a06f43d5,bc3927f0,0,...) at bwait+0x5f
swap_pager_putpages(ac2d318c,ca424ac4,1,1,ca424a90,...) at swap_pager_putpages+0x48c
default_pager_putpages(ac2d318c,ca424ac4,1,1,ca424a90) at default_pager_putpages+0x18
vm_pageout_flush(ca424ac4,1,1) at vm_pageout_flush+0xcb
vm_contig_launder_page(a49de288) at vm_contig_launder_page+0x2a6
vm_page_alloc_contig(3,0,0,ffffffff,8,0) at vm_page_alloc_contig+0x25c
contigmalloc(3000,a0710ea0,1,0,ffffffff,...) at contigmalloc+0x97
bus_dmamem_alloc(a83d3e80,ab998618,1,ab998610) at bus_dmamem_alloc+0xb4
ahd_alloc_scbs(a8415000) at ahd_alloc_scbs+0x17a
ahd_get_scb(a8415000,8) at ahd_get_scb+0x57
ahd_action(a83f82c0,ab8a4800) at ahd_action+0x103
xpt_run_dev_sendq(a83f8280) at xpt_run_dev_sendq+0x175
xpt_action(ab8a4800) at xpt_action+0x269
dastart(a862c600,ab8a4800,ab8a4800,a86254c0,1) at dastart+0x149
xpt_run_dev_allocq(a83f8280) at xpt_run_dev_allocq+0x82
xpt_schedule(a862c600,1,a9da5bdc,ca424ce8,a04bd420,...) at xpt_schedule+0xef
dastrategy(a9da5bdc) at dastrategy+0x4a
g_disk_start(a9da5528) at g_disk_start+0x18c
g_io_schedule_down(a8303a80) at g_io_schedule_down+0x13b
g_down_procbody(0,ca424d38) at g_down_procbody+0x92
fork_exit(a04bf00c,0,ca424d38) at fork_exit+0x71
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xca424d6c, ebp = 0 ---

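Clearly, having g_down wait for a swap pageout to complete is a deadlock: g_down is the GEOM thread that passes I/O requests down to the disk drivers, so the pageout write it is waiting for in bwait presumably cannot be dispatched while g_down itself is asleep.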
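
One way to picture the cycle in the trace above is a toy user-space model. It is not kernel code, and everything in it (model_alloc, the MODEL_* flags, the context enum, the page counts) is hypothetical. It only illustrates the dependency: an allocation that has to launder dirty pages needs a write to complete, and that write cannot be dispatched if the caller is the dispatch thread. As a side note, the third argument in both the bus_dmamem_alloc and contigmalloc frames is 1; if the ddb argument dump is accurate, that would match BUS_DMA_NOWAIT and M_NOWAIT, which suggests the caller asked not to sleep and the laundering happened anyway.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags for this model only; not the kernel's M_* values. */
enum alloc_flags { MODEL_NOWAIT = 1, MODEL_WAITOK = 2 };

/* Context the caller runs in. CTX_IO_DISPATCH stands in for g_down. */
enum context { CTX_ORDINARY, CTX_IO_DISPATCH };

enum alloc_result { ALLOC_OK, ALLOC_FAILED, ALLOC_WOULD_DEADLOCK };

struct mem_model {
    size_t free_pages;   /* pages usable immediately */
    size_t dirty_pages;  /* pages that need a pageout write before reuse */
};

/*
 * Try to allocate npages. If free pages do not suffice, the only way to
 * succeed is to launder dirty pages, which means waiting for write I/O.
 * That wait can never finish when the caller is the I/O dispatch context,
 * so the model reports it instead of hanging.
 */
enum alloc_result
model_alloc(struct mem_model *m, size_t npages, enum alloc_flags flags,
    enum context ctx)
{
    assert(m != NULL);

    if (m->free_pages >= npages) {
        m->free_pages -= npages;
        return ALLOC_OK;
    }
    if (flags == MODEL_NOWAIT)
        return ALLOC_FAILED;            /* a no-sleep request fails early */
    if (m->free_pages + m->dirty_pages < npages)
        return ALLOC_FAILED;            /* laundering would not be enough */
    if (ctx == CTX_IO_DISPATCH)
        return ALLOC_WOULD_DEADLOCK;    /* waiting on I/O we must dispatch */

    /* Ordinary context: launder just enough dirty pages, then allocate. */
    m->dirty_pages -= npages - m->free_pages;
    m->free_pages = 0;
    return ALLOC_OK;
}
```

In this model a MODEL_NOWAIT request from the dispatch context fails cleanly, which is the behaviour a driver calling from the I/O path would need. The trace suggests the real allocation path went on to launder pages and sleep instead.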

The circumstances under which this happens are not clear. At the point of the hang, most of the system memory was in the 'inactive' queue, but the combined amount of free and cached memory was still substantial. The machine has 4GB of RAM, of which about 3.3GB is usable (i386, no PAE). Inactive memory was about 2.2GB, cache 900M, free 5M.

This has been observed twice so far. Attempts to reproduce it consistently have failed, and it seems to be relatively rare.


>How-To-Repeat:
Initiate a burst of heavy I/O via the ahd driver, such as copying multi-gigabyte files or using dd to create them. The other conditions needed for it to happen are not known.
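
For anyone who wants a scriptable write load instead of dd, a minimal sketch follows. The function name write_zero_file and its parameters are made up for illustration; it only generates sequential writes and is not known to trigger the hang by itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `total` zero bytes to `path` in chunks of at most `block` bytes,
 * roughly what "dd if=/dev/zero of=path bs=block" does. Returns the
 * resulting file size, or -1 on any error (including an interrupted write).
 */
off_t
write_zero_file(const char *path, off_t total, size_t block)
{
    assert(path != NULL && block > 0 && total >= 0);

    char *buf = calloc(1, block);       /* zero-filled write buffer */
    if (buf == NULL)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    off_t left = total;
    while (left > 0) {
        size_t want = left < (off_t)block ? (size_t)left : block;
        ssize_t n = write(fd, buf, want);
        if (n <= 0) {                   /* treat errors and zero writes as failure */
            close(fd);
            free(buf);
            return -1;
        }
        left -= n;                      /* short writes are retried by the loop */
    }

    /* Flush to disk so the I/O really reaches the driver, then report size. */
    struct stat st;
    off_t size = (fsync(fd) == 0 && fstat(fd, &st) == 0) ? st.st_size : -1;
    close(fd);
    free(buf);
    return size;
}
```

Calling it with a multi-gigabyte total and a block size of around 1MB, on a filesystem backed by a disk on the ahd controller, would approximate the load described above.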
>Fix:

>Release-Note:
>Audit-Trail:
>Unformatted:


