Date:      Fri, 28 Dec 2012 10:19:31 +0100
From:      Andreas Longwitz <longwitz@incore.de>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-stable@freebsd.org, fs@freebsd.org
Subject:   Re: FS hang with suspfs when creating snapshot on a UFS + GJOURNAL setup
Message-ID:  <50DD6423.5090305@incore.de>
In-Reply-To: <20121227194145.GM82219@kib.kiev.ua>
References:  <50DC30F6.1050904@incore.de> <20121227133355.GI82219@kib.kiev.ua> <50DC8999.8000708@incore.de> <20121227194145.GM82219@kib.kiev.ua>

Konstantin Belousov wrote:
>>> On Thu, Dec 27, 2012 at 12:28:54PM +0100, Andreas Longwitz wrote:
>> db> alltrace (pid 18 and 7126)
>>
>> Tracing command g_journal switcher pid 18 tid 100076 td 0xffffff0002bd5000
>> sched_switch() at sched_switch+0xde
>> mi_switch() at mi_switch+0x186
>> sleepq_wait() at sleepq_wait+0x42
>> __lockmgr_args() at __lockmgr_args+0x49b
>> ffs_copyonwrite() at ffs_copyonwrite+0x19a
>> ffs_geom_strategy() at ffs_geom_strategy+0x1b5
>> bufwrite() at bufwrite+0xe9
>> ffs_sbupdate() at ffs_sbupdate+0x12a
>> g_journal_ufs_clean() at g_journal_ufs_clean+0x3e
>> g_journal_switcher() at g_journal_switcher+0xe5e
>> fork_exit() at fork_exit+0x11f
>> fork_trampoline() at fork_trampoline+0xe
>> --- trap 0, rip = 0, rsp = 0xffffff8242ca8cf0, rbp = 0 ---
>>
>> Tracing command mksnap_ffs pid 7126 tid 100157 td 0xffffff000807a470
>> sched_switch() at sched_switch+0xde
>> mi_switch() at mi_switch+0x186
>> sleepq_wait() at sleepq_wait+0x42
>> _sleep() at _sleep+0x373
>> vn_start_write() at vn_start_write+0xdf
>> ffs_snapshot() at ffs_snapshot+0xe2b
> Can you look up the line number for the ffs_snapshot+0xe2b ?

(kgdb) list *ffs_snapshot+0xe2b
0xffffffff8056287b is in ffs_snapshot (/usr/src/sys/ufs/ffs/ffs_snapshot.c:676).
671    /*
672     * Resume operation on filesystem.
673     */
674    vfs_write_resume(vp->v_mount);
675    vn_start_write(NULL, &wrtmp, V_WAIT);
676    if (collectsnapstats && starttime.tv_sec > 0) {
677         nanotime(&endtime);
678         timespecsub(&endtime, &starttime);
679         printf("%s: suspended %ld.%03ld sec, redo %ld of %d\n",
680            vp->v_mount->mnt_stat.f_mntonname, (long)endtime.tv_sec,

> I think the bug is that vn_start_write() is called while the snaplock
> is owned, after the out1 label in ffs_snapshot() (I am looking at the
> HEAD code).

You are right, the vn_start_write() call comes directly after the out1 label.

>> ffs_mount() at ffs_mount+0x65a
>> vfs_donmount() at vfs_donmount+0xdc5
>> nmount() at nmount+0x63
>> amd64_syscall() at amd64_syscall+0x1f4
>> Xfast_syscall() at Xfast_syscall+0xfc
>> --- syscall (378, FreeBSD ELF64, nmount), rip = 0x18069e35c, rsp = 0x7fffffffe358, rbp = 0x7fffffffedc7 ---

-- 
Andreas Longwitz



