Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 13 Sep 2012 10:25:43 +0300
From:      Samu Nuutamo <samu.nuutamo@gmail.com>
To:        freebsd-fs@freebsd.org
Subject:   Kernel panics with 9.0-RELEASE and fusefs-kmod 0.3.9-pre1.20080208
Message-ID:  <CAEj5WDoNq7xLODi84uAcwoxkG-WRC5s6z0z5%2BNFAinJSLUWzsw@mail.gmail.com>

next in thread | raw e-mail | index | archive | help
I've created a custom fuse-based filesystem that is started as a
service at system startup. The filesystem itself works quite nicely
but I have issues with system stability when starting up the system or
restarting the service. The filesystem's usage/load mainly consists of
concurrent reads.

I know this is a huge post but many of these traps are "unknown" to
google so I thought it would be a good idea to include every bit of
information I have. The fiov_adjust trap was discussed here in
freebsd-fs last december in 8.0R context but it seems to be a dead
end.

I'm aware of the GSoC'2011 effort to fix FUSE, but it has other issues
preventing me to test it properly. For example stat must be called on
the file before the contents can be read.

Recently I learned about the projects/fuse branch in the SVN tree and
backported the code to 9.0R, but it has the same issues as the GSoC
version. This isn't a big surprise as GSoC is the base for the SVN
code.

Does a patch/fix/workaround exist for any of these traps (even
experimental)? I'm willing to port code from the SVN and/or GSoC if
someone can pinpoint the commits.

--
Samu Nuutamo



-------------------------------------

1. Trap on service restart

Whenever the service is restarted while some program is accessing the
fs there is a high probability of a fatal trap.

Fool-proof way of reproducing this is to run
$ while [ 1 ]; do service myfs restart; done
in first terminal
$ while [ 1 ]; do find /myfs | xargs cat; done
and in the second terminal.

The system crashes just a few second after the find loop is started.
Here's the trace:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xd8
fault code		= supervisor write data, page not present
instruction pointer	= 0x20:0xffffffff81017f01
stack pointer	        = 0x28:0xffffff808a7295b0
frame pointer	        = 0x28:0xffffff808a7296c0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 1818 (find)

(kgdb) bt
#0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:260
#1  0xffffffff802b6e80 in db_dump (dummy=Variable "dummy" is not
available.) at /usr/src/sys/ddb/db_command.c:537
#2  0xffffffff802b6501 in db_command (last_cmdp=0xffffffff8097fd60,
cmd_table=Variable "cmd_table" is not available.) at
/usr/src/sys/ddb/db_command.c:448
#3  0xffffffff802b6750 in db_command_loop () at
/usr/src/sys/ddb/db_command.c:501
#4  0xffffffff802b8819 in db_trap (type=Variable "type" is not
available.) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff80450561 in kdb_trap (type=12, code=0,
tf=0xffffff808a729500) at /usr/src/sys/kern/subr_kdb.c:620
#6  0xffffffff8063605d in trap_fatal (frame=0xffffff808a729500,
eva=Variable "eva" is not available.) at
/usr/src/sys/amd64/amd64/trap.c:814
#7  0xffffffff8063637e in trap_pfault (frame=0xffffff808a729500,
usermode=0) at /usr/src/sys/amd64/amd64/trap.c:735
#8  0xffffffff806367cf in trap (frame=0xffffff808a729500) at
/usr/src/sys/amd64/amd64/trap.c:474
#9  0xffffffff806220af in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:228
#10 0xffffffff81017f01 in fuse_lookup (ap=Variable "ap" is not
available.) at fuse_vnops.c:1368
#11 0xffffffff80671310 in VOP_LOOKUP_APV (vop=0xffffffff8101c320,
a=0xffffff808a729720) at vnode_if.c:123
#12 0xffffffff8049c98d in lookup (ndp=0xffffff808a7299e0) at vnode_if.h:54
#13 0xffffffff8049d81c in namei (ndp=0xffffff808a7299e0) at
/usr/src/sys/kern/vfs_lookup.c:297
#14 0xffffffff804b4052 in vn_open_cred (ndp=0xffffff808a7299e0,
flagp=0xffffff808a7299dc, cmode=0, vn_open_flags=Variable
"vn_open_flags" is not available.) at
/usr/src/sys/kern/vfs_vnops.c:195
#15 0xffffffff804b12e9 in kern_openat (td=0xfffffe003162f460, fd=-100,
path=0x800955abb <Address 0x800955abb out of bounds>,
pathseg=UIO_USERSPACE, flags=1, mode=Variable "mode" is not
available.) at /usr/src/sys/kern/vfs_syscalls.c:1123
#16 0xffffffff8063584e in amd64_syscall (td=0xfffffe003162f460,
traced=0) at subr_syscall.c:131
#17 0xffffffff80622397 in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:387
#18 0x000000080093cd2c in ?? ()
Previous frame inner to this frame (corrupt stack?)


fuse_vnops.c:1368 points to this line on code:

    if (op == FUSE_GETATTR)
>>      cache_attrs(*vpp, (struct fuse_attr_out *)fdi.answ);

-------------------------------------

2. Trap on system startup

This trap happens less frequently and most times during system startup
when services are loading. By forcing a reboot loop I managed to get a
trap after 1h (that's ~50 reboots).

Here's the trace:


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address	= 0x40
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff81012b17
stack pointer	        = 0x28:0xffffff810c7ce540
frame pointer	        = 0x28:0xffffff810c7ce570
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 1294 (python)

(kgdb) bt
#0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:260
#1  0xffffffff802b6e80 in db_dump (dummy=Variable "dummy" is not
available.) at /usr/src/sys/ddb/db_command.c:537
#2  0xffffffff802b6501 in db_command (last_cmdp=0xffffffff8097fd60,
cmd_table=Variable "cmd_table" is not available.) at
/usr/src/sys/ddb/db_command.c:448
#3  0xffffffff802b6750 in db_command_loop () at
/usr/src/sys/ddb/db_command.c:501
#4  0xffffffff802b8819 in db_trap (type=Variable "type" is not
available.) at /usr/src/sys/ddb/db_main.c:229
#5  0xffffffff80450561 in kdb_trap (type=12, code=0,
tf=0xffffff810c7ce490) at /usr/src/sys/kern/subr_kdb.c:620
#6  0xffffffff8063605d in trap_fatal (frame=0xffffff810c7ce490,
eva=Variable "eva" is not available.) at
/usr/src/sys/amd64/amd64/trap.c:814
#7  0xffffffff8063637e in trap_pfault (frame=0xffffff810c7ce490,
usermode=0) at /usr/src/sys/amd64/amd64/trap.c:735
#8  0xffffffff806367cf in trap (frame=0xffffff810c7ce490) at
/usr/src/sys/amd64/amd64/trap.c:474
#9  0xffffffff806220af in calltrap () at
/usr/src/sys/amd64/amd64/exception.S:228
#10 0xffffffff81012b17 in fiov_adjust (fiov=0x30, size=52) at fuse_msg.c:96
#11 0xffffffff810132af in fdisp_make (fdip=0xffffff810c7ce610,
mp=Variable "mp" is not available.) at fuse_msg.c:788
#12 0xffffffff81017aaa in fuse_lookup (ap=Variable "ap" is not
available.) at fuse_vnops.c:1156
#13 0xffffffff80671310 in VOP_LOOKUP_APV (vop=0xffffffff8101c320,
a=0xffffff810c7ce720) at vnode_if.c:123
#14 0xffffffff8049c98d in lookup (ndp=0xffffff810c7ce9e0) at vnode_if.h:54
#15 0xffffffff8049d81c in namei (ndp=0xffffff810c7ce9e0) at
/usr/src/sys/kern/vfs_lookup.c:297
#16 0xffffffff804b4052 in vn_open_cred (ndp=0xffffff810c7ce9e0,
flagp=0xffffff810c7ce9dc, cmode=420, vn_open_flags=Variable
"vn_open_flags" is not available.) at
/usr/src/sys/kern/vfs_vnops.c:195
#17 0xffffffff804b12e9 in kern_openat (td=0xfffffe000a109460, fd=-100,
path=0x80075f1f2 <Address 0x80075f1f2 out of bounds>,
pathseg=UIO_USERSPACE, flags=1, mode=Variable "mode" is not
available.) at /usr/src/sys/kern/vfs_syscalls.c:1123
#18 0xffffffff8063584e in amd64_syscall (td=0xfffffe000a109460,
traced=0) at subr_syscall.c:131
#19 0xffffffff80622397 in Xfast_syscall () at
/usr/src/sys/amd64/amd64/exception.S:387
#20 0x000000080073db4c in ?? ()
Previous frame inner to this frame (corrupt stack?)


fuse_msg.c:96 points to this line on code:

    void
    fiov_adjust(struct fuse_iov *fiov, size_t size)
>>  {
        if (fiov->allocated_size < size ||


I have no idea what's going on there.

-------------------------------------

3. Other traps

I've also seen multiple other traps with different traces, but all
having a few common functions in the call stacks: fdisp_make+0xef or
fdisp_make_vp+0xf4. I was about to group the fiov_adjust trap here as
well as it has the same fdisp_make+0xef in the stack, but in
fiov_adjust's case the fdisp_make is not the stack's topmost function
call. It still looks to me that fdisp_make(_vp) is somehow broken or
it uses corrupted pointers.



Some call stacks (sorry, I don't have proper kernel dumps for these):

(acquired from a dmesg output)

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x40
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81012b17
stack pointer           = 0x28:0xffffff808970d680
frame pointer           = 0x28:0xffffff808970d6b0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 32168 (sudo)
trap number             = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80460b1e at kdb_backtrace+0x5e
#1 0xffffffff8042b857 at panic+0x187
#2 0xffffffff8066d6a0 at trap_fatal+0x290
#3 0xffffffff8066d9e9 at trap_pfault+0x1f9
#4 0xffffffff8066deaf at trap+0x3df
#5 0xffffffff806583ef at calltrap+0x8
#6 0xffffffff81013574 at fdisp_make_vp+0xf4
#7 0xffffffff810172fd at fuse_readlink+0x4d
#8 0xffffffff804b32ff at namei+0x5cf
#9 0xffffffff804cb92b at vn_open_cred+0x3cb
#10 0xffffffff804ca9c9 at kern_openat+0x1f9
#11 0xffffffff8066cf90 at amd64_syscall+0x450
#12 0xffffffff806586d7 at Xfast_syscall+0xf7



(manually transcribed from a jpg)

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 00
fault virtual address   = 0xc
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff8047ec95
stack pointer           = 0x28:0xffffff808956a590
frame pointer           = 0x28:0xffffff808956a5c0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 11418 (httpd)
trap number             = 12
panic: page fault
cpuid = 2
KDB: stack backtrace:
#0 0xffffffff80470d5e at kdb_backtrace+0x5e
#1 0xffffffff8043ba97 at panic+0x187
#2 0xffffffff8067d8e0 at trap_fatal+0x290
#3 0xffffffff8067dc29 at trap_pfault+0x1f9
#4 0xffffffff8067e0ef at trap+0x3df
#5 0xffffffff8066862f at calltrap+0x8
#6 0xffffffff81013574 at fdisp_make_vp+0xf4
#7 0xffffffff81018d8e at fuse_get_filehandle+0x14e
#8 0xffffffff8101976e at fuse_open+0x7e
#9 0xffffffff804dbaa3 at vn_open_cred+0x3cb
#10 0xffffffff804dac09 at kern_openat+0x1f9
#11 0xffffffff8067d1d0 at amd64_syscall+0x450
#12 0xffffffff80668917 at Xfast_syscall+0xf7



(manually transcribed from a jpg)

--- previous output not visible ---
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6997 (cron)
trap number             = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff80477e4e at kdb_backtrace+0x5e
#1 0xffffffff80442b87 at panic+0x187
#2 0xffffffff806849d0 at trap_fatal+0x290
#3 0xffffffff80684d19 at trap_pfault+0x1f9
#4 0xffffffff806851df at trap+0x3df
#5 0xffffffff8066f71f at calltrap+0x8
#6 0xffffffff81013574 at fdisp_make_vp+0xf4
#7 0xffffffff810156ea at fuse_send_release+0x6a
#8 0xffffffff81015fa5 at fuse_close_f+0xf5
#9 0xffffffff80404e93 at _fdrop+0x23
#10 0xffffffff804068f2 at closef+0x52
#11 0xffffffff80407122 at kern_close+0x172
#12 0xffffffff806842c0 at amd64_syscall+0x450
#13 0xffffffff8066fa07 at Xfast_syscall+0xf7



(manually transcribed from a jpg)

--- previous output not visible ---
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1279 (sh)
trap number             = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff80460b1e at kdb_backtrace+0x5e
#1 0xffffffff8042b857 at panic+0x187
#2 0xffffffff8066d6a0 at trap_fatal+0x290
#3 0xffffffff8066d9e9 at trap_pfault+0x1f9
#4 0xffffffff8066deaf at trap+0x3df
#5 0xffffffff806583ef at calltrap+0x8
#6 0xffffffff810132af at fdisp_make+0xef
#7 0xffffffff81017aaa at fuse_lookup+0x44a
#8 0xffffffff806a9b40 at VOP_LOOKUP_APV+0x40
#9 0xffffffff804b2154 at lookup+0x464
#10 0xffffffff804b3206 at namei+0x4d6
#11 0xffffffff804cb92b at vn_open_cred+0x3cb
#10 0xffffffff804ca9c9 at kern_openat+0x1f9
#11 0xffffffff8066cf90 at amd64_syscall+0x450
#12 0xffffffff806586d7 at Xfast_syscall+0xf7



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAEj5WDoNq7xLODi84uAcwoxkG-WRC5s6z0z5%2BNFAinJSLUWzsw>