Date:      Tue, 11 Jun 2019 15:46:42 -0600
From:      Alan Somers <asomers@freebsd.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        FreeBSD Hackers <freebsd-hackers@freebsd.org>
Subject:   Re: panic: vm_fault_hold: fault on nofault entry in fusefs
Message-ID:  <CAOtMX2gQKz%2Bw%2BkTO5zAk32S3Xz7O68c=Fd9nth%2BAHzDy-_JL1w@mail.gmail.com>
In-Reply-To: <20190611203018.GC75280@kib.kiev.ua>
References:  <CAOtMX2gPHy1GWkLyOm5sF=e0zgnj0UEKijFbOnPk6sRo9K4Yew@mail.gmail.com> <20190611203018.GC75280@kib.kiev.ua>

On Tue, Jun 11, 2019 at 2:30 PM Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Tue, Jun 11, 2019 at 02:12:22PM -0600, Alan Somers wrote:
> > Can somebody please help me to debug a fusefs problem?  I have a 100%
> > reproducible panic with the above message.  Evidently there's
> > something I don't know about buf(9) and uiomove(9).  The good news is
> > that the panic is sufficiently reproducible and sufficiently
> > instrumented that I know exactly what's happening; I just don't know
> > why.  Here's a summary of what happens.
> >
> > 1) fusefs's VOP_WRITE method gets called with a buffer that spans a
> > logical block boundary, but does not extend the size of the file.
> > 2) It splits the write into two parts.  Each one calls getblk to
> > allocate a struct buf, fills in the old data with a read, and fills
> > the new data with uiomove.
> > 3) After the file gets close()ed, VOP_INACTIVE calls vn_fsync_buf to
> > flush dirty buffers.
> > 4) VOP_STRATEGY successfully writes the first buffer and frees it with
> > bufdone().
> > 5) VOP_STRATEGY tries to write the second buffer, but panics during
> > uiomove.  The address that caused the panic is always exactly 4KB into
> > the buffer.
> >
> > So what am I doing wrong?  The address that causes the panic in step 5
> > was successfully accessed in step 2, so this isn't some kind of buffer
> > overrun.  Does it have something to do with the fact that the read
> > operation in step 2 called bufdone()?  Seems unlikely because it did
> > that for both buffers, yet only the second one panics.  Or does the
> > address actually fault during both VOP_WRITE and VOP_STRATEGY, but
> > something low down handles the fault in the first case?  I'd be
> > grateful for any help that anyone can offer.
> > -Alan
> >
> > P.S.
> > Here's the panic's stack
> > panic: vm_fault_hold: fault on nofault entry, addr: 0xfffffe0004591000
> > cpuid = 1
> > time = 1560283621
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0031c21f80
> > vpanic() at vpanic+0x19d/frame 0xfffffe0031c21fd0
> > panic() at panic+0x43/frame 0xfffffe0031c22030
> > vm_fault_hold() at vm_fault_hold+0x2064/frame 0xfffffe0031c22170
> > vm_fault() at vm_fault+0x60/frame 0xfffffe0031c221b0
> > trap_pfault() at trap_pfault+0x188/frame 0xfffffe0031c22200
> > trap() at trap+0x2b4/frame 0xfffffe0031c22310
> > calltrap() at calltrap+0x8/frame 0xfffffe0031c22310
> > --- trap 0xc, rip = 0xffffffff8108c9e6, rsp = 0xfffffe0031c223e0, rbp
> > = 0xfffffe0031c223e0 ---
> > memmove_erms() at memmove_erms+0x116/frame 0xfffffe0031c223e0
> > uiomove_faultflag() at uiomove_faultflag+0x146/frame 0xfffffe0031c22420
> > fuse_write_directbackend() at fuse_write_directbackend+0x1cd/frame
> > 0xfffffe0031c224f0
> > fuse_io_strategy() at fuse_io_strategy+0x24d/frame 0xfffffe0031c22590
> > fuse_vnop_strategy() at fuse_vnop_strategy+0x2a/frame 0xfffffe0031c225a0
> > VOP_STRATEGY_APV() at VOP_STRATEGY_APV+0x63/frame 0xfffffe0031c225c0
> > bufstrategy() at bufstrategy+0x44/frame 0xfffffe0031c225f0
> > bufwrite() at bufwrite+0x259/frame 0xfffffe0031c22640
> > vn_fsync_buf() at vn_fsync_buf+0x23e/frame 0xfffffe0031c226a0
> > fuse_vnop_inactive() at fuse_vnop_inactive+0x7e/frame 0xfffffe0031c226e0
> > VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0x63/frame 0xfffffe0031c22700
> > vinactive() at vinactive+0xcd/frame 0xfffffe0031c22750
> > vputx() at vputx+0x2d0/frame 0xfffffe0031c227b0
> > vn_close1() at vn_close1+0x116/frame 0xfffffe0031c22820
> > vn_closefile() at vn_closefile+0x4c/frame 0xfffffe0031c228a0
> > _fdrop() at _fdrop+0x1a/frame 0xfffffe0031c228c0
> > closef() at closef+0x1ec/frame 0xfffffe0031c22950
> > closefp() at closefp+0x9c/frame 0xfffffe0031c22990
> > amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe0031c22ab0
> > fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0031c22ab0
> > --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8006842ba, rsp =
> > 0x7fffffffe748, rbp = 0x7fffffffe760 ---
> > KDB: enter: panic
> Start with dumping core.  Then print out the struct buf and show it.

Thanks for the tip.  I think I've figured it out: after VOP_WRITE but
before VOP_INACTIVE a VOP_SETATTR was truncating the file.  And a
legacy of fuse_io.c's origins as a copy/paste of the NFS client is
that it has two different ways to track the valid region of a buf.
-Alan
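
For readers following along, here is a minimal userspace sketch of how
two independent records of a buffer's valid region can drift apart
after a truncate.  It is a toy model, not the fusefs code: struct
toy_buf and the toy_* functions are hypothetical, and the field names
dirtyoff/dirtyend are only borrowed loosely from the idea of a
per-buffer dirty range.  The assumption is that a flush trusts the
dirty range while a truncate updates only the other record.

```c
#include <stddef.h>

/* Toy stand-in for a kernel buffer.  This is NOT the real struct buf. */
struct toy_buf {
	size_t mapped;   /* record 1: bytes currently backed by memory */
	size_t dirtyoff; /* record 2: start of the dirty range */
	size_t dirtyend; /* record 2: end of the dirty range (0 = clean) */
};

/* Record a write of [off, off + len), updating both records. */
static void
toy_write(struct toy_buf *bp, size_t off, size_t len)
{
	size_t end = off + len;

	if (bp->dirtyend == 0) {
		bp->dirtyoff = off;
		bp->dirtyend = end;
	} else {
		if (off < bp->dirtyoff)
			bp->dirtyoff = off;
		if (end > bp->dirtyend)
			bp->dirtyend = end;
	}
	if (end > bp->mapped)
		bp->mapped = end;
}

/* Buggy truncate: shrinks record 1 but leaves record 2 stale. */
static void
toy_truncate_buggy(struct toy_buf *bp, size_t newsize)
{
	if (newsize < bp->mapped)
		bp->mapped = newsize;
}

/* Fixed truncate: clamps the dirty range as well. */
static void
toy_truncate_fixed(struct toy_buf *bp, size_t newsize)
{
	if (newsize < bp->mapped)
		bp->mapped = newsize;
	if (bp->dirtyend > newsize)
		bp->dirtyend = newsize;
	if (bp->dirtyoff >= bp->dirtyend)
		bp->dirtyoff = bp->dirtyend = 0;
}

/* Bytes a flush of the dirty range would touch past the mapped region. */
static size_t
toy_flush_overrun(const struct toy_buf *bp)
{
	return (bp->dirtyend > bp->mapped ? bp->dirtyend - bp->mapped : 0);
}
```

In this model, writing 8192 bytes and then truncating to 4096 with
toy_truncate_buggy() leaves toy_flush_overrun() at 4096: a flush would
start touching unmapped memory exactly 4KB into the buffer.  With
toy_truncate_fixed() the overrun is 0.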


