Date: Fri, 30 Dec 2022 14:06:38 -0800
From: Rick Macklem <rick.macklem@gmail.com>
To: Rob Wing <rob.fx907@gmail.com>
Cc: John F Carr <jfc@mit.edu>, Hikmat Jafarli <jafarlihi@gmail.com>, "freebsd-fs@freebsd.org" <freebsd-fs@freebsd.org>
Subject: Re: Trying to implement BFS, page fault at vfs_domount_first, how to debug?
Message-ID: <CAM5tNy5p-0vtPyU2aTFCdJ50-sbK4uqezzXp%2Bsg9WqOwGbEumQ@mail.gmail.com>
In-Reply-To: <CAF3%2Bn_eVEUz_Qmf-eU4T-UaLmKLsRf_x8JMv7pHFcmUJC2SZLg@mail.gmail.com>
References: <CAPWrP-Y3usfDukwhQroJY0NUbZK_C=cuctm%2BXYSjBqDQYejBWw@mail.gmail.com> <23A1E4DF-320A-4BCB-ADB8-83FEFC3D7649@mit.edu> <CAF3%2Bn_eVEUz_Qmf-eU4T-UaLmKLsRf_x8JMv7pHFcmUJC2SZLg@mail.gmail.com>

On Fri, Dec 30, 2022 at 11:48 AM Rob Wing <rob.fx907@gmail.com> wrote:

> you might try `addr2line -e $path_to_kernel 0xffffffff80cf0651`
>
Just a note. I find I need the kernel called "kernel.debug" for
addr2line to work. It normally lives in the kernel build directory
under /usr/obj.

rick

>
> Aside from that, it looks like errors aren't being handled correctly after
> failing to find the BFS superblock in bfs_mountfs(). Since no error is
> returned after failing to find the superblock..I'm guessing that the NULL
> pointer `bfsmp` is being de-referenced in bfs_statfs().
>
> On Fri, Dec 30, 2022 at 10:35 AM John F Carr <jfc@mit.edu> wrote:
>
>> > On Dec 30, 2022, at 14:13, Hikmat Jafarli <jafarlihi@gmail.com> wrote:
>> >
>> > I'm trying to implement the BeOS filesystem (BFS) for FreeBSD.
>> > The repository is here: https://github.com/jafarlihi/freebsd-bfs
>> > (Please don't mind bad styling and all the copy-paste work,
>> > I'll polish it later, I'm just trying to get to some PoC where it works)
>> >
>> > Now when I try to mount a valid BFS partition (reported as BFS by `fstyp`)
>> > it executes all the way to printf that logs "Either not a BFS volume or
>> > corrupted" and then crashes with "page fault while in kernel mode" in
>> > vfs_domount_first+0x271. Here's the log:
>> > ```
>> > Either not a BFS volume or corrupted
>> >
>> > Fatal trap 12: page fault while in kernel mode
>> > cpuid = 0; apic id = 00
>> > fault virtual address = 0x18
>> > fault code = supervisor read data, page not present
>> > instruction pointer = 0x20:0xffffffff82b2427b
>> > stack pointer = 0x28:0xfffffe00df399ac0
>> > frame pointer = 0x28:0xfffffe00df399ac0
>> > code segment = base 0x0, limit 0xfffff, type 0x1b
>> > = DPL 0, pres 1, long 1, def32 0, gran 1
>> > processor eflags = interrupt enabled, resume, IOPL = 0
>> > current process = 1208 (mount)
>> > trap number = 12
>> > panic: page fault
>> > cpuid = 0
>> > time = 1672414952
>> > KDB: stack backtrace:
>> > #0 0xffffffff80c694a5 at kdb_backtrace+0x65
>> > #1 0xffffffff80c1bb5f at vpanic+0x17f
>> > #2 0xffffffff80c1b9d3 at panic+0x43
>> > #3 0xffffffff810afdf5 at trap_fatal+0x385
>> > #4 0xffffffff810afe4f at trap_pfault+0x4f
>> > #5 0xffffffff810875b8 at calltrap+0x8
>> > #6 0xffffffff80cf0651 at vfs_domount_first+0x271
>> > #7 0xffffffff80cece9d at vfs_domount+0x2ad
>> > #8 0xffffffff80cec2d8 at vfs_donmount+0x8f8
>> > #9 0xffffffff80ceb9a9 at sys_nmount+0x69
>> > #10 0xffffffff810b06ec at amd64_syscall+0x10c
>> > #11 0xffffffff81087ecb at fast_syscall_common+0xf8
>> > ```
>> >
>> > Now I'm trying to understand what exactly goes wrong here
>> > and how to map 0x271 to the exact source line.
>> >
>> > I'd appreciate it if someone could tell me how to debug this.
>> >
>> > (Sorry for noob question, I already tried IRC and was directed here)
>>
>> Your BFS module tried to dereference a null pointer to structure.
>>
>> It's a null pointer dereference because of "fault virtual address =
>> 0x18". That normally means you tried to access the fourth word of a
>> structure but the pointer to structure was null. It could be something
>> else, but play the odds.
>>
>> It's in your module because the instruction pointer address is far beyond
>> the other kernel functions in the stack trace.
>> Stack traces in crash reports are misleading: they tend to omit the
>> function that triggered the crash. The address of vfs_domount_first is
>> 0xffffffff80cf03e0 (0xffffffff80cf0651 - 0x271). That's the function
>> that called your module. The address of the faulting instruction is
>> 0xffffffff82b2427b. That's in your module.
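
The address arithmetic in the quoted analysis can be checked with a short Python sketch. It only uses numbers copied from the panic report in this thread. The helper names (`function_base`, `word_index`, `beyond_backtrace`) are hypothetical, and the 8-byte word size is an assumption based on the amd64 frames in the backtrace. The "beyond every backtrace address" test is just the rule of thumb from the reply, not a reliable way to tell module code from kernel code.

```python
# Values copied from the panic report quoted in this thread.
RETURN_ADDR = 0xffffffff80cf0651  # frame #6: vfs_domount_first+0x271
OFFSET = 0x271
FAULT_IP = 0xffffffff82b2427b     # "instruction pointer" (segment part dropped)
FAULT_VA = 0x18                   # "fault virtual address"

# All return addresses listed in the KDB stack backtrace (#0 to #11).
BACKTRACE = [
    0xffffffff80c694a5, 0xffffffff80c1bb5f, 0xffffffff80c1b9d3,
    0xffffffff810afdf5, 0xffffffff810afe4f, 0xffffffff810875b8,
    0xffffffff80cf0651, 0xffffffff80cece9d, 0xffffffff80cec2d8,
    0xffffffff80ceb9a9, 0xffffffff810b06ec, 0xffffffff81087ecb,
]

WORD_SIZE = 8  # assumption: 64-bit (amd64) words


def function_base(addr, offset):
    """Start address of the function in a 'symbol+offset' backtrace entry."""
    return addr - offset


def word_index(fault_va, word_size=WORD_SIZE):
    """1-based word position that a small fault address selects in a
    structure whose base pointer is NULL (address 0)."""
    return fault_va // word_size + 1


def beyond_backtrace(ip, frames):
    """Rule of thumb from the reply: is the faulting instruction above
    every address that appears in the backtrace?"""
    return ip > max(frames)


if __name__ == "__main__":
    print(hex(function_base(RETURN_ADDR, OFFSET)))
    print(word_index(FAULT_VA))
    print(beyond_backtrace(FAULT_IP, BACKTRACE))
```

Feeding the faulting instruction pointer rather than the frame #6 address to addr2line would presumably need the module's own debug file in place of kernel.debug, plus the module's load address; neither is given in this thread.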