Date:      Fri, 20 Jun 2025 01:45:46 -0400
From:      Sanchit Sahay <ss19723@nyu.edu>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org
Subject:   Re: Corrupted bp->b_lblkno on bread() // Life-cycle of a buf obj?
Message-ID:  <CAJ4siUB8KBH1xTk6K57j7frxv_u%2BK_n3OhLTUUQamCPo95ce9w@mail.gmail.com>
In-Reply-To: <aFRjDcPlA7lEHm0S@kib.kiev.ua>
References:  <CAJ4siUBgGbeDKO8%2BW5JULfW8U0oLO6=xhjTr-utxuqV3N3Fnkg@mail.gmail.com> <aFRZ-Q62-WCx1Z7D@kib.kiev.ua> <CAJ4siUBfmq6fayMM_1WPLrudzYXO1kTr4YSD-LRJJmDRCp-xjQ@mail.gmail.com> <aFRjDcPlA7lEHm0S@kib.kiev.ua>


[-- Attachment #1 --]
I did end up using KERNBUILDDIR in the makefile, and it seems to do the
trick. The KBI was indeed the issue, as I confirmed by comparing the
structs in gdb before rebuilding my module. Thanks!

On Thu, Jun 19, 2025 at 3:21 PM Konstantin Belousov <kostikbel@gmail.com> wrote:

> On Thu, Jun 19, 2025 at 03:03:53PM -0400, Sanchit Sahay wrote:
> > > There is something strange in the sentence.  First you claim that
> > > b_blkno == b_lblkno, then you claim that b_lblkno changes from 0 to some
> > > random value.
> >
> > Apologies for the confusing phrasing. What I meant is that before
> > VOP_STRATEGY is called, b_blkno and b_lblkno are the same (both are 0 in
> > this particular case), which implies that a bmap call is needed.
> >
> > > And this smells like a KBI (Kernel Binary Interface) issue, since
> > > DEBUG_LOCKS changes the layout of struct lock, which is embedded into
> > > struct buf, the structure you are having problems with.
> >
> > > How do you build your fs code? As a module? If yes, you must use the
> > > same set of opt_*.h headers that the kernel build used.
> >
> > I think this might be it: I am building it as a kmod and hadn't taken the
> > changed struct layout into account. I will try including these headers.
> > I was starting to see similar behaviour creep up in a different code path
> > as well. Thanks for the help!
>
> How do you intend to include them?
> The right way, if you build your module out of tree, is to do
> something like the following:
>
> make -C <module src dir> SYSDIR=<kernel sources path> KERNBUILDDIR=<config output path>
>
> That is, KERNBUILDDIR should point to the directory where config(8) put
> the generated files, the most important of which are the opt_*.h headers.
>
> >
> > On Thu, 19 Jun 2025 at 14:42, Konstantin Belousov <kostikbel@gmail.com> wrote:
> >
> > > On Tue, Jun 17, 2025 at 11:07:49PM -0400, Sanchit Sahay wrote:
> > > > I'm working on porting a filesystem to FreeBSD and am running into an
> > > > issue that I'm having difficulty debugging. Any help would be
> > > > appreciated.
> > > >
> > > > When calling bread() with blkno == lblkno, by the time control
> > > > reaches the vop_strategy function, the value of b_lblkno has changed
> > > > from 0 to a seemingly random value.
> > > There is something strange in the sentence.  First you claim that
> > > b_blkno == b_lblkno, then you claim that b_lblkno changes from 0 to some
> > > random value.
> > >
> > > So, is it 0 or b_blkno?
> > >
> > > >
> > > > Having inspected this with kgdb:
> > > >
> > > > On frame 9:
> > > >
> > > > #9  0xffff0000c3e72930 in hfs_strategy ()
> > > > 1488            kdb_enter("lblk random", "lblk random");
> > > >
> > > > (kgdb) p ap->a_bp->b_lblkno
> > > > $10 = -281474971149872
> > > >
> > > > On frame 10:
> > > >
> > > > #10 0xffff0000009387b0 in VOP_STRATEGY_APV () at vnode_if.c:2423
> > > > 2423                    rc = vop->vop_strategy(a);
> > > >
> > > > (kgdb) p a->a_bp->b_lblkno
> > > > $11 = 0
> > > And the same pattern occurs there.
> > >
> > > >
> > > > This flow is triggered when calling bread() like so:
> > > >
> > > > retval = bread(vp, blockNum, block->blockSize, NOCRED, &bp);
> > > >
> > > > The stack trace is:
> > > >
> > > > #9  0xffff0000c3e72930 in hfs_strategy (ap=0xffff00009bbd1058)
> > > > #10 0xffff0000009387b0 in VOP_STRATEGY_APV (
> > > > #11 0xffff00000054bbcc in VOP_STRATEGY (vp=0xffff000000a08fc5,
> > > > #12 bufstrategy (bo=<optimized out>, bp=0xffff0000404990c8)
> > > > #13 0xffff00000054d6f0 in bstrategy (bp=0xffff0000404990c8)
> > > > #14 breadn_flags
> > > >
> > > > No code seems to run between these two frames: a_bp in both frames
> > > > points to the same memory address, and no other fields are modified
> > > > between them.
> > > >
> > > > Because of this seemingly random lblkno value, VOP_BMAP is not
> > > > triggered, and the read returns arbitrary results.
> > > >
> > > > This issue only occurs when the kernel is compiled with these
> > > > additional options (as suggested by the Handbook for debugging
> > > > deadlocks):
> > > >
> > > > options INVARIANTS
> > > > options INVARIANT_SUPPORT
> > > > options WITNESS
> > > > options WITNESS_SKIPSPIN
> > > > options DEBUG_LOCKS
> > > > options DEBUG_VFS_LOCKS
> > > > options DIAGNOSTIC
> > > >
> > > > Without these options enabled, the lblkno corruption does not occur,
> > > > and bread() returns valid data. I don't see any conditional code
> > > > enabled by these options that would cause such an issue.
> > > And this smells like a KBI (Kernel Binary Interface) issue, since
> > > DEBUG_LOCKS changes the layout of struct lock, which is embedded into
> > > struct buf, the structure you are having problems with.
> > >
> > > How do you build your fs code? As a module? If yes, you must use the
> > > same set of opt_*.h headers that the kernel build used.
> > >
> > > >
> > > > Any tips on how to investigate this further would be greatly
> > > > appreciated, as would pointers to anything I may be missing about the
> > > > lifecycle of the buffer object that might cause it to "reset" certain
> > > > fields.
> > > >
> > > > Thanks
> > > > Sanchit Sahay
> > >
>


Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ4siUB8KBH1xTk6K57j7frxv_u%2BK_n3OhLTUUQamCPo95ce9w>