Date: Thu, 19 Jun 2025 15:03:53 -0400
From: Sanchit Sahay <ss19723@nyu.edu>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: freebsd-fs@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: Corrupted bp->b_lblkno on bread() // Life-cycle of a buf obj?
Message-ID: <CAJ4siUBfmq6fayMM_1WPLrudzYXO1kTr4YSD-LRJJmDRCp-xjQ@mail.gmail.com>
In-Reply-To: <aFRZ-Q62-WCx1Z7D@kib.kiev.ua>
References: <CAJ4siUBgGbeDKO8+W5JULfW8U0oLO6=xhjTr-utxuqV3N3Fnkg@mail.gmail.com> <aFRZ-Q62-WCx1Z7D@kib.kiev.ua>
[-- Attachment #1 --]
> There is something strange in the sentence. First you claim that
> b_blkno == b_lblkno, then you claim that b_lblkno changes from 0 to some
> random value.
Apologies for the confusing phrasing. What I meant is that before
VOP_STRATEGY is called, b_blkno and b_lblkno are equal (both are 0 in this
particular case), which indicates that a bmap call is still needed.
> And this smells like a KBI (Kernel Binary Interface) issue, since DEBUG_LOCKS
> changes the layout of struct lock, which is embedded in struct buf, the
> structure you are having problems with.
> How do you build your fs code? As a module? If yes, you must use the same
> set of opt_*.h headers as used for the kernel build.
I think this is likely it: I am building the filesystem as a kmod and hadn't
taken the changed struct layout into account. I will try building against the
kernel's opt_*.h headers. I was also starting to see similar behaviour creep
up in a different code path. Thanks for the help!
On Thu, 19 Jun 2025 at 14:42, Konstantin Belousov <kostikbel@gmail.com> wrote:
> On Tue, Jun 17, 2025 at 11:07:49PM -0400, Sanchit Sahay wrote:
> > I'm working on porting a filesystem to FreeBSD, and am running into an
> > issue that I'm having difficulty debugging. Any help would be appreciated.
> >
> > When calling bread() with blkno == lblkno, by the time control reaches
> > the vop_strategy function, the value of lblkno changes from 0 to a
> > seemingly random value.
> There is something strange in the sentence. First you claim that
> b_blkno == b_lblkno, then you claim that b_lblkno changes from 0 to some
> random value.
>
> So, is it 0 or b_blkno?
>
> >
> > I inspected this with gdb:
> >
> > On frame 9:
> >
> > #9 0xffff0000c3e72930 in hfs_strategy ()
> > 1488 kdb_enter("lblk random", "lblk random");
> >
> > (kgdb) p ap->a_bp->b_lblkno
> > $10 = -281474971149872
> >
> > On frame 10:
> >
> > #10 0xffff0000009387b0 in VOP_STRATEGY_APV () at vnode_if.c:2423
> > 2423 rc = vop->vop_strategy(a);
> >
> > (kgdb) p a->a_bp->b_lblkno
> > $11 = 0
> And the same pattern occurs there.
>
> >
> > This flow is triggered when calling bread() like so:
> >
> > retval = bread(vp, blockNum, block->blockSize, NOCRED, &bp);
> >
> > The stack trace is:
> >
> > #9 0xffff0000c3e72930 in hfs_strategy (ap=0xffff00009bbd1058)
> > #10 0xffff0000009387b0 in VOP_STRATEGY_APV (
> > #11 0xffff00000054bbcc in VOP_STRATEGY (vp=0xffff000000a08fc5,
> > #12 bufstrategy (bo=<optimized out>, bp=0xffff0000404990c8)
> > #13 0xffff00000054d6f0 in bstrategy (bp=0xffff0000404990c8)
> > #14 breadn_flags
> >
> > No code seems to run between these two frames: a_bp in both frames
> > points to the same memory address, and no other fields are modified
> > between them.
> >
> > Because of this seemingly random lblkno value, VOP_BMAP is not triggered,
> > and the read returns arbitrary results.
> >
> > This issue only occurs when the kernel is compiled with these
> > additional flags (as suggested by the Handbook for debugging deadlocks):
> >
> > options INVARIANTS
> > options INVARIANT_SUPPORT
> > options WITNESS
> > options WITNESS_SKIPSPIN
> > options DEBUG_LOCKS
> > options DEBUG_VFS_LOCKS
> > options DIAGNOSTIC
> >
> > Without these flags enabled, the lblkno corruption does not take place,
> > and bread() returns valid data. I don't see any conditional code enabled
> > by these flags that would cause such an issue.
> And this smells like a KBI (Kernel Binary Interface) issue, since DEBUG_LOCKS
> changes the layout of struct lock, which is embedded in struct buf, the
> structure you are having problems with.
>
> How do you build your fs code? As a module? If yes, you must use the same
> set of opt_*.h headers as used for the kernel build.
>
> >
> > Any tips on how to investigate this further would be greatly appreciated,
> > as would pointers to anything I am missing about the lifecycle of the
> > buffer object that might cause it to "reset" certain fields.
> >
> > Thanks
> > Sanchit Sahay
>
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAJ4siUBfmq6fayMM_1WPLrudzYXO1kTr4YSD-LRJJmDRCp-xjQ>
