Date:      Sat, 11 Aug 2018 12:05:25 +0000
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        "freebsd-current@FreeBSD.org" <freebsd-current@FreeBSD.org>, "peter@holm.cc" <peter@holm.cc>
Subject:   Re: ffs_truncate3 panics
Message-ID:  <YTOPR0101MB18203F316E929C7AF7576BF0DD3B0@YTOPR0101MB1820.CANPRD01.PROD.OUTLOOK.COM>
In-Reply-To: <20180810172941.GA2113@kib.kiev.ua>
References:  <YTOPR0101MB18206289DDED97BE9DD38D14DD270@YTOPR0101MB1820.CANPRD01.PROD.OUTLOOK.COM> <20180807131445.GC1884@kib.kiev.ua> <YTOPR0101MB18207C97903D3058A15091FFDD260@YTOPR0101MB1820.CANPRD01.PROD.OUTLOOK.COM> <20180808221647.GH1884@kib.kiev.ua> <YTOPR0101MB18205CDE34ABCC2345C3F172DD250@YTOPR0101MB1820.CANPRD01.PROD.OUTLOOK.COM> <20180809111004.GK1884@kib.kiev.ua> <YTOPR0101MB182067FF6F908E0B2EBAE3D1DD250@YTOPR0101MB1820.CANPRD01.PROD.OUTLOOK.COM>, <20180810172941.GA2113@kib.kiev.ua>

Konstantin Belousov wrote:
>On Thu, Aug 09, 2018 at 08:38:50PM +0000, Rick Macklem wrote:
>> >BTW, does NFS server use extended attributes ?  What for ?  Can you, please,
>> >point out the code which does this ?
>> For the pNFS service, there are two system namespace extended attributes for
>> each file stored on the service.
>> pnfsd.dsfile - Stores where the data for the file is. Can be displayed by the
>>      pnfsdsfile(8) command.
>>
>> pnfsd.dsattr - Cached attributes that change when a file is written (size, mtime,
>> change) so that the MDS doesn't have to do a Getattr on the data server for
>> every client Getattr.
>>
>
>My reading of the nfsd code + ffs extattr handling reminds me that you
>already reported this issue some time ago.  I suspected ufs_balloc() at
>that time.
Yes. I had almost forgotten about these panics, because I have been testing with a
couple of machines (not big, but amd64 with a few Gbytes of RAM) and they
never hit the panic(). Recently, I've been using the 256Mbyte i386 and started
seeing them again.

>Now I think that the situation with the stray buffers hanging on the
>queue is legitimate, ffs_extread() might create such buffer and release
>it to a clean queue, then removal of the file would see inode with no
>allocated ext blocks but with the buffer.
>
>I think the easiest way to handle it is to always flush buffers and pages
>in the ext attr range, regardless of the number of allocated ext blocks.
>Patch below was not tested.
[patch deleted for brevity]
Well, the above sounds reasonable, but the patch didn't help.
Here's a small portion of the log from a test run last night.
- First, a couple of things about the printf()s. When they start with "CL=<N>",
  the printf() is at the start of ffs_truncate(). "<N>" is a static counter of calls to
  ffs_truncate(), so "same value" indicates same call.


CL=31816 flags=0xc00 vtyp=1 bodirty=0 boclean=1 diextsiz=320
buf at 0x429f260
b_flags = 0x20001020<vmio,reuse,cache>, b_xflags=0x2<clean>, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfa3f734), b_data = 0x4c90000, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x4c90000, b_kvasize = 32768

CL=34593 flags=0xc00 vtyp=1 bodirty=0 boclean=1 diextsiz=320
buf at 0x429deb0
b_flags = 0x20001020<vmio,reuse,cache>, b_xflags=0x2<clean>, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfd3da94), b_data = 0x5700000, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x5700000, b_kvasize = 32768

FFST3=34593 vtyp=1 bodirty=0 boclean=1
buf at 0x429deb0
b_flags = 0x20001020<vmio,reuse,cache>, b_xflags=0x2<clean>, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfd3da94), b_data = 0x5700000, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x5700000, b_kvasize = 32768

So, the first one is what typically happens and there would be no panic().
The second/third would be a panic(), since the one that starts with "FFST3"
is a printf() that replaces the panic() call.
- Looking at the second/third, the number at the beginning is the same, so it is
  the same call, but for some reason, between the start of the function and
  where the ffs_truncate3 panic() test is, di_extsize has been set to 0, but the
  buffer is still there (or has been re-created there by another thread?).
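
The tagging scheme described above (a static call counter printed at function
entry and again where the panic() test would be) can be sketched in userland C.
This is only an illustration under stated assumptions, not the actual kernel
change: `fake_inode`, `fake_truncate()` and the caller-supplied log buffer are
hypothetical names, and only the "CL=" / "FFST3=" prefixes and the "diextsiz"
label come from the log above.

```c
#include <stdio.h>

/* Hypothetical stand-in for the one inode field being watched. */
struct fake_inode {
	int di_extsize;
};

/* Static counter of calls, so two log lines can be tied to one call. */
static int trunc_calls;

/*
 * Hypothetical model of the instrumentation: log "CL=<N>" on entry and
 * "FFST3=<N>" at the point where the real code would have panic()ed.
 * The counter value is returned so a caller can match log lines.
 * Note: a plain static increment like this is not atomic; concurrent
 * callers in real kernel code could race on it.
 */
static int
fake_truncate(struct fake_inode *ip, char *log, size_t len)
{
	int cl = ++trunc_calls;
	int n;

	/* Entry printf: records the state before anything is changed. */
	n = snprintf(log, len, "CL=%d diextsiz=%d\n", cl, ip->di_extsize);

	/* Stand-in for the body of the function clearing the ext area. */
	ip->di_extsize = 0;

	/* Diagnostic printf in place of the panic(), tagged with the same N. */
	if (n > 0 && (size_t)n < len)
		snprintf(log + n, len - n, "FFST3=%d diextsiz=%d\n",
		    cl, ip->di_extsize);
	return (cl);
}
```

Called once on a `fake_inode` with `di_extsize` set to 320, this fills the log
with a "CL=1" line followed by an "FFST3=1" line; the matching number is what
shows that both lines belong to the same call.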

Looking at the code, I can't see how this could happen, since there is a vinvalbuf()
call after the only place in the code that sets di_extsize to 0, from what I can see?
I am going to add printf()s after the vinvalbuf() calls, to make sure they are
happening and getting rid of the buffer.

If another thread could somehow (re)create the buffer concurrently with the
ffs_truncate() call, that would explain it, I think?

Just a wild guess, but I suspect softdep_slowdown() is flipping, due to the small
size of the machine and this makes the behaviour of ffs_truncate() confusing.

I'll post again when I have more info.
Thanks for looking at it, rick



