Date:      Thu, 12 Jul 2018 14:42:29 +0200
From:      Alexander Leidinger <Alexander@leidinger.net>
To:        freebsd-current@freebsd.org
Subject:   Re: Deadlocks / hangs in ZFS
Message-ID:  <20180712144229.Horde.D_-hM4wiiKjbsuR-VROmvkZ@webmail.leidinger.net>
In-Reply-To: <20180604223108.Horde.RcVquaVKWdNzNidD_5aJz7E@webmail.leidinger.net>
References:  <20180522101749.Horde.Wxz9gSxx1xArxkYMQqTL0iZ@webmail.leidinger.net> <fa263af4-9bf7-88f8-8d23-21456daf7960@FreeBSD.org> <20180522122924.GC1954@zxy.spb.ru> <20180522161632.Horde.ROSnBoZixBoE9ZBGp5VBQgZ@webmail.leidinger.net> <20180522144055.GD1954@zxy.spb.ru> <20180527194159.v54ox3vlthpuvx4q@jo> <20180527220612.GK1926@zxy.spb.ru> <20180528090201.Horde._E4JZcuEaZHfj_BNzWjci2O@webmail.leidinger.net> <20180603211450.Horde.pI-Fom6S1tUcaHvTF4MUjin@webmail.leidinger.net> <20180603192814.GP1926@zxy.spb.ru> <20180604223108.Horde.RcVquaVKWdNzNidD_5aJz7E@webmail.leidinger.net>


Quoting Alexander Leidinger <Alexander@leidinger.net> (from Mon, 04
Jun 2018 22:31:08 +0200):

> Quoting Slawa Olhovchenkov <slw@zxy.spb.ru> (from Sun, 3 Jun 2018
> 22:28:14 +0300):
>
>> On Sun, Jun 03, 2018 at 09:14:50PM +0200, Alexander Leidinger wrote:
>>
>>> Quoting Alexander Leidinger <Alexander@leidinger.net> (from Mon, 28
>>> May 2018 09:02:01 +0200):
>>>
>>>> Quoting Slawa Olhovchenkov <slw@zxy.spb.ru> (from Mon, 28 May 2018
>>>> 01:06:12 +0300):
>>>>
>>>>> On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:
>>>>>
>>>>>> On 05/22, Slawa Olhovchenkov wrote:
>>>>>>> > It has been a while since I tried Karl's patch the last time, and I
>>>>>>> > stopped because it didn't apply to -current anymore at some point.
>>>>>>> > Will what is provided right now in the patch work on -current?
>>>>>>>
>>>>>>> I mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
>>>>>>> I don't know how to have two distinct patches (for stable and
>>>>>>> current) in one review.
>>>>>>
>>>>>> I'm experiencing these issues sporadically as well, would you mind
>>>>>> publishing this patch for fresh current?
>>>>>
>>>>> A week ago I adapted and published the patch for fresh current and
>>>>> stable, does it need adapting again?
>>>>
>>>> I applied the patch in the review yesterday to rev 333966, it
>>>> applied OK (with some fuzz). I will try to reproduce my issue with
>>>> the patch.
>>>
>>> The behavior changed (or the system was long enough in this state
>>> without me noticing it). I have a panic now:
>>> panic: deadlkres: possible deadlock detected for 0xfffff803766db580,
>>> blocked for 1803003 ticks
>>
>> Hmm, maybe first determine the locked function:
>>
>> addr2line -ie /boot/kernel/kernel 0xfffff803766db580
>>
>> or
>>
>> kgdb
>> x/10i 0xfffff803766db580
>
> Neither produces any sensible output:
> (kgdb) x/10i 0xfffff803766db580
> 0xfffff803766db580:     subb   $0x80,-0x78(%rsi)
> 0xfffff803766db584:     (bad)
> 0xfffff803766db585:     (bad)
> 0xfffff803766db586:     (bad)
> 0xfffff803766db587:     incl   -0x7f7792(%rax)
> 0xfffff803766db58d:     (bad)
> 0xfffff803766db58e:     (bad)
> 0xfffff803766db58f:     incl   -0x7f7792(%rax)
> 0xfffff803766db595:     (bad)
> 0xfffff803766db596:     (bad)
>
>
> Seems I need to provoke a real kernel dump instead of a textdump for this.

Finally... time to recompile the kernel with crashdump-compress
support, switch from textdump to a normal dump, and install a
recent gdb from ports...

The dump is with r336194 and the zfs patch as of 20180527.

---snip---
# kgdb -c /var/crash/vmcore.2 /boot/kernel/kernel
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...Reading symbols from
/usr/lib/debug//boot/kernel/kernel.debug...done.
done.

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81391fbe
stack pointer           = 0x0:0xfffffe0000457b10
frame pointer           = 0x0:0xfffffe0000457bb0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 15 (arc_reclaim_thread)
trap number             = 12
panic: page fault
cpuid = 1
time = 1531394214
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00004577d0
vpanic() at vpanic+0x1a3/frame 0xfffffe0000457830
panic() at panic+0x43/frame 0xfffffe0000457890
trap_fatal() at trap_fatal+0x35f/frame 0xfffffe00004578e0
trap_pfault() at trap_pfault+0x62/frame 0xfffffe0000457930
trap() at trap+0x2ba/frame 0xfffffe0000457a40
calltrap() at calltrap+0x8/frame 0xfffffe0000457a40
--- trap 0xc, rip = 0xffffffff81391fbe, rsp = 0xfffffe0000457b10, rbp = 0xfffffe0000457bb0 ---
arc_reclaim_thread() at arc_reclaim_thread+0x42e/frame 0xfffffe0000457bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0000457bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000457bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 38m3s
Dumping 2378 out of 8037 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at ./machine/pcpu.h:230
230             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80485e11 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446
#3  0xffffffff804863f3 in vpanic (fmt=<optimized out>, ap=0xfffffe0000457870)
     at /usr/src/sys/kern/kern_shutdown.c:863
#4  0xffffffff80486443 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:790
#5  0xffffffff8075279f in trap_fatal (frame=0xfffffe0000457a50, eva=32) at /usr/src/sys/amd64/amd64/trap.c:892
#6  0xffffffff80752812 in trap_pfault (frame=0xfffffe0000457a50, usermode=<optimized out>)
     at /usr/src/sys/amd64/amd64/trap.c:728
#7  0xffffffff80751e1a in trap (frame=0xfffffe0000457a50) at /usr/src/sys/amd64/amd64/trap.c:427
#8  <signal handler called>
#9  0xffffffff81391fbe in arc_check_uma_cache (lowest=-1011712)
     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532
#10 arc_reclaim_thread (unused=<optimized out>)
     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4657
#11 0xffffffff8044ca74 in fork_exit (callout=0xffffffff81391b90 <arc_reclaim_thread>, arg=0x0,
     frame=0xfffffe0000457c00) at /usr/src/sys/kern/kern_fork.c:1057
#12 <signal handler called>
(kgdb) up 9
#9  0xffffffff81391fbe in arc_check_uma_cache (lowest=-1011712)
     at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532
4532                    lowest += uma_zone_get_free_size(zio_data_buf_cache[n]->kc_zone);
(kgdb) list
4527            int                     iter = 4;
4528            int                     step = 1 << (SPA_MAXBLOCKSHIFT - SPA_MINBLOCKSHIFT - 3);
4529            int                     n = (SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) - 1;
4530
4531            while (n >= 0) {
4532                    lowest += uma_zone_get_free_size(zio_data_buf_cache[n]->kc_zone);
4533                    if (lowest >= 0)
4534                            return lowest;
4535                    n -= step;
4536                    if(--iter == 0) {
(kgdb) print n
$1 = 32767
(kgdb) print zio_data_buf_cache[n]
$2 = (kmem_cache_t *) 0x0
(kgdb)

---snip---
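
The kgdb session shows n = 32767 and zio_data_buf_cache[32767] being a
NULL pointer, and the fault address 0x20 would be consistent with reading
the kc_zone member through that NULL pointer. Below is a minimal sketch of
a guarded variant of the loop, assuming an empty slot should simply be
skipped. This is not the patch from the review: the mock_* types,
mock_caches and check_uma_cache_guarded() are hypothetical stand-ins, the
shift values 9 and 24 are assumed because they reproduce n = 32767, and
the iter handling is left out because the listing stops at line 4536.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for uma_zone_t and kmem_cache_t (not the real layouts). */
struct mock_zone  { int64_t free_bytes; };
struct mock_cache { struct mock_zone *kc_zone; };

/* Assumed values: they give (SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) - 1 == 32767. */
#define SPA_MINBLOCKSHIFT 9
#define SPA_MAXBLOCKSHIFT 24
#define SPA_MAXBLOCKSIZE  (1ULL << SPA_MAXBLOCKSHIFT)
#define NCACHES           (SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT)

/* Mock of zio_data_buf_cache[]; every slot starts out NULL, as in the dump. */
struct mock_cache *mock_caches[NCACHES];

/* Stand-in for uma_zone_get_free_size(). */
int64_t
mock_zone_free_size(const struct mock_zone *z)
{
	return (z->free_bytes);
}

/*
 * Walk the cache array from the top in fixed steps and add up the free
 * space, skipping slots that were never populated instead of
 * dereferencing them. Simplified: the step is never reduced.
 */
int64_t
check_uma_cache_guarded(struct mock_cache **caches, int64_t lowest)
{
	int step = 1 << (SPA_MAXBLOCKSHIFT - SPA_MINBLOCKSHIFT - 3);
	int n = (int)NCACHES - 1;

	while (n >= 0) {
		if (caches[n] != NULL && caches[n]->kc_zone != NULL)
			lowest += mock_zone_free_size(caches[n]->kc_zone);
		if (lowest >= 0)
			return (lowest);
		n -= step;
	}
	return (lowest);
}
```

With all slots NULL the function just returns the value it was given
instead of faulting. Whether skipping is the right behaviour, or whether
the slot should never have been NULL in the first place, is a separate
question this sketch does not answer.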

Bye,
Alexander.

-- 
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF
