Date: Thu, 12 Jul 2018 14:42:29 +0200
From: Alexander Leidinger <Alexander@leidinger.net>
To: freebsd-current@freebsd.org
Subject: Re: Deadlocks / hangs in ZFS
Message-ID: <20180712144229.Horde.D_-hM4wiiKjbsuR-VROmvkZ@webmail.leidinger.net>
In-Reply-To: <20180604223108.Horde.RcVquaVKWdNzNidD_5aJz7E@webmail.leidinger.net>
References: <20180522101749.Horde.Wxz9gSxx1xArxkYMQqTL0iZ@webmail.leidinger.net>
 <fa263af4-9bf7-88f8-8d23-21456daf7960@FreeBSD.org>
 <20180522122924.GC1954@zxy.spb.ru>
 <20180522161632.Horde.ROSnBoZixBoE9ZBGp5VBQgZ@webmail.leidinger.net>
 <20180522144055.GD1954@zxy.spb.ru>
 <20180527194159.v54ox3vlthpuvx4q@jo>
 <20180527220612.GK1926@zxy.spb.ru>
 <20180528090201.Horde._E4JZcuEaZHfj_BNzWjci2O@webmail.leidinger.net>
 <20180603211450.Horde.pI-Fom6S1tUcaHvTF4MUjin@webmail.leidinger.net>
 <20180603192814.GP1926@zxy.spb.ru>
 <20180604223108.Horde.RcVquaVKWdNzNidD_5aJz7E@webmail.leidinger.net>
Quoting Alexander Leidinger <Alexander@leidinger.net> (from Mon, 04 Jun 2018 22:31:08 +0200):

> Quoting Slawa Olhovchenkov <slw@zxy.spb.ru> (from Sun, 3 Jun 2018 22:28:14 +0300):
>
>> On Sun, Jun 03, 2018 at 09:14:50PM +0200, Alexander Leidinger wrote:
>>
>>> Quoting Alexander Leidinger <Alexander@leidinger.net> (from Mon, 28
>>> May 2018 09:02:01 +0200):
>>>
>>>> Quoting Slawa Olhovchenkov <slw@zxy.spb.ru> (from Mon, 28 May 2018
>>>> 01:06:12 +0300):
>>>>
>>>>> On Sun, May 27, 2018 at 09:41:59PM +0200, Kirill Ponomarev wrote:
>>>>>
>>>>>> On 05/22, Slawa Olhovchenkov wrote:
>>>>>>> > It has been a while since I tried Karl's patch the last time, and I
>>>>>>> > stopped because it didn't apply to -current anymore at some point.
>>>>>>> > Will what is provided right now in the patch work on -current?
>>>>>>>
>>>>>>> I am mean yes, after s/vm_cnt.v_free_count/vm_free_count()/g
>>>>>>> I am don't know how to have two distinct patch (for stable and
>>>>>>> current) in one review.
>>>>>>
>>>>>> I'm experiencing these issues sporadically as well, would you mind
>>>>>> to publish this patch for fresh current?
>>>>>
>>>>> Week ago I am adopt and publish patch to fresh current and stable, is
>>>>> adopt need again?
>>>>
>>>> I applied the patch in the review yesterday to rev 333966, it
>>>> applied OK (with some fuzz). I will try to reproduce my issue with
>>>> the patch.
>>>
>>> The behavior changed (or the system was long enough in this state
>>> without me noticing it).
>>> I have a panic now:
>>> panic: deadlkres: possible deadlock detected for 0xfffff803766db580,
>>> blocked for 1803003 ticks
>>
>> Hmm, may be first determinate locked function
>>
>> addr2line -ie /boot/kernel/kernel 0xfffff803766db580
>>
>> or
>>
>> kgdb
>> x/10i 0xfffff803766db580
>
> Both don't produce any sensible output:
> (kgdb) x/10i 0xfffff803766db580
>    0xfffff803766db580:  subb   $0x80,-0x78(%rsi)
>    0xfffff803766db584:  (bad)
>    0xfffff803766db585:  (bad)
>    0xfffff803766db586:  (bad)
>    0xfffff803766db587:  incl   -0x7f7792(%rax)
>    0xfffff803766db58d:  (bad)
>    0xfffff803766db58e:  (bad)
>    0xfffff803766db58f:  incl   -0x7f7792(%rax)
>    0xfffff803766db595:  (bad)
>    0xfffff803766db596:  (bad)
>
>
> Seems I need to provoke a real kernel dump instead of a textdump for this.

Finally... time to recompile the kernel with crashdump-compress
support, change from textdump to a normal dump, and install a
recent gdb from ports...

The dump is with r336194 and the zfs patch as of 20180527.

---snip---
# kgdb -c /var/crash/vmcore.2 /boot/kernel/kernel
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...done.
done.

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x20
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81391fbe
stack pointer           = 0x0:0xfffffe0000457b10
frame pointer           = 0x0:0xfffffe0000457bb0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 15 (arc_reclaim_thread)
trap number             = 12
panic: page fault
cpuid = 1
time = 1531394214
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00004577d0
vpanic() at vpanic+0x1a3/frame 0xfffffe0000457830
panic() at panic+0x43/frame 0xfffffe0000457890
trap_fatal() at trap_fatal+0x35f/frame 0xfffffe00004578e0
trap_pfault() at trap_pfault+0x62/frame 0xfffffe0000457930
trap() at trap+0x2ba/frame 0xfffffe0000457a40
calltrap() at calltrap+0x8/frame 0xfffffe0000457a40
--- trap 0xc, rip = 0xffffffff81391fbe, rsp = 0xfffffe0000457b10, rbp = 0xfffffe0000457bb0 ---
arc_reclaim_thread() at arc_reclaim_thread+0x42e/frame 0xfffffe0000457bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0000457bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0000457bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 38m3s
Dumping 2378 out of 8037 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at ./machine/pcpu.h:230
230             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80485e11 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:446
#3  0xffffffff804863f3 in vpanic (fmt=<optimized out>, ap=0xfffffe0000457870)
    at /usr/src/sys/kern/kern_shutdown.c:863
#4  0xffffffff80486443 in panic (fmt=<unavailable>)
    at /usr/src/sys/kern/kern_shutdown.c:790
#5  0xffffffff8075279f in trap_fatal (frame=0xfffffe0000457a50, eva=32)
    at /usr/src/sys/amd64/amd64/trap.c:892
#6  0xffffffff80752812 in trap_pfault (frame=0xfffffe0000457a50, usermode=<optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:728
#7  0xffffffff80751e1a in trap (frame=0xfffffe0000457a50)
    at /usr/src/sys/amd64/amd64/trap.c:427
#8  <signal handler called>
#9  0xffffffff81391fbe in arc_check_uma_cache (lowest=-1011712)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532
#10 arc_reclaim_thread (unused=<optimized out>)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4657
#11 0xffffffff8044ca74 in fork_exit (callout=0xffffffff81391b90 <arc_reclaim_thread>, arg=0x0, frame=0xfffffe0000457c00)
    at /usr/src/sys/kern/kern_fork.c:1057
#12 <signal handler called>
(kgdb) up 9
#9  0xffffffff81391fbe in arc_check_uma_cache (lowest=-1011712)
    at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:4532
4532                    lowest += uma_zone_get_free_size(zio_data_buf_cache[n]->kc_zone);
(kgdb) list
4527            int                     iter = 4;
4528            int                     step = 1 << (SPA_MAXBLOCKSHIFT - SPA_MINBLOCKSHIFT - 3);
4529            int                     n = (SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) - 1;
4530
4531            while (n >= 0) {
4532                    lowest += uma_zone_get_free_size(zio_data_buf_cache[n]->kc_zone);
4533                    if (lowest >= 0)
4534                            return lowest;
4535                    n -= step;
4536                    if(--iter == 0) {
(kgdb) print n
$1 = 32767
(kgdb) print zio_data_buf_cache[n]
$2 = (kmem_cache_t *) 0x0
(kgdb)
---snip---

Bye,
Alexander.
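
A reading aid for the dump above, not the actual kernel code: kgdb shows n = 32767 and zio_data_buf_cache[n] == NULL at arc.c:4532, so the loop dereferences a NULL pointer on its very first pass. The fault virtual address 0x20 would be consistent with a field at offset 0x20 being read through that NULL pointer. The userland sketch below mimics the shape of the loop with a NULL check added. Everything prefixed sketch_/SKETCH_ is a hypothetical stand-in: the shift values 9 and 24 are assumptions chosen so that the start index comes out as 32767, and the struct layout is an assumption chosen so that the field lands at offset 0x20. The branch after `--iter == 0` is not visible in the listing and is left out. Whether skipping NULL slots is the right fix for the patch is an open question.

```c
#include <stddef.h>

/* Assumed values, picked so that the start index below is 32767. */
#define SKETCH_MINBLOCKSHIFT	9
#define SKETCH_MAXBLOCKSHIFT	24
#define SKETCH_NSLOTS	(1 << (SKETCH_MAXBLOCKSHIFT - SKETCH_MINBLOCKSHIFT))

/* Hypothetical stand-in for a cache descriptor; free_size sits at offset 32. */
struct sketch_cache {
	char	name[32];
	long	free_size;	/* plays the role of the zone's free size */
};

/* Sparse table: slots left NULL model the situation seen in the dump. */
struct sketch_cache *sketch_caches[SKETCH_NSLOTS];

/*
 * Walk the table from the top in fixed steps, adding the free size of
 * every populated slot to 'lowest'. NULL slots are skipped instead of
 * being dereferenced. Returns as soon as 'lowest' is no longer negative.
 */
long
sketch_check_cache(long lowest)
{
	int step = 1 << (SKETCH_MAXBLOCKSHIFT - SKETCH_MINBLOCKSHIFT - 3);
	int n = SKETCH_NSLOTS - 1;

	while (n >= 0) {
		const struct sketch_cache *c = sketch_caches[n];

		if (c != NULL) {
			lowest += c->free_size;
			if (lowest >= 0)
				return (lowest);
		}
		n -= step;
	}
	return (lowest);
}
```

With every slot left NULL, sketch_check_cache(-1011712) hands back its argument unchanged instead of faulting. Once a visited slot is populated, its free size is added as in the original loop.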

--
http://www.Leidinger.net Alexander@Leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netchild@FreeBSD.org  : PGP 0x8F31830F9F2772BF