Date: Fri, 24 Dec 2021 14:41:54 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 260664] FreeBSD randomly freeze or crash with nfs mount after some days or a month.
Message-ID: <bug-260664-227@https.bugs.freebsd.org/bugzilla/>
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=260664

            Bug ID: 260664
           Summary: FreeBSD randomly freeze or crash with nfs mount after some days or a month.
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: ypnow@163.com

Hello everyone,

I have a server running FreeBSD 13.0 that randomly freezes after anywhere from a few days to a month.

Here are the symptoms:

1. SSH connections cannot be established; the ssh command gets no response at all.
2. Many services become unreachable. Simple services, such as a static nginx page, can be opened for a short time, but after refreshing the page a few times it hangs with no response. Some other services behave the same way.
3. Another server keeps a permanently logged-in ssh session to the FreeBSD server with top running. When FreeBSD freezes, this session still works and top keeps refreshing and showing the system status: memory is normal, CPU usage is normal, ZFS ARC is normal, swap is normal, the clock is normal; everything looks normal. All of top's hot keys work, but after pressing q to quit top and typing another command, such as "systat -ifstat", the command hangs with no output, and Ctrl+Z or Ctrl+C has no effect.
4. Pinging the server always works.
5. The redis-server on FreeBSD is unaffected; the redis service keeps responding well.
6. Console login is impossible: after typing the username and password and pressing Enter, there is no output.

Environment:
FreeBSD 13.0, Intel Xeon 4-core + 16GB memory, two 2T disks, ZFS mirror, root on ZFS.
It's a new machine, bought less than half a year ago.
The main system runs only sshguard+ipfw. It mounts an NFS share and passes it to a jail through nullfs. The jail's file system is a ZFS dataset clone, and all services run in this jail.
The server has two bge network interfaces, one for LAN and one for WAN; the services are network heavy.
The jail runs nginx, php-fpm, the php cli server, mysql, and redis-server; PHP does a lot of NFS reads and writes.

Attempt 1:
At first I suspected a ZFS ARC problem and set arc max to 2G, but ARC looks completely normal in top.
In dmesg and in every system or service log, recording simply stops when the system freezes, so there is no abnormal log entry at all. Services that do not need to read or write files, however, seem to keep working.

Attempt 2:
Before I configured kern.ipc.somaxconn, when the system hung I could not log in or do anything. After changing somaxconn, when the system hangs (worker processes freeze) I can still log in and run operations that do not touch the NFS mountpoint. I found that the freeze happens because the NFS mount is dead.

I found some reports of the same problem:
https://emby.media/community/index.php?/topic/74175-freebsd-jail-with-nfsv4-share-causes-system-to-hang/
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251347
https://redmine.ixsystems.com/issues/2068

Attempt 3:
I moved the program that reads and writes NFS out of the jail and added intr to the NFS mount options. After that, more system crashes happened, and every crash has a different stack trace: sometimes arc_write, sometimes nfscl, sometimes zio_execute. I now suspect that the possibility of hardware failure is quite high.

- uname -a
FreeBSD ppbsd 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

- loader.conf
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
cryptodev_load="YES"
zfs_load="YES"
coretemp_load="YES"
net.inet.ip.fw.default_to_accept=1
vfs.zfs.arc_max="2G"
# Increase dmesg buffer to fit longer boot output.
kern.msgbufsize="524288"

- sysctl.conf
# $FreeBSD$
#
# This file is read when going to multi-user and its contents piped thru
# ``sysctl'' to adjust kernel values. ``man 5 sysctl.conf'' for details.
#
# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
#vfs.zfs.min_auto_ashift=12
kern.ipc.somaxconn=4096

Here is the crash log 1:
Dec 17 00:47:02 ppbsd kernel: Fatal trap 12: page fault while in kernel mode
Dec 17 00:47:02 ppbsd kernel: cpuid = 2; apic id = 04
Dec 17 00:47:02 ppbsd kernel: fault virtual address = 0x28
Dec 17 00:47:02 ppbsd kernel: fault code = supervisor read data, page not present
Dec 17 00:47:02 ppbsd kernel: instruction pointer = 0x20:0xffffffff821495f8
Dec 17 00:47:02 ppbsd kernel: stack pointer = 0x0:0xfffffe010fae48d0
Dec 17 00:47:02 ppbsd kernel: frame pointer = 0x0:0xfffffe010fae48d0
Dec 17 00:47:02 ppbsd kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Dec 17 00:47:02 ppbsd kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Dec 17 00:47:02 ppbsd kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Dec 17 00:47:02 ppbsd kernel: current process = 0 (z_wr_int_3)
Dec 17 00:47:02 ppbsd kernel: trap number = 12
Dec 17 00:47:02 ppbsd kernel: panic: page fault
Dec 17 00:47:02 ppbsd kernel: cpuid = 2
Dec 17 00:47:02 ppbsd kernel: time = 1639673010
Dec 17 00:47:02 ppbsd kernel: KDB: stack backtrace:
Dec 17 00:47:02 ppbsd kernel: #0 0xffffffff80c574c5 at kdb_backtrace+0x65
Dec 17 00:47:02 ppbsd kernel: #1 0xffffffff80c09ea1 at vpanic+0x181
Dec 17 00:47:02 ppbsd kernel: #2 0xffffffff80c09d13 at panic+0x43
Dec 17 00:47:02 ppbsd kernel: #3 0xffffffff8108b1b7 at trap_fatal+0x387
Dec 17 00:47:02 ppbsd kernel: #4 0xffffffff8108b20f at trap_pfault+0x4f
Dec 17 00:47:02 ppbsd kernel: #5 0xffffffff8108a86d at trap+0x27d
Dec 17 00:47:02 ppbsd kernel: #6 0xffffffff81061958 at calltrap+0x8
Dec 17 00:47:02 ppbsd kernel: #7 0xffffffff821a4d3e at dbuf_write_done+0x9e
Dec 17 00:47:02 ppbsd kernel: #8 0xffffffff82190c5c at arc_write_done+0x33c
Dec 17 00:47:02 ppbsd kernel: #9 0xffffffff822f920d at zio_done+0xd9d
Dec 17 00:47:02 ppbsd kernel: #10 0xffffffff822f2d5c at zio_execute+0x3c
Dec 17 00:47:02 ppbsd kernel: #11 0xffffffff80c6b161 at taskqueue_run_locked+0x181
Dec 17 00:47:02 ppbsd kernel: #12 0xffffffff80c6c47c at taskqueue_thread_loop+0xac
Dec 17 00:47:02 ppbsd kernel: #13 0xffffffff80bc7dde at fork_exit+0x7e
Dec 17 00:47:02 ppbsd kernel: #14 0xffffffff810629de at fork_trampoline+0xe
Dec 17 00:47:02 ppbsd kernel: Uptime: 1h44m58s
Dec 17 00:47:02 ppbsd kernel: Dumping 2511 out of 16190 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%---<<BOOT>>---

Crash log 2:
Dec 21 14:09:50 ppbsd syslogd: kernel boot file is /boot/kernel/kernel
Dec 21 14:09:50 ppbsd kernel:
Dec 21 14:09:50 ppbsd syslogd: last message repeated 1 times
Dec 21 14:09:50 ppbsd kernel: Fatal trap 12: page fault while in kernel mode
Dec 21 14:09:50 ppbsd kernel: cpuid = 2; apic id = 04
Dec 21 14:09:50 ppbsd kernel: fault virtual address = 0x0
Dec 21 14:09:50 ppbsd kernel: fault code = supervisor write data, page not present
Dec 21 14:09:50 ppbsd kernel: instruction pointer = 0x20:0xffffffff80ac9e26
Dec 21 14:09:50 ppbsd kernel: stack pointer = 0x28:0xfffffe011bf165b0
Dec 21 14:09:50 ppbsd kernel: frame pointer = 0x28:0xfffffe011bf165f0
Dec 21 14:09:50 ppbsd kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Dec 21 14:09:50 ppbsd kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Dec 21 14:09:50 ppbsd kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Dec 21 14:09:50 ppbsd kernel: current process = 4541 (newnfs 3)
Dec 21 14:09:50 ppbsd kernel: trap number = 12
Dec 21 14:09:50 ppbsd kernel: panic: page fault
Dec 21 14:09:50 ppbsd kernel: cpuid = 2
Dec 21 14:09:50 ppbsd kernel: time = 1640066765
Dec 21 14:09:50 ppbsd kernel: KDB: stack backtrace:
Dec 21 14:09:50 ppbsd kernel: #0 0xffffffff80c574c5 at kdb_backtrace+0x65
Dec 21 14:09:50 ppbsd kernel: #1 0xffffffff80c09ea1 at vpanic+0x181
Dec 21 14:09:50 ppbsd kernel: #2 0xffffffff80c09d13 at panic+0x43
Dec 21 14:09:50 ppbsd kernel: #3 0xffffffff8108b1b7 at trap_fatal+0x387
Dec 21 14:09:50 ppbsd kernel: #4 0xffffffff8108b20f at trap_pfault+0x4f
Dec 21 14:09:50 ppbsd kernel: #5 0xffffffff8108a86d at trap+0x27d
Dec 21 14:09:50 ppbsd kernel: #6 0xffffffff81061958 at calltrap+0x8
Dec 21 14:09:50 ppbsd kernel: #7 0xffffffff80acc5d9 at nfscl_hasexpired+0x709
Dec 21 14:09:50 ppbsd kernel: #8 0xffffffff80add066 at nfsrpc_read+0x316
Dec 21 14:09:50 ppbsd kernel: #9 0xffffffff80aee349 at ncl_readrpc+0x89
Dec 21 14:09:50 ppbsd kernel: #10 0xffffffff80b01443 at ncl_doio+0xe3
Dec 21 14:09:50 ppbsd kernel: #11 0xffffffff80b03b32 at nfssvc_iod+0x232
Dec 21 14:09:50 ppbsd kernel: #12 0xffffffff80bc7dde at fork_exit+0x7e
Dec 21 14:09:50 ppbsd kernel: #13 0xffffffff810629de at fork_trampoline+0xe
Dec 21 14:09:50 ppbsd kernel: Uptime: 2d17h24m52s
Dec 21 14:09:50 ppbsd kernel: Dumping 3243 out of 16190 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%---<<BOOT>>---

Crash log 3:
Dec 22 16:30:19 ppbsd syslogd: kernel boot file is /boot/kernel/kernel
Dec 22 16:30:19 ppbsd kernel:
Dec 22 16:30:19 ppbsd syslogd: last message repeated 1 times
Dec 22 16:30:19 ppbsd kernel: Fatal trap 12: page fault while in kernel mode
Dec 22 16:30:19 ppbsd kernel: cpuid = 2; apic id = 04
Dec 22 16:30:19 ppbsd kernel: fault virtual address = 0x2000
Dec 22 16:30:19 ppbsd kernel: fault code = supervisor read instruction, page not present
Dec 22 16:30:19 ppbsd kernel: instruction pointer = 0x20:0x2000
Dec 22 16:30:19 ppbsd kernel: stack pointer = 0x28:0xfffffe010f9a4998
Dec 22 16:30:19 ppbsd kernel: frame pointer = 0x28:0xfffffe010f9a49d0
Dec 22 16:30:19 ppbsd kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Dec 22 16:30:19 ppbsd kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Dec 22 16:30:19 ppbsd kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Dec 22 16:30:19 ppbsd kernel: current process = 0 (z_wr_int_2)
Dec 22 16:30:19 ppbsd kernel: trap number = 12
Dec 22 16:30:19 ppbsd kernel: panic: page fault
Dec 22 16:30:19 ppbsd kernel: cpuid = 2
Dec 22 16:30:19 ppbsd kernel: time = 1640161595
Dec 22 16:30:19 ppbsd kernel: KDB: stack backtrace:
Dec 22 16:30:19 ppbsd kernel: #0 0xffffffff80c574c5 at kdb_backtrace+0x65
Dec 22 16:30:19 ppbsd kernel: #1 0xffffffff80c09ea1 at vpanic+0x181
Dec 22 16:30:19 ppbsd kernel: #2 0xffffffff80c09d13 at panic+0x43
Dec 22 16:30:19 ppbsd kernel: #3 0xffffffff8108b1b7 at trap_fatal+0x387
Dec 22 16:30:19 ppbsd kernel: #4 0xffffffff8108b20f at trap_pfault+0x4f
Dec 22 16:30:19 ppbsd kernel: #5 0xffffffff8108a86d at trap+0x27d
Dec 22 16:30:19 ppbsd kernel: #6 0xffffffff81061958 at calltrap+0x8
Dec 22 16:30:19 ppbsd kernel: #7 0xffffffff822e2d5c at zio_execute+0x3c
Dec 22 16:30:19 ppbsd kernel: #8 0xffffffff80c6b161 at taskqueue_run_locked+0x181
Dec 22 16:30:19 ppbsd kernel: #9 0xffffffff80c6c47c at taskqueue_thread_loop+0xac
Dec 22 16:30:19 ppbsd kernel: #10 0xffffffff80bc7dde at fork_exit+0x7e
Dec 22 16:30:19 ppbsd kernel: #11 0xffffffff810629de at fork_trampoline+0xe
Dec 22 16:30:19 ppbsd kernel: Uptime: 23h2m59s
Dec 22 16:30:19 ppbsd kernel: Dumping 3138 out of 16190 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%---<<BOOT>>---

-- 
You are receiving this mail because:
You are the assignee for the bug.