Date:      Sun, 01 Nov 2020 10:04:40 +0000
From:      bugzilla-noreply@freebsd.org
To:        virtualization@FreeBSD.org
Subject:   [Bug 250770] AWS EC2 system freezes up possibly associated with NFS (EFS)
Message-ID:  <bug-250770-27103-2WLx5yuavY@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-250770-27103@https.bugs.freebsd.org/bugzilla/>
References:  <bug-250770-27103@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=250770

--- Comment #1 from Gunther Schadow <raj@gusw.net> ---
Note: this is not the ENA kernel panic, nor the other AWS EC2 freeze on t3
bug (apparently fixed in 12.1-RELENG). The presence or absence of the
"intsmb0: Could not allocate I/O space" error during boot (open since 2018)
makes no difference either. So this is not a duplicate of those bugs.

sysctl -a |...
kern.features.nfsd: 1
kern.features.nfscl: 1
vfs.nfs.downdelayinitial: 12
vfs.nfs.downdelayinterval: 30
vfs.nfs.defect: 0
vfs.nfs.iodmax: 20
vfs.nfs.iodmin: 0
vfs.nfs.iodmaxidle: 120
vfs.nfs.use_buf_pager: 1
vfs.nfs.fileid_maxwarnings: 10
vfs.nfs.diskless_rootpath:
vfs.nfs.diskless_valid: 0
vfs.nfs.nfs_ip_paranoia: 1
vfs.nfs.nfs_directio_allow_mmap: 1
vfs.nfs.nfs_keep_dirty_on_error: 0
vfs.nfs.nfs_directio_enable: 0
vfs.nfs.clean_pages_on_close: 1
vfs.nfs.commit_on_close: 0
vfs.nfs.prime_access_cache: 0
vfs.nfs.access_cache_timeout: 60
vfs.nfs.dssameconn: 0
vfs.nfs.ignore_eexist: 0
vfs.nfs.pnfsiothreads: -1
vfs.nfs.userhashsize: 100
vfs.nfs.debuglevel: 0
vfs.nfs.callback_addr:
vfs.nfs.realign_count: 0
vfs.nfs.realign_test: 0
vfs.nfs.pnfsmirror: 1
vfs.nfs.enable_uidtostring: 0
vfs.nfs.dsretries: 2
vfs.nfs.skip_wcc_data_onerr: 1
vfs.nfs.nfs3_jukebox_delay: 10
vfs.nfs.reconnects: 0
vfs.nfs.bufpackets: 4

I am searching for ways to do some debug logging. Should I enable these?

debug.fail_point.status_nfscl_force_fileid_warning: off
debug.fail_point.nfscl_force_fileid_warning: off

How might I log NFS warnings? The current setup is the unchanged,
out-of-the-box /etc/syslog.conf:

# $FreeBSD: releng/12.2/usr.sbin/syslogd/syslog.conf 338146 2018-08-21 17:01:47Z brd $
#
#       Spaces ARE valid field separators in this file. However,
#       other *nix-like systems still insist on using tabs as field
#       separators. If you are sharing this file between systems, you
#       may want to use only tabs as field separators here.
#       Consult the syslog.conf(5) manpage.
*.err;kern.warning;auth.notice;mail.crit                /dev/console
*.notice;authpriv.none;kern.debug;lpr.info;mail.crit;news.err   /var/log/messages
security.*                                      /var/log/security
auth.info;authpriv.info                         /var/log/auth.log
mail.info                                       /var/log/maillog
cron.*                                          /var/log/cron
!-devd
*.=debug                                        /var/log/debug.log
*.emerg                                         *
# uncomment this to log all writes to /dev/console to /var/log/console.log
# touch /var/log/console.log and chmod it to mode 600 before it will work
#console.info                                   /var/log/console.log
# uncomment this to enable logging of all log messages to /var/log/all.log
# touch /var/log/all.log and chmod it to mode 600 before it will work
#*.*                                            /var/log/all.log
# uncomment this to enable logging to a remote loghost named loghost
#*.*                                            @loghost
# uncomment these if you're running inn
# news.crit                                     /var/log/news/news.crit
# news.err                                      /var/log/news/news.err
# news.notice                                   /var/log/news/news.notice
# Uncomment this if you wish to see messages produced by devd
# !devd
# *.>=notice                                    /var/log/devd.log
!*
include                                         /etc/syslog.d
include                                         /usr/local/etc/syslog.d
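
As a rough check of where a kernel-level NFS warning would end up with the file above, here is a minimal sketch of syslog.conf(5) selector matching. It is a simplified model, not syslogd's actual code: it only handles bare levels (meaning "this level and more severe"), "=level", "*" and "none", it ignores the "!program" blocks and the include lines, and it assumes NFS client warnings reach syslogd under the kern facility, which nothing in this report confirms. The names RULES, selector_matches and destinations are made up for illustration.

```python
# Severity names from syslog(3); a lower number is more severe.
SEVERITY = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
            "warning": 4, "notice": 5, "info": 6, "debug": 7}

# Active (uncommented) rules copied from the syslog.conf quoted above.
RULES = [
    ("*.err;kern.warning;auth.notice;mail.crit", "/dev/console"),
    ("*.notice;authpriv.none;kern.debug;lpr.info;mail.crit;news.err",
     "/var/log/messages"),
    ("security.*", "/var/log/security"),
    ("auth.info;authpriv.info", "/var/log/auth.log"),
    ("mail.info", "/var/log/maillog"),
    ("cron.*", "/var/log/cron"),
    ("*.=debug", "/var/log/debug.log"),
    ("*.emerg", "*"),
]


def selector_matches(selector, facility, level):
    """Simplified model: later fields override earlier ones per facility."""
    rule = None
    for field in selector.split(";"):
        facilities, _, wanted = field.strip().partition(".")
        names = facilities.split(",")
        if "*" not in names and facility not in names:
            continue  # this field says nothing about our facility
        if wanted == "none":
            rule = None
        elif wanted == "*":
            rule = (">=", "debug")  # every level
        elif wanted.startswith("="):
            rule = ("=", wanted[1:])  # exactly this level
        else:
            rule = (">=", wanted)  # this level and more severe
    if rule is None:
        return False
    op, threshold = rule
    if op == "=":
        return SEVERITY[level] == SEVERITY[threshold]
    return SEVERITY[level] <= SEVERITY[threshold]


def destinations(facility, level):
    """List the actions whose selector accepts a facility.level message."""
    return [action for selector, action in RULES
            if selector_matches(selector, facility, level)]


if __name__ == "__main__":
    print(destinations("kern", "warning"))
    print(destinations("kern", "debug"))
```

Under those assumptions a kern.warning message is accepted by both the /dev/console and the /var/log/messages rules, so the stock file should already record kernel warnings without further changes.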

In /var/log/messages you see the reboot without any issues reported:

Nov  1 04:00:13 freebsd ec2[777]:
#############################################################
...
Nov  1 05:49:13 freebsd su[1602]: ec2-user to root on /dev/pts/5
Nov  1 08:56:09 freebsd syslogd: exiting on signal 15
Nov  1 08:59:01 freebsd syslogd: kernel boot file is /boot/kernel/kernel
Nov  1 08:59:01 freebsd kernel: ---<<BOOT>>---

I am not sure why syslogd is exiting on signal 15. That might have to do with
the lock-up, but more likely it has to do with my force-stop action on the AWS
console. Here is the earlier lock-up sequence:

Oct 31 23:58:12 freebsd su[888]: ec2-user to root on /dev/pts/2
Oct 31 23:58:23 freebsd su[891]: ec2-user to root on /dev/pts/1
Oct 31 23:58:43 freebsd fsck[861]: /dev/gpt/varfs: 550 files, 1650 used, 125234 free (106 frags, 15641 blocks, 0.1% fragmentation)
Nov  1 03:48:09 freebsd syslogd: exiting on signal 15
Nov  1 03:50:35 freebsd syslogd: kernel boot file is /boot/kernel/kernel
Nov  1 03:50:35 freebsd kernel: ---<<BOOT>>---

That is about 4 hours, as I estimated. The "syslogd: exiting on signal 15"
message is so close to the next boot that I don't think it is a herald of the
lock-up; rather, it happens when I do the force-stop on the AWS EC2 management
dashboard. And this would prove that we are not losing log messages due to the
lock-up. If the system had anything to say, it would have said it!
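
The timing argument above can be checked with a short sketch that computes, for both incidents, how long the log stayed silent before "exiting on signal 15" and how long it then took until the next boot line. The timestamps are copied from the excerpts above. The year 2020 is an assumption taken from the mail's Date header, since syslog timestamps omit it, and the sketch also assumes all timestamps share one fixed offset such as UTC. For the second incident the "last entry" is only the last one shown in the excerpt. The names parse_stamp, gaps and INCIDENTS are made up for illustration.

```python
from datetime import datetime, timedelta

YEAR = 2020  # assumption: not present in the log lines themselves
MONTHS = {"Oct": 10, "Nov": 11}  # only the months that occur in the excerpts


def parse_stamp(stamp):
    """Parse a syslog-style 'Mon DD HH:MM:SS' timestamp."""
    month, day, clock = stamp.split()  # split() tolerates the padded day
    hour, minute, second = (int(part) for part in clock.split(":"))
    return datetime(YEAR, MONTHS[month], int(day), hour, minute, second)


def gaps(last_entry, syslogd_exit, boot):
    """Return (silence before syslogd exit, time from exit to boot)."""
    silent = parse_stamp(syslogd_exit) - parse_stamp(last_entry)
    restart = parse_stamp(boot) - parse_stamp(syslogd_exit)
    return silent, restart


# (last entry shown, "exiting on signal 15", "kernel boot file is ...")
INCIDENTS = [
    ("Oct 31 23:58:43", "Nov  1 03:48:09", "Nov  1 03:50:35"),
    ("Nov  1 05:49:13", "Nov  1 08:56:09", "Nov  1 08:59:01"),
]

if __name__ == "__main__":
    for incident in INCIDENTS:
        silent, restart = gaps(*incident)
        print(f"silent for {silent}, then {restart} from syslogd exit to boot")
```

With these inputs the first incident shows 3:49:26 of silence followed by 0:02:26 between the syslogd exit and the boot line, and the second shows 3:06:56 followed by 0:02:52. Gaps of under three minutes fit the reading that signal 15 comes from the force-stop rather than from the lock-up itself.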

-- 
You are receiving this mail because:
You are the assignee for the bug.