Date: Sun, 6 Sep 2020 18:02:40 +0900
From: Tomoaki AOKI
To: freebsd-current@freebsd.org
Subject: Re: Fatal trap 18 on boot after OpenZFS import
Message-Id: <20200906180240.e61a2869b1258f96c3e7d398@dec.sakura.ne.jp>
In-Reply-To: <20200904220301.7fac6b4008f1bc7ad8d803c9@dec.sakura.ne.jp>
Reply-To: junchoon@dec.sakura.ne.jp

Filed PR.

Bug 249147 - [ZFS][Panic]Fatal trap 18 on boot after OpenZFS import
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249147

On Fri, 4 Sep 2020 22:03:01 +0900
Tomoaki AOKI wrote:

> Hi.
>
> I'm encountering a boot failure with fatal trap 18, happening
> (maybe) just before init() starts, possibly on the root remount
> by the kernel or on zpool import by the rc.d script.
> The last revision tried is r365316 (r364788 is the last one tried
> with a clean rebuild).
>
> The last healthy revision is r364744, just before the actual switch
> to OpenZFS. amd64 on ThinkPad P52 (Core i7-8750H) w/discrete nvidia GPU.
>
> r364751 with the diffs of r364777 and r364788 applied (to build
> successfully without unrelated-to-OpenZFS changes) fails.
>
> Any suggestions and fixes are appreciated.
>
>
> The trap screen is something like below (text attached),
> typed up from a relatively clear photo, so there could be some typos.
>
> This is shown just after the usual kernel startup output.
> boot1.efi (as EFI/bootx64.efi on ESP) starts /boot/loader.efi
> properly, and loader.efi seems to boot the kernel properly.
>
> As even the single user shell selection doesn't appear, loader.efi
> is of r364744.
> But they work even though I followed an irregular process:
>
>  1)Update src tree
>  2)Clean obj tree
>  3)buildworld
>  4)etcupdate -p
>  5)buildkernel
>  6)installkernel
>  7)shutdown to single user WITHOUT reboot  <- Irregular!
>  8)installworld
>  9)etcupdate
>  10)rebuild src/sys-dependent ports (kmods, nvidia-driver, ...)
>  11)reboot
>
> loader.efi looks to be doing its job, and the panic happens after
> kernel startup ends.
> Needless to say, rolling back to the r364744 state from stable/12 on nvd0
> fixes the issue.
>
> Regards.
>
> =====
>
> Fatal trap 18: integer divide fault while in kernel mode
> cpuid = 2; apic id = 02
> instruction pointer   = 0x20:0xffffffff82bfa320
> stack pointer         = 0x28:0xfffffe00e20c6900
> frame pointer         = 0x28:0xfffffe00e20c6960
> code segment          = base 0x0, limit 0xfffff, type 0x1b
>                       = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags      = interrupt enabled, resume, IOPL = 0
> current process       = 27 (vdev_open)
> trap number           = 18
> panic: integer divide fault
> cpuid = 2
> time = 16
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e20c6610
> vpanic() at vpanic+0x182/frame fffffe00e20c6660
> panic() at panic+0x43/frame fffffe00e20c66c0
> trap_fatal() at trap_fatal+0x387/frame fffffe00e20c6720
> trap() at trap+0x8e/frame fffffe00e20c6830
> calltrap() at calltrap+0x8/frame fffffe00e20c6830
> --- trap 0x12, rip = 0xffffffff82bfa320, rsp = 0xfffffe00e20c6900, rbp = 0xfffffe00e20c6960 ---
> zio_wait() at zio_wait+0x60/frame 0xfffffe00e20c6960
> vdev_open() at vdev_open+0x74d/frame 0xfffffe00e20c69c0
> vdev_open_child() at vdev_open_child+0x1e/frame 0xfffffe00e20c69e0
> taskq_run() at taskq_run+0x1f/frame 0xfffffe00e20c6a00
> taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe00e20c6a80
> taskqueue_thread_loop() at taskqueue_thread_loop+0x118/frame 0xfffffe00e20c6ab0
> fork_exit() at fork_exit+0x7d/frame 0xfffffe00e20c6af0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e20c6af0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 27 tid 100570 ]
> Stopped at kdb_enter+0x37: movq $0,0x1091556(%rip)
> db>
>
> =====
>
> Additional info:
>  *A clean build with CPUTYPE removed from the command line and
>   make.conf (so it should be equivalent to nocona) didn't help.
>
>  *A clean build with the WITH_KERNEL_RETPOLINE line and the
>   WITH_RETPOLINE line commented out in src.conf didn't help.
>
>  *The combination of the above two didn't help either (at r364788).
>
>  *There are two root pools on different physical drives:
>   stable/12 on nvd0 (primary) and head on ada0 (secondary).
>
>  *GENERIC-NODEBUG based kernel (with options CAM_IOSCHED_DYNAMIC
>   added).
>
> --
> Tomoaki AOKI

--
Tomoaki AOKI
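
As a reading aid for the backtrace quoted above, here is a minimal Python sketch that pulls the trap number and the first frame below the trap marker out of KDB output. It is not part of the PR. The helper names (`parse_frames`, `faulting_frame`) and both regular expressions are hypothetical, and they assume frames keep the `name() at name+0xOFF/frame ADDR` shape seen in the hand-typed trace, with lowercase hex.

```python
import re

# Assumed frame shape: "func() at func+0xOFF/frame ADDR" (ADDR may lack "0x",
# as in the hand-typed trace). The backreference requires both names to match.
FRAME_RE = re.compile(
    r"(?P<func>\w+)\(\) at (?P=func)\+(?P<off>0x[0-9a-f]+)/frame\s+(?:0x)?[0-9a-f]+"
)
# Assumed trap marker shape: "--- trap 0x12, rip = ..." or "--- trap 0, ...".
TRAP_RE = re.compile(r"--- trap (0x[0-9a-f]+|\d+),")

# Excerpt copied from the trace in this thread.
SAMPLE = """\
calltrap() at calltrap+0x8/frame fffffe00e20c6830
--- trap 0x12, rip = 0xffffffff82bfa320, rsp = 0xfffffe00e20c6900, rbp = 0xfffffe00e20c6960 ---
zio_wait() at zio_wait+0x60/frame 0xfffffe00e20c6960
vdev_open() at vdev_open+0x74d/frame 0xfffffe00e20c69c0
vdev_open_child() at vdev_open_child+0x1e/frame 0xfffffe00e20c69e0
"""


def parse_frames(text):
    """Return (function, offset) pairs for every frame found in text."""
    return [(m["func"], int(m["off"], 16)) for m in FRAME_RE.finditer(text)]


def faulting_frame(text):
    """Return (trap_number, function, offset) for the first frame after
    the first trap marker, or None if either piece is missing."""
    trap = TRAP_RE.search(text)
    if trap is None:
        return None
    frame = FRAME_RE.search(text, trap.end())
    if frame is None:
        return None
    # Base 0 lets int() accept both "0x12" and plain decimal.
    return int(trap.group(1), 0), frame["func"], int(frame["off"], 16)


if __name__ == "__main__":
    print(faulting_frame(SAMPLE))  # (18, 'zio_wait', 96)
```

Run on the excerpt, this reports trap 0x12 (decimal 18, matching "trap number = 18") with zio_wait+0x60 as the frame directly below the trap marker, called from vdev_open.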