Date:      Sun, 6 Sep 2020 18:02:40 +0900
From:      Tomoaki AOKI <junchoon@dec.sakura.ne.jp>
To:        freebsd-current@freebsd.org
Subject:   Re: Fatal trap 18 on boot after OpenZFS import
Message-ID:  <20200906180240.e61a2869b1258f96c3e7d398@dec.sakura.ne.jp>
In-Reply-To: <20200904220301.7fac6b4008f1bc7ad8d803c9@dec.sakura.ne.jp>
References:  <20200904220301.7fac6b4008f1bc7ad8d803c9@dec.sakura.ne.jp>

Filed PR.
Bug 249147 - [ZFS][Panic]Fatal trap 18 on boot after OpenZFS import

 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249147


On Fri, 4 Sep 2020 22:03:01 +0900
Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:

> Hi.
> 
> I'm encountering a boot failure with fatal trap 18,
> happening (maybe) just before init() starts, possibly on
> root remount by the kernel or on zpool import by the rc.d script.
> The last revision tried is r365316 (r364788 is the last one tried
> with a clean rebuild).
> 
> The last healthy revision is r364744, just before the actual switch
> to OpenZFS. amd64 on ThinkPad P52 (Core i7-8750H) w/discrete NVIDIA GPU.
> 
> r364751 with the diffs of r364777 and r364788 applied (to build
> successfully without changes unrelated to OpenZFS) also fails.
> 
> Any suggestions and fixes are appreciated.
> 
> 
> The trap screen is something like below (text attached),
> typed up from a relatively clear photo, so there could be some typos.
> 
> This is shown just after the usual kernel startup output.
> boot1.efi (as EFI/bootx64.efi on ESP) starts /boot/loader.efi
> properly, and loader.efi seems to boot kernel properly.
> 
> As not even the single-user shell selection appears, loader.efi
> is that of r364744. But the loaders work even when I follow the
> irregular procedure below,
> 
>   1)Update src tree
>   2)Clean obj tree
>   3)buildworld
>   4)etcupdate -p
>   5)buildkernel
>   6)installkernel
>   7)shutdown to single user WITHOUT reboot  <- Irregular!
>   8)installworld
>   9)etcupdate
>  10)rebuild src/sys-dependent ports (kmods, nvidia-driver, ...)
>  11)reboot
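
For reference, the eleven steps above map roughly onto the commands printed by the sketch below. It is a dry run that only prints and executes nothing. The paths (/usr/src, /usr/obj, /usr/ports), the Subversion checkout, and the default kernel config are assumptions, not details from the report, and `print_steps` is a hypothetical helper name.

```shell
#!/bin/sh
# Dry-run sketch of the update procedure: prints one line per step and
# runs nothing. Paths and the use of svnlite are assumptions.
SRC=/usr/src

print_steps() {
    cat <<EOF
svnlite update ${SRC}
rm -rf /usr/obj/*
make -C ${SRC} buildworld
etcupdate -p
make -C ${SRC} buildkernel
make -C ${SRC} installkernel
shutdown now
make -C ${SRC} installworld
etcupdate
make -C /usr/ports/x11/nvidia-driver build deinstall install clean
shutdown -r now
EOF
}

print_steps
```

Step 7 is the irregular one: `shutdown now` drops to single user without rebooting, so installworld runs while the old kernel is still the running one.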
> 
> loader.efi appears to do its job, and the panic happens after kernel
> startup ends. Needless to say, rolling back to the r364744 state from
> stable/12 on nvd0 fixes the issue.
> 
> Regards.
> 
> =====
> 
> Fatal trap 18: integer divide fault while in kernel mode
> cpuid = 2; apic id = 02
> instruction pointer     = 0x20:0xffffffff82bfa320
> stack pointer           = 0x28:0xfffffe00e20c6900
> frame pointer           = 0x28:0xfffffe00e20c6960
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 27 (vdev_open)
> trap number             = 18
> panic: integer divide fault
> cpuid = 2
> time = 16
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e20c6610
> vpanic() at vpanic+0x182/frame fffffe00e20c6660
> panic() at panic+0x43/frame fffffe00e20c66c0
> trap_fatal() at trap_fatal+0x387/frame fffffe00e20c6720
> trap() at trap+0x8e/frame fffffe00e20c6830
> calltrap() at calltrap+0x8/frame fffffe00e20c6830
> --- trap 0x12, rip = 0xffffffff82bfa320, rsp = 0xfffffe00e20c6900, rbp = 0xfffffe00e20c6960 ---
> zio_wait() at zio_wait+0x60/frame 0xfffffe00e20c6960
> vdev_open() at vdev_open+0x74d/frame 0xfffffe00e20c69c0
> vdev_open_child() at vdev_open_child+0x1e/frame 0xfffffe00e20c69e0
> taskq_run() at taskq_run+0x1f/frame 0xfffffe00e20c6a00
> taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe00e20c6a80
> taskqueue_thread_loop() at taskqueue_thread_loop+0x118/frame 0xfffffe00e20c6ab0
> fork_exit() at fork_exit+0x7d/frame 0xfffffe00e20c6af0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e20c6af0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 27 tid 100570 ]
> Stopped at      kdb_enter+0x37: movq    $0,0x1091556(%rip)
> db> 
> 
> =====
> 
> Additional info:
>  *Clean build with CPUTYPE removed from the command line and
>   make.conf (so should be equivalent to nocona) didn't help.
> 
>  *Clean build with commenting out WITH_KERNEL_RETPOLINE line
>   and WITH_RETPOLINE line in src.conf didn't help.
> 
>  *Combination of the above two didn't help either (at r364788).
> 
>  *There are two root pools on different physical drives:
>   stable/12 on nvd0 (primary) and head on ada0 (secondary).
> 
>  *GENERIC-NODEBUG based (added options CAM_IOSCHED_DYNAMIC)
>   kernel.
> 
> -- 
> Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>


-- 
Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>


