Date: Mon, 16 Nov 2015 09:22:57 -0600
From: Will Senn <will.senn@gmail.com>
To: freebsd-questions@freebsd.org
Subject: Kernel panic and hard disk failure
Message-ID: <5649F4D1.8070105@gmail.com>

Hi,

Disclosure: I am a FreeBSD newbie coming from Mac OS X and Linux, even some Windows..., please be gentle. My questions are listed at the bottom; here is the background.

I have a Dell OptiPlex 755 configured as follows:

  Two SATA disks - 240GB SSD + 750GB HDD
  8 GB RAM
  Quad core Intel 2.83GHz CPU
  FreeBSD 10.2-RELEASE

I came into my home office yesterday and the console was displaying a disk error and the system was prompting for a shell. I entered the shell, and a core dump was generated and saved in /var/crash. Since this was my first experience with such an event, I just merrily went about my day after a reboot. It happened again later in the day. I figured it was a bad hard drive, replaced it with a spare, and restored from an rsync backup.

After thinking a bit more about the situation, I decided to look at the crash directory to see if there was anything to be learned there. Apparently, there is quite a bit for me to learn yet :).

In /var/crash, there were 12 files and two symlinks:

  bounds
  core.txt.0
  core.txt.1
  core.txt.2
  info.0
  info.1
  info.2
  info.last
  minfree
  vmcore.0
  vmcore.1
  vmcore.2
  vmcore.last

Three dumps? Hmm... I ran file on the files to see if any were ASCII, and sure enough, bounds, core.txt.X, info.X, and minfree were.

  bounds contained the single number 3.
  minfree contained the number 2048.
  info.X contained basic crash dump information. The first had
    Panic String: page fault
  and the other two had
    Panic String: softdep_deallocate_dependencies: dangling deps
  core.txt.X files look like the output of a lot of different system tools, run and concatenated together.

Next, I looked at the vmcore.X files using kgdb /boot/kernel/kernel /var/crash/vmcore.X. This produced yet more information (overload? not yet, but getting there):

--- the first crash, snip
Unread portion of the kernel message buffer:
<118>Oct 28 19:33:05 freebird syslogd: exiting on signal 15

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80906fa9
stack pointer           = 0x28:0xfffffe0231eb8830
frame pointer           = 0x28:0xfffffe0231eb8a20
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 961 (kwin)
trap number             = 12
panic: page fault
cpuid = 2

--- the second and third crashes, snip
Unread portion of the kernel message buffer:
Device ada1p1 went missing before all of the data could be written to it; expect data loss.
panic: softdep_deallocate_dependencies: dangling deps
cpuid = 0

I didn't know what signal 15 was, so I ran kill -l and figured out it was SIGTERM. I suspect the reason I didn't know about the first crash is that I probably killed or reset the machine during a reboot, or something similar.

Out of this exercise, I have the following questions that I hope someone can help with:

1. Is bounds the number of crashes in /var/crash, or what?
2. What is minfree?
3. What does it mean that the device went missing?
4. Does the information above sound like a faulty hard drive, or are there additional tests that will tell me more about the failure?

The device in question is the 750GB HDD. It is formatted UFS and is the target of rsync jobs running on another FBSD machine and a Mac through rsyncd. I replaced it out of due caution, but haven't thrown away the drive yet.

Thanks,

Will