Message-ID: <52BE07FC.8020104@pix.net>
Date: Fri, 27 Dec 2013 18:06:36 -0500
From: Kurt Lidl <lidl@pix.net>
To: Marius Strobl <marius@alchemy.franken.de>
Subject: Re: panic on sparc64 running 10-beta4
References: <529F51DA.1040703@pix.net>
 <20131208135023.GA75625@alchemy.franken.de>
 <20131227184234.GA1597@alchemy.franken.de>
In-Reply-To: <20131227184234.GA1597@alchemy.franken.de>
Cc: FreeBSD-Current <freebsd-current@freebsd.org>, sparc64@freebsd.org

On 12/27/13 1:42 PM, Marius Strobl wrote:
> On Sun, Dec 08, 2013 at 02:50:23PM +0100, Marius Strobl wrote:
>> On Wed, Dec 04, 2013 at 11:01:30AM -0500, Kurt Lidl wrote:
>>> I installed a sparc V120 (4GB memory, dual 72GB disks) with the 10-beta4
>>> install image today.
>>>
>>> Installation went fine.  I rebooted the machine, and then went to get
>>> a fresh ports tree, and the machine panicked:
>>>
>>> root@host:/usr/ports # portsnap fetch
>>> Looking up portsnap.FreeBSD.org mirrors... 7 mirrors found.
>>> Fetching public key from your-org.portsnap.freebsd.org... done.
>>> Fetching snapshot tag from your-org.portsnap.freebsd.org... done.
>>> Fetching snapshot metadata... done.
>>> Fetching snapshot generated at Tue Dec  3 19:06:18 EST 2013:
>>> 43b6803c6d94efd5b2e2bc9df0b66a84b75417fa3c1728100% of   69 MB 3225 kBps 00m22s
>>> Extracting snapshot... done.
>>> Verifying snapshot integrity... panic: trap: illegal instruction (kernel)
>>> cpuid = 0
>>> KDB: stack backtrace:
>>> #0 0xc08836d4 at trap+0x554
>>> Uptime: 6m59s
>>> Dumping 4096 MB (4 chunks)
>>>     chunk at 0: 1073741824 bytes ... ok
>>>     chunk at 0x40000000: 1073741824 bytes ... ok
>>>     chunk at 0x80000000: 1073741824 bytes ... ok
>>>     chunk at 0xc0000000: 1073741824 bytes ... ok
>>>
>>> Dump complete
>>> Automatic reboot in 15 seconds - press a key on the console to abort
>>> Rebooting...
>>>
>>> And then it panicked again when attempting to run 'savecore'!
>>> (I typed a <ctrl-t> after it printed out the line about
>>> writing the core file; that's where the "load: 0.72 ..." line
>>> came from...)
>>
>> Hrm, I don't seem to be able to reproduce this with an installation
>> built from sources and also can't remember a commit between BETA3 and
>> BETA4 which should be able to cause this. I currently can't test the
>> 10-BETA4 install image, though. Was the machine in question running
>> FreeBSD before, i.e., is it known good hardware? Did savecore eventually
>> succeed on writing out a dump?

Yes, this machine was running 9-STABLE without problems before this.
Yes, I did eventually get savecore to complete successfully.  The
trick seems to be to avoid pressing ctrl-t to check on the
machine's progress.

I also installed the RC1 build and deliberately refrained from
pressing ctrl-t during the installation and unpacking of the OS,
and again while running "portsnap fetch && portsnap extract".

I suspect that pressing ctrl-t corrupts some kernel state, which
then causes the panic shortly afterward.

>
> FYI, I tried again with a machine installed from the 10.0-RC3 binary
> image and couldn't reproduce that problem either.

I just tried again with a freshly downloaded and burned RC3 image,
and was able to make it panic while verifying the snapshot.  My
comments in the console log are in [square brackets].

root@dna:~ # portsnap fetch
Looking up portsnap.FreeBSD.org mirrors... 7 mirrors found.
Fetching public key from your-org.portsnap.freebsd.org... done.
Fetching snapshot tag from your-org.portsnap.freebsd.org... done.
Fetching snapshot metadata... done.
Fetching snapshot generated at Thu Dec 26 19:11:40 EST 2013:

[ I pressed ctrl-t several times during the fetch with no problem ]

Extracting snapshot...
[ctrl-t]
load: 0.55  cmd: bsdtar 1355 [runnable] 6.33r 1.39u 3.78s 37% 5384k
In: 11851934 bytes, compression 23%;  Out: 5320 files, 15471104 bytes
Current: snap/3d543fc157d97d1617eeb20832bf2cb37d04aeb2bf068bd0a07533e5b67c02fe.gz (1152 bytes)
[ctrl-t]
load: 0.83  cmd: bsdtar 1355 [runnable] 11.43r 2.36u 6.55s 51% 5384k
In: 19288110 bytes, compression 24%;  Out: 9299 files, 25624576 bytes
Current: snap/1856dcdc8799dd2b5a19d2d4720452bc77b4084088dd9ac5bd190da5ac211c4b.gz (101014 bytes)
done.
Verifying snapshot integrity...
[ a bunch of rapid ctrl-t keystrokes ]
load: 2.23  cmd: sha256 1370 [runnable] 0.49r 0.32u 0.00s 3% 2064k
load: 2.21  cmd: sh 1539 [runnable] 0.04r 0.00u 0.00s 2% 0k
load: 1.93  cmd: sha256 5705 [runnable] 0.02r 0.00u 0.00s 15% 1880k
load: 1.93  cmd: sh 5715 [runnable] 0.03r 0.00u 0.00s 15% 3136k
load: 1.93  cmd: gunzip 5728 [runnable] 0.01r 0.00u 0.00s 16% 1200k
load: 1.93  cmd: gunzip 5737 [runnable] 0.02r 0.00u 0.01s 16% 2144k
load: 1.93  cmd: sh 5749 [runnable] 0.00r 0.00u 0.00s 16% 3136k
load: 1.93  cmd: sh 1391 [runnable] 68.71r 0.58u 5.18s 15% 3136k
panic: trap: fast data access mmu miss (kernel)
cpuid = 0
KDB: stack backtrace:
#0 0xc0883954 at trap+0x554
Uptime: 1h1m23s
Dumping 4096 MB (4 chunks)
   chunk at 0: 1073741824 bytes ... ok
   chunk at 0x40000000: 1073741824 bytes ... ok
   chunk at 0x80000000: 1073741824 bytes ... ok
   chunk at 0xc0000000: 1073741824 bytes ... ok

Dump complete

Here's the backtrace from 'core.txt.0', the crash summary generated
from the recovered crash dump:

Unread portion of the kernel message buffer:
panic: trap: fast data access mmu miss (kernel)
cpuid = 0
KDB: stack backtrace:
#0 0xc0883954 at trap+0x554
Uptime: 1h1m23s
Dumping 4096 MB (4 chunks)
   chunk at 0: 1073741824 bytes

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/geom_mirror.ko.symbols...done.
Loaded symbols for /boot/kernel/geom_mirror.ko.symbols
#0  0x00000000c052f57c in doadump (textdump=<value optimized out>)
     at /usr/src/sys/kern/kern_shutdown.c:258
258             savectx(&dumppcb);
(kgdb) #0  0x00000000c052f57c in doadump (textdump=<value optimized out>)
     at /usr/src/sys/kern/kern_shutdown.c:258
#1  0x00000000c052ff70 in kern_reboot (howto=260)
     at /usr/src/sys/kern/kern_shutdown.c:447
#2  0x00000000c0530338 in panic (fmt=0xc0af4828 "trap: %s (kernel)")
     at /usr/src/sys/kern/kern_shutdown.c:754
#3  0x00000000c088395c in trap (tf=0xc1665040)
     at /usr/src/sys/sparc64/sparc64/trap.c:410
#4  0x00000000c00a1060 in tl1_trap ()
#5  0x00000000c051b3e8 in __mtx_lock_sleep (c=0xfffff800fca631e0,
     tid=18446735278028046848, opts=-56217240, file=0x0, line=0)
     at /usr/src/sys/kern/kern_mutex.c:432
#6  0x00000000c08108e8 in vm_page_insert_after (m=0xc0c58a98,
     object=0xfffff80002c73240, pindex=0, mpred=0x0)
     at /usr/src/sys/vm/vm_page.c:998
#7  0x00000000c080f780 in vm_page_dequeue (m=0xfffff800f981b368)
     at /usr/src/sys/vm/vm_page.c:2045
#8  0x00000000c07fcd80 in vm_fault_hold (map=0xfffff8000228ea00,
     vaddr=1083088896, fault_type=2 '\002', fault_flags=0, m_hold=0x0)
     at vm_page.h:644
#9  0x00000000c07feb90 in vm_fault (map=0xfffff8000228ea00,
     vaddr=1083088896, fault_type=2 '\002', fault_flags=0)
     at /usr/src/sys/vm/vm_fault.c:224
#10 0x00000000c0882ffc in trap_pfault (td=<value optimized out>,
     tf=0xc1665880) at /usr/src/sys/sparc64/sparc64/trap.c:501
#11 0x00000000c0883498 in trap (tf=0xc1665880)
     at /usr/src/sys/sparc64/sparc64/trap.c:289
#12 0x00000000c00a0e40 in tl0_intr ()
#13 0x0000000000000000 in ?? ()
(kgdb)

-Kurt