From owner-freebsd-arm@FreeBSD.ORG  Wed Oct 24 10:38:29 2012
Date: Wed, 24 Oct 2012 23:38:12 +1300
From: Andrew Turner <andrew@fubar.geek.nz>
Subject: Re: Trashed registers returning from kernel?
In-reply-to: <2B1CF099-50F0-46BE-8B02-61309DF93D5F@freebsd.org>
To: Tim Kientzle <kientzle@freebsd.org>
Message-id: <20121024233812.0eefd07f@fubar.geek.nz>
MIME-version: 1.0
X-Mailer: Claws Mail 3.8.0 (GTK+ 2.24.6; i386-portbld-freebsd8.1)
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7bit
X-Pirate: Arrrr
References: <2B1CF099-50F0-46BE-8B02-61309DF93D5F@freebsd.org>
Cc: arm@freebsd.org
List-Id: Porting FreeBSD to the StrongARM Processor <freebsd-arm.freebsd.org>

On Sun, 21 Oct 2012 18:40:08 -0700
Tim Kientzle <kientzle@freebsd.org> wrote:

> On the BeagleBone, I'm seeing a similar crash in several different
> user land programs.  I suspect it's a kernel bug.
> 
> Symptom: program is killed with SIGSEGV.  Most of the registers
> contain values above 0xc0000000 (pointing into kernel space).
> 
> Theory:
>  * Registers are not always getting correctly restored on a
> kernel->user transition.
>  * SEGV is a consequence.
> 
> I can reproduce it semi-consistently by running "emacs existing-file"
> just after a reboot.  (But I'm pretty sure this is the same symptoms
> I've seen with several other programs, so I don't think it's a bug in
> emacs.)
> 
> Has anyone else seen this on an armv6 system?
> 
> Does anyone have suggestions for how to go about debugging this?
> 
> Suggestions appreciated.

Can you find out whether the crash happens after one particular syscall
or after many different syscalls? How consistent are the register
values and the instruction that causes the SEGV? Have you identified
any other programs that have the same issue?
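
To compare register dumps from several crashes, a small helper along
these lines might be useful. It is only a sketch: it assumes the dump
has already been collected into an array of 32-bit values, and it
takes 0xc0000000 as the start of kernel space because that is the
boundary quoted in the report. The names KERNEL_BASE_ASSUMED and
count_kernel_values() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: kernel space starts here, as quoted in the report. */
#define KERNEL_BASE_ASSUMED UINT32_C(0xc0000000)

/*
 * Count how many values in a saved register dump point into
 * kernel space. Hypothetical helper, not part of any FreeBSD API.
 */
static size_t
count_kernel_values(const uint32_t *regs, size_t nregs)
{
    size_t i, n = 0;

    assert(regs != NULL || nregs == 0);
    for (i = 0; i < nregs; i++) {
        if (regs[i] >= KERNEL_BASE_ASSUMED)
            n++;
    }
    return (n);
}
```

Running this over each dump shows whether the same registers are
affected every time or whether the pattern varies between crashes.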

The code that saves the registers across system calls is in
sys/arm/arm/exception.S and sys/arm/include/asmacros.h.

In exception.S there is the function swi_entry. It:
 - Saves the registers to the stack.
 - Stores sp in r0 to be passed in as the argument to swi_handler()
 - Stores sp in r6 to allow us to restore it later
 - Aligns the stack
 - Calls swi_handler() to perform the system call
 - Restores the stack pointer from r6
 - Handles any pending asynchronous system trap (calls ast() if required)
 - Restores the registers from the stack
 - Returns to userland
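
The reason sp is kept in r6 can be seen in a small C model of the
steps above. This is a sketch and not the code in exception.S: the
struct and function names are made up, and the 8-byte alignment mask
is an assumption about what the bic instruction clears.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the three registers involved. */
struct cpu_model {
    uint32_t sp;
    uint32_t r0;
    uint32_t r6;
};

/* Model of the bic step; the mask of 7 is an assumption. */
static uint32_t
align_sp(uint32_t sp)
{
    return (sp & ~UINT32_C(7));
}

/*
 * Walk through the sp handling described in the list. Returns the
 * value sp would hold while swi_handler() runs.
 */
static uint32_t
model_swi_entry(struct cpu_model *cpu)
{
    uint32_t sp_during_call;

    assert(cpu != NULL);
    cpu->r0 = cpu->sp;            /* trap frame address, the argument */
    cpu->r6 = cpu->sp;            /* remember the unaligned sp */
    cpu->sp = align_sp(cpu->sp);  /* alignment discards the low bits */
    sp_during_call = cpu->sp;     /* swi_handler() would run here */
    cpu->sp = cpu->r6;            /* restore from the saved copy */
    return (sp_during_call);
}
```

Aligning throws away the low bits of sp, so the original value cannot
be recomputed afterwards. If r6 or the saved frame is damaged while
the handler runs, the restore step reads back the wrong data.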

Assuming it is a syscall causing this, I can think of 3 possible causes:
1. Someone is clobbering the stack.
2. Someone is clobbering the trap frame.
3. There is a cache issue causing old data to be written to the stack.

Checking 1 should be easy. In exception.S add the instruction "sub sp,
sp, #32" before the bic instruction. This will add padding to the
stack. You may need to change the #32 if it is not large enough. This
won't help if the issue is in ast().
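
The idea behind the padding test can be modelled in plain C. This is a
toy model under stated assumptions, not kernel code: the stack is a
byte array, the frame size is an arbitrary illustrative value, and the
clobber is a stray write just above where the callee's stack starts.
All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_STACK_SIZE 256
#define MODEL_FRAME_SIZE 64   /* illustrative, not the real frame size */

/*
 * Return 1 if the saved frame is intact after a callee writes
 * `overrun` bytes upward from the top of its own stack area, with
 * `pad` bytes of padding between that area and the frame.
 */
static int
frame_survives(size_t pad, size_t overrun)
{
    uint8_t stack[MODEL_STACK_SIZE];
    size_t frame_off = MODEL_STACK_SIZE - MODEL_FRAME_SIZE;
    size_t callee_top, i;

    assert(pad <= frame_off);
    assert(overrun <= pad + MODEL_FRAME_SIZE);

    memset(stack, 0, sizeof(stack));
    memset(stack + frame_off, 0xA5, MODEL_FRAME_SIZE); /* saved regs */

    callee_top = frame_off - pad;   /* effect of "sub sp, sp, #pad" */
    memset(stack + callee_top, 0xFF, overrun);   /* the stray write */

    for (i = 0; i < MODEL_FRAME_SIZE; i++) {
        if (stack[frame_off + i] != 0xA5)
            return (0);
    }
    return (1);
}
```

In this model frame_survives(0, 8) returns 0 and frame_survives(32, 8)
returns 1. If the crashes stop once the padding is in place, that
points at cause 1; if they continue, either the padding is too small
or the problem is elsewhere.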

Andrew