Date: Wed, 24 Oct 2012 23:38:12 +1300
From: Andrew Turner
To: Tim Kientzle
Cc: arm@freebsd.org
Subject: Re: Trashed registers returning from kernel?

On Sun, 21 Oct 2012 18:40:08 -0700
Tim Kientzle wrote:

> On the BeagleBone, I'm seeing a similar crash in several different
> user land programs.  I suspect it's a kernel bug.
>
> Symptom:  program is killed with SIGSEGV.  Most of the registers
> contain values above 0xc0000000 (pointing into kernel space).
>
> Theory:
>  * Registers are not always getting correctly restored on a
>    kernel->user transition.
>  * SEGV is a consequence.
>
> I can reproduce it semi-consistently by running "emacs existing-file"
> just after a reboot.  (But I'm pretty sure this is the same symptoms
> I've seen with several other programs, so I don't think it's a bug in
> emacs.)
>
> Has anyone else seen this on an armv6 system?
>
> Does anyone have suggestions for how to go about debugging this?
>
> Suggestions appreciated.

Can you find out whether the crash happens after a single syscall or
after many different syscalls? How consistent are the register values
and the instruction that causes the SEGV? Have you identified any other
programs that have the same issue?

The relevant code to save the registers for system calls is in
sys/arm/arm/exception.S and sys/arm/include/asmacros.h.

In exception.S there is the function swi_entry. It:
 - Saves the registers to the stack.
 - Stores sp in r0 to be passed in as the argument to swi_handler()
 - Stores sp in r6 to allow us to restore it later
 - Aligns the stack
 - Calls swi_handler() to perform the system call
 - Restores the stack pointer from r6
 - Performs any asynchronous software trap (calls ast() if required)
 - Restores the registers from the stack
 - Returns to userland

Assuming it is a syscall causing this, I can think of 3 possible causes:
 1. Someone is clobbering the stack.
 2. Someone is clobbering the trap frame.
 3. There is a cache issue causing old data to be written to the stack.

Checking 1 should be easy. In exception.S add the instruction
"sub sp, sp, #32" before the bic instruction. This will add padding to
the stack. You may need to change the #32 if it is not large enough.
This won't help if the issue is in ast().

Andrew
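
To judge how consistent the register values are from one crash to the
next, it may help to count how many registers in each dump point into
kernel space. Below is a rough sketch of that check in C. It assumes the
kernel is mapped at 0xc0000000 and above, as in the report. The names
KERNEL_BASE, points_into_kernel() and count_kernel_values() are made up
for illustration and are not taken from the FreeBSD tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: 32-bit ARM with kernel addresses starting at 0xc0000000,
 * as described in the crash report.
 */
#define KERNEL_BASE 0xc0000000u

/* Hypothetical helper: nonzero if a register value points into kernel space. */
static int
points_into_kernel(uint32_t value)
{
	return (value >= KERNEL_BASE);
}

/*
 * Hypothetical helper: count how many of the n saved register values
 * look like kernel addresses.
 */
static size_t
count_kernel_values(const uint32_t *regs, size_t n)
{
	size_t i, hits = 0;

	assert(regs != NULL);
	for (i = 0; i < n; i++) {
		if (points_into_kernel(regs[i]))
			hits++;
	}
	return (hits);
}
```

Feeding it the register values from each core dump and comparing the
counts, and which registers are affected, should show whether the same
registers are trashed every time or whether it varies between crashes.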