Date:      Tue, 28 Feb 2012 17:22:37 +0100
From:      Bengt Ahlgren <bengta@sics.se>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        emulation@freebsd.org
Subject:   Re: [Bengt Ahlgren] 8.3-PRERELEASE panic in linux emulation
Message-ID:  <uh7fwdvlygi.fsf@P142.sics.se>
In-Reply-To: <uh7k438ba9r.fsf@P142.sics.se> (Bengt Ahlgren's message of "Mon,  27 Feb 2012 15:50:40 +0100")
References:  <uh762ewxvpv.fsf@P142.sics.se> <20120224142940.GR55074@deviant.kiev.zoral.com.ua> <uh739a0w39k.fsf@P142.sics.se> <uh7fwe054u4.fsf@P142.sics.se> <20120225011228.GB55074@deviant.kiev.zoral.com.ua> <uh762es5tlz.fsf@P142.sics.se> <20120227143143.GE55074@deviant.kiev.zoral.com.ua> <uh7obskbayd.fsf@P142.sics.se> <20120227143958.GG55074@deviant.kiev.zoral.com.ua> <uh7k438ba9r.fsf@P142.sics.se>

Bengt Ahlgren <bengta@sics.se> writes:

> Konstantin Belousov <kostikbel@gmail.com> writes:
>
>> On Mon, Feb 27, 2012 at 03:35:54PM +0100, Bengt Ahlgren wrote:
>>> Konstantin Belousov <kostikbel@gmail.com> writes:
>>> 
>>> > On Mon, Feb 27, 2012 at 01:49:28PM +0100, Bengt Ahlgren wrote:
>>> >> Konstantin Belousov <kostikbel@gmail.com> writes:
>>> >> 
>>> >> > On Fri, Feb 24, 2012 at 09:55:31PM +0100, Bengt Ahlgren wrote:
>>> >> >> Bengt Ahlgren <bengta@sics.se> writes:
>>> >> >> 
>>> >> >> > Konstantin Belousov <kostikbel@gmail.com> writes:
>>> >> >> >
>>> >> >> >> On Fri, Feb 24, 2012 at 01:27:24PM +0100, Bengt Ahlgren wrote:
>>> >> >> >>> Hello!
>>> >> >> >>> 
>>> >> >> >>> Perhaps emulation@ is a better place to report this problem?
>>> >> >> >>> 
>>> >> >> >>> Bengt
>>> >> >> >>> 
>>> >> >> >>
>>> >> >> >>> From: Bengt Ahlgren <bengta@sics.se>
>>> >> >> >>> To: stable@freebsd.org
>>> >> >> >>> Subject: 8.3-PRERELEASE panic in linux emulation
>>> >> >> >>> Date: Thu, 23 Feb 2012 17:26:32 +0100
>>> >> >> >>> 
>>> >> >> >>> Hi!
>>> >> >> >>> 
>>> >> >> >>> I get a consistent panic when starting acroread after updating to
>>> >> >> >>> 8.3-PRERELEASE.  An 8.2-STABLE from Feb 4th was OK.  Can provide more
>>> >> >> >>> info if needed.
>>> >> >> >>> 
>>> >> >> >>> Bengt
>>> >> >> >>> 
>>> >> >> >>> FreeBSD xx.yy.zz 8.3-PRERELEASE FreeBSD 8.3-PRERELEASE #13 r231999: Wed Feb 22 21:01:38 CET 2012     bengta@P142.sics.se:/usr/obj/usr/src/sys/P142-82  i386
>>> >> >> >>> 
>>> >> >> >>> Fatal trap 12: page fault while in kernel mode
>>> >> >> >>> fault virtual address   = 0xbfbfdffc
>>> >> >> >>> fault code              = supervisor write, page not present
>>> >> >> >>> instruction pointer     = 0x20:0xc50b396c
>>> >> >> >>> stack pointer           = 0x28:0xe7481a6c
>>> >> >> >>> frame pointer           = 0x28:0xe7481a90
>>> >> >> >>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>> >> >> >>>                         = DPL 0, pres 1, def32 1, gran 1
>>> >> >> >>> processor eflags        = interrupt enabled, resume, IOPL = 0
>>> >> >> >>> current process         = 1997 (bash)
>>> >> >> >>> trap number             = 12
>>> >> >> >>> panic: page fault
>>> >> >> >>> KDB: stack backtrace:
>>> >> >> >>> db_trace_self_wrapper(c091af2a,70797420,78302065,a0d6231,c5d6b600,...) at db_trace_self_wrapper+0x26
>>> >> >> >>> kdb_backtrace(c0919061,c09b49a0,c0900251,e7481910,e7481910,...) at kdb_backtrace+0x2a
>>> >> >> >>> panic(c0900251,c0941cab,c5c246e8,1,1,...) at panic+0xaf
>>> >> >> >>> trap_fatal(c0670d02,0,e7481964,80400,c5c24580,...) at trap_fatal+0x353
>>> >> >> >>> trap_pfault(e74819cc,bfbfe190,c5c24580,202,c5cecac0,...) at trap_pfault+0x87
>>> >> >> >>> trap(e7481a2c) at trap+0x453
>>> >> >> >>> calltrap() at calltrap+0x6
>>> >> >> >>> --- trap 0xc, eip = 0xc50b396c, esp = 0xe7481a6c, ebp = 0xe7481a90 ---
>>> >> >> >>> elf_linux_fixup(e7481c0c,e7481b98,c065ca92,c60ffce8,100000,...) at elf_linux_fixup+0x33c
>>> >> >> >>> kern_execve(c5c24580,e7481c3c,0,8112710,8116cd8,...) at kern_execve+0x7d6
>>> >> >> >>> linux_execve(c5c24580,e7481cec,c,c,c,...) at linux_execve+0xa7
>>> >> >> >>> syscall(e7481d28) at syscall+0x372
>>> >> >> >>> Xint0x80_syscall() at Xint0x80_syscall+0x21
>>> >> >> >>> --- syscall (11, Linux ELF, linux_execve), eip = 0x281e0d4a, esp = 0xbfbfd644, ebp = 0xbfbfd7e8 ---
>>> >> >> >>> 
>>> >> >> >>> #0  doadump () at pcpu.h:244
>>> >> >> >>> #1  0xc05de609 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:441
>>> >> >> >>> #2  0xc05de84f in panic (fmt=Variable "fmt" is not available.
>>> >> >> >>> ) at /usr/src/sys/kern/kern_shutdown.c:614
>>> >> >> >>> #3  0xc08b22c3 in trap_fatal (frame=0xe7481a2c, eva=3217022972) at /usr/src/sys/i386/i386/trap.c:981
>>> >> >> >>> #4  0xc08b2357 in trap_pfault (frame=0xe7481a2c, usermode=0, eva=3217022972) at /usr/src/sys/i386/i386/trap.c:843
>>> >> >> >>> #5  0xc08b3133 in trap (frame=0xe7481a2c) at /usr/src/sys/i386/i386/trap.c:562
>>> >> >> >>> #6  0xc089bedc in calltrap () at /usr/src/sys/i386/i386/exception.s:168
>>> >> >> >>> #7  0xc50b396c in elf_linux_fixup (stack_base=0xe7481c0c, imgp=0xe7481b98) at /usr/src/sys/modules/linux/../../i386/linux/linux_sysvec.c:288
>>> >> >> >>> #8  0xc05ac636 in kern_execve (td=0xc5c24580, args=0xe7481c3c, mac_p=0x0) at /usr/src/sys/kern/kern_exec.c:551
>>> >> >> >>> #9  0xc50ab387 in linux_execve (td=0xc5c24580, args=0xe7481cec) at /usr/src/sys/modules/linux/../../i386/linux/linux_machdep.c:143
>>> >> >> >>> #10 0xc08b2902 in syscall (frame=0xe7481d28) at subr_syscall.c:114
>>> >> >> >>> #11 0xc089bf41 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:266
>>> >> >> >>> #12 0x00000033 in ?? ()
>>> >> >> >>> 
>>> >> >> >> I am not sure if this is the real cause of your panic, but the line from
>>> >> >> >> the backtrace indeed has a bug. Please try the change below.
>>> >> >> >>
>>> >> >> >> diff --git a/sys/i386/linux/linux_sysvec.c b/sys/i386/linux/linux_sysvec.c
>>> >> >> >> index 7634138..d4e23e1 100644
>>> >> >> >> --- a/sys/i386/linux/linux_sysvec.c
>>> >> >> >> +++ b/sys/i386/linux/linux_sysvec.c
>>> >> >> >> @@ -227,11 +227,11 @@ linux_fixup(register_t **stack_base, struct image_params *imgp)
>>> >> >> >>  	argv = *stack_base;
>>> >> >> >>  	envp = *stack_base + (imgp->args->argc + 1);
>>> >> >> >>  	(*stack_base)--;
>>> >> >> >> -	**stack_base = (intptr_t)(void *)envp;
>>> >> >> >> +	suword(*stack_base, (intptr_t)(void *)envp);
>>> >> >> >>  	(*stack_base)--;
>>> >> >> >> -	**stack_base = (intptr_t)(void *)argv;
>>> >> >> >> +	suword(*stack_base, (intptr_t)(void *)argv);
>>> >> >> >>  	(*stack_base)--;
>>> >> >> >> -	**stack_base = imgp->args->argc;
>>> >> >> >> +	suword(*stack_base, imgp->args->argc);
>>> >> >> >>  	return (0);
>>> >> >> >>  }
>>> >> >> >>  
>>> >> >> >> @@ -286,7 +286,7 @@ elf_linux_fixup(register_t **stack_base, struct image_params *imgp)
>>> >> >> >>  	imgp->auxargs = NULL;
>>> >> >> >>  
>>> >> >> >>  	(*stack_base)--;
>>> >> >> >> -	**stack_base = (register_t)imgp->args->argc;
>>> >> >> >> +	suword(*stack_base, (register_t)imgp->args->argc);
>>> >> >> >>  	return (0);
>>> >> >> >>  }
>>> >> >> >>  
>>> >> >> >
>>> >> >> > Thanks for the response!  I will try the patch, but that file has not
>>> >> >> > been touched since June 2011.  I was suspecting the changes in r231146
>>> >> >> > and r231148.  If there is no change with your patch I will roll back
>>> >> >> > those to see what happens.
>>> >> > This is very unlikely. fadvise() has nothing to do with image activators.
>>> >> >
>>> >> >> 
>>> >> >> No panics so far.  That patch does indeed seem to solve the problem!  I
>>> >> >> also verified by going back to the old kernel, which again
>>> >> >> consistently panicked.
>>> >> > I will commit the change in minutes. Kernel must not access usermode
>>> >> > addresses directly.
>>> >> >
>>> >> > But does the application that used to panic the system behave properly?
>>> >> 
>>> >> Yes, it (acroread8) does behave properly with the patch.  Have not
>>> >> tested extensively, however.
>>> >> 
>>> >> >> Thanks very much for good work!
>>> >> >> 
>>> >> >> I'm a bit puzzled though, because that bug must have been there for
>>> >> >> quite some time without triggering the panic.
>>> >> > The panic with the unpatched kernel looks puzzling. Do you have some
>>> >> > non-default stack limit? Can you look at the resource limit values
>>> >> > for the process that initiated the panic?
>>> >> 
>>> >> Not that I'm aware of.  Unless the acroread launch script does this.  I
>>> >> don't know how to check this for a running process, but "limits|grep
>>> >> stack" in a regular shell gives me:
>>> >> 
>>> >>   stacksize               65536 kB
>>> >> 
>>> >> Or, do you mean that I can dig that out of the crash dump?  If so, I'll
>>> >> need some help with how to.
>>> >
>>> > In kgdb, select frame 8 and print the contents of td->td_proc->p_limit.
>>> 
>>> Is this it?
>>> 
>>> (kgdb) frame 8
>>> #8  0xc05ac636 in kern_execve (td=0xc5dc3000, args=0xe7535c3c, mac_p=0x0)
>>>     at /usr/src/sys/kern/kern_exec.c:551
>>> 551                     (*p->p_sysent->sv_fixup)(&stack_base, imgp);
>>> (kgdb) print td->td_proc->p_limit
>>> $1 = (struct plimit *) 0xc5490700
>>> (kgdb) print *td->td_proc->p_limit
>>> $2 = {pl_rlimit = {{rlim_cur = 9223372036854775807, 
>>>       rlim_max = 9223372036854775807}, {rlim_cur = 9223372036854775807, 
>>>       rlim_max = 9223372036854775807}, {rlim_cur = 536870912, 
>>>       rlim_max = 536870912}, {rlim_cur = 67108864, rlim_max = 67108864}, {
>>>       rlim_cur = 9223372036854775807, rlim_max = 9223372036854775807}, {
>>>       rlim_cur = 9223372036854775807, rlim_max = 9223372036854775807}, {
>>>       rlim_cur = 9223372036854775807, rlim_max = 9223372036854775807}, {
>>>       rlim_cur = 5547, rlim_max = 5547}, {rlim_cur = 11095, rlim_max = 11095}, 
>>>     {rlim_cur = 9223372036854775807, rlim_max = 9223372036854775807}, {
>>>       rlim_cur = 9223372036854775807, rlim_max = 9223372036854775807}, {
>>>       rlim_cur = 9223372036854775807, rlim_max = 9223372036854775807}, {
>>>       rlim_cur = 9223372036854775807, rlim_max = 9223372036854775807}}, 
>>>   pl_refcnt = 51}
>> Yes, but the stack size is the normal 64MB.
>>
>> Did other Linux binaries work before the patch? E.g., did bash
>> start normally?
>
> Hmm, I see now that the crash dump actually says that it is bash that is
> running, not acroread.  The acroread startup script invokes a linux bash
> to launch the acroread binary, so I guess acroread was actually never
> started.
>
> I should also say that I was running KDE.  I can test invoking linux
> bash and some other linux utilities without any Xorg+KDE to see what
> happens.  I however don't have time today for this.

I tried to reproduce the panic today with other Linux programs, but
strangely enough I can't reproduce it at all, not even with acroread.
I'll run memtest to see whether I have any RAM problems.
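
For reference, the numbers quoted above can be cross-checked with a few
lines of userland C. This is only a sketch: the fault address and the
stack limit are copied from the panic message and the kgdb output, while
ASSUMED_USRSTACK (0xbfc00000, the usual top of the user stack on i386
without PAE) and both helper names are assumptions, not taken from the
crash dump.

```c
#include <assert.h>	/* for the checks in the usage note below */
#include <stdint.h>

/* Assumption: top of the user stack on i386 without PAE. */
#define ASSUMED_USRSTACK 0xbfc00000u

/* Values copied from the panic message and the kgdb output. */
static const uint32_t fault_va = 0xbfbfdffcu;	/* fault virtual address */
static const uint64_t stack_rlimit = 67108864;	/* pl_rlimit[3].rlim_cur */

/* Distance in bytes from the assumed stack top down to va. */
static uint32_t
depth_below_stack_top(uint32_t va)
{
	return (ASSUMED_USRSTACK - va);
}

/* Nonzero if va lies inside the range the stack is allowed to grow into. */
static int
within_stack_limit(uint32_t va, uint64_t limit)
{
	return (va < ASSUMED_USRSTACK && depth_below_stack_top(va) <= limit);
}
```

With these definitions, assert(fault_va == 3217022972u) confirms that the
hex address in the trap message and the decimal eva in the kgdb backtrace
are the same value, and assert(within_stack_limit(fault_va, stack_rlimit))
holds: under the assumed stack top the faulting word sits 0x2004 bytes
below it, far inside the 64 MB limit. That fits the observation that the
stack limit itself looks normal.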

Bengt
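
A note on the patch quoted above: it replaces direct stores through
*stack_base with suword(), which reports a failed store to a user address
by returning -1 instead of taking a fatal trap. The userland model below
only illustrates that calling convention. struct fake_uspace and
model_suword() are hypothetical names, and the explicit range check is a
simplification: the real suword(9) relies on the kernel's fault handling,
not on a bounds test like this one.

```c
#include <assert.h>	/* for the checks in the usage note below */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the mapped part of a user address space. */
struct fake_uspace {
	uint32_t base;			/* lowest mapped "user" address */
	uint32_t len;			/* bytes mapped starting at base */
	unsigned char *backing;		/* storage for the mapped range */
};

/*
 * Model of a checked store: write one 32-bit word at "user" address uaddr.
 * Returns 0 on success and -1 if the word is not fully inside the mapping,
 * mirroring the 0 / -1 convention of suword(9).
 */
static int
model_suword(struct fake_uspace *us, uint32_t uaddr, uint32_t word)
{
	if (uaddr < us->base || us->len < sizeof(word) ||
	    uaddr - us->base > us->len - sizeof(word))
		return (-1);
	memcpy(us->backing + (uaddr - us->base), &word, sizeof(word));
	return (0);
}
```

Given a 16-byte mapping at 0x1000, model_suword(&us, 0x100c, 7) stores the
word and returns 0, while model_suword(&us, 0x100e, 1) returns -1 because
the word would run past the end of the mapping. A caller can then fail the
exec cleanly where a direct store would have faulted in kernel mode.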


