From: Adam Jacob Muller
To: Kostik Belousov
Cc: freebsd-current@freebsd.org, Vyacheslav Bocharov
Subject: Re: __tls_get_addr problem with recent current
Date: Tue, 2 Sep 2008 14:41:08 -0400
In-Reply-To: <20080901145315.GU2038@deviant.kiev.zoral.com.ua>
List-Id: Discussions about the use of FreeBSD-current

On Sep 1, 2008, at 10:53 AM, Kostik Belousov wrote:

> On Mon, Sep 01, 2008 at 05:33:37PM +0300, Vyacheslav Bocharov wrote:
>> I have similar problem in 7-STABLE (from 1 sep):
>> 32bit application exec 64application and we have an core dump:
>>
>> # gdb fw.sh fw.sh.core
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License,
>> and you are
>> welcome to change it and/or distribute copies of it under certain
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for
>> details.
>> This GDB was configured as "amd64-marcel-freebsd"...
>> Core was generated by `fw.sh'.
>> Program terminated with signal 11, Segmentation fault.
>> Reading symbols from /usr/lib/libstdc++.so.6...done.
>> Loaded symbols for /usr/lib/libstdc++.so.6
>> Reading symbols from /lib/libm.so.5...done.
>> Loaded symbols for /lib/libm.so.5
>> Reading symbols from /lib/libgcc_s.so.1...done.
>> Loaded symbols for /lib/libgcc_s.so.1
>> Reading symbols from /lib/libc.so.7...done.
>> Loaded symbols for /lib/libc.so.7
>> Reading symbols from /libexec/ld-elf.so.1...done.
>> Loaded symbols for /libexec/ld-elf.so.1
>> #0 0x0000000800507483 in __tls_get_addr () from /libexec/ld-elf.so.1
>> (gdb) bt
>> #0 0x0000000800507483 in __tls_get_addr () from /libexec/ld-elf.so.1
>> #1 0x0000000800ad8892 in _pthread_mutex_init_calloc_cb () from
>> /lib/libc.so.7
>> #2 0x0000000800ada35f in malloc () from /lib/libc.so.7
>> #3 0x00000008007050ad in operator new () from /usr/lib/libstdc++.so.6
>> #4 0x00000008006b5f21 in std::string::_Rep::_S_create ()
>> from /usr/lib/libstdc++.so.6
>> #5 0x00000008006b6ca5 in std::string::_S_copy_chars ()
>> from /usr/lib/libstdc++.so.6
>> #6 0x00000008006b6dc2 in std::basic_string<char,
>> std::char_traits<char>,
>> std::allocator<char> >::basic_string () from /usr/lib/libstdc++.so.6
>> #7 0x00000000004021ec in __static_initialization_and_destruction_0 (
>> __initialize_p=1, __priority=65535) at CCmdLine.cpp:16
>> #8 0x00000000004026c3 in global constructors keyed to cmdlist ()
>> at CCmdLine.cpp:177
>> #9 0x00000000004033a2 in __do_global_ctors_aux ()
>> #10 0x000000000040113e in _init ()
>> #11 0x0000000800b2b0c0 in __cxa_atexit () from /lib/libc.so.7
>> #12 0x00000000004014e8 in _start ()
>> #13 0x000000080052c000 in ?? ()
>>
>> I tried your patch but nothing changed.
> Exactly which patch ? There were three, one of which caused immediate
> panic. I put the patches at
> http://people.freebsd.org/~kib/misc/fsbase.1.patch
> http://people.freebsd.org/~kib/misc/fsbase.2.patch
>
> Could you, please, try both and report the results ?
> And, isolated test case, as several C files or recipe to reproduce
> this with base system, would be ideal.
>
>>
>> 2008/8/31 Kostik Belousov
>>
>>> On Sun, Aug 31, 2008 at 10:16:18AM +0300, Kostik Belousov wrote:
>>>> On Sat, Aug 30, 2008 at 02:03:00PM -0700, Artem Belevich wrote:
>>>>> With the new patch kernel has crashed as soon as I ran i386 app,
>>>>> though the crash happened within in-kernel thread g_up:
>>>>>
>>>>> Fatal trap 12: page fault while in kernel mode
>>>>> cpuid = 2; apic id = 02
>>>>> fault virtual address = 0x20
>>>>> fault code = supervisor read data, page not present
>>>>> instruction pointer = 0x8:0xffffffff804a821f
>>>>> stack pointer = 0x10:0xffffffffac280b60
>>>>> frame pointer = 0x10:0x0
>>>>> code segment = base 0x0, limit 0xfffff, type 0x1b
>>>>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>>>> processor eflags = resume, IOPL = 0
>>>>> current process = 3 (g_up)
>>>>> trap number = 12
>>>>> panic: page fault
>>>>> cpuid = 2
>>>>> Uptime: 37s
>>>>> Physical memory: 8169 MB
>>>>> Dumping 380 MB: 365 349 333 317 301 285 269 253 237 221 205 189
>>>>> 173
>>>>> 157 141 125 109 93 77 61 45 29 13
>>>> Could you, please, show me the disassembled code around the faulted
>>>> %rip ?
>>>
>>> No need, it seems I found the problem. I trashed the %rdx that
>>> contains
>>> the third cpu_switch argument. Please, try the updated patch.
>>>
>>> Thanks for the testing !
>>>
>>> diff --git a/sys/amd64/amd64/cpu_switch.S b/sys/amd64/amd64/cpu_switch.S
>>> index f34b0cc..03f0eca 100644
>>> --- a/sys/amd64/amd64/cpu_switch.S
>>> +++ b/sys/amd64/amd64/cpu_switch.S
>>> @@ -249,6 +249,12 @@ store_seg:
>>> 1: movl %ds,PCB_DS(%r8)
>>> movl %es,PCB_ES(%r8)
>>> movl %fs,PCB_FS(%r8)
>>> + movq %rdx,%r11
>>> + movl $MSR_FSBASE,%ecx
>>> + rdmsr
>>> + shlq $32,%rdx
>>> + leaq (%rax,%rdx),%r9
>>> + movq %r11,%rdx
>>> jmp done_store_seg
>>> 2: movq PCB_GS32P(%r8),%rax
>>> movq (%rax),%rax
>>>
>>
>>
>>
>> --
>> Vyacheslav Bocharov

Hi,

I have the same issue on recent RELENG_7 (both before and after
7.1-PRERELEASE). It is reproducible with a simple C program compiled
on 32-bit 7.x:

    #include <unistd.h>

    int
    main(void)
    {
            execl("/bin/ls", "/bin/ls", (char *)0);
            return (1);     /* reached only if execl() fails */
    }

This program segfaults fairly reliably, though not 100% of the time.
Running it in a loop until the first non-zero exit status catches it:

    while true; do ./test; if [ "$?" -gt "0" ]; then break; fi; done

Patch 1 (http://people.freebsd.org/~kib/misc/fsbase.1.patch) fixes the
issue for me.

Patch 2 (http://people.freebsd.org/~kib/misc/fsbase.2.patch) does not,
though it may mitigate it slightly (things seem to crash less
frequently).

-Adam
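
A side note on the quoted cpu_switch.S hunk, for anyone reading along: rdmsr
returns the MSR value split across two registers (low 32 bits in %eax, high
32 bits in %edx), which is why the patch has to save and restore %rdx around
it. The shlq/leaq pair then joins the halves into one 64-bit value. A minimal
C sketch of that arithmetic follows; msr_combine() is a hypothetical name
chosen for illustration and is not a FreeBSD kernel function.

```c
#include <stdint.h>

/*
 * Illustration only: join the two 32-bit halves that rdmsr leaves in
 * %eax (low) and %edx (high) into a single 64-bit value.
 * msr_combine() is a hypothetical helper, not part of the kernel.
 */
static uint64_t
msr_combine(uint32_t eax, uint32_t edx)
{
	/*
	 * Mirrors: shlq $32,%rdx ; leaq (%rax,%rdx),%r9
	 * The halves do not overlap, so adding them is the same as OR-ing.
	 */
	return (((uint64_t)edx << 32) + (uint64_t)eax);
}
```

For example, a low half of 0x00507000 and a high half of 0x8 combine to
0x0000000800507000, which is the general shape of the userland addresses
in the backtrace above.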