Date:      Fri, 16 May 2003 05:56:16 +1000
From:      Peter Jeremy <peterjeremy@optushome.com.au>
To:        Jonas Bülow <jonas@servicefactory.se>
Cc:        freebsd-stable <freebsd-stable@freebsd.org>
Subject:   Re: Kernel panic on FreeBSD 4.8-STABLE
Message-ID:  <20030515195616.GE4366@cirb503493.alcatel.com.au>
In-Reply-To: <3EC377BC.5060708@servicefactory.se>
References:  <3EC10790.50809@bulow.mine.nu> <20030514100716.GA4410@cirb503493.alcatel.com.au> <3EC22DB4.70409@bulow.mine.nu> <20030514200331.GD4366@cirb503493.alcatel.com.au> <3EC377BC.5060708@servicefactory.se>

Hi Jonas,

Debugging this is going to take some time - especially given the
different timezones we are in.  The trap is occurring in the 'ltr'
instruction, which loads the task register with the new TSS selector
as part of the task switch.

On Thu, May 15, 2003 at 01:19:24PM +0200, Jonas Bülow wrote:
>Hi,
>
>Peter Jeremy wrote:
>>On Wed, May 14, 2003 at 01:51:16PM +0200, Jonas Bulow wrote:
>>
>>>Peter Jeremy wrote:
>>>
>>>>On Tue, May 13, 2003 at 04:56:16PM +0200, Jonas Bulow wrote:
>>>>
>>>>
>>>>>I need some help to understand a backtrace.
>>>>
>>>>
>>>>>Fatal trap 9: general protection fault while in kernel mode
>>>>>instruction pointer     = 0x8:0xc023ceeb
>>>>>stack pointer           = 0x10:0xcf7d9ea4
>>>>>frame pointer           = 0x10:0xcf7d9ec0
>>>>>code segment            = base 0x0, limit 0xfffff, type 0x1b
>>>>>                    = DPL 0, pres 1, def32 1, gran 1
>>>>>processor eflags        = resume, IOPL = 0
>>>>>current process         = Idle
>>>>>interrupt mask          = net tty bio cam
>>>>>trap number             = 9
>>>>>panic: general protection fault
>>>>
>>>>...
>>>>
>>>>
>>>>>#17 0xc023d6fb in trap (frame={tf_fs = 16, tf_es = 134938640, tf_ds = 
>>>>>-982253552, tf_edi = -971835344, tf_esi = 32,
>>>>>  tf_ebp = -813850944, tf_isp = -813850992, tf_ebx = -1070885216, 
>>>>>tf_edx = -812732416, tf_ecx = -831483840,
>>>>>  tf_eax = 336283586, tf_trapno = 9, tf_err = 32, tf_eip = 
>>>>>-1071395093, tf_cs = 8, tf_eflags = 65670, tf_esp = -1072211888,
>>>>>  tf_ss = -831471360}) at /usr/src/sys/i386/i386/trap.c:636
>>>>>#18 0xc023ceeb in sw1a ()
>>>>>#19 0xc0174ff1 in tsleep (ident=0xce70c100, priority=288, 
>>>>>wmesg=0xc02530a5 "wait", timo=0) at /usr/src/sys/kern/kern_synch.c:479
>>>>
>>>>
>>>>#18 is the underlying problem.  sw1a() is in /sys/i386/i386/swtch.s
>>>>and you might like to disassemble the code around 0xc023ceeb to see
>>>>exactly where it is dying.  GPF is a catch-all category so it's
>>>>difficult to know exactly why you're getting it without knowing the
>>>>actual instruction it dies on.
>>>
>>>This is beyond my skills. :-) Does the disassembly say anything useful?
>>>
>>>(kgdb) disassemble 0xc023ceeb
>>
>>...
>>
>>>0xc023cecf <sw1a+93>:   mov    $0xc0298550,%edi
>>>0xc023ced4 <sw1a+98>:   mov    0xc0298558,%ebx
>>>0xc023ceda <sw1a+104>:  mov    0x0(%edi),%eax
>>>0xc023cedd <sw1a+107>:  mov    %eax,0x0(%ebx)
>>>0xc023cee0 <sw1a+110>:  mov    0x4(%edi),%eax
>>>0xc023cee3 <sw1a+113>:  mov    %eax,0x4(%ebx)
>>>0xc023cee6 <sw1a+116>:  mov    $0x20,%esi
>>>0xc023ceeb <sw1a+121>:  ltr    %si
>>
>>
>>It's dying trying to switch tasks.  %edi isn't _common_tssd so it's a
>>private TSS.  This is a bit beyond my skills to debug remotely - I
>>don't suppose you have an IA-32 system programming manual handy?
>
>I have the manuals found at 
>http://developer.intel.com/design/pentium4/manuals/ . Chapter 6 in 
>volume 3 seems to be the homework for me. :-)

Looks right from the cover.  I'll check at work.  (I'm using an old
486 book at home - it's the 'Multi-tasking' chapter in that).  You
might need to read the "System Architecture" chapter for some background
as well.

>> You
>>could try printing the 8 bytes following %edi in frame #18
>>(0xc612f830) 
>
>(kgdb) x/8xb 0xc612f830
>0xc612f830:     0x10    0x02    0x00    0x00    0xc2    0x47    0x0b    0x14

This should be the new TSS descriptor.  As little-endian longwords,
this gives 0x00000210 0x140b47c2, which doesn't make sense as a
descriptor - though the latter word correctly matches tf_eax in
frame #17 (336283586 = 0x140b47c2).
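
For anyone who wants to redo the conversion, here is a minimal Python
sketch (not part of the original kgdb session) that packs the eight
bytes from the dump into two little-endian longwords and compares the
second one with tf_eax.  It also converts the signed trap frame values
that kgdb prints into the usual unsigned hex form.  The helper name
u32 and the variable names are mine, purely for illustration.

```python
import struct

# The eight bytes printed by "x/8xb 0xc612f830" in kgdb.
raw = bytes([0x10, 0x02, 0x00, 0x00, 0xc2, 0x47, 0x0b, 0x14])

# i386 is little-endian, so unpack as two unsigned 32-bit longwords.
low, high = struct.unpack("<II", raw)


def u32(value):
    """Reinterpret a signed 32-bit value (as kgdb prints it) as unsigned."""
    return value & 0xFFFFFFFF


# Values copied from the trap frame in frame #17.
tf_eax = 336283586
tf_edi = -971835344
tf_eip = -1071395093

print("low  longword:", hex(low))
print("high longword:", hex(high))
print("high matches tf_eax:", high == u32(tf_eax))
print("tf_edi:", hex(u32(tf_edi)))
print("tf_eip:", hex(u32(tf_eip)))
```

The tf_edi and tf_eip conversions are only a cross-check that the dump
address and the faulting instruction address agree with the trap frame.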

AVL:    0
Busy:   ?
Base:   0x14c20000 - this isn't valid.  Kernel addresses are 0xcXXXXXXX
D:      0
DPL:    2
G:      0
Limit:  0xb0210 - This is excessive
P:      0 - this must be 1
Type:   7 - describes it as an 80286 Trap Gate.  It should be 9.
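
The field breakdown above can be reproduced with a short Python
sketch.  It assumes the standard IA-32 segment descriptor layout as I
understand it from the system programming manual (limit and base split
across both longwords, access bits in bits 8-15 of the high longword,
flags in bits 20-23).  The function name decode_descriptor and the
dictionary keys are hypothetical, chosen only for this example.

```python
def decode_descriptor(low, high):
    """Split an 8-byte IA-32 descriptor, given as two longwords, into fields."""
    return {
        # Limit: bits 0-15 of low, plus bits 16-19 of high.
        "limit": (low & 0xFFFF) | (((high >> 16) & 0xF) << 16),
        # Base: bits 16-31 of low, bits 0-7 of high, bits 24-31 of high.
        "base": (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF000000),
        "type": (high >> 8) & 0xF,   # descriptor type
        "s": (high >> 12) & 0x1,     # 0 = system descriptor
        "dpl": (high >> 13) & 0x3,   # descriptor privilege level
        "p": (high >> 15) & 0x1,     # present
        "avl": (high >> 20) & 0x1,   # available to software
        "d": (high >> 22) & 0x1,     # default operation size
        "g": (high >> 23) & 0x1,     # granularity
    }


# The two longwords read from 0xc612f830.
LOW, HIGH = 0x00000210, 0x140b47c2

fields = decode_descriptor(LOW, HIGH)
for name, value in fields.items():
    print(f"{name:5s} = {value:#x}")

# A usable 32-bit TSS descriptor would need P = 1 and type 9
# (available) or 11 (busy); this one has neither.
looks_like_tss = fields["p"] == 1 and fields["type"] in (9, 11)
print("plausible TSS descriptor:", looks_like_tss)
```

The check at the end is only a rough sanity test, not a full
validation of the descriptor.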

It would make more sense if it was byte-swapped.  I'm not sure where to
go from here.  I'll do some more thinking when I get home this evening.

Peter


