From owner-freebsd-stable  Fri Mar  8 13:15:10 2002
From: Archie Cobbs <archie@dellroad.org>
Message-Id: <200203082112.g28LCcl48040@arch20m.dellroad.org>
Subject: M_NOWAIT is waiting anyway..?
To: freebsd-stable@freebsd.org
Date: Fri, 8 Mar 2002 13:12:38 -0800 (PST)
Cc: dillon@freebsd.org

I'm seeing a panic that suggests that the kernel malloc() implementation
is broken with respect to M_NOWAIT, hard to believe as that is.
Here's the trace:

> panic: tsleep1
> Debugger("panic")
> Stopped at      Debugger+0x34:  movb    $0,in_Debugger.426
> db> tr
> Debugger(c025f03b) at Debugger+0x34
> panic(c025f50c,c02befd0,1000000,ffffffff,34) at panic+0x70
> tsleep(c02befd0,4,c0279394,0,c02befd0) at tsleep+0x5e
> acquire(c02befd0,1000000,600,1,c02b3540) at acquire+0x88
> lockmgr(c02befd0,2,0,0,1) at lockmgr+0x248
> kmem_malloc(c02befa0,1000,1,0,c06fb700) at kmem_malloc+0x54
> malloc(c,c02b3540,1,0,c06fb700) at malloc+0x246
> typed_mem_realloc(c028e988,a9,c028e984,0,24) at typed_mem_realloc+0xa2
> pkt_new_nobuf(c1e8f7c0,c06fb73e,c02962b8,c024d84a,c06fb700) at pkt_new_nobuf+0x38
> ng_test_mbuf2pkt(c06fb700) at ng_test_mbuf2pkt+0x39
> ng_test_rx_int(c1ebb940,c06fb700,0,5,c1ebb920) at ng_test_rx_int+0x62
> ng_test_rcvdata(c1ebb940,c06fb700,0,c1f2eba0,c1ebb920) at ng_test_rcvdata+0xe6
> ng_send_dataq(c1ebb920,c06fb700,0) at ng_send_dataq+0x6f
> ngintr(c021a44f,0,c0290010,c0140010,c02b0010) at ngintr+0xd3
> swi_net_next(3581000,0,0,0,0) at swi_net_next
> vm_page_zero_idle(f,686,2,383f9ff,756e6547) at vm_page_zero_idle+0xdf
> idle_loop() at idle_loop+0x13

The important things to note are:

- A netgraph soft interrupt is running during the idle process,
  so curproc == NULL
- malloc() is being called with the M_NOWAIT flag
- tsleep() is being called anyway

This is on a 4.4-REL kernel, but it appears that the same thing
would happen in 4.5-REL as well.

This is of course completely broken, because M_NOWAIT tells malloc()
that it must never sleep and should return NULL instead.

As always, this could be caused by memory corruption, which was my
first thought. But after reading the code, it does appear that this
can happen as follows:

  1. malloc() is called with M_NOWAIT

  2. malloc() calls kmem_malloc(kmem_map, ...), which has this
     rather disturbing comment:

    * NOTE:  This routine is not supposed to block if M_NOWAIT is set, but
    * I have not verified that it actually does not block.

  3. kmem_malloc() calls vm_map_lock(kmem_map)

  4. vm_map_lock() is a macro that calls
     lockmgr(&kmem_map->lock, LK_EXCLUSIVE, ..)

  5. kmem_map->lock.lk_flags does not include LK_NOWAIT (it is
     initialized by vm_map_init(), which calls lockinit() with LK_NOPAUSE
     but not LK_NOWAIT), so lockmgr() calls acquire()

  6. acquire() calls tsleep(), which panics ("tsleep1") because
     curproc == NULL

Is the above scenario correct? If so, it seems like a very serious
problem for me to be the first one to see it. Then again, perhaps my
kernel netgraph node allocates enough memory to make malloc() call
kmem_malloc(), which normally does not happen?

Thanks for any insights!
-Archie

__________________________________________________________________________
Archie Cobbs     *     Packet Design     *     http://www.packetdesign.com
