From: Archie Cobbs
Message-Id: <200203082112.g28LCcl48040@arch20m.dellroad.org>
Subject: M_NOWAIT is waiting anyway..?
To: freebsd-stable@freebsd.org
Date: Fri, 8 Mar 2002 13:12:38 -0800 (PST)
Cc: dillon@freebsd.org

I'm seeing a panic that suggests that the kernel malloc() implementation
is broken with respect to M_NOWAIT, hard to believe as that is.
Here's the trace:

> panic: tsleep1
> Debugger("panic")
> Stopped at Debugger+0x34: movb $0,in_Debugger.426
> db> tr
> Debugger(c025f03b) at Debugger+0x34
> panic(c025f50c,c02befd0,1000000,ffffffff,34) at panic+0x70
> tsleep(c02befd0,4,c0279394,0,c02befd0) at tsleep+0x5e
> acquire(c02befd0,1000000,600,1,c02b3540) at acquire+0x88
> lockmgr(c02befd0,2,0,0,1) at lockmgr+0x248
> kmem_malloc(c02befa0,1000,1,0,c06fb700) at kmem_malloc+0x54
> malloc(c,c02b3540,1,0,c06fb700) at malloc+0x246
> typed_mem_realloc(c028e988,a9,c028e984,0,24) at typed_mem_realloc+0xa2
> pkt_new_nobuf(c1e8f7c0,c06fb73e,c02962b8,c024d84a,c06fb700) at pkt_new_nobuf+0x38
> ng_test_mbuf2pkt(c06fb700) at ng_test_mbuf2pkt+0x39
> ng_test_rx_int(c1ebb940,c06fb700,0,5,c1ebb920) at ng_test_rx_int+0x62
> ng_test_rcvdata(c1ebb940,c06fb700,0,c1f2eba0,c1ebb920) at ng_test_rcvdata+0xe6
> ng_send_dataq(c1ebb920,c06fb700,0) at ng_send_dataq+0x6f
> ngintr(c021a44f,0,c0290010,c0140010,c02b0010) at ngintr+0xd3
> swi_net_next(3581000,0,0,0,0) at swi_net_next
> vm_page_zero_idle(f,686,2,383f9ff,756e6547) at vm_page_zero_idle+0xdf
> idle_loop() at idle_loop+0x13

The important things to note are:

 - A netgraph soft interrupt is running during the idle process,
   so curproc == NULL
 - malloc() is being called with M_NOWAIT
 - tsleep() is being called anyway

This is on a 4.4-REL kernel, but it appears that the same thing would
happen in 4.5-REL as well.

This is of course completely broken, because M_NOWAIT tells malloc()
that it must never sleep and should return NULL instead.

As always, this could be happening to me due to memory corruption,
which was my first thought, but after looking at the code, it does
appear that this can happen like so:

 1. malloc() is called with M_NOWAIT
 2. malloc() calls kmem_malloc(kmem_map, ...), which has this rather
    disturbing comment:

      * NOTE: This routine is not supposed to block if M_NOWAIT is set, but
      * I have not verified that it actually does not block.

 3. kmem_malloc() calls vm_map_lock(kmem_map)
 4. vm_map_lock() is a macro that calls
    lockmgr(&kmem_map->lock, LK_EXCLUSIVE, ..)
 5. kmem_map->lock.lk_flags does not include LK_NOWAIT (it is
    initialized by vm_map_init(), which calls lockinit() with
    LK_NOPAUSE but not LK_NOWAIT), so lockmgr() calls acquire()
 6. acquire() calls tsleep() -> panic because curproc == NULL

Is the above scenario correct? If so, it seems like too serious a
problem for me to be the first one to see it.. though that may be
because my kernel netgraph node is allocating enough memory to cause
malloc() to call kmem_malloc(), which normally does not happen.. ?

Thanks for any insights!

-Archie

__________________________________________________________________________
Archie Cobbs * Packet Design * http://www.packetdesign.com
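
For anyone who wants to poke at the logic outside the kernel, steps 1-6
above can be modeled in user space. This is only a sketch of the
suspected control flow, not kernel code and not a proposed patch: every
name in it (MODEL_M_NOWAIT, struct model_lock, the model_* functions,
sleep_attempts) is made up for illustration, and the "sleep" is just a
counter standing in for tsleep(). The second allocator is there only for
contrast, to show what "never sleep, return NULL instead" would look
like in the same model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag for this model only; not the kernel's M_NOWAIT value. */
#define MODEL_M_NOWAIT 0x1

/* Stand-in for the map lock; "held" means another context owns it. */
struct model_lock {
    bool held;
};

/* Counts how many times the model would have gone to sleep (tsleep()). */
int sleep_attempts;

/*
 * Models steps 4-6: a blocking acquire. When the lock is already held it
 * "sleeps". A real kernel would block here (or panic with no curproc);
 * the model just counts the attempt and reports failure.
 */
bool model_lock_blocking(struct model_lock *lk)
{
    assert(lk != NULL);
    if (lk->held) {
        sleep_attempts++;
        return false;
    }
    lk->held = true;
    return true;
}

/* A try-lock: never sleeps, simply reports failure if the lock is held. */
bool model_lock_try(struct model_lock *lk)
{
    assert(lk != NULL);
    if (lk->held)
        return false;
    lk->held = true;
    return true;
}

void model_unlock(struct model_lock *lk)
{
    assert(lk != NULL);
    lk->held = false;
}

/*
 * Models steps 1-3 as described in the scenario: the caller's flags are
 * not consulted when taking the map lock, so the blocking path is used
 * even when MODEL_M_NOWAIT is set.
 */
void *model_alloc_as_described(struct model_lock *map_lock, size_t size,
    int flags)
{
    (void)flags;                        /* ignored, which is the problem */
    if (!model_lock_blocking(map_lock))
        return NULL;
    void *p = malloc(size);
    model_unlock(map_lock);
    return p;
}

/*
 * For contrast: an allocator that honours the no-wait flag by using the
 * try-lock and returning NULL when the lock is unavailable.
 */
void *model_alloc_nowait_aware(struct model_lock *map_lock, size_t size,
    int flags)
{
    bool got = (flags & MODEL_M_NOWAIT) ? model_lock_try(map_lock)
                                        : model_lock_blocking(map_lock);
    if (!got)
        return NULL;
    void *p = malloc(size);
    model_unlock(map_lock);
    return p;
}
```

With the lock marked as held, calling model_alloc_as_described() with
MODEL_M_NOWAIT bumps sleep_attempts, which corresponds to the tsleep()
frame in the trace. model_alloc_nowait_aware() under the same conditions
returns NULL and leaves the counter untouched. Whether the real lockmgr()
path can be made to behave like the second variant is exactly the open
question.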