Date: Sun, 9 Jun 2002 22:40:37 -0600
From: "Kenneth D. Merry"
To: current@FreeBSD.org
Cc: net@FreeBSD.org
Subject: new zero copy sockets snapshot, WITNESS problems
Message-ID: <20020609224036.A21143@panzer.kdm.org>

I have released a new zero copy sockets snapshot. The code and a brief
update on what has been fixed are here:

http://people.FreeBSD.org/~ken/zero_copy

In short, I fixed the following things, which were found by Alfred
Perlstein:

 - fix a race in the vm object allocation in jumbo_vm_init()
 - use a sysinit to initialize the jumbo_mutex, since there is really no
   other way to avoid a race between checking the mutex to see if it has
   been initialized and actually initializing it.
 - use SLIST_FIRST instead of directly accessing the first element in the
   inuse list.
 - don't call malloc(9) with M_WAITOK while holding a mutex.

Between the last snapshot and this one, jhb (or someone else, can't
remember who) turned on the WITNESS logging that flags when there is a
potential sleep while a mutex is held. That has uncovered a whole slew of
warnings in the zero copy code. Some of the warnings are present in the
ti(4) driver without my patches; others are only triggered by the zero
copy patches.

Below is an abbreviated list of places where I found problems.
Most of these problems are in areas where I could use some help figuring
out the best course of action. So any comments on how to get these things
fixed up (or better yet, code!) would be welcome!

1.  sf_buf_init() calls kmem_alloc_pageable(), which through several calls
    ends up calling vm_map_entry_create(). vm_map_entry_create() calls
    uma_zalloc() with M_WAITOK.

2.  sf_buf_init() calls malloc() *with* M_NOWAIT, but the VM code ends up
    calling vm_map_entry_create(), so you have the same problem as above.

3.  ti_attach() calls bus_alloc_resource(), which through a ton of calls
    ends up calling vm_map_entry_create(), same problem as above.

4.  ti_attach() calls bus_setup_intr(), which through various calls ends
    up calling ithread_create(), which calls malloc() with M_WAITOK.

5.  ti_attach() calls bus_setup_intr(), which through various calls ends
    up calling ithread_create(), which calls kthread_create(), which calls
    fork1(), which calls uma_zalloc() with M_WAITOK.

6.  ti_attach() calls bus_setup_intr(), which through various calls ends
    up calling ithread_create(), which calls kthread_create(), which calls
    fork1(), which calls MALLOC() with M_WAITOK in various places.

7.  See the previous entry; fork1() calls fdcopy(), which calls MALLOC()
    with M_WAITOK.

8.  See entry 6; fork1() calls vm_forkproc(), which calls pmap_new_proc(),
    which calls vm_object_allocate(), which does a uma_zalloc() with
    M_WAITOK.

9.  See above; pmap_new_proc() calls kmem_alloc_nofault(), which calls
    vm_map_find(), which through several calls ends up calling
    vm_map_entry_create().

10. fork1() calls pmap_new_thread(), which calls vm_object_allocate(),
    which does a uma_zalloc() with M_WAITOK.

11. ti_attach() calls bus_setup_intr(), which ends up calling
    ithread_add_handler() through several layers of indirection.
    ithread_add_handler() calls malloc() with M_WAITOK.

12. ti_attach() calls contigmalloc() *with* M_NOWAIT, but contigmalloc1()
    calls vm_map_insert(), which calls vm_map_entry_create(), which calls
    uma_zalloc() with M_WAITOK.

13. ti_attach() calls jumbo_vm_init() (jumbo buffer initialization
    function), which calls kmem_alloc_pageable(). See number 1 above, same
    problem here with vm_map_entry_create().

14. jumbo_vm_init() calls malloc() *with* M_NOWAIT, but vm_map_insert()
    gets called, which calls vm_map_entry_create(), which calls
    uma_zalloc() with M_WAITOK.

15. Several more instances, the same as 14, but vm_map_entry_create() gets
    called through a slightly different path from the same root malloc()
    call in jumbo_vm_init().

16. ti_newbuf_std() calls MCLGET(), *with* M_DONTWAIT set, but m_clget()
    calls mb_alloc(), which calls mb_pop_cont(), which calls
    kmem_malloc(), which calls vm_map_insert(), which calls
    vm_map_entry_create(), which calls uma_zalloc() with M_WAITOK.

I could keep going almost indefinitely, but I'm getting kinda tired of
going through stack traces, and this is enough to talk about for the
moment.

There seem to be two general problems here:

 - the M_WAITOK call to uma_zalloc() in vm_map_entry_create() is the cause
   of the problems in entries 1, 2, 3, 12, 13, 14, 15 and 16

 - bus_setup_intr(), or rather the kthread code in general, apparently
   isn't safe to call while holding a mutex. This is the cause of the
   problems in entries 4, 5, 6, 7, 8, 9, 10, and 11.

Several of the interfaces, most notably malloc(), contigmalloc(), and
MCLGET(), offer "don't wait" interfaces, but the functions that they call
don't necessarily respect or know about those flags.

There are a lot more problems I ran into, some similar to the ones above.
This is enough to get started with.
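
The "don't wait" flag problem described above can be illustrated with a small userland model. This is only a sketch under stated assumptions: every toy_* function and TOY_* flag below is hypothetical, and none of it is the kernel's malloc(9), uma_zalloc(), or WITNESS code. It only shows how a caller that asks for a non-sleeping allocation can still reach a may-sleep allocation when an inner function hard-codes its own flag, and how passing the caller's flag down avoids the warning.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flags for this toy model; not the kernel's M_* values. */
enum { TOY_NOWAIT = 0x1, TOY_WAITOK = 0x2 };

static int locks_held;      /* number of toy "mutexes" currently held */
static int sleep_warnings;  /* may-sleep allocations seen with a lock held */

static void toy_lock(void)
{
    locks_held++;
}

static void toy_unlock(void)
{
    assert(locks_held > 0);
    locks_held--;
}

/*
 * Stand-in for a zone allocator plus a WITNESS-style check: a WAITOK
 * request may sleep, so it is flagged if any toy lock is held.
 */
static void toy_zalloc(int flags)
{
    if ((flags & TOY_WAITOK) != 0 && locks_held > 0) {
        sleep_warnings++;
        fprintf(stderr, "warning: possible sleep with %d lock(s) held\n",
            locks_held);
    }
}

/* Inner helper that ignores the caller's flags and always uses WAITOK. */
static void toy_entry_create_hardcoded(int caller_flags)
{
    (void)caller_flags;
    toy_zalloc(TOY_WAITOK);
}

/* Inner helper that passes the caller's flags through unchanged. */
static void toy_entry_create_propagating(int caller_flags)
{
    toy_zalloc(caller_flags);
}

/* Outer allocator: the caller's flags only matter if they are propagated. */
static void toy_malloc(int flags, int propagate)
{
    if (propagate)
        toy_entry_create_propagating(flags);
    else
        toy_entry_create_hardcoded(flags);
}

/*
 * Take a lock, request a NOWAIT allocation, and return how many
 * sleep warnings the request produced.
 */
int run_demo(int propagate)
{
    sleep_warnings = 0;
    toy_lock();
    toy_malloc(TOY_NOWAIT, propagate);
    toy_unlock();
    return sleep_warnings;
}
```

In this model, run_demo(0) takes the hard-coded path and returns 1 warning even though the caller passed TOY_NOWAIT, while run_demo(1) propagates the flag and returns 0. Whether propagating flags is actually the right fix for vm_map_entry_create() is a separate question that this sketch does not answer.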
If anyone wants to see the full console log, it is available at:

http://people.FreeBSD.org/~ken/zero_copy/session.log.20020609

There was one other problem I ran into that wasn't related to sleeping
while holding a mutex:

db> c
lock order reversal
 1st 0xe7920bc0 ti0 (network driver) @ /usr/home/ken/perforce/FreeBSD-zero/src/sys/pci/if_ti.c:2126
 2nd 0xc036c7c0 allproc (allproc) @ /usr/home/ken/perforce/FreeBSD-zero/src/sys/kern/kern_fork.c:309
Debugger("witness_lock")
Stopped at Debugger+0x46: xchgl %ebx,in_Debugger.0
db> trace
Debugger(c0318067) at Debugger+0x46
witness_lock(c036c7c0,8,c0311480,135,c04fcae8) at witness_lock+0x533
_sx_xlock(c036c7c0,c0311480,135,c01dc94b,c03cae10) at _sx_xlock+0x7d
fork1(c0367d00,60034,c04fcabc) at fork1+0x1a0
kthread_create(c01d59c0,d8364400,c04fcae8,60000,c0311841) at kthread_create+0x37
ithread_create(c04fcb1c,10,0,c02e197a,c02e192c,c03381fd,10) at ithread_create+0x96
inthand_add(e7929480,10,c027aec4,e791e000,4) at inthand_add+0x6e
nexus_setup_intr(e5dc4a00,e7938600,e78f37c0,4,c027aec4,e791e000,e791e150) at nexus_setup_intr+0x61
bus_generic_setup_intr(e5dc4600,e7938600,e78f37c0,4,c027aec4,e791e000,e791e150) at bus_generic_setup_intr+0x77
bus_generic_setup_intr(e7938700,e7938600,e78f37c0,4,c027aec4,e791e000,e791e150) at bus_generic_setup_intr+0x77
bus_setup_intr(e7938600,e78f37c0,4,c027aec4,e791e000) at bus_setup_intr+0x79
ti_attach(e7938600) at ti_attach+0x226
device_probe_and_attach(e7938600) at device_probe_and_attach+0x9c
bus_generic_attach(e7938700) at bus_generic_attach+0x14
device_probe_and_attach(e7938700) at device_probe_and_attach+0x9c
bus_generic_attach(e5dc4600,e5dc4600,c03398b8,2,e5dc4600) at bus_generic_attach+0x14
nexus_pcib_attach(e5dc4600) at nexus_pcib_attach+0x21
device_probe_and_attach(e5dc4600) at device_probe_and_attach+0x9c
bus_generic_attach(e5dc4a00,e5dc4a00,e788f090,e5dc4a00,c04fcd60) at bus_generic_attach+0x14
nexus_attach(e5dc4a00) at nexus_attach+0xf
device_probe_and_attach(e5dc4a00) at device_probe_and_attach+0x9c
root_bus_configure(e5dc4c80,c0331000,0) at root_bus_configure+0x16
configure(0,4f9c00,4f9000,0,c012a1cc) at configure+0x20
mi_startup() at mi_startup+0x93
begin() at begin+0x43
db> c

As I said above, any comments would be welcome!

Ken
--
Kenneth Merry
ken@kdm.org