Date:      Sun, 9 Jun 2002 22:40:37 -0600
From:      "Kenneth D. Merry" <ken@kdm.org>
To:        current@FreeBSD.org
Cc:        net@FreeBSD.org
Subject:   new zero copy sockets snapshot, WITNESS problems
Message-ID:  <20020609224036.A21143@panzer.kdm.org>

I have released a new zero copy sockets snapshot.  The code and a brief
update on what has been fixed are here:

http://people.FreeBSD.org/~ken/zero_copy

In short, I fixed the following things, which were found by Alfred
Perlstein:

 - fix a race in the vm object allocation in jumbo_vm_init()
 - use a sysinit to initialize the jumbo_mutex, since there is really no
   other way to avoid a race between checking the mutex to see if it has
   been initialized and actually initializing it.
 - use SLIST_FIRST instead of directly accessing the first element in
   the inuse list.
 - don't call malloc(9) with M_WAITOK while holding a mutex.
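
The sysinit fix removes a check-then-initialize race: two callers can both
see the mutex as uninitialized and both try to initialize it.  SYSINIT(9)
avoids that by running the initializer once at boot, before any consumer
exists.  The sketch below is only a userland analogue under that
assumption: POSIX pthread_once(3) stands in for SYSINIT, and
jumbo_mutex_init(), jumbo_lock() and jumbo_unlock() are hypothetical names,
not code from the snapshot.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Userland analogue (assumption): pthread_once() plays the role of
 * SYSINIT.  Callers never test "is the mutex initialized yet?"
 * themselves, so there is no window between the check and the init.
 */
static pthread_mutex_t jumbo_mutex;		/* hypothetical name */
static pthread_once_t jumbo_once = PTHREAD_ONCE_INIT;
static int jumbo_init_count;			/* how often init ran */

static void
jumbo_mutex_init(void)
{
	int error;

	error = pthread_mutex_init(&jumbo_mutex, NULL);
	assert(error == 0);
	(void)error;
	jumbo_init_count++;
}

/* Every user goes through here; the one-time init is guaranteed. */
static void
jumbo_lock(void)
{
	pthread_once(&jumbo_once, jumbo_mutex_init);
	pthread_mutex_lock(&jumbo_mutex);
}

static void
jumbo_unlock(void)
{
	pthread_mutex_unlock(&jumbo_mutex);
}
```

The racy pattern this replaces is "if (!initialized) { init(); initialized
= 1; }", where two callers can both pass the test before either one sets
the flag.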

Between the last snapshot and this one, jhb (or someone else, can't
remember who) turned on the WITNESS logging that flags when there is a
potential sleep while a mutex is held.
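
Conceptually, that check amounts to tracking how many non-sleepable locks
the current thread holds and complaining when an allocation that may sleep
happens while the count is nonzero.  The model below is a hypothetical
userland sketch of that idea only, not the WITNESS implementation; the
MODEL_* flags mimic M_NOWAIT/M_WAITOK, and model_lock(), model_alloc() and
sleep_check() are invented names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MODEL_NOWAIT	0x1	/* hypothetical, like M_NOWAIT */
#define MODEL_WAITOK	0x2	/* hypothetical, like M_WAITOK */

static int locks_held;		/* non-sleepable locks held by this thread */
static int sleep_warnings;	/* number of "could sleep" reports */

static void
model_lock(void)
{
	locks_held++;
}

static void
model_unlock(void)
{
	assert(locks_held > 0);
	locks_held--;
}

/* Report (but do not stop) when a call that may sleep runs under a lock. */
static void
sleep_check(const char *where)
{
	if (locks_held > 0) {
		sleep_warnings++;
		fprintf(stderr, "%s: could sleep with %d lock(s) held\n",
		    where, locks_held);
	}
}

static void *
model_alloc(size_t size, int flags)
{
	/* Only a waiting allocation is a potential sleep. */
	if (flags & MODEL_WAITOK)
		sleep_check("model_alloc");
	return (malloc(size));
}
```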

That has uncovered a whole slew of warnings in the zero copy code.  Some of
the warnings are present in the ti(4) driver without my patches; others are
only triggered by the zero copy patches.

Below is an abbreviated list of places where I found problems.

Most of these problems are in areas where I could use some help figuring
out the best course of action.  So any comments on how to get these things
fixed up (or better yet, code!) would be welcome!

1.  sf_buf_init() calls kmem_alloc_pageable(), which through several calls
    ends up calling vm_map_entry_create(). vm_map_entry_create() calls
    uma_zalloc() with M_WAITOK.

2.  sf_buf_init() calls malloc() *with* M_NOWAIT, but the VM code ends up
    calling vm_map_entry_create(), so you have the same problem as above.

3.  ti_attach() calls bus_alloc_resource(), which through a ton of calls
    ends up calling vm_map_entry_create(), same problem as above.

4.  ti_attach() calls bus_setup_intr(), which through various calls ends up
    calling ithread_create(), which calls malloc() with M_WAITOK.

5.  ti_attach() calls bus_setup_intr(), which through various calls ends up
    calling ithread_create(), which calls kthread_create(), which calls
    fork1(), which calls uma_zalloc() with M_WAITOK.

6.  ti_attach() calls bus_setup_intr(), which through various calls ends up
    calling ithread_create(), which calls kthread_create(), which calls
    fork1(), which calls MALLOC() with M_WAITOK in various places.

7.  see the previous entry; fork1() calls fdcopy(), which calls MALLOC()
    with M_WAITOK.

8.  see entry 6; fork1() calls vm_forkproc(), which calls pmap_new_proc(),
    which calls vm_object_allocate(), which calls uma_zalloc() with M_WAITOK.

9.  see above; pmap_new_proc() calls kmem_alloc_nofault(), which calls
    vm_map_find(), which through several intermediate calls ends up
    calling vm_map_entry_create().

10. fork1() calls pmap_new_thread(), which calls vm_object_allocate(),
    which does a uma_zalloc() with M_WAITOK.

11. ti_attach() calls bus_setup_intr(), which ends up calling
    ithread_add_handler() through several layers of indirection.
    ithread_add_handler() calls malloc() with M_WAITOK.

12. ti_attach() calls contigmalloc() *with* M_NOWAIT, but contigmalloc1()
    calls vm_map_insert(), which calls vm_map_entry_create(), which calls
    uma_zalloc() with M_WAITOK.

13. ti_attach() calls jumbo_vm_init() (jumbo buffer initialization
    function), which calls kmem_alloc_pageable().  See number 1 above, same
    problem here with vm_map_entry_create().

14. jumbo_vm_init() calls malloc() *with* M_NOWAIT, but vm_map_insert()
    gets called, which calls vm_map_entry_create(), which calls
    uma_zalloc() with M_WAITOK.

15. several more instances, the same as 14, but vm_map_entry_create() gets
    called through a slightly different path from the same root malloc()
    call in jumbo_vm_init().

16. ti_newbuf_std() calls MCLGET(), *with* M_DONTWAIT set, but m_clget()
    calls mb_alloc(), which calls mb_pop_cont(), which calls kmem_malloc(),
    which calls vm_map_insert(), which calls vm_map_entry_create(), which
    calls uma_zalloc() with M_WAITOK.

I could keep going almost indefinitely, but I'm getting kinda tired of
going through stack traces, and this is enough to talk about for the
moment.

There seem to be two general problems here:

 - the M_WAITOK call to uma_zalloc() in vm_map_entry_create() is the cause
   of the problems in entries 1, 2, 3, 12, 13, 14, 15 and 16.

 - bus_setup_intr(), or rather the kthread code in general, apparently
   isn't safe to call while holding a mutex.  This is the cause of the
   problems in entries 4, 5, 6, 7, 8, 9, 10, and 11.
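
One common way to restructure around the second problem is ordering: do
anything that may sleep (such as bus_setup_intr() and the kthread creation
behind it) before taking the driver mutex, and hold the mutex only for
short, non-sleeping updates.  The model below is a hypothetical userland
sketch of that ordering, not ti(4) code: pthread mutexes stand in for
mtx(9), malloc() stands in for the sleeping setup, and struct softc's
fields, setup_may_sleep() and model_attach() are invented names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct softc {				/* hypothetical driver state */
	pthread_mutex_t	 mtx;
	bool		 mtx_held;	/* model bookkeeping only */
	void		*intr_cookie;	/* result of the sleeping setup */
	bool		 attached;
};

/* Stand-in for setup that may sleep, e.g. creating an interrupt thread. */
static void *
setup_may_sleep(struct softc *sc)
{
	assert(!sc->mtx_held);	/* the rule: never sleep with the mutex held */
	return (malloc(64));
}

static int
model_attach(struct softc *sc)
{
	void *cookie;

	if (pthread_mutex_init(&sc->mtx, NULL) != 0)
		return (-1);

	/* Step 1: everything that may sleep, with no locks held. */
	cookie = setup_may_sleep(sc);
	if (cookie == NULL)
		return (-1);

	/* Step 2: take the mutex only to publish the finished state. */
	pthread_mutex_lock(&sc->mtx);
	sc->mtx_held = true;
	sc->intr_cookie = cookie;
	sc->attached = true;
	sc->mtx_held = false;
	pthread_mutex_unlock(&sc->mtx);
	return (0);
}
```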

Several of the interfaces, most notably malloc(), contigmalloc(), and
MCLGET(), offer "don't wait" interfaces, but the functions that they call
don't necessarily respect or know about those flags.
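
Put another way, a no-wait flag given at the top of a call chain stops
propagating partway down, and a lower layer hardcodes a waiting
allocation.  The sketch below shows the alternative of threading the
caller's flag through every layer so that a no-wait request fails instead
of sleeping.  It is a hypothetical userland model: entry_create(),
map_insert() and alloc_nowait() are invented names only loosely modelled
on vm_map_entry_create() and vm_map_insert(), the MODEL_* flags mimic
M_NOWAIT/M_WAITOK, and a counter stands in for a preallocated pool.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NOWAIT	0x1	/* hypothetical, like M_NOWAIT */
#define MODEL_WAITOK	0x2	/* hypothetical, like M_WAITOK */

static int pool_free = 1;	/* entries available without sleeping */
static int sleeps;		/* times a caller had to wait */

/* Lowest layer: honours the flag it is given instead of hardcoding one. */
static void *
entry_create(int flags)
{
	assert((flags & (MODEL_NOWAIT | MODEL_WAITOK)) != 0);
	if (pool_free == 0) {
		if (flags & MODEL_NOWAIT)
			return (NULL);	/* fail rather than sleep */
		sleeps++;		/* stand-in for waiting for memory */
		pool_free++;
	}
	pool_free--;
	return (malloc(32));
}

/* Middle layer: passes the caller's flag down unchanged. */
static void *
map_insert(int flags)
{
	return (entry_create(flags));
}

/* Top-level "don't wait" interface: the flag now reaches the bottom. */
static void *
alloc_nowait(void)
{
	return (map_insert(MODEL_NOWAIT));
}
```

The trade-off is that every intermediate layer, and every caller of it,
has to be prepared for a NULL return on the no-wait path.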

There are a lot more problems I ran into, some similar to the ones above.

This is enough to get started with.  If anyone wants to see the full
console log, it is available at:

http://people.FreeBSD.org/~ken/zero_copy/session.log.20020609

There was one other problem I ran into that wasn't related to sleeping
while holding a mutex:

db> c
lock order reversal
 1st 0xe7920bc0 ti0 (network driver) @ /usr/home/ken/perforce/FreeBSD-zero/src/sys/pci/if_ti.c:2126
 2nd 0xc036c7c0 allproc (allproc) @ /usr/home/ken/perforce/FreeBSD-zero/src/sys/kern/kern_fork.c:309
Debugger("witness_lock")
Stopped at      Debugger+0x46:  xchgl   %ebx,in_Debugger.0
db> trace
Debugger(c0318067) at Debugger+0x46
witness_lock(c036c7c0,8,c0311480,135,c04fcae8) at witness_lock+0x533
_sx_xlock(c036c7c0,c0311480,135,c01dc94b,c03cae10) at _sx_xlock+0x7d
fork1(c0367d00,60034,c04fcabc) at fork1+0x1a0
kthread_create(c01d59c0,d8364400,c04fcae8,60000,c0311841) at kthread_create+0x37
ithread_create(c04fcb1c,10,0,c02e197a,c02e192c,c03381fd,10) at ithread_create+0x96
inthand_add(e7929480,10,c027aec4,e791e000,4) at inthand_add+0x6e
nexus_setup_intr(e5dc4a00,e7938600,e78f37c0,4,c027aec4,e791e000,e791e150) at nexus_setup_intr+0x61
bus_generic_setup_intr(e5dc4600,e7938600,e78f37c0,4,c027aec4,e791e000,e791e150) at bus_generic_setup_intr+0x77
bus_generic_setup_intr(e7938700,e7938600,e78f37c0,4,c027aec4,e791e000,e791e150) at bus_generic_setup_intr+0x77
bus_setup_intr(e7938600,e78f37c0,4,c027aec4,e791e000) at bus_setup_intr+0x79
ti_attach(e7938600) at ti_attach+0x226
device_probe_and_attach(e7938600) at device_probe_and_attach+0x9c
bus_generic_attach(e7938700) at bus_generic_attach+0x14
device_probe_and_attach(e7938700) at device_probe_and_attach+0x9c
bus_generic_attach(e5dc4600,e5dc4600,c03398b8,2,e5dc4600) at bus_generic_attach+0x14
nexus_pcib_attach(e5dc4600) at nexus_pcib_attach+0x21
device_probe_and_attach(e5dc4600) at device_probe_and_attach+0x9c
bus_generic_attach(e5dc4a00,e5dc4a00,e788f090,e5dc4a00,c04fcd60) at bus_generic_attach+0x14
nexus_attach(e5dc4a00) at nexus_attach+0xf
device_probe_and_attach(e5dc4a00) at device_probe_and_attach+0x9c
root_bus_configure(e5dc4c80,c0331000,0) at root_bus_configure+0x16
configure(0,4f9c00,4f9000,0,c012a1cc) at configure+0x20
mi_startup() at mi_startup+0x93
begin() at begin+0x43
db> c
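
One way WITNESS arrives at a "lock order reversal" report is seeing two
locks acquired in an order that contradicts one it has already recorded.
The sketch below models only the direct-pair case of that idea; real
WITNESS tracks lock classes and indirect orderings as well, and this does
not claim to explain why ti0 and allproc were flagged here.  The lock ids,
model_acquire() and model_release() are hypothetical, and the earlier
allproc-then-ti0 ordering used to trigger the report is invented for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum { LK_TI0, LK_ALLPROC, NLOCKS };	/* hypothetical lock ids */

static const char *lock_name[NLOCKS] = { "ti0", "allproc" };
static bool order[NLOCKS][NLOCKS];	/* order[a][b]: b taken while a held */
static int held[NLOCKS];		/* held locks, in acquisition order */
static int nheld;
static int reversals;

static void
model_acquire(int lk)
{
	for (int i = 0; i < nheld; i++) {
		int prev = held[i];

		if (order[lk][prev]) {
			/* The opposite order was seen before: report it. */
			reversals++;
			printf("lock order reversal\n 1st %s\n 2nd %s\n",
			    lock_name[prev], lock_name[lk]);
		} else {
			order[prev][lk] = true;
		}
	}
	assert(nheld < NLOCKS);
	held[nheld++] = lk;
}

static void
model_release(int lk)
{
	/* Model restriction: locks are released in LIFO order. */
	assert(nheld > 0 && held[nheld - 1] == lk);
	nheld--;
}
```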


As I said above, any comments would be welcome!

Ken
-- 
Kenneth Merry
ken@kdm.org
