From owner-svn-src-all@FreeBSD.ORG Fri Jun 11 03:13:21 2010 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B3A43106564A; Fri, 11 Jun 2010 03:13:21 +0000 (UTC) (envelope-from rrs@FreeBSD.org) Received: from svn.freebsd.org (unknown [IPv6:2001:4f8:fff6::2c]) by mx1.freebsd.org (Postfix) with ESMTP id A16B98FC0A; Fri, 11 Jun 2010 03:13:21 +0000 (UTC) Received: from svn.freebsd.org (localhost [127.0.0.1]) by svn.freebsd.org (8.14.3/8.14.3) with ESMTP id o5B3DKrn048456; Fri, 11 Jun 2010 03:13:20 GMT (envelope-from rrs@svn.freebsd.org) Received: (from rrs@localhost) by svn.freebsd.org (8.14.3/8.14.3/Submit) id o5B3DKhQ048446; Fri, 11 Jun 2010 03:13:20 GMT (envelope-from rrs@svn.freebsd.org) Message-Id: <201006110313.o5B3DKhQ048446@svn.freebsd.org> From: Randall Stewart Date: Fri, 11 Jun 2010 03:13:20 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-8@freebsd.org X-SVN-Group: stable-8 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cc: Subject: svn commit: r209028 - stable/8/sys/netinet X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 11 Jun 2010 03:13:21 -0000 Author: rrs Date: Fri Jun 11 03:13:19 2010 New Revision: 209028 URL: http://svn.freebsd.org/changeset/base/209028 Log: MFC: Fix a number of bugs and race conditions. r208160: Bring back of the iterator thread. It now properly handles VNETS having only one thread. The old timer based code was full of LOR's and other issues. r208852: Cleanup bug. Basically when an un-accepted socket was hanging on a closed listener, we would leak the inp never cleaning it up r208853: Enhance the use under invarients of the audit for locks function and fix a bug where a close collision with a cookie being processed would cause a crash. r208854: Use the proper increment macros when working with the sent_queue_retran_cnt r208855: Align comments properly, Fix a bug where we were NOT looking at the resend markings for control chunks and also not decrementing the retran count which caused extra calls to retransmission. Alos add a valid no locks call to the output routine. r208856: Spacing issues in auth/bsd addr. r208857: Get rid of a windows ifdef that somehow leaked in r208863: Missing error leg returns in some failure cases r208864: LOR fix between the iterator and sctp_inpcb_close r208874: Don't call the sctp_inpcb_free from abort an association since you don't know what locks you hold and a timer will take care of the situation when the gone flag is set r208875: sctp_inpcb_free bug - a socket under the right situation could get stuck (from the accept queue) and never start the proper cleanup timer) r208876: Further enhance invariant lock validation, Fix a bug where a closed socket and a INIT-ACK could collide and cause a crash r208878: Clear up another bug in sctp_inpcb_free where we would end up due to a race in freeing hit a destroy of a contended lock. r208879: Optimize the cleanup and make some additional fixes in the sysctl code so that it won't reference a GONE INP and crash us r208883 & r208891: Fix so we don't open a hole between a sock lock and a call to socantrcvmore.. we could before hit a race that would kill the socket underneath us leading to a crash r208897: CUM-ACK calculation was messed up. So basically large message got broken from the original NR_sack integration. r208902: Make sure that we don't move a bit to the NR array that is behind the cum-ack r208952: Use both bit maps to calculte the cum-ack. r208953: Fix bug having to do with freeing an sctp_inpcb_free(). 1) make sure not to remove the flag until you get the lock again. 2) make sure all log_closing calls hold the lock. 3) Release all the locks when everthing is done and call callout_drain not callout_stop.. r208970: Fix some places on user allocation of a new sctp_inpcb where we run out of resource that we make sure to NULL the so_pcb pointer. Approved by: re - (bz@freebsd.org) Modified: stable/8/sys/netinet/sctp_auth.c stable/8/sys/netinet/sctp_bsd_addr.c stable/8/sys/netinet/sctp_bsd_addr.h stable/8/sys/netinet/sctp_constants.h stable/8/sys/netinet/sctp_indata.c stable/8/sys/netinet/sctp_input.c stable/8/sys/netinet/sctp_lock_bsd.h stable/8/sys/netinet/sctp_output.c stable/8/sys/netinet/sctp_pcb.c stable/8/sys/netinet/sctp_pcb.h stable/8/sys/netinet/sctp_structs.h stable/8/sys/netinet/sctp_sysctl.c stable/8/sys/netinet/sctp_timer.c stable/8/sys/netinet/sctp_usrreq.c stable/8/sys/netinet/sctputil.c Directory Properties: stable/8/sys/ (props changed) stable/8/sys/amd64/include/xen/ (props changed) stable/8/sys/cddl/contrib/opensolaris/ (props changed) stable/8/sys/contrib/dev/acpica/ (props changed) stable/8/sys/contrib/pf/ (props changed) stable/8/sys/dev/xen/xenpci/ (props changed) stable/8/sys/geom/sched/ (props changed) Modified: stable/8/sys/netinet/sctp_auth.c ============================================================================== --- stable/8/sys/netinet/sctp_auth.c Fri Jun 11 03:00:48 2010 (r209027) +++ stable/8/sys/netinet/sctp_auth.c Fri Jun 11 03:13:19 2010 (r209028) @@ -895,9 +895,9 @@ static inline int sctp_get_hmac_block_len(uint16_t hmac_algo) { switch (hmac_algo) { - case SCTP_AUTH_HMAC_ID_SHA1: + case SCTP_AUTH_HMAC_ID_SHA1: #ifdef HAVE_SHA224 - case SCTP_AUTH_HMAC_ID_SHA224: + case SCTP_AUTH_HMAC_ID_SHA224: #endif return (64); #ifdef HAVE_SHA2 @@ -918,7 +918,7 @@ static void sctp_hmac_init(uint16_t hmac_algo, sctp_hash_context_t * ctx) { switch (hmac_algo) { - case SCTP_AUTH_HMAC_ID_SHA1: + case SCTP_AUTH_HMAC_ID_SHA1: SHA1_Init(&ctx->sha1); break; #ifdef HAVE_SHA224 @@ -948,7 +948,7 @@ sctp_hmac_update(uint16_t hmac_algo, sct uint8_t * text, uint32_t textlen) { switch (hmac_algo) { - case SCTP_AUTH_HMAC_ID_SHA1: + case SCTP_AUTH_HMAC_ID_SHA1: SHA1_Update(&ctx->sha1, text, textlen); break; #ifdef HAVE_SHA224 @@ -978,7 +978,7 @@ sctp_hmac_final(uint16_t hmac_algo, sctp uint8_t * digest) { switch (hmac_algo) { - case SCTP_AUTH_HMAC_ID_SHA1: + case SCTP_AUTH_HMAC_ID_SHA1: SHA1_Final(digest, &ctx->sha1); break; #ifdef HAVE_SHA224 Modified: stable/8/sys/netinet/sctp_bsd_addr.c ============================================================================== --- stable/8/sys/netinet/sctp_bsd_addr.c Fri Jun 11 03:00:48 2010 (r209027) +++ stable/8/sys/netinet/sctp_bsd_addr.c Fri Jun 11 03:13:19 2010 (r209028) @@ -49,16 +49,6 @@ __FBSDID("$FreeBSD$"); #include /* Declare all of our malloc named types */ - -/* Note to Michael/Peter for mac-os, - * I think mac has this too since I - * do see the M_PCB type, so I - * will also put in the mac file the - * MALLOC_DECLARE. If this does not - * work for mac uncomment the defines for - * the strings that we use in Panda, I put - * them in comments in the mac-os file. - */ MALLOC_DEFINE(SCTP_M_MAP, "sctp_map", "sctp asoc map descriptor"); MALLOC_DEFINE(SCTP_M_STRMI, "sctp_stri", "sctp stream in array"); MALLOC_DEFINE(SCTP_M_STRMO, "sctp_stro", "sctp stream out array"); @@ -79,47 +69,77 @@ MALLOC_DEFINE(SCTP_M_MVRF, "sctp_mvrf", MALLOC_DEFINE(SCTP_M_ITER, "sctp_iter", "sctp iterator control"); MALLOC_DEFINE(SCTP_M_SOCKOPT, "sctp_socko", "sctp socket option"); -#if defined(SCTP_USE_THREAD_BASED_ITERATOR) +/* Global NON-VNET structure that controls the iterator */ +struct iterator_control sctp_it_ctl; +static int __sctp_thread_based_iterator_started = 0; + + +static void +sctp_cleanup_itqueue(void) +{ + struct sctp_iterator *it; + + while ((it = TAILQ_FIRST(&sctp_it_ctl.iteratorhead)) != NULL) { + if (it->function_atend != NULL) { + (*it->function_atend) (it->pointer, it->val); + } + TAILQ_REMOVE(&sctp_it_ctl.iteratorhead, it, sctp_nxt_itr); + SCTP_FREE(it, SCTP_M_ITER); + } +} + + void sctp_wakeup_iterator(void) { - wakeup(&SCTP_BASE_INFO(iterator_running)); + wakeup(&sctp_it_ctl.iterator_running); } static void sctp_iterator_thread(void *v) { - CURVNET_SET((struct vnet *)v); SCTP_IPI_ITERATOR_WQ_LOCK(); - SCTP_BASE_INFO(iterator_running) = 0; while (1) { - msleep(&SCTP_BASE_INFO(iterator_running), - &SCTP_BASE_INFO(ipi_iterator_wq_mtx), + msleep(&sctp_it_ctl.iterator_running, + &sctp_it_ctl.ipi_iterator_wq_mtx, 0, "waiting_for_work", 0); - if (SCTP_BASE_INFO(threads_must_exit)) { + if (sctp_it_ctl.iterator_flags & SCTP_ITERATOR_MUST_EXIT) { SCTP_IPI_ITERATOR_WQ_DESTROY(); + SCTP_ITERATOR_LOCK_DESTROY(); + sctp_cleanup_itqueue(); + __sctp_thread_based_iterator_started = 0; kthread_exit(); } sctp_iterator_worker(); } - CURVNET_RESTORE(); } void sctp_startup_iterator(void) { + if (__sctp_thread_based_iterator_started) { + /* You only get one */ + return; + } + /* init the iterator head */ + __sctp_thread_based_iterator_started = 1; + sctp_it_ctl.iterator_running = 0; + sctp_it_ctl.iterator_flags = 0; + sctp_it_ctl.cur_it = NULL; + SCTP_ITERATOR_LOCK_INIT(); + SCTP_IPI_ITERATOR_WQ_INIT(); + TAILQ_INIT(&sctp_it_ctl.iteratorhead); + int ret; ret = kproc_create(sctp_iterator_thread, - (void *)curvnet, - &SCTP_BASE_INFO(thread_proc), + (void *)NULL, + &sctp_it_ctl.thread_proc, RFPROC, SCTP_KTHREAD_PAGES, SCTP_KTRHEAD_NAME); } -#endif - #ifdef INET6 void Modified: stable/8/sys/netinet/sctp_bsd_addr.h ============================================================================== --- stable/8/sys/netinet/sctp_bsd_addr.h Fri Jun 11 03:00:48 2010 (r209027) +++ stable/8/sys/netinet/sctp_bsd_addr.h Fri Jun 11 03:13:19 2010 (r209028) @@ -37,12 +37,11 @@ __FBSDID("$FreeBSD$"); #if defined(_KERNEL) || defined(__Userspace__) -#if defined(SCTP_USE_THREAD_BASED_ITERATOR) +extern struct iterator_control sctp_it_ctl; void sctp_wakeup_iterator(void); void sctp_startup_iterator(void); -#endif #ifdef INET6 void sctp_gather_internal_ifa_flags(struct sctp_ifa *ifa); Modified: stable/8/sys/netinet/sctp_constants.h ============================================================================== --- stable/8/sys/netinet/sctp_constants.h Fri Jun 11 03:00:48 2010 (r209027) +++ stable/8/sys/netinet/sctp_constants.h Fri Jun 11 03:13:19 2010 (r209028) @@ -87,10 +87,6 @@ __FBSDID("$FreeBSD$"); /* #define SCTP_AUDITING_ENABLED 1 used for debug/auditing */ #define SCTP_AUDIT_SIZE 256 -/* temporary disabled since it does not work with VNET. */ -#if 0 -#define SCTP_USE_THREAD_BASED_ITERATOR 1 -#endif #define SCTP_KTRHEAD_NAME "sctp_iterator" #define SCTP_KTHREAD_PAGES 0 @@ -572,7 +568,6 @@ __FBSDID("$FreeBSD$"); #define SCTP_TIMER_TYPE_EVENTWAKE 13 #define SCTP_TIMER_TYPE_STRRESET 14 #define SCTP_TIMER_TYPE_INPKILL 15 -#define SCTP_TIMER_TYPE_ITERATOR 16 #define SCTP_TIMER_TYPE_EARLYFR 17 #define SCTP_TIMER_TYPE_ASOCKILL 18 #define SCTP_TIMER_TYPE_ADDR_WQ 19 @@ -899,7 +894,7 @@ __FBSDID("$FreeBSD$"); /* third argument */ #define SCTP_CALLED_DIRECTLY_NOCMPSET 0 #define SCTP_CALLED_AFTER_CMPSET_OFCLOSE 1 - +#define SCTP_CALLED_FROM_INPKILL_TIMER 2 /* second argument */ #define SCTP_FREE_SHOULD_USE_ABORT 1 #define SCTP_FREE_SHOULD_USE_GRACEFUL_CLOSE 0 Modified: stable/8/sys/netinet/sctp_indata.c ============================================================================== --- stable/8/sys/netinet/sctp_indata.c Fri Jun 11 03:00:48 2010 (r209027) +++ stable/8/sys/netinet/sctp_indata.c Fri Jun 11 03:13:19 2010 (r209028) @@ -289,12 +289,20 @@ sctp_build_ctl_cchunk(struct sctp_inpcb static void sctp_mark_non_revokable(struct sctp_association *asoc, uint32_t tsn) { - uint32_t gap, i; + uint32_t gap, i, cumackp1; int fnd = 0; if (SCTP_BASE_SYSCTL(sctp_do_drain) == 0) { return; } + cumackp1 = asoc->cumulative_tsn + 1; + if (compare_with_wrap(cumackp1, tsn, MAX_TSN)) { + /* + * this tsn is behind the cum ack and thus we don't need to + * worry about it being moved from one to the other. + */ + return; + } SCTP_CALC_TSN_TO_GAP(gap, tsn, asoc->mapping_array_base_tsn); if (!SCTP_IS_TSN_PRESENT(asoc->mapping_array, gap)) { printf("gap:%x tsn:%x\n", gap, tsn); @@ -2245,15 +2253,19 @@ sctp_slide_mapping_arrays(struct sctp_tc /* * Now we also need to check the mapping array in a couple of ways. * 1) Did we move the cum-ack point? + * + * When you first glance at this you might think that all entries that + * make up the postion of the cum-ack would be in the nr-mapping + * array only.. i.e. things up to the cum-ack are always + * deliverable. Thats true with one exception, when its a fragmented + * message we may not deliver the data until some threshold (or all + * of it) is in place. So we must OR the nr_mapping_array and + * mapping_array to get a true picture of the cum-ack. */ struct sctp_association *asoc; int at; + uint8_t val; int slide_from, slide_end, lgap, distance; - - /* EY nr_mapping array variables */ - /* int nr_at; */ - /* int nr_last_all_ones = 0; */ - /* int nr_slide_from, nr_slide_end, nr_lgap, nr_distance; */ uint32_t old_cumack, old_base, old_highest, highest_tsn; asoc = &stcb->asoc; @@ -2268,11 +2280,12 @@ sctp_slide_mapping_arrays(struct sctp_tc */ at = 0; for (slide_from = 0; slide_from < stcb->asoc.mapping_array_size; slide_from++) { - if (asoc->nr_mapping_array[slide_from] == 0xff) { + val = asoc->nr_mapping_array[slide_from] | asoc->mapping_array[slide_from]; + if (val == 0xff) { at += 8; } else { /* there is a 0 bit */ - at += sctp_map_lookup_tab[asoc->nr_mapping_array[slide_from]]; + at += sctp_map_lookup_tab[val]; break; } } @@ -3849,7 +3862,8 @@ sctp_window_probe_recovery(struct sctp_t sctp_total_flight_decrease(stcb, tp1); /* Now mark for resend */ tp1->sent = SCTP_DATAGRAM_RESEND; - asoc->sent_queue_retran_cnt++; + sctp_ucount_incr(asoc->sent_queue_retran_cnt); + if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_FLIGHT_LOGGING_ENABLE) { sctp_misc_ints(SCTP_FLIGHT_LOG_DOWN_WP, tp1->whoTo->flight_size, @@ -4262,7 +4276,7 @@ again: sctp_flight_size_increase(tp1); sctp_total_flight_increase(stcb, tp1); } else if (tp1->sent == SCTP_DATAGRAM_RESEND) { - asoc->sent_queue_retran_cnt++; + sctp_ucount_incr(asoc->sent_queue_retran_cnt); } } } @@ -5263,7 +5277,7 @@ again: sctp_flight_size_increase(tp1); sctp_total_flight_increase(stcb, tp1); } else if (tp1->sent == SCTP_DATAGRAM_RESEND) { - asoc->sent_queue_retran_cnt++; + sctp_ucount_incr(asoc->sent_queue_retran_cnt); } } } Modified: stable/8/sys/netinet/sctp_input.c ============================================================================== --- stable/8/sys/netinet/sctp_input.c Fri Jun 11 03:00:48 2010 (r209027) +++ stable/8/sys/netinet/sctp_input.c Fri Jun 11 03:13:19 2010 (r209028) @@ -3067,7 +3067,7 @@ process_chunk_drop(struct sctp_tcb *stcb struct sctp_nets *net, uint8_t flg) { switch (desc->chunk_type) { - case SCTP_DATA: + case SCTP_DATA: /* find the tsn to resend (possibly */ { uint32_t tsn; @@ -4534,7 +4534,8 @@ process_control_chunks: if ((stcb) && (stcb->asoc.total_output_queue_size)) { ; } else { - if (locked_tcb) { + if (locked_tcb != stcb) { + /* Very unlikely */ SCTP_TCB_UNLOCK(locked_tcb); } *offset = length; @@ -4861,6 +4862,10 @@ process_control_chunks: } else { if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) { /* We are not interested anymore */ + abend: + if (stcb) { + SCTP_TCB_UNLOCK(stcb); + } *offset = length; return (NULL); } @@ -4908,6 +4913,11 @@ process_control_chunks: if (linp) { SCTP_ASOC_CREATE_LOCK(linp); + if ((inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE) || + (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE)) { + SCTP_ASOC_CREATE_UNLOCK(linp); + goto abend; + } } if (netp) { ret_buf = @@ -5408,16 +5418,25 @@ sctp_process_ecn_marked_b(struct sctp_tc } #ifdef INVARIANTS -static void -sctp_validate_no_locks(struct sctp_inpcb *inp) +#ifdef __GNUC__ +__attribute__((noinline)) +#endif + void + sctp_validate_no_locks(struct sctp_inpcb *inp) { - struct sctp_tcb *stcb; + struct sctp_tcb *lstcb; - LIST_FOREACH(stcb, &inp->sctp_asoc_list, sctp_tcblist) { - if (mtx_owned(&stcb->tcb_mtx)) { + LIST_FOREACH(lstcb, &inp->sctp_asoc_list, sctp_tcblist) { + if (mtx_owned(&lstcb->tcb_mtx)) { panic("Own lock on stcb at return from input"); } } + if (mtx_owned(&inp->inp_create_mtx)) { + panic("Own create lock on inp"); + } + if (mtx_owned(&inp->inp_mtx)) { + panic("Own inp lock on inp"); + } } #endif Modified: stable/8/sys/netinet/sctp_lock_bsd.h ============================================================================== --- stable/8/sys/netinet/sctp_lock_bsd.h Fri Jun 11 03:00:48 2010 (r209027) +++ stable/8/sys/netinet/sctp_lock_bsd.h Fri Jun 11 03:13:19 2010 (r209028) @@ -107,42 +107,36 @@ extern int sctp_logoff_stuff; #define SCTP_INP_INFO_WUNLOCK() rw_wunlock(&SCTP_BASE_INFO(ipi_ep_mtx)) -#define SCTP_IPI_ADDR_INIT() \ +#define SCTP_IPI_ADDR_INIT() \ rw_init(&SCTP_BASE_INFO(ipi_addr_mtx), "sctp-addr") - #define SCTP_IPI_ADDR_DESTROY() do { \ if(rw_wowned(&SCTP_BASE_INFO(ipi_addr_mtx))) { \ rw_wunlock(&SCTP_BASE_INFO(ipi_addr_mtx)); \ } \ rw_destroy(&SCTP_BASE_INFO(ipi_addr_mtx)); \ } while (0) - - - #define SCTP_IPI_ADDR_RLOCK() do { \ rw_rlock(&SCTP_BASE_INFO(ipi_addr_mtx)); \ } while (0) - #define SCTP_IPI_ADDR_WLOCK() do { \ rw_wlock(&SCTP_BASE_INFO(ipi_addr_mtx)); \ } while (0) - #define SCTP_IPI_ADDR_RUNLOCK() rw_runlock(&SCTP_BASE_INFO(ipi_addr_mtx)) #define SCTP_IPI_ADDR_WUNLOCK() rw_wunlock(&SCTP_BASE_INFO(ipi_addr_mtx)) #define SCTP_IPI_ITERATOR_WQ_INIT() \ - mtx_init(&SCTP_BASE_INFO(ipi_iterator_wq_mtx), "sctp-it-wq", "sctp_it_wq", MTX_DEF) + mtx_init(&sctp_it_ctl.ipi_iterator_wq_mtx, "sctp-it-wq", "sctp_it_wq", MTX_DEF) #define SCTP_IPI_ITERATOR_WQ_DESTROY() \ - mtx_destroy(&SCTP_BASE_INFO(ipi_iterator_wq_mtx)) + mtx_destroy(&sctp_it_ctl.ipi_iterator_wq_mtx) #define SCTP_IPI_ITERATOR_WQ_LOCK() do { \ - mtx_lock(&SCTP_BASE_INFO(ipi_iterator_wq_mtx)); \ + mtx_lock(&sctp_it_ctl.ipi_iterator_wq_mtx); \ } while (0) -#define SCTP_IPI_ITERATOR_WQ_UNLOCK() mtx_unlock(&SCTP_BASE_INFO(ipi_iterator_wq_mtx)) +#define SCTP_IPI_ITERATOR_WQ_UNLOCK() mtx_unlock(&sctp_it_ctl.ipi_iterator_wq_mtx) #define SCTP_IP_PKTLOG_INIT() \ @@ -191,6 +185,13 @@ extern int sctp_logoff_stuff; #define SCTP_INP_LOCK_DESTROY(_inp) \ mtx_destroy(&(_inp)->inp_mtx) +#define SCTP_INP_LOCK_CONTENDED(_inp) ((_inp)->inp_mtx.mtx_lock & MTX_CONTESTED) + +#define SCTP_INP_READ_CONTENDED(_inp) ((_inp)->inp_rdata_mtx.mtx_lock & MTX_CONTESTED) + +#define SCTP_ASOC_CREATE_LOCK_CONTENDED(_inp) ((_inp)->inp_create_mtx.mtx_lock & MTX_CONTESTED) + + #define SCTP_ASOC_CREATE_LOCK_DESTROY(_inp) \ mtx_destroy(&(_inp)->inp_create_mtx) @@ -300,25 +301,45 @@ extern int sctp_logoff_stuff; #endif #define SCTP_ITERATOR_LOCK_INIT() \ - mtx_init(&SCTP_BASE_INFO(it_mtx), "sctp-it", "iterator", MTX_DEF) + mtx_init(&sctp_it_ctl.it_mtx, "sctp-it", "iterator", MTX_DEF) #ifdef INVARIANTS #define SCTP_ITERATOR_LOCK() \ do { \ - if (mtx_owned(&SCTP_BASE_INFO(it_mtx))) \ + if (mtx_owned(&sctp_it_ctl.it_mtx)) \ panic("Iterator Lock"); \ - mtx_lock(&SCTP_BASE_INFO(it_mtx)); \ + mtx_lock(&sctp_it_ctl.it_mtx); \ } while (0) #else #define SCTP_ITERATOR_LOCK() \ do { \ - mtx_lock(&SCTP_BASE_INFO(it_mtx)); \ + mtx_lock(&sctp_it_ctl.it_mtx); \ } while (0) #endif -#define SCTP_ITERATOR_UNLOCK() mtx_unlock(&SCTP_BASE_INFO(it_mtx)) -#define SCTP_ITERATOR_LOCK_DESTROY() mtx_destroy(&SCTP_BASE_INFO(it_mtx)) +#define SCTP_ITERATOR_UNLOCK() mtx_unlock(&sctp_it_ctl.it_mtx) +#define SCTP_ITERATOR_LOCK_DESTROY() mtx_destroy(&sctp_it_ctl.it_mtx) + + +#define SCTP_WQ_ADDR_INIT() do { \ + mtx_init(&SCTP_BASE_INFO(wq_addr_mtx), "sctp-addr-wq","sctp_addr_wq",MTX_DEF); \ + } while (0) + +#define SCTP_WQ_ADDR_DESTROY() do { \ + if(mtx_owned(&SCTP_BASE_INFO(wq_addr_mtx))) { \ + mtx_unlock(&SCTP_BASE_INFO(wq_addr_mtx)); \ + } \ + mtx_destroy(&SCTP_BASE_INFO(wq_addr_mtx)); \ + } while (0) + +#define SCTP_WQ_ADDR_LOCK() do { \ + mtx_lock(&SCTP_BASE_INFO(wq_addr_mtx)); \ +} while (0) +#define SCTP_WQ_ADDR_UNLOCK() do { \ + mtx_unlock(&SCTP_BASE_INFO(wq_addr_mtx)); \ +} while (0) + #define SCTP_INCR_EP_COUNT() \ Modified: stable/8/sys/netinet/sctp_output.c ============================================================================== --- stable/8/sys/netinet/sctp_output.c Fri Jun 11 03:00:48 2010 (r209027) +++ stable/8/sys/netinet/sctp_output.c Fri Jun 11 03:13:19 2010 (r209028) @@ -3053,32 +3053,32 @@ sctp_source_address_selection(struct sct * it out * zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz * For V4 - *------------------------------------------ + * ------------------------------------------ * source * dest * result * ----------------------------------------- * Private * Global * NAT * ----------------------------------------- * Private * Private * No problem * ----------------------------------------- - * Global * Private * Huh, How will this work? + * Global * Private * Huh, How will this work? * ----------------------------------------- - * Global * Global * No Problem - *------------------------------------------ + * Global * Global * No Problem + *------------------------------------------ * zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz * For V6 - *------------------------------------------ + *------------------------------------------ * source * dest * result * ----------------------------------------- * Linklocal * Global * * ----------------------------------------- * Linklocal * Linklocal * No problem * ----------------------------------------- - * Global * Linklocal * Huh, How will this work? + * Global * Linklocal * Huh, How will this work? * ----------------------------------------- - * Global * Global * No Problem - *------------------------------------------ + * Global * Global * No Problem + *------------------------------------------ * zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz - * + * * And then we add to that what happens if there are multiple addresses * assigned to an interface. Remember the ifa on a ifn is a linked * list of addresses. So one interface can have more than one IP @@ -3091,13 +3091,13 @@ sctp_source_address_selection(struct sct * Decisions: * * - count the number of addresses on the interface. - * - if it is one, no problem except case . - * For we will assume a NAT out there. + * - if it is one, no problem except case . + * For we will assume a NAT out there. * - if there are more than one, then we need to worry about scope P * or G. We should prefer G -> G and P -> P if possible. * Then as a secondary fall back to mixed types G->P being a last * ditch one. - * - The above all works for bound all, but bound specific we need to + * - The above all works for bound all, but bound specific we need to * use the same concept but instead only consider the bound * addresses. If the bound set is NOT assigned to the interface then * we must use rotation amongst the bound addresses.. @@ -8913,6 +8913,9 @@ sctp_chunk_retransmission(struct sctp_in if ((chk->rec.chunk_id.id == SCTP_COOKIE_ECHO) || (chk->rec.chunk_id.id == SCTP_STREAM_RESET) || (chk->rec.chunk_id.id == SCTP_FORWARD_CUM_TSN)) { + if (chk->sent != SCTP_DATAGRAM_RESEND) { + continue; + } if (chk->rec.chunk_id.id == SCTP_STREAM_RESET) { if (chk != asoc->str_reset) { /* @@ -8973,7 +8976,7 @@ sctp_chunk_retransmission(struct sctp_in /* (void)SCTP_GETTIME_TIMEVAL(&chk->whoTo->last_sent_time); */ *cnt_out += 1; chk->sent = SCTP_DATAGRAM_SENT; - /* sctp_ucount_decr(asoc->sent_queue_retran_cnt); */ + sctp_ucount_decr(stcb->asoc.sent_queue_retran_cnt); if (fwd_tsn == 0) { return (0); } else { @@ -13427,6 +13430,13 @@ out_unlocked: } } #endif +#ifdef INVARIANTS + if (inp) { + sctp_validate_no_locks(inp); + } else { + printf("Warning - inp is NULL so cant validate locks\n"); + } +#endif if (top) { sctp_m_freem(top); } Modified: stable/8/sys/netinet/sctp_pcb.c ============================================================================== --- stable/8/sys/netinet/sctp_pcb.c Fri Jun 11 03:00:48 2010 (r209027) +++ stable/8/sys/netinet/sctp_pcb.c Fri Jun 11 03:13:19 2010 (r209028) @@ -692,13 +692,11 @@ sctp_add_addr_to_vrf(uint32_t vrf_id, vo (void)SCTP_GETTIME_TIMEVAL(&wi->start_time); wi->ifa = sctp_ifap; wi->action = SCTP_ADD_IP_ADDRESS; - SCTP_IPI_ITERATOR_WQ_LOCK(); - /* - * Should this really be a tailq? As it is we will process - * the newest first :-0 - */ + + SCTP_WQ_ADDR_LOCK(); LIST_INSERT_HEAD(&SCTP_BASE_INFO(addr_wq), wi, sctp_nxt_addr); - SCTP_IPI_ITERATOR_WQ_UNLOCK(); + SCTP_WQ_ADDR_UNLOCK(); + sctp_timer_start(SCTP_TIMER_TYPE_ADDR_WQ, (struct sctp_inpcb *)NULL, (struct sctp_tcb *)NULL, @@ -806,13 +804,13 @@ out_now: (void)SCTP_GETTIME_TIMEVAL(&wi->start_time); wi->ifa = sctp_ifap; wi->action = SCTP_DEL_IP_ADDRESS; - SCTP_IPI_ITERATOR_WQ_LOCK(); + SCTP_WQ_ADDR_LOCK(); /* * Should this really be a tailq? As it is we will process * the newest first :-0 */ LIST_INSERT_HEAD(&SCTP_BASE_INFO(addr_wq), wi, sctp_nxt_addr); - SCTP_IPI_ITERATOR_WQ_UNLOCK(); + SCTP_WQ_ADDR_UNLOCK(); sctp_timer_start(SCTP_TIMER_TYPE_ADDR_WQ, (struct sctp_inpcb *)NULL, @@ -2298,7 +2296,7 @@ sctp_inpcb_alloc(struct socket *so, uint if (inp->sctp_asocidhash == NULL) { SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_ep), inp); SCTP_INP_INFO_WUNLOCK(); - return error; + return (ENOBUFS); } #ifdef IPSEC { @@ -2340,6 +2338,7 @@ sctp_inpcb_alloc(struct socket *so, uint * in protosw */ SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, EOPNOTSUPP); + so->so_pcb = NULL; SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_ep), inp); return (EOPNOTSUPP); } @@ -2358,6 +2357,7 @@ sctp_inpcb_alloc(struct socket *so, uint if (inp->sctp_tcbhash == NULL) { SCTP_PRINTF("Out of SCTP-INPCB->hashinit - no resources\n"); SCTP_LTRACE_ERR_RET(inp, NULL, NULL, SCTP_FROM_SCTP_PCB, ENOBUFS); + so->so_pcb = NULL; SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_ep), inp); return (ENOBUFS); } @@ -3017,57 +3017,68 @@ continue_anyway: static void -sctp_iterator_inp_being_freed(struct sctp_inpcb *inp, struct sctp_inpcb *inp_next) +sctp_iterator_inp_being_freed(struct sctp_inpcb *inp) { - struct sctp_iterator *it; + struct sctp_iterator *it, *nit; /* * We enter with the only the ITERATOR_LOCK in place and a write * lock on the inp_info stuff. */ - + it = sctp_it_ctl.cur_it; + if (it && (it->vn != curvnet)) { + /* Its not looking at our VNET */ + return; + } + if (it && (it->inp == inp)) { + /* + * This is tricky and we hold the iterator lock, but when it + * returns and gets the lock (when we release it) the + * iterator will try to operate on inp. We need to stop that + * from happening. But of course the iterator has a + * reference on the stcb and inp. We can mark it and it will + * stop. + * + * If its a single iterator situation, we set the end iterator + * flag. Otherwise we set the iterator to go to the next + * inp. + * + */ + if (it->iterator_flags & SCTP_ITERATOR_DO_SINGLE_INP) { + sctp_it_ctl.iterator_flags |= SCTP_ITERATOR_STOP_CUR_IT; + } else { + sctp_it_ctl.iterator_flags |= SCTP_ITERATOR_STOP_CUR_INP; + } + } /* - * Go through all iterators, we must do this since it is possible - * that some iterator does NOT have the lock, but is waiting for it. - * And the one that had the lock has either moved in the last - * iteration or we just cleared it above. We need to find all of - * those guys. The list of iterators should never be very big - * though. - */ - TAILQ_FOREACH(it, &SCTP_BASE_INFO(iteratorhead), sctp_nxt_itr) { - if (it == inp->inp_starting_point_for_iterator) - /* skip this guy, he's special */ + * Now go through and remove any single reference to our inp that + * may be still pending on the list + */ + SCTP_IPI_ITERATOR_WQ_LOCK(); + it = TAILQ_FIRST(&sctp_it_ctl.iteratorhead); + while (it) { + nit = TAILQ_NEXT(it, sctp_nxt_itr); + if (it->vn != curvnet) { + it = nit; continue; + } if (it->inp == inp) { - /* - * This is tricky and we DON'T lock the iterator. - * Reason is he's running but waiting for me since - * inp->inp_starting_point_for_iterator has the lock - * on me (the guy above we skipped). This tells us - * its is not running but waiting for - * inp->inp_starting_point_for_iterator to be - * released by the guy that does have our INP in a - * lock. - */ + /* This one points to me is it inp specific? */ if (it->iterator_flags & SCTP_ITERATOR_DO_SINGLE_INP) { - it->inp = NULL; - it->stcb = NULL; + /* Remove and free this one */ + TAILQ_REMOVE(&sctp_it_ctl.iteratorhead, + it, sctp_nxt_itr); + if (it->function_atend != NULL) { + (*it->function_atend) (it->pointer, it->val); + } + SCTP_FREE(it, SCTP_M_ITER); } else { - /* set him up to do the next guy not me */ - it->inp = inp_next; - it->stcb = NULL; + it->inp = LIST_NEXT(it->inp, sctp_list); } } + it = nit; } - it = inp->inp_starting_point_for_iterator; - if (it) { - if (it->iterator_flags & SCTP_ITERATOR_DO_SINGLE_INP) { - it->inp = NULL; - } else { - it->inp = inp_next; - } - it->stcb = NULL; - } + SCTP_IPI_ITERATOR_WQ_UNLOCK(); } /* release sctp_inpcb unbind the port */ @@ -3083,12 +3094,11 @@ sctp_inpcb_free(struct sctp_inpcb *inp, * all associations. d) finally the ep itself. */ struct sctp_pcb *m; - struct sctp_inpcb *inp_save; struct sctp_tcb *asoc, *nasoc; struct sctp_laddr *laddr, *nladdr; struct inpcb *ip_pcb; struct socket *so; - + int being_refed = 0; struct sctp_queued_to_read *sq; @@ -3099,12 +3109,21 @@ sctp_inpcb_free(struct sctp_inpcb *inp, #ifdef SCTP_LOG_CLOSING sctp_log_closing(inp, NULL, 0); #endif - SCTP_ITERATOR_LOCK(); + if (from == SCTP_CALLED_AFTER_CMPSET_OFCLOSE) { + /* + * Once we are in we can remove the flag from = 1 is only + * passed from the actual closing routines that are called + * via the sockets layer. + */ + SCTP_ITERATOR_LOCK(); + /* mark any iterators on the list or being processed */ + sctp_iterator_inp_being_freed(inp); + SCTP_ITERATOR_UNLOCK(); + } so = inp->sctp_socket; if (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) { /* been here before.. eeks.. get out of here */ SCTP_PRINTF("This conflict in free SHOULD not be happening! from %d, imm %d\n", from, immediate); - SCTP_ITERATOR_UNLOCK(); #ifdef SCTP_LOG_CLOSING sctp_log_closing(inp, NULL, 1); #endif @@ -3114,19 +3133,15 @@ sctp_inpcb_free(struct sctp_inpcb *inp, SCTP_INP_INFO_WLOCK(); SCTP_INP_WLOCK(inp); - /* First time through we have the socket lock, after that no more. */ if (from == SCTP_CALLED_AFTER_CMPSET_OFCLOSE) { - /* - * Once we are in we can remove the flag from = 1 is only - * passed from the actual closing routines that are called - * via the sockets layer. - */ inp->sctp_flags &= ~SCTP_PCB_FLAGS_CLOSE_IP; /* socket is gone, so no more wakeups allowed */ inp->sctp_flags |= SCTP_PCB_FLAGS_DONT_WAKE; inp->sctp_flags &= ~SCTP_PCB_FLAGS_WAKEINPUT; inp->sctp_flags &= ~SCTP_PCB_FLAGS_WAKEOUTPUT; + } + /* First time through we have the socket lock, after that no more. */ sctp_timer_stop(SCTP_TIMER_TYPE_NEWCOOKIE, inp, NULL, NULL, SCTP_FROM_SCTP_PCB + SCTP_LOC_1); @@ -3152,8 +3167,17 @@ sctp_inpcb_free(struct sctp_inpcb *inp, nasoc = LIST_NEXT(asoc, sctp_tcblist); if (asoc->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) { /* Skip guys being freed */ - /* asoc->sctp_socket = NULL; FIXME MT */ cnt_in_sd++; + if (asoc->asoc.state & SCTP_STATE_IN_ACCEPT_QUEUE) { + /* + * Special case - we did not start a + * kill timer on the asoc due to it + * was not closed. So go ahead and + * start it now. + */ + asoc->asoc.state &= ~SCTP_STATE_IN_ACCEPT_QUEUE; + sctp_timer_start(SCTP_TIMER_TYPE_ASOCKILL, inp, asoc, NULL); + } SCTP_TCB_UNLOCK(asoc); continue; } @@ -3314,14 +3338,13 @@ sctp_inpcb_free(struct sctp_inpcb *inp, } /* now is there some left in our SHUTDOWN state? */ if (cnt_in_sd) { - SCTP_INP_WUNLOCK(inp); - SCTP_ASOC_CREATE_UNLOCK(inp); - SCTP_INP_INFO_WUNLOCK(); - SCTP_ITERATOR_UNLOCK(); #ifdef SCTP_LOG_CLOSING sctp_log_closing(inp, NULL, 2); #endif inp->sctp_socket = NULL; + SCTP_INP_WUNLOCK(inp); + SCTP_ASOC_CREATE_UNLOCK(inp); + SCTP_INP_INFO_WUNLOCK(); return; } } @@ -3344,6 +3367,10 @@ sctp_inpcb_free(struct sctp_inpcb *inp, asoc = nasoc) { nasoc = LIST_NEXT(asoc, sctp_tcblist); if (asoc->asoc.state & SCTP_STATE_ABOUT_TO_BE_FREED) { + if (asoc->asoc.state & SCTP_STATE_IN_ACCEPT_QUEUE) { + asoc->asoc.state &= ~SCTP_STATE_IN_ACCEPT_QUEUE; + sctp_timer_start(SCTP_TIMER_TYPE_ASOCKILL, inp, asoc, NULL); + } cnt++; continue; } @@ -3392,38 +3419,62 @@ sctp_inpcb_free(struct sctp_inpcb *inp, if (cnt) { /* Ok we have someone out there that will kill us */ (void)SCTP_OS_TIMER_STOP(&inp->sctp_ep.signature_change.timer); - SCTP_INP_WUNLOCK(inp); - SCTP_ASOC_CREATE_UNLOCK(inp); - SCTP_INP_INFO_WUNLOCK(); - SCTP_ITERATOR_UNLOCK(); #ifdef SCTP_LOG_CLOSING sctp_log_closing(inp, NULL, 3); #endif + SCTP_INP_WUNLOCK(inp); + SCTP_ASOC_CREATE_UNLOCK(inp); + SCTP_INP_INFO_WUNLOCK(); return; } - if ((inp->refcount) || (inp->sctp_flags & SCTP_PCB_FLAGS_CLOSE_IP)) { + if (SCTP_INP_LOCK_CONTENDED(inp)) + being_refed++; + if (SCTP_INP_READ_CONTENDED(inp)) + being_refed++; + if (SCTP_ASOC_CREATE_LOCK_CONTENDED(inp)) + being_refed++; + + if ((inp->refcount) || + (being_refed) || + (inp->sctp_flags & SCTP_PCB_FLAGS_CLOSE_IP)) { (void)SCTP_OS_TIMER_STOP(&inp->sctp_ep.signature_change.timer); +#ifdef SCTP_LOG_CLOSING + sctp_log_closing(inp, NULL, 4); +#endif sctp_timer_start(SCTP_TIMER_TYPE_INPKILL, inp, NULL, NULL); SCTP_INP_WUNLOCK(inp); SCTP_ASOC_CREATE_UNLOCK(inp); SCTP_INP_INFO_WUNLOCK(); - SCTP_ITERATOR_UNLOCK(); -#ifdef SCTP_LOG_CLOSING - sctp_log_closing(inp, NULL, 4); -#endif return; } - (void)SCTP_OS_TIMER_STOP(&inp->sctp_ep.signature_change.timer); inp->sctp_ep.signature_change.type = 0; inp->sctp_flags |= SCTP_PCB_FLAGS_SOCKET_ALLGONE; + /* + * Remove it from the list .. last thing we need a lock for. + */ + LIST_REMOVE(inp, sctp_list); + SCTP_INP_WUNLOCK(inp); + SCTP_ASOC_CREATE_UNLOCK(inp); + SCTP_INP_INFO_WUNLOCK(); + /* + * Now we release all locks. Since this INP cannot be found anymore + * except possbily by the kill timer that might be running. We call + * the drain function here. It should hit the case were it sees the + * ACTIVE flag cleared and exit out freeing us to proceed and + * destroy everything. + */ + if (from != SCTP_CALLED_FROM_INPKILL_TIMER) { + (void)SCTP_OS_TIMER_STOP_DRAIN(&inp->sctp_ep.signature_change.timer); + } else { + /* Probably un-needed */ + (void)SCTP_OS_TIMER_STOP(&inp->sctp_ep.signature_change.timer); + } #ifdef SCTP_LOG_CLOSING sctp_log_closing(inp, NULL, 5); #endif - (void)SCTP_OS_TIMER_STOP(&inp->sctp_ep.signature_change.timer); - inp->sctp_ep.signature_change.type = SCTP_TIMER_TYPE_NONE; - /* Clear the read queue */ + if ((inp->sctp_asocidhash) != NULL) { SCTP_HASH_FREE(inp->sctp_asocidhash, inp->hashasocidmark); inp->sctp_asocidhash = NULL; @@ -3494,11 +3545,6 @@ sctp_inpcb_free(struct sctp_inpcb *inp, shared_key = LIST_FIRST(&inp->sctp_ep.shared_keys); } - inp_save = LIST_NEXT(inp, sctp_list); - LIST_REMOVE(inp, sctp_list); - - /* fix any iterators only after out of the list */ - sctp_iterator_inp_being_freed(inp, inp_save); /* * if we have an address list the following will free the list of * ifaddr's that are set into this ep. Again macro limitations here, @@ -3531,8 +3577,6 @@ sctp_inpcb_free(struct sctp_inpcb *inp, SCTP_INP_LOCK_DESTROY(inp); SCTP_INP_READ_DESTROY(inp); SCTP_ASOC_CREATE_LOCK_DESTROY(inp); - SCTP_INP_INFO_WUNLOCK(); - SCTP_ITERATOR_UNLOCK(); SCTP_ZONE_FREE(SCTP_BASE_INFO(ipi_zone_ep), inp); SCTP_DECR_EP_COUNT(); } @@ -4581,8 +4625,12 @@ sctp_free_assoc(struct sctp_inpcb *inp, * Someone holds a reference OR the socket is unaccepted * yet. */ - if (stcb->asoc.refcnt) + if ((stcb->asoc.refcnt) || + (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) || + (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE)) { + stcb->asoc.state &= ~SCTP_STATE_IN_ACCEPT_QUEUE; sctp_timer_start(SCTP_TIMER_TYPE_ASOCKILL, inp, stcb, NULL); + } SCTP_TCB_UNLOCK(stcb); if ((inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE) || (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_GONE)) @@ -4645,8 +4693,7 @@ sctp_free_assoc(struct sctp_inpcb *inp, SS_ISCONFIRMING | SS_ISCONNECTED); } - SOCK_UNLOCK(so); - socantrcvmore(so); + socantrcvmore_locked(so); sctp_sowwakeup(inp, so); sctp_sorwakeup(inp, so); SCTP_SOWAKEUP(so); @@ -5436,8 +5483,6 @@ sctp_pcb_init() /* init the empty list of (All) Endpoints */ LIST_INIT(&SCTP_BASE_INFO(listhead)); - /* init the iterator head */ - TAILQ_INIT(&SCTP_BASE_INFO(iteratorhead)); /* init the hash table of endpoints */ TUNABLE_INT_FETCH("net.inet.sctp.tcbhashsize", &SCTP_BASE_SYSCTL(sctp_hashtblsize)); @@ -5500,16 +5545,15 @@ sctp_pcb_init() /* Master Lock INIT for info structure */ SCTP_INP_INFO_LOCK_INIT(); SCTP_STATLOG_INIT_LOCK(); - SCTP_ITERATOR_LOCK_INIT(); SCTP_IPI_COUNT_INIT(); SCTP_IPI_ADDR_INIT(); - SCTP_IPI_ITERATOR_WQ_INIT(); #ifdef SCTP_PACKET_LOGGING SCTP_IP_PKTLOG_INIT(); #endif LIST_INIT(&SCTP_BASE_INFO(addr_wq)); + SCTP_WQ_ADDR_INIT(); /* not sure if we need all the counts */ SCTP_BASE_INFO(ipi_count_ep) = 0; /* assoc/tcb zone info */ @@ -5537,11 +5581,7 @@ sctp_pcb_init() LIST_INIT(&SCTP_BASE_INFO(vtag_timewait)[i]); } *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***