Date: Sun, 31 Aug 2014 20:48:20 +0400
From: Gleb Smirnoff <glebius@FreeBSD.org>
To: arch@FreeBSD.org
Cc: alc@FreeBSD.org
Subject: Re: [CFT/review] new sendfile(2)
Message-ID: <20140831164820.GD7693@FreeBSD.org>
In-Reply-To: <20140529102054.GX50679@FreeBSD.org>

  Hi!

  Just a followup with a fresh version of the patch. For details see
below. Two small illustrative sketches follow the quoted mail: the
classic sendfile(2) + aio_read(2) userland pattern, and a toy model of
the new sb_acc/sb_ccc accounting.

On Thu, May 29, 2014 at 02:20:54PM +0400, Gleb Smirnoff wrote:
T> Hello!
T> 
T>   At Netflix and Nginx we are experimenting with improving FreeBSD
T> wrt sending large amounts of static data via HTTP.
T> 
T>   One of the approaches we are experimenting with is a new sendfile(2)
T> implementation that doesn't block on the I/O done from the file
T> descriptor.
T> 
T>   The problem with the classic sendfile(2) is that if the request
T> length is large enough, and the file data is not cached in VM, then
T> the sendfile(2) syscall does not return until it has filled the
T> socket buffer with data. On the modern Internet socket buffers can be
T> up to 1 MB, so the time taken by the syscall rises by an order of
T> magnitude. All that time the nginx worker is blocked in the syscall
T> and doesn't process data from other clients. The best current
T> practice to mitigate that is known as "sendfile(2) + aio_read(2)".
T> This is a special mode of nginx operation on FreeBSD. The sendfile(2)
T> call is issued with the SF_NODISKIO flag, which forbids the syscall
T> from performing disk I/O and sends only data that is already cached
T> by VM. If sendfile(2) reports that I/O needs to be done (but was
T> forbidden), then nginx does an aio_read() of a chunk of the file. The
T> data read is cached by VM as a side effect. Then sendfile() is called
T> again.
T> 
T>   Now for the new sendfile. The core idea is that sendfile()
T> schedules the I/O, but doesn't wait for it to complete.
T> It returns immediately to the process, and I/O completion is
T> processed in kernel context. Unlike aio(4), no additional threads
T> are created in the kernel. The new sendfile is a drop-in replacement
T> for the old one. Applications (like nginx) need neither a recompile
T> nor a configuration change. The SF_NODISKIO flag is ignored.
T> 
T>   The patch for review is available at:
T> 
T> https://phabric.freebsd.org/D102
T> 
T> And for those who prefer email attachments, it is also attached.
T> The patch consists of 3 logically separate changes:
T> 
T> 1) A split of the socket buffer sb_cc field into sb_acc and sb_ccc,
T> where sb_acc stands for "available character count" and sb_ccc for
T> "claimed character count". This allows us to write data that is not
T> yet ready into a socket. The data sits in the socket buffer, consumes
T> its space, and keeps its order relative to earlier or later writes to
T> the socket, but it can be sent only after it is marked as ready. This
T> change is split across many files.
T> 
T> 2) A new vnode operation: VOP_GETPAGES_ASYNC(). This one lives in
T> sys/vm.
T> 
T> 3) The actual implementation of the new sendfile(2). This one lives
T> in kern/uipc_syscalls.c.
T> 
T>   At Netflix, we already see improvements with the new sendfile(2).
T> We can send more data utilizing the same amount of CPU, and we can
T> push closer to 0% idle without experiencing short lags.
T> 
T>   However, we have a somewhat modified VM subsystem that behaves
T> optimally for our task, but may be suboptimal for an average FreeBSD
T> system. I'd like someone from the community to try the new
T> sendfile(2) on a different setup and see how it works for you.
T> 
T>   To be an early tester you need to check out the projects/sendfile
T> branch and build a kernel from it. The world from head/ will run
T> fine with it.
T> 
T> svn co http://svn.freebsd.org/base/projects/sendfile
T> cd sendfile
T> ... build kernel ...
T> 
T> Limitations:
T> - No testing was done serving files from NFS.
T> - No testing was done serving files from ZFS.
T> 
T> -- 
T> Totus tuus, Glebius.
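
For readers who have not seen the "sendfile(2) + aio_read(2)" mode
mentioned above, here is a minimal userland sketch of that loop. It is
only an illustration of the pattern, not nginx code: error handling is
reduced to the EBUSY case, the read-ahead chunk size is arbitrary, and
a real server (nginx) waits for aio completion via kqueue instead of
blocking in aio_suspend().

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <aio.h>
#include <errno.h>
#include <string.h>

#define	CHUNK	(128 * 1024)		/* arbitrary read-ahead size */

/*
 * Send [off, off + len) of fd over sock using the classic
 * SF_NODISKIO + aio_read(2) fallback.  Illustration only.
 */
static int
send_file_range(int fd, int sock, off_t off, size_t len)
{
	static char junk[CHUNK];	/* read target; contents unused */
	struct aiocb cb;
	const struct aiocb *list[1] = { &cb };
	off_t sbytes;
	int error;

	while (len > 0) {
		sbytes = 0;
		error = sendfile(fd, sock, off, len, NULL, &sbytes,
		    SF_NODISKIO);
		off += sbytes;		/* part of the range may have gone */
		len -= sbytes;
		if (error == 0)
			continue;
		if (errno != EBUSY)
			return (-1);
		/*
		 * Data not cached by VM.  Read a chunk with aio_read(2);
		 * we only care about the side effect of the pages ending
		 * up in the VM cache, then retry sendfile(2).
		 */
		memset(&cb, 0, sizeof(cb));
		cb.aio_fildes = fd;
		cb.aio_offset = off;
		cb.aio_buf = junk;
		cb.aio_nbytes = len < CHUNK ? len : CHUNK;
		if (aio_read(&cb) == -1 ||
		    aio_suspend(list, 1, NULL) == -1)
			return (-1);
		(void)aio_return(&cb);
	}
	return (0);
}

The new sendfile(2) in the patch makes this dance unnecessary: a plain
call already returns as soon as the I/O is scheduled.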
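
To make item 1) of the description easier to digest, here is a toy
userland model of the two counters. The names are invented for the
illustration and this is not the kernel's struct sockbuf; it only shows
the accounting rule: sb_ccc grows for everything appended, sb_acc only
for data that is ready to be transmitted.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the sb_acc/sb_ccc idea; not the kernel struct sockbuf. */
struct toy_sockbuf {
	size_t acc;		/* "available character count" */
	size_t ccc;		/* "claimed character count"   */
};

static void
toy_append(struct toy_sockbuf *sb, size_t len, bool ready)
{
	sb->ccc += len;			/* space is always claimed...       */
	if (ready)
		sb->acc += len;		/* ...but only ready bytes may be sent */
}

static void
toy_ready(struct toy_sockbuf *sb, size_t len)
{
	sb->acc += len;			/* I/O completed: sbready() analog */
	assert(sb->acc <= sb->ccc);
}

int
main(void)
{
	struct toy_sockbuf sb = { 0, 0 };

	toy_append(&sb, 4096, true);	/* ordinary write: ready at once     */
	toy_append(&sb, 65536, false);	/* new sendfile: pages still in flight */
	assert(sb.acc == 4096 && sb.ccc == 4096 + 65536);

	toy_ready(&sb, 65536);		/* disk I/O done, data may be sent   */
	assert(sb.acc == sb.ccc);
	return (0);
}

The real code additionally tracks sb_fnrdy, the first not-ready mbuf,
and flags ready mbufs queued behind it M_BLOCKED, so data never becomes
available to the protocol out of order; see sballoc()/sbready() in the
uipc_sockbuf.c hunk below.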
T> Index: sys/dev/ti/if_ti.c T> =================================================================== T> --- sys/dev/ti/if_ti.c (.../head) (revision 266804) T> +++ sys/dev/ti/if_ti.c (.../projects/sendfile) (revision 266807) T> @@ -1629,7 +1629,7 @@ ti_newbuf_jumbo(struct ti_softc *sc, int idx, stru T> m[i]->m_data = (void *)sf_buf_kva(sf[i]); T> m[i]->m_len = PAGE_SIZE; T> MEXTADD(m[i], sf_buf_kva(sf[i]), PAGE_SIZE, T> - sf_buf_mext, (void*)sf_buf_kva(sf[i]), sf[i], T> + sf_mext_free, (void*)sf_buf_kva(sf[i]), sf[i], T> 0, EXT_DISPOSABLE); T> m[i]->m_next = m[i+1]; T> } T> @@ -1694,7 +1694,7 @@ nobufs: T> if (m[i]) T> m_freem(m[i]); T> if (sf[i]) T> - sf_buf_mext((void *)sf_buf_kva(sf[i]), sf[i]); T> + sf_mext_free((void *)sf_buf_kva(sf[i]), sf[i]); T> } T> return (ENOBUFS); T> } T> Index: sys/dev/cxgbe/tom/t4_cpl_io.c T> =================================================================== T> --- sys/dev/cxgbe/tom/t4_cpl_io.c (.../head) (revision 266804) T> +++ sys/dev/cxgbe/tom/t4_cpl_io.c (.../projects/sendfile) (revision 266807) T> @@ -338,11 +338,11 @@ t4_rcvd(struct toedev *tod, struct tcpcb *tp) T> INP_WLOCK_ASSERT(inp); T> T> SOCKBUF_LOCK(sb); T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, toep->sb_cc)); T> - toep->rx_credits += toep->sb_cc - sb->sb_cc; T> - toep->sb_cc = sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> + toep->sb_cc = sbused(sb); T> credits = toep->rx_credits; T> SOCKBUF_UNLOCK(sb); T> T> @@ -863,15 +863,15 @@ do_peer_close(struct sge_iq *iq, const struct rss_ T> tp->rcv_nxt = be32toh(cpl->rcv_nxt); T> toep->ddp_flags &= ~(DDP_BUF0_ACTIVE | DDP_BUF1_ACTIVE); T> T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, toep->sb_cc)); T> - toep->rx_credits += toep->sb_cc - sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> #ifdef USE_DDP_RX_FLOW_CONTROL T> toep->rx_credits -= m->m_len; /* adjust for F_RX_FC_DDP */ T> #endif T> - sbappendstream_locked(sb, m); T> - toep->sb_cc = sb->sb_cc; T> + sbappendstream_locked(sb, m, 0); T> + toep->sb_cc = sbused(sb); T> } T> socantrcvmore_locked(so); /* unlocks the sockbuf */ T> T> @@ -1281,12 +1281,12 @@ do_rx_data(struct sge_iq *iq, const struct rss_hea T> } T> } T> T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, toep->sb_cc)); T> - toep->rx_credits += toep->sb_cc - sb->sb_cc; T> - sbappendstream_locked(sb, m); T> - toep->sb_cc = sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> + sbappendstream_locked(sb, m, 0); T> + toep->sb_cc = sbused(sb); T> sorwakeup_locked(so); T> SOCKBUF_UNLOCK_ASSERT(sb); T> T> Index: sys/dev/cxgbe/tom/t4_ddp.c T> =================================================================== T> --- sys/dev/cxgbe/tom/t4_ddp.c (.../head) (revision 266804) T> +++ sys/dev/cxgbe/tom/t4_ddp.c (.../projects/sendfile) (revision 266807) T> @@ -224,15 +224,15 @@ insert_ddp_data(struct toepcb *toep, uint32_t n) T> tp->rcv_wnd -= n; T> #endif T> T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, 
toep->sb_cc)); T> - toep->rx_credits += toep->sb_cc - sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> #ifdef USE_DDP_RX_FLOW_CONTROL T> toep->rx_credits -= n; /* adjust for F_RX_FC_DDP */ T> #endif T> - sbappendstream_locked(sb, m); T> - toep->sb_cc = sb->sb_cc; T> + sbappendstream_locked(sb, m, 0); T> + toep->sb_cc = sbused(sb); T> } T> T> /* SET_TCB_FIELD sent as a ULP command looks like this */ T> @@ -459,15 +459,15 @@ handle_ddp_data(struct toepcb *toep, __be32 ddp_re T> else T> discourage_ddp(toep); T> T> - KASSERT(toep->sb_cc >= sb->sb_cc, T> + KASSERT(toep->sb_cc >= sbused(sb), T> ("%s: sb %p has more data (%d) than last time (%d).", T> - __func__, sb, sb->sb_cc, toep->sb_cc)); T> - toep->rx_credits += toep->sb_cc - sb->sb_cc; T> + __func__, sb, sbused(sb), toep->sb_cc)); T> + toep->rx_credits += toep->sb_cc - sbused(sb); T> #ifdef USE_DDP_RX_FLOW_CONTROL T> toep->rx_credits -= len; /* adjust for F_RX_FC_DDP */ T> #endif T> - sbappendstream_locked(sb, m); T> - toep->sb_cc = sb->sb_cc; T> + sbappendstream_locked(sb, m, 0); T> + toep->sb_cc = sbused(sb); T> wakeup: T> KASSERT(toep->ddp_flags & db_flag, T> ("%s: DDP buffer not active. toep %p, ddp_flags 0x%x, report 0x%x", T> @@ -897,7 +897,7 @@ handle_ddp(struct socket *so, struct uio *uio, int T> #endif T> T> /* XXX: too eager to disable DDP, could handle NBIO better than this. */ T> - if (sb->sb_cc >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres || T> + if (sbused(sb) >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres || T> uio->uio_resid > MAX_DDP_BUFFER_SIZE || uio->uio_iovcnt > 1 || T> so->so_state & SS_NBIO || flags & (MSG_DONTWAIT | MSG_NBIO) || T> error || so->so_error || sb->sb_state & SBS_CANTRCVMORE) T> @@ -935,7 +935,7 @@ handle_ddp(struct socket *so, struct uio *uio, int T> * payload. T> */ T> ddp_flags = select_ddp_flags(so, flags, db_idx); T> - wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sb->sb_cc, ddp_flags); T> + wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sbused(sb), ddp_flags); T> if (wr == NULL) { T> /* T> * Just unhold the pages. The DDP buffer's software state is T> @@ -960,8 +960,9 @@ handle_ddp(struct socket *so, struct uio *uio, int T> */ T> rc = sbwait(sb); T> while (toep->ddp_flags & buf_flag) { T> + /* XXXGL: shouldn't here be sbwait() call? */ T> sb->sb_flags |= SB_WAIT; T> - msleep(&sb->sb_cc, &sb->sb_mtx, PSOCK , "sbwait", 0); T> + msleep(&sb->sb_acc, &sb->sb_mtx, PSOCK , "sbwait", 0); T> } T> unwire_ddp_buffer(db); T> return (rc); T> @@ -1123,8 +1124,8 @@ restart: T> T> /* uio should be just as it was at entry */ T> KASSERT(oresid == uio->uio_resid, T> - ("%s: oresid = %d, uio_resid = %zd, sb_cc = %d", T> - __func__, oresid, uio->uio_resid, sb->sb_cc)); T> + ("%s: oresid = %d, uio_resid = %zd, sbused = %d", T> + __func__, oresid, uio->uio_resid, sbused(sb))); T> T> error = handle_ddp(so, uio, flags, 0); T> ddp_handled = 1; T> @@ -1134,7 +1135,7 @@ restart: T> T> /* Abort if socket has reported problems. */ T> if (so->so_error) { T> - if (sb->sb_cc > 0) T> + if (sbused(sb)) T> goto deliver; T> if (oresid > uio->uio_resid) T> goto out; T> @@ -1146,7 +1147,7 @@ restart: T> T> /* Door is closed. Deliver what is left, if any. */ T> if (sb->sb_state & SBS_CANTRCVMORE) { T> - if (sb->sb_cc > 0) T> + if (sbused(sb)) T> goto deliver; T> else T> goto out; T> @@ -1153,7 +1154,7 @@ restart: T> } T> T> /* Socket buffer is empty and we shall not block. 
*/ T> - if (sb->sb_cc == 0 && T> + if (sbused(sb) == 0 && T> ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) { T> error = EAGAIN; T> goto out; T> @@ -1160,18 +1161,18 @@ restart: T> } T> T> /* Socket buffer got some data that we shall deliver now. */ T> - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) && T> + if (sbused(sb) && !(flags & MSG_WAITALL) && T> ((sb->sb_flags & SS_NBIO) || T> (flags & (MSG_DONTWAIT|MSG_NBIO)) || T> - sb->sb_cc >= sb->sb_lowat || T> - sb->sb_cc >= uio->uio_resid || T> - sb->sb_cc >= sb->sb_hiwat) ) { T> + sbused(sb) >= sb->sb_lowat || T> + sbused(sb) >= uio->uio_resid || T> + sbused(sb) >= sb->sb_hiwat) ) { T> goto deliver; T> } T> T> /* On MSG_WAITALL we must wait until all data or error arrives. */ T> if ((flags & MSG_WAITALL) && T> - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat)) T> + (sbused(sb) >= uio->uio_resid || sbused(sb) >= sb->sb_lowat)) T> goto deliver; T> T> /* T> @@ -1190,7 +1191,7 @@ restart: T> T> deliver: T> SOCKBUF_LOCK_ASSERT(&so->so_rcv); T> - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__)); T> + KASSERT(sbused(sb) > 0, ("%s: sockbuf empty", __func__)); T> KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__)); T> T> if (sb->sb_flags & SB_DDP_INDICATE && !ddp_handled) T> @@ -1201,7 +1202,7 @@ deliver: T> uio->uio_td->td_ru.ru_msgrcv++; T> T> /* Fill uio until full or current end of socket buffer is reached. */ T> - len = min(uio->uio_resid, sb->sb_cc); T> + len = min(uio->uio_resid, sbused(sb)); T> if (mp0 != NULL) { T> /* Dequeue as many mbufs as possible. */ T> if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { T> Index: sys/dev/cxgbe/iw_cxgbe/cm.c T> =================================================================== T> --- sys/dev/cxgbe/iw_cxgbe/cm.c (.../head) (revision 266804) T> +++ sys/dev/cxgbe/iw_cxgbe/cm.c (.../projects/sendfile) (revision 266807) T> @@ -585,8 +585,8 @@ process_data(struct c4iw_ep *ep) T> { T> struct sockaddr_in *local, *remote; T> T> - CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sb_cc %d", __func__, T> - ep->com.so, ep, states[ep->com.state], ep->com.so->so_rcv.sb_cc); T> + CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sbused %d", __func__, T> + ep->com.so, ep, states[ep->com.state], sbused(&ep->com.so->so_rcv)); T> T> switch (state_read(&ep->com)) { T> case MPA_REQ_SENT: T> @@ -602,11 +602,11 @@ process_data(struct c4iw_ep *ep) T> process_mpa_request(ep); T> break; T> default: T> - if (ep->com.so->so_rcv.sb_cc) T> - log(LOG_ERR, "%s: Unexpected streaming data. " T> - "ep %p, state %d, so %p, so_state 0x%x, sb_cc %u\n", T> + if (sbused(&ep->com.so->so_rcv)) T> + log(LOG_ERR, "%s: Unexpected streaming data. ep %p, " T> + "state %d, so %p, so_state 0x%x, sbused %u\n", T> __func__, ep, state_read(&ep->com), ep->com.so, T> - ep->com.so->so_state, ep->com.so->so_rcv.sb_cc); T> + ep->com.so->so_state, sbused(&ep->com.so->so_rcv)); T> break; T> } T> } T> Index: sys/dev/iscsi/icl.c T> =================================================================== T> --- sys/dev/iscsi/icl.c (.../head) (revision 266804) T> +++ sys/dev/iscsi/icl.c (.../projects/sendfile) (revision 266807) T> @@ -758,7 +758,7 @@ icl_receive_thread(void *arg) T> * is enough data received to read the PDU. 
T> */ T> SOCKBUF_LOCK(&so->so_rcv); T> - available = so->so_rcv.sb_cc; T> + available = sbavail(&so->so_rcv); T> if (available < ic->ic_receive_len) { T> so->so_rcv.sb_lowat = ic->ic_receive_len; T> cv_wait(&ic->ic_receive_cv, &so->so_rcv.sb_mtx); T> Index: sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c T> =================================================================== T> --- sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (.../head) (revision 266804) T> +++ sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (.../projects/sendfile) (revision 266807) T> @@ -445,8 +445,8 @@ t3_push_frames(struct socket *so, int req_completi T> * Autosize the send buffer. T> */ T> if (snd->sb_flags & SB_AUTOSIZE && VNET(tcp_do_autosndbuf)) { T> - if (snd->sb_cc >= (snd->sb_hiwat / 8 * 7) && T> - snd->sb_cc < VNET(tcp_autosndbuf_max)) { T> + if (sbused(snd) >= (snd->sb_hiwat / 8 * 7) && T> + sbused(snd) < VNET(tcp_autosndbuf_max)) { T> if (!sbreserve_locked(snd, min(snd->sb_hiwat + T> VNET(tcp_autosndbuf_inc), VNET(tcp_autosndbuf_max)), T> so, curthread)) T> @@ -597,10 +597,10 @@ t3_rcvd(struct toedev *tod, struct tcpcb *tp) T> INP_WLOCK_ASSERT(inp); T> T> SOCKBUF_LOCK(so_rcv); T> - KASSERT(toep->tp_enqueued >= so_rcv->sb_cc, T> - ("%s: so_rcv->sb_cc > enqueued", __func__)); T> - toep->tp_rx_credits += toep->tp_enqueued - so_rcv->sb_cc; T> - toep->tp_enqueued = so_rcv->sb_cc; T> + KASSERT(toep->tp_enqueued >= sbused(so_rcv), T> + ("%s: sbused(so_rcv) > enqueued", __func__)); T> + toep->tp_rx_credits += toep->tp_enqueued - sbused(so_rcv); T> + toep->tp_enqueued = sbused(so_rcv); T> SOCKBUF_UNLOCK(so_rcv); T> T> must_send = toep->tp_rx_credits + 16384 >= tp->rcv_wnd; T> @@ -1199,7 +1199,7 @@ do_rx_data(struct sge_qset *qs, struct rsp_desc *r T> } T> T> toep->tp_enqueued += m->m_pkthdr.len; T> - sbappendstream_locked(so_rcv, m); T> + sbappendstream_locked(so_rcv, m, 0); T> sorwakeup_locked(so); T> SOCKBUF_UNLOCK_ASSERT(so_rcv); T> T> @@ -1768,7 +1768,7 @@ wr_ack(struct toepcb *toep, struct mbuf *m) T> so_sowwakeup_locked(so); T> } T> T> - if (snd->sb_sndptroff < snd->sb_cc) T> + if (snd->sb_sndptroff < sbused(snd)) T> t3_push_frames(so, 0); T> T> out_free: T> Index: sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c T> =================================================================== T> --- sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (.../head) (revision 266804) T> +++ sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (.../projects/sendfile) (revision 266807) T> @@ -1515,11 +1515,11 @@ process_data(struct iwch_ep *ep) T> process_mpa_request(ep); T> break; T> default: T> - if (ep->com.so->so_rcv.sb_cc) T> + if (sbavail(&ep->com.so->so_rcv)) T> printf("%s Unexpected streaming data." 
T> " ep %p state %d so %p so_state %x so_rcv.sb_cc %u so_rcv.sb_mb %p\n", T> __FUNCTION__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state, T> - ep->com.so->so_rcv.sb_cc, ep->com.so->so_rcv.sb_mb); T> + sbavail(&ep->com.so->so_rcv), ep->com.so->so_rcv.sb_mb); T> break; T> } T> return; T> Index: sys/kern/uipc_debug.c T> =================================================================== T> --- sys/kern/uipc_debug.c (.../head) (revision 266804) T> +++ sys/kern/uipc_debug.c (.../projects/sendfile) (revision 266807) T> @@ -403,7 +403,8 @@ db_print_sockbuf(struct sockbuf *sb, const char *s T> db_printf("sb_sndptroff: %u\n", sb->sb_sndptroff); T> T> db_print_indent(indent); T> - db_printf("sb_cc: %u ", sb->sb_cc); T> + db_printf("sb_acc: %u ", sb->sb_acc); T> + db_printf("sb_ccc: %u ", sb->sb_ccc); T> db_printf("sb_hiwat: %u ", sb->sb_hiwat); T> db_printf("sb_mbcnt: %u ", sb->sb_mbcnt); T> db_printf("sb_mbmax: %u\n", sb->sb_mbmax); T> Index: sys/kern/uipc_mbuf.c T> =================================================================== T> --- sys/kern/uipc_mbuf.c (.../head) (revision 266804) T> +++ sys/kern/uipc_mbuf.c (.../projects/sendfile) (revision 266807) T> @@ -389,7 +389,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m) T> * cleaned too. T> */ T> void T> -m_demote(struct mbuf *m0, int all) T> +m_demote(struct mbuf *m0, int all, int flags) T> { T> struct mbuf *m; T> T> @@ -405,7 +405,7 @@ void T> m_freem(m->m_nextpkt); T> m->m_nextpkt = NULL; T> } T> - m->m_flags = m->m_flags & (M_EXT|M_RDONLY|M_NOFREE); T> + m->m_flags = m->m_flags & (M_EXT | M_RDONLY | M_NOFREE | flags); T> } T> } T> T> Index: sys/kern/sys_socket.c T> =================================================================== T> --- sys/kern/sys_socket.c (.../head) (revision 266804) T> +++ sys/kern/sys_socket.c (.../projects/sendfile) (revision 266807) T> @@ -167,20 +167,17 @@ soo_ioctl(struct file *fp, u_long cmd, void *data, T> T> case FIONREAD: T> /* Unlocked read. */ T> - *(int *)data = so->so_rcv.sb_cc; T> + *(int *)data = sbavail(&so->so_rcv); T> break; T> T> case FIONWRITE: T> /* Unlocked read. */ T> - *(int *)data = so->so_snd.sb_cc; T> + *(int *)data = sbavail(&so->so_snd); T> break; T> T> case FIONSPACE: T> - if ((so->so_snd.sb_hiwat < so->so_snd.sb_cc) || T> - (so->so_snd.sb_mbmax < so->so_snd.sb_mbcnt)) T> - *(int *)data = 0; T> - else T> - *(int *)data = sbspace(&so->so_snd); T> + /* Unlocked read. */ T> + *(int *)data = sbspace(&so->so_snd); T> break; T> T> case FIOSETOWN: T> @@ -246,6 +243,7 @@ soo_stat(struct file *fp, struct stat *ub, struct T> struct thread *td) T> { T> struct socket *so = fp->f_data; T> + struct sockbuf *sb; T> #ifdef MAC T> int error; T> #endif T> @@ -261,15 +259,18 @@ soo_stat(struct file *fp, struct stat *ub, struct T> * If SBS_CANTRCVMORE is set, but there's still data left in the T> * receive buffer, the socket is still readable. T> */ T> - SOCKBUF_LOCK(&so->so_rcv); T> - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 || T> - so->so_rcv.sb_cc != 0) T> + sb = &so->so_rcv; T> + SOCKBUF_LOCK(sb); T> + if ((sb->sb_state & SBS_CANTRCVMORE) == 0 || sbavail(sb)) T> ub->st_mode |= S_IRUSR | S_IRGRP | S_IROTH; T> - ub->st_size = so->so_rcv.sb_cc - so->so_rcv.sb_ctl; T> - SOCKBUF_UNLOCK(&so->so_rcv); T> - /* Unlocked read. 
*/ T> - if ((so->so_snd.sb_state & SBS_CANTSENDMORE) == 0) T> + ub->st_size = sbavail(sb) - sb->sb_ctl; T> + SOCKBUF_UNLOCK(sb); T> + T> + sb = &so->so_snd; T> + SOCKBUF_LOCK(sb); T> + if ((sb->sb_state & SBS_CANTSENDMORE) == 0) T> ub->st_mode |= S_IWUSR | S_IWGRP | S_IWOTH; T> + SOCKBUF_UNLOCK(sb); T> ub->st_uid = so->so_cred->cr_uid; T> ub->st_gid = so->so_cred->cr_gid; T> return (*so->so_proto->pr_usrreqs->pru_sense)(so, ub); T> Index: sys/kern/uipc_usrreq.c T> =================================================================== T> --- sys/kern/uipc_usrreq.c (.../head) (revision 266804) T> +++ sys/kern/uipc_usrreq.c (.../projects/sendfile) (revision 266807) T> @@ -790,11 +790,10 @@ uipc_rcvd(struct socket *so, int flags) T> u_int mbcnt, sbcc; T> T> unp = sotounpcb(so); T> - KASSERT(unp != NULL, ("uipc_rcvd: unp == NULL")); T> + KASSERT(unp != NULL, ("%s: unp == NULL", __func__)); T> + KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_SEQPACKET, T> + ("%s: socktype %d", __func__, so->so_type)); T> T> - if (so->so_type != SOCK_STREAM && so->so_type != SOCK_SEQPACKET) T> - panic("uipc_rcvd socktype %d", so->so_type); T> - T> /* T> * Adjust backpressure on sender and wakeup any waiting to write. T> * T> @@ -807,7 +806,7 @@ uipc_rcvd(struct socket *so, int flags) T> */ T> SOCKBUF_LOCK(&so->so_rcv); T> mbcnt = so->so_rcv.sb_mbcnt; T> - sbcc = so->so_rcv.sb_cc; T> + sbcc = sbavail(&so->so_rcv); T> SOCKBUF_UNLOCK(&so->so_rcv); T> /* T> * There is a benign race condition at this point. If we're planning to T> @@ -843,7 +842,10 @@ uipc_send(struct socket *so, int flags, struct mbu T> int error = 0; T> T> unp = sotounpcb(so); T> - KASSERT(unp != NULL, ("uipc_send: unp == NULL")); T> + KASSERT(unp != NULL, ("%s: unp == NULL", __func__)); T> + KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_DGRAM || T> + so->so_type == SOCK_SEQPACKET, T> + ("%s: socktype %d", __func__, so->so_type)); T> T> if (flags & PRUS_OOB) { T> error = EOPNOTSUPP; T> @@ -994,7 +996,7 @@ uipc_send(struct socket *so, int flags, struct mbu T> } T> T> mbcnt = so2->so_rcv.sb_mbcnt; T> - sbcc = so2->so_rcv.sb_cc; T> + sbcc = sbavail(&so2->so_rcv); T> sorwakeup_locked(so2); T> T> /* T> @@ -1011,9 +1013,6 @@ uipc_send(struct socket *so, int flags, struct mbu T> UNP_PCB_UNLOCK(unp2); T> m = NULL; T> break; T> - T> - default: T> - panic("uipc_send unknown socktype"); T> } T> T> /* T> Index: sys/kern/vfs_default.c T> =================================================================== T> --- sys/kern/vfs_default.c (.../head) (revision 266804) T> +++ sys/kern/vfs_default.c (.../projects/sendfile) (revision 266807) T> @@ -111,6 +111,7 @@ struct vop_vector default_vnodeops = { T> .vop_close = VOP_NULL, T> .vop_fsync = VOP_NULL, T> .vop_getpages = vop_stdgetpages, T> + .vop_getpages_async = vop_stdgetpages_async, T> .vop_getwritemount = vop_stdgetwritemount, T> .vop_inactive = VOP_NULL, T> .vop_ioctl = VOP_ENOTTY, T> @@ -726,10 +727,19 @@ vop_stdgetpages(ap) T> { T> T> return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, T> - ap->a_count, ap->a_reqpage); T> + ap->a_count, ap->a_reqpage, NULL, NULL); T> } T> T> +/* XXX Needs good comment and a manpage. 
*/ T> int T> +vop_stdgetpages_async(struct vop_getpages_async_args *ap) T> +{ T> + T> + return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, T> + ap->a_count, ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg); T> +} T> + T> +int T> vop_stdkqfilter(struct vop_kqfilter_args *ap) T> { T> return vfs_kqfilter(ap); T> Index: sys/kern/uipc_socket.c T> =================================================================== T> --- sys/kern/uipc_socket.c (.../head) (revision 266804) T> +++ sys/kern/uipc_socket.c (.../projects/sendfile) (revision 266807) T> @@ -1459,12 +1459,12 @@ restart: T> * 2. MSG_DONTWAIT is not set T> */ T> if (m == NULL || (((flags & MSG_DONTWAIT) == 0 && T> - so->so_rcv.sb_cc < uio->uio_resid) && T> - so->so_rcv.sb_cc < so->so_rcv.sb_lowat && T> + sbavail(&so->so_rcv) < uio->uio_resid) && T> + sbavail(&so->so_rcv) < so->so_rcv.sb_lowat && T> m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) { T> - KASSERT(m != NULL || !so->so_rcv.sb_cc, T> - ("receive: m == %p so->so_rcv.sb_cc == %u", T> - m, so->so_rcv.sb_cc)); T> + KASSERT(m != NULL || !sbavail(&so->so_rcv), T> + ("receive: m == %p sbavail == %u", T> + m, sbavail(&so->so_rcv))); T> if (so->so_error) { T> if (m != NULL) T> goto dontblock; T> @@ -1746,9 +1746,7 @@ dontblock: T> SOCKBUF_LOCK(&so->so_rcv); T> } T> } T> - m->m_data += len; T> - m->m_len -= len; T> - so->so_rcv.sb_cc -= len; T> + sbmtrim(&so->so_rcv, m, len); T> } T> } T> SOCKBUF_LOCK_ASSERT(&so->so_rcv); T> @@ -1913,7 +1911,7 @@ restart: T> T> /* Abort if socket has reported problems. */ T> if (so->so_error) { T> - if (sb->sb_cc > 0) T> + if (sbavail(sb) > 0) T> goto deliver; T> if (oresid > uio->uio_resid) T> goto out; T> @@ -1925,7 +1923,7 @@ restart: T> T> /* Door is closed. Deliver what is left, if any. */ T> if (sb->sb_state & SBS_CANTRCVMORE) { T> - if (sb->sb_cc > 0) T> + if (sbavail(sb) > 0) T> goto deliver; T> else T> goto out; T> @@ -1932,7 +1930,7 @@ restart: T> } T> T> /* Socket buffer is empty and we shall not block. */ T> - if (sb->sb_cc == 0 && T> + if (sbavail(sb) == 0 && T> ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) { T> error = EAGAIN; T> goto out; T> @@ -1939,18 +1937,18 @@ restart: T> } T> T> /* Socket buffer got some data that we shall deliver now. */ T> - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) && T> + if (sbavail(sb) > 0 && !(flags & MSG_WAITALL) && T> ((sb->sb_flags & SS_NBIO) || T> (flags & (MSG_DONTWAIT|MSG_NBIO)) || T> - sb->sb_cc >= sb->sb_lowat || T> - sb->sb_cc >= uio->uio_resid || T> - sb->sb_cc >= sb->sb_hiwat) ) { T> + sbavail(sb) >= sb->sb_lowat || T> + sbavail(sb) >= uio->uio_resid || T> + sbavail(sb) >= sb->sb_hiwat) ) { T> goto deliver; T> } T> T> /* On MSG_WAITALL we must wait until all data or error arrives. */ T> if ((flags & MSG_WAITALL) && T> - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_hiwat)) T> + (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_hiwat)) T> goto deliver; T> T> /* T> @@ -1964,7 +1962,7 @@ restart: T> T> deliver: T> SOCKBUF_LOCK_ASSERT(&so->so_rcv); T> - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__)); T> + KASSERT(sbavail(sb) > 0, ("%s: sockbuf empty", __func__)); T> KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__)); T> T> /* Statistics. */ T> @@ -1972,7 +1970,7 @@ deliver: T> uio->uio_td->td_ru.ru_msgrcv++; T> T> /* Fill uio until full or current end of socket buffer is reached. */ T> - len = min(uio->uio_resid, sb->sb_cc); T> + len = min(uio->uio_resid, sbavail(sb)); T> if (mp0 != NULL) { T> /* Dequeue as many mbufs as possible. 
*/ T> if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { T> @@ -1983,6 +1981,8 @@ deliver: T> for (m = sb->sb_mb; T> m != NULL && m->m_len <= len; T> m = m->m_next) { T> + KASSERT(!(m->m_flags & M_NOTAVAIL), T> + ("%s: m %p not available", __func__, m)); T> len -= m->m_len; T> uio->uio_resid -= m->m_len; T> sbfree(sb, m); T> @@ -2107,9 +2107,9 @@ soreceive_dgram(struct socket *so, struct sockaddr T> */ T> SOCKBUF_LOCK(&so->so_rcv); T> while ((m = so->so_rcv.sb_mb) == NULL) { T> - KASSERT(so->so_rcv.sb_cc == 0, T> - ("soreceive_dgram: sb_mb NULL but sb_cc %u", T> - so->so_rcv.sb_cc)); T> + KASSERT(sbavail(&so->so_rcv) == 0, T> + ("soreceive_dgram: sb_mb NULL but sbavail %u", T> + sbavail(&so->so_rcv))); T> if (so->so_error) { T> error = so->so_error; T> so->so_error = 0; T> @@ -3157,7 +3157,7 @@ filt_soread(struct knote *kn, long hint) T> so = kn->kn_fp->f_data; T> SOCKBUF_LOCK_ASSERT(&so->so_rcv); T> T> - kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl; T> + kn->kn_data = sbavail(&so->so_rcv) - so->so_rcv.sb_ctl; T> if (so->so_rcv.sb_state & SBS_CANTRCVMORE) { T> kn->kn_flags |= EV_EOF; T> kn->kn_fflags = so->so_error; T> @@ -3167,7 +3167,7 @@ filt_soread(struct knote *kn, long hint) T> else if (kn->kn_sfflags & NOTE_LOWAT) T> return (kn->kn_data >= kn->kn_sdata); T> else T> - return (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat); T> + return (sbavail(&so->so_rcv) >= so->so_rcv.sb_lowat); T> } T> T> static void T> @@ -3350,7 +3350,7 @@ soisdisconnected(struct socket *so) T> sorwakeup_locked(so); T> SOCKBUF_LOCK(&so->so_snd); T> so->so_snd.sb_state |= SBS_CANTSENDMORE; T> - sbdrop_locked(&so->so_snd, so->so_snd.sb_cc); T> + sbdrop_locked(&so->so_snd, sbused(&so->so_snd)); T> sowwakeup_locked(so); T> wakeup(&so->so_timeo); T> } T> Index: sys/kern/vnode_if.src T> =================================================================== T> --- sys/kern/vnode_if.src (.../head) (revision 266804) T> +++ sys/kern/vnode_if.src (.../projects/sendfile) (revision 266807) T> @@ -477,6 +477,19 @@ vop_getpages { T> }; T> T> T> +%% getpages_async vp L L L T> + T> +vop_getpages_async { T> + IN struct vnode *vp; T> + IN vm_page_t *m; T> + IN int count; T> + IN int reqpage; T> + IN vm_ooffset_t offset; T> + IN void (*vop_getpages_iodone)(void *); T> + IN void *arg; T> +}; T> + T> + T> %% putpages vp L L L T> T> vop_putpages { T> Index: sys/kern/uipc_sockbuf.c T> =================================================================== T> --- sys/kern/uipc_sockbuf.c (.../head) (revision 266804) T> +++ sys/kern/uipc_sockbuf.c (.../projects/sendfile) (revision 266807) T> @@ -68,7 +68,152 @@ static u_long sb_efficiency = 8; /* parameter for T> static struct mbuf *sbcut_internal(struct sockbuf *sb, int len); T> static void sbflush_internal(struct sockbuf *sb); T> T> +static void T> +sb_shift_nrdy(struct sockbuf *sb, struct mbuf *m) T> +{ T> + T> + SOCKBUF_LOCK_ASSERT(sb); T> + KASSERT(m->m_flags & M_NOTREADY, ("%s: m %p !M_NOTREADY", __func__, m)); T> + T> + m = m->m_next; T> + while (m != NULL && !(m->m_flags & M_NOTREADY)) { T> + m->m_flags &= ~M_BLOCKED; T> + sb->sb_acc += m->m_len; T> + m = m->m_next; T> + } T> + T> + sb->sb_fnrdy = m; T> +} T> + T> +int T> +sbready(struct sockbuf *sb, struct mbuf *m, int count) T> +{ T> + u_int blocker; T> + T> + SOCKBUF_LOCK(sb); T> + T> + if (sb->sb_state & SBS_CANTSENDMORE) { T> + SOCKBUF_UNLOCK(sb); T> + return (ENOTCONN); T> + } T> + T> + KASSERT(sb->sb_fnrdy != NULL, ("%s: sb %p NULL fnrdy", __func__, sb)); T> + T> + blocker = (sb->sb_fnrdy == m) ? 
M_BLOCKED : 0; T> + T> + for (int i = 0; i < count; i++, m = m->m_next) { T> + KASSERT(m->m_flags & M_NOTREADY, T> + ("%s: m %p !M_NOTREADY", __func__, m)); T> + m->m_flags &= ~(M_NOTREADY | blocker); T> + if (blocker) T> + sb->sb_acc += m->m_len; T> + } T> + T> + if (!blocker) { T> + SOCKBUF_UNLOCK(sb); T> + return (EWOULDBLOCK); T> + } T> + T> + /* This one was blocking all the queue. */ T> + for (; m && (m->m_flags & M_NOTREADY) == 0; m = m->m_next) { T> + KASSERT(m->m_flags & M_BLOCKED, T> + ("%s: m %p !M_BLOCKED", __func__, m)); T> + m->m_flags &= ~M_BLOCKED; T> + sb->sb_acc += m->m_len; T> + } T> + T> + sb->sb_fnrdy = m; T> + T> + SOCKBUF_UNLOCK(sb); T> + T> + return (0); T> +} T> + T> /* T> + * Adjust sockbuf state reflecting allocation of m. T> + */ T> +void T> +sballoc(struct sockbuf *sb, struct mbuf *m) T> +{ T> + T> + SOCKBUF_LOCK_ASSERT(sb); T> + T> + sb->sb_ccc += m->m_len; T> + T> + if (sb->sb_fnrdy == NULL) { T> + if (m->m_flags & M_NOTREADY) T> + sb->sb_fnrdy = m; T> + else T> + sb->sb_acc += m->m_len; T> + } else T> + m->m_flags |= M_BLOCKED; T> + T> + if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) T> + sb->sb_ctl += m->m_len; T> + T> + sb->sb_mbcnt += MSIZE; T> + sb->sb_mcnt += 1; T> + T> + if (m->m_flags & M_EXT) { T> + sb->sb_mbcnt += m->m_ext.ext_size; T> + sb->sb_ccnt += 1; T> + } T> +} T> + T> +/* T> + * Adjust sockbuf state reflecting freeing of m. T> + */ T> +void T> +sbfree(struct sockbuf *sb, struct mbuf *m) T> +{ T> + T> +#if 0 /* XXX: not yet: soclose() call path comes here w/o lock. */ T> + SOCKBUF_LOCK_ASSERT(sb); T> +#endif T> + T> + sb->sb_ccc -= m->m_len; T> + T> + if (!(m->m_flags & M_NOTAVAIL)) T> + sb->sb_acc -= m->m_len; T> + T> + if (sb->sb_fnrdy == m) T> + sb_shift_nrdy(sb, m); T> + T> + if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) T> + sb->sb_ctl -= m->m_len; T> + T> + sb->sb_mbcnt -= MSIZE; T> + sb->sb_mcnt -= 1; T> + if (m->m_flags & M_EXT) { T> + sb->sb_mbcnt -= m->m_ext.ext_size; T> + sb->sb_ccnt -= 1; T> + } T> + T> + if (sb->sb_sndptr == m) { T> + sb->sb_sndptr = NULL; T> + sb->sb_sndptroff = 0; T> + } T> + if (sb->sb_sndptroff != 0) T> + sb->sb_sndptroff -= m->m_len; T> +} T> + T> +/* T> + * Trim some amount of data from (first?) mbuf in buffer. T> + */ T> +void T> +sbmtrim(struct sockbuf *sb, struct mbuf *m, int len) T> +{ T> + T> + SOCKBUF_LOCK_ASSERT(sb); T> + KASSERT(len < m->m_len, ("%s: m %p len %d", __func__, m, len)); T> + T> + m->m_data += len; T> + m->m_len -= len; T> + sb->sb_acc -= len; T> + sb->sb_ccc -= len; T> +} T> + T> +/* T> * Socantsendmore indicates that no more data will be sent on the socket; it T> * would normally be applied to a socket when the user informs the system T> * that no more data is to be sent, by the protocol code (in case T> @@ -127,7 +272,7 @@ sbwait(struct sockbuf *sb) T> SOCKBUF_LOCK_ASSERT(sb); T> T> sb->sb_flags |= SB_WAIT; T> - return (msleep_sbt(&sb->sb_cc, &sb->sb_mtx, T> + return (msleep_sbt(&sb->sb_acc, &sb->sb_mtx, T> (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "sbwait", T> sb->sb_timeo, 0, 0)); T> } T> @@ -184,7 +329,7 @@ sowakeup(struct socket *so, struct sockbuf *sb) T> sb->sb_flags &= ~SB_SEL; T> if (sb->sb_flags & SB_WAIT) { T> sb->sb_flags &= ~SB_WAIT; T> - wakeup(&sb->sb_cc); T> + wakeup(&sb->sb_acc); T> } T> KNOTE_LOCKED(&sb->sb_sel.si_note, 0); T> if (sb->sb_upcall != NULL) { T> @@ -519,7 +664,7 @@ sbappend(struct sockbuf *sb, struct mbuf *m) T> * that is, a stream protocol (such as TCP). 
T> */ T> void T> -sbappendstream_locked(struct sockbuf *sb, struct mbuf *m) T> +sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags) T> { T> SOCKBUF_LOCK_ASSERT(sb); T> T> @@ -529,8 +674,8 @@ void T> SBLASTMBUFCHK(sb); T> T> /* Remove all packet headers and mbuf tags to get a pure data chain. */ T> - m_demote(m, 1); T> - T> + m_demote(m, 1, flags & PRUS_NOTREADY ? M_NOTREADY : 0); T> + T> sbcompress(sb, m, sb->sb_mbtail); T> T> sb->sb_lastrecord = sb->sb_mb; T> @@ -543,38 +688,59 @@ void T> * that is, a stream protocol (such as TCP). T> */ T> void T> -sbappendstream(struct sockbuf *sb, struct mbuf *m) T> +sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags) T> { T> T> SOCKBUF_LOCK(sb); T> - sbappendstream_locked(sb, m); T> + sbappendstream_locked(sb, m, flags); T> SOCKBUF_UNLOCK(sb); T> } T> T> #ifdef SOCKBUF_DEBUG T> void T> -sbcheck(struct sockbuf *sb) T> +sbcheck(struct sockbuf *sb, const char *file, int line) T> { T> - struct mbuf *m; T> - struct mbuf *n = 0; T> - u_long len = 0, mbcnt = 0; T> + struct mbuf *m, *n, *fnrdy; T> + u_long acc, ccc, mbcnt; T> T> SOCKBUF_LOCK_ASSERT(sb); T> T> + acc = ccc = mbcnt = 0; T> + fnrdy = NULL; T> + T> for (m = sb->sb_mb; m; m = n) { T> n = m->m_nextpkt; T> for (; m; m = m->m_next) { T> - len += m->m_len; T> + if ((m->m_flags & M_NOTREADY) && fnrdy == NULL) { T> + if (m != sb->sb_fnrdy) { T> + printf("sb %p: fnrdy %p != m %p\n", T> + sb, sb->sb_fnrdy, m); T> + goto fail; T> + } T> + fnrdy = m; T> + } T> + if (fnrdy) { T> + if (!(m->m_flags & M_NOTAVAIL)) { T> + printf("sb %p: fnrdy %p, m %p is avail\n", T> + sb, sb->sb_fnrdy, m); T> + goto fail; T> + } T> + } else T> + acc += m->m_len; T> + ccc += m->m_len; T> mbcnt += MSIZE; T> if (m->m_flags & M_EXT) /*XXX*/ /* pretty sure this is bogus */ T> mbcnt += m->m_ext.ext_size; T> } T> } T> - if (len != sb->sb_cc || mbcnt != sb->sb_mbcnt) { T> - printf("cc %ld != %u || mbcnt %ld != %u\n", len, sb->sb_cc, T> - mbcnt, sb->sb_mbcnt); T> - panic("sbcheck"); T> + if (acc != sb->sb_acc || ccc != sb->sb_ccc || mbcnt != sb->sb_mbcnt) { T> + printf("acc %ld/%u ccc %ld/%u mbcnt %ld/%u\n", T> + acc, sb->sb_acc, ccc, sb->sb_ccc, mbcnt, sb->sb_mbcnt); T> + goto fail; T> } T> + return; T> +fail: T> + panic("%s from %s:%u", __func__, file, line); T> } T> #endif T> T> @@ -800,6 +966,7 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str T> if (n && (n->m_flags & M_EOR) == 0 && T> M_WRITABLE(n) && T> ((sb->sb_flags & SB_NOCOALESCE) == 0) && T> + !(m->m_flags & M_NOTREADY) && T> m->m_len <= MCLBYTES / 4 && /* XXX: Don't copy too much */ T> m->m_len <= M_TRAILINGSPACE(n) && T> n->m_type == m->m_type) { T> @@ -806,7 +973,9 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str T> bcopy(mtod(m, caddr_t), mtod(n, caddr_t) + n->m_len, T> (unsigned)m->m_len); T> n->m_len += m->m_len; T> - sb->sb_cc += m->m_len; T> + sb->sb_ccc += m->m_len; T> + if (sb->sb_fnrdy == NULL) T> + sb->sb_acc += m->m_len; T> if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) T> /* XXX: Probably don't need.*/ T> sb->sb_ctl += m->m_len; T> @@ -843,13 +1012,13 @@ sbflush_internal(struct sockbuf *sb) T> * Don't call sbcut(sb, 0) if the leading mbuf is non-empty: T> * we would loop forever. Panic instead. 
T> */ T> - if (!sb->sb_cc && (sb->sb_mb == NULL || sb->sb_mb->m_len)) T> + if (sb->sb_ccc == 0 && (sb->sb_mb == NULL || sb->sb_mb->m_len)) T> break; T> - m_freem(sbcut_internal(sb, (int)sb->sb_cc)); T> + m_freem(sbcut_internal(sb, (int)sb->sb_ccc)); T> } T> - if (sb->sb_cc || sb->sb_mb || sb->sb_mbcnt) T> - panic("sbflush_internal: cc %u || mb %p || mbcnt %u", T> - sb->sb_cc, (void *)sb->sb_mb, sb->sb_mbcnt); T> + KASSERT(sb->sb_ccc == 0 && sb->sb_mb == 0 && sb->sb_mbcnt == 0, T> + ("%s: ccc %u mb %p mbcnt %u", __func__, T> + sb->sb_ccc, (void *)sb->sb_mb, sb->sb_mbcnt)); T> } T> T> void T> @@ -891,7 +1060,9 @@ sbcut_internal(struct sockbuf *sb, int len) T> if (m->m_len > len) { T> m->m_len -= len; T> m->m_data += len; T> - sb->sb_cc -= len; T> + sb->sb_ccc -= len; T> + if (!(m->m_flags & M_NOTAVAIL)) T> + sb->sb_acc -= len; T> if (sb->sb_sndptroff != 0) T> sb->sb_sndptroff -= len; T> if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) T> @@ -977,8 +1148,8 @@ sbsndptr(struct sockbuf *sb, u_int off, u_int len, T> struct mbuf *m, *ret; T> T> KASSERT(sb->sb_mb != NULL, ("%s: sb_mb is NULL", __func__)); T> - KASSERT(off + len <= sb->sb_cc, ("%s: beyond sb", __func__)); T> - KASSERT(sb->sb_sndptroff <= sb->sb_cc, ("%s: sndptroff broken", __func__)); T> + KASSERT(off + len <= sb->sb_acc, ("%s: beyond sb", __func__)); T> + KASSERT(sb->sb_sndptroff <= sb->sb_acc, ("%s: sndptroff broken", __func__)); T> T> /* T> * Is off below stored offset? Happens on retransmits. T> @@ -1091,7 +1262,7 @@ void T> sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb) T> { T> T> - xsb->sb_cc = sb->sb_cc; T> + xsb->sb_cc = sb->sb_ccc; T> xsb->sb_hiwat = sb->sb_hiwat; T> xsb->sb_mbcnt = sb->sb_mbcnt; T> xsb->sb_mcnt = sb->sb_mcnt; T> Index: sys/kern/uipc_syscalls.c T> =================================================================== T> --- sys/kern/uipc_syscalls.c (.../head) (revision 266804) T> +++ sys/kern/uipc_syscalls.c (.../projects/sendfile) (revision 266807) T> @@ -132,9 +132,10 @@ static int filt_sfsync(struct knote *kn, long hint T> */ T> static SYSCTL_NODE(_kern_ipc, OID_AUTO, sendfile, CTLFLAG_RW, 0, T> "sendfile(2) tunables"); T> -static int sfreadahead = 1; T> + T> +static int sfreadahead = 0; T> SYSCTL_INT(_kern_ipc_sendfile, OID_AUTO, readahead, CTLFLAG_RW, T> - &sfreadahead, 0, "Number of sendfile(2) read-ahead MAXBSIZE blocks"); T> + &sfreadahead, 0, "Read this more pages than socket buffer can accept"); T> T> #ifdef SFSYNC_DEBUG T> static int sf_sync_debug = 0; T> @@ -1988,7 +1989,7 @@ filt_sfsync(struct knote *kn, long hint) T> * Detach mapped page and release resources back to the system. T> */ T> int T> -sf_buf_mext(struct mbuf *mb, void *addr, void *args) T> +sf_mext_free(struct mbuf *mb, void *addr, void *args) T> { T> vm_page_t m; T> struct sendfile_sync *sfs; T> @@ -2009,13 +2010,42 @@ int T> sfs = addr; T> sf_sync_deref(sfs); T> } T> - /* T> - * sfs may be invalid at this point, don't use it! T> - */ T> return (EXT_FREE_OK); T> } T> T> /* T> + * Same as above, but forces the page to be detached from the object T> + * and go into free pool. 
T> + */ T> +static int T> +sf_mext_free_nocache(struct mbuf *mb, void *addr, void *args) T> +{ T> + vm_page_t m; T> + struct sendfile_sync *sfs; T> + T> + m = sf_buf_page(args); T> + sf_buf_free(args); T> + vm_page_lock(m); T> + vm_page_unwire(m, 0); T> + if (m->wire_count == 0) { T> + vm_object_t obj; T> + T> + if ((obj = m->object) == NULL) T> + vm_page_free(m); T> + else if (!vm_page_xbusied(m) && VM_OBJECT_TRYWLOCK(obj)) { T> + vm_page_free(m); T> + VM_OBJECT_WUNLOCK(obj); T> + } T> + } T> + vm_page_unlock(m); T> + if (addr != NULL) { T> + sfs = addr; T> + sf_sync_deref(sfs); T> + } T> + return (EXT_FREE_OK); T> +} T> + T> +/* T> * Called to remove a reference to a sf_sync object. T> * T> * This is generally done during the mbuf free path to signify T> @@ -2608,106 +2638,181 @@ freebsd4_sendfile(struct thread *td, struct freebs T> } T> #endif /* COMPAT_FREEBSD4 */ T> T> + /* T> + * How much data to put into page i of n. T> + * Only first and last pages are special. T> + */ T> +static inline off_t T> +xfsize(int i, int n, off_t off, off_t len) T> +{ T> + T> + if (i == 0) T> + return (omin(PAGE_SIZE - (off & PAGE_MASK), len)); T> + T> + if (i == n - 1 && ((off + len) & PAGE_MASK) > 0) T> + return ((off + len) & PAGE_MASK); T> + T> + return (PAGE_SIZE); T> +} T> + T> +/* T> + * Offset within object for i page. T> + */ T> +static inline vm_offset_t T> +vmoff(int i, off_t off) T> +{ T> + T> + if (i == 0) T> + return ((vm_offset_t)off); T> + T> + return (trunc_page(off + i * PAGE_SIZE)); T> +} T> + T> +/* T> + * Pretend as if we don't have enough space, subtract xfsize() of T> + * all pages that failed. T> + */ T> +static inline void T> +fixspace(int old, int new, off_t off, int *space) T> +{ T> + T> + KASSERT(old > new, ("%s: old %d new %d", __func__, old, new)); T> + T> + /* Subtract last one. */ T> + *space -= xfsize(old - 1, old, off, *space); T> + old--; T> + T> + if (new == old) T> + /* There was only one page. */ T> + return; T> + T> + /* Subtract first one. */ T> + if (new == 0) { T> + *space -= xfsize(0, old, off, *space); T> + new++; T> + } T> + T> + /* Rest of pages are full sized. 
*/ T> + *space -= (old - new) * PAGE_SIZE; T> + T> + KASSERT(*space >= 0, ("%s: space went backwards", __func__)); T> +} T> + T> +struct sf_io { T> + u_int nios; T> + int npages; T> + struct file *sock_fp; T> + struct mbuf *m; T> + vm_page_t pa[]; T> +}; T> + T> +static void T> +sf_io_done(void *arg) T> +{ T> + struct sf_io *sfio = arg; T> + struct socket *so; T> + T> + if (!refcount_release(&sfio->nios)) T> + return; T> + T> + so = sfio->sock_fp->f_data; T> + T> + if (sbready(&so->so_snd, sfio->m, sfio->npages) == 0) { T> + struct mbuf *m; T> + T> + m = m_get(M_NOWAIT, MT_DATA); T> + if (m == NULL) { T> + panic("XXXGL"); T> + } T> + m->m_len = 0; T> + CURVNET_SET(so->so_vnet); T> + /* XXXGL: curthread */ T> + (void )(so->so_proto->pr_usrreqs->pru_send) T> + (so, 0, m, NULL, NULL, curthread); T> + CURVNET_RESTORE(); T> + } T> + T> + /* XXXGL: curthread */ T> + fdrop(sfio->sock_fp, curthread); T> + free(sfio, M_TEMP); T> +} T> + T> static int T> -sendfile_readpage(vm_object_t obj, struct vnode *vp, int nd, T> - off_t off, int xfsize, int bsize, struct thread *td, vm_page_t *res) T> +sendfile_swapin(vm_object_t obj, struct sf_io *sfio, off_t off, off_t len, T> + int npages, int rhpages) T> { T> - vm_page_t m; T> - vm_pindex_t pindex; T> - ssize_t resid; T> - int error, readahead, rv; T> + vm_page_t *pa = sfio->pa; T> + int nios; T> T> - pindex = OFF_TO_IDX(off); T> + nios = 0; T> VM_OBJECT_WLOCK(obj); T> - m = vm_page_grab(obj, pindex, (vp != NULL ? VM_ALLOC_NOBUSY | T> - VM_ALLOC_IGN_SBUSY : 0) | VM_ALLOC_WIRED | VM_ALLOC_NORMAL); T> + for (int i = 0; i < npages; i++) T> + pa[i] = vm_page_grab(obj, OFF_TO_IDX(vmoff(i, off)), T> + VM_ALLOC_WIRED | VM_ALLOC_NORMAL); T> T> - /* T> - * Check if page is valid for what we need, otherwise initiate I/O. T> - * T> - * The non-zero nd argument prevents disk I/O, instead we T> - * return the caller what he specified in nd. In particular, T> - * if we already turned some pages into mbufs, nd == EAGAIN T> - * and the main function send them the pages before we come T> - * here again and block. T> - */ T> - if (m->valid != 0 && vm_page_is_valid(m, off & PAGE_MASK, xfsize)) { T> - if (vp == NULL) T> - vm_page_xunbusy(m); T> - VM_OBJECT_WUNLOCK(obj); T> - *res = m; T> - return (0); T> - } else if (nd != 0) { T> - if (vp == NULL) T> - vm_page_xunbusy(m); T> - error = nd; T> - goto free_page; T> - } T> + for (int i = 0; i < npages;) { T> + int j, a, count, rv; T> T> - /* T> - * Get the page from backing store. T> - */ T> - error = 0; T> - if (vp != NULL) { T> - VM_OBJECT_WUNLOCK(obj); T> - readahead = sfreadahead * MAXBSIZE; T> + if (vm_page_is_valid(pa[i], vmoff(i, off) & PAGE_MASK, T> + xfsize(i, npages, off, len))) { T> + vm_page_xunbusy(pa[i]); T> + i++; T> + continue; T> + } T> T> - /* T> - * Use vn_rdwr() instead of the pager interface for T> - * the vnode, to allow the read-ahead. T> - * T> - * XXXMAC: Because we don't have fp->f_cred here, we T> - * pass in NOCRED. This is probably wrong, but is T> - * consistent with our original implementation. 
T> - */ T> - error = vn_rdwr(UIO_READ, vp, NULL, readahead, trunc_page(off), T> - UIO_NOCOPY, IO_NODELOCKED | IO_VMIO | ((readahead / T> - bsize) << IO_SEQSHIFT), td->td_ucred, NOCRED, &resid, td); T> - SFSTAT_INC(sf_iocnt); T> - VM_OBJECT_WLOCK(obj); T> - } else { T> - if (vm_pager_has_page(obj, pindex, NULL, NULL)) { T> - rv = vm_pager_get_pages(obj, &m, 1, 0); T> - SFSTAT_INC(sf_iocnt); T> - m = vm_page_lookup(obj, pindex); T> - if (m == NULL) T> - error = EIO; T> - else if (rv != VM_PAGER_OK) { T> - vm_page_lock(m); T> - vm_page_free(m); T> - vm_page_unlock(m); T> - m = NULL; T> - error = EIO; T> + for (j = i + 1; j < npages; j++) T> + if (vm_page_is_valid(pa[j], vmoff(j, off) & PAGE_MASK, T> + xfsize(j, npages, off, len))) T> + break; T> + T> + while (!vm_pager_has_page(obj, OFF_TO_IDX(vmoff(i, off)), T> + NULL, &a) && i < j) { T> + pmap_zero_page(pa[i]); T> + pa[i]->valid = VM_PAGE_BITS_ALL; T> + pa[i]->dirty = 0; T> + vm_page_xunbusy(pa[i]); T> + i++; T> + } T> + if (i == j) T> + continue; T> + T> + count = min(a + 1, npages + rhpages - i); T> + for (j = npages; j < i + count; j++) { T> + pa[j] = vm_page_grab(obj, OFF_TO_IDX(vmoff(j, off)), T> + VM_ALLOC_NORMAL | VM_ALLOC_NOWAIT); T> + if (pa[j] == NULL) { T> + count = j - i; T> + break; T> } T> - } else { T> - pmap_zero_page(m); T> - m->valid = VM_PAGE_BITS_ALL; T> - m->dirty = 0; T> + if (pa[j]->valid) { T> + vm_page_xunbusy(pa[j]); T> + count = j - i; T> + break; T> + } T> } T> - if (m != NULL) T> - vm_page_xunbusy(m); T> + T> + refcount_acquire(&sfio->nios); T> + rv = vm_pager_get_pages_async(obj, pa + i, count, 0, T> + &sf_io_done, sfio); T> + T> + KASSERT(rv == VM_PAGER_OK, ("%s: pager fail obj %p page %p", T> + __func__, obj, pa[i])); T> + T> + SFSTAT_INC(sf_iocnt); T> + nios++; T> + T> + for (j = i; j < i + count && j < npages; j++) T> + KASSERT(pa[j] == vm_page_lookup(obj, T> + OFF_TO_IDX(vmoff(j, off))), T> + ("pa[j] %p lookup %p\n", pa[j], T> + vm_page_lookup(obj, OFF_TO_IDX(vmoff(j, off))))); T> + T> + i += count; T> } T> - if (error == 0) { T> - *res = m; T> - } else if (m != NULL) { T> -free_page: T> - vm_page_lock(m); T> - vm_page_unwire(m, 0); T> T> - /* T> - * See if anyone else might know about this page. If T> - * not and it is not valid, then free it. 
T> - */ T> - if (m->wire_count == 0 && m->valid == 0 && !vm_page_busied(m)) T> - vm_page_free(m); T> - vm_page_unlock(m); T> - } T> - KASSERT(error != 0 || (m->wire_count > 0 && T> - vm_page_is_valid(m, off & PAGE_MASK, xfsize)), T> - ("wrong page state m %p off %#jx xfsize %d", m, (uintmax_t)off, T> - xfsize)); T> VM_OBJECT_WUNLOCK(obj); T> - return (error); T> + T> + return (nios); T> } T> T> static int T> @@ -2814,41 +2919,26 @@ vn_sendfile(struct file *fp, int sockfd, struct ui T> struct vnode *vp; T> struct vm_object *obj; T> struct socket *so; T> - struct mbuf *m; T> + struct mbuf *m, *mh, *mhtail; T> struct sf_buf *sf; T> - struct vm_page *pg; T> struct shmfd *shmfd; T> struct vattr va; T> - off_t off, xfsize, fsbytes, sbytes, rem, obj_size; T> - int error, bsize, nd, hdrlen, mnw; T> + off_t off, sbytes, rem, obj_size; T> + int error, serror, bsize, hdrlen; T> T> - pg = NULL; T> obj = NULL; T> so = NULL; T> - m = NULL; T> - fsbytes = sbytes = 0; T> - hdrlen = mnw = 0; T> - rem = nbytes; T> - obj_size = 0; T> + m = mh = NULL; T> + sbytes = 0; T> T> error = sendfile_getobj(td, fp, &obj, &vp, &shmfd, &obj_size, &bsize); T> if (error != 0) T> return (error); T> - if (rem == 0) T> - rem = obj_size; T> T> error = kern_sendfile_getsock(td, sockfd, &sock_fp, &so); T> if (error != 0) T> goto out; T> T> - /* T> - * Do not wait on memory allocations but return ENOMEM for T> - * caller to retry later. T> - * XXX: Experimental. T> - */ T> - if (flags & SF_MNOWAIT) T> - mnw = 1; T> - T> #ifdef MAC T> error = mac_socket_check_send(td->td_ucred, so); T> if (error != 0) T> @@ -2856,31 +2946,27 @@ vn_sendfile(struct file *fp, int sockfd, struct ui T> #endif T> T> /* If headers are specified copy them into mbufs. */ T> - if (hdr_uio != NULL) { T> + if (hdr_uio != NULL && hdr_uio->uio_resid > 0) { T> hdr_uio->uio_td = td; T> hdr_uio->uio_rw = UIO_WRITE; T> - if (hdr_uio->uio_resid > 0) { T> - /* T> - * In FBSD < 5.0 the nbytes to send also included T> - * the header. If compat is specified subtract the T> - * header size from nbytes. T> - */ T> - if (kflags & SFK_COMPAT) { T> - if (nbytes > hdr_uio->uio_resid) T> - nbytes -= hdr_uio->uio_resid; T> - else T> - nbytes = 0; T> - } T> - m = m_uiotombuf(hdr_uio, (mnw ? M_NOWAIT : M_WAITOK), T> - 0, 0, 0); T> - if (m == NULL) { T> - error = mnw ? EAGAIN : ENOBUFS; T> - goto out; T> - } T> - hdrlen = m_length(m, NULL); T> + /* T> + * In FBSD < 5.0 the nbytes to send also included T> + * the header. If compat is specified subtract the T> + * header size from nbytes. T> + */ T> + if (kflags & SFK_COMPAT) { T> + if (nbytes > hdr_uio->uio_resid) T> + nbytes -= hdr_uio->uio_resid; T> + else T> + nbytes = 0; T> } T> - } T> + mh = m_uiotombuf(hdr_uio, M_WAITOK, 0, 0, 0); T> + hdrlen = m_length(mh, &mhtail); T> + } else T> + hdrlen = 0; T> T> + rem = nbytes ? omin(nbytes, obj_size - offset) : obj_size - offset; T> + T> /* T> * Protect against multiple writers to the socket. T> * T> @@ -2900,21 +2986,13 @@ vn_sendfile(struct file *fp, int sockfd, struct ui T> * The outer loop checks the state and available space of the socket T> * and takes care of the overall progress. 
T> */ T> - for (off = offset; ; ) { T> + for (off = offset; rem > 0; ) { T> + struct sf_io *sfio; T> + vm_page_t *pa; T> struct mbuf *mtail; T> - int loopbytes; T> - int space; T> - int done; T> + int nios, space, npages, rhpages; T> T> - if ((nbytes != 0 && nbytes == fsbytes) || T> - (nbytes == 0 && obj_size == fsbytes)) T> - break; T> - T> mtail = NULL; T> - loopbytes = 0; T> - space = 0; T> - done = 0; T> - T> /* T> * Check the socket state for ongoing connection, T> * no errors and space in socket buffer. T> @@ -2990,53 +3068,44 @@ retry_space: T> VOP_UNLOCK(vp, 0); T> goto done; T> } T> - obj_size = va.va_size; T> + if (va.va_size != obj_size) { T> + if (nbytes == 0) T> + rem += va.va_size - obj_size; T> + else if (offset + nbytes > va.va_size) T> + rem -= (offset + nbytes - va.va_size); T> + obj_size = va.va_size; T> + } T> } T> T> + if (space > rem) T> + space = rem; T> + T> + if (off & PAGE_MASK) T> + npages = 1 + howmany(space - T> + (PAGE_SIZE - (off & PAGE_MASK)), PAGE_SIZE); T> + else T> + npages = howmany(space, PAGE_SIZE); T> + T> + rhpages = SF_READAHEAD(flags) ? T> + SF_READAHEAD(flags) : sfreadahead; T> + rhpages = min(howmany(obj_size - (off & ~PAGE_MASK) - T> + (npages * PAGE_SIZE), PAGE_SIZE), rhpages); T> + T> + sfio = malloc(sizeof(struct sf_io) + T> + (rhpages + npages) * sizeof(vm_page_t), M_TEMP, M_WAITOK); T> + refcount_init(&sfio->nios, 1); T> + T> + nios = sendfile_swapin(obj, sfio, off, space, npages, rhpages); T> + T> /* T> * Loop and construct maximum sized mbuf chain to be bulk T> * dumped into socket buffer. T> */ T> - while (space > loopbytes) { T> - vm_offset_t pgoff; T> + pa = sfio->pa; T> + for (int i = 0; i < npages; i++) { T> struct mbuf *m0; T> T> /* T> - * Calculate the amount to transfer. T> - * Not to exceed a page, the EOF, T> - * or the passed in nbytes. T> - */ T> - pgoff = (vm_offset_t)(off & PAGE_MASK); T> - rem = obj_size - offset; T> - if (nbytes != 0) T> - rem = omin(rem, nbytes); T> - rem -= fsbytes + loopbytes; T> - xfsize = omin(PAGE_SIZE - pgoff, rem); T> - xfsize = omin(space - loopbytes, xfsize); T> - if (xfsize <= 0) { T> - done = 1; /* all data sent */ T> - break; T> - } T> - T> - /* T> - * Attempt to look up the page. Allocate T> - * if not found or wait and loop if busy. T> - */ T> - if (m != NULL) T> - nd = EAGAIN; /* send what we already got */ T> - else if ((flags & SF_NODISKIO) != 0) T> - nd = EBUSY; T> - else T> - nd = 0; T> - error = sendfile_readpage(obj, vp, nd, off, T> - xfsize, bsize, td, &pg); T> - if (error != 0) { T> - if (error == EAGAIN) T> - error = 0; /* not a real error */ T> - break; T> - } T> - T> - /* T> * Get a sendfile buf. When allocating the T> * first buffer for mbuf chain, we usually T> * wait as long as necessary, but this wait T> @@ -3045,17 +3114,18 @@ retry_space: T> * threads might exhaust the buffers and then T> * deadlock. T> */ T> - sf = sf_buf_alloc(pg, (mnw || m != NULL) ? SFB_NOWAIT : T> - SFB_CATCH); T> + sf = sf_buf_alloc(pa[i], T> + m != NULL ? SFB_NOWAIT : SFB_CATCH); T> if (sf == NULL) { T> SFSTAT_INC(sf_allocfail); T> - vm_page_lock(pg); T> - vm_page_unwire(pg, 0); T> - KASSERT(pg->object != NULL, T> - ("%s: object disappeared", __func__)); T> - vm_page_unlock(pg); T> + for (int j = i; j < npages; j++) { T> + vm_page_lock(pa[j]); T> + vm_page_unwire(pa[j], 0); T> + vm_page_unlock(pa[j]); T> + } T> if (m == NULL) T> - error = (mnw ? 
EAGAIN : EINTR); T> + error = ENOBUFS; T> + fixspace(npages, i, off, &space); T> break; T> } T> T> @@ -3063,36 +3133,26 @@ retry_space: T> * Get an mbuf and set it up as having T> * external storage. T> */ T> - m0 = m_get((mnw ? M_NOWAIT : M_WAITOK), MT_DATA); T> - if (m0 == NULL) { T> - error = (mnw ? EAGAIN : ENOBUFS); T> - (void)sf_buf_mext(NULL, NULL, sf); T> - break; T> - } T> - if (m_extadd(m0, (caddr_t )sf_buf_kva(sf), PAGE_SIZE, T> - sf_buf_mext, sfs, sf, M_RDONLY, EXT_SFBUF, T> - (mnw ? M_NOWAIT : M_WAITOK)) != 0) { T> - error = (mnw ? EAGAIN : ENOBUFS); T> - (void)sf_buf_mext(NULL, NULL, sf); T> - m_freem(m0); T> - break; T> - } T> - m0->m_data = (char *)sf_buf_kva(sf) + pgoff; T> - m0->m_len = xfsize; T> + m0 = m_get(M_WAITOK, MT_DATA); T> + (void )m_extadd(m0, (caddr_t )sf_buf_kva(sf), PAGE_SIZE, T> + (flags & SF_NOCACHE) ? sf_mext_free_nocache : T> + sf_mext_free, sfs, sf, M_RDONLY, EXT_SFBUF, T> + M_WAITOK); T> + m0->m_data = (char *)sf_buf_kva(sf) + T> + (vmoff(i, off) & PAGE_MASK); T> + m0->m_len = xfsize(i, npages, off, space); T> + m0->m_flags |= M_NOTREADY; T> T> + if (i == 0) T> + sfio->m = m0; T> + T> /* Append to mbuf chain. */ T> if (mtail != NULL) T> mtail->m_next = m0; T> - else if (m != NULL) T> - m_last(m)->m_next = m0; T> else T> m = m0; T> mtail = m0; T> T> - /* Keep track of bits processed. */ T> - loopbytes += xfsize; T> - off += xfsize; T> - T> /* T> * XXX eventually this should be a sfsync T> * method call! T> @@ -3104,47 +3164,51 @@ retry_space: T> if (vp != NULL) T> VOP_UNLOCK(vp, 0); T> T> + /* Keep track of bytes processed. */ T> + off += space; T> + rem -= space; T> + T> + /* Prepend header, if any. */ T> + if (hdrlen) { T> + mhtail->m_next = m; T> + m = mh; T> + mh = NULL; T> + } T> + T> + if (error) { T> + free(sfio, M_TEMP); T> + goto done; T> + } T> + T> /* Add the buffer chain to the socket buffer. */ T> - if (m != NULL) { T> - int mlen, err; T> + KASSERT(m_length(m, NULL) == space + hdrlen, T> + ("%s: mlen %u space %d hdrlen %d", T> + __func__, m_length(m, NULL), space, hdrlen)); T> T> - mlen = m_length(m, NULL); T> - SOCKBUF_LOCK(&so->so_snd); T> - if (so->so_snd.sb_state & SBS_CANTSENDMORE) { T> - error = EPIPE; T> - SOCKBUF_UNLOCK(&so->so_snd); T> - goto done; T> - } T> - SOCKBUF_UNLOCK(&so->so_snd); T> - CURVNET_SET(so->so_vnet); T> - /* Avoid error aliasing. */ T> - err = (*so->so_proto->pr_usrreqs->pru_send) T> - (so, 0, m, NULL, NULL, td); T> - CURVNET_RESTORE(); T> - if (err == 0) { T> - /* T> - * We need two counters to get the T> - * file offset and nbytes to send T> - * right: T> - * - sbytes contains the total amount T> - * of bytes sent, including headers. T> - * - fsbytes contains the total amount T> - * of bytes sent from the file. T> - */ T> - sbytes += mlen; T> - fsbytes += mlen; T> - if (hdrlen) { T> - fsbytes -= hdrlen; T> - hdrlen = 0; T> - } T> - } else if (error == 0) T> - error = err; T> - m = NULL; /* pru_send always consumes */ T> + CURVNET_SET(so->so_vnet); T> + if (nios == 0) { T> + free(sfio, M_TEMP); T> + serror = (*so->so_proto->pr_usrreqs->pru_send) T> + (so, 0, m, NULL, NULL, td); T> + } else { T> + sfio->sock_fp = sock_fp; T> + sfio->npages = npages; T> + fhold(sock_fp); T> + serror = (*so->so_proto->pr_usrreqs->pru_send) T> + (so, PRUS_NOTREADY, m, NULL, NULL, td); T> + sf_io_done(sfio); T> } T> + CURVNET_RESTORE(); T> T> - /* Quit outer loop on error or when we're done. 
*/ T> - if (done) T> - break; T> + if (serror == 0) { T> + sbytes += space + hdrlen; T> + if (hdrlen) T> + hdrlen = 0; T> + } else if (error == 0) T> + error = serror; T> + m = NULL; /* pru_send always consumes */ T> + T> + /* Quit outer loop on error. */ T> if (error != 0) T> goto done; T> } T> @@ -3179,6 +3243,8 @@ out: T> fdrop(sock_fp, td); T> if (m) T> m_freem(m); T> + if (mh) T> + m_freem(mh); T> T> if (error == ERESTART) T> error = EINTR; T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c T> =================================================================== T> --- sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c (.../head) (revision 266804) T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c (.../projects/sendfile) (revision 266807) T> @@ -1127,9 +1127,8 @@ ng_btsocket_l2cap_process_l2ca_write_rsp(struct ng T> /* T> * Check if we have more data to send T> */ T> - T> sbdroprecord(&pcb->so->so_snd); T> - if (pcb->so->so_snd.sb_cc > 0) { T> + if (sbavail(&pcb->so->so_snd) > 0) { T> if (ng_btsocket_l2cap_send2(pcb) == 0) T> ng_btsocket_l2cap_timeout(pcb); T> else T> @@ -2510,7 +2509,7 @@ ng_btsocket_l2cap_send2(ng_btsocket_l2cap_pcb_p pc T> T> mtx_assert(&pcb->pcb_mtx, MA_OWNED); T> T> - if (pcb->so->so_snd.sb_cc == 0) T> + if (sbavail(&pcb->so->so_snd) == 0) T> return (EINVAL); /* XXX */ T> T> m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT); T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c T> =================================================================== T> --- sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c (.../head) (revision 266804) T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c (.../projects/sendfile) (revision 266807) T> @@ -3274,7 +3274,7 @@ ng_btsocket_rfcomm_pcb_send(ng_btsocket_rfcomm_pcb T> } T> T> for (error = 0, sent = 0; sent < limit; sent ++) { T> - length = min(pcb->mtu, pcb->so->so_snd.sb_cc); T> + length = min(pcb->mtu, sbavail(&pcb->so->so_snd)); T> if (length == 0) T> break; T> T> Index: sys/netgraph/bluetooth/socket/ng_btsocket_sco.c T> =================================================================== T> --- sys/netgraph/bluetooth/socket/ng_btsocket_sco.c (.../head) (revision 266804) T> +++ sys/netgraph/bluetooth/socket/ng_btsocket_sco.c (.../projects/sendfile) (revision 266807) T> @@ -906,7 +906,7 @@ ng_btsocket_sco_default_msg_input(struct ng_mesg * T> sbdroprecord(&pcb->so->so_snd); T> T> /* Send more if we have any */ T> - if (pcb->so->so_snd.sb_cc > 0) T> + if (sbavail(&pcb->so->so_snd) > 0) T> if (ng_btsocket_sco_send2(pcb) == 0) T> ng_btsocket_sco_timeout(pcb); T> T> @@ -1744,7 +1744,7 @@ ng_btsocket_sco_send2(ng_btsocket_sco_pcb_p pcb) T> mtx_assert(&pcb->pcb_mtx, MA_OWNED); T> T> while (pcb->rt->pending < pcb->rt->num_pkts && T> - pcb->so->so_snd.sb_cc > 0) { T> + sbavail(&pcb->so->so_snd) > 0) { T> /* Get a copy of the first packet on send queue */ T> m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT); T> if (m == NULL) { T> Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c T> =================================================================== T> --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c (.../head) (revision 266804) T> +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c (.../projects/sendfile) (revision 266807) T> @@ -746,7 +746,7 @@ sdp_start_disconnect(struct sdp_sock *ssk) T> ("sdp_start_disconnect: sdp_drop() returned NULL")); T> } else { T> soisdisconnecting(so); T> - unread = so->so_rcv.sb_cc; T> + unread = sbused(&so->so_rcv); T> sbflush(&so->so_rcv); T> sdp_usrclosed(ssk); T> if (!(ssk->flags & 
SDP_DROPPED)) { T> @@ -888,7 +888,7 @@ sdp_append(struct sdp_sock *ssk, struct sockbuf *s T> m_adj(mb, SDP_HEAD_SIZE); T> n->m_pkthdr.len += mb->m_pkthdr.len; T> n->m_flags |= mb->m_flags & (M_PUSH | M_URG); T> - m_demote(mb, 1); T> + m_demote(mb, 1, 0); T> sbcompress(sb, mb, sb->sb_mbtail); T> return; T> } T> @@ -1258,7 +1258,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps T> /* We will never ever get anything unless we are connected. */ T> if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) { T> /* When disconnecting there may be still some data left. */ T> - if (sb->sb_cc > 0) T> + if (sbavail(sb)) T> goto deliver; T> if (!(so->so_state & SS_ISDISCONNECTED)) T> error = ENOTCONN; T> @@ -1266,7 +1266,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps T> } T> T> /* Socket buffer is empty and we shall not block. */ T> - if (sb->sb_cc == 0 && T> + if (sbavail(sb) == 0 && T> ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) { T> error = EAGAIN; T> goto out; T> @@ -1277,7 +1277,7 @@ restart: T> T> /* Abort if socket has reported problems. */ T> if (so->so_error) { T> - if (sb->sb_cc > 0) T> + if (sbavail(sb)) T> goto deliver; T> if (oresid > uio->uio_resid) T> goto out; T> @@ -1289,7 +1289,7 @@ restart: T> T> /* Door is closed. Deliver what is left, if any. */ T> if (sb->sb_state & SBS_CANTRCVMORE) { T> - if (sb->sb_cc > 0) T> + if (sbavail(sb)) T> goto deliver; T> else T> goto out; T> @@ -1296,18 +1296,18 @@ restart: T> } T> T> /* Socket buffer got some data that we shall deliver now. */ T> - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) && T> + if (sbavail(sb) && !(flags & MSG_WAITALL) && T> ((so->so_state & SS_NBIO) || T> (flags & (MSG_DONTWAIT|MSG_NBIO)) || T> - sb->sb_cc >= sb->sb_lowat || T> - sb->sb_cc >= uio->uio_resid || T> - sb->sb_cc >= sb->sb_hiwat) ) { T> + sbavail(sb) >= sb->sb_lowat || T> + sbavail(sb) >= uio->uio_resid || T> + sbavail(sb) >= sb->sb_hiwat) ) { T> goto deliver; T> } T> T> /* On MSG_WAITALL we must wait until all data or error arrives. */ T> if ((flags & MSG_WAITALL) && T> - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat)) T> + (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_lowat)) T> goto deliver; T> T> /* T> @@ -1321,7 +1321,7 @@ restart: T> T> deliver: T> SOCKBUF_LOCK_ASSERT(&so->so_rcv); T> - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__)); T> + KASSERT(sbavail(sb), ("%s: sockbuf empty", __func__)); T> KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__)); T> T> /* Statistics. */ T> @@ -1329,7 +1329,7 @@ deliver: T> uio->uio_td->td_ru.ru_msgrcv++; T> T> /* Fill uio until full or current end of socket buffer is reached. */ T> - len = min(uio->uio_resid, sb->sb_cc); T> + len = min(uio->uio_resid, sbavail(sb)); T> if (mp0 != NULL) { T> /* Dequeue as many mbufs as possible. 
*/ T> if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { T> @@ -1509,7 +1509,7 @@ sdp_urg(struct sdp_sock *ssk, struct mbuf *mb) T> if (so == NULL) T> return; T> T> - so->so_oobmark = so->so_rcv.sb_cc + mb->m_pkthdr.len - 1; T> + so->so_oobmark = sbused(&so->so_rcv) + mb->m_pkthdr.len - 1; T> sohasoutofband(so); T> ssk->oobflags &= ~(SDP_HAVEOOB | SDP_HADOOB); T> if (!(so->so_options & SO_OOBINLINE)) { T> Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c T> =================================================================== T> --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c (.../head) (revision 266804) T> +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c (.../projects/sendfile) (revision 266807) T> @@ -183,7 +183,7 @@ sdp_post_recvs_needed(struct sdp_sock *ssk) T> * Compute bytes in the receive queue and socket buffer. T> */ T> bytes_in_process = (posted - SDP_MIN_TX_CREDITS) * buffer_size; T> - bytes_in_process += ssk->socket->so_rcv.sb_cc; T> + bytes_in_process += sbused(&ssk->socket->so_rcv); T> T> return bytes_in_process < max_bytes; T> } T> Index: sys/sys/socket.h T> =================================================================== T> --- sys/sys/socket.h (.../head) (revision 266804) T> +++ sys/sys/socket.h (.../projects/sendfile) (revision 266807) T> @@ -602,12 +602,15 @@ struct sf_hdtr_all { T> * Sendfile-specific flag(s) T> */ T> #define SF_NODISKIO 0x00000001 T> -#define SF_MNOWAIT 0x00000002 T> +#define SF_MNOWAIT 0x00000002 /* unused since 11.0 */ T> #define SF_SYNC 0x00000004 T> #define SF_KQUEUE 0x00000008 T> +#define SF_NOCACHE 0x00000010 T> +#define SF_FLAGS(rh, flags) (((rh) << 16) | (flags)) T> T> #ifdef _KERNEL T> #define SFK_COMPAT 0x00000001 T> +#define SF_READAHEAD(flags) ((flags) >> 16) T> #endif /* _KERNEL */ T> #endif /* __BSD_VISIBLE */ T> T> Index: sys/sys/sockbuf.h T> =================================================================== T> --- sys/sys/sockbuf.h (.../head) (revision 266804) T> +++ sys/sys/sockbuf.h (.../projects/sendfile) (revision 266807) T> @@ -89,8 +89,13 @@ struct sockbuf { T> struct mbuf *sb_lastrecord; /* (c/d) first mbuf of last T> * record in socket buffer */ T> struct mbuf *sb_sndptr; /* (c/d) pointer into mbuf chain */ T> + struct mbuf *sb_fnrdy; /* (c/d) pointer to first not ready buffer */ T> +#if 0 T> + struct mbuf *sb_lnrdy; /* (c/d) pointer to last not ready buffer */ T> +#endif T> u_int sb_sndptroff; /* (c/d) byte offset of ptr into chain */ T> - u_int sb_cc; /* (c/d) actual chars in buffer */ T> + u_int sb_acc; /* (c/d) available chars in buffer */ T> + u_int sb_ccc; /* (c/d) claimed chars in buffer */ T> u_int sb_hiwat; /* (c/d) max actual char count */ T> u_int sb_mbcnt; /* (c/d) chars of mbufs used */ T> u_int sb_mcnt; /* (c/d) number of mbufs in buffer */ T> @@ -120,10 +125,17 @@ struct sockbuf { T> #define SOCKBUF_LOCK_ASSERT(_sb) mtx_assert(SOCKBUF_MTX(_sb), MA_OWNED) T> #define SOCKBUF_UNLOCK_ASSERT(_sb) mtx_assert(SOCKBUF_MTX(_sb), MA_NOTOWNED) T> T> +/* T> + * Socket buffer private mbuf(9) flags. 
T> + */ T> +#define M_NOTREADY M_PROTO1 /* m_data not populated yet */ T> +#define M_BLOCKED M_PROTO2 /* M_NOTREADY in front of m */ T> +#define M_NOTAVAIL (M_NOTREADY | M_BLOCKED) T> + T> void sbappend(struct sockbuf *sb, struct mbuf *m); T> void sbappend_locked(struct sockbuf *sb, struct mbuf *m); T> -void sbappendstream(struct sockbuf *sb, struct mbuf *m); T> -void sbappendstream_locked(struct sockbuf *sb, struct mbuf *m); T> +void sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags); T> +void sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags); T> int sbappendaddr(struct sockbuf *sb, const struct sockaddr *asa, T> struct mbuf *m0, struct mbuf *control); T> int sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa, T> @@ -136,7 +148,6 @@ int sbappendcontrol_locked(struct sockbuf *sb, str T> struct mbuf *control); T> void sbappendrecord(struct sockbuf *sb, struct mbuf *m0); T> void sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0); T> -void sbcheck(struct sockbuf *sb); T> void sbcompress(struct sockbuf *sb, struct mbuf *m, struct mbuf *n); T> struct mbuf * T> sbcreatecontrol(caddr_t p, int size, int type, int level); T> @@ -162,59 +173,54 @@ void sbtoxsockbuf(struct sockbuf *sb, struct xsock T> int sbwait(struct sockbuf *sb); T> int sblock(struct sockbuf *sb, int flags); T> void sbunlock(struct sockbuf *sb); T> +void sballoc(struct sockbuf *, struct mbuf *); T> +void sbfree(struct sockbuf *, struct mbuf *); T> +void sbmtrim(struct sockbuf *, struct mbuf *, int); T> +int sbready(struct sockbuf *, struct mbuf *, int); T> T> +static inline u_int T> +sbavail(struct sockbuf *sb) T> +{ T> + T> +#if 0 T> + SOCKBUF_LOCK_ASSERT(sb); T> +#endif T> + return (sb->sb_acc); T> +} T> + T> +static inline u_int T> +sbused(struct sockbuf *sb) T> +{ T> + T> +#if 0 T> + SOCKBUF_LOCK_ASSERT(sb); T> +#endif T> + return (sb->sb_ccc); T> +} T> + T> /* T> * How much space is there in a socket buffer (so->so_snd or so->so_rcv)? T> * This is problematical if the fields are unsigned, as the space might T> - * still be negative (cc > hiwat or mbcnt > mbmax). Should detect T> - * overflow and return 0. Should use "lmin" but it doesn't exist now. T> + * still be negative (ccc > hiwat or mbcnt > mbmax). T> */ T> -static __inline T> -long T> +static inline long T> sbspace(struct sockbuf *sb) T> { T> - long bleft; T> - long mleft; T> + long bleft, mleft; T> T> +#if 0 T> + SOCKBUF_LOCK_ASSERT(sb); T> +#endif T> + T> if (sb->sb_flags & SB_STOP) T> return(0); T> - bleft = sb->sb_hiwat - sb->sb_cc; T> + T> + bleft = sb->sb_hiwat - sb->sb_ccc; T> mleft = sb->sb_mbmax - sb->sb_mbcnt; T> - return((bleft < mleft) ? bleft : mleft); T> -} T> T> -/* adjust counters in sb reflecting allocation of m */ T> -#define sballoc(sb, m) { \ T> - (sb)->sb_cc += (m)->m_len; \ T> - if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \ T> - (sb)->sb_ctl += (m)->m_len; \ T> - (sb)->sb_mbcnt += MSIZE; \ T> - (sb)->sb_mcnt += 1; \ T> - if ((m)->m_flags & M_EXT) { \ T> - (sb)->sb_mbcnt += (m)->m_ext.ext_size; \ T> - (sb)->sb_ccnt += 1; \ T> - } \ T> + return ((bleft < mleft) ? 
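This is the heart of the socket buffer split: sb_acc counts bytes that may actually be read or transmitted right now, sb_ccc counts everything that has claimed space, including M_NOTREADY mbufs whose file pages are still being read in, and sbavail()/sbused() are the corresponding accessors. A toy userland model of the intended behaviour (just the bookkeeping, not the kernel code) is:

#include <assert.h>
#include <stdio.h>

struct toy_sockbuf {
	unsigned acc;		/* available: may be sent right now */
	unsigned ccc;		/* claimed: includes not-ready data */
};

/* Ordinary write: the data is ready, both counters grow. */
static void
toy_append(struct toy_sockbuf *sb, unsigned len)
{

	sb->acc += len;
	sb->ccc += len;
}

/* sendfile-style append: pages not read in yet, only claim the space. */
static void
toy_append_notready(struct toy_sockbuf *sb, unsigned len)
{

	sb->ccc += len;
}

/* I/O completion, sbready()-style: the bytes become available. */
static void
toy_ready(struct toy_sockbuf *sb, unsigned len)
{

	assert(sb->acc + len <= sb->ccc);
	sb->acc += len;
}

int
main(void)
{
	struct toy_sockbuf sb = { 0, 0 };

	toy_append(&sb, 100);		/* plain write(2)		*/
	toy_append_notready(&sb, 4096);	/* sendfile page still on disk	*/
	printf("avail %u used %u\n", sb.acc, sb.ccc);	/* 100 4196	*/
	toy_ready(&sb, 4096);		/* getpages completed		*/
	printf("avail %u used %u\n", sb.acc, sb.ccc);	/* 4196 4196	*/
	return (0);
}

That is also why so much of the diff below is a mechanical conversion: places that decide how much can be transmitted or delivered now use sbavail(), while places that account for occupied buffer space use sbused().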
bleft : mleft); T> } T> T> -/* adjust counters in sb reflecting freeing of m */ T> -#define sbfree(sb, m) { \ T> - (sb)->sb_cc -= (m)->m_len; \ T> - if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \ T> - (sb)->sb_ctl -= (m)->m_len; \ T> - (sb)->sb_mbcnt -= MSIZE; \ T> - (sb)->sb_mcnt -= 1; \ T> - if ((m)->m_flags & M_EXT) { \ T> - (sb)->sb_mbcnt -= (m)->m_ext.ext_size; \ T> - (sb)->sb_ccnt -= 1; \ T> - } \ T> - if ((sb)->sb_sndptr == (m)) { \ T> - (sb)->sb_sndptr = NULL; \ T> - (sb)->sb_sndptroff = 0; \ T> - } \ T> - if ((sb)->sb_sndptroff != 0) \ T> - (sb)->sb_sndptroff -= (m)->m_len; \ T> -} T> - T> #define SB_EMPTY_FIXUP(sb) do { \ T> if ((sb)->sb_mb == NULL) { \ T> (sb)->sb_mbtail = NULL; \ T> @@ -224,13 +230,15 @@ sbspace(struct sockbuf *sb) T> T> #ifdef SOCKBUF_DEBUG T> void sblastrecordchk(struct sockbuf *, const char *, int); T> +void sblastmbufchk(struct sockbuf *, const char *, int); T> +void sbcheck(struct sockbuf *, const char *, int); T> #define SBLASTRECORDCHK(sb) sblastrecordchk((sb), __FILE__, __LINE__) T> - T> -void sblastmbufchk(struct sockbuf *, const char *, int); T> #define SBLASTMBUFCHK(sb) sblastmbufchk((sb), __FILE__, __LINE__) T> +#define SBCHECK(sb) sbcheck((sb), __FILE__, __LINE__) T> #else T> -#define SBLASTRECORDCHK(sb) /* nothing */ T> -#define SBLASTMBUFCHK(sb) /* nothing */ T> +#define SBLASTRECORDCHK(sb) do {} while (0) T> +#define SBLASTMBUFCHK(sb) do {} while (0) T> +#define SBCHECK(sb) do {} while (0) T> #endif /* SOCKBUF_DEBUG */ T> T> #endif /* _KERNEL */ T> Index: sys/sys/protosw.h T> =================================================================== T> --- sys/sys/protosw.h (.../head) (revision 266804) T> +++ sys/sys/protosw.h (.../projects/sendfile) (revision 266807) T> @@ -209,6 +209,7 @@ struct pr_usrreqs { T> #define PRUS_OOB 0x1 T> #define PRUS_EOF 0x2 T> #define PRUS_MORETOCOME 0x4 T> +#define PRUS_NOTREADY 0x8 T> int (*pru_sense)(struct socket *so, struct stat *sb); T> int (*pru_shutdown)(struct socket *so); T> int (*pru_flush)(struct socket *so, int direction); T> Index: sys/sys/sf_buf.h T> =================================================================== T> --- sys/sys/sf_buf.h (.../head) (revision 266804) T> +++ sys/sys/sf_buf.h (.../projects/sendfile) (revision 266807) T> @@ -52,7 +52,7 @@ struct sfstat { /* sendfile statistics */ T> #include T> #include T> #include T> -struct mbuf; /* for sf_buf_mext() */ T> +struct mbuf; /* for sf_mext_free() */ T> T> extern counter_u64_t sfstat[sizeof(struct sfstat) / sizeof(uint64_t)]; T> #define SFSTAT_ADD(name, val) \ T> @@ -61,6 +61,6 @@ extern counter_u64_t sfstat[sizeof(struct sfstat) T> #define SFSTAT_INC(name) SFSTAT_ADD(name, 1) T> #endif /* _KERNEL */ T> T> -int sf_buf_mext(struct mbuf *mb, void *addr, void *args); T> +int sf_mext_free(struct mbuf *mb, void *addr, void *args); T> T> #endif /* !_SYS_SF_BUF_H_ */ T> Index: sys/sys/vnode.h T> =================================================================== T> --- sys/sys/vnode.h (.../head) (revision 266804) T> +++ sys/sys/vnode.h (.../projects/sendfile) (revision 266807) T> @@ -719,6 +719,7 @@ int vop_stdbmap(struct vop_bmap_args *); T> int vop_stdfsync(struct vop_fsync_args *); T> int vop_stdgetwritemount(struct vop_getwritemount_args *); T> int vop_stdgetpages(struct vop_getpages_args *); T> +int vop_stdgetpages_async(struct vop_getpages_async_args *); T> int vop_stdinactive(struct vop_inactive_args *); T> int vop_stdislocked(struct vop_islocked_args *); T> int vop_stdkqfilter(struct vop_kqfilter_args *); T> Index: 
sys/sys/socketvar.h T> =================================================================== T> --- sys/sys/socketvar.h (.../head) (revision 266804) T> +++ sys/sys/socketvar.h (.../projects/sendfile) (revision 266807) T> @@ -205,7 +205,7 @@ struct xsocket { T> T> /* can we read something from so? */ T> #define soreadabledata(so) \ T> - ((so)->so_rcv.sb_cc >= (so)->so_rcv.sb_lowat || \ T> + (sbavail(&(so)->so_rcv) >= (so)->so_rcv.sb_lowat || \ T> !TAILQ_EMPTY(&(so)->so_comp) || (so)->so_error) T> #define soreadable(so) \ T> (soreadabledata(so) || ((so)->so_rcv.sb_state & SBS_CANTRCVMORE)) T> Index: sys/sys/mbuf.h T> =================================================================== T> --- sys/sys/mbuf.h (.../head) (revision 266804) T> +++ sys/sys/mbuf.h (.../projects/sendfile) (revision 266807) T> @@ -922,7 +922,7 @@ struct mbuf *m_copypacket(struct mbuf *, int); T> void m_copy_pkthdr(struct mbuf *, struct mbuf *); T> struct mbuf *m_copyup(struct mbuf *, int, int); T> struct mbuf *m_defrag(struct mbuf *, int); T> -void m_demote(struct mbuf *, int); T> +void m_demote(struct mbuf *, int, int); T> struct mbuf *m_devget(char *, int, int, struct ifnet *, T> void (*)(char *, caddr_t, u_int)); T> struct mbuf *m_dup(struct mbuf *, int); T> Index: sys/vm/vnode_pager.h T> =================================================================== T> --- sys/vm/vnode_pager.h (.../head) (revision 266804) T> +++ sys/vm/vnode_pager.h (.../projects/sendfile) (revision 266807) T> @@ -41,7 +41,7 @@ T> #ifdef _KERNEL T> T> int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, T> - int count, int reqpage); T> + int count, int reqpage, void (*iodone)(void *), void *arg); T> int vnode_pager_generic_putpages(struct vnode *vp, vm_page_t *m, T> int count, boolean_t sync, T> int *rtvals); T> Index: sys/vm/vm_pager.h T> =================================================================== T> --- sys/vm/vm_pager.h (.../head) (revision 266804) T> +++ sys/vm/vm_pager.h (.../projects/sendfile) (revision 266807) T> @@ -51,18 +51,21 @@ typedef vm_object_t pgo_alloc_t(void *, vm_ooffset T> struct ucred *); T> typedef void pgo_dealloc_t(vm_object_t); T> typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int); T> +typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int, T> + void(*)(void *), void *); T> typedef void pgo_putpages_t(vm_object_t, vm_page_t *, int, int, int *); T> typedef boolean_t pgo_haspage_t(vm_object_t, vm_pindex_t, int *, int *); T> typedef void pgo_pageunswapped_t(vm_page_t); T> T> struct pagerops { T> - pgo_init_t *pgo_init; /* Initialize pager. */ T> - pgo_alloc_t *pgo_alloc; /* Allocate pager. */ T> - pgo_dealloc_t *pgo_dealloc; /* Disassociate. */ T> - pgo_getpages_t *pgo_getpages; /* Get (read) page. */ T> - pgo_putpages_t *pgo_putpages; /* Put (write) page. */ T> - pgo_haspage_t *pgo_haspage; /* Does pager have page? */ T> - pgo_pageunswapped_t *pgo_pageunswapped; T> + pgo_init_t *pgo_init; /* Initialize pager. */ T> + pgo_alloc_t *pgo_alloc; /* Allocate pager. */ T> + pgo_dealloc_t *pgo_dealloc; /* Disassociate. */ T> + pgo_getpages_t *pgo_getpages; /* Get (read) page. */ T> + pgo_getpages_async_t *pgo_getpages_async; /* Get page asyncly. */ T> + pgo_putpages_t *pgo_putpages; /* Put (write) page. */ T> + pgo_haspage_t *pgo_haspage; /* Query page. 
*/ T> + pgo_pageunswapped_t *pgo_pageunswapped; T> }; T> T> extern struct pagerops defaultpagerops; T> @@ -103,6 +106,8 @@ vm_object_t vm_pager_allocate(objtype_t, void *, v T> void vm_pager_bufferinit(void); T> void vm_pager_deallocate(vm_object_t); T> static __inline int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int); T> +static __inline int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int, T> + int, void(*)(void *), void *); T> static __inline boolean_t vm_pager_has_page(vm_object_t, vm_pindex_t, int *, int *); T> void vm_pager_init(void); T> vm_object_t vm_pager_object_lookup(struct pagerlst *, void *); T> @@ -131,6 +136,27 @@ vm_pager_get_pages( T> return (r); T> } T> T> +static __inline int T> +vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count, T> + int reqpage, void (*iodone)(void *), void *arg) T> +{ T> + int r; T> + T> + VM_OBJECT_ASSERT_WLOCKED(object); T> + T> + if (*pagertab[object->type]->pgo_getpages_async == NULL) { T> + /* Emulate async operation. */ T> + r = vm_pager_get_pages(object, m, count, reqpage); T> + VM_OBJECT_WUNLOCK(object); T> + (iodone)(arg); T> + VM_OBJECT_WLOCK(object); T> + } else T> + r = (*pagertab[object->type]->pgo_getpages_async)(object, m, T> + count, reqpage, iodone, arg); T> + T> + return (r); T> +} T> + T> static __inline void T> vm_pager_put_pages( T> vm_object_t object, T> Index: sys/vm/vm_page.c T> =================================================================== T> --- sys/vm/vm_page.c (.../head) (revision 266804) T> +++ sys/vm/vm_page.c (.../projects/sendfile) (revision 266807) T> @@ -2689,6 +2689,8 @@ retrylookup: T> sleep = (allocflags & VM_ALLOC_IGN_SBUSY) != 0 ? T> vm_page_xbusied(m) : vm_page_busied(m); T> if (sleep) { T> + if (allocflags & VM_ALLOC_NOWAIT) T> + return (NULL); T> /* T> * Reference the page before unlocking and T> * sleeping so that the page daemon is less T> @@ -2716,6 +2718,8 @@ retrylookup: T> } T> m = vm_page_alloc(object, pindex, allocflags & ~VM_ALLOC_IGN_SBUSY); T> if (m == NULL) { T> + if (allocflags & VM_ALLOC_NOWAIT) T> + return (NULL); T> VM_OBJECT_WUNLOCK(object); T> VM_WAIT; T> VM_OBJECT_WLOCK(object); T> Index: sys/vm/vm_page.h T> =================================================================== T> --- sys/vm/vm_page.h (.../head) (revision 266804) T> +++ sys/vm/vm_page.h (.../projects/sendfile) (revision 266807) T> @@ -390,6 +390,7 @@ vm_page_t PHYS_TO_VM_PAGE(vm_paddr_t pa); T> #define VM_ALLOC_IGN_SBUSY 0x1000 /* vm_page_grab() only */ T> #define VM_ALLOC_NODUMP 0x2000 /* don't include in dump */ T> #define VM_ALLOC_SBUSY 0x4000 /* Shared busy the page */ T> +#define VM_ALLOC_NOWAIT 0x8000 /* Return NULL instead of sleeping */ T> T> #define VM_ALLOC_COUNT_SHIFT 16 T> #define VM_ALLOC_COUNT(count) ((count) << VM_ALLOC_COUNT_SHIFT) T> Index: sys/vm/vnode_pager.c T> =================================================================== T> --- sys/vm/vnode_pager.c (.../head) (revision 266804) T> +++ sys/vm/vnode_pager.c (.../projects/sendfile) (revision 266807) T> @@ -83,6 +83,8 @@ static int vnode_pager_input_smlfs(vm_object_t obj T> static int vnode_pager_input_old(vm_object_t object, vm_page_t m); T> static void vnode_pager_dealloc(vm_object_t); T> static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int); T> +static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int, T> + void(*)(void *), void *); T> static void vnode_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *); T> static boolean_t vnode_pager_haspage(vm_object_t, vm_pindex_t, 
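The new pager method takes an I/O completion callback instead of sleeping, and pagers that don't provide pgo_getpages_async are emulated: the synchronous getpages runs and the callback is simply fired before returning (the real code also drops the object lock around the callback). Stripped of all VM details, the pattern is just the following sketch; the names here are made up and it is not kernel code.

#include <stdio.h>

typedef void io_done_t(void *arg);

struct toy_pager {
	int	(*read_sync)(int blkno);	/* always present */
	int	(*read_async)(int blkno, io_done_t *done, void *arg);
						/* may be NULL */
};

/*
 * Prefer the native asynchronous method; otherwise do the blocking read
 * and invoke the completion callback ourselves.
 */
static int
toy_read_async(struct toy_pager *p, int blkno, io_done_t *done, void *arg)
{
	int error;

	if (p->read_async != NULL)
		return (p->read_async(blkno, done, arg));
	error = p->read_sync(blkno);		/* emulate: block here... */
	done(arg);				/* ...then "complete" */
	return (error);
}

static int
toy_sync(int blkno)
{

	printf("read block %d\n", blkno);
	return (0);
}

static void
toy_done(void *arg)
{

	printf("done: %s\n", (const char *)arg);
}

int
main(void)
{
	struct toy_pager p = { toy_sync, NULL };

	return (toy_read_async(&p, 7, toy_done, "request 7"));
}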
int *, int *); T> static vm_object_t vnode_pager_alloc(void *, vm_ooffset_t, vm_prot_t, T> @@ -92,6 +94,7 @@ struct pagerops vnodepagerops = { T> .pgo_alloc = vnode_pager_alloc, T> .pgo_dealloc = vnode_pager_dealloc, T> .pgo_getpages = vnode_pager_getpages, T> + .pgo_getpages_async = vnode_pager_getpages_async, T> .pgo_putpages = vnode_pager_putpages, T> .pgo_haspage = vnode_pager_haspage, T> }; T> @@ -664,6 +667,40 @@ vnode_pager_getpages(vm_object_t object, vm_page_t T> return rtval; T> } T> T> +static int T> +vnode_pager_getpages_async(vm_object_t object, vm_page_t *m, int count, T> + int reqpage, void (*iodone)(void *), void *arg) T> +{ T> + int rtval; T> + struct vnode *vp; T> + int bytes = count * PAGE_SIZE; T> + T> + vp = object->handle; T> + VM_OBJECT_WUNLOCK(object); T> + rtval = VOP_GETPAGES_ASYNC(vp, m, bytes, reqpage, 0, iodone, arg); T> + KASSERT(rtval != EOPNOTSUPP, T> + ("vnode_pager: FS getpages_async not implemented\n")); T> + VM_OBJECT_WLOCK(object); T> + return rtval; T> +} T> + T> +struct getpages_softc { T> + vm_page_t *m; T> + struct buf *bp; T> + vm_object_t object; T> + vm_offset_t kva; T> + off_t foff; T> + int size; T> + int count; T> + int unmapped; T> + int reqpage; T> + void (*iodone)(void *); T> + void *arg; T> +}; T> + T> +int vnode_pager_generic_getpages_done(struct getpages_softc *); T> +void vnode_pager_generic_getpages_done_async(struct buf *); T> + T> /* T> * This is now called from local media FS's to operate against their T> * own vnodes if they fail to implement VOP_GETPAGES. T> @@ -670,11 +707,11 @@ vnode_pager_getpages(vm_object_t object, vm_page_t T> */ T> int T> vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int bytecount, T> - int reqpage) T> + int reqpage, void (*iodone)(void *), void *arg) T> { T> vm_object_t object; T> vm_offset_t kva; T> - off_t foff, tfoff, nextoff; T> + off_t foff; T> int i, j, size, bsize, first; T> daddr_t firstaddr, reqblock; T> struct bufobj *bo; T> @@ -684,6 +721,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> struct mount *mp; T> int count; T> int error; T> + int unmapped; T> T> object = vp->v_object; T> count = bytecount / PAGE_SIZE; T> @@ -891,8 +929,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> * requires mapped buffers. 
T> */ T> mp = vp->v_mount; T> - if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0 && T> - unmapped_buf_allowed) { T> + unmapped = (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS)); T> + if (unmapped && unmapped_buf_allowed) { T> bp->b_data = unmapped_buf; T> bp->b_kvabase = unmapped_buf; T> bp->b_offset = 0; T> @@ -905,7 +943,6 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> T> /* build a minimal buffer header */ T> bp->b_iocmd = BIO_READ; T> - bp->b_iodone = bdone; T> KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred")); T> KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred")); T> bp->b_rcred = crhold(curthread->td_ucred); T> @@ -923,10 +960,88 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> T> /* do the input */ T> bp->b_iooffset = dbtob(bp->b_blkno); T> - bstrategy(bp); T> T> - bwait(bp, PVM, "vnread"); T> + if (iodone) { /* async */ T> + struct getpages_softc *sc; T> T> + sc = malloc(sizeof(*sc), M_TEMP, M_WAITOK); T> + T> + sc->m = m; T> + sc->bp = bp; T> + sc->object = object; T> + sc->foff = foff; T> + sc->size = size; T> + sc->count = count; T> + sc->unmapped = unmapped; T> + sc->reqpage = reqpage; T> + sc->kva = kva; T> + T> + sc->iodone = iodone; T> + sc->arg = arg; T> + T> + bp->b_iodone = vnode_pager_generic_getpages_done_async; T> + bp->b_caller1 = sc; T> + BUF_KERNPROC(bp); T> + bstrategy(bp); T> + /* Good bye! */ T> + } else { T> + struct getpages_softc sc; T> + T> + sc.m = m; T> + sc.bp = bp; T> + sc.object = object; T> + sc.foff = foff; T> + sc.size = size; T> + sc.count = count; T> + sc.unmapped = unmapped; T> + sc.reqpage = reqpage; T> + sc.kva = kva; T> + T> + bp->b_iodone = bdone; T> + bstrategy(bp); T> + bwait(bp, PVM, "vnread"); T> + error = vnode_pager_generic_getpages_done(&sc); T> + } T> + T> + return (error ? VM_PAGER_ERROR : VM_PAGER_OK); T> +} T> + T> +void T> +vnode_pager_generic_getpages_done_async(struct buf *bp) T> +{ T> + struct getpages_softc *sc = bp->b_caller1; T> + int error; T> + T> + error = vnode_pager_generic_getpages_done(sc); T> + T> + vm_page_xunbusy(sc->m[sc->reqpage]); T> + T> + sc->iodone(sc->arg); T> + T> + free(sc, M_TEMP); T> +} T> + T> +int T> +vnode_pager_generic_getpages_done(struct getpages_softc *sc) T> +{ T> + vm_object_t object; T> + vm_offset_t kva; T> + vm_page_t *m; T> + struct buf *bp; T> + off_t foff, tfoff, nextoff; T> + int i, size, count, unmapped, reqpage; T> + int error = 0; T> + T> + m = sc->m; T> + bp = sc->bp; T> + object = sc->object; T> + foff = sc->foff; T> + size = sc->size; T> + count = sc->count; T> + unmapped = sc->unmapped; T> + reqpage = sc->reqpage; T> + kva = sc->kva; T> + T> if ((bp->b_ioflags & BIO_ERROR) != 0) T> error = EIO; T> T> @@ -939,7 +1054,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> } T> if ((bp->b_flags & B_UNMAPPED) == 0) T> pmap_qremove(kva, count); T> - if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0) { T> + if (unmapped) { T> bp->b_data = (caddr_t)kva; T> bp->b_kvabase = (caddr_t)kva; T> bp->b_flags &= ~B_UNMAPPED; T> @@ -995,7 +1110,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ T> if (error) { T> printf("vnode_pager_getpages: I/O read error\n"); T> } T> - return (error ? 
VM_PAGER_ERROR : VM_PAGER_OK); T> + T> + return (error); T> } T> T> /* T> Index: sys/rpc/clnt_vc.c T> =================================================================== T> --- sys/rpc/clnt_vc.c (.../head) (revision 266804) T> +++ sys/rpc/clnt_vc.c (.../projects/sendfile) (revision 266807) T> @@ -860,7 +860,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int T> * error condition T> */ T> do_read = FALSE; T> - if (so->so_rcv.sb_cc >= sizeof(uint32_t) T> + if (sbavail(&so->so_rcv) >= sizeof(uint32_t) T> || (so->so_rcv.sb_state & SBS_CANTRCVMORE) T> || so->so_error) T> do_read = TRUE; T> @@ -913,7 +913,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int T> * buffered. T> */ T> do_read = FALSE; T> - if (so->so_rcv.sb_cc >= ct->ct_record_resid T> + if (sbavail(&so->so_rcv) >= ct->ct_record_resid T> || (so->so_rcv.sb_state & SBS_CANTRCVMORE) T> || so->so_error) T> do_read = TRUE; T> Index: sys/rpc/svc_vc.c T> =================================================================== T> --- sys/rpc/svc_vc.c (.../head) (revision 266804) T> +++ sys/rpc/svc_vc.c (.../projects/sendfile) (revision 266807) T> @@ -546,7 +546,7 @@ svc_vc_ack(SVCXPRT *xprt, uint32_t *ack) T> { T> T> *ack = atomic_load_acq_32(&xprt->xp_snt_cnt); T> - *ack -= xprt->xp_socket->so_snd.sb_cc; T> + *ack -= sbused(&xprt->xp_socket->so_snd); T> return (TRUE); T> } T> T> Index: sys/ufs/ffs/ffs_vnops.c T> =================================================================== T> --- sys/ufs/ffs/ffs_vnops.c (.../head) (revision 266804) T> +++ sys/ufs/ffs/ffs_vnops.c (.../projects/sendfile) (revision 266807) T> @@ -105,6 +105,7 @@ extern int ffs_rawread(struct vnode *vp, struct ui T> static vop_fsync_t ffs_fsync; T> static vop_lock1_t ffs_lock; T> static vop_getpages_t ffs_getpages; T> +static vop_getpages_async_t ffs_getpages_async; T> static vop_read_t ffs_read; T> static vop_write_t ffs_write; T> static int ffs_extread(struct vnode *vp, struct uio *uio, int ioflag); T> @@ -125,6 +126,7 @@ struct vop_vector ffs_vnodeops1 = { T> .vop_default = &ufs_vnodeops, T> .vop_fsync = ffs_fsync, T> .vop_getpages = ffs_getpages, T> + .vop_getpages_async = ffs_getpages_async, T> .vop_lock1 = ffs_lock, T> .vop_read = ffs_read, T> .vop_reallocblks = ffs_reallocblks, T> @@ -847,18 +849,16 @@ ffs_write(ap) T> } T> T> /* T> - * get page routine T> + * Get page routines. 
T> */ T> static int T> -ffs_getpages(ap) T> - struct vop_getpages_args *ap; T> +ffs_getpages_checkvalid(vm_page_t *m, int count, int reqpage) T> { T> - int i; T> vm_page_t mreq; T> int pcount; T> T> - pcount = round_page(ap->a_count) / PAGE_SIZE; T> - mreq = ap->a_m[ap->a_reqpage]; T> + pcount = round_page(count) / PAGE_SIZE; T> + mreq = m[reqpage]; T> T> /* T> * if ANY DEV_BSIZE blocks are valid on a large filesystem block, T> @@ -870,24 +870,48 @@ static int T> if (mreq->valid) { T> if (mreq->valid != VM_PAGE_BITS_ALL) T> vm_page_zero_invalid(mreq, TRUE); T> - for (i = 0; i < pcount; i++) { T> - if (i != ap->a_reqpage) { T> - vm_page_lock(ap->a_m[i]); T> - vm_page_free(ap->a_m[i]); T> - vm_page_unlock(ap->a_m[i]); T> + for (int i = 0; i < pcount; i++) { T> + if (i != reqpage) { T> + vm_page_lock(m[i]); T> + vm_page_free(m[i]); T> + vm_page_unlock(m[i]); T> } T> } T> VM_OBJECT_WUNLOCK(mreq->object); T> - return VM_PAGER_OK; T> + return (VM_PAGER_OK); T> } T> VM_OBJECT_WUNLOCK(mreq->object); T> T> - return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, T> - ap->a_count, T> - ap->a_reqpage); T> + return (-1); T> } T> T> +static int T> +ffs_getpages(struct vop_getpages_args *ap) T> +{ T> + int rv; T> T> + rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage); T> + if (rv == VM_PAGER_OK) T> + return (rv); T> + T> + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, T> + ap->a_reqpage, NULL, NULL)); T> +} T> + T> +static int T> +ffs_getpages_async(struct vop_getpages_async_args *ap) T> +{ T> + int rv; T> + T> + rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage); T> + if (rv == VM_PAGER_OK) { T> + (ap->a_vop_getpages_iodone)(ap->a_arg); T> + return (rv); T> + } T> + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, T> + ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg)); T> +} T> + T> /* T> * Extended attribute area reading. T> */ T> Index: sys/tools/vnode_if.awk T> =================================================================== T> --- sys/tools/vnode_if.awk (.../head) (revision 266804) T> +++ sys/tools/vnode_if.awk (.../projects/sendfile) (revision 266807) T> @@ -254,16 +254,26 @@ while ((getline < srcfile) > 0) { T> if (sub(/;$/, "") < 1) T> die("Missing end-of-line ; in \"%s\".", $0); T> T> - # pick off variable name T> - if ((argp = match($0, /[A-Za-z0-9_]+$/)) < 1) T> - die("Missing var name \"a_foo\" in \"%s\".", $0); T> - args[numargs] = substr($0, argp); T> - $0 = substr($0, 1, argp - 1); T> - T> - # what is left must be type T> - # remove trailing space (if any) T> - sub(/ $/, ""); T> - types[numargs] = $0; T> + # pick off argument name T> + if ((argp = match($0, /[A-Za-z0-9_]+$/)) > 0) { T> + args[numargs] = substr($0, argp); T> + $0 = substr($0, 1, argp - 1); T> + sub(/ $/, ""); T> + delete fargs[numargs]; T> + types[numargs] = $0; T> + } else { # try to parse a function pointer argument T> + if ((argp = match($0, T> + /\(\*[A-Za-z0-9_]+\)\([A-Za-z0-9_*, ]+\)$/)) < 1) T> + die("Missing var name \"a_foo\" in \"%s\".", T> + $0); T> + args[numargs] = substr($0, argp + 2); T> + sub(/\).+/, "", args[numargs]); T> + fargs[numargs] = substr($0, argp); T> + sub(/^\([^)]+\)/, "", fargs[numargs]); T> + $0 = substr($0, 1, argp - 1); T> + sub(/ $/, ""); T> + types[numargs] = $0; T> + } T> } T> if (numargs > 4) T> ctrargs = 4; T> @@ -286,8 +296,13 @@ while ((getline < srcfile) > 0) { T> if (hfile) { T> # Print out the vop_F_args structure. 
T> printh("struct "name"_args {\n\tstruct vop_generic_args a_gen;"); T> - for (i = 0; i < numargs; ++i) T> - printh("\t" t_spc(types[i]) "a_" args[i] ";"); T> + for (i = 0; i < numargs; ++i) { T> + if (fargs[i]) { T> + printh("\t" t_spc(types[i]) "(*a_" args[i] \ T> + ")" fargs[i] ";"); T> + } else T> + printh("\t" t_spc(types[i]) "a_" args[i] ";"); T> + } T> printh("};"); T> printh(""); T> T> @@ -301,8 +316,14 @@ while ((getline < srcfile) > 0) { T> printh(""); T> printh("static __inline int " uname "("); T> for (i = 0; i < numargs; ++i) { T> - printh("\t" t_spc(types[i]) args[i] \ T> - (i < numargs - 1 ? "," : ")")); T> + if (fargs[i]) { T> + printh("\t" t_spc(types[i]) "(*" args[i] \ T> + ")" fargs[i] \ T> + (i < numargs - 1 ? "," : ")")); T> + } else { T> + printh("\t" t_spc(types[i]) args[i] \ T> + (i < numargs - 1 ? "," : ")")); T> + } T> } T> printh("{"); T> printh("\tstruct " name "_args a;"); T> Index: sys/netinet/tcp_reass.c T> =================================================================== T> --- sys/netinet/tcp_reass.c (.../head) (revision 266804) T> +++ sys/netinet/tcp_reass.c (.../projects/sendfile) (revision 266807) T> @@ -248,7 +248,7 @@ present: T> m_freem(mq); T> else { T> mq->m_nextpkt = NULL; T> - sbappendstream_locked(&so->so_rcv, mq); T> + sbappendstream_locked(&so->so_rcv, mq, 0); T> wakeup = 1; T> } T> } T> Index: sys/netinet/accf_http.c T> =================================================================== T> --- sys/netinet/accf_http.c (.../head) (revision 266804) T> +++ sys/netinet/accf_http.c (.../projects/sendfile) (revision 266807) T> @@ -92,7 +92,7 @@ sbfull(struct sockbuf *sb) T> "mbcnt(%ld) >= mbmax(%ld): %d", T> sb->sb_cc, sb->sb_hiwat, sb->sb_cc >= sb->sb_hiwat, T> sb->sb_mbcnt, sb->sb_mbmax, sb->sb_mbcnt >= sb->sb_mbmax); T> - return (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax); T> + return (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax); T> } T> T> /* T> @@ -162,13 +162,14 @@ static int T> sohashttpget(struct socket *so, void *arg, int waitflag) T> { T> T> - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && !sbfull(&so->so_rcv)) { T> + if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && T> + !sbfull(&so->so_rcv)) { T> struct mbuf *m; T> char *cmp; T> int cmplen, cc; T> T> m = so->so_rcv.sb_mb; T> - cc = so->so_rcv.sb_cc - 1; T> + cc = sbavail(&so->so_rcv) - 1; T> if (cc < 1) T> return (SU_OK); T> switch (*mtod(m, char *)) { T> @@ -215,7 +216,7 @@ soparsehttpvers(struct socket *so, void *arg, int T> goto fallout; T> T> m = so->so_rcv.sb_mb; T> - cc = so->so_rcv.sb_cc; T> + cc = sbavail(&so->so_rcv); T> inspaces = spaces = 0; T> for (m = so->so_rcv.sb_mb; m; m = n) { T> n = m->m_nextpkt; T> @@ -304,7 +305,7 @@ soishttpconnected(struct socket *so, void *arg, in T> * have NCHRS left T> */ T> copied = 0; T> - ccleft = so->so_rcv.sb_cc; T> + ccleft = sbavail(&so->so_rcv); T> if (ccleft < NCHRS) T> goto readmore; T> a = b = c = '\0'; T> Index: sys/netinet/sctp_os_bsd.h T> =================================================================== T> --- sys/netinet/sctp_os_bsd.h (.../head) (revision 266804) T> +++ sys/netinet/sctp_os_bsd.h (.../projects/sendfile) (revision 266807) T> @@ -405,7 +405,7 @@ typedef struct callout sctp_os_timer_t; T> #define SCTP_SOWAKEUP(so) wakeup(&(so)->so_timeo) T> /* clear the socket buffer state */ T> #define SCTP_SB_CLEAR(sb) \ T> - (sb).sb_cc = 0; \ T> + (sb).sb_ccc = 0; \ T> (sb).sb_mb = NULL; \ T> (sb).sb_mbcnt = 0; T> T> Index: sys/netinet/tcp_output.c T> 
=================================================================== T> --- sys/netinet/tcp_output.c (.../head) (revision 266804) T> +++ sys/netinet/tcp_output.c (.../projects/sendfile) (revision 266807) T> @@ -322,7 +322,7 @@ after_sack_rexmit: T> * to send then the probe will be the FIN T> * itself. T> */ T> - if (off < so->so_snd.sb_cc) T> + if (off < sbavail(&so->so_snd)) T> flags &= ~TH_FIN; T> sendwin = 1; T> } else { T> @@ -348,7 +348,8 @@ after_sack_rexmit: T> */ T> if (sack_rxmit == 0) { T> if (sack_bytes_rxmt == 0) T> - len = ((long)ulmin(so->so_snd.sb_cc, sendwin) - off); T> + len = ((long)ulmin(sbavail(&so->so_snd), sendwin) - T> + off); T> else { T> long cwin; T> T> @@ -357,8 +358,8 @@ after_sack_rexmit: T> * sending new data, having retransmitted all the T> * data possible in the scoreboard. T> */ T> - len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd) T> - - off); T> + len = ((long)ulmin(sbavail(&so->so_snd), tp->snd_wnd) - T> + off); T> /* T> * Don't remove this (len > 0) check ! T> * We explicitly check for len > 0 here (although it T> @@ -457,12 +458,15 @@ after_sack_rexmit: T> * TODO: Shrink send buffer during idle periods together T> * with congestion window. Requires another timer. Has to T> * wait for upcoming tcp timer rewrite. T> + * T> + * XXXGL: should there be used sbused() or sbavail()? T> */ T> if (V_tcp_do_autosndbuf && so->so_snd.sb_flags & SB_AUTOSIZE) { T> if ((tp->snd_wnd / 4 * 5) >= so->so_snd.sb_hiwat && T> - so->so_snd.sb_cc >= (so->so_snd.sb_hiwat / 8 * 7) && T> - so->so_snd.sb_cc < V_tcp_autosndbuf_max && T> - sendwin >= (so->so_snd.sb_cc - (tp->snd_nxt - tp->snd_una))) { T> + sbused(&so->so_snd) >= (so->so_snd.sb_hiwat / 8 * 7) && T> + sbused(&so->so_snd) < V_tcp_autosndbuf_max && T> + sendwin >= (sbused(&so->so_snd) - T> + (tp->snd_nxt - tp->snd_una))) { T> if (!sbreserve_locked(&so->so_snd, T> min(so->so_snd.sb_hiwat + V_tcp_autosndbuf_inc, T> V_tcp_autosndbuf_max), so, curthread)) T> @@ -499,10 +503,11 @@ after_sack_rexmit: T> tso = 1; T> T> if (sack_rxmit) { T> - if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc)) T> + if (SEQ_LT(p->rxmit + len, tp->snd_una + sbavail(&so->so_snd))) T> flags &= ~TH_FIN; T> } else { T> - if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + so->so_snd.sb_cc)) T> + if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + T> + sbavail(&so->so_snd))) T> flags &= ~TH_FIN; T> } T> T> @@ -532,7 +537,7 @@ after_sack_rexmit: T> */ T> if (!(tp->t_flags & TF_MORETOCOME) && /* normal case */ T> (idle || (tp->t_flags & TF_NODELAY)) && T> - len + off >= so->so_snd.sb_cc && T> + len + off >= sbavail(&so->so_snd) && T> (tp->t_flags & TF_NOPUSH) == 0) { T> goto send; T> } T> @@ -660,7 +665,7 @@ dontupdate: T> * if window is nonzero, transmit what we can, T> * otherwise force out a byte. T> */ T> - if (so->so_snd.sb_cc && !tcp_timer_active(tp, TT_REXMT) && T> + if (sbavail(&so->so_snd) && !tcp_timer_active(tp, TT_REXMT) && T> !tcp_timer_active(tp, TT_PERSIST)) { T> tp->t_rxtshift = 0; T> tcp_setpersist(tp); T> @@ -786,7 +791,7 @@ send: T> * fractional unless the send sockbuf can T> * be emptied. T> */ T> - if (sendalot && off + len < so->so_snd.sb_cc) { T> + if (sendalot && off + len < sbavail(&so->so_snd)) { T> len -= len % (tp->t_maxopd - optlen); T> sendalot = 1; T> } T> @@ -889,7 +894,7 @@ send: T> * give data to the user when a buffer fills or T> * a PUSH comes in.) 
T> */ T> - if (off + len == so->so_snd.sb_cc) T> + if (off + len == sbavail(&so->so_snd)) T> flags |= TH_PUSH; T> SOCKBUF_UNLOCK(&so->so_snd); T> } else { T> Index: sys/netinet/siftr.c T> =================================================================== T> --- sys/netinet/siftr.c (.../head) (revision 266804) T> +++ sys/netinet/siftr.c (.../projects/sendfile) (revision 266807) T> @@ -781,9 +781,9 @@ siftr_siftdata(struct pkt_node *pn, struct inpcb * T> pn->flags = tp->t_flags; T> pn->rxt_length = tp->t_rxtcur; T> pn->snd_buf_hiwater = inp->inp_socket->so_snd.sb_hiwat; T> - pn->snd_buf_cc = inp->inp_socket->so_snd.sb_cc; T> + pn->snd_buf_cc = sbused(&inp->inp_socket->so_snd); T> pn->rcv_buf_hiwater = inp->inp_socket->so_rcv.sb_hiwat; T> - pn->rcv_buf_cc = inp->inp_socket->so_rcv.sb_cc; T> + pn->rcv_buf_cc = sbused(&inp->inp_socket->so_rcv); T> pn->sent_inflight_bytes = tp->snd_max - tp->snd_una; T> pn->t_segqlen = tp->t_segqlen; T> T> Index: sys/netinet/sctp_indata.c T> =================================================================== T> --- sys/netinet/sctp_indata.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_indata.c (.../projects/sendfile) (revision 266807) T> @@ -70,7 +70,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_ T> T> /* T> * This is really set wrong with respect to a 1-2-m socket. Since T> - * the sb_cc is the count that everyone as put up. When we re-write T> + * the sb_ccc is the count that everyone as put up. When we re-write T> * sctp_soreceive then we will fix this so that ONLY this T> * associations data is taken into account. T> */ T> @@ -77,7 +77,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_ T> if (stcb->sctp_socket == NULL) T> return (calc); T> T> - if (stcb->asoc.sb_cc == 0 && T> + if (stcb->asoc.sb_ccc == 0 && T> asoc->size_on_reasm_queue == 0 && T> asoc->size_on_all_streams == 0) { T> /* Full rwnd granted */ T> @@ -1358,7 +1358,7 @@ sctp_process_a_data_chunk(struct sctp_tcb *stcb, s T> * When we have NO room in the rwnd we check to make sure T> * the reader is doing its job... 
T> */ T> - if (stcb->sctp_socket->so_rcv.sb_cc) { T> + if (stcb->sctp_socket->so_rcv.sb_ccc) { T> /* some to read, wake-up */ T> #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING) T> struct socket *so; T> Index: sys/netinet/sctp_pcb.c T> =================================================================== T> --- sys/netinet/sctp_pcb.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_pcb.c (.../projects/sendfile) (revision 266807) T> @@ -3328,7 +3328,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi T> if ((asoc->asoc.size_on_reasm_queue > 0) || T> (asoc->asoc.control_pdapi) || T> (asoc->asoc.size_on_all_streams > 0) || T> - (so && (so->so_rcv.sb_cc > 0))) { T> + (so && (so->so_rcv.sb_ccc > 0))) { T> /* Left with Data unread */ T> struct mbuf *op_err; T> T> @@ -3556,7 +3556,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi T> TAILQ_REMOVE(&inp->read_queue, sq, next); T> sctp_free_remote_addr(sq->whoFrom); T> if (so) T> - so->so_rcv.sb_cc -= sq->length; T> + so->so_rcv.sb_ccc -= sq->length; T> if (sq->data) { T> sctp_m_freem(sq->data); T> sq->data = NULL; T> @@ -4775,7 +4775,7 @@ sctp_free_assoc(struct sctp_inpcb *inp, struct sct T> inp->sctp_flags |= SCTP_PCB_FLAGS_WAS_CONNECTED; T> if (so) { T> SOCK_LOCK(so); T> - if (so->so_rcv.sb_cc == 0) { T> + if (so->so_rcv.sb_ccc == 0) { T> so->so_state &= ~(SS_ISCONNECTING | T> SS_ISDISCONNECTING | T> SS_ISCONFIRMING | T> Index: sys/netinet/sctp_pcb.h T> =================================================================== T> --- sys/netinet/sctp_pcb.h (.../head) (revision 266804) T> +++ sys/netinet/sctp_pcb.h (.../projects/sendfile) (revision 266807) T> @@ -369,7 +369,7 @@ struct sctp_inpcb { T> } ip_inp; T> T> T> - /* Socket buffer lock protects read_queue and of course sb_cc */ T> + /* Socket buffer lock protects read_queue and of course sb_ccc */ T> struct sctp_readhead read_queue; T> T> LIST_ENTRY(sctp_inpcb) sctp_list; /* lists all endpoints */ T> Index: sys/netinet/sctp_usrreq.c T> =================================================================== T> --- sys/netinet/sctp_usrreq.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_usrreq.c (.../projects/sendfile) (revision 266807) T> @@ -586,7 +586,7 @@ sctp_must_try_again: T> if (((flags & SCTP_PCB_FLAGS_SOCKET_GONE) == 0) && T> (atomic_cmpset_int(&inp->sctp_flags, flags, (flags | SCTP_PCB_FLAGS_SOCKET_GONE | SCTP_PCB_FLAGS_CLOSE_IP)))) { T> if (((so->so_options & SO_LINGER) && (so->so_linger == 0)) || T> - (so->so_rcv.sb_cc > 0)) { T> + (so->so_rcv.sb_ccc > 0)) { T> #ifdef SCTP_LOG_CLOSING T> sctp_log_closing(inp, NULL, 13); T> #endif T> @@ -751,7 +751,7 @@ sctp_disconnect(struct socket *so) T> } T> if (((so->so_options & SO_LINGER) && T> (so->so_linger == 0)) || T> - (so->so_rcv.sb_cc > 0)) { T> + (so->so_rcv.sb_ccc > 0)) { T> if (SCTP_GET_STATE(asoc) != T> SCTP_STATE_COOKIE_WAIT) { T> /* Left with Data unread */ T> @@ -916,7 +916,7 @@ sctp_flush(struct socket *so, int how) T> inp->sctp_flags |= SCTP_PCB_FLAGS_SOCKET_CANT_READ; T> SCTP_INP_READ_UNLOCK(inp); T> SCTP_INP_WUNLOCK(inp); T> - so->so_rcv.sb_cc = 0; T> + so->so_rcv.sb_ccc = 0; T> so->so_rcv.sb_mbcnt = 0; T> so->so_rcv.sb_mb = NULL; T> } T> @@ -925,7 +925,7 @@ sctp_flush(struct socket *so, int how) T> * First make sure the sb will be happy, we don't use these T> * except maybe the count T> */ T> - so->so_snd.sb_cc = 0; T> + so->so_snd.sb_ccc = 0; T> so->so_snd.sb_mbcnt = 0; T> so->so_snd.sb_mb = NULL; T> T> Index: sys/netinet/sctp_structs.h T> =================================================================== 
T> --- sys/netinet/sctp_structs.h (.../head) (revision 266804) T> +++ sys/netinet/sctp_structs.h (.../projects/sendfile) (revision 266807) T> @@ -982,7 +982,7 @@ struct sctp_association { T> T> uint32_t total_output_queue_size; T> T> - uint32_t sb_cc; /* shadow of sb_cc */ T> + uint32_t sb_ccc; /* shadow of sb_ccc */ T> uint32_t sb_send_resv; /* amount reserved on a send */ T> uint32_t my_rwnd_control_len; /* shadow of sb_mbcnt used for rwnd T> * control */ T> Index: sys/netinet/tcp_input.c T> =================================================================== T> --- sys/netinet/tcp_input.c (.../head) (revision 266804) T> +++ sys/netinet/tcp_input.c (.../projects/sendfile) (revision 266807) T> @@ -1729,7 +1729,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, T> tcp_timer_activate(tp, TT_REXMT, T> tp->t_rxtcur); T> sowwakeup(so); T> - if (so->so_snd.sb_cc) T> + if (sbavail(&so->so_snd)) T> (void) tcp_output(tp); T> goto check_delack; T> } T> @@ -1837,7 +1837,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, T> newsize, so, NULL)) T> so->so_rcv.sb_flags &= ~SB_AUTOSIZE; T> m_adj(m, drop_hdrlen); /* delayed header drop */ T> - sbappendstream_locked(&so->so_rcv, m); T> + sbappendstream_locked(&so->so_rcv, m, 0); T> } T> /* NB: sorwakeup_locked() does an implicit unlock. */ T> sorwakeup_locked(so); T> @@ -2541,7 +2541,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, T> * Otherwise we would send pure ACKs. T> */ T> SOCKBUF_LOCK(&so->so_snd); T> - avail = so->so_snd.sb_cc - T> + avail = sbavail(&so->so_snd) - T> (tp->snd_nxt - tp->snd_una); T> SOCKBUF_UNLOCK(&so->so_snd); T> if (avail > 0) T> @@ -2676,10 +2676,10 @@ process_ACK: T> cc_ack_received(tp, th, CC_ACK); T> T> SOCKBUF_LOCK(&so->so_snd); T> - if (acked > so->so_snd.sb_cc) { T> - tp->snd_wnd -= so->so_snd.sb_cc; T> + if (acked > sbavail(&so->so_snd)) { T> + tp->snd_wnd -= sbavail(&so->so_snd); T> mfree = sbcut_locked(&so->so_snd, T> - (int)so->so_snd.sb_cc); T> + (int)sbavail(&so->so_snd)); T> ourfinisacked = 1; T> } else { T> mfree = sbcut_locked(&so->so_snd, acked); T> @@ -2805,7 +2805,7 @@ step6: T> * actually wanting to send this much urgent data. T> */ T> SOCKBUF_LOCK(&so->so_rcv); T> - if (th->th_urp + so->so_rcv.sb_cc > sb_max) { T> + if (th->th_urp + sbavail(&so->so_rcv) > sb_max) { T> th->th_urp = 0; /* XXX */ T> thflags &= ~TH_URG; /* XXX */ T> SOCKBUF_UNLOCK(&so->so_rcv); /* XXX */ T> @@ -2827,7 +2827,7 @@ step6: T> */ T> if (SEQ_GT(th->th_seq+th->th_urp, tp->rcv_up)) { T> tp->rcv_up = th->th_seq + th->th_urp; T> - so->so_oobmark = so->so_rcv.sb_cc + T> + so->so_oobmark = sbavail(&so->so_rcv) + T> (tp->rcv_up - tp->rcv_nxt) - 1; T> if (so->so_oobmark == 0) T> so->so_rcv.sb_state |= SBS_RCVATMARK; T> @@ -2897,7 +2897,7 @@ dodata: /* XXX */ T> if (so->so_rcv.sb_state & SBS_CANTRCVMORE) T> m_freem(m); T> else T> - sbappendstream_locked(&so->so_rcv, m); T> + sbappendstream_locked(&so->so_rcv, m, 0); T> /* NB: sorwakeup_locked() does an implicit unlock. 
*/ T> sorwakeup_locked(so); T> } else { T> Index: sys/netinet/sctp_input.c T> =================================================================== T> --- sys/netinet/sctp_input.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_input.c (.../projects/sendfile) (revision 266807) T> @@ -1042,7 +1042,7 @@ sctp_handle_shutdown_ack(struct sctp_shutdown_ack_ T> if (stcb->sctp_socket) { T> if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) { T> - stcb->sctp_socket->so_snd.sb_cc = 0; T> + stcb->sctp_socket->so_snd.sb_ccc = 0; T> } T> sctp_ulp_notify(SCTP_NOTIFY_ASSOC_DOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED); T> } T> Index: sys/netinet/sctp_var.h T> =================================================================== T> --- sys/netinet/sctp_var.h (.../head) (revision 266804) T> +++ sys/netinet/sctp_var.h (.../projects/sendfile) (revision 266807) T> @@ -82,9 +82,9 @@ extern struct pr_usrreqs sctp_usrreqs; T> T> #define sctp_maxspace(sb) (max((sb)->sb_hiwat,SCTP_MINIMAL_RWND)) T> T> -#define sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_cc) ? (sctp_maxspace(sb) - (asoc)->sb_cc) : 0)) T> +#define sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_ccc) ? (sctp_maxspace(sb) - (asoc)->sb_ccc) : 0)) T> T> -#define sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_cc) ? (sctp_maxspace(sb) - (sb)->sb_cc) : 0)) T> +#define sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_ccc) ? (sctp_maxspace(sb) - (sb)->sb_ccc) : 0)) T> T> #define sctp_sbspace_sub(a,b) ((a > b) ? (a - b) : 0) T> T> @@ -195,10 +195,10 @@ extern struct pr_usrreqs sctp_usrreqs; T> } T> T> #define sctp_sbfree(ctl, stcb, sb, m) { \ T> - SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_cc, SCTP_BUF_LEN((m))); \ T> + SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_ccc, SCTP_BUF_LEN((m))); \ T> SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_mbcnt, MSIZE); \ T> if (((ctl)->do_not_ref_stcb == 0) && stcb) {\ T> - SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_cc, SCTP_BUF_LEN((m))); \ T> + SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_ccc, SCTP_BUF_LEN((m))); \ T> SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \ T> } \ T> if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \ T> @@ -207,10 +207,10 @@ extern struct pr_usrreqs sctp_usrreqs; T> } T> T> #define sctp_sballoc(stcb, sb, m) { \ T> - atomic_add_int(&(sb)->sb_cc,SCTP_BUF_LEN((m))); \ T> + atomic_add_int(&(sb)->sb_ccc,SCTP_BUF_LEN((m))); \ T> atomic_add_int(&(sb)->sb_mbcnt, MSIZE); \ T> if (stcb) { \ T> - atomic_add_int(&(stcb)->asoc.sb_cc,SCTP_BUF_LEN((m))); \ T> + atomic_add_int(&(stcb)->asoc.sb_ccc,SCTP_BUF_LEN((m))); \ T> atomic_add_int(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \ T> } \ T> if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \ T> Index: sys/netinet/sctp_output.c T> =================================================================== T> --- sys/netinet/sctp_output.c (.../head) (revision 266804) T> +++ sys/netinet/sctp_output.c (.../projects/sendfile) (revision 266807) T> @@ -7104,7 +7104,7 @@ one_more_time: T> if ((stcb->sctp_socket != NULL) && \ T> ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { T> - atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc, sp->length); T> + atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc, sp->length); T> } T> if (sp->data) { T> sctp_m_freem(sp->data); T> @@ -11382,7 +11382,7 @@ jump_out: T> drp->current_onq = 
htonl(asoc->size_on_reasm_queue + T> asoc->size_on_all_streams + T> asoc->my_rwnd_control_len + T> - stcb->sctp_socket->so_rcv.sb_cc); T> + stcb->sctp_socket->so_rcv.sb_ccc); T> } else { T> /*- T> * If my rwnd is 0, possibly from mbuf depletion as well as T> Index: sys/netinet/tcp_usrreq.c T> =================================================================== T> --- sys/netinet/tcp_usrreq.c (.../head) (revision 266804) T> +++ sys/netinet/tcp_usrreq.c (.../projects/sendfile) (revision 266807) T> @@ -826,7 +826,7 @@ tcp_usr_send(struct socket *so, int flags, struct T> m_freem(control); /* empty control, just free it */ T> } T> if (!(flags & PRUS_OOB)) { T> - sbappendstream(&so->so_snd, m); T> + sbappendstream(&so->so_snd, m, flags); T> if (nam && tp->t_state < TCPS_SYN_SENT) { T> /* T> * Do implied connect if not yet connected, T> @@ -858,7 +858,8 @@ tcp_usr_send(struct socket *so, int flags, struct T> socantsendmore(so); T> tcp_usrclosed(tp); T> } T> - if (!(inp->inp_flags & INP_DROPPED)) { T> + if (!(inp->inp_flags & INP_DROPPED) && T> + !(flags & PRUS_NOTREADY)) { T> if (flags & PRUS_MORETOCOME) T> tp->t_flags |= TF_MORETOCOME; T> error = tcp_output(tp); T> @@ -884,7 +885,7 @@ tcp_usr_send(struct socket *so, int flags, struct T> * of data past the urgent section. T> * Otherwise, snd_up should be one lower. T> */ T> - sbappendstream_locked(&so->so_snd, m); T> + sbappendstream_locked(&so->so_snd, m, flags); T> SOCKBUF_UNLOCK(&so->so_snd); T> if (nam && tp->t_state < TCPS_SYN_SENT) { T> /* T> @@ -908,10 +909,12 @@ tcp_usr_send(struct socket *so, int flags, struct T> tp->snd_wnd = TTCP_CLIENT_SND_WND; T> tcp_mss(tp, -1); T> } T> - tp->snd_up = tp->snd_una + so->so_snd.sb_cc; T> - tp->t_flags |= TF_FORCEDATA; T> - error = tcp_output(tp); T> - tp->t_flags &= ~TF_FORCEDATA; T> + tp->snd_up = tp->snd_una + sbavail(&so->so_snd); T> + if (!(flags & PRUS_NOTREADY)) { T> + tp->t_flags |= TF_FORCEDATA; T> + error = tcp_output(tp); T> + tp->t_flags &= ~TF_FORCEDATA; T> + } T> } T> out: T> TCPDEBUG2((flags & PRUS_OOB) ? PRU_SENDOOB : T> Index: sys/netinet/accf_dns.c T> =================================================================== T> --- sys/netinet/accf_dns.c (.../head) (revision 266804) T> +++ sys/netinet/accf_dns.c (.../projects/sendfile) (revision 266807) T> @@ -75,7 +75,7 @@ sohasdns(struct socket *so, void *arg, int waitfla T> struct sockbuf *sb = &so->so_rcv; T> T> /* If the socket is full, we're ready. */ T> - if (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax) T> + if (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax) T> goto ready; T> T> /* Check to see if we have a request. 
*/ T> @@ -115,7 +115,7 @@ skippacket(struct sockbuf *sb) { T> unsigned long packlen; T> struct packet q, *p = &q; T> T> - if (sb->sb_cc < 2) T> + if (sbavail(sb) < 2) T> return DNS_WAIT; T> T> q.m = sb->sb_mb; T> @@ -122,7 +122,7 @@ skippacket(struct sockbuf *sb) { T> q.n = q.m->m_nextpkt; T> q.moff = 0; T> q.offset = 0; T> - q.len = sb->sb_cc; T> + q.len = sbavail(sb); T> T> GET16(p, packlen); T> if (packlen + 2 > q.len) T> Index: sys/netinet/sctputil.c T> =================================================================== T> --- sys/netinet/sctputil.c (.../head) (revision 266804) T> +++ sys/netinet/sctputil.c (.../projects/sendfile) (revision 266807) T> @@ -67,9 +67,9 @@ sctp_sblog(struct sockbuf *sb, struct sctp_tcb *st T> struct sctp_cwnd_log sctp_clog; T> T> sctp_clog.x.sb.stcb = stcb; T> - sctp_clog.x.sb.so_sbcc = sb->sb_cc; T> + sctp_clog.x.sb.so_sbcc = sb->sb_ccc; T> if (stcb) T> - sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_cc; T> + sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_ccc; T> else T> sctp_clog.x.sb.stcb_sbcc = 0; T> sctp_clog.x.sb.incr = incr; T> @@ -4356,7 +4356,7 @@ sctp_add_to_readq(struct sctp_inpcb *inp, T> { T> /* T> * Here we must place the control on the end of the socket read T> - * queue AND increment sb_cc so that select will work properly on T> + * queue AND increment sb_ccc so that select will work properly on T> * read. T> */ T> struct mbuf *m, *prev = NULL; T> @@ -4482,7 +4482,7 @@ sctp_append_to_readq(struct sctp_inpcb *inp, T> * the reassembly queue. T> * T> * If PDAPI this means we need to add m to the end of the data. T> - * Increase the length in the control AND increment the sb_cc. T> + * Increase the length in the control AND increment the sb_ccc. T> * Otherwise sb is NULL and all we need to do is put it at the end T> * of the mbuf chain. T> */ T> @@ -4694,10 +4694,10 @@ sctp_free_bufspace(struct sctp_tcb *stcb, struct s T> T> if (stcb->sctp_socket && (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) || T> ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE)))) { T> - if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { T> - stcb->sctp_socket->so_snd.sb_cc -= tp1->book_size; T> + if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { T> + stcb->sctp_socket->so_snd.sb_ccc -= tp1->book_size; T> } else { T> - stcb->sctp_socket->so_snd.sb_cc = 0; T> + stcb->sctp_socket->so_snd.sb_ccc = 0; T> T> } T> } T> @@ -5232,11 +5232,11 @@ sctp_sorecvmsg(struct socket *so, T> in_eeor_mode = sctp_is_feature_on(inp, SCTP_PCB_FLAGS_EXPLICIT_EOR); T> if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) { T> sctp_misc_ints(SCTP_SORECV_ENTER, T> - rwnd_req, in_eeor_mode, so->so_rcv.sb_cc, uio->uio_resid); T> + rwnd_req, in_eeor_mode, so->so_rcv.sb_ccc, uio->uio_resid); T> } T> if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) { T> sctp_misc_ints(SCTP_SORECV_ENTERPL, T> - rwnd_req, block_allowed, so->so_rcv.sb_cc, uio->uio_resid); T> + rwnd_req, block_allowed, so->so_rcv.sb_ccc, uio->uio_resid); T> } T> error = sblock(&so->so_rcv, (block_allowed ? 
SBL_WAIT : 0)); T> if (error) { T> @@ -5255,7 +5255,7 @@ restart_nosblocks: T> (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE)) { T> goto out; T> } T> - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_cc == 0)) { T> + if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_ccc == 0)) { T> if (so->so_error) { T> error = so->so_error; T> if ((in_flags & MSG_PEEK) == 0) T> @@ -5262,7 +5262,7 @@ restart_nosblocks: T> so->so_error = 0; T> goto out; T> } else { T> - if (so->so_rcv.sb_cc == 0) { T> + if (so->so_rcv.sb_ccc == 0) { T> /* indicate EOF */ T> error = 0; T> goto out; T> @@ -5269,9 +5269,9 @@ restart_nosblocks: T> } T> } T> } T> - if ((so->so_rcv.sb_cc <= held_length) && block_allowed) { T> + if ((so->so_rcv.sb_ccc <= held_length) && block_allowed) { T> /* we need to wait for data */ T> - if ((so->so_rcv.sb_cc == 0) && T> + if ((so->so_rcv.sb_ccc == 0) && T> ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || T> (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { T> if ((inp->sctp_flags & SCTP_PCB_FLAGS_CONNECTED) == 0) { T> @@ -5307,7 +5307,7 @@ restart_nosblocks: T> } T> held_length = 0; T> goto restart_nosblocks; T> - } else if (so->so_rcv.sb_cc == 0) { T> + } else if (so->so_rcv.sb_ccc == 0) { T> if (so->so_error) { T> error = so->so_error; T> if ((in_flags & MSG_PEEK) == 0) T> @@ -5364,11 +5364,11 @@ restart_nosblocks: T> SCTP_INP_READ_LOCK(inp); T> } T> control = TAILQ_FIRST(&inp->read_queue); T> - if ((control == NULL) && (so->so_rcv.sb_cc != 0)) { T> + if ((control == NULL) && (so->so_rcv.sb_ccc != 0)) { T> #ifdef INVARIANTS T> panic("Huh, its non zero and nothing on control?"); T> #endif T> - so->so_rcv.sb_cc = 0; T> + so->so_rcv.sb_ccc = 0; T> } T> SCTP_INP_READ_UNLOCK(inp); T> hold_rlock = 0; T> @@ -5489,11 +5489,11 @@ restart_nosblocks: T> } T> /* T> * if we reach here, not suitable replacement is available T> - * fragment interleave is NOT on. So stuff the sb_cc T> + * fragment interleave is NOT on. So stuff the sb_ccc T> * into the our held count, and its time to sleep again. T> */ T> - held_length = so->so_rcv.sb_cc; T> - control->held_length = so->so_rcv.sb_cc; T> + held_length = so->so_rcv.sb_ccc; T> + control->held_length = so->so_rcv.sb_ccc; T> goto restart; T> } T> /* Clear the held length since there is something to read */ T> @@ -5790,10 +5790,10 @@ get_more_data: T> if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_SB_LOGGING_ENABLE) { T> sctp_sblog(&so->so_rcv, control->do_not_ref_stcb ? NULL : stcb, SCTP_LOG_SBFREE, cp_len); T> } T> - atomic_subtract_int(&so->so_rcv.sb_cc, cp_len); T> + atomic_subtract_int(&so->so_rcv.sb_ccc, cp_len); T> if ((control->do_not_ref_stcb == 0) && T> stcb) { T> - atomic_subtract_int(&stcb->asoc.sb_cc, cp_len); T> + atomic_subtract_int(&stcb->asoc.sb_ccc, cp_len); T> } T> copied_so_far += cp_len; T> freed_so_far += cp_len; T> @@ -5938,7 +5938,7 @@ wait_some_more: T> (sctp_is_feature_on(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE))) { T> goto release; T> } T> - if (so->so_rcv.sb_cc <= control->held_length) { T> + if (so->so_rcv.sb_ccc <= control->held_length) { T> error = sbwait(&so->so_rcv); T> if (error) { T> goto release; T> @@ -5965,8 +5965,8 @@ wait_some_more: T> } T> goto done_with_control; T> } T> - if (so->so_rcv.sb_cc > held_length) { T> - control->held_length = so->so_rcv.sb_cc; T> + if (so->so_rcv.sb_ccc > held_length) { T> + control->held_length = so->so_rcv.sb_ccc; T> held_length = 0; T> } T> goto wait_some_more; T> @@ -6113,13 +6113,13 @@ out: T> freed_so_far, T> ((uio) ? 
(slen - uio->uio_resid) : slen), T> stcb->asoc.my_rwnd, T> - so->so_rcv.sb_cc); T> + so->so_rcv.sb_ccc); T> } else { T> sctp_misc_ints(SCTP_SORECV_DONE, T> freed_so_far, T> ((uio) ? (slen - uio->uio_resid) : slen), T> 0, T> - so->so_rcv.sb_cc); T> + so->so_rcv.sb_ccc); T> } T> } T> stage_left: T> Index: sys/netinet/sctputil.h T> =================================================================== T> --- sys/netinet/sctputil.h (.../head) (revision 266804) T> +++ sys/netinet/sctputil.h (.../projects/sendfile) (revision 266807) T> @@ -284,10 +284,10 @@ do { \ T> } \ T> if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ T> - if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { \ T> - atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_cc), tp1->book_size); \ T> + if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { \ T> + atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_ccc), tp1->book_size); \ T> } else { \ T> - stcb->sctp_socket->so_snd.sb_cc = 0; \ T> + stcb->sctp_socket->so_snd.sb_ccc = 0; \ T> } \ T> } \ T> } \ T> @@ -305,10 +305,10 @@ do { \ T> } \ T> if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ T> - if (stcb->sctp_socket->so_snd.sb_cc >= sp->length) { \ T> - atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc,sp->length); \ T> + if (stcb->sctp_socket->so_snd.sb_ccc >= sp->length) { \ T> + atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc,sp->length); \ T> } else { \ T> - stcb->sctp_socket->so_snd.sb_cc = 0; \ T> + stcb->sctp_socket->so_snd.sb_ccc = 0; \ T> } \ T> } \ T> } \ T> @@ -320,7 +320,7 @@ do { \ T> if ((stcb->sctp_socket != NULL) && \ T> ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ T> (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ T> - atomic_add_int(&stcb->sctp_socket->so_snd.sb_cc,sz); \ T> + atomic_add_int(&stcb->sctp_socket->so_snd.sb_ccc,sz); \ T> } \ T> } while (0) T> T> Index: usr.bin/bluetooth/btsockstat/btsockstat.c T> =================================================================== T> --- usr.bin/bluetooth/btsockstat/btsockstat.c (.../head) (revision 266804) T> +++ usr.bin/bluetooth/btsockstat/btsockstat.c (.../projects/sendfile) (revision 266807) T> @@ -255,8 +255,8 @@ hcirawpr(kvm_t *kvmd, u_long addr) T> (unsigned long) pcb.so, T> (unsigned long) this, T> pcb.flags, T> - so.so_rcv.sb_cc, T> - so.so_snd.sb_cc, T> + so.so_rcv.sb_ccc, T> + so.so_snd.sb_ccc, T> pcb.addr.hci_node); T> } T> } /* hcirawpr */ T> @@ -303,8 +303,8 @@ l2caprawpr(kvm_t *kvmd, u_long addr) T> "%-8lx %-8lx %6d %6d %-17.17s\n", T> (unsigned long) pcb.so, T> (unsigned long) this, T> - so.so_rcv.sb_cc, T> - so.so_snd.sb_cc, T> + so.so_rcv.sb_ccc, T> + so.so_snd.sb_ccc, T> bdaddrpr(&pcb.src, NULL, 0)); T> } T> } /* l2caprawpr */ T> @@ -361,8 +361,8 @@ l2cappr(kvm_t *kvmd, u_long addr) T> fprintf(stdout, T> "%-8lx %6d %6d %-17.17s/%-5d %-17.17s %-5d %s\n", T> (unsigned long) this, T> - so.so_rcv.sb_cc, T> - so.so_snd.sb_cc, T> + so.so_rcv.sb_ccc, T> + so.so_snd.sb_ccc, T> bdaddrpr(&pcb.src, local, sizeof(local)), T> pcb.psm, T> bdaddrpr(&pcb.dst, remote, sizeof(remote)), T> @@ -467,8 +467,8 @@ rfcommpr(kvm_t *kvmd, u_long addr) T> fprintf(stdout, T> "%-8lx %6d %6d %-17.17s %-17.17s %-4d %-4d %s\n", T> (unsigned long) this, T> - so.so_rcv.sb_cc, T> - so.so_snd.sb_cc, T> + so.so_rcv.sb_ccc, T> + so.so_snd.sb_ccc, T> bdaddrpr(&pcb.src, local, 
sizeof(local)), T> bdaddrpr(&pcb.dst, remote, sizeof(remote)), T> pcb.channel, T> Index: usr.bin/systat/netstat.c T> =================================================================== T> --- usr.bin/systat/netstat.c (.../head) (revision 266804) T> +++ usr.bin/systat/netstat.c (.../projects/sendfile) (revision 266807) T> @@ -333,8 +333,8 @@ enter_kvm(struct inpcb *inp, struct socket *so, in T> struct netinfo *p; T> T> if ((p = enter(inp, state, proto)) != NULL) { T> - p->ni_rcvcc = so->so_rcv.sb_cc; T> - p->ni_sndcc = so->so_snd.sb_cc; T> + p->ni_rcvcc = so->so_rcv.sb_ccc; T> + p->ni_sndcc = so->so_snd.sb_ccc; T> } T> } T> T> Index: usr.bin/netstat/netgraph.c T> =================================================================== T> --- usr.bin/netstat/netgraph.c (.../head) (revision 266804) T> +++ usr.bin/netstat/netgraph.c (.../projects/sendfile) (revision 266807) T> @@ -119,7 +119,7 @@ netgraphprotopr(u_long off, const char *name, int T> if (Aflag) T> printf("%8lx ", (u_long) this); T> printf("%-5.5s %6u %6u ", T> - name, sockb.so_rcv.sb_cc, sockb.so_snd.sb_cc); T> + name, sockb.so_rcv.sb_ccc, sockb.so_snd.sb_ccc); T> T> /* Get info on associated node */ T> if (ngpcb.node_id == 0 || csock == -1) T> Index: usr.bin/netstat/unix.c T> =================================================================== T> --- usr.bin/netstat/unix.c (.../head) (revision 266804) T> +++ usr.bin/netstat/unix.c (.../projects/sendfile) (revision 266807) T> @@ -287,7 +287,8 @@ unixdomainpr(struct xunpcb *xunp, struct xsocket * T> } else { T> printf("%8lx %-6.6s %6u %6u %8lx %8lx %8lx %8lx", T> (long)so->so_pcb, socktype[so->so_type], so->so_rcv.sb_cc, T> - so->so_snd.sb_cc, (long)unp->unp_vnode, (long)unp->unp_conn, T> + so->so_snd.sb_cc, (long)unp->unp_vnode, T> + (long)unp->unp_conn, T> (long)LIST_FIRST(&unp->unp_refs), T> (long)LIST_NEXT(unp, unp_reflink)); T> } T> Index: usr.bin/netstat/inet.c T> =================================================================== T> --- usr.bin/netstat/inet.c (.../head) (revision 266804) T> +++ usr.bin/netstat/inet.c (.../projects/sendfile) (revision 266807) T> @@ -137,7 +137,7 @@ pcblist_sysctl(int proto, const char *name, char * T> static void T> sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb) T> { T> - xsb->sb_cc = sb->sb_cc; T> + xsb->sb_cc = sb->sb_ccc; T> xsb->sb_hiwat = sb->sb_hiwat; T> xsb->sb_mbcnt = sb->sb_mbcnt; T> xsb->sb_mcnt = sb->sb_mcnt; T> @@ -479,7 +479,8 @@ protopr(u_long off, const char *name, int af1, int T> printf("%6u %6u %6u ", tp->t_sndrexmitpack, T> tp->t_rcvoopack, tp->t_sndzerowin); T> } else { T> - printf("%6u %6u ", so->so_rcv.sb_cc, so->so_snd.sb_cc); T> + printf("%6u %6u ", T> + so->so_rcv.sb_cc, so->so_snd.sb_cc); T> } T> if (numeric_port) { T> if (inp->inp_vflag & INP_IPV4) { T> _______________________________________________ T> freebsd-arch@freebsd.org mailing list T> http://lists.freebsd.org/mailman/listinfo/freebsd-arch T> To unsubscribe, send any mail to "freebsd-arch-unsubscribe@freebsd.org" -- Totus tuus, Glebius. 
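P.S. For quick reference, the hunks quoted above reduce to two interface changes: code outside the socket layer stops reading sb_cc directly and instead calls sbavail() (bytes ready for the reader, backed by sb_acc) or sbused() (all bytes claimed in the buffer, backed by sb_ccc), and a sender may enqueue data that is not yet ready by passing PRUS_NOTREADY to pru_send() and later completing it via pru_ready()/sbready(). The fragment below is only an illustrative sketch of that calling sequence, not code from the patch; the example_* names are made up.

/*
 * Sketch only (hypothetical helpers): enqueue not-yet-ready data,
 * mark it ready after the backing I/O completes, and test reader
 * progress with the new accessors from the attached diff.
 */
static int
example_send_notready(struct socket *so, struct mbuf *m, struct thread *td)
{
	/* The chain is accounted in sb_ccc, but not in sb_acc yet. */
	return ((*so->so_proto->pr_usrreqs->pru_send)(so, PRUS_NOTREADY,
	    m, NULL, NULL, td));
}

static void
example_io_done(struct socket *so, struct mbuf *m, int count)
{
	/* I/O finished: let the protocol mark 'count' mbufs ready. */
	(void)(*so->so_proto->pr_usrreqs->pru_ready)(so, m, count);
}

static int
example_reader_can_progress(struct socket *so)
{
	struct sockbuf *sb = &so->so_rcv;

	/* sbavail() counts only ready bytes; sbused() counts all claimed. */
	return (sbavail(sb) >= sb->sb_lowat || sbused(sb) >= sb->sb_hiwat);
}
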
--hTiIB9CRvBOLTyqY Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="sendfile.diff" Index: sys/sys/vnode.h =================================================================== --- sys/sys/vnode.h (.../head) (revision 270879) +++ sys/sys/vnode.h (.../projects/sendfile) (revision 270881) @@ -727,6 +727,7 @@ int vop_stdbmap(struct vop_bmap_args *); int vop_stdfsync(struct vop_fsync_args *); int vop_stdgetwritemount(struct vop_getwritemount_args *); int vop_stdgetpages(struct vop_getpages_args *); +int vop_stdgetpages_async(struct vop_getpages_async_args *); int vop_stdinactive(struct vop_inactive_args *); int vop_stdislocked(struct vop_islocked_args *); int vop_stdkqfilter(struct vop_kqfilter_args *); Index: sys/sys/socket.h =================================================================== --- sys/sys/socket.h (.../head) (revision 270879) +++ sys/sys/socket.h (.../projects/sendfile) (revision 270881) @@ -602,12 +602,15 @@ struct sf_hdtr_all { * Sendfile-specific flag(s) */ #define SF_NODISKIO 0x00000001 -#define SF_MNOWAIT 0x00000002 +#define SF_MNOWAIT 0x00000002 /* unused since 11.0 */ #define SF_SYNC 0x00000004 #define SF_KQUEUE 0x00000008 +#define SF_NOCACHE 0x00000010 +#define SF_FLAGS(rh, flags) (((rh) << 16) | (flags)) #ifdef _KERNEL #define SFK_COMPAT 0x00000001 +#define SF_READAHEAD(flags) ((flags) >> 16) #endif /* _KERNEL */ #endif /* __BSD_VISIBLE */ Index: sys/sys/sockbuf.h =================================================================== --- sys/sys/sockbuf.h (.../head) (revision 270879) +++ sys/sys/sockbuf.h (.../projects/sendfile) (revision 270881) @@ -89,8 +89,13 @@ struct sockbuf { struct mbuf *sb_lastrecord; /* (c/d) first mbuf of last * record in socket buffer */ struct mbuf *sb_sndptr; /* (c/d) pointer into mbuf chain */ + struct mbuf *sb_fnrdy; /* (c/d) pointer to first not ready buffer */ +#if 0 + struct mbuf *sb_lnrdy; /* (c/d) pointer to last not ready buffer */ +#endif u_int sb_sndptroff; /* (c/d) byte offset of ptr into chain */ - u_int sb_cc; /* (c/d) actual chars in buffer */ + u_int sb_acc; /* (c/d) available chars in buffer */ + u_int sb_ccc; /* (c/d) claimed chars in buffer */ u_int sb_hiwat; /* (c/d) max actual char count */ u_int sb_mbcnt; /* (c/d) chars of mbufs used */ u_int sb_mcnt; /* (c/d) number of mbufs in buffer */ @@ -120,10 +125,17 @@ struct sockbuf { #define SOCKBUF_LOCK_ASSERT(_sb) mtx_assert(SOCKBUF_MTX(_sb), MA_OWNED) #define SOCKBUF_UNLOCK_ASSERT(_sb) mtx_assert(SOCKBUF_MTX(_sb), MA_NOTOWNED) +/* + * Socket buffer private mbuf(9) flags. 
+ */ +#define M_NOTREADY M_PROTO1 /* m_data not populated yet */ +#define M_BLOCKED M_PROTO2 /* M_NOTREADY in front of m */ +#define M_NOTAVAIL (M_NOTREADY | M_BLOCKED) + void sbappend(struct sockbuf *sb, struct mbuf *m); void sbappend_locked(struct sockbuf *sb, struct mbuf *m); -void sbappendstream(struct sockbuf *sb, struct mbuf *m); -void sbappendstream_locked(struct sockbuf *sb, struct mbuf *m); +void sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags); +void sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags); int sbappendaddr(struct sockbuf *sb, const struct sockaddr *asa, struct mbuf *m0, struct mbuf *control); int sbappendaddr_locked(struct sockbuf *sb, const struct sockaddr *asa, @@ -136,7 +148,6 @@ int sbappendcontrol_locked(struct sockbuf *sb, str struct mbuf *control); void sbappendrecord(struct sockbuf *sb, struct mbuf *m0); void sbappendrecord_locked(struct sockbuf *sb, struct mbuf *m0); -void sbcheck(struct sockbuf *sb); void sbcompress(struct sockbuf *sb, struct mbuf *m, struct mbuf *n); struct mbuf * sbcreatecontrol(caddr_t p, int size, int type, int level); @@ -162,59 +173,54 @@ void sbtoxsockbuf(struct sockbuf *sb, struct xsock int sbwait(struct sockbuf *sb); int sblock(struct sockbuf *sb, int flags); void sbunlock(struct sockbuf *sb); +void sballoc(struct sockbuf *, struct mbuf *); +void sbfree(struct sockbuf *, struct mbuf *); +void sbmtrim(struct sockbuf *, struct mbuf *, int); +int sbready(struct sockbuf *, struct mbuf *, int); +static inline u_int +sbavail(struct sockbuf *sb) +{ + +#if 0 + SOCKBUF_LOCK_ASSERT(sb); +#endif + return (sb->sb_acc); +} + +static inline u_int +sbused(struct sockbuf *sb) +{ + +#if 0 + SOCKBUF_LOCK_ASSERT(sb); +#endif + return (sb->sb_ccc); +} + /* * How much space is there in a socket buffer (so->so_snd or so->so_rcv)? * This is problematical if the fields are unsigned, as the space might - * still be negative (cc > hiwat or mbcnt > mbmax). Should detect - * overflow and return 0. Should use "lmin" but it doesn't exist now. + * still be negative (ccc > hiwat or mbcnt > mbmax). */ -static __inline -long +static inline long sbspace(struct sockbuf *sb) { - long bleft; - long mleft; + long bleft, mleft; +#if 0 + SOCKBUF_LOCK_ASSERT(sb); +#endif + if (sb->sb_flags & SB_STOP) return(0); - bleft = sb->sb_hiwat - sb->sb_cc; + + bleft = sb->sb_hiwat - sb->sb_ccc; mleft = sb->sb_mbmax - sb->sb_mbcnt; - return((bleft < mleft) ? bleft : mleft); -} -/* adjust counters in sb reflecting allocation of m */ -#define sballoc(sb, m) { \ - (sb)->sb_cc += (m)->m_len; \ - if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \ - (sb)->sb_ctl += (m)->m_len; \ - (sb)->sb_mbcnt += MSIZE; \ - (sb)->sb_mcnt += 1; \ - if ((m)->m_flags & M_EXT) { \ - (sb)->sb_mbcnt += (m)->m_ext.ext_size; \ - (sb)->sb_ccnt += 1; \ - } \ + return ((bleft < mleft) ? 
bleft : mleft); } -/* adjust counters in sb reflecting freeing of m */ -#define sbfree(sb, m) { \ - (sb)->sb_cc -= (m)->m_len; \ - if ((m)->m_type != MT_DATA && (m)->m_type != MT_OOBDATA) \ - (sb)->sb_ctl -= (m)->m_len; \ - (sb)->sb_mbcnt -= MSIZE; \ - (sb)->sb_mcnt -= 1; \ - if ((m)->m_flags & M_EXT) { \ - (sb)->sb_mbcnt -= (m)->m_ext.ext_size; \ - (sb)->sb_ccnt -= 1; \ - } \ - if ((sb)->sb_sndptr == (m)) { \ - (sb)->sb_sndptr = NULL; \ - (sb)->sb_sndptroff = 0; \ - } \ - if ((sb)->sb_sndptroff != 0) \ - (sb)->sb_sndptroff -= (m)->m_len; \ -} - #define SB_EMPTY_FIXUP(sb) do { \ if ((sb)->sb_mb == NULL) { \ (sb)->sb_mbtail = NULL; \ @@ -224,13 +230,15 @@ sbspace(struct sockbuf *sb) #ifdef SOCKBUF_DEBUG void sblastrecordchk(struct sockbuf *, const char *, int); +void sblastmbufchk(struct sockbuf *, const char *, int); +void sbcheck(struct sockbuf *, const char *, int); #define SBLASTRECORDCHK(sb) sblastrecordchk((sb), __FILE__, __LINE__) - -void sblastmbufchk(struct sockbuf *, const char *, int); #define SBLASTMBUFCHK(sb) sblastmbufchk((sb), __FILE__, __LINE__) +#define SBCHECK(sb) sbcheck((sb), __FILE__, __LINE__) #else -#define SBLASTRECORDCHK(sb) /* nothing */ -#define SBLASTMBUFCHK(sb) /* nothing */ +#define SBLASTRECORDCHK(sb) do {} while (0) +#define SBLASTMBUFCHK(sb) do {} while (0) +#define SBCHECK(sb) do {} while (0) #endif /* SOCKBUF_DEBUG */ #endif /* _KERNEL */ Index: sys/sys/protosw.h =================================================================== --- sys/sys/protosw.h (.../head) (revision 270879) +++ sys/sys/protosw.h (.../projects/sendfile) (revision 270881) @@ -208,6 +208,8 @@ struct pr_usrreqs { #define PRUS_OOB 0x1 #define PRUS_EOF 0x2 #define PRUS_MORETOCOME 0x4 +#define PRUS_NOTREADY 0x8 + int (*pru_ready)(struct socket *so, struct mbuf *m, int count); int (*pru_sense)(struct socket *so, struct stat *sb); int (*pru_shutdown)(struct socket *so); int (*pru_flush)(struct socket *so, int direction); @@ -251,6 +253,7 @@ int pru_rcvd_notsupp(struct socket *so, int flags) int pru_rcvoob_notsupp(struct socket *so, struct mbuf *m, int flags); int pru_send_notsupp(struct socket *so, int flags, struct mbuf *m, struct sockaddr *addr, struct mbuf *control, struct thread *td); +int pru_ready_notsupp(struct socket *so, struct mbuf *m, int count); int pru_sense_null(struct socket *so, struct stat *sb); int pru_shutdown_notsupp(struct socket *so); int pru_sockaddr_notsupp(struct socket *so, struct sockaddr **nam); Index: sys/sys/mbuf.h =================================================================== --- sys/sys/mbuf.h (.../head) (revision 270879) +++ sys/sys/mbuf.h (.../projects/sendfile) (revision 270881) @@ -330,12 +330,13 @@ struct mbuf { * External mbuf storage buffer types. 
*/ #define EXT_CLUSTER 1 /* mbuf cluster */ -#define EXT_SFBUF 2 /* sendfile(2)'s sf_bufs */ +#define EXT_SFBUF 2 /* sendfile(2)'s sf_buf */ #define EXT_JUMBOP 3 /* jumbo cluster 4096 bytes */ #define EXT_JUMBO9 4 /* jumbo cluster 9216 bytes */ #define EXT_JUMBO16 5 /* jumbo cluster 16184 bytes */ #define EXT_PACKET 6 /* mbuf+cluster from packet zone */ #define EXT_MBUF 7 /* external mbuf reference (M_IOVEC) */ +#define EXT_SFBUF_NOCACHE 8 /* sendfile(2)'s sf_buf not to be cached */ #define EXT_VENDOR1 224 /* for vendor-internal use */ #define EXT_VENDOR2 225 /* for vendor-internal use */ @@ -384,6 +385,7 @@ struct mbuf { */ void sf_ext_ref(void *, void *); void sf_ext_free(void *, void *); +void sf_ext_free_nocache(void *, void *); /* * Flags indicating checksum, segmentation and other offload work to be @@ -929,7 +931,7 @@ struct mbuf *m_copypacket(struct mbuf *, int); void m_copy_pkthdr(struct mbuf *, struct mbuf *); struct mbuf *m_copyup(struct mbuf *, int, int); struct mbuf *m_defrag(struct mbuf *, int); -void m_demote(struct mbuf *, int); +void m_demote(struct mbuf *, int, int); struct mbuf *m_devget(char *, int, int, struct ifnet *, void (*)(char *, caddr_t, u_int)); struct mbuf *m_dup(struct mbuf *, int); Index: sys/sys/socketvar.h =================================================================== --- sys/sys/socketvar.h (.../head) (revision 270879) +++ sys/sys/socketvar.h (.../projects/sendfile) (revision 270881) @@ -207,7 +207,7 @@ struct xsocket { /* can we read something from so? */ #define soreadabledata(so) \ - ((so)->so_rcv.sb_cc >= (so)->so_rcv.sb_lowat || \ + (sbavail(&(so)->so_rcv) >= (so)->so_rcv.sb_lowat || \ !TAILQ_EMPTY(&(so)->so_comp) || (so)->so_error) #define soreadable(so) \ (soreadabledata(so) || ((so)->so_rcv.sb_state & SBS_CANTRCVMORE)) Index: sys/rpc/svc_vc.c =================================================================== --- sys/rpc/svc_vc.c (.../head) (revision 270879) +++ sys/rpc/svc_vc.c (.../projects/sendfile) (revision 270881) @@ -546,7 +546,7 @@ svc_vc_ack(SVCXPRT *xprt, uint32_t *ack) { *ack = atomic_load_acq_32(&xprt->xp_snt_cnt); - *ack -= xprt->xp_socket->so_snd.sb_cc; + *ack -= sbused(&xprt->xp_socket->so_snd); return (TRUE); } Index: sys/rpc/clnt_vc.c =================================================================== --- sys/rpc/clnt_vc.c (.../head) (revision 270879) +++ sys/rpc/clnt_vc.c (.../projects/sendfile) (revision 270881) @@ -860,7 +860,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int * error condition */ do_read = FALSE; - if (so->so_rcv.sb_cc >= sizeof(uint32_t) + if (sbavail(&so->so_rcv) >= sizeof(uint32_t) || (so->so_rcv.sb_state & SBS_CANTRCVMORE) || so->so_error) do_read = TRUE; @@ -913,7 +913,7 @@ clnt_vc_soupcall(struct socket *so, void *arg, int * buffered. 
*/ do_read = FALSE; - if (so->so_rcv.sb_cc >= ct->ct_record_resid + if (sbavail(&so->so_rcv) >= ct->ct_record_resid || (so->so_rcv.sb_state & SBS_CANTRCVMORE) || so->so_error) do_read = TRUE; Index: sys/ufs/ffs/ffs_vnops.c =================================================================== --- sys/ufs/ffs/ffs_vnops.c (.../head) (revision 270879) +++ sys/ufs/ffs/ffs_vnops.c (.../projects/sendfile) (revision 270881) @@ -105,6 +105,7 @@ extern int ffs_rawread(struct vnode *vp, struct ui static vop_fsync_t ffs_fsync; static vop_lock1_t ffs_lock; static vop_getpages_t ffs_getpages; +static vop_getpages_async_t ffs_getpages_async; static vop_read_t ffs_read; static vop_write_t ffs_write; static int ffs_extread(struct vnode *vp, struct uio *uio, int ioflag); @@ -125,6 +126,7 @@ struct vop_vector ffs_vnodeops1 = { .vop_default = &ufs_vnodeops, .vop_fsync = ffs_fsync, .vop_getpages = ffs_getpages, + .vop_getpages_async = ffs_getpages_async, .vop_lock1 = ffs_lock, .vop_read = ffs_read, .vop_reallocblks = ffs_reallocblks, @@ -847,18 +849,16 @@ ffs_write(ap) } /* - * get page routine + * Get page routines. */ static int -ffs_getpages(ap) - struct vop_getpages_args *ap; +ffs_getpages_checkvalid(vm_page_t *m, int count, int reqpage) { - int i; vm_page_t mreq; int pcount; - pcount = round_page(ap->a_count) / PAGE_SIZE; - mreq = ap->a_m[ap->a_reqpage]; + pcount = round_page(count) / PAGE_SIZE; + mreq = m[reqpage]; /* * if ANY DEV_BSIZE blocks are valid on a large filesystem block, @@ -870,24 +870,48 @@ static int if (mreq->valid) { if (mreq->valid != VM_PAGE_BITS_ALL) vm_page_zero_invalid(mreq, TRUE); - for (i = 0; i < pcount; i++) { - if (i != ap->a_reqpage) { - vm_page_lock(ap->a_m[i]); - vm_page_free(ap->a_m[i]); - vm_page_unlock(ap->a_m[i]); + for (int i = 0; i < pcount; i++) { + if (i != reqpage) { + vm_page_lock(m[i]); + vm_page_free(m[i]); + vm_page_unlock(m[i]); } } VM_OBJECT_WUNLOCK(mreq->object); - return VM_PAGER_OK; + return (VM_PAGER_OK); } VM_OBJECT_WUNLOCK(mreq->object); - return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, - ap->a_count, - ap->a_reqpage); + return (-1); } +static int +ffs_getpages(struct vop_getpages_args *ap) +{ + int rv; + rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage); + if (rv == VM_PAGER_OK) + return (rv); + + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, + ap->a_reqpage, NULL, NULL)); +} + +static int +ffs_getpages_async(struct vop_getpages_async_args *ap) +{ + int rv; + + rv = ffs_getpages_checkvalid(ap->a_m, ap->a_count, ap->a_reqpage); + if (rv == VM_PAGER_OK) { + (ap->a_vop_getpages_iodone)(ap->a_arg); + return (rv); + } + return (vnode_pager_generic_getpages(ap->a_vp, ap->a_m, ap->a_count, + ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg)); +} + /* * Extended attribute area reading. 
*/ Index: sys/kern/uipc_domain.c =================================================================== --- sys/kern/uipc_domain.c (.../head) (revision 270879) +++ sys/kern/uipc_domain.c (.../projects/sendfile) (revision 270881) @@ -152,6 +152,7 @@ protosw_init(struct protosw *pr) DEFAULT(pu->pru_sosend, sosend_generic); DEFAULT(pu->pru_soreceive, soreceive_generic); DEFAULT(pu->pru_sopoll, sopoll_generic); + DEFAULT(pu->pru_ready, pru_ready_notsupp); #undef DEFAULT if (pr->pr_init) (*pr->pr_init)(); Index: sys/kern/vnode_if.src =================================================================== --- sys/kern/vnode_if.src (.../head) (revision 270879) +++ sys/kern/vnode_if.src (.../projects/sendfile) (revision 270881) @@ -477,6 +477,19 @@ vop_getpages { }; +%% getpages_async vp L L L + +vop_getpages_async { + IN struct vnode *vp; + IN vm_page_t *m; + IN int count; + IN int reqpage; + IN vm_ooffset_t offset; + IN void (*vop_getpages_iodone)(void *); + IN void *arg; +}; + + %% putpages vp L L L vop_putpages { Index: sys/kern/uipc_sockbuf.c =================================================================== --- sys/kern/uipc_sockbuf.c (.../head) (revision 270879) +++ sys/kern/uipc_sockbuf.c (.../projects/sendfile) (revision 270881) @@ -68,7 +68,145 @@ static u_long sb_efficiency = 8; /* parameter for static struct mbuf *sbcut_internal(struct sockbuf *sb, int len); static void sbflush_internal(struct sockbuf *sb); +static void +sb_shift_nrdy(struct sockbuf *sb, struct mbuf *m) +{ + +#if 0 /* XXX: not yet: soclose() call path comes here w/o lock. */ + SOCKBUF_LOCK_ASSERT(sb); +#endif + KASSERT(m->m_flags & M_NOTREADY, ("%s: m %p !M_NOTREADY", __func__, m)); + + m = m->m_next; + while (m != NULL && !(m->m_flags & M_NOTREADY)) { + m->m_flags &= ~M_BLOCKED; + sb->sb_acc += m->m_len; + m = m->m_next; + } + + sb->sb_fnrdy = m; +} + +int +sbready(struct sockbuf *sb, struct mbuf *m, int count) +{ + u_int blocker; + + SOCKBUF_LOCK_ASSERT(sb); + + KASSERT(sb->sb_fnrdy != NULL, ("%s: sb %p NULL fnrdy", __func__, sb)); + + blocker = (sb->sb_fnrdy == m) ? M_BLOCKED : 0; + + for (int i = 0; i < count; i++, m = m->m_next) { + KASSERT(m->m_flags & M_NOTREADY, + ("%s: m %p !M_NOTREADY", __func__, m)); + m->m_flags &= ~(M_NOTREADY | blocker); + if (blocker) + sb->sb_acc += m->m_len; + } + + if (!blocker) + return (EINPROGRESS); + + /* This one was blocking all the queue. */ + for (; m && (m->m_flags & M_NOTREADY) == 0; m = m->m_next) { + KASSERT(m->m_flags & M_BLOCKED, + ("%s: m %p !M_BLOCKED", __func__, m)); + m->m_flags &= ~M_BLOCKED; + sb->sb_acc += m->m_len; + } + + sb->sb_fnrdy = m; + + return (0); +} + /* + * Adjust sockbuf state reflecting allocation of m. + */ +void +sballoc(struct sockbuf *sb, struct mbuf *m) +{ + + SOCKBUF_LOCK_ASSERT(sb); + + sb->sb_ccc += m->m_len; + + if (sb->sb_fnrdy == NULL) { + if (m->m_flags & M_NOTREADY) + sb->sb_fnrdy = m; + else + sb->sb_acc += m->m_len; + } else + m->m_flags |= M_BLOCKED; + + if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) + sb->sb_ctl += m->m_len; + + sb->sb_mbcnt += MSIZE; + sb->sb_mcnt += 1; + + if (m->m_flags & M_EXT) { + sb->sb_mbcnt += m->m_ext.ext_size; + sb->sb_ccnt += 1; + } +} + +/* + * Adjust sockbuf state reflecting freeing of m. + */ +void +sbfree(struct sockbuf *sb, struct mbuf *m) +{ + +#if 0 /* XXX: not yet: soclose() call path comes here w/o lock. 
*/ + SOCKBUF_LOCK_ASSERT(sb); +#endif + + sb->sb_ccc -= m->m_len; + + if (!(m->m_flags & M_NOTAVAIL)) + sb->sb_acc -= m->m_len; + + if (sb->sb_fnrdy == m) + sb_shift_nrdy(sb, m); + + if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) + sb->sb_ctl -= m->m_len; + + sb->sb_mbcnt -= MSIZE; + sb->sb_mcnt -= 1; + if (m->m_flags & M_EXT) { + sb->sb_mbcnt -= m->m_ext.ext_size; + sb->sb_ccnt -= 1; + } + + if (sb->sb_sndptr == m) { + sb->sb_sndptr = NULL; + sb->sb_sndptroff = 0; + } + if (sb->sb_sndptroff != 0) + sb->sb_sndptroff -= m->m_len; +} + +/* + * Trim some amount of data from (first?) mbuf in buffer. + */ +void +sbmtrim(struct sockbuf *sb, struct mbuf *m, int len) +{ + + SOCKBUF_LOCK_ASSERT(sb); + KASSERT(len < m->m_len, ("%s: m %p len %d", __func__, m, len)); + + m->m_data += len; + m->m_len -= len; + sb->sb_acc -= len; + sb->sb_ccc -= len; +} + +/* * Socantsendmore indicates that no more data will be sent on the socket; it * would normally be applied to a socket when the user informs the system * that no more data is to be sent, by the protocol code (in case @@ -127,7 +265,7 @@ sbwait(struct sockbuf *sb) SOCKBUF_LOCK_ASSERT(sb); sb->sb_flags |= SB_WAIT; - return (msleep_sbt(&sb->sb_cc, &sb->sb_mtx, + return (msleep_sbt(&sb->sb_acc, &sb->sb_mtx, (sb->sb_flags & SB_NOINTR) ? PSOCK : PSOCK | PCATCH, "sbwait", sb->sb_timeo, 0, 0)); } @@ -184,7 +322,7 @@ sowakeup(struct socket *so, struct sockbuf *sb) sb->sb_flags &= ~SB_SEL; if (sb->sb_flags & SB_WAIT) { sb->sb_flags &= ~SB_WAIT; - wakeup(&sb->sb_cc); + wakeup(&sb->sb_acc); } KNOTE_LOCKED(&sb->sb_sel.si_note, 0); if (sb->sb_upcall != NULL) { @@ -519,7 +657,7 @@ sbappend(struct sockbuf *sb, struct mbuf *m) * that is, a stream protocol (such as TCP). */ void -sbappendstream_locked(struct sockbuf *sb, struct mbuf *m) +sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags) { SOCKBUF_LOCK_ASSERT(sb); @@ -529,8 +667,8 @@ void SBLASTMBUFCHK(sb); /* Remove all packet headers and mbuf tags to get a pure data chain. */ - m_demote(m, 1); - + m_demote(m, 1, flags & PRUS_NOTREADY ? M_NOTREADY : 0); + sbcompress(sb, m, sb->sb_mbtail); sb->sb_lastrecord = sb->sb_mb; @@ -543,38 +681,59 @@ void * that is, a stream protocol (such as TCP). 
*/ void -sbappendstream(struct sockbuf *sb, struct mbuf *m) +sbappendstream(struct sockbuf *sb, struct mbuf *m, int flags) { SOCKBUF_LOCK(sb); - sbappendstream_locked(sb, m); + sbappendstream_locked(sb, m, flags); SOCKBUF_UNLOCK(sb); } #ifdef SOCKBUF_DEBUG void -sbcheck(struct sockbuf *sb) +sbcheck(struct sockbuf *sb, const char *file, int line) { - struct mbuf *m; - struct mbuf *n = 0; - u_long len = 0, mbcnt = 0; + struct mbuf *m, *n, *fnrdy; + u_long acc, ccc, mbcnt; SOCKBUF_LOCK_ASSERT(sb); + acc = ccc = mbcnt = 0; + fnrdy = NULL; + for (m = sb->sb_mb; m; m = n) { n = m->m_nextpkt; for (; m; m = m->m_next) { - len += m->m_len; + if ((m->m_flags & M_NOTREADY) && fnrdy == NULL) { + if (m != sb->sb_fnrdy) { + printf("sb %p: fnrdy %p != m %p\n", + sb, sb->sb_fnrdy, m); + goto fail; + } + fnrdy = m; + } + if (fnrdy) { + if (!(m->m_flags & M_NOTAVAIL)) { + printf("sb %p: fnrdy %p, m %p is avail\n", + sb, sb->sb_fnrdy, m); + goto fail; + } + } else + acc += m->m_len; + ccc += m->m_len; mbcnt += MSIZE; if (m->m_flags & M_EXT) /*XXX*/ /* pretty sure this is bogus */ mbcnt += m->m_ext.ext_size; } } - if (len != sb->sb_cc || mbcnt != sb->sb_mbcnt) { - printf("cc %ld != %u || mbcnt %ld != %u\n", len, sb->sb_cc, - mbcnt, sb->sb_mbcnt); - panic("sbcheck"); + if (acc != sb->sb_acc || ccc != sb->sb_ccc || mbcnt != sb->sb_mbcnt) { + printf("acc %ld/%u ccc %ld/%u mbcnt %ld/%u\n", + acc, sb->sb_acc, ccc, sb->sb_ccc, mbcnt, sb->sb_mbcnt); + goto fail; } + return; +fail: + panic("%s from %s:%u", __func__, file, line); } #endif @@ -800,6 +959,7 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str if (n && (n->m_flags & M_EOR) == 0 && M_WRITABLE(n) && ((sb->sb_flags & SB_NOCOALESCE) == 0) && + !(m->m_flags & M_NOTREADY) && m->m_len <= MCLBYTES / 4 && /* XXX: Don't copy too much */ m->m_len <= M_TRAILINGSPACE(n) && n->m_type == m->m_type) { @@ -806,7 +966,9 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, str bcopy(mtod(m, caddr_t), mtod(n, caddr_t) + n->m_len, (unsigned)m->m_len); n->m_len += m->m_len; - sb->sb_cc += m->m_len; + sb->sb_ccc += m->m_len; + if (sb->sb_fnrdy == NULL) + sb->sb_acc += m->m_len; if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) /* XXX: Probably don't need.*/ sb->sb_ctl += m->m_len; @@ -843,13 +1005,13 @@ sbflush_internal(struct sockbuf *sb) * Don't call sbcut(sb, 0) if the leading mbuf is non-empty: * we would loop forever. Panic instead. 
*/ - if (!sb->sb_cc && (sb->sb_mb == NULL || sb->sb_mb->m_len)) + if (sb->sb_ccc == 0 && (sb->sb_mb == NULL || sb->sb_mb->m_len)) break; - m_freem(sbcut_internal(sb, (int)sb->sb_cc)); + m_freem(sbcut_internal(sb, (int)sb->sb_ccc)); } - if (sb->sb_cc || sb->sb_mb || sb->sb_mbcnt) - panic("sbflush_internal: cc %u || mb %p || mbcnt %u", - sb->sb_cc, (void *)sb->sb_mb, sb->sb_mbcnt); + KASSERT(sb->sb_ccc == 0 && sb->sb_mb == 0 && sb->sb_mbcnt == 0, + ("%s: ccc %u mb %p mbcnt %u", __func__, + sb->sb_ccc, (void *)sb->sb_mb, sb->sb_mbcnt)); } void @@ -891,7 +1053,9 @@ sbcut_internal(struct sockbuf *sb, int len) if (m->m_len > len) { m->m_len -= len; m->m_data += len; - sb->sb_cc -= len; + sb->sb_ccc -= len; + if (!(m->m_flags & M_NOTAVAIL)) + sb->sb_acc -= len; if (sb->sb_sndptroff != 0) sb->sb_sndptroff -= len; if (m->m_type != MT_DATA && m->m_type != MT_OOBDATA) @@ -977,8 +1141,8 @@ sbsndptr(struct sockbuf *sb, u_int off, u_int len, struct mbuf *m, *ret; KASSERT(sb->sb_mb != NULL, ("%s: sb_mb is NULL", __func__)); - KASSERT(off + len <= sb->sb_cc, ("%s: beyond sb", __func__)); - KASSERT(sb->sb_sndptroff <= sb->sb_cc, ("%s: sndptroff broken", __func__)); + KASSERT(off + len <= sb->sb_acc, ("%s: beyond sb", __func__)); + KASSERT(sb->sb_sndptroff <= sb->sb_acc, ("%s: sndptroff broken", __func__)); /* * Is off below stored offset? Happens on retransmits. @@ -1096,7 +1260,7 @@ void sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb) { - xsb->sb_cc = sb->sb_cc; + xsb->sb_cc = sb->sb_ccc; xsb->sb_hiwat = sb->sb_hiwat; xsb->sb_mbcnt = sb->sb_mbcnt; xsb->sb_mcnt = sb->sb_mcnt; Index: sys/kern/uipc_syscalls.c =================================================================== --- sys/kern/uipc_syscalls.c (.../head) (revision 270879) +++ sys/kern/uipc_syscalls.c (.../projects/sendfile) (revision 270881) @@ -132,9 +132,10 @@ static int filt_sfsync(struct knote *kn, long hint */ static SYSCTL_NODE(_kern_ipc, OID_AUTO, sendfile, CTLFLAG_RW, 0, "sendfile(2) tunables"); -static int sfreadahead = 1; + +static int sfreadahead = 0; SYSCTL_INT(_kern_ipc_sendfile, OID_AUTO, readahead, CTLFLAG_RW, - &sfreadahead, 0, "Number of sendfile(2) read-ahead MAXBSIZE blocks"); + &sfreadahead, 0, "Read this more pages than socket buffer can accept"); #ifdef SFSYNC_DEBUG static int sf_sync_debug = 0; @@ -2035,6 +2036,37 @@ sf_ext_free(void *arg1, void *arg2) } /* + * Same as above, but forces the page to be detached from the object + * and go into free pool. + */ +void +sf_ext_free_nocache(void *arg1, void *arg2) +{ + struct sf_buf *sf = arg1; + struct sendfile_sync *sfs = arg2; + vm_page_t pg = sf_buf_page(sf); + + sf_buf_free(sf); + + vm_page_lock(pg); + vm_page_unwire(pg, 0); + if (pg->wire_count == 0) { + vm_object_t obj; + + if ((obj = pg->object) == NULL) + vm_page_free(pg); + else if (!vm_page_xbusied(pg) && VM_OBJECT_TRYWLOCK(obj)) { + vm_page_free(pg); + VM_OBJECT_WUNLOCK(obj); + } + } + vm_page_unlock(pg); + + if (sfs != NULL) + sf_sync_deref(sfs); +} + +/* * Called to remove a reference to a sf_sync object. * * This is generally done during the mbuf free path to signify @@ -2627,106 +2659,168 @@ freebsd4_sendfile(struct thread *td, struct freebs } #endif /* COMPAT_FREEBSD4 */ + /* + * How much data to put into page i of n. + * Only first and last pages are special. 
+ */ +static inline off_t +xfsize(int i, int n, off_t off, off_t len) +{ + + if (i == 0) + return (omin(PAGE_SIZE - (off & PAGE_MASK), len)); + + if (i == n - 1 && ((off + len) & PAGE_MASK) > 0) + return ((off + len) & PAGE_MASK); + + return (PAGE_SIZE); +} + +/* + * Offset within object for i page. + */ +static inline vm_offset_t +vmoff(int i, off_t off) +{ + + if (i == 0) + return ((vm_offset_t)off); + + return (trunc_page(off + i * PAGE_SIZE)); +} + +/* + * Pretend as if we don't have enough space, subtract xfsize() of + * all pages that failed. + */ +static inline void +fixspace(int old, int new, off_t off, int *space) +{ + + KASSERT(old > new, ("%s: old %d new %d", __func__, old, new)); + + /* Subtract last one. */ + *space -= xfsize(old - 1, old, off, *space); + old--; + + if (new == old) + /* There was only one page. */ + return; + + /* Subtract first one. */ + if (new == 0) { + *space -= xfsize(0, old, off, *space); + new++; + } + + /* Rest of pages are full sized. */ + *space -= (old - new) * PAGE_SIZE; + + KASSERT(*space >= 0, ("%s: space went backwards", __func__)); +} + +struct sf_io { + u_int nios; + int npages; + struct file *sock_fp; + struct mbuf *m; + vm_page_t pa[]; +}; + +static void +sf_io_done(void *arg) +{ + struct sf_io *sfio = arg; + struct socket *so; + + if (!refcount_release(&sfio->nios)) + return; + + so = sfio->sock_fp->f_data; + + (void)(so->so_proto->pr_usrreqs->pru_ready)(so, sfio->m, sfio->npages); + + /* XXXGL: curthread */ + fdrop(sfio->sock_fp, curthread); + free(sfio, M_TEMP); +} + static int -sendfile_readpage(vm_object_t obj, struct vnode *vp, int nd, - off_t off, int xfsize, int bsize, struct thread *td, vm_page_t *res) +sendfile_swapin(vm_object_t obj, struct sf_io *sfio, off_t off, off_t len, + int npages, int rhpages) { - vm_page_t m; - vm_pindex_t pindex; - ssize_t resid; - int error, readahead, rv; + vm_page_t *pa = sfio->pa; + int nios; - pindex = OFF_TO_IDX(off); + nios = 0; VM_OBJECT_WLOCK(obj); - m = vm_page_grab(obj, pindex, (vp != NULL ? VM_ALLOC_NOBUSY | - VM_ALLOC_IGN_SBUSY : 0) | VM_ALLOC_WIRED | VM_ALLOC_NORMAL); + for (int i = 0; i < npages; i++) + pa[i] = vm_page_grab(obj, OFF_TO_IDX(vmoff(i, off)), + VM_ALLOC_WIRED | VM_ALLOC_NORMAL); - /* - * Check if page is valid for what we need, otherwise initiate I/O. - * - * The non-zero nd argument prevents disk I/O, instead we - * return the caller what he specified in nd. In particular, - * if we already turned some pages into mbufs, nd == EAGAIN - * and the main function send them the pages before we come - * here again and block. - */ - if (m->valid != 0 && vm_page_is_valid(m, off & PAGE_MASK, xfsize)) { - if (vp == NULL) - vm_page_xunbusy(m); - VM_OBJECT_WUNLOCK(obj); - *res = m; - return (0); - } else if (nd != 0) { - if (vp == NULL) - vm_page_xunbusy(m); - error = nd; - goto free_page; - } + for (int i = 0; i < npages;) { + int j, a, count, rv; - /* - * Get the page from backing store. - */ - error = 0; - if (vp != NULL) { - VM_OBJECT_WUNLOCK(obj); - readahead = sfreadahead * MAXBSIZE; + if (vm_page_is_valid(pa[i], vmoff(i, off) & PAGE_MASK, + xfsize(i, npages, off, len))) { + vm_page_xunbusy(pa[i]); + i++; + continue; + } - /* - * Use vn_rdwr() instead of the pager interface for - * the vnode, to allow the read-ahead. - * - * XXXMAC: Because we don't have fp->f_cred here, we - * pass in NOCRED. This is probably wrong, but is - * consistent with our original implementation. 
- */ - error = vn_rdwr(UIO_READ, vp, NULL, readahead, trunc_page(off), - UIO_NOCOPY, IO_NODELOCKED | IO_VMIO | ((readahead / - bsize) << IO_SEQSHIFT), td->td_ucred, NOCRED, &resid, td); - SFSTAT_INC(sf_iocnt); - VM_OBJECT_WLOCK(obj); - } else { - if (vm_pager_has_page(obj, pindex, NULL, NULL)) { - rv = vm_pager_get_pages(obj, &m, 1, 0); - SFSTAT_INC(sf_iocnt); - m = vm_page_lookup(obj, pindex); - if (m == NULL) - error = EIO; - else if (rv != VM_PAGER_OK) { - vm_page_lock(m); - vm_page_free(m); - vm_page_unlock(m); - m = NULL; - error = EIO; + for (j = i + 1; j < npages; j++) + if (vm_page_is_valid(pa[j], vmoff(j, off) & PAGE_MASK, + xfsize(j, npages, off, len))) + break; + + while (!vm_pager_has_page(obj, OFF_TO_IDX(vmoff(i, off)), + NULL, &a) && i < j) { + pmap_zero_page(pa[i]); + pa[i]->valid = VM_PAGE_BITS_ALL; + pa[i]->dirty = 0; + vm_page_xunbusy(pa[i]); + i++; + } + if (i == j) + continue; + + count = min(a + 1, npages + rhpages - i); + for (j = npages; j < i + count; j++) { + pa[j] = vm_page_grab(obj, OFF_TO_IDX(vmoff(j, off)), + VM_ALLOC_NORMAL | VM_ALLOC_NOWAIT); + if (pa[j] == NULL) { + count = j - i; + break; } - } else { - pmap_zero_page(m); - m->valid = VM_PAGE_BITS_ALL; - m->dirty = 0; + if (pa[j]->valid) { + vm_page_xunbusy(pa[j]); + count = j - i; + break; + } } - if (m != NULL) - vm_page_xunbusy(m); + + refcount_acquire(&sfio->nios); + rv = vm_pager_get_pages_async(obj, pa + i, count, 0, + &sf_io_done, sfio); + + KASSERT(rv == VM_PAGER_OK, ("%s: pager fail obj %p page %p", + __func__, obj, pa[i])); + + SFSTAT_INC(sf_iocnt); + nios++; + + for (j = i; j < i + count && j < npages; j++) + KASSERT(pa[j] == vm_page_lookup(obj, + OFF_TO_IDX(vmoff(j, off))), + ("pa[j] %p lookup %p\n", pa[j], + vm_page_lookup(obj, OFF_TO_IDX(vmoff(j, off))))); + + i += count; } - if (error == 0) { - *res = m; - } else if (m != NULL) { -free_page: - vm_page_lock(m); - vm_page_unwire(m, PQ_INACTIVE); - /* - * See if anyone else might know about this page. If - * not and it is not valid, then free it. - */ - if (m->wire_count == 0 && m->valid == 0 && !vm_page_busied(m)) - vm_page_free(m); - vm_page_unlock(m); - } - KASSERT(error != 0 || (m->wire_count > 0 && - vm_page_is_valid(m, off & PAGE_MASK, xfsize)), - ("wrong page state m %p off %#jx xfsize %d", m, (uintmax_t)off, - xfsize)); VM_OBJECT_WUNLOCK(obj); - return (error); + + return (nios); } static int @@ -2833,41 +2927,26 @@ vn_sendfile(struct file *fp, int sockfd, struct ui struct vnode *vp; struct vm_object *obj; struct socket *so; - struct mbuf *m; + struct mbuf *m, *mh, *mhtail; struct sf_buf *sf; - struct vm_page *pg; struct shmfd *shmfd; struct vattr va; - off_t off, xfsize, fsbytes, sbytes, rem, obj_size; - int error, bsize, nd, hdrlen, mnw; + off_t off, sbytes, rem, obj_size; + int error, serror, bsize, hdrlen; - pg = NULL; obj = NULL; so = NULL; - m = NULL; - fsbytes = sbytes = 0; - hdrlen = mnw = 0; - rem = nbytes; - obj_size = 0; + m = mh = NULL; + sbytes = 0; error = sendfile_getobj(td, fp, &obj, &vp, &shmfd, &obj_size, &bsize); if (error != 0) return (error); - if (rem == 0) - rem = obj_size; error = kern_sendfile_getsock(td, sockfd, &sock_fp, &so); if (error != 0) goto out; - /* - * Do not wait on memory allocations but return ENOMEM for - * caller to retry later. - * XXX: Experimental. 
- */ - if (flags & SF_MNOWAIT) - mnw = 1; - #ifdef MAC error = mac_socket_check_send(td->td_ucred, so); if (error != 0) @@ -2875,31 +2954,27 @@ vn_sendfile(struct file *fp, int sockfd, struct ui #endif /* If headers are specified copy them into mbufs. */ - if (hdr_uio != NULL) { + if (hdr_uio != NULL && hdr_uio->uio_resid > 0) { hdr_uio->uio_td = td; hdr_uio->uio_rw = UIO_WRITE; - if (hdr_uio->uio_resid > 0) { - /* - * In FBSD < 5.0 the nbytes to send also included - * the header. If compat is specified subtract the - * header size from nbytes. - */ - if (kflags & SFK_COMPAT) { - if (nbytes > hdr_uio->uio_resid) - nbytes -= hdr_uio->uio_resid; - else - nbytes = 0; - } - m = m_uiotombuf(hdr_uio, (mnw ? M_NOWAIT : M_WAITOK), - 0, 0, 0); - if (m == NULL) { - error = mnw ? EAGAIN : ENOBUFS; - goto out; - } - hdrlen = m_length(m, NULL); + /* + * In FBSD < 5.0 the nbytes to send also included + * the header. If compat is specified subtract the + * header size from nbytes. + */ + if (kflags & SFK_COMPAT) { + if (nbytes > hdr_uio->uio_resid) + nbytes -= hdr_uio->uio_resid; + else + nbytes = 0; } - } + mh = m_uiotombuf(hdr_uio, M_WAITOK, 0, 0, 0); + hdrlen = m_length(mh, &mhtail); + } else + hdrlen = 0; + rem = nbytes ? omin(nbytes, obj_size - offset) : obj_size - offset; + /* * Protect against multiple writers to the socket. * @@ -2919,21 +2994,13 @@ vn_sendfile(struct file *fp, int sockfd, struct ui * The outer loop checks the state and available space of the socket * and takes care of the overall progress. */ - for (off = offset; ; ) { + for (off = offset; rem > 0; ) { + struct sf_io *sfio; + vm_page_t *pa; struct mbuf *mtail; - int loopbytes; - int space; - int done; + int nios, space, npages, rhpages; - if ((nbytes != 0 && nbytes == fsbytes) || - (nbytes == 0 && obj_size == fsbytes)) - break; - mtail = NULL; - loopbytes = 0; - space = 0; - done = 0; - /* * Check the socket state for ongoing connection, * no errors and space in socket buffer. @@ -3009,53 +3076,44 @@ retry_space: VOP_UNLOCK(vp, 0); goto done; } - obj_size = va.va_size; + if (va.va_size != obj_size) { + if (nbytes == 0) + rem += va.va_size - obj_size; + else if (offset + nbytes > va.va_size) + rem -= (offset + nbytes - va.va_size); + obj_size = va.va_size; + } } + if (space > rem) + space = rem; + + if (off & PAGE_MASK) + npages = 1 + howmany(space - + (PAGE_SIZE - (off & PAGE_MASK)), PAGE_SIZE); + else + npages = howmany(space, PAGE_SIZE); + + rhpages = SF_READAHEAD(flags) ? + SF_READAHEAD(flags) : sfreadahead; + rhpages = min(howmany(obj_size - (off & ~PAGE_MASK) - + (npages * PAGE_SIZE), PAGE_SIZE), rhpages); + + sfio = malloc(sizeof(struct sf_io) + + (rhpages + npages) * sizeof(vm_page_t), M_TEMP, M_WAITOK); + refcount_init(&sfio->nios, 1); + + nios = sendfile_swapin(obj, sfio, off, space, npages, rhpages); + /* * Loop and construct maximum sized mbuf chain to be bulk * dumped into socket buffer. */ - while (space > loopbytes) { - vm_offset_t pgoff; + pa = sfio->pa; + for (int i = 0; i < npages; i++) { struct mbuf *m0; /* - * Calculate the amount to transfer. - * Not to exceed a page, the EOF, - * or the passed in nbytes. - */ - pgoff = (vm_offset_t)(off & PAGE_MASK); - rem = obj_size - offset; - if (nbytes != 0) - rem = omin(rem, nbytes); - rem -= fsbytes + loopbytes; - xfsize = omin(PAGE_SIZE - pgoff, rem); - xfsize = omin(space - loopbytes, xfsize); - if (xfsize <= 0) { - done = 1; /* all data sent */ - break; - } - - /* - * Attempt to look up the page. Allocate - * if not found or wait and loop if busy. 
- */ - if (m != NULL) - nd = EAGAIN; /* send what we already got */ - else if ((flags & SF_NODISKIO) != 0) - nd = EBUSY; - else - nd = 0; - error = sendfile_readpage(obj, vp, nd, off, - xfsize, bsize, td, &pg); - if (error != 0) { - if (error == EAGAIN) - error = 0; /* not a real error */ - break; - } - - /* * Get a sendfile buf. When allocating the * first buffer for mbuf chain, we usually * wait as long as necessary, but this wait @@ -3064,56 +3122,60 @@ retry_space: * threads might exhaust the buffers and then * deadlock. */ - sf = sf_buf_alloc(pg, (mnw || m != NULL) ? SFB_NOWAIT : - SFB_CATCH); + sf = sf_buf_alloc(pa[i], + m != NULL ? SFB_NOWAIT : SFB_CATCH); if (sf == NULL) { SFSTAT_INC(sf_allocfail); - vm_page_lock(pg); - vm_page_unwire(pg, PQ_INACTIVE); - KASSERT(pg->object != NULL, - ("%s: object disappeared", __func__)); - vm_page_unlock(pg); + for (int j = i; j < npages; j++) { + vm_page_lock(pa[j]); + vm_page_unwire(pa[j], PQ_INACTIVE); + vm_page_unlock(pa[j]); + } if (m == NULL) - error = (mnw ? EAGAIN : EINTR); + error = ENOBUFS; + fixspace(npages, i, off, &space); break; } /* - * Get an mbuf and set it up as having - * external storage. + * Get an mbuf and set it up. + * + * SF_NOCACHE sets the page as being freed upon send. + * However, we ignore it for the last page in 'space', + * if the page is truncated, and we got more data to + * send (rem > space), or if we have readahead + * configured (rhpages > 0). */ - m0 = m_get((mnw ? M_NOWAIT : M_WAITOK), MT_DATA); - if (m0 == NULL) { - error = (mnw ? EAGAIN : ENOBUFS); - sf_ext_free(sf, NULL); - break; - } - /* - * Attach EXT_SFBUF external storage. - */ - m0->m_ext.ext_buf = (caddr_t )sf_buf_kva(sf); + m0 = m_get(M_WAITOK, MT_DATA); + m0->m_ext.ext_buf = (char *)sf_buf_kva(sf); m0->m_ext.ext_size = PAGE_SIZE; m0->m_ext.ext_arg1 = sf; m0->m_ext.ext_arg2 = sfs; - m0->m_ext.ext_type = EXT_SFBUF; + if ((flags & SF_NOCACHE) == 0 || + (i == npages - 1 && + ((off + space) & PAGE_MASK) && + (rem > space || rhpages > 0))) + m0->m_ext.ext_type = EXT_SFBUF; + else + m0->m_ext.ext_type = EXT_SFBUF_NOCACHE; m0->m_ext.ext_flags = 0; - m0->m_flags |= (M_EXT|M_RDONLY); - m0->m_data = (char *)sf_buf_kva(sf) + pgoff; - m0->m_len = xfsize; + m0->m_flags |= (M_EXT | M_RDONLY); + if (nios) + m0->m_flags |= M_NOTREADY; + m0->m_data = (char *)sf_buf_kva(sf) + + (vmoff(i, off) & PAGE_MASK); + m0->m_len = xfsize(i, npages, off, space); + if (i == 0) + sfio->m = m0; + /* Append to mbuf chain. */ if (mtail != NULL) mtail->m_next = m0; - else if (m != NULL) - m_last(m)->m_next = m0; else m = m0; mtail = m0; - /* Keep track of bits processed. */ - loopbytes += xfsize; - off += xfsize; - /* * XXX eventually this should be a sfsync * method call! @@ -3125,47 +3187,51 @@ retry_space: if (vp != NULL) VOP_UNLOCK(vp, 0); + /* Keep track of bytes processed. */ + off += space; + rem -= space; + + /* Prepend header, if any. */ + if (hdrlen) { + mhtail->m_next = m; + m = mh; + mh = NULL; + } + + if (error) { + free(sfio, M_TEMP); + goto done; + } + /* Add the buffer chain to the socket buffer. */ - if (m != NULL) { - int mlen, err; + KASSERT(m_length(m, NULL) == space + hdrlen, + ("%s: mlen %u space %d hdrlen %d", + __func__, m_length(m, NULL), space, hdrlen)); - mlen = m_length(m, NULL); - SOCKBUF_LOCK(&so->so_snd); - if (so->so_snd.sb_state & SBS_CANTSENDMORE) { - error = EPIPE; - SOCKBUF_UNLOCK(&so->so_snd); - goto done; - } - SOCKBUF_UNLOCK(&so->so_snd); - CURVNET_SET(so->so_vnet); - /* Avoid error aliasing. 
*/ - err = (*so->so_proto->pr_usrreqs->pru_send) - (so, 0, m, NULL, NULL, td); - CURVNET_RESTORE(); - if (err == 0) { - /* - * We need two counters to get the - * file offset and nbytes to send - * right: - * - sbytes contains the total amount - * of bytes sent, including headers. - * - fsbytes contains the total amount - * of bytes sent from the file. - */ - sbytes += mlen; - fsbytes += mlen; - if (hdrlen) { - fsbytes -= hdrlen; - hdrlen = 0; - } - } else if (error == 0) - error = err; - m = NULL; /* pru_send always consumes */ + CURVNET_SET(so->so_vnet); + if (nios == 0) { + free(sfio, M_TEMP); + serror = (*so->so_proto->pr_usrreqs->pru_send) + (so, 0, m, NULL, NULL, td); + } else { + sfio->sock_fp = sock_fp; + sfio->npages = npages; + fhold(sock_fp); + serror = (*so->so_proto->pr_usrreqs->pru_send) + (so, PRUS_NOTREADY, m, NULL, NULL, td); + sf_io_done(sfio); } + CURVNET_RESTORE(); - /* Quit outer loop on error or when we're done. */ - if (done) - break; + if (serror == 0) { + sbytes += space + hdrlen; + if (hdrlen) + hdrlen = 0; + } else if (error == 0) + error = serror; + m = NULL; /* pru_send always consumes */ + + /* Quit outer loop on error. */ if (error != 0) goto done; } @@ -3200,6 +3266,8 @@ out: fdrop(sock_fp, td); if (m) m_freem(m); + if (mh) + m_freem(mh); if (error == ERESTART) error = EINTR; Index: sys/kern/uipc_debug.c =================================================================== --- sys/kern/uipc_debug.c (.../head) (revision 270879) +++ sys/kern/uipc_debug.c (.../projects/sendfile) (revision 270881) @@ -403,7 +403,8 @@ db_print_sockbuf(struct sockbuf *sb, const char *s db_printf("sb_sndptroff: %u\n", sb->sb_sndptroff); db_print_indent(indent); - db_printf("sb_cc: %u ", sb->sb_cc); + db_printf("sb_acc: %u ", sb->sb_acc); + db_printf("sb_ccc: %u ", sb->sb_ccc); db_printf("sb_hiwat: %u ", sb->sb_hiwat); db_printf("sb_mbcnt: %u ", sb->sb_mbcnt); db_printf("sb_mbmax: %u\n", sb->sb_mbmax); Index: sys/kern/uipc_mbuf.c =================================================================== --- sys/kern/uipc_mbuf.c (.../head) (revision 270879) +++ sys/kern/uipc_mbuf.c (.../projects/sendfile) (revision 270881) @@ -300,6 +300,9 @@ mb_free_ext(struct mbuf *m) case EXT_SFBUF: sf_ext_free(m->m_ext.ext_arg1, m->m_ext.ext_arg2); break; + case EXT_SFBUF_NOCACHE: + sf_ext_free_nocache(m->m_ext.ext_arg1, m->m_ext.ext_arg2); + break; default: KASSERT(m->m_ext.ext_cnt != NULL, ("%s: no refcounting pointer on %p", __func__, m)); @@ -366,6 +369,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m) switch (m->m_ext.ext_type) { case EXT_SFBUF: + case EXT_SFBUF_NOCACHE: sf_ext_ref(m->m_ext.ext_arg1, m->m_ext.ext_arg2); break; default: @@ -388,7 +392,7 @@ mb_dupcl(struct mbuf *n, struct mbuf *m) * cleaned too. */ void -m_demote(struct mbuf *m0, int all) +m_demote(struct mbuf *m0, int all, int flags) { struct mbuf *m; @@ -404,7 +408,7 @@ void m_freem(m->m_nextpkt); m->m_nextpkt = NULL; } - m->m_flags = m->m_flags & (M_EXT|M_RDONLY|M_NOFREE); + m->m_flags = m->m_flags & (M_EXT | M_RDONLY | M_NOFREE | flags); } } Index: sys/kern/sys_socket.c =================================================================== --- sys/kern/sys_socket.c (.../head) (revision 270879) +++ sys/kern/sys_socket.c (.../projects/sendfile) (revision 270881) @@ -165,20 +165,17 @@ soo_ioctl(struct file *fp, u_long cmd, void *data, case FIONREAD: /* Unlocked read. */ - *(int *)data = so->so_rcv.sb_cc; + *(int *)data = sbavail(&so->so_rcv); break; case FIONWRITE: /* Unlocked read. 
*/ - *(int *)data = so->so_snd.sb_cc; + *(int *)data = sbavail(&so->so_snd); break; case FIONSPACE: - if ((so->so_snd.sb_hiwat < so->so_snd.sb_cc) || - (so->so_snd.sb_mbmax < so->so_snd.sb_mbcnt)) - *(int *)data = 0; - else - *(int *)data = sbspace(&so->so_snd); + /* Unlocked read. */ + *(int *)data = sbspace(&so->so_snd); break; case FIOSETOWN: @@ -244,6 +241,7 @@ soo_stat(struct file *fp, struct stat *ub, struct struct thread *td) { struct socket *so = fp->f_data; + struct sockbuf *sb; #ifdef MAC int error; #endif @@ -259,15 +257,18 @@ soo_stat(struct file *fp, struct stat *ub, struct * If SBS_CANTRCVMORE is set, but there's still data left in the * receive buffer, the socket is still readable. */ - SOCKBUF_LOCK(&so->so_rcv); - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 || - so->so_rcv.sb_cc != 0) + sb = &so->so_rcv; + SOCKBUF_LOCK(sb); + if ((sb->sb_state & SBS_CANTRCVMORE) == 0 || sbavail(sb)) ub->st_mode |= S_IRUSR | S_IRGRP | S_IROTH; - ub->st_size = so->so_rcv.sb_cc - so->so_rcv.sb_ctl; - SOCKBUF_UNLOCK(&so->so_rcv); - /* Unlocked read. */ - if ((so->so_snd.sb_state & SBS_CANTSENDMORE) == 0) + ub->st_size = sbavail(sb) - sb->sb_ctl; + SOCKBUF_UNLOCK(sb); + + sb = &so->so_snd; + SOCKBUF_LOCK(sb); + if ((sb->sb_state & SBS_CANTSENDMORE) == 0) ub->st_mode |= S_IWUSR | S_IWGRP | S_IWOTH; + SOCKBUF_UNLOCK(sb); ub->st_uid = so->so_cred->cr_uid; ub->st_gid = so->so_cred->cr_gid; return (*so->so_proto->pr_usrreqs->pru_sense)(so, ub); Index: sys/kern/uipc_usrreq.c =================================================================== --- sys/kern/uipc_usrreq.c (.../head) (revision 270879) +++ sys/kern/uipc_usrreq.c (.../projects/sendfile) (revision 270881) @@ -793,11 +793,10 @@ uipc_rcvd(struct socket *so, int flags) u_int mbcnt, sbcc; unp = sotounpcb(so); - KASSERT(unp != NULL, ("uipc_rcvd: unp == NULL")); + KASSERT(unp != NULL, ("%s: unp == NULL", __func__)); + KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_SEQPACKET, + ("%s: socktype %d", __func__, so->so_type)); - if (so->so_type != SOCK_STREAM && so->so_type != SOCK_SEQPACKET) - panic("uipc_rcvd socktype %d", so->so_type); - /* * Adjust backpressure on sender and wakeup any waiting to write. * @@ -810,7 +809,7 @@ uipc_rcvd(struct socket *so, int flags) */ SOCKBUF_LOCK(&so->so_rcv); mbcnt = so->so_rcv.sb_mbcnt; - sbcc = so->so_rcv.sb_cc; + sbcc = sbavail(&so->so_rcv); SOCKBUF_UNLOCK(&so->so_rcv); /* * There is a benign race condition at this point. If we're planning to @@ -846,7 +845,10 @@ uipc_send(struct socket *so, int flags, struct mbu int error = 0; unp = sotounpcb(so); - KASSERT(unp != NULL, ("uipc_send: unp == NULL")); + KASSERT(unp != NULL, ("%s: unp == NULL", __func__)); + KASSERT(so->so_type == SOCK_STREAM || so->so_type == SOCK_DGRAM || + so->so_type == SOCK_SEQPACKET, + ("%s: socktype %d", __func__, so->so_type)); if (flags & PRUS_OOB) { error = EOPNOTSUPP; @@ -997,8 +999,11 @@ uipc_send(struct socket *so, int flags, struct mbu } mbcnt = so2->so_rcv.sb_mbcnt; - sbcc = so2->so_rcv.sb_cc; - sorwakeup_locked(so2); + sbcc = sbavail(&so2->so_rcv); + if (sbcc) + sorwakeup_locked(so2); + else + SOCKBUF_UNLOCK(&so2->so_rcv); /* * The PCB lock on unp2 protects the SB_STOP flag. 
Without it, @@ -1014,9 +1019,6 @@ uipc_send(struct socket *so, int flags, struct mbu UNP_PCB_UNLOCK(unp2); m = NULL; break; - - default: - panic("uipc_send unknown socktype"); } /* @@ -1046,6 +1048,35 @@ release: } static int +uipc_ready(struct socket *so, struct mbuf *m, int count) +{ + struct unpcb *unp, *unp2; + struct socket *so2; + int error; + + unp = sotounpcb(so); + + UNP_LINK_RLOCK(); + unp2 = unp->unp_conn; + UNP_PCB_LOCK(unp2); + so2 = unp2->unp_socket; + + SOCKBUF_LOCK(&so2->so_rcv); + if (so2->so_rcv.sb_state & SBS_CANTRCVMORE) { + SOCKBUF_UNLOCK(&so2->so_rcv); + error = ENOTCONN; + } else if ((error = sbready(&so2->so_rcv, m, count)) == 0) + sorwakeup_locked(so2); + else + SOCKBUF_UNLOCK(&so2->so_rcv); + + UNP_PCB_UNLOCK(unp2); + UNP_LINK_RUNLOCK(); + + return (error); +} + +static int uipc_sense(struct socket *so, struct stat *sb) { struct unpcb *unp; @@ -1115,6 +1146,7 @@ static struct pr_usrreqs uipc_usrreqs_dgram = { .pru_peeraddr = uipc_peeraddr, .pru_rcvd = uipc_rcvd, .pru_send = uipc_send, + .pru_ready = uipc_ready, .pru_sense = uipc_sense, .pru_shutdown = uipc_shutdown, .pru_sockaddr = uipc_sockaddr, @@ -1137,6 +1169,7 @@ static struct pr_usrreqs uipc_usrreqs_seqpacket = .pru_peeraddr = uipc_peeraddr, .pru_rcvd = uipc_rcvd, .pru_send = uipc_send, + .pru_ready = uipc_ready, .pru_sense = uipc_sense, .pru_shutdown = uipc_shutdown, .pru_sockaddr = uipc_sockaddr, @@ -1159,6 +1192,7 @@ static struct pr_usrreqs uipc_usrreqs_stream = { .pru_peeraddr = uipc_peeraddr, .pru_rcvd = uipc_rcvd, .pru_send = uipc_send, + .pru_ready = uipc_ready, .pru_sense = uipc_sense, .pru_shutdown = uipc_shutdown, .pru_sockaddr = uipc_sockaddr, Index: sys/kern/vfs_default.c =================================================================== --- sys/kern/vfs_default.c (.../head) (revision 270879) +++ sys/kern/vfs_default.c (.../projects/sendfile) (revision 270881) @@ -111,6 +111,7 @@ struct vop_vector default_vnodeops = { .vop_close = VOP_NULL, .vop_fsync = VOP_NULL, .vop_getpages = vop_stdgetpages, + .vop_getpages_async = vop_stdgetpages_async, .vop_getwritemount = vop_stdgetwritemount, .vop_inactive = VOP_NULL, .vop_ioctl = VOP_ENOTTY, @@ -726,10 +727,19 @@ vop_stdgetpages(ap) { return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, - ap->a_count, ap->a_reqpage); + ap->a_count, ap->a_reqpage, NULL, NULL); } +/* XXX Needs good comment and a manpage. */ int +vop_stdgetpages_async(struct vop_getpages_async_args *ap) +{ + + return vnode_pager_generic_getpages(ap->a_vp, ap->a_m, + ap->a_count, ap->a_reqpage, ap->a_vop_getpages_iodone, ap->a_arg); +} + +int vop_stdkqfilter(struct vop_kqfilter_args *ap) { return vfs_kqfilter(ap); Index: sys/kern/uipc_socket.c =================================================================== --- sys/kern/uipc_socket.c (.../head) (revision 270879) +++ sys/kern/uipc_socket.c (.../projects/sendfile) (revision 270881) @@ -1526,12 +1526,12 @@ restart: * 2. 
MSG_DONTWAIT is not set */ if (m == NULL || (((flags & MSG_DONTWAIT) == 0 && - so->so_rcv.sb_cc < uio->uio_resid) && - so->so_rcv.sb_cc < so->so_rcv.sb_lowat && + sbavail(&so->so_rcv) < uio->uio_resid) && + sbavail(&so->so_rcv) < so->so_rcv.sb_lowat && m->m_nextpkt == NULL && (pr->pr_flags & PR_ATOMIC) == 0)) { - KASSERT(m != NULL || !so->so_rcv.sb_cc, - ("receive: m == %p so->so_rcv.sb_cc == %u", - m, so->so_rcv.sb_cc)); + KASSERT(m != NULL || !sbavail(&so->so_rcv), + ("receive: m == %p sbavail == %u", + m, sbavail(&so->so_rcv))); if (so->so_error) { if (m != NULL) goto dontblock; @@ -1710,7 +1710,8 @@ dontblock: */ moff = 0; offset = 0; - while (m != NULL && uio->uio_resid > 0 && error == 0) { + while (m != NULL && !(m->m_flags & M_NOTAVAIL) && uio->uio_resid > 0 + && error == 0) { /* * If the type of mbuf has changed since the last mbuf * examined ('type'), end the receive operation. @@ -1813,9 +1814,7 @@ dontblock: SOCKBUF_LOCK(&so->so_rcv); } } - m->m_data += len; - m->m_len -= len; - so->so_rcv.sb_cc -= len; + sbmtrim(&so->so_rcv, m, len); } } SOCKBUF_LOCK_ASSERT(&so->so_rcv); @@ -1980,7 +1979,7 @@ restart: /* Abort if socket has reported problems. */ if (so->so_error) { - if (sb->sb_cc > 0) + if (sbavail(sb) > 0) goto deliver; if (oresid > uio->uio_resid) goto out; @@ -1992,7 +1991,7 @@ restart: /* Door is closed. Deliver what is left, if any. */ if (sb->sb_state & SBS_CANTRCVMORE) { - if (sb->sb_cc > 0) + if (sbavail(sb) > 0) goto deliver; else goto out; @@ -1999,7 +1998,7 @@ restart: } /* Socket buffer is empty and we shall not block. */ - if (sb->sb_cc == 0 && + if (sbavail(sb) == 0 && ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) { error = EAGAIN; goto out; @@ -2006,18 +2005,18 @@ restart: } /* Socket buffer got some data that we shall deliver now. */ - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) && + if (sbavail(sb) > 0 && !(flags & MSG_WAITALL) && ((sb->sb_flags & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)) || - sb->sb_cc >= sb->sb_lowat || - sb->sb_cc >= uio->uio_resid || - sb->sb_cc >= sb->sb_hiwat) ) { + sbavail(sb) >= sb->sb_lowat || + sbavail(sb) >= uio->uio_resid || + sbavail(sb) >= sb->sb_hiwat) ) { goto deliver; } /* On MSG_WAITALL we must wait until all data or error arrives. */ if ((flags & MSG_WAITALL) && - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_hiwat)) + (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_hiwat)) goto deliver; /* @@ -2031,7 +2030,7 @@ restart: deliver: SOCKBUF_LOCK_ASSERT(&so->so_rcv); - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__)); + KASSERT(sbavail(sb) > 0, ("%s: sockbuf empty", __func__)); KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__)); /* Statistics. */ @@ -2039,7 +2038,7 @@ deliver: uio->uio_td->td_ru.ru_msgrcv++; /* Fill uio until full or current end of socket buffer is reached. */ - len = min(uio->uio_resid, sb->sb_cc); + len = min(uio->uio_resid, sbavail(sb)); if (mp0 != NULL) { /* Dequeue as many mbufs as possible. 
*/ if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { @@ -2050,6 +2049,8 @@ deliver: for (m = sb->sb_mb; m != NULL && m->m_len <= len; m = m->m_next) { + KASSERT(!(m->m_flags & M_NOTAVAIL), + ("%s: m %p not available", __func__, m)); len -= m->m_len; uio->uio_resid -= m->m_len; sbfree(sb, m); @@ -2174,9 +2175,9 @@ soreceive_dgram(struct socket *so, struct sockaddr */ SOCKBUF_LOCK(&so->so_rcv); while ((m = so->so_rcv.sb_mb) == NULL) { - KASSERT(so->so_rcv.sb_cc == 0, - ("soreceive_dgram: sb_mb NULL but sb_cc %u", - so->so_rcv.sb_cc)); + KASSERT(sbavail(&so->so_rcv) == 0, + ("soreceive_dgram: sb_mb NULL but sbavail %u", + sbavail(&so->so_rcv))); if (so->so_error) { error = so->so_error; so->so_error = 0; @@ -3178,6 +3179,13 @@ pru_send_notsupp(struct socket *so, int flags, str return EOPNOTSUPP; } +int +pru_ready_notsupp(struct socket *so, struct mbuf *m, int count) +{ + + return (EOPNOTSUPP); +} + /* * This isn't really a ``null'' operation, but it's the default one and * doesn't do anything destructive. @@ -3249,7 +3257,7 @@ filt_soread(struct knote *kn, long hint) so = kn->kn_fp->f_data; SOCKBUF_LOCK_ASSERT(&so->so_rcv); - kn->kn_data = so->so_rcv.sb_cc - so->so_rcv.sb_ctl; + kn->kn_data = sbavail(&so->so_rcv) - so->so_rcv.sb_ctl; if (so->so_rcv.sb_state & SBS_CANTRCVMORE) { kn->kn_flags |= EV_EOF; kn->kn_fflags = so->so_error; @@ -3261,7 +3269,7 @@ filt_soread(struct knote *kn, long hint) if (kn->kn_data >= kn->kn_sdata) return 1; } else { - if (so->so_rcv.sb_cc >= so->so_rcv.sb_lowat) + if (sbavail(&so->so_rcv) >= so->so_rcv.sb_lowat) return 1; } @@ -3456,7 +3464,7 @@ soisdisconnected(struct socket *so) sorwakeup_locked(so); SOCKBUF_LOCK(&so->so_snd); so->so_snd.sb_state |= SBS_CANTSENDMORE; - sbdrop_locked(&so->so_snd, so->so_snd.sb_cc); + sbdrop_locked(&so->so_snd, sbused(&so->so_snd)); sowwakeup_locked(so); wakeup(&so->so_timeo); } Index: sys/tools/vnode_if.awk =================================================================== --- sys/tools/vnode_if.awk (.../head) (revision 270879) +++ sys/tools/vnode_if.awk (.../projects/sendfile) (revision 270881) @@ -254,16 +254,26 @@ while ((getline < srcfile) > 0) { if (sub(/;$/, "") < 1) die("Missing end-of-line ; in \"%s\".", $0); - # pick off variable name - if ((argp = match($0, /[A-Za-z0-9_]+$/)) < 1) - die("Missing var name \"a_foo\" in \"%s\".", $0); - args[numargs] = substr($0, argp); - $0 = substr($0, 1, argp - 1); - - # what is left must be type - # remove trailing space (if any) - sub(/ $/, ""); - types[numargs] = $0; + # pick off argument name + if ((argp = match($0, /[A-Za-z0-9_]+$/)) > 0) { + args[numargs] = substr($0, argp); + $0 = substr($0, 1, argp - 1); + sub(/ $/, ""); + delete fargs[numargs]; + types[numargs] = $0; + } else { # try to parse a function pointer argument + if ((argp = match($0, + /\(\*[A-Za-z0-9_]+\)\([A-Za-z0-9_*, ]+\)$/)) < 1) + die("Missing var name \"a_foo\" in \"%s\".", + $0); + args[numargs] = substr($0, argp + 2); + sub(/\).+/, "", args[numargs]); + fargs[numargs] = substr($0, argp); + sub(/^\([^)]+\)/, "", fargs[numargs]); + $0 = substr($0, 1, argp - 1); + sub(/ $/, ""); + types[numargs] = $0; + } } if (numargs > 4) ctrargs = 4; @@ -286,8 +296,13 @@ while ((getline < srcfile) > 0) { if (hfile) { # Print out the vop_F_args structure. 
printh("struct "name"_args {\n\tstruct vop_generic_args a_gen;"); - for (i = 0; i < numargs; ++i) - printh("\t" t_spc(types[i]) "a_" args[i] ";"); + for (i = 0; i < numargs; ++i) { + if (fargs[i]) { + printh("\t" t_spc(types[i]) "(*a_" args[i] \ + ")" fargs[i] ";"); + } else + printh("\t" t_spc(types[i]) "a_" args[i] ";"); + } printh("};"); printh(""); @@ -301,8 +316,14 @@ while ((getline < srcfile) > 0) { printh(""); printh("static __inline int " uname "("); for (i = 0; i < numargs; ++i) { - printh("\t" t_spc(types[i]) args[i] \ - (i < numargs - 1 ? "," : ")")); + if (fargs[i]) { + printh("\t" t_spc(types[i]) "(*" args[i] \ + ")" fargs[i] \ + (i < numargs - 1 ? "," : ")")); + } else { + printh("\t" t_spc(types[i]) args[i] \ + (i < numargs - 1 ? "," : ")")); + } } printh("{"); printh("\tstruct " name "_args a;"); Index: sys/netinet/sctp_var.h =================================================================== --- sys/netinet/sctp_var.h (.../head) (revision 270879) +++ sys/netinet/sctp_var.h (.../projects/sendfile) (revision 270881) @@ -82,9 +82,9 @@ extern struct pr_usrreqs sctp_usrreqs; #define sctp_maxspace(sb) (max((sb)->sb_hiwat,SCTP_MINIMAL_RWND)) -#define sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_cc) ? (sctp_maxspace(sb) - (asoc)->sb_cc) : 0)) +#define sctp_sbspace(asoc, sb) ((long) ((sctp_maxspace(sb) > (asoc)->sb_ccc) ? (sctp_maxspace(sb) - (asoc)->sb_ccc) : 0)) -#define sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_cc) ? (sctp_maxspace(sb) - (sb)->sb_cc) : 0)) +#define sctp_sbspace_failedmsgs(sb) ((long) ((sctp_maxspace(sb) > (sb)->sb_ccc) ? (sctp_maxspace(sb) - (sb)->sb_ccc) : 0)) #define sctp_sbspace_sub(a,b) ((a > b) ? (a - b) : 0) @@ -195,10 +195,10 @@ extern struct pr_usrreqs sctp_usrreqs; } #define sctp_sbfree(ctl, stcb, sb, m) { \ - SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_cc, SCTP_BUF_LEN((m))); \ + SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_ccc, SCTP_BUF_LEN((m))); \ SCTP_SAVE_ATOMIC_DECREMENT(&(sb)->sb_mbcnt, MSIZE); \ if (((ctl)->do_not_ref_stcb == 0) && stcb) {\ - SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_cc, SCTP_BUF_LEN((m))); \ + SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.sb_ccc, SCTP_BUF_LEN((m))); \ SCTP_SAVE_ATOMIC_DECREMENT(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \ } \ if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \ @@ -207,10 +207,10 @@ extern struct pr_usrreqs sctp_usrreqs; } #define sctp_sballoc(stcb, sb, m) { \ - atomic_add_int(&(sb)->sb_cc,SCTP_BUF_LEN((m))); \ + atomic_add_int(&(sb)->sb_ccc,SCTP_BUF_LEN((m))); \ atomic_add_int(&(sb)->sb_mbcnt, MSIZE); \ if (stcb) { \ - atomic_add_int(&(stcb)->asoc.sb_cc,SCTP_BUF_LEN((m))); \ + atomic_add_int(&(stcb)->asoc.sb_ccc,SCTP_BUF_LEN((m))); \ atomic_add_int(&(stcb)->asoc.my_rwnd_control_len, MSIZE); \ } \ if (SCTP_BUF_TYPE(m) != MT_DATA && SCTP_BUF_TYPE(m) != MT_HEADER && \ Index: sys/netinet/tcp_usrreq.c =================================================================== --- sys/netinet/tcp_usrreq.c (.../head) (revision 270879) +++ sys/netinet/tcp_usrreq.c (.../projects/sendfile) (revision 270881) @@ -826,7 +826,7 @@ tcp_usr_send(struct socket *so, int flags, struct m_freem(control); /* empty control, just free it */ } if (!(flags & PRUS_OOB)) { - sbappendstream(&so->so_snd, m); + sbappendstream(&so->so_snd, m, flags); if (nam && tp->t_state < TCPS_SYN_SENT) { /* * Do implied connect if not yet connected, @@ -858,7 +858,8 @@ tcp_usr_send(struct socket *so, int flags, struct socantsendmore(so); tcp_usrclosed(tp); } - if (!(inp->inp_flags & INP_DROPPED)) { + 
if (!(inp->inp_flags & INP_DROPPED) && + !(flags & PRUS_NOTREADY)) { if (flags & PRUS_MORETOCOME) tp->t_flags |= TF_MORETOCOME; error = tcp_output(tp); @@ -884,7 +885,7 @@ tcp_usr_send(struct socket *so, int flags, struct * of data past the urgent section. * Otherwise, snd_up should be one lower. */ - sbappendstream_locked(&so->so_snd, m); + sbappendstream_locked(&so->so_snd, m, flags); SOCKBUF_UNLOCK(&so->so_snd); if (nam && tp->t_state < TCPS_SYN_SENT) { /* @@ -908,10 +909,12 @@ tcp_usr_send(struct socket *so, int flags, struct tp->snd_wnd = TTCP_CLIENT_SND_WND; tcp_mss(tp, -1); } - tp->snd_up = tp->snd_una + so->so_snd.sb_cc; - tp->t_flags |= TF_FORCEDATA; - error = tcp_output(tp); - tp->t_flags &= ~TF_FORCEDATA; + tp->snd_up = tp->snd_una + sbavail(&so->so_snd); + if (!(flags & PRUS_NOTREADY)) { + tp->t_flags |= TF_FORCEDATA; + error = tcp_output(tp); + tp->t_flags &= ~TF_FORCEDATA; + } } out: TCPDEBUG2((flags & PRUS_OOB) ? PRU_SENDOOB : @@ -922,6 +925,38 @@ out: return (error); } +static int +tcp_usr_ready(struct socket *so, struct mbuf *m, int count) +{ + struct inpcb *inp; + struct tcpcb *tp; + int error; + + inp = sotoinpcb(so); + INP_WLOCK(inp); + if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { + INP_WUNLOCK(inp); + return (ECONNRESET); + } + tp = intotcpcb(inp); + + SOCKBUF_LOCK(&so->so_snd); + if (so->so_snd.sb_state & SBS_CANTSENDMORE) { + SOCKBUF_UNLOCK(&so->so_snd); + error = ENOTCONN; + } else if (sbready(&so->so_snd, m, count) == 0) { + SOCKBUF_UNLOCK(&so->so_snd); + error = tcp_output(tp); + } else { + SOCKBUF_UNLOCK(&so->so_snd); + error = EINPROGRESS; + } + + INP_WUNLOCK(inp); + + return (error); +} + /* * Abort the TCP. Drop the connection abruptly. */ @@ -1056,6 +1091,7 @@ struct pr_usrreqs tcp_usrreqs = { .pru_rcvd = tcp_usr_rcvd, .pru_rcvoob = tcp_usr_rcvoob, .pru_send = tcp_usr_send, + .pru_ready = tcp_usr_ready, .pru_shutdown = tcp_usr_shutdown, .pru_sockaddr = in_getsockaddr, .pru_sosetlabel = in_pcbsosetlabel, Index: sys/netinet/siftr.c =================================================================== --- sys/netinet/siftr.c (.../head) (revision 270879) +++ sys/netinet/siftr.c (.../projects/sendfile) (revision 270881) @@ -781,9 +781,9 @@ siftr_siftdata(struct pkt_node *pn, struct inpcb * pn->flags = tp->t_flags; pn->rxt_length = tp->t_rxtcur; pn->snd_buf_hiwater = inp->inp_socket->so_snd.sb_hiwat; - pn->snd_buf_cc = inp->inp_socket->so_snd.sb_cc; + pn->snd_buf_cc = sbused(&inp->inp_socket->so_snd); pn->rcv_buf_hiwater = inp->inp_socket->so_rcv.sb_hiwat; - pn->rcv_buf_cc = inp->inp_socket->so_rcv.sb_cc; + pn->rcv_buf_cc = sbused(&inp->inp_socket->so_rcv); pn->sent_inflight_bytes = tp->snd_max - tp->snd_una; pn->t_segqlen = tp->t_segqlen; Index: sys/netinet/sctp_os_bsd.h =================================================================== --- sys/netinet/sctp_os_bsd.h (.../head) (revision 270879) +++ sys/netinet/sctp_os_bsd.h (.../projects/sendfile) (revision 270881) @@ -405,7 +405,7 @@ typedef struct callout sctp_os_timer_t; #define SCTP_SOWAKEUP(so) wakeup(&(so)->so_timeo) /* clear the socket buffer state */ #define SCTP_SB_CLEAR(sb) \ - (sb).sb_cc = 0; \ + (sb).sb_ccc = 0; \ (sb).sb_mb = NULL; \ (sb).sb_mbcnt = 0; Index: sys/netinet/tcp_reass.c =================================================================== --- sys/netinet/tcp_reass.c (.../head) (revision 270879) +++ sys/netinet/tcp_reass.c (.../projects/sendfile) (revision 270881) @@ -248,7 +248,7 @@ present: m_freem(mq); else { mq->m_nextpkt = NULL; - sbappendstream_locked(&so->so_rcv, mq); + 
sbappendstream_locked(&so->so_rcv, mq, 0); wakeup = 1; } } Index: sys/netinet/sctp_indata.c =================================================================== --- sys/netinet/sctp_indata.c (.../head) (revision 270879) +++ sys/netinet/sctp_indata.c (.../projects/sendfile) (revision 270881) @@ -70,7 +70,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_ /* * This is really set wrong with respect to a 1-2-m socket. Since - * the sb_cc is the count that everyone as put up. When we re-write + * the sb_ccc is the count that everyone as put up. When we re-write * sctp_soreceive then we will fix this so that ONLY this * associations data is taken into account. */ @@ -77,7 +77,7 @@ sctp_calc_rwnd(struct sctp_tcb *stcb, struct sctp_ if (stcb->sctp_socket == NULL) return (calc); - if (stcb->asoc.sb_cc == 0 && + if (stcb->asoc.sb_ccc == 0 && asoc->size_on_reasm_queue == 0 && asoc->size_on_all_streams == 0) { /* Full rwnd granted */ @@ -1363,7 +1363,7 @@ sctp_process_a_data_chunk(struct sctp_tcb *stcb, s * When we have NO room in the rwnd we check to make sure * the reader is doing its job... */ - if (stcb->sctp_socket->so_rcv.sb_cc) { + if (stcb->sctp_socket->so_rcv.sb_ccc) { /* some to read, wake-up */ #if defined(__APPLE__) || defined(SCTP_SO_LOCK_TESTING) struct socket *so; Index: sys/netinet/accf_http.c =================================================================== --- sys/netinet/accf_http.c (.../head) (revision 270879) +++ sys/netinet/accf_http.c (.../projects/sendfile) (revision 270881) @@ -92,7 +92,7 @@ sbfull(struct sockbuf *sb) "mbcnt(%ld) >= mbmax(%ld): %d", sb->sb_cc, sb->sb_hiwat, sb->sb_cc >= sb->sb_hiwat, sb->sb_mbcnt, sb->sb_mbmax, sb->sb_mbcnt >= sb->sb_mbmax); - return (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax); + return (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax); } /* @@ -162,13 +162,14 @@ static int sohashttpget(struct socket *so, void *arg, int waitflag) { - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && !sbfull(&so->so_rcv)) { + if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) == 0 && + !sbfull(&so->so_rcv)) { struct mbuf *m; char *cmp; int cmplen, cc; m = so->so_rcv.sb_mb; - cc = so->so_rcv.sb_cc - 1; + cc = sbavail(&so->so_rcv) - 1; if (cc < 1) return (SU_OK); switch (*mtod(m, char *)) { @@ -215,7 +216,7 @@ soparsehttpvers(struct socket *so, void *arg, int goto fallout; m = so->so_rcv.sb_mb; - cc = so->so_rcv.sb_cc; + cc = sbavail(&so->so_rcv); inspaces = spaces = 0; for (m = so->so_rcv.sb_mb; m; m = n) { n = m->m_nextpkt; @@ -304,7 +305,7 @@ soishttpconnected(struct socket *so, void *arg, in * have NCHRS left */ copied = 0; - ccleft = so->so_rcv.sb_cc; + ccleft = sbavail(&so->so_rcv); if (ccleft < NCHRS) goto readmore; a = b = c = '\0'; Index: sys/netinet/accf_dns.c =================================================================== --- sys/netinet/accf_dns.c (.../head) (revision 270879) +++ sys/netinet/accf_dns.c (.../projects/sendfile) (revision 270881) @@ -75,7 +75,7 @@ sohasdns(struct socket *so, void *arg, int waitfla struct sockbuf *sb = &so->so_rcv; /* If the socket is full, we're ready. */ - if (sb->sb_cc >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax) + if (sbused(sb) >= sb->sb_hiwat || sb->sb_mbcnt >= sb->sb_mbmax) goto ready; /* Check to see if we have a request. 
*/ @@ -115,7 +115,7 @@ skippacket(struct sockbuf *sb) { unsigned long packlen; struct packet q, *p = &q; - if (sb->sb_cc < 2) + if (sbavail(sb) < 2) return DNS_WAIT; q.m = sb->sb_mb; @@ -122,7 +122,7 @@ skippacket(struct sockbuf *sb) { q.n = q.m->m_nextpkt; q.moff = 0; q.offset = 0; - q.len = sb->sb_cc; + q.len = sbavail(sb); GET16(p, packlen); if (packlen + 2 > q.len) Index: sys/netinet/sctp_structs.h =================================================================== --- sys/netinet/sctp_structs.h (.../head) (revision 270879) +++ sys/netinet/sctp_structs.h (.../projects/sendfile) (revision 270881) @@ -990,7 +990,7 @@ struct sctp_association { uint32_t total_output_queue_size; - uint32_t sb_cc; /* shadow of sb_cc */ + uint32_t sb_ccc; /* shadow of sb_ccc */ uint32_t sb_send_resv; /* amount reserved on a send */ uint32_t my_rwnd_control_len; /* shadow of sb_mbcnt used for rwnd * control */ Index: sys/netinet/tcp_output.c =================================================================== --- sys/netinet/tcp_output.c (.../head) (revision 270879) +++ sys/netinet/tcp_output.c (.../projects/sendfile) (revision 270881) @@ -322,7 +322,7 @@ after_sack_rexmit: * to send then the probe will be the FIN * itself. */ - if (off < so->so_snd.sb_cc) + if (off < sbavail(&so->so_snd)) flags &= ~TH_FIN; sendwin = 1; } else { @@ -348,7 +348,8 @@ after_sack_rexmit: */ if (sack_rxmit == 0) { if (sack_bytes_rxmt == 0) - len = ((long)ulmin(so->so_snd.sb_cc, sendwin) - off); + len = ((long)ulmin(sbavail(&so->so_snd), sendwin) - + off); else { long cwin; @@ -357,8 +358,8 @@ after_sack_rexmit: * sending new data, having retransmitted all the * data possible in the scoreboard. */ - len = ((long)ulmin(so->so_snd.sb_cc, tp->snd_wnd) - - off); + len = ((long)ulmin(sbavail(&so->so_snd), tp->snd_wnd) - + off); /* * Don't remove this (len > 0) check ! * We explicitly check for len > 0 here (although it @@ -457,12 +458,15 @@ after_sack_rexmit: * TODO: Shrink send buffer during idle periods together * with congestion window. Requires another timer. Has to * wait for upcoming tcp timer rewrite. + * + * XXXGL: should there be used sbused() or sbavail()? */ if (V_tcp_do_autosndbuf && so->so_snd.sb_flags & SB_AUTOSIZE) { if ((tp->snd_wnd / 4 * 5) >= so->so_snd.sb_hiwat && - so->so_snd.sb_cc >= (so->so_snd.sb_hiwat / 8 * 7) && - so->so_snd.sb_cc < V_tcp_autosndbuf_max && - sendwin >= (so->so_snd.sb_cc - (tp->snd_nxt - tp->snd_una))) { + sbused(&so->so_snd) >= (so->so_snd.sb_hiwat / 8 * 7) && + sbused(&so->so_snd) < V_tcp_autosndbuf_max && + sendwin >= (sbused(&so->so_snd) - + (tp->snd_nxt - tp->snd_una))) { if (!sbreserve_locked(&so->so_snd, min(so->so_snd.sb_hiwat + V_tcp_autosndbuf_inc, V_tcp_autosndbuf_max), so, curthread)) @@ -499,10 +503,11 @@ after_sack_rexmit: tso = 1; if (sack_rxmit) { - if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc)) + if (SEQ_LT(p->rxmit + len, tp->snd_una + sbavail(&so->so_snd))) flags &= ~TH_FIN; } else { - if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + so->so_snd.sb_cc)) + if (SEQ_LT(tp->snd_nxt + len, tp->snd_una + + sbavail(&so->so_snd))) flags &= ~TH_FIN; } @@ -532,7 +537,7 @@ after_sack_rexmit: */ if (!(tp->t_flags & TF_MORETOCOME) && /* normal case */ (idle || (tp->t_flags & TF_NODELAY)) && - len + off >= so->so_snd.sb_cc && + len + off >= sbavail(&so->so_snd) && (tp->t_flags & TF_NOPUSH) == 0) { goto send; } @@ -660,7 +665,7 @@ dontupdate: * if window is nonzero, transmit what we can, * otherwise force out a byte. 
*/ - if (so->so_snd.sb_cc && !tcp_timer_active(tp, TT_REXMT) && + if (sbavail(&so->so_snd) && !tcp_timer_active(tp, TT_REXMT) && !tcp_timer_active(tp, TT_PERSIST)) { tp->t_rxtshift = 0; tcp_setpersist(tp); @@ -786,7 +791,7 @@ send: * fractional unless the send sockbuf can * be emptied. */ - if (sendalot && off + len < so->so_snd.sb_cc) { + if (sendalot && off + len < sbavail(&so->so_snd)) { len -= len % (tp->t_maxopd - optlen); sendalot = 1; } @@ -889,7 +894,7 @@ send: * give data to the user when a buffer fills or * a PUSH comes in.) */ - if (off + len == so->so_snd.sb_cc) + if (off + len == sbavail(&so->so_snd)) flags |= TH_PUSH; SOCKBUF_UNLOCK(&so->so_snd); } else { Index: sys/netinet/sctputil.c =================================================================== --- sys/netinet/sctputil.c (.../head) (revision 270879) +++ sys/netinet/sctputil.c (.../projects/sendfile) (revision 270881) @@ -67,9 +67,9 @@ sctp_sblog(struct sockbuf *sb, struct sctp_tcb *st struct sctp_cwnd_log sctp_clog; sctp_clog.x.sb.stcb = stcb; - sctp_clog.x.sb.so_sbcc = sb->sb_cc; + sctp_clog.x.sb.so_sbcc = sb->sb_ccc; if (stcb) - sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_cc; + sctp_clog.x.sb.stcb_sbcc = stcb->asoc.sb_ccc; else sctp_clog.x.sb.stcb_sbcc = 0; sctp_clog.x.sb.incr = incr; @@ -4363,7 +4363,7 @@ sctp_add_to_readq(struct sctp_inpcb *inp, { /* * Here we must place the control on the end of the socket read - * queue AND increment sb_cc so that select will work properly on + * queue AND increment sb_ccc so that select will work properly on * read. */ struct mbuf *m, *prev = NULL; @@ -4489,7 +4489,7 @@ sctp_append_to_readq(struct sctp_inpcb *inp, * the reassembly queue. * * If PDAPI this means we need to add m to the end of the data. - * Increase the length in the control AND increment the sb_cc. + * Increase the length in the control AND increment the sb_ccc. * Otherwise sb is NULL and all we need to do is put it at the end * of the mbuf chain. */ @@ -4701,10 +4701,10 @@ sctp_free_bufspace(struct sctp_tcb *stcb, struct s if (stcb->sctp_socket && (((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL)) || ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE)))) { - if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { - stcb->sctp_socket->so_snd.sb_cc -= tp1->book_size; + if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { + stcb->sctp_socket->so_snd.sb_ccc -= tp1->book_size; } else { - stcb->sctp_socket->so_snd.sb_cc = 0; + stcb->sctp_socket->so_snd.sb_ccc = 0; } } @@ -5254,11 +5254,11 @@ sctp_sorecvmsg(struct socket *so, in_eeor_mode = sctp_is_feature_on(inp, SCTP_PCB_FLAGS_EXPLICIT_EOR); if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) { sctp_misc_ints(SCTP_SORECV_ENTER, - rwnd_req, in_eeor_mode, so->so_rcv.sb_cc, uio->uio_resid); + rwnd_req, in_eeor_mode, so->so_rcv.sb_ccc, uio->uio_resid); } if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_RECV_RWND_LOGGING_ENABLE) { sctp_misc_ints(SCTP_SORECV_ENTERPL, - rwnd_req, block_allowed, so->so_rcv.sb_cc, uio->uio_resid); + rwnd_req, block_allowed, so->so_rcv.sb_ccc, uio->uio_resid); } error = sblock(&so->so_rcv, (block_allowed ? 
SBL_WAIT : 0)); if (error) { @@ -5277,7 +5277,7 @@ restart_nosblocks: (inp->sctp_flags & SCTP_PCB_FLAGS_SOCKET_ALLGONE)) { goto out; } - if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_cc == 0)) { + if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && (so->so_rcv.sb_ccc == 0)) { if (so->so_error) { error = so->so_error; if ((in_flags & MSG_PEEK) == 0) @@ -5284,7 +5284,7 @@ restart_nosblocks: so->so_error = 0; goto out; } else { - if (so->so_rcv.sb_cc == 0) { + if (so->so_rcv.sb_ccc == 0) { /* indicate EOF */ error = 0; goto out; @@ -5291,9 +5291,9 @@ restart_nosblocks: } } } - if ((so->so_rcv.sb_cc <= held_length) && block_allowed) { + if ((so->so_rcv.sb_ccc <= held_length) && block_allowed) { /* we need to wait for data */ - if ((so->so_rcv.sb_cc == 0) && + if ((so->so_rcv.sb_ccc == 0) && ((inp->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || (inp->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { if ((inp->sctp_flags & SCTP_PCB_FLAGS_CONNECTED) == 0) { @@ -5329,7 +5329,7 @@ restart_nosblocks: } held_length = 0; goto restart_nosblocks; - } else if (so->so_rcv.sb_cc == 0) { + } else if (so->so_rcv.sb_ccc == 0) { if (so->so_error) { error = so->so_error; if ((in_flags & MSG_PEEK) == 0) @@ -5386,11 +5386,11 @@ restart_nosblocks: SCTP_INP_READ_LOCK(inp); } control = TAILQ_FIRST(&inp->read_queue); - if ((control == NULL) && (so->so_rcv.sb_cc != 0)) { + if ((control == NULL) && (so->so_rcv.sb_ccc != 0)) { #ifdef INVARIANTS panic("Huh, its non zero and nothing on control?"); #endif - so->so_rcv.sb_cc = 0; + so->so_rcv.sb_ccc = 0; } SCTP_INP_READ_UNLOCK(inp); hold_rlock = 0; @@ -5511,11 +5511,11 @@ restart_nosblocks: } /* * if we reach here, not suitable replacement is available - * fragment interleave is NOT on. So stuff the sb_cc + * fragment interleave is NOT on. So stuff the sb_ccc * into the our held count, and its time to sleep again. */ - held_length = so->so_rcv.sb_cc; - control->held_length = so->so_rcv.sb_cc; + held_length = so->so_rcv.sb_ccc; + control->held_length = so->so_rcv.sb_ccc; goto restart; } /* Clear the held length since there is something to read */ @@ -5812,10 +5812,10 @@ get_more_data: if (SCTP_BASE_SYSCTL(sctp_logging_level) & SCTP_SB_LOGGING_ENABLE) { sctp_sblog(&so->so_rcv, control->do_not_ref_stcb ? NULL : stcb, SCTP_LOG_SBFREE, cp_len); } - atomic_subtract_int(&so->so_rcv.sb_cc, cp_len); + atomic_subtract_int(&so->so_rcv.sb_ccc, cp_len); if ((control->do_not_ref_stcb == 0) && stcb) { - atomic_subtract_int(&stcb->asoc.sb_cc, cp_len); + atomic_subtract_int(&stcb->asoc.sb_ccc, cp_len); } copied_so_far += cp_len; freed_so_far += cp_len; @@ -5960,7 +5960,7 @@ wait_some_more: (sctp_is_feature_on(inp, SCTP_PCB_FLAGS_FRAG_INTERLEAVE))) { goto release; } - if (so->so_rcv.sb_cc <= control->held_length) { + if (so->so_rcv.sb_ccc <= control->held_length) { error = sbwait(&so->so_rcv); if (error) { goto release; @@ -5987,8 +5987,8 @@ wait_some_more: } goto done_with_control; } - if (so->so_rcv.sb_cc > held_length) { - control->held_length = so->so_rcv.sb_cc; + if (so->so_rcv.sb_ccc > held_length) { + control->held_length = so->so_rcv.sb_ccc; held_length = 0; } goto wait_some_more; @@ -6135,13 +6135,13 @@ out: freed_so_far, ((uio) ? (slen - uio->uio_resid) : slen), stcb->asoc.my_rwnd, - so->so_rcv.sb_cc); + so->so_rcv.sb_ccc); } else { sctp_misc_ints(SCTP_SORECV_DONE, freed_so_far, ((uio) ? 
(slen - uio->uio_resid) : slen), 0, - so->so_rcv.sb_cc); + so->so_rcv.sb_ccc); } } stage_left: Index: sys/netinet/sctp_usrreq.c =================================================================== --- sys/netinet/sctp_usrreq.c (.../head) (revision 270879) +++ sys/netinet/sctp_usrreq.c (.../projects/sendfile) (revision 270881) @@ -586,7 +586,7 @@ sctp_must_try_again: if (((flags & SCTP_PCB_FLAGS_SOCKET_GONE) == 0) && (atomic_cmpset_int(&inp->sctp_flags, flags, (flags | SCTP_PCB_FLAGS_SOCKET_GONE | SCTP_PCB_FLAGS_CLOSE_IP)))) { if (((so->so_options & SO_LINGER) && (so->so_linger == 0)) || - (so->so_rcv.sb_cc > 0)) { + (so->so_rcv.sb_ccc > 0)) { #ifdef SCTP_LOG_CLOSING sctp_log_closing(inp, NULL, 13); #endif @@ -751,7 +751,7 @@ sctp_disconnect(struct socket *so) } if (((so->so_options & SO_LINGER) && (so->so_linger == 0)) || - (so->so_rcv.sb_cc > 0)) { + (so->so_rcv.sb_ccc > 0)) { if (SCTP_GET_STATE(asoc) != SCTP_STATE_COOKIE_WAIT) { /* Left with Data unread */ @@ -916,7 +916,7 @@ sctp_flush(struct socket *so, int how) inp->sctp_flags |= SCTP_PCB_FLAGS_SOCKET_CANT_READ; SCTP_INP_READ_UNLOCK(inp); SCTP_INP_WUNLOCK(inp); - so->so_rcv.sb_cc = 0; + so->so_rcv.sb_ccc = 0; so->so_rcv.sb_mbcnt = 0; so->so_rcv.sb_mb = NULL; } @@ -925,7 +925,7 @@ sctp_flush(struct socket *so, int how) * First make sure the sb will be happy, we don't use these * except maybe the count */ - so->so_snd.sb_cc = 0; + so->so_snd.sb_ccc = 0; so->so_snd.sb_mbcnt = 0; so->so_snd.sb_mb = NULL; Index: sys/netinet/sctputil.h =================================================================== --- sys/netinet/sctputil.h (.../head) (revision 270879) +++ sys/netinet/sctputil.h (.../projects/sendfile) (revision 270881) @@ -286,10 +286,10 @@ do { \ } \ if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ - if (stcb->sctp_socket->so_snd.sb_cc >= tp1->book_size) { \ - atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_cc), tp1->book_size); \ + if (stcb->sctp_socket->so_snd.sb_ccc >= tp1->book_size) { \ + atomic_subtract_int(&((stcb)->sctp_socket->so_snd.sb_ccc), tp1->book_size); \ } else { \ - stcb->sctp_socket->so_snd.sb_cc = 0; \ + stcb->sctp_socket->so_snd.sb_ccc = 0; \ } \ } \ } \ @@ -307,10 +307,10 @@ do { \ } \ if (stcb->sctp_socket && ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ - if (stcb->sctp_socket->so_snd.sb_cc >= sp->length) { \ - atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc,sp->length); \ + if (stcb->sctp_socket->so_snd.sb_ccc >= sp->length) { \ + atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc,sp->length); \ } else { \ - stcb->sctp_socket->so_snd.sb_cc = 0; \ + stcb->sctp_socket->so_snd.sb_ccc = 0; \ } \ } \ } \ @@ -322,7 +322,7 @@ do { \ if ((stcb->sctp_socket != NULL) && \ ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || \ (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { \ - atomic_add_int(&stcb->sctp_socket->so_snd.sb_cc,sz); \ + atomic_add_int(&stcb->sctp_socket->so_snd.sb_ccc,sz); \ } \ } while (0) Index: sys/netinet/sctp_input.c =================================================================== --- sys/netinet/sctp_input.c (.../head) (revision 270879) +++ sys/netinet/sctp_input.c (.../projects/sendfile) (revision 270881) @@ -1044,7 +1044,7 @@ sctp_handle_shutdown_ack(struct sctp_shutdown_ack_ if (stcb->sctp_socket) { if ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || (stcb->sctp_ep->sctp_flags & 
SCTP_PCB_FLAGS_IN_TCPPOOL)) { - stcb->sctp_socket->so_snd.sb_cc = 0; + stcb->sctp_socket->so_snd.sb_ccc = 0; } sctp_ulp_notify(SCTP_NOTIFY_ASSOC_DOWN, stcb, 0, NULL, SCTP_SO_NOT_LOCKED); } Index: sys/netinet/sctp_output.c =================================================================== --- sys/netinet/sctp_output.c (.../head) (revision 270879) +++ sys/netinet/sctp_output.c (.../projects/sendfile) (revision 270881) @@ -7257,7 +7257,7 @@ one_more_time: if ((stcb->sctp_socket != NULL) && \ ((stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_TCPTYPE) || (stcb->sctp_ep->sctp_flags & SCTP_PCB_FLAGS_IN_TCPPOOL))) { - atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_cc, sp->length); + atomic_subtract_int(&stcb->sctp_socket->so_snd.sb_ccc, sp->length); } if (sp->data) { sctp_m_freem(sp->data); @@ -11537,7 +11537,7 @@ jump_out: drp->current_onq = htonl(asoc->size_on_reasm_queue + asoc->size_on_all_streams + asoc->my_rwnd_control_len + - stcb->sctp_socket->so_rcv.sb_cc); + stcb->sctp_socket->so_rcv.sb_ccc); } else { /*- * If my rwnd is 0, possibly from mbuf depletion as well as Index: sys/netinet/sctp_pcb.c =================================================================== --- sys/netinet/sctp_pcb.c (.../head) (revision 270879) +++ sys/netinet/sctp_pcb.c (.../projects/sendfile) (revision 270881) @@ -3407,7 +3407,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi if ((asoc->asoc.size_on_reasm_queue > 0) || (asoc->asoc.control_pdapi) || (asoc->asoc.size_on_all_streams > 0) || - (so && (so->so_rcv.sb_cc > 0))) { + (so && (so->so_rcv.sb_ccc > 0))) { /* Left with Data unread */ struct mbuf *op_err; @@ -3635,7 +3635,7 @@ sctp_inpcb_free(struct sctp_inpcb *inp, int immedi TAILQ_REMOVE(&inp->read_queue, sq, next); sctp_free_remote_addr(sq->whoFrom); if (so) - so->so_rcv.sb_cc -= sq->length; + so->so_rcv.sb_ccc -= sq->length; if (sq->data) { sctp_m_freem(sq->data); sq->data = NULL; @@ -4863,7 +4863,7 @@ sctp_free_assoc(struct sctp_inpcb *inp, struct sct inp->sctp_flags |= SCTP_PCB_FLAGS_WAS_CONNECTED; if (so) { SOCK_LOCK(so); - if (so->so_rcv.sb_cc == 0) { + if (so->so_rcv.sb_ccc == 0) { so->so_state &= ~(SS_ISCONNECTING | SS_ISDISCONNECTING | SS_ISCONFIRMING | Index: sys/netinet/sctp_pcb.h =================================================================== --- sys/netinet/sctp_pcb.h (.../head) (revision 270879) +++ sys/netinet/sctp_pcb.h (.../projects/sendfile) (revision 270881) @@ -369,7 +369,7 @@ struct sctp_inpcb { } ip_inp; - /* Socket buffer lock protects read_queue and of course sb_cc */ + /* Socket buffer lock protects read_queue and of course sb_ccc */ struct sctp_readhead read_queue; LIST_ENTRY(sctp_inpcb) sctp_list; /* lists all endpoints */ Index: sys/netinet/tcp_input.c =================================================================== --- sys/netinet/tcp_input.c (.../head) (revision 270879) +++ sys/netinet/tcp_input.c (.../projects/sendfile) (revision 270881) @@ -1734,7 +1734,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur); sowwakeup(so); - if (so->so_snd.sb_cc) + if (sbavail(&so->so_snd)) (void) tcp_output(tp); goto check_delack; } @@ -1844,7 +1844,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, newsize, so, NULL)) so->so_rcv.sb_flags &= ~SB_AUTOSIZE; m_adj(m, drop_hdrlen); /* delayed header drop */ - sbappendstream_locked(&so->so_rcv, m); + sbappendstream_locked(&so->so_rcv, m, 0); } /* NB: sorwakeup_locked() does an implicit unlock. 
*/ sorwakeup_locked(so); @@ -2548,7 +2548,7 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, * Otherwise we would send pure ACKs. */ SOCKBUF_LOCK(&so->so_snd); - avail = so->so_snd.sb_cc - + avail = sbavail(&so->so_snd) - (tp->snd_nxt - tp->snd_una); SOCKBUF_UNLOCK(&so->so_snd); if (avail > 0) @@ -2683,10 +2683,10 @@ process_ACK: cc_ack_received(tp, th, CC_ACK); SOCKBUF_LOCK(&so->so_snd); - if (acked > so->so_snd.sb_cc) { - tp->snd_wnd -= so->so_snd.sb_cc; + if (acked > sbavail(&so->so_snd)) { + tp->snd_wnd -= sbavail(&so->so_snd); mfree = sbcut_locked(&so->so_snd, - (int)so->so_snd.sb_cc); + (int)sbavail(&so->so_snd)); ourfinisacked = 1; } else { mfree = sbcut_locked(&so->so_snd, acked); @@ -2812,7 +2812,7 @@ step6: * actually wanting to send this much urgent data. */ SOCKBUF_LOCK(&so->so_rcv); - if (th->th_urp + so->so_rcv.sb_cc > sb_max) { + if (th->th_urp + sbavail(&so->so_rcv) > sb_max) { th->th_urp = 0; /* XXX */ thflags &= ~TH_URG; /* XXX */ SOCKBUF_UNLOCK(&so->so_rcv); /* XXX */ @@ -2834,7 +2834,7 @@ step6: */ if (SEQ_GT(th->th_seq+th->th_urp, tp->rcv_up)) { tp->rcv_up = th->th_seq + th->th_urp; - so->so_oobmark = so->so_rcv.sb_cc + + so->so_oobmark = sbavail(&so->so_rcv) + (tp->rcv_up - tp->rcv_nxt) - 1; if (so->so_oobmark == 0) so->so_rcv.sb_state |= SBS_RCVATMARK; @@ -2904,7 +2904,7 @@ dodata: /* XXX */ if (so->so_rcv.sb_state & SBS_CANTRCVMORE) m_freem(m); else - sbappendstream_locked(&so->so_rcv, m); + sbappendstream_locked(&so->so_rcv, m, 0); /* NB: sorwakeup_locked() does an implicit unlock. */ sorwakeup_locked(so); } else { Index: sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c =================================================================== --- sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c (.../head) (revision 270879) +++ sys/netgraph/bluetooth/socket/ng_btsocket_l2cap.c (.../projects/sendfile) (revision 270881) @@ -1127,9 +1127,8 @@ ng_btsocket_l2cap_process_l2ca_write_rsp(struct ng /* * Check if we have more data to send */ - sbdroprecord(&pcb->so->so_snd); - if (pcb->so->so_snd.sb_cc > 0) { + if (sbavail(&pcb->so->so_snd) > 0) { if (ng_btsocket_l2cap_send2(pcb) == 0) ng_btsocket_l2cap_timeout(pcb); else @@ -2514,7 +2513,7 @@ ng_btsocket_l2cap_send2(ng_btsocket_l2cap_pcb_p pc mtx_assert(&pcb->pcb_mtx, MA_OWNED); - if (pcb->so->so_snd.sb_cc == 0) + if (sbavail(&pcb->so->so_snd) == 0) return (EINVAL); /* XXX */ m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT); Index: sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c =================================================================== --- sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c (.../head) (revision 270879) +++ sys/netgraph/bluetooth/socket/ng_btsocket_rfcomm.c (.../projects/sendfile) (revision 270881) @@ -3279,7 +3279,7 @@ ng_btsocket_rfcomm_pcb_send(ng_btsocket_rfcomm_pcb } for (error = 0, sent = 0; sent < limit; sent ++) { - length = min(pcb->mtu, pcb->so->so_snd.sb_cc); + length = min(pcb->mtu, sbavail(&pcb->so->so_snd)); if (length == 0) break; Index: sys/netgraph/bluetooth/socket/ng_btsocket_sco.c =================================================================== --- sys/netgraph/bluetooth/socket/ng_btsocket_sco.c (.../head) (revision 270879) +++ sys/netgraph/bluetooth/socket/ng_btsocket_sco.c (.../projects/sendfile) (revision 270881) @@ -906,7 +906,7 @@ ng_btsocket_sco_default_msg_input(struct ng_mesg * sbdroprecord(&pcb->so->so_snd); /* Send more if we have any */ - if (pcb->so->so_snd.sb_cc > 0) + if (sbavail(&pcb->so->so_snd) > 0) if (ng_btsocket_sco_send2(pcb) == 0) ng_btsocket_sco_timeout(pcb); 
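A note on the accessor choice that recurs in these driver hunks: with the split accounting, sbavail() is meant to report only the bytes that are ready for the consumer, while sbused() reports everything charged against the buffer, including data queued with PRUS_NOTREADY that sbready() has not yet released. The helper below is a minimal sketch of that distinction under those assumed semantics; its name is illustrative and it is not part of the patch.

static u_int
pending_unready_bytes(struct sockbuf *sb)
{

	SOCKBUF_LOCK_ASSERT(sb);
	/*
	 * sbused() counts all data charged to the socket buffer;
	 * sbavail() counts only mbufs already marked ready, so the
	 * difference is data appended with PRUS_NOTREADY that
	 * sbready() has not yet released for delivery.
	 */
	return (sbused(sb) - sbavail(sb));
}

Callers that only need queue depth, such as the netgraph and TOE drivers touched here, should go through one of the accessors instead of reading the counters directly; the XXXGL note in the tcp_output.c hunk above marks a spot where the choice between the two is still open.
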
@@ -1748,7 +1748,7 @@ ng_btsocket_sco_send2(ng_btsocket_sco_pcb_p pcb) mtx_assert(&pcb->pcb_mtx, MA_OWNED); while (pcb->rt->pending < pcb->rt->num_pkts && - pcb->so->so_snd.sb_cc > 0) { + sbavail(&pcb->so->so_snd) > 0) { /* Get a copy of the first packet on send queue */ m = m_dup(pcb->so->so_snd.sb_mb, M_NOWAIT); if (m == NULL) { Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c =================================================================== --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c (.../head) (revision 270879) +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_rx.c (.../projects/sendfile) (revision 270881) @@ -183,7 +183,7 @@ sdp_post_recvs_needed(struct sdp_sock *ssk) * Compute bytes in the receive queue and socket buffer. */ bytes_in_process = (posted - SDP_MIN_TX_CREDITS) * buffer_size; - bytes_in_process += ssk->socket->so_rcv.sb_cc; + bytes_in_process += sbused(&ssk->socket->so_rcv); return bytes_in_process < max_bytes; } Index: sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c =================================================================== --- sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c (.../head) (revision 270879) +++ sys/ofed/drivers/infiniband/ulp/sdp/sdp_main.c (.../projects/sendfile) (revision 270881) @@ -747,7 +747,7 @@ sdp_start_disconnect(struct sdp_sock *ssk) ("sdp_start_disconnect: sdp_drop() returned NULL")); } else { soisdisconnecting(so); - unread = so->so_rcv.sb_cc; + unread = sbused(&so->so_rcv); sbflush(&so->so_rcv); sdp_usrclosed(ssk); if (!(ssk->flags & SDP_DROPPED)) { @@ -889,7 +889,7 @@ sdp_append(struct sdp_sock *ssk, struct sockbuf *s m_adj(mb, SDP_HEAD_SIZE); n->m_pkthdr.len += mb->m_pkthdr.len; n->m_flags |= mb->m_flags & (M_PUSH | M_URG); - m_demote(mb, 1); + m_demote(mb, 1, 0); sbcompress(sb, mb, sb->sb_mbtail); return; } @@ -1259,7 +1259,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps /* We will never ever get anything unless we are connected. */ if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) { /* When disconnecting there may be still some data left. */ - if (sb->sb_cc > 0) + if (sbavail(sb)) goto deliver; if (!(so->so_state & SS_ISDISCONNECTED)) error = ENOTCONN; @@ -1267,7 +1267,7 @@ sdp_sorecv(struct socket *so, struct sockaddr **ps } /* Socket buffer is empty and we shall not block. */ - if (sb->sb_cc == 0 && + if (sbavail(sb) == 0 && ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) { error = EAGAIN; goto out; @@ -1278,7 +1278,7 @@ restart: /* Abort if socket has reported problems. */ if (so->so_error) { - if (sb->sb_cc > 0) + if (sbavail(sb)) goto deliver; if (oresid > uio->uio_resid) goto out; @@ -1290,7 +1290,7 @@ restart: /* Door is closed. Deliver what is left, if any. */ if (sb->sb_state & SBS_CANTRCVMORE) { - if (sb->sb_cc > 0) + if (sbavail(sb)) goto deliver; else goto out; @@ -1297,18 +1297,18 @@ restart: } /* Socket buffer got some data that we shall deliver now. */ - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) && + if (sbavail(sb) && !(flags & MSG_WAITALL) && ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)) || - sb->sb_cc >= sb->sb_lowat || - sb->sb_cc >= uio->uio_resid || - sb->sb_cc >= sb->sb_hiwat) ) { + sbavail(sb) >= sb->sb_lowat || + sbavail(sb) >= uio->uio_resid || + sbavail(sb) >= sb->sb_hiwat) ) { goto deliver; } /* On MSG_WAITALL we must wait until all data or error arrives. 
*/ if ((flags & MSG_WAITALL) && - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat)) + (sbavail(sb) >= uio->uio_resid || sbavail(sb) >= sb->sb_lowat)) goto deliver; /* @@ -1322,7 +1322,7 @@ restart: deliver: SOCKBUF_LOCK_ASSERT(&so->so_rcv); - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__)); + KASSERT(sbavail(sb), ("%s: sockbuf empty", __func__)); KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__)); /* Statistics. */ @@ -1330,7 +1330,7 @@ deliver: uio->uio_td->td_ru.ru_msgrcv++; /* Fill uio until full or current end of socket buffer is reached. */ - len = min(uio->uio_resid, sb->sb_cc); + len = min(uio->uio_resid, sbavail(sb)); if (mp0 != NULL) { /* Dequeue as many mbufs as possible. */ if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { @@ -1510,7 +1510,7 @@ sdp_urg(struct sdp_sock *ssk, struct mbuf *mb) if (so == NULL) return; - so->so_oobmark = so->so_rcv.sb_cc + mb->m_pkthdr.len - 1; + so->so_oobmark = sbused(&so->so_rcv) + mb->m_pkthdr.len - 1; sohasoutofband(so); ssk->oobflags &= ~(SDP_HAVEOOB | SDP_HADOOB); if (!(so->so_options & SO_OOBINLINE)) { Index: sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c =================================================================== --- sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (.../head) (revision 270879) +++ sys/dev/cxgb/ulp/tom/cxgb_cpl_io.c (.../projects/sendfile) (revision 270881) @@ -445,8 +445,8 @@ t3_push_frames(struct socket *so, int req_completi * Autosize the send buffer. */ if (snd->sb_flags & SB_AUTOSIZE && VNET(tcp_do_autosndbuf)) { - if (snd->sb_cc >= (snd->sb_hiwat / 8 * 7) && - snd->sb_cc < VNET(tcp_autosndbuf_max)) { + if (sbused(snd) >= (snd->sb_hiwat / 8 * 7) && + sbused(snd) < VNET(tcp_autosndbuf_max)) { if (!sbreserve_locked(snd, min(snd->sb_hiwat + VNET(tcp_autosndbuf_inc), VNET(tcp_autosndbuf_max)), so, curthread)) @@ -597,10 +597,10 @@ t3_rcvd(struct toedev *tod, struct tcpcb *tp) INP_WLOCK_ASSERT(inp); SOCKBUF_LOCK(so_rcv); - KASSERT(toep->tp_enqueued >= so_rcv->sb_cc, - ("%s: so_rcv->sb_cc > enqueued", __func__)); - toep->tp_rx_credits += toep->tp_enqueued - so_rcv->sb_cc; - toep->tp_enqueued = so_rcv->sb_cc; + KASSERT(toep->tp_enqueued >= sbused(so_rcv), + ("%s: sbused(so_rcv) > enqueued", __func__)); + toep->tp_rx_credits += toep->tp_enqueued - sbused(so_rcv); + toep->tp_enqueued = sbused(so_rcv); SOCKBUF_UNLOCK(so_rcv); must_send = toep->tp_rx_credits + 16384 >= tp->rcv_wnd; @@ -1199,7 +1199,7 @@ do_rx_data(struct sge_qset *qs, struct rsp_desc *r } toep->tp_enqueued += m->m_pkthdr.len; - sbappendstream_locked(so_rcv, m); + sbappendstream_locked(so_rcv, m, 0); sorwakeup_locked(so); SOCKBUF_UNLOCK_ASSERT(so_rcv); @@ -1768,7 +1768,7 @@ wr_ack(struct toepcb *toep, struct mbuf *m) so_sowwakeup_locked(so); } - if (snd->sb_sndptroff < snd->sb_cc) + if (snd->sb_sndptroff < sbused(snd)) t3_push_frames(so, 0); out_free: Index: sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c =================================================================== --- sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (.../head) (revision 270879) +++ sys/dev/cxgb/ulp/iw_cxgb/iw_cxgb_cm.c (.../projects/sendfile) (revision 270881) @@ -1507,11 +1507,11 @@ process_data(struct iwch_ep *ep) process_mpa_request(ep); break; default: - if (ep->com.so->so_rcv.sb_cc) + if (sbavail(&ep->com.so->so_rcv)) printf("%s Unexpected streaming data." 
" ep %p state %d so %p so_state %x so_rcv.sb_cc %u so_rcv.sb_mb %p\n", __FUNCTION__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state, - ep->com.so->so_rcv.sb_cc, ep->com.so->so_rcv.sb_mb); + sbavail(&ep->com.so->so_rcv), ep->com.so->so_rcv.sb_mb); break; } return; Index: sys/dev/cxgbe/tom/t4_cpl_io.c =================================================================== --- sys/dev/cxgbe/tom/t4_cpl_io.c (.../head) (revision 270879) +++ sys/dev/cxgbe/tom/t4_cpl_io.c (.../projects/sendfile) (revision 270881) @@ -365,15 +365,15 @@ t4_rcvd(struct toedev *tod, struct tcpcb *tp) INP_WLOCK_ASSERT(inp); SOCKBUF_LOCK(sb); - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: sb %p has more data (%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); + __func__, sb, sbused(sb), toep->sb_cc)); if (toep->ulp_mode == ULP_MODE_ISCSI) { toep->rx_credits += toep->sb_cc; toep->sb_cc = 0; } else { - toep->rx_credits += toep->sb_cc - sb->sb_cc; - toep->sb_cc = sb->sb_cc; + toep->rx_credits += toep->sb_cc - sbused(sb); + toep->sb_cc = sbused(sb); } credits = toep->rx_credits; SOCKBUF_UNLOCK(sb); @@ -1079,15 +1079,15 @@ do_peer_close(struct sge_iq *iq, const struct rss_ tp->rcv_nxt = be32toh(cpl->rcv_nxt); toep->ddp_flags &= ~(DDP_BUF0_ACTIVE | DDP_BUF1_ACTIVE); - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: sb %p has more data (%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); - toep->rx_credits += toep->sb_cc - sb->sb_cc; + __func__, sb, sbused(sb), toep->sb_cc)); + toep->rx_credits += toep->sb_cc - sbused(sb); #ifdef USE_DDP_RX_FLOW_CONTROL toep->rx_credits -= m->m_len; /* adjust for F_RX_FC_DDP */ #endif - sbappendstream_locked(sb, m); - toep->sb_cc = sb->sb_cc; + sbappendstream_locked(sb, m, 0); + toep->sb_cc = sbused(sb); } socantrcvmore_locked(so); /* unlocks the sockbuf */ @@ -1582,12 +1582,12 @@ do_rx_data(struct sge_iq *iq, const struct rss_hea } } - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: sb %p has more data (%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); - toep->rx_credits += toep->sb_cc - sb->sb_cc; - sbappendstream_locked(sb, m); - toep->sb_cc = sb->sb_cc; + __func__, sb, sbused(sb), toep->sb_cc)); + toep->rx_credits += toep->sb_cc - sbused(sb); + sbappendstream_locked(sb, m, 0); + toep->sb_cc = sbused(sb); sorwakeup_locked(so); SOCKBUF_UNLOCK_ASSERT(sb); Index: sys/dev/cxgbe/tom/t4_ddp.c =================================================================== --- sys/dev/cxgbe/tom/t4_ddp.c (.../head) (revision 270879) +++ sys/dev/cxgbe/tom/t4_ddp.c (.../projects/sendfile) (revision 270881) @@ -224,15 +224,15 @@ insert_ddp_data(struct toepcb *toep, uint32_t n) tp->rcv_wnd -= n; #endif - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: sb %p has more data (%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); - toep->rx_credits += toep->sb_cc - sb->sb_cc; + __func__, sb, sbused(sb), toep->sb_cc)); + toep->rx_credits += toep->sb_cc - sbused(sb); #ifdef USE_DDP_RX_FLOW_CONTROL toep->rx_credits -= n; /* adjust for F_RX_FC_DDP */ #endif - sbappendstream_locked(sb, m); - toep->sb_cc = sb->sb_cc; + sbappendstream_locked(sb, m, 0); + toep->sb_cc = sbused(sb); } /* SET_TCB_FIELD sent as a ULP command looks like this */ @@ -459,15 +459,15 @@ handle_ddp_data(struct toepcb *toep, __be32 ddp_re else discourage_ddp(toep); - KASSERT(toep->sb_cc >= sb->sb_cc, + KASSERT(toep->sb_cc >= sbused(sb), ("%s: sb %p has more data 
(%d) than last time (%d).", - __func__, sb, sb->sb_cc, toep->sb_cc)); - toep->rx_credits += toep->sb_cc - sb->sb_cc; + __func__, sb, sbused(sb), toep->sb_cc)); + toep->rx_credits += toep->sb_cc - sbused(sb); #ifdef USE_DDP_RX_FLOW_CONTROL toep->rx_credits -= len; /* adjust for F_RX_FC_DDP */ #endif - sbappendstream_locked(sb, m); - toep->sb_cc = sb->sb_cc; + sbappendstream_locked(sb, m, 0); + toep->sb_cc = sbused(sb); wakeup: KASSERT(toep->ddp_flags & db_flag, ("%s: DDP buffer not active. toep %p, ddp_flags 0x%x, report 0x%x", @@ -908,7 +908,7 @@ handle_ddp(struct socket *so, struct uio *uio, int #endif /* XXX: too eager to disable DDP, could handle NBIO better than this. */ - if (sb->sb_cc >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres || + if (sbused(sb) >= uio->uio_resid || uio->uio_resid < sc->tt.ddp_thres || uio->uio_resid > MAX_DDP_BUFFER_SIZE || uio->uio_iovcnt > 1 || so->so_state & SS_NBIO || flags & (MSG_DONTWAIT | MSG_NBIO) || error || so->so_error || sb->sb_state & SBS_CANTRCVMORE) @@ -946,7 +946,7 @@ handle_ddp(struct socket *so, struct uio *uio, int * payload. */ ddp_flags = select_ddp_flags(so, flags, db_idx); - wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sb->sb_cc, ddp_flags); + wr = mk_update_tcb_for_ddp(sc, toep, db_idx, sbused(sb), ddp_flags); if (wr == NULL) { /* * Just unhold the pages. The DDP buffer's software state is @@ -971,8 +971,9 @@ handle_ddp(struct socket *so, struct uio *uio, int */ rc = sbwait(sb); while (toep->ddp_flags & buf_flag) { + /* XXXGL: shouldn't here be sbwait() call? */ sb->sb_flags |= SB_WAIT; - msleep(&sb->sb_cc, &sb->sb_mtx, PSOCK , "sbwait", 0); + msleep(&sb->sb_acc, &sb->sb_mtx, PSOCK , "sbwait", 0); } unwire_ddp_buffer(db); return (rc); @@ -1134,8 +1135,8 @@ restart: /* uio should be just as it was at entry */ KASSERT(oresid == uio->uio_resid, - ("%s: oresid = %d, uio_resid = %zd, sb_cc = %d", - __func__, oresid, uio->uio_resid, sb->sb_cc)); + ("%s: oresid = %d, uio_resid = %zd, sbused = %d", + __func__, oresid, uio->uio_resid, sbused(sb))); error = handle_ddp(so, uio, flags, 0); ddp_handled = 1; @@ -1145,7 +1146,7 @@ restart: /* Abort if socket has reported problems. */ if (so->so_error) { - if (sb->sb_cc > 0) + if (sbused(sb)) goto deliver; if (oresid > uio->uio_resid) goto out; @@ -1157,7 +1158,7 @@ restart: /* Door is closed. Deliver what is left, if any. */ if (sb->sb_state & SBS_CANTRCVMORE) { - if (sb->sb_cc > 0) + if (sbused(sb)) goto deliver; else goto out; @@ -1164,7 +1165,7 @@ restart: } /* Socket buffer is empty and we shall not block. */ - if (sb->sb_cc == 0 && + if (sbused(sb) == 0 && ((so->so_state & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)))) { error = EAGAIN; goto out; @@ -1171,18 +1172,18 @@ restart: } /* Socket buffer got some data that we shall deliver now. */ - if (sb->sb_cc > 0 && !(flags & MSG_WAITALL) && + if (sbused(sb) && !(flags & MSG_WAITALL) && ((sb->sb_flags & SS_NBIO) || (flags & (MSG_DONTWAIT|MSG_NBIO)) || - sb->sb_cc >= sb->sb_lowat || - sb->sb_cc >= uio->uio_resid || - sb->sb_cc >= sb->sb_hiwat) ) { + sbused(sb) >= sb->sb_lowat || + sbused(sb) >= uio->uio_resid || + sbused(sb) >= sb->sb_hiwat) ) { goto deliver; } /* On MSG_WAITALL we must wait until all data or error arrives. 
*/ if ((flags & MSG_WAITALL) && - (sb->sb_cc >= uio->uio_resid || sb->sb_cc >= sb->sb_lowat)) + (sbused(sb) >= uio->uio_resid || sbused(sb) >= sb->sb_lowat)) goto deliver; /* @@ -1201,7 +1202,7 @@ restart: deliver: SOCKBUF_LOCK_ASSERT(&so->so_rcv); - KASSERT(sb->sb_cc > 0, ("%s: sockbuf empty", __func__)); + KASSERT(sbused(sb) > 0, ("%s: sockbuf empty", __func__)); KASSERT(sb->sb_mb != NULL, ("%s: sb_mb == NULL", __func__)); if (sb->sb_flags & SB_DDP_INDICATE && !ddp_handled) @@ -1212,7 +1213,7 @@ deliver: uio->uio_td->td_ru.ru_msgrcv++; /* Fill uio until full or current end of socket buffer is reached. */ - len = min(uio->uio_resid, sb->sb_cc); + len = min(uio->uio_resid, sbused(sb)); if (mp0 != NULL) { /* Dequeue as many mbufs as possible. */ if (!(flags & MSG_PEEK) && len >= sb->sb_mb->m_len) { Index: sys/dev/cxgbe/iw_cxgbe/cm.c =================================================================== --- sys/dev/cxgbe/iw_cxgbe/cm.c (.../head) (revision 270879) +++ sys/dev/cxgbe/iw_cxgbe/cm.c (.../projects/sendfile) (revision 270881) @@ -584,8 +584,8 @@ process_data(struct c4iw_ep *ep) { struct sockaddr_in *local, *remote; - CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sb_cc %d", __func__, - ep->com.so, ep, states[ep->com.state], ep->com.so->so_rcv.sb_cc); + CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sbused %d", __func__, + ep->com.so, ep, states[ep->com.state], sbused(&ep->com.so->so_rcv)); switch (state_read(&ep->com)) { case MPA_REQ_SENT: @@ -601,11 +601,11 @@ process_data(struct c4iw_ep *ep) process_mpa_request(ep); break; default: - if (ep->com.so->so_rcv.sb_cc) - log(LOG_ERR, "%s: Unexpected streaming data. " - "ep %p, state %d, so %p, so_state 0x%x, sb_cc %u\n", + if (sbused(&ep->com.so->so_rcv)) + log(LOG_ERR, "%s: Unexpected streaming data. ep %p, " + "state %d, so %p, so_state 0x%x, sbused %u\n", __func__, ep, state_read(&ep->com), ep->com.so, - ep->com.so->so_state, ep->com.so->so_rcv.sb_cc); + ep->com.so->so_state, sbused(&ep->com.so->so_rcv)); break; } } Index: sys/dev/iscsi/icl.c =================================================================== --- sys/dev/iscsi/icl.c (.../head) (revision 270879) +++ sys/dev/iscsi/icl.c (.../projects/sendfile) (revision 270881) @@ -758,7 +758,7 @@ icl_receive_thread(void *arg) * is enough data received to read the PDU. 
*/ SOCKBUF_LOCK(&so->so_rcv); - available = so->so_rcv.sb_cc; + available = sbavail(&so->so_rcv); if (available < ic->ic_receive_len) { so->so_rcv.sb_lowat = ic->ic_receive_len; cv_wait(&ic->ic_receive_cv, &so->so_rcv.sb_mtx); Index: sys/dev/ti/if_ti.c =================================================================== --- sys/dev/ti/if_ti.c (.../head) (revision 270879) +++ sys/dev/ti/if_ti.c (.../projects/sendfile) (revision 270881) @@ -1637,7 +1637,7 @@ ti_newbuf_jumbo(struct ti_softc *sc, int idx, stru m[i]->m_data = (void *)sf_buf_kva(sf[i]); m[i]->m_len = PAGE_SIZE; MEXTADD(m[i], sf_buf_kva(sf[i]), PAGE_SIZE, - sf_buf_mext, (void*)sf_buf_kva(sf[i]), sf[i], + sf_mext_free, (void*)sf_buf_kva(sf[i]), sf[i], 0, EXT_DISPOSABLE); m[i]->m_next = m[i+1]; } @@ -1702,7 +1702,7 @@ nobufs: if (m[i]) m_freem(m[i]); if (sf[i]) - sf_buf_mext((void *)sf_buf_kva(sf[i]), sf[i]); + sf_mext_free((void *)sf_buf_kva(sf[i]), sf[i]); } return (ENOBUFS); } Index: sys/vm/vm_pager.h =================================================================== --- sys/vm/vm_pager.h (.../head) (revision 270879) +++ sys/vm/vm_pager.h (.../projects/sendfile) (revision 270881) @@ -51,18 +51,21 @@ typedef vm_object_t pgo_alloc_t(void *, vm_ooffset struct ucred *); typedef void pgo_dealloc_t(vm_object_t); typedef int pgo_getpages_t(vm_object_t, vm_page_t *, int, int); +typedef int pgo_getpages_async_t(vm_object_t, vm_page_t *, int, int, + void(*)(void *), void *); typedef void pgo_putpages_t(vm_object_t, vm_page_t *, int, int, int *); typedef boolean_t pgo_haspage_t(vm_object_t, vm_pindex_t, int *, int *); typedef void pgo_pageunswapped_t(vm_page_t); struct pagerops { - pgo_init_t *pgo_init; /* Initialize pager. */ - pgo_alloc_t *pgo_alloc; /* Allocate pager. */ - pgo_dealloc_t *pgo_dealloc; /* Disassociate. */ - pgo_getpages_t *pgo_getpages; /* Get (read) page. */ - pgo_putpages_t *pgo_putpages; /* Put (write) page. */ - pgo_haspage_t *pgo_haspage; /* Does pager have page? */ - pgo_pageunswapped_t *pgo_pageunswapped; + pgo_init_t *pgo_init; /* Initialize pager. */ + pgo_alloc_t *pgo_alloc; /* Allocate pager. */ + pgo_dealloc_t *pgo_dealloc; /* Disassociate. */ + pgo_getpages_t *pgo_getpages; /* Get (read) page. */ + pgo_getpages_async_t *pgo_getpages_async; /* Get page asyncly. */ + pgo_putpages_t *pgo_putpages; /* Put (write) page. */ + pgo_haspage_t *pgo_haspage; /* Query page. */ + pgo_pageunswapped_t *pgo_pageunswapped; }; extern struct pagerops defaultpagerops; @@ -103,6 +106,8 @@ vm_object_t vm_pager_allocate(objtype_t, void *, v void vm_pager_bufferinit(void); void vm_pager_deallocate(vm_object_t); static __inline int vm_pager_get_pages(vm_object_t, vm_page_t *, int, int); +static __inline int vm_pager_get_pages_async(vm_object_t, vm_page_t *, int, + int, void(*)(void *), void *); static __inline boolean_t vm_pager_has_page(vm_object_t, vm_pindex_t, int *, int *); void vm_pager_init(void); vm_object_t vm_pager_object_lookup(struct pagerlst *, void *); @@ -131,6 +136,27 @@ vm_pager_get_pages( return (r); } +static __inline int +vm_pager_get_pages_async(vm_object_t object, vm_page_t *m, int count, + int reqpage, void (*iodone)(void *), void *arg) +{ + int r; + + VM_OBJECT_ASSERT_WLOCKED(object); + + if (*pagertab[object->type]->pgo_getpages_async == NULL) { + /* Emulate async operation. 
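+	 * There is no native async method for this pager type, so do
+	 * the read synchronously through vm_pager_get_pages() and then
+	 * invoke the caller's iodone() callback directly.  The object
+	 * write lock is dropped across the callback and reacquired
+	 * before returning, so iodone() runs without the lock held.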
*/ + r = vm_pager_get_pages(object, m, count, reqpage); + VM_OBJECT_WUNLOCK(object); + (iodone)(arg); + VM_OBJECT_WLOCK(object); + } else + r = (*pagertab[object->type]->pgo_getpages_async)(object, m, + count, reqpage, iodone, arg); + + return (r); +} + static __inline void vm_pager_put_pages( vm_object_t object, Index: sys/vm/vm_page.c =================================================================== --- sys/vm/vm_page.c (.../head) (revision 270879) +++ sys/vm/vm_page.c (.../projects/sendfile) (revision 270881) @@ -2692,6 +2692,8 @@ retrylookup: sleep = (allocflags & VM_ALLOC_IGN_SBUSY) != 0 ? vm_page_xbusied(m) : vm_page_busied(m); if (sleep) { + if (allocflags & VM_ALLOC_NOWAIT) + return (NULL); /* * Reference the page before unlocking and * sleeping so that the page daemon is less @@ -2719,6 +2721,8 @@ retrylookup: } m = vm_page_alloc(object, pindex, allocflags & ~VM_ALLOC_IGN_SBUSY); if (m == NULL) { + if (allocflags & VM_ALLOC_NOWAIT) + return (NULL); VM_OBJECT_WUNLOCK(object); VM_WAIT; VM_OBJECT_WLOCK(object); Index: sys/vm/vm_page.h =================================================================== --- sys/vm/vm_page.h (.../head) (revision 270879) +++ sys/vm/vm_page.h (.../projects/sendfile) (revision 270881) @@ -391,6 +391,7 @@ vm_page_t PHYS_TO_VM_PAGE(vm_paddr_t pa); #define VM_ALLOC_IGN_SBUSY 0x1000 /* vm_page_grab() only */ #define VM_ALLOC_NODUMP 0x2000 /* don't include in dump */ #define VM_ALLOC_SBUSY 0x4000 /* Shared busy the page */ +#define VM_ALLOC_NOWAIT 0x8000 /* Return NULL instead of sleeping */ #define VM_ALLOC_COUNT_SHIFT 16 #define VM_ALLOC_COUNT(count) ((count) << VM_ALLOC_COUNT_SHIFT) Index: sys/vm/vnode_pager.c =================================================================== --- sys/vm/vnode_pager.c (.../head) (revision 270879) +++ sys/vm/vnode_pager.c (.../projects/sendfile) (revision 270881) @@ -83,6 +83,8 @@ static int vnode_pager_input_smlfs(vm_object_t obj static int vnode_pager_input_old(vm_object_t object, vm_page_t m); static void vnode_pager_dealloc(vm_object_t); static int vnode_pager_getpages(vm_object_t, vm_page_t *, int, int); +static int vnode_pager_getpages_async(vm_object_t, vm_page_t *, int, int, + void(*)(void *), void *); static void vnode_pager_putpages(vm_object_t, vm_page_t *, int, boolean_t, int *); static boolean_t vnode_pager_haspage(vm_object_t, vm_pindex_t, int *, int *); static vm_object_t vnode_pager_alloc(void *, vm_ooffset_t, vm_prot_t, @@ -92,6 +94,7 @@ struct pagerops vnodepagerops = { .pgo_alloc = vnode_pager_alloc, .pgo_dealloc = vnode_pager_dealloc, .pgo_getpages = vnode_pager_getpages, + .pgo_getpages_async = vnode_pager_getpages_async, .pgo_putpages = vnode_pager_putpages, .pgo_haspage = vnode_pager_haspage, }; @@ -664,6 +667,40 @@ vnode_pager_getpages(vm_object_t object, vm_page_t return rtval; } +static int +vnode_pager_getpages_async(vm_object_t object, vm_page_t *m, int count, + int reqpage, void (*iodone)(void *), void *arg) +{ + int rtval; + struct vnode *vp; + int bytes = count * PAGE_SIZE; + + vp = object->handle; + VM_OBJECT_WUNLOCK(object); + rtval = VOP_GETPAGES_ASYNC(vp, m, bytes, reqpage, 0, iodone, arg); + KASSERT(rtval != EOPNOTSUPP, + ("vnode_pager: FS getpages_async not implemented\n")); + VM_OBJECT_WLOCK(object); + return rtval; +} + +struct getpages_softc { + vm_page_t *m; + struct buf *bp; + vm_object_t object; + vm_offset_t kva; + off_t foff; + int size; + int count; + int unmapped; + int reqpage; + void (*iodone)(void *); + void *arg; +}; + +int vnode_pager_generic_getpages_done(struct 
getpages_softc *); +void vnode_pager_generic_getpages_done_async(struct buf *); + /* * This is now called from local media FS's to operate against their * own vnodes if they fail to implement VOP_GETPAGES. @@ -670,11 +707,11 @@ vnode_pager_getpages(vm_object_t object, vm_page_t */ int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, int bytecount, - int reqpage) + int reqpage, void (*iodone)(void *), void *arg) { vm_object_t object; vm_offset_t kva; - off_t foff, tfoff, nextoff; + off_t foff; int i, j, size, bsize, first; daddr_t firstaddr, reqblock; struct bufobj *bo; @@ -684,6 +721,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ struct mount *mp; int count; int error; + int unmapped; object = vp->v_object; count = bytecount / PAGE_SIZE; @@ -891,8 +929,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ * requires mapped buffers. */ mp = vp->v_mount; - if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0 && - unmapped_buf_allowed) { + unmapped = (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS)); + if (unmapped && unmapped_buf_allowed) { bp->b_data = unmapped_buf; bp->b_kvabase = unmapped_buf; bp->b_offset = 0; @@ -905,7 +943,6 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ /* build a minimal buffer header */ bp->b_iocmd = BIO_READ; - bp->b_iodone = bdone; KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred")); KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred")); bp->b_rcred = crhold(curthread->td_ucred); @@ -923,10 +960,88 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ /* do the input */ bp->b_iooffset = dbtob(bp->b_blkno); - bstrategy(bp); - bwait(bp, PVM, "vnread"); + if (iodone) { /* async */ + struct getpages_softc *sc; + sc = malloc(sizeof(*sc), M_TEMP, M_WAITOK); + + sc->m = m; + sc->bp = bp; + sc->object = object; + sc->foff = foff; + sc->size = size; + sc->count = count; + sc->unmapped = unmapped; + sc->reqpage = reqpage; + sc->kva = kva; + + sc->iodone = iodone; + sc->arg = arg; + + bp->b_iodone = vnode_pager_generic_getpages_done_async; + bp->b_caller1 = sc; + BUF_KERNPROC(bp); + bstrategy(bp); + /* Good bye! */ + } else { + struct getpages_softc sc; + + sc.m = m; + sc.bp = bp; + sc.object = object; + sc.foff = foff; + sc.size = size; + sc.count = count; + sc.unmapped = unmapped; + sc.reqpage = reqpage; + sc.kva = kva; + + bp->b_iodone = bdone; + bstrategy(bp); + bwait(bp, PVM, "vnread"); + error = vnode_pager_generic_getpages_done(&sc); + } + + return (error ? 
VM_PAGER_ERROR : VM_PAGER_OK); +} + +void +vnode_pager_generic_getpages_done_async(struct buf *bp) +{ + struct getpages_softc *sc = bp->b_caller1; + int error; + + error = vnode_pager_generic_getpages_done(sc); + + vm_page_xunbusy(sc->m[sc->reqpage]); + + sc->iodone(sc->arg); + + free(sc, M_TEMP); +} + +int +vnode_pager_generic_getpages_done(struct getpages_softc *sc) +{ + vm_object_t object; + vm_offset_t kva; + vm_page_t *m; + struct buf *bp; + off_t foff, tfoff, nextoff; + int i, size, count, unmapped, reqpage; + int error = 0; + + m = sc->m; + bp = sc->bp; + object = sc->object; + foff = sc->foff; + size = sc->size; + count = sc->count; + unmapped = sc->unmapped; + reqpage = sc->reqpage; + kva = sc->kva; + if ((bp->b_ioflags & BIO_ERROR) != 0) error = EIO; @@ -939,7 +1054,7 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ } if ((bp->b_flags & B_UNMAPPED) == 0) pmap_qremove(kva, count); - if (mp != NULL && (mp->mnt_kern_flag & MNTK_UNMAPPED_BUFS) != 0) { + if (unmapped) { bp->b_data = (caddr_t)kva; bp->b_kvabase = (caddr_t)kva; bp->b_flags &= ~B_UNMAPPED; @@ -995,7 +1110,8 @@ vnode_pager_generic_getpages(struct vnode *vp, vm_ if (error) { printf("vnode_pager_getpages: I/O read error\n"); } - return (error ? VM_PAGER_ERROR : VM_PAGER_OK); + + return (error); } /* Index: sys/vm/vnode_pager.h =================================================================== --- sys/vm/vnode_pager.h (.../head) (revision 270879) +++ sys/vm/vnode_pager.h (.../projects/sendfile) (revision 270881) @@ -41,7 +41,7 @@ #ifdef _KERNEL int vnode_pager_generic_getpages(struct vnode *vp, vm_page_t *m, - int count, int reqpage); + int count, int reqpage, void (*iodone)(void *), void *arg); int vnode_pager_generic_putpages(struct vnode *vp, vm_page_t *m, int count, boolean_t sync, int *rtvals); Index: usr.bin/netstat/inet.c =================================================================== --- usr.bin/netstat/inet.c (.../head) (revision 270879) +++ usr.bin/netstat/inet.c (.../projects/sendfile) (revision 270881) @@ -137,7 +137,7 @@ pcblist_sysctl(int proto, const char *name, char * static void sbtoxsockbuf(struct sockbuf *sb, struct xsockbuf *xsb) { - xsb->sb_cc = sb->sb_cc; + xsb->sb_cc = sb->sb_ccc; xsb->sb_hiwat = sb->sb_hiwat; xsb->sb_mbcnt = sb->sb_mbcnt; xsb->sb_mcnt = sb->sb_mcnt; @@ -479,7 +479,8 @@ protopr(u_long off, const char *name, int af1, int printf("%6u %6u %6u ", tp->t_sndrexmitpack, tp->t_rcvoopack, tp->t_sndzerowin); } else { - printf("%6u %6u ", so->so_rcv.sb_cc, so->so_snd.sb_cc); + printf("%6u %6u ", + so->so_rcv.sb_cc, so->so_snd.sb_cc); } if (numeric_port) { if (inp->inp_vflag & INP_IPV4) { Index: usr.bin/netstat/netgraph.c =================================================================== --- usr.bin/netstat/netgraph.c (.../head) (revision 270879) +++ usr.bin/netstat/netgraph.c (.../projects/sendfile) (revision 270881) @@ -119,7 +119,7 @@ netgraphprotopr(u_long off, const char *name, int if (Aflag) printf("%8lx ", (u_long) this); printf("%-5.5s %6u %6u ", - name, sockb.so_rcv.sb_cc, sockb.so_snd.sb_cc); + name, sockb.so_rcv.sb_ccc, sockb.so_snd.sb_ccc); /* Get info on associated node */ if (ngpcb.node_id == 0 || csock == -1) Index: usr.bin/netstat/unix.c =================================================================== --- usr.bin/netstat/unix.c (.../head) (revision 270879) +++ usr.bin/netstat/unix.c (.../projects/sendfile) (revision 270881) @@ -287,7 +287,8 @@ unixdomainpr(struct xunpcb *xunp, struct xsocket * } else { printf("%8lx %-6.6s %6u %6u %8lx %8lx %8lx %8lx", 
(long)so->so_pcb, socktype[so->so_type], so->so_rcv.sb_cc, - so->so_snd.sb_cc, (long)unp->unp_vnode, (long)unp->unp_conn, + so->so_snd.sb_cc, (long)unp->unp_vnode, + (long)unp->unp_conn, (long)LIST_FIRST(&unp->unp_refs), (long)LIST_NEXT(unp, unp_reflink)); } Index: usr.bin/systat/netstat.c =================================================================== --- usr.bin/systat/netstat.c (.../head) (revision 270879) +++ usr.bin/systat/netstat.c (.../projects/sendfile) (revision 270881) @@ -333,8 +333,8 @@ enter_kvm(struct inpcb *inp, struct socket *so, in struct netinfo *p; if ((p = enter(inp, state, proto)) != NULL) { - p->ni_rcvcc = so->so_rcv.sb_cc; - p->ni_sndcc = so->so_snd.sb_cc; + p->ni_rcvcc = so->so_rcv.sb_ccc; + p->ni_sndcc = so->so_snd.sb_ccc; } } Index: usr.bin/bluetooth/btsockstat/btsockstat.c =================================================================== --- usr.bin/bluetooth/btsockstat/btsockstat.c (.../head) (revision 270879) +++ usr.bin/bluetooth/btsockstat/btsockstat.c (.../projects/sendfile) (revision 270881) @@ -255,8 +255,8 @@ hcirawpr(kvm_t *kvmd, u_long addr) (unsigned long) pcb.so, (unsigned long) this, pcb.flags, - so.so_rcv.sb_cc, - so.so_snd.sb_cc, + so.so_rcv.sb_ccc, + so.so_snd.sb_ccc, pcb.addr.hci_node); } } /* hcirawpr */ @@ -303,8 +303,8 @@ l2caprawpr(kvm_t *kvmd, u_long addr) "%-8lx %-8lx %6d %6d %-17.17s\n", (unsigned long) pcb.so, (unsigned long) this, - so.so_rcv.sb_cc, - so.so_snd.sb_cc, + so.so_rcv.sb_ccc, + so.so_snd.sb_ccc, bdaddrpr(&pcb.src, NULL, 0)); } } /* l2caprawpr */ @@ -361,8 +361,8 @@ l2cappr(kvm_t *kvmd, u_long addr) fprintf(stdout, "%-8lx %6d %6d %-17.17s/%-5d %-17.17s %-5d %s\n", (unsigned long) this, - so.so_rcv.sb_cc, - so.so_snd.sb_cc, + so.so_rcv.sb_ccc, + so.so_snd.sb_ccc, bdaddrpr(&pcb.src, local, sizeof(local)), pcb.psm, bdaddrpr(&pcb.dst, remote, sizeof(remote)), @@ -467,8 +467,8 @@ rfcommpr(kvm_t *kvmd, u_long addr) fprintf(stdout, "%-8lx %6d %6d %-17.17s %-17.17s %-4d %-4d %s\n", (unsigned long) this, - so.so_rcv.sb_cc, - so.so_snd.sb_cc, + so.so_rcv.sb_ccc, + so.so_snd.sb_ccc, bdaddrpr(&pcb.src, local, sizeof(local)), bdaddrpr(&pcb.dst, remote, sizeof(remote)), pcb.channel, --hTiIB9CRvBOLTyqY--