Date: Thu, 22 Mar 2018 18:14:15 +0000
From: Ruslan Bukin
To: Jonathan Looney
Cc: "Jonathan T. Looney", src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r331347 - in head: etc/mtree include sys/conf sys/dev/tcp_log sys/kern sys/netinet usr.bin/netstat
Message-ID: <20180322181415.GA8657@bsdpad.com>
References: <201803220940.w2M9e8T4067719@repo.freebsd.org> <20180322141606.GA4972@bsdpad.com> <20180322142225.GA5139@bsdpad.com>

Look at these
https://ci.freebsd.org/job/FreeBSD-head-mips-build/lastBuild/console
https://ci.freebsd.org/job/FreeBSD-head-powerpc-build/lastBuild/console

Example
make -j5 TARGET=mips TARGET_ARCH=mipsel kernel-toolchain
make -j5 TARGET=mips TARGET_ARCH=mipsel KERNCONF=CANNA buildkernel

Ruslan

On Thu, Mar 22, 2018 at 03:39:23PM +0000, Jonathan Looney wrote:
> A tinderbox build didn't complain about atomic_fetchadd_64, so I assume it
> is OK.
> Yes, this can be made optional, if there is a need for that.
> Jonathan
> On Thu, Mar 22, 2018 at 2:22 PM, Ruslan Bukin
> <[1]ruslan.bukin@cl.cam.ac.uk> wrote:
>
> Also can this be pluggable ?
> It looks like it is optional device which means it can free up some
> space in embedded environment when unused
> Ruslan
> On Thu, Mar 22, 2018 at 02:16:06PM +0000, Ruslan Bukin wrote:
> > We don't have atomic_fetchadd_64 for mips32 I think
> >
> > Ruslan
> >
> > On Thu, Mar 22, 2018 at 09:40:08AM +0000, Jonathan T. Looney wrote:
> > > Author: jtl
> > > Date: Thu Mar 22 09:40:08 2018
> > > New Revision: 331347
> > > URL: [2]https://svnweb.freebsd.org/changeset/base/331347
> > >
> > > Log:
> > >   Add the "TCP Blackbox Recorder" which we discussed at the
> developer
> > >   summits at BSDCan and BSDCam in 2017.
> > >
> > >   The TCP Blackbox Recorder allows you to capture events on a TCP
> connection
> > >   in a ring buffer. It stores metadata with the event. It
> optionally stores
> > >   the TCP header associated with an event (if the event is
> associated with a
> > >   packet) and also optionally stores information on the sockets.
> > >
> > >   It supports setting a log ID on a TCP connection and using this
> to correlate
> > >   multiple connections that share a common log ID.
> > >
> > >   You can log connections in different modes. If you are doing a
> coordinated
> > >   test with a particular connection, you may tell the system to
> put it in
> > >   mode 4 (continuous dump). Or, if you just want to monitor for
> errors, you
> > >   can put it in mode 1 (ring buffer) and dump all the ring buffers
> associated
> > >   with the connection ID when we receive an error signal for that
> connection
> > >   ID. You can set a default mode that will be applied to a
> particular ratio
> > >   of incoming connections. You can also manually set a mode using
> a socket
> > >   option.
> > >
> > >   This commit includes only basic probes. rrs@ has added quite an
> abundance
> > >   of probes in his TCP development work. He plans to commit those
> soon.
> > >
> > >   There are user-space programs which we plan to commit as ports.
> These read
> > >   the data from the log device and output pcapng files, and then
> let you
> > >   analyze the data (and metadata) in the pcapng files.
> > >
> > >   Reviewed by:      gnn (previous version)
> > >   Obtained from:    Netflix, Inc.
> > >   Relnotes: yes
> > >   Differential Revision:
> [3]https://reviews.freebsd.org/D11085
> > >
> > > Added:
> > >   head/sys/dev/tcp_log/
> > >   head/sys/dev/tcp_log/tcp_log_dev.c   (contents, props changed)
> > >   head/sys/dev/tcp_log/tcp_log_dev.h   (contents, props changed)
> > >   head/sys/netinet/tcp_log_buf.c   (contents, props changed)
> > >   head/sys/netinet/tcp_log_buf.h   (contents, props changed)
> > > Modified:
> > >   head/etc/mtree/BSD.include.dist
> > >   head/include/Makefile
> > >   head/sys/conf/files
> > >   head/sys/kern/subr_witness.c
> > >   head/sys/netinet/tcp.h
> > >   head/sys/netinet/tcp_input.c
> > >   head/sys/netinet/tcp_output.c
> > >   head/sys/netinet/tcp_subr.c
> > >   head/sys/netinet/tcp_timer.c
> > >   head/sys/netinet/tcp_usrreq.c
> > >   head/sys/netinet/tcp_var.h
> > >   head/usr.bin/netstat/inet.c
> > >   head/usr.bin/netstat/main.c
> > >   head/usr.bin/netstat/netstat.1
> > >   head/usr.bin/netstat/netstat.h
> > >
> > > Modified: head/etc/mtree/BSD.include.dist
> > >
> ==============================================================================
> > > --- head/etc/mtree/BSD.include.dist Thu Mar 22 08:32:39 2018
>   (r331346)
> > > +++ head/etc/mtree/BSD.include.dist Thu Mar 22 09:40:08 2018
>   (r331347)
> > > @@ -158,6 +158,8 @@
> > >          ..
> > >          speaker
> > >          ..
> > > +        tcp_log
> > > +        ..
> > >          usb
> > >          ..
> > >          vkbd > > > > > > Modified: head/include/Makefile > > > > ============================================================================== > > > --- head/include/Makefile   Thu Mar 22 08:32:39 2018        > (r331346) > > > +++ head/include/Makefile   Thu Mar 22 09:40:08 2018        > (r331347) > > > @@ -47,7 +47,7 @@ LSUBDIRS= cam/ata cam/mmc cam/nvme cam/scsi \ > > >     dev/hwpmc dev/hyperv \ > > >     dev/ic dev/iicbus dev/io dev/lmc dev/mfi dev/mmc dev/nvme \ > > >     dev/ofw dev/pbio dev/pci ${_dev_powermac_nvram} dev/ppbus > dev/smbus \ > > > -   dev/speaker dev/vkbd dev/wi \ > > > +   dev/speaker dev/tcp_log dev/vkbd dev/wi \ > > >     fs/devfs fs/fdescfs fs/msdosfs fs/nandfs fs/nfs fs/nullfs \ > > >     fs/procfs fs/smbfs fs/udf fs/unionfs \ > > >     geom/cache geom/concat geom/eli geom/gate geom/journal > geom/label \ > > > > > > Modified: head/sys/conf/files > > > > ============================================================================== > > > --- head/sys/conf/files     Thu Mar 22 08:32:39 2018        > (r331346) > > > +++ head/sys/conf/files     Thu Mar 22 09:40:08 2018        > (r331347) > > > @@ -3161,6 +3161,7 @@ dev/syscons/star/star_saver.c optional > star_saver > > >  dev/syscons/syscons.c              optional sc > > >  dev/syscons/sysmouse.c             optional sc > > >  dev/syscons/warp/warp_saver.c      optional warp_saver > > > +dev/tcp_log/tcp_log_dev.c  optional inet | inet6 > > >  dev/tdfx/tdfx_linux.c              optional tdfx_linux tdfx > compat_linux > > >  dev/tdfx/tdfx_pci.c                optional tdfx pci > > >  dev/ti/if_ti.c                     optional ti pci > > > @@ -4309,6 +4310,7 @@ netinet/tcp_debug.c           optional > tcpdebug > > >  netinet/tcp_fastopen.c             optional inet > tcp_rfc7413 | inet6 tcp_rfc7413 > > >  netinet/tcp_hostcache.c            optional inet | inet6 > > >  netinet/tcp_input.c                optional inet | inet6 > > > +netinet/tcp_log_buf.c              optional inet | 
inet6 > > >  netinet/tcp_lro.c          optional inet | inet6 > > >  netinet/tcp_output.c               optional inet | inet6 > > >  netinet/tcp_offload.c              optional tcp_offload > inet | tcp_offload inet6 > > > > > > Added: head/sys/dev/tcp_log/tcp_log_dev.c > > > > ============================================================================== > > > --- /dev/null       00:00:00 1970   (empty, because file is > newly added) > > > +++ head/sys/dev/tcp_log/tcp_log_dev.c      Thu Mar 22 09:40:08 > 2018        (r331347) > > > @@ -0,0 +1,521 @@ > > > +/*- > > > + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD > > > + * > > > + * Copyright (c) 2016-2017 > > > + * Netflix Inc.  All rights reserved. > > > + * > > > + * Redistribution and use in source and binary forms, with or > without > > > + * modification, are permitted provided that the following > conditions > > > + * are met: > > > + * 1. Redistributions of source code must retain the above > copyright > > > + *    notice, this list of conditions and the following > disclaimer. > > > + * 2. Redistributions in binary form must reproduce the above > copyright > > > + *    notice, this list of conditions and the following > disclaimer in the > > > + *    documentation and/or other materials provided with the > distribution. > > > + * > > > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS > IS'' AND > > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > TO, THE > > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A > PARTICULAR PURPOSE > > > + * ARE DISCLAIMED.  
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS > BE LIABLE > > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > CONSEQUENTIAL > > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF > SUBSTITUTE GOODS > > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS > INTERRUPTION) > > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN > CONTRACT, STRICT > > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING > IN ANY WAY > > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE > POSSIBILITY OF > > > + * SUCH DAMAGE. > > > + * > > > + */ > > > + > > > +#include > > > +__FBSDID("$FreeBSD$"); > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +#include > > > + > > > +#ifdef TCPLOG_DEBUG_COUNTERS > > > +extern counter_u64_t tcp_log_que_read; > > > +extern counter_u64_t tcp_log_que_freed; > > > +#endif > > > + > > > +static struct cdev *tcp_log_dev; > > > +static struct selinfo tcp_log_sel; > > > + > > > +static struct log_queueh tcp_log_dev_queue_head = > STAILQ_HEAD_INITIALIZER(tcp_log_dev_queue_head); > > > +static struct log_infoh tcp_log_dev_reader_head = > STAILQ_HEAD_INITIALIZER(tcp_log_dev_reader_head); > > > + > > > +MALLOC_DEFINE(M_TCPLOGDEV, "tcp_log_dev", "TCP log device data > structures"); > > > + > > > +static int tcp_log_dev_listeners = 0; > > > + > > > +static struct mtx tcp_log_dev_queue_lock; > > > + > > > +#define    TCP_LOG_DEV_QUEUE_LOCK()        > mtx_lock(&tcp_log_dev_queue_lock) > > > +#define    TCP_LOG_DEV_QUEUE_UNLOCK()      > mtx_unlock(&tcp_log_dev_queue_lock) > > > +#define    TCP_LOG_DEV_QUEUE_LOCK_ASSERT() > mtx_assert(&tcp_log_dev_queue_lock, MA_OWNED) > > > +#define    
TCP_LOG_DEV_QUEUE_UNLOCK_ASSERT() > mtx_assert(&tcp_log_dev_queue_lock, MA_NOTOWNED) > > > +#define    TCP_LOG_DEV_QUEUE_REF(tldq)    >  refcount_acquire(&((tldq)->tldq_refcnt)) > > > +#define    TCP_LOG_DEV_QUEUE_UNREF(tldq)  >  refcount_release(&((tldq)->tldq_refcnt)) > > > + > > > +static void        tcp_log_dev_clear_refcount(struct > tcp_log_dev_queue *entry); > > > +static void        tcp_log_dev_clear_cdevpriv(void *data); > > > +static int tcp_log_dev_open(struct cdev *dev __unused, int flags, > > > +    int devtype __unused, struct thread *td __unused); > > > +static int tcp_log_dev_write(struct cdev *dev __unused, > > > +    struct uio *uio __unused, int flags __unused); > > > +static int tcp_log_dev_read(struct cdev *dev __unused, struct uio > *uio, > > > +    int flags __unused); > > > +static int tcp_log_dev_ioctl(struct cdev *dev __unused, u_long cmd, > > > +    caddr_t data, int fflag __unused, struct thread *td > __unused); > > > +static int tcp_log_dev_poll(struct cdev *dev __unused, int events, > > > +    struct thread *td); > > > + > > > + > > > +enum tcp_log_dev_queue_lock_state { > > > +   QUEUE_UNLOCKED = 0, > > > +   QUEUE_LOCKED, > > > +}; > > > + > > > +static struct cdevsw tcp_log_cdevsw = { > > > +   .d_version =    D_VERSION, > > > +   .d_read =       tcp_log_dev_read, > > > +   .d_open =       tcp_log_dev_open, > > > +   .d_write =      tcp_log_dev_write, > > > +   .d_poll =       tcp_log_dev_poll, > > > +   .d_ioctl =      tcp_log_dev_ioctl, > > > +#ifdef NOTYET > > > +   .d_mmap =       tcp_log_dev_mmap, > > > +#endif > > > +   .d_name =       "tcp_log", > > > +}; > > > + > > > +static __inline void > > > +tcp_log_dev_queue_validate_lock(int lockstate) > > > +{ > > > + > > > +#ifdef INVARIANTS > > > +   switch (lockstate) { > > > +   case QUEUE_LOCKED: > > > +           TCP_LOG_DEV_QUEUE_LOCK_ASSERT(); > > > +           break; > > > +   case QUEUE_UNLOCKED: > > > +           TCP_LOG_DEV_QUEUE_UNLOCK_ASSERT(); > > > +           break; 
> > > +   default:
> > > +           kassert_panic("%s:%d: unknown queue lock state",
> __func__,
> > > +               __LINE__);
> > > +   }
> > > +#endif
> > > +}
> > > +
> > > +/*
> > > + * Clear the refcount. If appropriate, it will remove the entry
> from the
> > > + * queue and call the destructor.
> > > + *
> > > + * This must be called with the queue lock held.
> > > + */
> > > +static void
> > > +tcp_log_dev_clear_refcount(struct tcp_log_dev_queue *entry)
> > > +{
> > > +
> > > +   KASSERT(entry != NULL, ("%s: called with NULL entry",
> __func__));
> > > +
> > > +   TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
> > > +
> > > +   if (TCP_LOG_DEV_QUEUE_UNREF(entry)) {
> > > +#ifdef TCPLOG_DEBUG_COUNTERS
> > > +           counter_u64_add(tcp_log_que_freed, 1);
> > > +#endif
> > > +           /* Remove the entry from the queue and call the
> destructor. */
> > > +           STAILQ_REMOVE(&tcp_log_dev_queue_head, entry,
> tcp_log_dev_queue,
> > > +               tldq_queue);
> > > +           (*entry->tldq_dtor)(entry);
> > > +   }
> > > +}
> > > +
> > > +static void
> > > +tcp_log_dev_clear_cdevpriv(void *data)
> > > +{
> > > +   struct tcp_log_dev_info *priv;
> > > +   struct tcp_log_dev_queue *entry, *entry_tmp;
> > > +
> > > +   priv = (struct tcp_log_dev_info *)data;
> > > +   if (priv == NULL)
> > > +           return;
> > > +
> > > +   /*
> > > +    * Lock the queue and drop our references. We hold references
> to all
> > > +    * the entries starting with tldi_head (or, if tldi_head ==
> NULL, all
> > > +    * entries in the queue).
> > > +    *
> > > +    * Because we don't want anyone adding addition things to the
> queue
> > > +    * while we are doing this, we lock the queue.
> > > +    */
> > > +   TCP_LOG_DEV_QUEUE_LOCK();
> > > +   if (priv->tldi_head != NULL) {
> > > +           entry = priv->tldi_head;
> > > +           STAILQ_FOREACH_FROM_SAFE(entry,
> &tcp_log_dev_queue_head,
> > > +               tldq_queue, entry_tmp) {
> > > +                   tcp_log_dev_clear_refcount(entry);
> > > +           }
> > > +   }
> > > +   tcp_log_dev_listeners--;
> > > +   KASSERT(tcp_log_dev_listeners >= 0,
> > > +       ("%s: tcp_log_dev_listeners is unexpectedly negative",
> __func__));
> > > +   STAILQ_REMOVE(&tcp_log_dev_reader_head, priv,
> tcp_log_dev_info,
> > > +       tldi_list);
> > > +   TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
> > > +   TCP_LOG_DEV_QUEUE_UNLOCK();
> > > +   free(priv, M_TCPLOGDEV);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_open(struct cdev *dev __unused, int flags, int devtype
> __unused,
> > > +    struct thread *td __unused)
> > > +{
> > > +   struct tcp_log_dev_info *priv;
> > > +   struct tcp_log_dev_queue *entry;
> > > +   int rv;
> > > +
> > > +   /*
> > > +    * Ideally, we shouldn't see these because of file system
> > > +    * permissions.
> > > +    */
> > > +   if (flags & (FWRITE | FEXEC | FAPPEND | O_TRUNC))
> > > +           return (ENODEV);
> > > +
> > > +   /* Allocate space to hold information about where we are. */
> > > +   priv = malloc(sizeof(struct tcp_log_dev_info), M_TCPLOGDEV,
> > > +       M_ZERO | M_WAITOK);
> > > +
> > > +   /* Stash the private data away. */
> > > +   rv = devfs_set_cdevpriv((void *)priv,
> tcp_log_dev_clear_cdevpriv);
> > > +   if (!rv) {
> > > +           /*
> > > +            * Increase the listener count, add this reader to
> the list, and
> > > +            * take references on all current queues.
> > > +            */
> > > +           TCP_LOG_DEV_QUEUE_LOCK();
> > > +           tcp_log_dev_listeners++;
> > > +           STAILQ_INSERT_HEAD(&tcp_log_dev_reader_head, priv,
> tldi_list);
> > > +           priv->tldi_head =
> STAILQ_FIRST(&tcp_log_dev_queue_head);
> > > +           if (priv->tldi_head != NULL)
> > > +                   priv->tldi_cur =
> priv->tldi_head->tldq_buf;
> > > +           STAILQ_FOREACH(entry, &tcp_log_dev_queue_head,
> tldq_queue)
> > > +                   TCP_LOG_DEV_QUEUE_REF(entry);
> > > +           TCP_LOG_DEV_QUEUE_UNLOCK();
> > > +   } else {
> > > +           /* Free the entry. */
> > > +           free(priv, M_TCPLOGDEV);
> > > +   }
> > > +   return (rv);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_write(struct cdev *dev __unused, struct uio *uio
> __unused,
> > > +    int flags __unused)
> > > +{
> > > +
> > > +   return (ENODEV);
> > > +}
> > > +
> > > +static __inline void
> > > +tcp_log_dev_rotate_bufs(struct tcp_log_dev_info *priv, int
> *lockstate)
> > > +{
> > > +   struct tcp_log_dev_queue *entry;
> > > +
> > > +   KASSERT(priv->tldi_head != NULL,
> > > +       ("%s:%d: priv->tldi_head unexpectedly NULL",
> > > +       __func__, __LINE__));
> > > +   KASSERT(priv->tldi_head->tldq_buf == priv->tldi_cur,
> > > +       ("%s:%d: buffer mismatch (%p vs %p)",
> > > +       __func__, __LINE__, priv->tldi_head->tldq_buf,
> > > +       priv->tldi_cur));
> > > +   tcp_log_dev_queue_validate_lock(*lockstate);
> > > +
> > > +   if (*lockstate == QUEUE_UNLOCKED) {
> > > +           TCP_LOG_DEV_QUEUE_LOCK();
> > > +           *lockstate = QUEUE_LOCKED;
> > > +   }
> > > +   entry = priv->tldi_head;
> > > +   priv->tldi_head = STAILQ_NEXT(entry, tldq_queue);
> > > +   tcp_log_dev_clear_refcount(entry);
> > > +   priv->tldi_cur = NULL;
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_read(struct cdev *dev __unused, struct uio *uio, int
> flags)
> > > +{
> > > +   struct tcp_log_common_header *buf;
> > > +   struct tcp_log_dev_info *priv;
> > > +   struct tcp_log_dev_queue *entry;
> > > +   ssize_t len;
> > > +   int lockstate, rv;
> > > +
> > > +   /* Get our private info. */
> > > +   rv = devfs_get_cdevpriv((void **)&priv);
> > > +   if (rv)
> > > +           return (rv);
> > > +
> > > +   lockstate = QUEUE_UNLOCKED;
> > > +
> > > +   /* Do we need to get a new buffer? */
> > > +   while (priv->tldi_cur == NULL ||
> > > +       priv->tldi_cur->tlch_length <= priv->tldi_off) {
> > > +           /* Did we somehow forget to rotate? */
> > > +           KASSERT(priv->tldi_cur == NULL,
> > > +               ("%s:%d: tldi_cur is unexpectedly non-NULL",
> __func__,
> > > +               __LINE__));
> > > +           if (priv->tldi_cur != NULL)
> > > +                   tcp_log_dev_rotate_bufs(priv,
> &lockstate);
> > > +
> > > +           /*
> > > +            * Before we start looking at tldi_head, we need a
> lock on the
> > > +            * queue to make sure tldi_head stays stable.
> > > +            */
> > > +           if (lockstate == QUEUE_UNLOCKED) {
> > > +                   TCP_LOG_DEV_QUEUE_LOCK();
> > > +                   lockstate = QUEUE_LOCKED;
> > > +           }
> > > +
> > > +           /* We need the next buffer. Do we have one? */
> > > +           if (priv->tldi_head == NULL && (flags &
> FNONBLOCK)) {
> > > +                   rv = EAGAIN;
> > > +                   goto done;
> > > +           }
> > > +           if (priv->tldi_head == NULL) {
> > > +                   /* Sleep and wait for more things we
> can read. */
> > > +                   rv = mtx_sleep(&tcp_log_dev_listeners,
> > > +                       &tcp_log_dev_queue_lock, PCATCH,
> "tcplogdev", 0);
> > > +                   if (rv)
> > > +                           goto done;
> > > +                   if (priv->tldi_head == NULL)
> > > +                           continue;
> > > +           }
> > > +
> > > +           /*
> > > +            * We have an entry to read. We want to try to
> create a
> > > +            * buffer, if one doesn't already exist.
> > > +            */
> > > +           entry = priv->tldi_head;
> > > +           if (entry->tldq_buf == NULL) {
> > > +                   TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
> > > +                   buf = (*entry->tldq_xform)(entry);
> > > +                   if (buf == NULL) {
> > > +                           rv = EBUSY;
> > > +                           goto done;
> > > +                   }
> > > +                   entry->tldq_buf = buf;
> > > +           }
> > > +
> > > +           priv->tldi_cur = entry->tldq_buf;
> > > +           priv->tldi_off = 0;
> > > +   }
> > > +
> > > +   /* Copy what we can from this buffer to the output buffer. */
> > > +   if (uio->uio_resid > 0) {
> > > +           /* Drop locks so we can take page faults. */
> > > +           if (lockstate == QUEUE_LOCKED)
> > > +                   TCP_LOG_DEV_QUEUE_UNLOCK();
> > > +           lockstate = QUEUE_UNLOCKED;
> > > +
> > > +           KASSERT(priv->tldi_cur != NULL,
> > > +               ("%s: priv->tldi_cur is unexpectedly NULL",
> __func__));
> > > +
> > > +           /* Copy as much as we can to this uio. */
> > > +           len = priv->tldi_cur->tlch_length -
> priv->tldi_off;
> > > +           if (len > uio->uio_resid)
> > > +                   len = uio->uio_resid;
> > > +           rv = uiomove(((uint8_t *)priv->tldi_cur) +
> priv->tldi_off,
> > > +               len, uio);
> > > +           if (rv != 0)
> > > +                   goto done;
> > > +           priv->tldi_off += len;
> > > +#ifdef TCPLOG_DEBUG_COUNTERS
> > > +           counter_u64_add(tcp_log_que_read, len);
> > > +#endif
> > > +   }
> > > +   /* Are we done with this buffer? If so, find the next one. */
> > > +   if (priv->tldi_off >= priv->tldi_cur->tlch_length) {
> > > +           KASSERT(priv->tldi_off ==
> priv->tldi_cur->tlch_length,
> > > +               ("%s: offset (%ju) exceeds length (%ju)",
> __func__,
> > > +               (uintmax_t)priv->tldi_off,
> > > +               (uintmax_t)priv->tldi_cur->tlch_length));
> > > +           tcp_log_dev_rotate_bufs(priv, &lockstate);
> > > +   }
> > > +done:
> > > +   tcp_log_dev_queue_validate_lock(lockstate);
> > > +   if (lockstate == QUEUE_LOCKED)
> > > +           TCP_LOG_DEV_QUEUE_UNLOCK();
> > > +   return (rv);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_ioctl(struct cdev *dev __unused, u_long cmd, caddr_t
> data,
> > > +    int fflag __unused, struct thread *td __unused)
> > > +{
> > > +   struct tcp_log_dev_info *priv;
> > > +   int rv;
> > > +
> > > +   /* Get our private info. */
> > > +   rv = devfs_get_cdevpriv((void **)&priv);
> > > +   if (rv)
> > > +           return (rv);
> > > +
> > > +   /*
> > > +    * Set things. Here, we are most concerned about the
> non-blocking I/O
> > > +    * flag.
> > > +    */
> > > +   rv = 0;
> > > +   switch (cmd) {
> > > +   case FIONBIO:
> > > +           break;
> > > +   case FIOASYNC:
> > > +           if (*(int *)data != 0)
> > > +                   rv = EINVAL;
> > > +           break;
> > > +   default:
> > > +           rv = ENOIOCTL;
> > > +   }
> > > +   return (rv);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_poll(struct cdev *dev __unused, int events, struct
> thread *td)
> > > +{
> > > +   struct tcp_log_dev_info *priv;
> > > +   int revents;
> > > +
> > > +   /*
> > > +    * Get our private info. If this fails, claim that all events
> are
> > > +    * ready. That should prod the user to do something that will
> > > +    * make the error evident to them.
> > > +    */
> > > +   if (devfs_get_cdevpriv((void **)&priv))
> > > +           return (events);
> > > +
> > > +   revents = 0;
> > > +   if (events & (POLLIN | POLLRDNORM)) {
> > > +           /*
> > > +            * We can (probably) read right now if we are
> partway through
> > > +            * a buffer or if we are just about to start a
> buffer.
> > > +            * Because we are going to read tldi_head, we
> should acquire
> > > +            * a read lock on the queue.
> > > +            */
> > > +           TCP_LOG_DEV_QUEUE_LOCK();
> > > +           if ((priv->tldi_head != NULL && priv->tldi_cur ==
> NULL) ||
> > > +               (priv->tldi_cur != NULL &&
> > > +               priv->tldi_off <
> priv->tldi_cur->tlch_length))
> > > +                   revents = events & (POLLIN |
> POLLRDNORM);
> > > +           else
> > > +                   selrecord(td, &tcp_log_sel);
> > > +           TCP_LOG_DEV_QUEUE_UNLOCK();
> > > +   } else {
> > > +           /*
> > > +            * It only makes sense to poll for reading. So,
> again, prod the
> > > +            * user to do something that will make the error
> of their ways
> > > +            * apparent.
> > > +            */
> > > +           revents = events;
> > > +   }
> > > +   return (revents);
> > > +}
> > > +
> > > +int
> > > +tcp_log_dev_add_log(struct tcp_log_dev_queue *entry)
> > > +{
> > > +   struct tcp_log_dev_info *priv;
> > > +   int rv;
> > > +   bool wakeup_needed;
> > > +
> > > +   KASSERT(entry->tldq_buf != NULL || entry->tldq_xform != NULL,
> > > +       ("%s: Called with both tldq_buf and tldq_xform set to
> NULL",
> > > +       __func__));
> > > +   KASSERT(entry->tldq_dtor != NULL,
> > > +       ("%s: Called with tldq_dtor set to NULL", __func__));
> > > +
> > > +   /* Get a lock on the queue. */
> > > +   TCP_LOG_DEV_QUEUE_LOCK();
> > > +
> > > +   /* If no one is listening, tell the caller to free the
> resources. */
> > > +   if (tcp_log_dev_listeners == 0) {
> > > +           rv = ENXIO;
> > > +           goto done;
> > > +   }
> > > +
> > > +   /* Add this to the end of the tailq. */
> > > +   STAILQ_INSERT_TAIL(&tcp_log_dev_queue_head, entry,
> tldq_queue);
> > > +
> > > +   /* Add references for all current listeners. */
> > > +   refcount_init(&entry->tldq_refcnt, tcp_log_dev_listeners);
> > > +
> > > +   /*
> > > +    * If any listener is currently stuck on NULL, that means they
> are
> > > +    * waiting. Point their head to this new entry.
> > > +    */
> > > +   wakeup_needed = false;
> > > +   STAILQ_FOREACH(priv, &tcp_log_dev_reader_head, tldi_list)
> > > +           if (priv->tldi_head == NULL) {
> > > +                   priv->tldi_head = entry;
> > > +                   wakeup_needed = true;
> > > +           }
> > > +
> > > +   if (wakeup_needed) {
> > > +           selwakeup(&tcp_log_sel);
> > > +           wakeup(&tcp_log_dev_listeners);
> > > +   }
> > > +
> > > +   rv = 0;
> > > +
> > > +done:
> > > +   TCP_LOG_DEV_QUEUE_LOCK_ASSERT();
> > > +   TCP_LOG_DEV_QUEUE_UNLOCK();
> > > +   return (rv);
> > > +}
> > > +
> > > +static int
> > > +tcp_log_dev_modevent(module_t mod __unused, int type, void *data
> __unused)
> > > +{
> > > +
> > > +   /* TODO: Support intelligent unloading. */
> > > +   switch (type) {
> > > +   case MOD_LOAD:
> > > +           if (bootverbose)
> > > +                   printf("tcp_log: tcp_log device\n");
> > > +           memset(&tcp_log_sel, 0, sizeof(tcp_log_sel));
> > > +           memset(&tcp_log_dev_queue_lock, 0, sizeof(struct
> mtx));
> > > +           mtx_init(&tcp_log_dev_queue_lock, "tcp_log dev",
> > > +                    "tcp_log device queues", MTX_DEF);
> > > +           tcp_log_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD,
> > > +               &tcp_log_cdevsw, 0, NULL, UID_ROOT,
> GID_WHEEL, 0400,
> > > +               "tcp_log");
> > > +           break;
> > > +   default:
> > > +           return (EOPNOTSUPP);
> > > +   }
> > > +
> > > +   return (0);
> > > +}
> > > +
> > > +DEV_MODULE(tcp_log_dev, tcp_log_dev_modevent, NULL);
> > > +MODULE_VERSION(tcp_log_dev, 1);
> > >
> > > Added: head/sys/dev/tcp_log/tcp_log_dev.h
> > >
> ==============================================================================
> > > --- /dev/null       00:00:00 1970   (empty, because file is
> newly added)
> > > +++ head/sys/dev/tcp_log/tcp_log_dev.h      Thu Mar 22 09:40:08
> 2018        (r331347)
> > > @@ -0,0 +1,88 @@
> > > +/*-
> > > + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
> > > + *
> > > + * Copyright (c) 2016
> > > + * Netflix Inc.  All rights reserved.
> > > + *
> > > + * Redistribution and use in source and binary forms, with or
> without
> > > + * modification, are permitted provided that the following
> conditions
> > > + * are met:
> > > + * 1. Redistributions of source code must retain the above
> copyright
> > > + *    notice, this list of conditions and the following
> disclaimer.
> > > + * 2. Redistributions in binary form must reproduce the above
> copyright
> > > + *    notice, this list of conditions and the following
> disclaimer in the
> > > + *    documentation and/or other materials provided with the
> distribution.
> > > + *
> > > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS
> IS'' AND
> > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
> TO, THE
> > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
> PARTICULAR PURPOSE
> > > + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS
> BE LIABLE
> > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> CONSEQUENTIAL
> > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> SUBSTITUTE GOODS
> > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
> INTERRUPTION)
> > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
> CONTRACT, STRICT
> > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
> IN ANY WAY
> > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
> POSSIBILITY OF
> > > + * SUCH DAMAGE.
> > > + *
> > > + * $FreeBSD$
> > > + */
> > > +
> > > +#ifndef __tcp_log_dev_h__
> > > +#define    __tcp_log_dev_h__
> > > +
> > > +/*
> > > + * This is the common header for data streamed from the log device.
> All
> > > + * blocks of data need to start with this header.
> > > + */
> > > +struct tcp_log_common_header {
> > > +   uint32_t        tlch_version;   /* Version is specific
> to type. */
> > > +   uint32_t        tlch_type;      /* Type of entry(ies)
> that follow. */
> > > +   uint64_t        tlch_length;    /* Total length,
> including header. */
> > > +} __packed;
> > > +
> > > +#define    TCP_LOG_DEV_TYPE_BBR    1       /* black box
> recorder */
> > > +
> > > +#ifdef _KERNEL
> > > +/*
> > > + * This is a queue entry. All queue entries need to start with this
> structure
> > > + * so the common code can cast them to this structure; however,
> other modules
> > > + * are free to include additional data after this structure.
> > > + *
> > > + * The elements are explained here:
> > > + * tldq_queue: used by the common code to maintain this entry's
> position in the
> > > + *     queue.
> > > + * tldq_buf: should be NULL, or a pointer to a chunk of data. The > data must be > > > + *     as long as the common header indicates. > > > + * tldq_xform: If tldq_buf is NULL, the code will call this to > create the > > > + *     the tldq_buf object. The function should *not* directly > modify tldq_buf, > > > + *     but should return the buffer (which must meet the > restrictions > > > + *     indicated for tldq_buf). > > > + * tldq_dtor: This function is called to free the queue entry. If > tldq_buf is > > > + *     not NULL, the dtor function must free that, too. > > > + * tldq_refcnt: used by the common code to indicate how many > readers still need > > > + *     this data. > > > + */ > > > +struct tcp_log_dev_queue { > > > +   STAILQ_ENTRY(tcp_log_dev_queue) tldq_queue; > > > +   struct tcp_log_common_header *tldq_buf; > > > +   struct tcp_log_common_header *(*tldq_xform)(struct > tcp_log_dev_queue *entry); > > > +   void    (*tldq_dtor)(struct tcp_log_dev_queue *entry); > > > +   volatile u_int tldq_refcnt; > > > +}; > > > + > > > +STAILQ_HEAD(log_queueh, tcp_log_dev_queue); > > > + > > > +struct tcp_log_dev_info { > > > +   STAILQ_ENTRY(tcp_log_dev_info) tldi_list; > > > +   struct tcp_log_dev_queue *tldi_head; > > > +   struct tcp_log_common_header *tldi_cur; > > > +   off_t                   tldi_off; > > > +}; > > > +STAILQ_HEAD(log_infoh, tcp_log_dev_info); > > > + > > > + > > > +MALLOC_DECLARE(M_TCPLOGDEV); > > > +int tcp_log_dev_add_log(struct tcp_log_dev_queue *entry); > > > +#endif /* _KERNEL */ > > > +#endif /* !__tcp_log_dev_h__ */ > > > > > > Modified: head/sys/kern/subr_witness.c > > > > ============================================================================== > > > --- head/sys/kern/subr_witness.c    Thu Mar 22 08:32:39 2018    >     (r331346) > > > +++ head/sys/kern/subr_witness.c    Thu Mar 22 09:40:08 2018    >     (r331347) > > > @@ -640,6 +640,14 @@ static struct witness_order_list_entry > order_lists[] = > > >     { 
"db->db_mtx", &lock_class_sx }, > > >     { NULL, NULL }, > > >     /* > > > +    * TCP log locks > > > +    */ > > > +   { "TCP ID tree", &lock_class_rw }, > > > +   { "tcp log id bucket", &lock_class_mtx_sleep }, > > > +   { "tcpinp", &lock_class_rw }, > > > +   { "TCP log expireq", &lock_class_mtx_sleep }, > > > +   { NULL, NULL }, > > > +   /* > > >      * spin locks > > >      */ > > >  #ifdef SMP > > > > > > Modified: head/sys/netinet/tcp.h > > > > ============================================================================== > > > --- head/sys/netinet/tcp.h  Thu Mar 22 08:32:39 2018        > (r331346) > > > +++ head/sys/netinet/tcp.h  Thu Mar 22 09:40:08 2018        > (r331347) > > > @@ -168,6 +168,12 @@ struct tcphdr { > > >  #define TCP_NOOPT  8       /* don't use TCP options */ > > >  #define TCP_MD5SIG 16      /* use MD5 digests (RFC2385) */ > > >  #define    TCP_INFO        32      /* retrieve tcp_info > structure */ > > > +#define    TCP_LOG         34      /* configure event > logging for connection */ > > > +#define    TCP_LOGBUF      35      /* retrieve event log > for connection */ > > > +#define    TCP_LOGID       36      /* configure log ID to > correlate connections */ > > > +#define    TCP_LOGDUMP     37      /* dump connection log > events to device */ > > > +#define    TCP_LOGDUMPID   38      /* dump events from > connections with same ID to > > > +                              device */ > > >  #define    TCP_CONGESTION  64      /* get/set congestion > control algorithm */ > > >  #define    TCP_CCALGOOPT   65      /* get/set cc algorithm > specific options */ > > >  #define    TCP_KEEPINIT    128     /* N, time to establish > connection */ > > > @@ -188,6 +194,9 @@ struct tcphdr { > > >  #define    TCPI_OPT_WSCALE         0x04 > > >  #define    TCPI_OPT_ECN            0x08 > > >  #define    TCPI_OPT_TOE            0x10 > > > + > > > +/* Maximum length of log ID. 
*/ > > > +#define TCP_LOG_ID_LEN     64 > > > > > >  /* > > >   * The TCP_INFO socket option comes from the Linux 2.6 TCP API, > and permits > > > > > > Modified: head/sys/netinet/tcp_input.c > > > > ============================================================================== > > > --- head/sys/netinet/tcp_input.c    Thu Mar 22 08:32:39 2018    >     (r331346) > > > +++ head/sys/netinet/tcp_input.c    Thu Mar 22 09:40:08 2018    >     (r331347) > > > @@ -102,6 +102,7 @@ __FBSDID("$FreeBSD$"); > > >  #include > > >  #include > > >  #include > > > +#include > > >  #include > > >  #include > > >  #include > > > @@ -1592,6 +1593,8 @@ tcp_do_segment(struct mbuf *m, struct tcphdr > *th, stru > > >     /* Save segment, if requested. */ > > >     tcp_pcap_add(th, m, &(tp->t_inpkts)); > > >  #endif > > > +   TCP_LOG_EVENT(tp, th, &so->so_rcv, &so->so_snd, TCP_LOG_IN, 0, > > > +       tlen, NULL, true); > > > > > >     if ((thflags & TH_SYN) && (thflags & TH_FIN) && > V_drop_synfin) { > > >             if ((s = tcp_log_addrs(inc, th, NULL, NULL))) { > > > > > > Added: head/sys/netinet/tcp_log_buf.c > > > > ============================================================================== > > > --- /dev/null       00:00:00 1970   (empty, because file is > newly added) > > > +++ head/sys/netinet/tcp_log_buf.c  Thu Mar 22 09:40:08 2018    >     (r331347) > > > @@ -0,0 +1,2480 @@ > > > +/*- > > > + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD > > > + * > > > + * Copyright (c) 2016-2018 > > > + * Netflix Inc.  All rights reserved. > > > + * > > > + * Redistribution and use in source and binary forms, with or > without > > > + * modification, are permitted provided that the following > conditions > > > + * are met: > > > + * 1. Redistributions of source code must retain the above > copyright > > > + *    notice, this list of conditions and the following > disclaimer. > > > + * 2. 
Redistributions in binary form must reproduce the above > copyright > > > + *    notice, this list of conditions and the following > disclaimer in the > > > + *    documentation and/or other materials provided with the > distribution. > > > + * > > > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS > IS'' AND > > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED > TO, THE > > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A > PARTICULAR PURPOSE > > > + * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS > BE LIABLE > > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > CONSEQUENTIAL > > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF > SUBSTITUTE GOODS > > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS > INTERRUPTION) > > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN > CONTRACT, STRICT > > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING > IN ANY WAY > > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE > POSSIBILITY OF > > > + * SUCH DAMAGE. 
> > > + * > > > + */ > > > + > > > +#include > > > +__FBSDID("$FreeBSD$"); > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +#include > > > + > > > +#include > > > +#include > > > +#include > > > + > > > +#include > > > +#include > > > +#include > > > +#include > > > +#include > > > + > > > +/* Default expiry time */ > > > +#define    TCP_LOG_EXPIRE_TIME     ((sbintime_t)60 * SBT_1S) > > > + > > > +/* Max interval at which to run the expiry timer */ > > > +#define    TCP_LOG_EXPIRE_INTVL    ((sbintime_t)5 * SBT_1S) > > > + > > > +bool       tcp_log_verbose; > > > +static uma_zone_t tcp_log_bucket_zone, tcp_log_node_zone, > tcp_log_zone; > > > +static int tcp_log_session_limit = > TCP_LOG_BUF_DEFAULT_SESSION_LIMIT; > > > +static uint32_t    tcp_log_version = TCP_LOG_BUF_VER; > > > +RB_HEAD(tcp_log_id_tree, tcp_log_id_bucket); > > > +static struct tcp_log_id_tree tcp_log_id_head; > > > +static STAILQ_HEAD(, tcp_log_id_node) tcp_log_expireq_head = > > > +    STAILQ_HEAD_INITIALIZER(tcp_log_expireq_head); > > > +static struct mtx tcp_log_expireq_mtx; > > > +static struct callout tcp_log_expireq_callout; > > > +static uint64_t tcp_log_auto_ratio = 0; > > > +static uint64_t tcp_log_auto_ratio_cur = 0; > > > +static uint32_t tcp_log_auto_mode = TCP_LOG_STATE_TAIL; > > > +static bool tcp_log_auto_all = false; > > > + > > > +RB_PROTOTYPE_STATIC(tcp_log_id_tree, tcp_log_id_bucket, tlb_rb, > tcp_log_id_cmp) > > > + > > > +SYSCTL_NODE(_net_inet_tcp, OID_AUTO, bb, CTLFLAG_RW, 0, "TCP Black > Box controls"); > > > + > > > +SYSCTL_BOOL(_net_inet_tcp_bb, OID_AUTO, log_verbose, CTLFLAG_RW, > &tcp_log_verbose, > > > +    0, "Force verbose logging for TCP traces"); > > > + > > > +SYSCTL_INT(_net_inet_tcp_bb, OID_AUTO, log_session_limit, > > > +    CTLFLAG_RW, &tcp_log_session_limit, 0, > > > +    "Maximum 
number of events maintained for each TCP session"); > > > + > > > +SYSCTL_UMA_MAX(_net_inet_tcp_bb, OID_AUTO, log_global_limit, > CTLFLAG_RW, > > > +    &tcp_log_zone, "Maximum number of events maintained for all > TCP sessions"); > > > + > > > +SYSCTL_UMA_CUR(_net_inet_tcp_bb, OID_AUTO, log_global_entries, > CTLFLAG_RD, > > > +    &tcp_log_zone, "Current number of events maintained for all > TCP sessions"); > > > + > > > +SYSCTL_UMA_MAX(_net_inet_tcp_bb, OID_AUTO, log_id_limit, > CTLFLAG_RW, > > > +    &tcp_log_bucket_zone, "Maximum number of log IDs"); > > > + > > > +SYSCTL_UMA_CUR(_net_inet_tcp_bb, OID_AUTO, log_id_entries, > CTLFLAG_RD, > > > +    &tcp_log_bucket_zone, "Current number of log IDs"); > > > + > > > +SYSCTL_UMA_MAX(_net_inet_tcp_bb, OID_AUTO, log_id_tcpcb_limit, > CTLFLAG_RW, > > > +    &tcp_log_node_zone, "Maximum number of tcpcbs with log IDs"); > > > + > > > +SYSCTL_UMA_CUR(_net_inet_tcp_bb, OID_AUTO, log_id_tcpcb_entries, > CTLFLAG_RD, > > > +    &tcp_log_node_zone, "Current number of tcpcbs with log IDs"); > > > + > > > +SYSCTL_U32(_net_inet_tcp_bb, OID_AUTO, log_version, CTLFLAG_RD, > &tcp_log_version, > > > +    0, "Version of log formats exported"); > > > + > > > +SYSCTL_U64(_net_inet_tcp_bb, OID_AUTO, log_auto_ratio, CTLFLAG_RW, > > > +    &tcp_log_auto_ratio, 0, "Do auto capturing for 1 out of N > sessions"); > > > + > > > +SYSCTL_U32(_net_inet_tcp_bb, OID_AUTO, log_auto_mode, CTLFLAG_RW, > > > +    &tcp_log_auto_mode, TCP_LOG_STATE_HEAD_AUTO, > > > +    "Logging mode for auto-selected sessions (default is > TCP_LOG_STATE_HEAD_AUTO)"); > > > + > > > +SYSCTL_BOOL(_net_inet_tcp_bb, OID_AUTO, log_auto_all, CTLFLAG_RW, > > > +    &tcp_log_auto_all, false, > > > +    "Auto-select from all sessions (rather than just those with > IDs)"); > > > + > > > +#ifdef TCPLOG_DEBUG_COUNTERS > > > +counter_u64_t tcp_log_queued; > > > +counter_u64_t tcp_log_que_fail1; > > > +counter_u64_t tcp_log_que_fail2; > > > +counter_u64_t tcp_log_que_fail3; > > > 
+counter_u64_t tcp_log_que_fail4; > > > +counter_u64_t tcp_log_que_fail5; > > > +counter_u64_t tcp_log_que_copyout; > > > +counter_u64_t tcp_log_que_read; > > > +counter_u64_t tcp_log_que_freed; > > > + > > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, queued, CTLFLAG_RD, > > > +    &tcp_log_queued, "Number of entries queued"); > > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail1, CTLFLAG_RD, > > > +    &tcp_log_que_fail1, "Number of entries queued but fail 1"); > > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail2, CTLFLAG_RD, > > > +    &tcp_log_que_fail2, "Number of entries queued but fail 2"); > > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail3, CTLFLAG_RD, > > > +    &tcp_log_que_fail3, "Number of entries queued but fail 3"); > > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail4, CTLFLAG_RD, > > > +    &tcp_log_que_fail4, "Number of entries queued but fail 4"); > > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, fail5, CTLFLAG_RD, > > > +    &tcp_log_que_fail5, "Number of entries queued but fail 4"); > > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, copyout, CTLFLAG_RD, > > > +    &tcp_log_que_copyout, "Number of entries copied out"); > > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, read, CTLFLAG_RD, > > > +    &tcp_log_que_read, "Number of entries read from the queue"); > > > +SYSCTL_COUNTER_U64(_net_inet_tcp_bb, OID_AUTO, freed, CTLFLAG_RD, > > > +    &tcp_log_que_freed, "Number of entries freed after reading"); > > > +#endif > > > + > > > +#ifdef INVARIANTS > > > +#define    TCPLOG_DEBUG_RINGBUF > > > +#endif > > > + > > > +struct tcp_log_mem > > > +{ > > > +   STAILQ_ENTRY(tcp_log_mem) tlm_queue; > > > +   struct tcp_log_buffer   tlm_buf; > > > +   struct tcp_log_verbose  tlm_v; > > > +#ifdef TCPLOG_DEBUG_RINGBUF > > > +   volatile int            tlm_refcnt; > > > +#endif > > > +}; > > > + > > > +/* 60 bytes for the header, + 16 bytes for padding */ > > > +static uint8_t     zerobuf[76]; > > > + > > > +/* > > > + * 
Lock order: > > > + * 1. TCPID_TREE > > > + * 2. TCPID_BUCKET > > > + * 3. INP > > > + * > > > + * Rules: > > > + * A. You need a lock on the Tree to add/remove buckets. > > > + * B. You need a lock on the bucket to add/remove nodes from the > bucket. > > > + * C. To change information in a node, you need the INP lock if the > tln_closed > > > + *    field is false. Otherwise, you need the bucket lock. (Note > that the > > > + *    tln_closed field can change at any point, so you need to > recheck the > > > + *    entry after acquiring the INP lock.) > > > + * D. To remove a node from the bucket, you must have that entry > locked, > > > + *    according to the criteria of Rule C. Also, the node must > not be on > > > + *    the expiry queue. > > > + * E. The exception to C is the expiry queue fields, which are > locked by > > > + *    the TCPLOG_EXPIREQ lock. > > > + * > > > + * Buckets have a reference count. Each node is a reference. > Further, > > > + * other callers may add reference counts to keep a bucket from > disappearing. > > > + * You can add a reference as long as you own a lock sufficient to > keep the > > > + * bucket from disappearing. For example, a common use is: > > > + *   a. Have a locked INP, but need to lock the TCPID_BUCKET. > > > + *   b. Add a refcount on the bucket. (Safe because the INP lock > prevents > > > + *      the TCPID_BUCKET from going away.) > > > + *   c. Drop the INP lock. > > > + *   d. Acquire a lock on the TCPID_BUCKET. > > > + *   e. Acquire a lock on the INP. > > > + *   f. Drop the refcount on the bucket. > > > + *      (At this point, the bucket may disappear.) > > > + * > > > + * Expire queue lock: > > > + * You can acquire this with either the bucket or INP lock. Don't > reverse it. > > > + * When the expire code has committed to freeing a node, it resets > the expiry > > > + * time to SBT_MAX. That is the signal to everyone else that they > should > > > + * leave that node alone. 
> > > + */ > > > +static struct rwlock tcp_id_tree_lock; > > > +#define    TCPID_TREE_WLOCK()              > rw_wlock(&tcp_id_tree_lock) > > > +#define    TCPID_TREE_RLOCK()              > rw_rlock(&tcp_id_tree_lock) > > > +#define    TCPID_TREE_UPGRADE()            > rw_try_upgrade(&tcp_id_tree_lock) > > > +#define    TCPID_TREE_WUNLOCK()            > rw_wunlock(&tcp_id_tree_lock) > > > +#define    TCPID_TREE_RUNLOCK()            > rw_runlock(&tcp_id_tree_lock) > > > +#define    TCPID_TREE_WLOCK_ASSERT()      >  rw_assert(&tcp_id_tree_lock, RA_WLOCKED) > > > +#define    TCPID_TREE_RLOCK_ASSERT()      >  rw_assert(&tcp_id_tree_lock, RA_RLOCKED) > > > +#define    TCPID_TREE_UNLOCK_ASSERT()      > rw_assert(&tcp_id_tree_lock, RA_UNLOCKED) > > > + > > > +#define    TCPID_BUCKET_LOCK_INIT(tlb)    >  mtx_init(&((tlb)->tlb_mtx), "tcp log id bucket", NULL, MTX_DEF) > > > +#define    TCPID_BUCKET_LOCK_DESTROY(tlb)  > mtx_destroy(&((tlb)->tlb_mtx)) > > > +#define    TCPID_BUCKET_LOCK(tlb)          > mtx_lock(&((tlb)->tlb_mtx)) > > > +#define    TCPID_BUCKET_UNLOCK(tlb)        > mtx_unlock(&((tlb)->tlb_mtx)) > > > +#define    TCPID_BUCKET_LOCK_ASSERT(tlb)  >  mtx_assert(&((tlb)->tlb_mtx), MA_OWNED) > > > +#define    TCPID_BUCKET_UNLOCK_ASSERT(tlb) > mtx_assert(&((tlb)->tlb_mtx), MA_NOTOWNED) > > > + > > > +#define    TCPID_BUCKET_REF(tlb)          >  refcount_acquire(&((tlb)->tlb_refcnt)) > > > +#define    TCPID_BUCKET_UNREF(tlb)        >  refcount_release(&((tlb)->tlb_refcnt)) > > > + > > > +#define    TCPLOG_EXPIREQ_LOCK()          >  mtx_lock(&tcp_log_expireq_mtx) > > > +#define    TCPLOG_EXPIREQ_UNLOCK()        >  mtx_unlock(&tcp_log_expireq_mtx) > > > + > > > +SLIST_HEAD(tcp_log_id_head, tcp_log_id_node); > > > + > > > +struct tcp_log_id_bucket > > > +{ > > > +   /* > > > +    * tlb_id must be first. This lets us use strcmp on > > > +    * (struct tcp_log_id_bucket *) and (char *) interchangeably. 
> > > +    */ > > > +   char                            > tlb_id[TCP_LOG_ID_LEN]; > > > +   RB_ENTRY(tcp_log_id_bucket)     tlb_rb; > > > +   struct tcp_log_id_head          tlb_head; > > > +   struct mtx                      tlb_mtx; > > > +   volatile u_int                  tlb_refcnt; > > > +}; > > > + > > > +struct tcp_log_id_node > > > +{ > > > +   SLIST_ENTRY(tcp_log_id_node) tln_list; > > > +   STAILQ_ENTRY(tcp_log_id_node) tln_expireq; /* Locked by the > expireq lock */ > > > +   sbintime_t              tln_expiretime; /* Locked by > the expireq lock */ > > > + > > > +   /* > > > +    * If INP is NULL, that means the connection has closed. We've > > > +    * saved the connection endpoint information and the log > entries > > > +    * in the tln_ie and tln_entries members. We've also saved a > pointer > > > +    * to the enclosing bucket here. If INP is not NULL, the > information is > > > +    * in the PCB and not here. > > > +    */ > > > +   struct inpcb            *tln_inp; > > > +   struct tcpcb            *tln_tp; > > > +   struct tcp_log_id_bucket *tln_bucket; > > > +   struct in_endpoints     tln_ie; > > > +   struct tcp_log_stailq   tln_entries; > > > +   int                     tln_count; > > > +   volatile int            tln_closed; > > > +   uint8_t                 tln_af; > > > +}; > > > + > > > +enum tree_lock_state { > > > +   TREE_UNLOCKED = 0, > > > +   TREE_RLOCKED, > > > +   TREE_WLOCKED, > > > +}; > > > + > > > +/* Do we want to select this session for auto-logging? */ > > > +static __inline bool > > > +tcp_log_selectauto(void) > > > +{ > > > + > > > +   /* > > > > > > *** DIFF OUTPUT TRUNCATED AT 1000 LINES *** > > > > > > > References > > Visible links > 1. mailto:ruslan.bukin@cl.cam.ac.uk > 2. https://svnweb.freebsd.org/changeset/base/331347 > 3. https://reviews.freebsd.org/D11085
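
For readers of the quoted tcp_log_dev.h: the header comment says every block streamed from the log device starts with struct tcp_log_common_header, and that tlch_length is the total length including the header. Below is a minimal userspace sketch of how a consumer might walk such a stream. It is not taken from the commit. The helper name count_log_blocks is hypothetical, the struct is re-declared locally with a GCC/Clang packed attribute in place of the kernel's __packed, and the buffer is assumed to be in host byte order (that is, read on the same machine that produced it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct tcp_log_common_header from the quoted header. */
struct tcp_log_common_header {
	uint32_t	tlch_version;	/* Version is specific to type. */
	uint32_t	tlch_type;	/* Type of entry(ies) that follow. */
	uint64_t	tlch_length;	/* Total length, including header. */
} __attribute__((packed));

#define	TCP_LOG_DEV_TYPE_BBR	1	/* black box recorder */

/*
 * Hypothetical helper (not part of r331347): walk a buffer of
 * back-to-back blocks and count them.  Stores the number of
 * TCP_LOG_DEV_TYPE_BBR blocks in *bbr_blocks.  Returns the total
 * number of blocks, or -1 if a block is truncated or its length
 * field is smaller than the header itself.
 */
static int
count_log_blocks(const uint8_t *buf, size_t len, int *bbr_blocks)
{
	struct tcp_log_common_header hdr;
	size_t off = 0;
	int total = 0;

	assert(buf != NULL && bbr_blocks != NULL);
	*bbr_blocks = 0;

	while (off < len) {
		/* A complete header must fit in what remains. */
		if (len - off < sizeof(hdr))
			return (-1);

		/* Copy out; the stream gives no alignment guarantee. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* tlch_length covers the header plus the payload. */
		if (hdr.tlch_length < sizeof(hdr) ||
		    hdr.tlch_length > len - off)
			return (-1);

		if (hdr.tlch_type == TCP_LOG_DEV_TYPE_BBR)
			(*bbr_blocks)++;
		total++;
		off += (size_t)hdr.tlch_length;
	}

	return (total);
}
```

A caller would read() from the device into a buffer and pass the buffer and byte count to the helper. Rejecting any tlch_length smaller than the header is the check that keeps the loop from spinning on a zero-length block.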