From owner-freebsd-arch@freebsd.org  Sun Aug 30 05:25:06 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id F36729C4C85
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Sun, 30 Aug 2015 05:25:05 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au
 [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id A3756286
 for <freebsd-arch@freebsd.org>; Sun, 30 Aug 2015 05:25:05 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au
 (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197])
 by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id C9B21D66097;
 Sun, 30 Aug 2015 15:02:11 +1000 (AEST)
Date: Sun, 30 Aug 2015 15:02:10 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Justin Hibbits <jrh29@alumni.cwru.edu>
cc: Adrian Chadd <adrian.chadd@gmail.com>, 
 Marcel Moolenaar <marcel@xcllnt.net>, 
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: Devices with 36-bit paddr on 32-bit system
In-Reply-To: <CAHSQbTAGD=4A20XZL09YXbEm9hdf5K2_QCRPFOAjrXHF4eg9sQ@mail.gmail.com>
Message-ID: <20150830130612.L890@besplex.bde.org>
References: <CAHSQbTDsvB32+LyzHJO78VwUwAfUTMOUQp13BMCUpapSMT0fbg@mail.gmail.com>
 <ED4B5B25-D7A7-440C-9452-4C79B0800D2E@xcllnt.net>
 <1568331.OrSoeYfXsf@ralph.baldwin.cx>
 <CAJ-VmomduZBYT6=e7HUm2V1m0taM0MAMXxMojYV8wvgEKyUEyA@mail.gmail.com>
 <CAHSQbTAGD=4A20XZL09YXbEm9hdf5K2_QCRPFOAjrXHF4eg9sQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=L4TgOLn8 c=1 sm=1 tr=0
 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8
 a=kj9zAlcOel0A:10 a=JSiOX3MnzAab593RgikA:9 a=RBkRyt4X6hdqDwO5:21
 a=_7EL_taM2OgDYD4P:21 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 30 Aug 2015 05:25:06 -0000

On Sat, 29 Aug 2015, Justin Hibbits wrote:

> Hey, I already had that thought, too, you just encouraged me to take
> it :)  Anyway, here's an initial patch.  I've *only* tested it on one
> PowerPC board (the one needing this change), none of my other devices,
> and no other architectures.  Thoughts?

It has lots of style bugs and type errors.

% Index: sys/dev/ata/ata-pci.c
% ===================================================================
% --- sys/dev/ata/ata-pci.c	(revision 287189)
% +++ sys/dev/ata/ata-pci.c	(working copy)
% @@ -217,7 +217,7 @@
% 
%  struct resource *
%  ata_pci_alloc_resource(device_t dev, device_t child, int type, int *rid,
% -		       u_long start, u_long end, u_long count, u_int flags)
% +		       bus_addr_t start, bus_addr_t end, u_long count, u_int flags)

The first style bug is in the first changed line (line made longer than 80
columns by blind substitution).

% Index: sys/dev/ata/ata-pci.h
% ===================================================================
% --- sys/dev/ata/ata-pci.h	(revision 287189)
% +++ sys/dev/ata/ata-pci.h	(working copy)
% @@ -535,7 +535,7 @@
%  int ata_pci_print_child(device_t dev, device_t child);
%  int ata_pci_child_location_str(device_t dev, device_t child, char *buf,
%      size_t buflen);
% -struct resource * ata_pci_alloc_resource(device_t dev, device_t child, int type, int *rid, u_long start, u_long end, u_long count, u_int flags);
% +struct resource * ata_pci_alloc_resource(device_t dev, device_t child, int type, int *rid, bus_addr_t start, bus_addr_t end, u_long count, u_int flags);
%  int ata_pci_release_resource(device_t dev, device_t child, int type, int rid, struct resource *r);
%  int ata_pci_setup_intr(device_t dev, device_t child, struct resource *irq, int flags, driver_filter_t *filter, driver_intr_t *function, void *argument, void **cookiep);
%   int ata_pci_teardown_intr(device_t dev, device_t child, struct resource *irq, void *cookie);

These were already misformatted.

No comment on further style bugs from blind substitution.

% Index: sys/dev/ata/chipsets/ata-promise.c
% ===================================================================
% --- sys/dev/ata/chipsets/ata-promise.c	(revision 287189)
% +++ sys/dev/ata/chipsets/ata-promise.c	(working copy)
% @@ -191,18 +191,19 @@
%  	!BUS_READ_IVAR(device_get_parent(GRANDPARENT(dev)),
%  		       GRANDPARENT(dev), PCI_IVAR_DEVID, &devid) &&
%  	((devid == ATA_DEC_21150) || (devid == ATA_DEC_21150_1))) {
% -	static long start = 0, end = 0;
% +	static bus_addr_t start = 0;
% +	static u_long count = 0;

Renaming the variable is an unrelated fix.  The closure of this fix is
quite large:
- bus_get_resource() is undocumented except for a bogus reference to
   bus_get_resource(9) in bus_set_resource(9) , so the meaning of its
   last arg takes more work than necessary to check.  The source file
   bus_get_resource.9 doesn't exist, and the installed file
   bus_get_resource.9.gz cannot be a link to bus_set_resource.9.gz
   since the top-level bus man pages are partially correctly organized
   with only 1 function per file.
- the meaning of bus_get_resource()'s args are hard to see from its
   prototype since its prototype has no paramater names.  <sys/bus.h> uses
   a nonense mixture of no parameter names, parameter names without
   underscores, and parameter names with underscores.

% 
%  	if (pci_get_slot(dev) == 1) {
% -	    bus_get_resource(dev, SYS_RES_IRQ, 0, &start, &end);
% +	    bus_get_resource(dev, SYS_RES_IRQ, 0, &start, &count);
%  	    strcat(buffer, " (channel 0+1)");
%  	}
% -	else if (pci_get_slot(dev) == 2 && start && end) {
% -	    bus_set_resource(dev, SYS_RES_IRQ, 0, start, end);
% +	else if (pci_get_slot(dev) == 2 && start && count) {
% +	    bus_set_resource(dev, SYS_RES_IRQ, 0, start, count);
%  	    strcat(buffer, " (channel 2+3)");
%  	}
%  	else {
% -	    start = end = 0;
% +	    start = count = 0;

This assumes that the types are similar enough for the conversion to
work and not give a warning.  If the types are integral, then the
conversion always works and compilers shouldn't warn if it is a
downcast (since 0 is representable in both types).  But it isn't
clear that the types are integral.  pc98 uses handles, but bus_addr_t
is still an integer and the handles are used to map it to the actual
address.  Perhaps large addresses should be handled similarly.

%  	}
%      }
%      sprintf(buffer, "%s %s controller", buffer, ata_mode2str(idx->max_dma));
% Index: sys/dev/fdt/simplebus.c
% ===================================================================
% --- sys/dev/fdt/simplebus.c	(revision 287189)
% +++ sys/dev/fdt/simplebus.c	(working copy)
% ...
% @@ -365,7 +365,8 @@
%  		if (j == sc->nranges && sc->nranges != 0) {
%  			if (bootverbose)
%  				device_printf(bus, "Could not map resource "
% -				    "%#lx-%#lx\n", start, end);
% +				    "%#llx-%#llx\n", (uint64_t)start,
% +				    (uint64_t)end);
% 
%  			return (NULL);
%  		}

This uses the long long abomination, and isn't even correct since type
type isn't even (unsigned) long long.  The type starts as u_long, which
was not unsigned long long on any arch.  It is cast to uint64_t.  This
breaks it if it is larger than uint64_t.  This makes it compatible with
unsigned long long on 32-bit arches, but has no effect on 32-bit arches
-- there the type remains as u_long and the above fails to compile.

No comment on further uses of the abomination.

% Index: sys/dev/gpio/gpiobus.c
% ===================================================================
% --- sys/dev/gpio/gpiobus.c	(revision 287189)
% +++ sys/dev/gpio/gpiobus.c	(working copy)
% @@ -178,7 +178,7 @@
%  	sc->sc_intr_rman.rm_type = RMAN_ARRAY;
%  	sc->sc_intr_rman.rm_descr = "GPIO Interrupts";
%  	if (rman_init(&sc->sc_intr_rman) != 0 ||
% -	    rman_manage_region(&sc->sc_intr_rman, 0, ~0) != 0)
% +	    rman_manage_region(&sc->sc_intr_rman, 0, ~(bus_addr_t)0) != 0)

I think ~0 gave the correct value except in the 1's complement case.
Stranglely, ~0ul is more fragile than ~0.

Elsewhere you spell this better as RM_MAX_END.

%  		panic("%s: failed to set up rman.", __func__);
% 
%  	if (GPIO_PIN_MAX(sc->sc_dev, &sc->sc_npins) != 0)
% ...
% @@ -516,7 +516,7 @@
% 
%  	if (type != SYS_RES_IRQ)
%  		return (NULL);
% -	isdefault = (start == 0UL && end == ~0UL && count == 1);
% +	isdefault = (start == 0UL && end == ~(bus_addr_t)0UL && count == 1);

Not using RM_MIN_START and RM_MAX_END gives even worse spelling than above.
Both of the UL's are unnecessary, and the second one is bogus since the
cast gives the correct type and it is not always UL.

%  	rle = NULL;
%  	if (isdefault) {
%  		rl = BUS_GET_RESOURCE_LIST(bus, child);
% Index: sys/dev/ofw/ofwbus.c
% ===================================================================
% --- sys/dev/ofw/ofwbus.c	(revision 287189)
% +++ sys/dev/ofw/ofwbus.c	(working copy)
% @@ -156,7 +156,7 @@
%  	sc->sc_mem_rman.rm_descr = "Device Memory";
%  	if (rman_init(&sc->sc_intr_rman) != 0 ||
%  	    rman_init(&sc->sc_mem_rman) != 0 ||
% -	    rman_manage_region(&sc->sc_intr_rman, 0, ~0) != 0 ||
% +	    rman_manage_region(&sc->sc_intr_rman, 0, BUS_SPACE_MAXADDR) != 0 ||

Looks like a logic error.  This is an rman function, so it should use an
rman limit (now spelled RMAN_MAX_END), not a bus space limit.

%  	    rman_manage_region(&sc->sc_mem_rman, 0, BUS_SPACE_MAXADDR) != 0)

The old code had a mixture of ~0 and BUS_SPACE_MAXADDR.  This almost made
sense.  The first thing being managed is an interrupt id.  Interrupts ids
are normally small nonnegative integers.  But they are still represented
as u_longs or bus addresses.  ~0 works for signed integers of any size
and also for unsigned integers of any size except in the 1's complement
case, and seems to have been used to reflect the smaller range required.

%  		panic("%s: failed to set up rmans.", __func__);
% 
% ...
% @@ -201,7 +201,7 @@
%  		}
%  		start = rle->start;
%  		count = ulmax(count, rle->count);
% -		end = ulmax(rle->end, start + count - 1);
% +		end = qmax(rle->end, start + count - 1);

This doesn't use the long long abomination, but assumes that quads are
large enough.  This is very fragile.  No one knows what quads are, but
they are currently int64_t.  uquads are uint64_t, but uqmax() doesn't
exist.  So the above only works for 63-bit addresses.  It seems to be
very broken already for 64-bit bus_addr_t, since that gives the 64-bit
address RMAN_MAX_END.

No comment on further uses of quads.

%  	}
% 
%  	switch (type) {
% ...
% Index: sys/dev/pci/pci_pci.c
% ===================================================================
% --- sys/dev/pci/pci_pci.c	(revision 287189)
% +++ sys/dev/pci/pci_pci.c	(working copy)
% @@ -387,7 +388,7 @@
%  	char buf[64];
%  	int error, rid;
% 
% -	if (max_address != (u_long)max_address)
% +	if (max_address != (bus_addr_t)max_address)
%  		max_address = ~0ul;

pci uses pci_addr_t for addresses including max_address.  pci_addr_t
happens to be uint64_t, so it works better with a larger rman type/

Assume that the rman type is not larger.  Then the above checks that
the rman type is large enough to hold the current max_address, and if
not it reduces to the _old_ rman limit.  It should reduce to the current
rman limit (RMAN_END_MAX).

If the rman type is larger, then this code has no effect.  Assigning
RMAN_END_MAX to max_address would overflow, but is not reached since
the rman type is large enough to hold any max_address.

%  	w->rman.rm_start = 0;
%  	w->rman.rm_end = max_address;
% ...
% Index: sys/kern/subr_rman.c
% ===================================================================
% --- sys/kern/subr_rman.c	(revision 287189)
% +++ sys/kern/subr_rman.c	(working copy)
% @@ -90,8 +90,8 @@
%  	TAILQ_ENTRY(resource_i)	r_link;
%  	LIST_ENTRY(resource_i)	r_sharelink;
%  	LIST_HEAD(, resource_i)	*r_sharehead;
% -	u_long	r_start;	/* index of the first entry in this resource */
% -	u_long	r_end;		/* index of the last entry (inclusive) */
% +	bus_addr_t	r_start;	/* index of the first entry in this resource */
% +	bus_addr_t	r_end;		/* index of the last entry (inclusive) */

Blind substitution also breaks indentation of comments.

%  	u_int	r_flags;
%  	void	*r_virtual;	/* virtual address of this resource */
%  	struct device *r_dev;	/* device which has allocated this resource */
% @@ -135,7 +135,7 @@
%  	}
% 
%  	if (rm->rm_start == 0 && rm->rm_end == 0)

RMAN_MIN_START is still spelled as 0 in many places.  Since it is not very
magic, this spelling is better.  Let prototypes comvert plain 0 to the
actual type.

% ...
% @@ -174,7 +174,7 @@
% 
%  	/* Skip entries before us. */
%  	TAILQ_FOREACH(s, &rm->rm_list, r_link) {
% -		if (s->r_end == ULONG_MAX)
% +		if (s->r_end == BUS_SPACE_MAXADDR)

Why not RMAN_MAX_END?

I think some cases of RMAN_MAX_END are actually magic meaning "not really
the end, but a magic (default?) value".  A different RMAN_FOO could be
used for this.

% Index: sys/powerpc/powerpc/nexus.c
% ===================================================================
% --- sys/powerpc/powerpc/nexus.c	(revision 287189)
% +++ sys/powerpc/powerpc/nexus.c	(working copy)
% @@ -189,13 +189,13 @@
%  {
% 
%  	if (type == SYS_RES_MEMORY) {
% -		vm_offset_t start;
% +		vm_paddr_t start;

I think more places uses this instead of bus_addr_t or uint64_t.

%  		void *p;
% 
% -		start = (vm_offset_t) rman_get_start(r);
% +		start = rman_get_start(r);

The conversion is still logically non-null.

% ...
% Index: sys/sys/bus.h
% ===================================================================
% --- sys/sys/bus.h	(revision 287189)
% +++ sys/sys/bus.h	(working copy)
% @@ -29,6 +29,7 @@
%  #ifndef _SYS_BUS_H_
%  #define _SYS_BUS_H_
% 
% +#include <machine/_bus.h>
%  #include <machine/_limits.h>
%  #include <sys/_bus_dma.h>
%  #include <sys/ioccom.h>
% @@ -292,8 +293,8 @@
%  	int	rid;			/**< @brief resource identifier */
%  	int	flags;			/**< @brief resource flags */
%  	struct	resource *res;		/**< @brief the real resource when allocated */
% -	u_long	start;			/**< @brief start of resource range */
% -	u_long	end;			/**< @brief end of resource range */
% +	bus_addr_t	start;		/**< @brief start of resource range */
% +	bus_addr_t	end;		/**< @brief end of resource range */

I think rman functions should use an rman type and not hard-code bus_addr_t.
Related bus functions should then use this type.  Style bugs from blind
substitution can be reduced by using a less verbose name.

%  	u_long	count;			/**< @brief count within range */
%  };
%  STAILQ_HEAD(resource_list, resource_list_entry);
% ...
% @@ -474,7 +475,8 @@
%  static __inline struct resource *
%  bus_alloc_resource_any(device_t dev, int type, int *rid, u_int flags)
%  {
% -	return (bus_alloc_resource(dev, type, rid, 0ul, ~0ul, 1, flags));
% +	return (bus_alloc_resource(dev, type, rid, (bus_addr_t)0ul,
% +	    ~(bus_addr_t)0ul, 1, flags));

"ul" is now bogus, as above for UL except with a worse spelling.
(Apparently, RLIM_FOO is not available here, so you have to expand it
inline.)

The first cast to bus_addr_t also has no affect.

%  }
% 
%  /*
% Index: sys/sys/rman.h
% ===================================================================
% --- sys/sys/rman.h	(revision 287189)
% +++ sys/sys/rman.h	(working copy)
% @@ -54,6 +54,9 @@
%  #define	RF_ALIGNMENT_LOG2(x)	((x) << RF_ALIGNMENT_SHIFT)
%  #define	RF_ALIGNMENT(x)		(((x) & RF_ALIGNMENT_MASK) >> RF_ALIGNMENT_SHIFT)

Example of normal style (except for the long line).

% 
% +#define RM_MIN_START	((bus_addr_t)0)
% +#define RM_MAX_END	(~(bus_addr_t)0)

Style bugs (space instead of table after define).

There are no bogus ULs here.  The first cast to bus_addr_t is probably
unecessary here too.

% @@ -108,8 +111,8 @@
%  	struct	resource_head 	rm_list;
%  	struct	mtx *rm_mtx;	/* mutex used to protect rm_list */
%  	TAILQ_ENTRY(rman)	rm_link; /* link in list of all rmans */
% -	u_long	rm_start;	/* index of globally first entry */
% -	u_long	rm_end;		/* index of globally last entry */
% +	bus_addr_t	rm_start;	/* index of globally first entry */
% +	bus_addr_t	rm_end;	/* index of globally last entry */
%  	enum	rman_type rm_type; /* what type of resource this is */
%  	const	char *rm_descr;	/* text descripion of this resource */
%  };

This has mounds of new and old style bugs:

new:
unformatting by blind substitution

old:
- still uses old style of a tab instead of a space after struct, enum,
   and even in the middle of "const char"
- avoids misindented comment for rm_list by not having one
- has minimally misindented comment for rm_link after first using
   excessive indentation for the member name (then a space instead of
   a tab before the comment)
- has minimally misindented comment for rm_type (after first using
   excessive indentation after enum, use only a space for the member
   name and the comment).

Bruce

From owner-freebsd-arch@freebsd.org  Sun Aug 30 05:49:34 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2FD7F9C5970
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Sun, 30 Aug 2015 05:49:34 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au
 [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id E6AF0C04
 for <freebsd-arch@freebsd.org>; Sun, 30 Aug 2015 05:49:33 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au
 (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197])
 by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id C0DA1D6551D;
 Sun, 30 Aug 2015 15:49:30 +1000 (AEST)
Date: Sun, 30 Aug 2015 15:49:29 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: Bruce Evans <brde@optusnet.com.au>
cc: Justin Hibbits <jrh29@alumni.cwru.edu>, 
 Adrian Chadd <adrian.chadd@gmail.com>, 
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>, 
 Marcel Moolenaar <marcel@xcllnt.net>
Subject: Re: Devices with 36-bit paddr on 32-bit system
In-Reply-To: <20150830130612.L890@besplex.bde.org>
Message-ID: <20150830153859.S1263@besplex.bde.org>
References: <CAHSQbTDsvB32+LyzHJO78VwUwAfUTMOUQp13BMCUpapSMT0fbg@mail.gmail.com>
 <ED4B5B25-D7A7-440C-9452-4C79B0800D2E@xcllnt.net>
 <1568331.OrSoeYfXsf@ralph.baldwin.cx>
 <CAJ-VmomduZBYT6=e7HUm2V1m0taM0MAMXxMojYV8wvgEKyUEyA@mail.gmail.com>
 <CAHSQbTAGD=4A20XZL09YXbEm9hdf5K2_QCRPFOAjrXHF4eg9sQ@mail.gmail.com>
 <20150830130612.L890@besplex.bde.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-Optus-CM-Score: 0
X-Optus-CM-Analysis: v=2.1 cv=L4TgOLn8 c=1 sm=1 tr=0
 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8
 a=kj9zAlcOel0A:10 a=xl9oN2MitZH6uz9LxZsA:9 a=CjuIK1q_8ugA:10
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 30 Aug 2015 05:49:34 -0000

On Sun, 30 Aug 2015, Bruce Evans wrote:

> % ...
> % Index: sys/sys/bus.h
> % ===================================================================
> % --- sys/sys/bus.h	(revision 287189)
> % +++ sys/sys/bus.h	(working copy)
> % @@ -29,6 +29,7 @@
> %  #ifndef _SYS_BUS_H_
> %  #define _SYS_BUS_H_
> % % +#include <machine/_bus.h>
> %  #include <machine/_limits.h>
> %  #include <sys/_bus_dma.h>
> %  #include <sys/ioccom.h>
> % @@ -292,8 +293,8 @@
> %  	int	rid;			/**< @brief resource identifier */
> %  	int	flags;			/**< @brief resource flags */
> %  	struct	resource *res;		/**< @brief the real resource when 
> allocated */
> % -	u_long	start;			/**< @brief start of resource range 
> */
> % -	u_long	end;			/**< @brief end of resource range */
> % +	bus_addr_t	start;		/**< @brief start of resource range 
> */
> % +	bus_addr_t	end;		/**< @brief end of resource range */

Mail programs (mostly mine) corrupted the formatting more competely.

> I think rman functions should use an rman type and not hard-code bus_addr_t.
> Related bus functions should then use this type.  Style bugs from blind
> substitution can be reduced by using a less verbose name.
>
> %  	u_long	count;			/**< @brief count within range */
> %  };

Or just use uintmax_t for everything in rman.  rman was written before
C99 broke C by making u_long no longer the largest integer type.  It
used u_long because it was the largest integer type (though it actually
wasn't, since FreeBSD used nonstandard extensions in Gnu C) and it is
easiest to use a single non-typedefed type that is large enough for all
cases.  uintmax_t is C99's replacement of u_long.

I don't like the bloat from using uintmax_t for everything, but rman
should only used for initialization so uintmax_t for rman should only
give space bloat, only on 32-bit arches.

An rman typedef for this type allows re-optimizing the 32-bit arches,
but brings back the problem of typedefed types being hard to use.

Bruce

From owner-freebsd-arch@freebsd.org  Sun Aug 30 07:24:31 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 09ED99C490D
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Sun, 30 Aug 2015 07:24:31 +0000 (UTC)
 (envelope-from chmeeedalf@gmail.com)
Received: from mail-ig0-x22b.google.com (mail-ig0-x22b.google.com
 [IPv6:2607:f8b0:4001:c05::22b])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id C81F77F2
 for <freebsd-arch@freebsd.org>; Sun, 30 Aug 2015 07:24:30 +0000 (UTC)
 (envelope-from chmeeedalf@gmail.com)
Received: by igbuu8 with SMTP id uu8so30370115igb.0
 for <freebsd-arch@freebsd.org>; Sun, 30 Aug 2015 00:24:30 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:sender:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type;
 bh=tNFZ9whj+Ks/z+meJb9JOXwXUbq6LBGTj+lDxXebVo4=;
 b=GAzGHO+m8L0lxt1bFRBn/qpVqUkwvtuPg4HUHZJcQ9Mb6ATTe9a564leBpBfuGFxHy
 F2akOtIUs6GRTcY7w/ihOTFtzAYOlFXXHRwNRaawrmeb8yR3mfMkHiTfKqqwSQSBOIUC
 G+yjwxjQ8ny1vbKIoFhxogJLRJakA/Dn6vg2OEseiIjAA3h/UPwoRNFHQdGBuGrjJYoH
 v4R6YWI8gk733mb6FAd08lWByd3MOksUHiQClt2WJep92WUHc/AO03fIwtCWIM0F5zE6
 GjhZzcu80FMgN5rCyiQk8IwdYOWIPivbDRDM8UVBTTLgtp6Ho2+4DXQwr+5Qt1WqgGqN
 2oLg==
MIME-Version: 1.0
X-Received: by 10.50.62.148 with SMTP id y20mr9761242igr.17.1440919470181;
 Sun, 30 Aug 2015 00:24:30 -0700 (PDT)
Sender: chmeeedalf@gmail.com
Received: by 10.36.58.149 with HTTP; Sun, 30 Aug 2015 00:24:29 -0700 (PDT)
In-Reply-To: <20150830153859.S1263@besplex.bde.org>
References: <CAHSQbTDsvB32+LyzHJO78VwUwAfUTMOUQp13BMCUpapSMT0fbg@mail.gmail.com>
 <ED4B5B25-D7A7-440C-9452-4C79B0800D2E@xcllnt.net>
 <1568331.OrSoeYfXsf@ralph.baldwin.cx>
 <CAJ-VmomduZBYT6=e7HUm2V1m0taM0MAMXxMojYV8wvgEKyUEyA@mail.gmail.com>
 <CAHSQbTAGD=4A20XZL09YXbEm9hdf5K2_QCRPFOAjrXHF4eg9sQ@mail.gmail.com>
 <20150830130612.L890@besplex.bde.org>
 <20150830153859.S1263@besplex.bde.org>
Date: Sun, 30 Aug 2015 00:24:29 -0700
X-Google-Sender-Auth: 60wbfbO6IaSvthlPfoYGGsknACw
Message-ID: <CAHSQbTDtEagLJ+oTke=bCHLeBEryyVpx5fh1rk-oR7TUKTQkoQ@mail.gmail.com>
Subject: Re: Devices with 36-bit paddr on 32-bit system
From: Justin Hibbits <jrh29@alumni.cwru.edu>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Adrian Chadd <adrian.chadd@gmail.com>, 
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>,
 Marcel Moolenaar <marcel@xcllnt.net>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 30 Aug 2015 07:24:31 -0000

Hi Bruce,

On Sat, Aug 29, 2015 at 10:49 PM, Bruce Evans <brde@optusnet.com.au> wrote:
> On Sun, 30 Aug 2015, Bruce Evans wrote:
>
>> % ...
>> % Index: sys/sys/bus.h
>> % ===================================================================
>> % --- sys/sys/bus.h     (revision 287189)
>> % +++ sys/sys/bus.h     (working copy)
>> % @@ -29,6 +29,7 @@
>> %  #ifndef _SYS_BUS_H_
>> %  #define _SYS_BUS_H_
>> % % +#include <machine/_bus.h>
>> %  #include <machine/_limits.h>
>> %  #include <sys/_bus_dma.h>
>> %  #include <sys/ioccom.h>
>> % @@ -292,8 +293,8 @@
>> %       int     rid;                    /**< @brief resource identifier */
>> %       int     flags;                  /**< @brief resource flags */
>> %       struct  resource *res;          /**< @brief the real resource when
>> allocated */
>> % -     u_long  start;                  /**< @brief start of resource
>> range */
>> % -     u_long  end;                    /**< @brief end of resource range
>> */
>> % +     bus_addr_t      start;          /**< @brief start of resource
>> range */
>> % +     bus_addr_t      end;            /**< @brief end of resource range
>> */
>
>
> Mail programs (mostly mine) corrupted the formatting more competely.
>
>> I think rman functions should use an rman type and not hard-code
>> bus_addr_t.
>> Related bus functions should then use this type.  Style bugs from blind
>> substitution can be reduced by using a less verbose name.
>>
>> %       u_long  count;                  /**< @brief count within range */
>> %  };
>
>
> Or just use uintmax_t for everything in rman.  rman was written before
> C99 broke C by making u_long no longer the largest integer type.  It
> used u_long because it was the largest integer type (though it actually
> wasn't, since FreeBSD used nonstandard extensions in Gnu C) and it is
> easiest to use a single non-typedefed type that is large enough for all
> cases.  uintmax_t is C99's replacement of u_long.
>
> I don't like the bloat from using uintmax_t for everything, but rman
> should only used for initialization so uintmax_t for rman should only
> give space bloat, only on 32-bit arches.
>
> An rman typedef for this type allows re-optimizing the 32-bit arches,
> but brings back the problem of typedefed types being hard to use.
>
> Bruce

Thanks for the lengthy review.  I am well aware of (most of) the style
issues and intend to address them all before releasing a final patch.
My goal with this rough early release was to solicit comments on the
approach taken, and if there should be a better way to address the
overall problem.  I did consider your suggestion of uintmax_t across
the board early on, but felt there must be a better type name to
identify purpose.  A 'rman_addr_t' is better than bus_addr_t, so I can
easily make that mechanical change (simple sed on the diff would do
it).  Also, good eye on the qmax()/qmin() needing to have unsigned
counterparts, like I said, I haven't yet tested on a full 64-bit
platform :)

Regarding casts and macros, since this was an evolutionary process, I
haven't yet cleaned things up (why clean it up, if people tell me the
basic approach sucks anyway?), so that'll come in round 2, but thanks
for identifying them, they may have slipped through later patches.
I'm not tied one way or the other to RM_MIN_START vs 0.  Keeping it 0
would definitely be easier on the fingers, less rewrites, and I do
have more to change for the RM_MAX_END (choose a better name?)

As for the 'long long abomination', do you have anything better?  I
can't use PRIxPTR, because if that was possible I wouldn't need to
make this change in the first place.  I guess I could use %jx, or
PRIxMAX, and cast to uintmax_t.

So, that all being said, any suggestion on what the naming should be?
I'll pose: rman_addr_t and rman_size_t, as counterparts to bus_*_t.
Yes, it's more verbose, but it also shows usage intent.  However, I
would also yield to uintmax_t if others think that's the best option
as well.

- Justin

From owner-freebsd-arch@freebsd.org  Sun Aug 30 21:21:40 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 030719C67C7
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Sun, 30 Aug 2015 21:21:40 +0000 (UTC)
 (envelope-from hoomanfazaeli@gmail.com)
Received: from mail-wi0-x235.google.com (mail-wi0-x235.google.com
 [IPv6:2a00:1450:400c:c05::235])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 91F8CCF2;
 Sun, 30 Aug 2015 21:21:39 +0000 (UTC)
 (envelope-from hoomanfazaeli@gmail.com)
Received: by wicne3 with SMTP id ne3so58419725wic.0;
 Sun, 30 Aug 2015 14:21:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=message-id:date:from:user-agent:mime-version:to:cc:subject
 :references:in-reply-to:content-type:content-transfer-encoding;
 bh=6JHzySLlG4jGQX+Shkwu2ewVXTwSpj3uohSYUu6htAQ=;
 b=kkD1H4mhHmxn5fhkvErKgXIgfM846EaIfmkpOyRB1rWagFbu/CT+DQoqkPOB2dvSbF
 tR41UD6hyhPlIkwIbIM2Uu0NVVkRHl9ec1n3GX4CPCKN6b7Ur0z99MgoPGSsVBBBMTNV
 9tZfEiMiBp+FoavBf16bI6ASwYRmqm0+W5minri6uBD3agr+2zxtc4jZwFHNDti/Yu3/
 A5JbaAQTnyZdLqrJ2MqTkHgS4Wve1Q6YTDTRF1sMZQaSZy89EW631u1ytM23wGgGgPYS
 TDlxY9xRbJN95c8xPJLzjutnR0Uoa53r/Lpi6khZIZow5cErSx1a9uvfxPw72FvOW89s
 HYzQ==
X-Received: by 10.180.189.108 with SMTP id gh12mr5132512wic.53.1440969697901; 
 Sun, 30 Aug 2015 14:21:37 -0700 (PDT)
Received: from [192.168.2.30] ([2.190.161.53])
 by smtp.googlemail.com with ESMTPSA id bq7sm18994000wjc.31.2015.08.30.14.21.35
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Sun, 30 Aug 2015 14:21:37 -0700 (PDT)
Message-ID: <55E373DC.3060306@gmail.com>
Date: Mon, 31 Aug 2015 01:51:32 +0430
From: Hooman Fazaeli <hoomanfazaeli@gmail.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:17.0) Gecko/20130215 Thunderbird/17.0.3
MIME-Version: 1.0
To: Sean Bruno <sbruno@freebsd.org>
CC: freebsd-arch@freebsd.org
Subject: Re: Network card interrupt handling
References: <55DDE9B8.4080903@freebsd.org>
In-Reply-To: <55DDE9B8.4080903@freebsd.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 30 Aug 2015 21:21:40 -0000

On 8/26/2015 9:00 PM, Sean Bruno wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> We've been diagnosing what appeared to be out of order processing in
> the network stack this week only to find out that the network card
> driver was shoveling bits to us out of order (em).
>
> This *seems* to be due to a design choice where the driver is allowed
> to assert a "soft interrupt" to the h/w device while real interrupts
> are disabled.  This allows a fake "em_msix_rx" to be started *while*
> "em_handle_que" is running from the taskqueue.  We've isolated and
> worked around this by setting our processing_limit in the driver to
> - -1.  This means that *most* packet processing is now handled in the
> MSI-X handler instead of being deferred.  Some periodic interference
> is still detectable via em_local_timer() which causes one of these
> "fake" interrupt assertions in the normal, card is *not* hung case.
>
> Both functions use identical code for a start.  Both end up down
> inside of em_rxeof() to process packets.  Both drop the RX lock prior
> to handing the data up the network stack.
>
> This means that the em_handle_que running from the taskqueue will be
> preempted.  Dtrace confirms that this allows out of order processing
> to occur at times and generates a lot of resets.
>
> The reason I'm bringing this up on -arch and not on -net is that this
> is a common design pattern in some of the Ethernet drivers.  We've
> done preliminary tests on a patch that moves *all* processing of RX
> packets to the rx_task taskqueue, which means that em_handle_que is
> now the only path to get packets processed.
>
> <stable10 diff>
> https://people.freebsd.org/~sbruno/em_interupt_to_taskqueue.diff
>
> My sense is that this is a slightly "better" method to handle the
> packets but removes some immediacy from packet processing since all
> processing is deferred.  However, all packet processing is now
> serialized per queue, which I think is the proper implementation.
>
> Am I smoking "le dope" here or is this the way forward?
>
>
Which versions of the driver have this problem?

-- 
Best regards
Hooman Fazaeli


From owner-freebsd-arch@freebsd.org  Mon Aug 31 00:00:10 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2793D9C6C5A
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Mon, 31 Aug 2015 00:00:10 +0000 (UTC)
 (envelope-from jmg@gold.funkthat.com)
Received: from gold.funkthat.com (gate2.funkthat.com [208.87.223.18])
 (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
 (Client CN "gold.funkthat.com", Issuer "gold.funkthat.com" (not verified))
 by mx1.freebsd.org (Postfix) with ESMTPS id E1732E17;
 Mon, 31 Aug 2015 00:00:09 +0000 (UTC)
 (envelope-from jmg@gold.funkthat.com)
Received: from gold.funkthat.com (localhost [127.0.0.1])
 by gold.funkthat.com (8.14.5/8.14.5) with ESMTP id t7V003EE019461
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
 Sun, 30 Aug 2015 17:00:03 -0700 (PDT)
 (envelope-from jmg@gold.funkthat.com)
Received: (from jmg@localhost)
 by gold.funkthat.com (8.14.5/8.14.5/Submit) id t7V003Yl019460;
 Sun, 30 Aug 2015 17:00:03 -0700 (PDT) (envelope-from jmg)
Date: Sun, 30 Aug 2015 17:00:03 -0700
From: John-Mark Gurney <jmg@funkthat.com>
To: Adrian Chadd <adrian.chadd@gmail.com>
Cc: Jack Vogel <jfvogel@gmail.com>, Sean Bruno <sbruno@freebsd.org>,
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject: Re: Network card interrupt handling
Message-ID: <20150831000003.GJ33167@funkthat.com>
References: <55DDE9B8.4080903@freebsd.org>
 <20150828184800.GE33167@funkthat.com>
 <CAFOYbcm55rJ2ZLyp0J8nkMcd2zCsTjS7GcbE1ZOJ7uXH91q7Bg@mail.gmail.com>
 <CAJ-VmonvrvpXDoqfsEffGSdnRkEL4w-6eBebwkWHXd8Hea-C3g@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAJ-VmonvrvpXDoqfsEffGSdnRkEL4w-6eBebwkWHXd8Hea-C3g@mail.gmail.com>
X-Operating-System: FreeBSD 9.1-PRERELEASE amd64
X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88  9322 9CB1 8F74 6D3F A396
X-Files: The truth is out there
X-URL: http://resnet.uoregon.edu/~gurney_j/
X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html
X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE
X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger?
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7
 (gold.funkthat.com [127.0.0.1]); Sun, 30 Aug 2015 17:00:03 -0700 (PDT)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Aug 2015 00:00:10 -0000

Adrian Chadd wrote this message on Fri, Aug 28, 2015 at 12:41 -0700:
> [snip]
> 
> Well, the other big reason for doing it deferred like this is to avoid
> network based deadlocks because you're being fed packets faster than
> you can handle them. If you never yield, you stop other NIC
> processing.

You snipped the part of me asking isn't the interrupt thread just the
same interruptable context as the task queue?  Maybe the priority is
different, but that can be adjusted to be the same and still save the
context switch...

There is no break/moderation in the taskqueue, as it'll just enqueue
itself, and when the task queue breaks out, it'll just immediately run
itself, since it has a dedicated thread to itself... So, looks like
you get the same spinning behavior...

> People used to do run-to-completion and then complained when this
> happened, so polling was a thing.

Maybe when using PCI shared interrupts, but we are talking about PCIe
MSI-X unshared interrupts.

> So - I'm all for doing it with a fast interrupt handler and a fast
> taskqueue. As long as we don't run things to completion and
> re-schedule the taskqueue (so other things on that core get network
> processing) then I'm okay.
> 
> (I kinda want us to have NAPI at some point...)

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."

From owner-freebsd-arch@freebsd.org  Mon Aug 31 00:24:49 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7154D9C66EF
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Mon, 31 Aug 2015 00:24:49 +0000 (UTC)
 (envelope-from adrian.chadd@gmail.com)
Received: from mail-io0-x230.google.com (mail-io0-x230.google.com
 [IPv6:2607:f8b0:4001:c06::230])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 3E4761863;
 Mon, 31 Aug 2015 00:24:49 +0000 (UTC)
 (envelope-from adrian.chadd@gmail.com)
Received: by iod35 with SMTP id 35so12334426iod.3;
 Sun, 30 Aug 2015 17:24:48 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:in-reply-to:references:date:message-id:subject:from:to
 :cc:content-type;
 bh=yMrqHCW6ifCf0bTBACmQnwqe9J1FW2+cly1O/djyelA=;
 b=UzY4wF9rcqF8DRgN45GQBZcD63Eyx4gFqFZMWKuuCBpjb5s0b4ZIelLmUXIorpS56p
 6um9PU2NjpLds3UEuJBADtqJQvhF55nSGCiEarJ5I3YnZGUk1z+izRDUXaRLCLjS0nTe
 BtZWXAWH04E4hmIan1hSomi9Mmh7NnfB8NUxsazyPpo7DfAxnSOGVfCC9GVCuzO4IY5C
 ScpHKU0RDECe9ypSCJhWhqKUleGjHeCOtvjFr2m981eAFuz+ahEICSF3QRdZ/LDlaHZF
 XK5aLr9GoXNHXvPyzzHSV9xQ7Vze2uK/FcVpaF+jZhSroi6uqvk+Lia6t3I4guPF6ygR
 +4oA==
MIME-Version: 1.0
X-Received: by 10.107.154.212 with SMTP id c203mr3370414ioe.123.1440980688639; 
 Sun, 30 Aug 2015 17:24:48 -0700 (PDT)
Received: by 10.36.28.208 with HTTP; Sun, 30 Aug 2015 17:24:48 -0700 (PDT)
In-Reply-To: <20150831000003.GJ33167@funkthat.com>
References: <55DDE9B8.4080903@freebsd.org>
 <20150828184800.GE33167@funkthat.com>
 <CAFOYbcm55rJ2ZLyp0J8nkMcd2zCsTjS7GcbE1ZOJ7uXH91q7Bg@mail.gmail.com>
 <CAJ-VmonvrvpXDoqfsEffGSdnRkEL4w-6eBebwkWHXd8Hea-C3g@mail.gmail.com>
 <20150831000003.GJ33167@funkthat.com>
Date: Sun, 30 Aug 2015 17:24:48 -0700
Message-ID: <CAJ-Vmo=iBHMMzg-Hqg05OqkSto1mCAr4+tp8M-80ju57WFOktg@mail.gmail.com>
Subject: Re: Network card interrupt handling
From: Adrian Chadd <adrian.chadd@gmail.com>
To: John-Mark Gurney <jmg@funkthat.com>
Cc: Jack Vogel <jfvogel@gmail.com>, Sean Bruno <sbruno@freebsd.org>, 
 "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Content-Type: text/plain; charset=UTF-8
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Aug 2015 00:24:49 -0000

On 30 August 2015 at 17:00, John-Mark Gurney <jmg@funkthat.com> wrote:
> Adrian Chadd wrote this message on Fri, Aug 28, 2015 at 12:41 -0700:
>> [snip]
>>
>> Well, the other big reason for doing it deferred like this is to avoid
>> network based deadlocks because you're being fed packets faster than
>> you can handle them. If you never yield, you stop other NIC
>> processing.
>
> You snipped the part of me asking isn't the interrupt thread just the
> same interruptable context as the task queue?  Maybe the priority is
> different, but that can be adjusted to be the same and still save the
> context switch...
>
> There is no break/moderation in the taskqueue, as it'll just enqueue
> itself, and when the task queue breaks out, it'll just immediately run
> itself, since it has a dedicated thread to itself... So, looks like
> you get the same spinning behavior...
>
>> People used to do run-to-completion and then complained when this
>> happened, so polling was a thing.
>
> Maybe when using PCI shared interrupts, but we are talking about PCIe
> MSI-X unshared interrupts.

Well, try it and see what happens. You can still get network livelock
and starvation of other interfaces with ridiculously high pps if you
never yield. :P


-adrian

From owner-freebsd-arch@freebsd.org  Mon Aug 31 15:49:29 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 488A69C7A3D
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Mon, 31 Aug 2015 15:49:29 +0000 (UTC)
 (envelope-from sbruno@freebsd.org)
Received: from mail.ignoranthack.me (ignoranthack.me [199.102.79.106])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 2DBC024A
 for <freebsd-arch@freebsd.org>; Mon, 31 Aug 2015 15:49:28 +0000 (UTC)
 (envelope-from sbruno@freebsd.org)
Received: from [192.168.200.200] (unknown [50.136.155.142])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 (Authenticated sender: sbruno@ignoranthack.me)
 by mail.ignoranthack.me (Postfix) with ESMTPSA id 802FD193655
 for <freebsd-arch@freebsd.org>; Mon, 31 Aug 2015 15:49:22 +0000 (UTC)
Subject: Re: Network card interrupt handling
To: freebsd-arch@freebsd.org
References: <55DDE9B8.4080903@freebsd.org>
 <20150828184800.GE33167@funkthat.com>
From: Sean Bruno <sbruno@freebsd.org>
X-Enigmail-Draft-Status: N1110
Message-ID: <55E47781.7020009@freebsd.org>
Date: Mon, 31 Aug 2015 08:49:21 -0700
User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101
 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <20150828184800.GE33167@funkthat.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8bit
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Aug 2015 15:49:29 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512


> I have a better question, for MSI-X, we have a dedicated interrupt 
> thread to do the processing, so why are we even doing any
> moderation in this case?  It's not any different than spinning in
> the task queue..
> 
> How about the attached patch that just disables taskqueue
> processing for MSI-X RX interrupts, and does all processing in the
> interrupt thread?

This is another design that I had thought of.  For em(4) when using
seperate ISR threads for *each* rx queue and *each* tx queue, I think
that doing processing in the interrupt thread is the right thing to do.

I'm unsure of what the correct thing to do when tx/rx is combined into
a single handler though (igb/ix for example).  This would lead to
possible starvation as Adrian has pointed out.  There is nothing
stopping us from breaking the queues apart into seperate tx/rx threads
of execution for these drivers.  em(4) was my little science project
to see what the behavior would be.

> 
> Do you need to add the rx_task to the em_local_timer?  If so, then 
> I would look at setting a flag in the _rxeof that says that
> processing is happening... and in the case of the taskqueue, when
> it sees this flag set, it just exits, while for the interrupt
> filter case, we'd need to be more careful (possibly set a flag that
> the taskqueue will inspect, and cause it to stop processing the rx
> queue)...
> 

^^ I'll ponder this a bit further today and comment after coffee.

> btw, since you're hacking on em a lot, interrested in fixing em's 
> jumbo frames so it doesn't use 9k clusters, but instead page sized 
> clusters?
> 
> 

Uh ... hrm.   I can look into it, but would need more details as I'm
pretty ignorant of what you're referring to.  Ping me off list and
I'll take a look (jumbo frames is out of scope for $dayjob).

sean
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQF8BAEBCgBmBQJV5Hd+XxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w
ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRCQUFENDYzMkU3MTIxREU4RDIwOTk3REQx
MjAxRUZDQTFFNzI3RTY0AAoJEBIB78oecn5kL4wH/AmMvnZ7EKfF04qKhUxdOA90
YN5OZpyeXc8zk0QUq+INNMIHiQJN1wCHiOVd46YIuwjdWeSvHxKgnJMV1whDod55
c4QO6WG5yRcP5Wl30YN5XhjfUW48MYytYXxlAY5cC5A+TIUq/HywSNmyEVxKAooY
lSw+8Zpdzaozv1LxS70bRggi9y/y5NEgcVViO1cjip+nkl3eNvYOFq5jp2KJc0vS
+oe/wqbF5syRiBO4R2XnJs6PJ9BALOF73pFteHBebAe5jUwt6UD17c35/I2B4v60
+zNuM3rdNdDCOecFEdFctOQe6XDpSAu6Q7Dv88SKoKeIs+2lXHD/AJf24heTQOg=
=oY0k
-----END PGP SIGNATURE-----

From owner-freebsd-arch@freebsd.org  Mon Aug 31 21:33:33 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 62D929C6EFC
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Mon, 31 Aug 2015 21:33:33 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 41ADD1603
 for <freebsd-arch@freebsd.org>; Mon, 31 Aug 2015 21:33:33 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net
 [73.231.226.104])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 22806B95D
 for <freebsd-arch@freebsd.org>; Mon, 31 Aug 2015 17:33:32 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Subject: Re: Supporting cross-debugging vmcores in libkvm (Testing needed)
Date: Mon, 31 Aug 2015 14:21:19 -0700
Message-ID: <4121266.HXN5YIfUMj@ralph.baldwin.cx>
User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; )
In-Reply-To: <5166205.rtMjEmhvmo@ralph.baldwin.cx>
References: <3121152.ujdxFEovO3@ralph.baldwin.cx>
 <5166205.rtMjEmhvmo@ralph.baldwin.cx>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Mon, 31 Aug 2015 17:33:32 -0400 (EDT)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Aug 2015 21:33:33 -0000

On Wednesday, August 12, 2015 10:50:20 AM John Baldwin wrote:
> On Tuesday, August 04, 2015 10:56:09 AM John Baldwin wrote:
> > Many debuggers (recent gdb and lldb) support cross-architecture debugging 
> > just fine.  My current WIP port of kgdb to gdb7 supports cross-debugging for
> > remote targets already, but I wanted it to also support cross-debugging for
> > vmcores.
> > 
> > The existing libkvm/kgdb code in the tree has some limited support for
> > cross-debugging.  It requires building a custom libkvm (e.g. libkvm-i386.a)
> > and custom kgdb for each target platform.  However, gdb (and lldb) both
> > support multiple targets in a single binary, so I'd like to have a single
> > kgdb binary that can cross-debug anything.
> > 
> > I started hacking on libkvm last weekend and have a prototype that I've used
> > (along with some patches to my kgdb port) to debug an amd64 vmcore on an
> > i386 machine and vice versa.
> >
> > ...
> > 
> > What I'm mostly after is comments on the API, etc.  Once that is settled I
> > will move forward on converting and/or stubbing the other backends (the
> > stub route would be to only support other backends on native systems for
> > now).
> 
> I guess this is closer to a nuclear power plant than a bikeshed judging by the
> feedback.  I have ported the rest of the MD backends and verified that the
> updated libkvm passes a universe build (including various static assertions
> for the duplicated constants in other backends).  What I have not done is any
> runtime testing and I would like to ask for help with that now.  In particular
> I need someone to test that kgdb and/or ps works against a native core dump
> on all platforms other than amd64 and i386.  Note that some of the trickiness
> is that the backends now have to make runtime decisions for things that were
> previously compile-time decisions.  The biggest one affected by this is the
> MIPS backend as that backend handles three ABIs (mipso32, mipsn32, and mipsn64).
> I believe I have the handling for that correct (mips[on]32 use 32-bit KSEGs
> where as mipsn64 uses the extended segments and compat32 KSEGS, and mipso32
> uses 32-bit PTEs and mipsn32/n64 both use 64-bit PTEs) (plus both endians
> for both in theory).  The ARM backend also handles both endians (in theory).
> 
> Another wrinkle is that sparc64 uses its own dump format instead of writing
> out an ELF file.  I had to convert the header structures to use fixed-width
> types to be cross-friendly.  It would be good to ensure that a new libkvm
> can read a vmcore from an old kernel and vice versa to make sure my conversion
> is correct (I added an explicit padding field that I believe was implicit
> before).
> 
> The code is currently available for review in phabric at
> https://reviews.freebsd.org/D3341
> 
> To test, you can run 'arc patch D3341' in a clean tree to apply the patch.

I've just rebased this to port aarch64's minidump support.  I just need people
willing and able to test on non-x86.  Testing with the in-tree kgdb using an
updated libkvm would be sufficient.

-- 
John Baldwin

From owner-freebsd-arch@freebsd.org  Mon Aug 31 21:33:38 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7B2129C6F22
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Mon, 31 Aug 2015 21:33:38 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 5A32817EE
 for <freebsd-arch@freebsd.org>; Mon, 31 Aug 2015 21:33:38 +0000 (UTC)
 (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net
 [73.231.226.104])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6F25BB98D;
 Mon, 31 Aug 2015 17:33:37 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: John-Mark Gurney <jmg@funkthat.com>
Cc: 'freebsd-arch' <freebsd-arch@freebsd.org>
Subject: Re: Supporting cross-debugging vmcores in libkvm
Date: Fri, 28 Aug 2015 16:37:56 -0700
Message-ID: <8789359.0qvzdHHvAS@ralph.baldwin.cx>
User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; )
In-Reply-To: <20150828211952.GG33167@funkthat.com>
References: <3121152.ujdxFEovO3@ralph.baldwin.cx>
 <3887505.r5DL7PVlOf@ralph.baldwin.cx> <20150828211952.GG33167@funkthat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Mon, 31 Aug 2015 17:33:37 -0400 (EDT)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Aug 2015 21:33:38 -0000

On Friday, August 28, 2015 02:19:52 PM John-Mark Gurney wrote:
> John Baldwin wrote this message on Fri, Aug 28, 2015 at 13:39 -0700:
> > On Friday, August 28, 2015 11:27:05 AM John-Mark Gurney wrote:
> > > John Baldwin wrote this message on Tue, Aug 04, 2015 at 10:56 -0700:
> > > How will we prevent native only aware apps from getting confused when
> > > accessing non-native cores?  Will kvm_openfiles fail for non-native
> > > cores? or will kvm_read fail for non-native cores?
> > 
> > kvm_openfiles() will fail.  kvm_open2() will fail for a non-native core
> > if a symbol resolving routine is not supplied.
> > 
> > One API question I had is if it would be useful to allow a void * cookie
> > to be passed to the symbol resolving routine (the same cookie would be
> > passed to kvm_open2() and stored internally to be passed on each resolution
> > request).  I think in practice we don't need that level of complexity
> > though (my kgdb changes did not).
> 
> I can't think of a reason it would be required, but that doesn't mean
> someone else wouldn't need it...

You need to resolve symbols to find the root of the global page table
structures that let you do virtual to physical translations.

> Though wouldn't the core parser provide the symbol lookup function?
> 
> > I will need to rebase this to port the arm64 minidump support over, but
> > I also need people to test this.
> 
> I'll see what I can do to help test it...

Mostly it needs testing on non-x86.

-- 
John Baldwin

From owner-freebsd-arch@freebsd.org  Mon Aug 31 21:33:34 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id A4C339C6F04
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Mon, 31 Aug 2015 21:33:34 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 829AF1625;
 Mon, 31 Aug 2015 21:33:34 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net
 [73.231.226.104])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 925E3B913;
 Mon, 31 Aug 2015 17:33:33 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: freebsd-arch@freebsd.org
Cc: John-Mark Gurney <jmg@funkthat.com>, Adrian Chadd <adrian.chadd@gmail.com>,
 Sean Bruno <sbruno@freebsd.org>, Jack Vogel <jfvogel@gmail.com>
Subject: Re: Network card interrupt handling
Date: Mon, 31 Aug 2015 14:18:17 -0700
Message-ID: <5699836.EEH7cPOQBG@ralph.baldwin.cx>
User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; )
In-Reply-To: <20150831000003.GJ33167@funkthat.com>
References: <55DDE9B8.4080903@freebsd.org>
 <CAJ-VmonvrvpXDoqfsEffGSdnRkEL4w-6eBebwkWHXd8Hea-C3g@mail.gmail.com>
 <20150831000003.GJ33167@funkthat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Mon, 31 Aug 2015 17:33:33 -0400 (EDT)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Mon, 31 Aug 2015 21:33:34 -0000

On Sunday, August 30, 2015 05:00:03 PM John-Mark Gurney wrote:
> Adrian Chadd wrote this message on Fri, Aug 28, 2015 at 12:41 -0700:
> > [snip]
> > 
> > Well, the other big reason for doing it deferred like this is to avoid
> > network based deadlocks because you're being fed packets faster than
> > you can handle them. If you never yield, you stop other NIC
> > processing.
> 
> You snipped the part of me asking isn't the interrupt thread just the
> same interruptable context as the task queue?  Maybe the priority is
> different, but that can be adjusted to be the same and still save the
> context switch...
> 
> There is no break/moderation in the taskqueue, as it'll just enqueue
> itself, and when the task queue breaks out, it'll just immediately run
> itself, since it has a dedicated thread to itself... So, looks like
> you get the same spinning behavior...

Yes, that is true and why all the interrupt moderation stuff in the NIC
drivers that I've seen has always been pointless.  All it does is add
extra overhead since you waste time with extra context switches back to
yourself in between servicing packets.  It does not permit any other
NICs to run at all.  (One of the goals of my other patches that I
mentioned is to make it possible for multiple devices to share ithreads
even when using discrete interrupts (e.g. MSI) so that the yielding
done actually would give a chance for other devices to run, but currently
it is all just a waste of CPU cycles).

If you think this actually helps, I challenge to you capture a KTR_SCHED
trace of it ever working as intended.

-- 
John Baldwin

From owner-freebsd-arch@freebsd.org  Tue Sep  1 01:05:19 2015
Return-Path: <owner-freebsd-arch@freebsd.org>
Delivered-To: freebsd-arch@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id C24F39C7DDF
 for <freebsd-arch@mailman.ysv.freebsd.org>;
 Tue,  1 Sep 2015 01:05:19 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1])
 (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 9AB5F68D;
 Tue,  1 Sep 2015 01:05:19 +0000 (UTC) (envelope-from jhb@freebsd.org)
Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net
 [73.231.226.104])
 by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6E526B913;
 Mon, 31 Aug 2015 21:05:18 -0400 (EDT)
From: John Baldwin <jhb@freebsd.org>
To: "K. Macy" <kmacy@freebsd.org>
Cc: freebsd-arch@freebsd.org, Sean Bruno <sbruno@freebsd.org>
Subject: Re: Network card interrupt handling
Date: Mon, 31 Aug 2015 14:41:14 -0700
Message-ID: <1709356.SnmUAQFSba@ralph.baldwin.cx>
User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; )
In-Reply-To: <CAHM0Q_N65J9OSaU=znjgJ_gEiu=M-cb9q1hrxskGSvYFhxL_NQ@mail.gmail.com>
References: <55DDE9B8.4080903@freebsd.org>
 <24017021.PxBoCiQKDJ@ralph.baldwin.cx>
 <CAHM0Q_N65J9OSaU=znjgJ_gEiu=M-cb9q1hrxskGSvYFhxL_NQ@mail.gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 7Bit
Content-Type: text/plain; charset="us-ascii"
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7
 (bigwig.baldwin.cx); Mon, 31 Aug 2015 21:05:18 -0400 (EDT)
X-BeenThere: freebsd-arch@freebsd.org
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion related to FreeBSD architecture <freebsd-arch.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-arch/>
List-Post: <mailto:freebsd-arch@freebsd.org>
List-Help: <mailto:freebsd-arch-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/freebsd-arch>,
 <mailto:freebsd-arch-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 01 Sep 2015 01:05:20 -0000

On Friday, August 28, 2015 06:25:53 PM K. Macy wrote:
> On Aug 28, 2015 12:59 PM, "John Baldwin" <jhb@freebsd.org> wrote:
> >
> > On Wednesday, August 26, 2015 09:30:48 AM Sean Bruno wrote:
> > > We've been diagnosing what appeared to be out of order processing in
> > > the network stack this week only to find out that the network card
> > > driver was shoveling bits to us out of order (em).
> > >
> > > This *seems* to be due to a design choice where the driver is allowed
> > > to assert a "soft interrupt" to the h/w device while real interrupts
> > > are disabled.  This allows a fake "em_msix_rx" to be started *while*
> > > "em_handle_que" is running from the taskqueue.  We've isolated and
> > > worked around this by setting our processing_limit in the driver to
> > > -1.  This means that *most* packet processing is now handled in the
> > > MSI-X handler instead of being deferred.  Some periodic interference
> > > is still detectable via em_local_timer() which causes one of these
> > > "fake" interrupt assertions in the normal, card is *not* hung case.
> > >
> > > Both functions use identical code for a start.  Both end up down
> > > inside of em_rxeof() to process packets.  Both drop the RX lock prior
> > > to handing the data up the network stack.
> > >
> > > This means that the em_handle_que running from the taskqueue will be
> > > preempted.  Dtrace confirms that this allows out of order processing
> > > to occur at times and generates a lot of resets.
> > >
> > > The reason I'm bringing this up on -arch and not on -net is that this
> > > is a common design pattern in some of the Ethernet drivers.  We've
> > > done preliminary tests on a patch that moves *all* processing of RX
> > > packets to the rx_task taskqueue, which means that em_handle_que is
> > > now the only path to get packets processed.
> >
> > It is only a common pattern in the Intel drivers. :-/  We (collectively)
> > spent quite a while fixing this in ixgbe and igb.  Longer (hopefully more
> > like medium) term I have an update to the interrupt API I want to push in
> > that allows drivers to manually schedule interrupt handlers using an
> > 'hwi' API to replace the manual taskqueues.  This also ensures that
> > the handler that dequeues packets is only ever running in an ithread
> > context and never concurrently.
> >
> 
> Jeff has a generalization of the net_task infrastructure used at Nokia
> called grouptaskq that I've used for iflib. That does essentially what you
> refer to. I've converted ixl and am currently about to test an ixgbe
> conversion. I anticipate converting mlxen, all Intel drivers as well as the
> remaining drivers with device specific code in netmap. The one catch is
> finding someone who will publicly admit to owning re hardware so that I can
> buy it from him and test my changes.

Note that the ithread changes I refer to are for all devices (not just
network interfaces) and fix some other issues as well (e.g. INTR_FILTER is
always enabled and races with tearing down filters are closed, it also uses
a more thread_lock()-friendly state for idle ithreads, and it also allows us
to experiment with sharing ithreads among devices as well as having multiple
threads service a queue of interrupt handlers if desired).  It may be that
this will make your life easier since you might be able to reuse the new
primitives more directly rather than bypassing ithreads.  I've posted the
changes to arch@ a few different times over the past several years just
haven't pushed them in.  (They aren't perfect in that I don't yet have
APIs for changing the plumbing around due to lack of use cases to build
the APIs from.)

-- 
John Baldwin