From owner-freebsd-arch@freebsd.org Sun Aug 30 05:25:06 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F36729C4C85 for ; Sun, 30 Aug 2015 05:25:05 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id A3756286 for ; Sun, 30 Aug 2015 05:25:05 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id C9B21D66097; Sun, 30 Aug 2015 15:02:11 +1000 (AEST) Date: Sun, 30 Aug 2015 15:02:10 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Justin Hibbits cc: Adrian Chadd , Marcel Moolenaar , "freebsd-arch@freebsd.org" Subject: Re: Devices with 36-bit paddr on 32-bit system In-Reply-To: Message-ID: <20150830130612.L890@besplex.bde.org> References: <1568331.OrSoeYfXsf@ralph.baldwin.cx> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=L4TgOLn8 c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=JSiOX3MnzAab593RgikA:9 a=RBkRyt4X6hdqDwO5:21 a=_7EL_taM2OgDYD4P:21 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 05:25:06 -0000 On Sat, 29 Aug 2015, Justin Hibbits wrote: > Hey, I already had that thought, too, you just encouraged me to take > it :) Anyway, here's an initial patch. I've *only* tested it on one > PowerPC board (the one needing this change), none of my other devices, > and no other architectures. Thoughts? It has lots of style bugs and type errors. % Index: sys/dev/ata/ata-pci.c % =================================================================== % --- sys/dev/ata/ata-pci.c (revision 287189) % +++ sys/dev/ata/ata-pci.c (working copy) % @@ -217,7 +217,7 @@ % % struct resource * % ata_pci_alloc_resource(device_t dev, device_t child, int type, int *rid, % - u_long start, u_long end, u_long count, u_int flags) % + bus_addr_t start, bus_addr_t end, u_long count, u_int flags) The first style bug is in the first changed line (line made longer than 80 columns by blind substitution). % Index: sys/dev/ata/ata-pci.h % =================================================================== % --- sys/dev/ata/ata-pci.h (revision 287189) % +++ sys/dev/ata/ata-pci.h (working copy) % @@ -535,7 +535,7 @@ % int ata_pci_print_child(device_t dev, device_t child); % int ata_pci_child_location_str(device_t dev, device_t child, char *buf, % size_t buflen); % -struct resource * ata_pci_alloc_resource(device_t dev, device_t child, int type, int *rid, u_long start, u_long end, u_long count, u_int flags); % +struct resource * ata_pci_alloc_resource(device_t dev, device_t child, int type, int *rid, bus_addr_t start, bus_addr_t end, u_long count, u_int flags); % int ata_pci_release_resource(device_t dev, device_t child, int type, int rid, struct resource *r); % int ata_pci_setup_intr(device_t dev, device_t child, struct resource *irq, int flags, driver_filter_t *filter, driver_intr_t *function, void *argument, void **cookiep); % int ata_pci_teardown_intr(device_t dev, device_t child, struct resource *irq, void *cookie); These were already misformatted. No comment on further style bugs from blind substitution. % Index: sys/dev/ata/chipsets/ata-promise.c % =================================================================== % --- sys/dev/ata/chipsets/ata-promise.c (revision 287189) % +++ sys/dev/ata/chipsets/ata-promise.c (working copy) % @@ -191,18 +191,19 @@ % !BUS_READ_IVAR(device_get_parent(GRANDPARENT(dev)), % GRANDPARENT(dev), PCI_IVAR_DEVID, &devid) && % ((devid == ATA_DEC_21150) || (devid == ATA_DEC_21150_1))) { % - static long start = 0, end = 0; % + static bus_addr_t start = 0; % + static u_long count = 0; Renaming the variable is an unrelated fix. The closure of this fix is quite large: - bus_get_resource() is undocumented except for a bogus reference to bus_get_resource(9) in bus_set_resource(9) , so the meaning of its last arg takes more work than necessary to check. The source file bus_get_resource.9 doesn't exist, and the installed file bus_get_resource.9.gz cannot be a link to bus_set_resource.9.gz since the top-level bus man pages are partially correctly organized with only 1 function per file. - the meaning of bus_get_resource()'s args are hard to see from its prototype since its prototype has no paramater names. uses a nonense mixture of no parameter names, parameter names without underscores, and parameter names with underscores. % % if (pci_get_slot(dev) == 1) { % - bus_get_resource(dev, SYS_RES_IRQ, 0, &start, &end); % + bus_get_resource(dev, SYS_RES_IRQ, 0, &start, &count); % strcat(buffer, " (channel 0+1)"); % } % - else if (pci_get_slot(dev) == 2 && start && end) { % - bus_set_resource(dev, SYS_RES_IRQ, 0, start, end); % + else if (pci_get_slot(dev) == 2 && start && count) { % + bus_set_resource(dev, SYS_RES_IRQ, 0, start, count); % strcat(buffer, " (channel 2+3)"); % } % else { % - start = end = 0; % + start = count = 0; This assumes that the types are similar enough for the conversion to work and not give a warning. If the types are integral, then the conversion always works and compilers shouldn't warn if it is a downcast (since 0 is representable in both types). But it isn't clear that the types are integral. pc98 uses handles, but bus_addr_t is still an integer and the handles are used to map it to the actual address. Perhaps large addresses should be handled similarly. % } % } % sprintf(buffer, "%s %s controller", buffer, ata_mode2str(idx->max_dma)); % Index: sys/dev/fdt/simplebus.c % =================================================================== % --- sys/dev/fdt/simplebus.c (revision 287189) % +++ sys/dev/fdt/simplebus.c (working copy) % ... % @@ -365,7 +365,8 @@ % if (j == sc->nranges && sc->nranges != 0) { % if (bootverbose) % device_printf(bus, "Could not map resource " % - "%#lx-%#lx\n", start, end); % + "%#llx-%#llx\n", (uint64_t)start, % + (uint64_t)end); % % return (NULL); % } This uses the long long abomination, and isn't even correct since type type isn't even (unsigned) long long. The type starts as u_long, which was not unsigned long long on any arch. It is cast to uint64_t. This breaks it if it is larger than uint64_t. This makes it compatible with unsigned long long on 32-bit arches, but has no effect on 32-bit arches -- there the type remains as u_long and the above fails to compile. No comment on further uses of the abomination. % Index: sys/dev/gpio/gpiobus.c % =================================================================== % --- sys/dev/gpio/gpiobus.c (revision 287189) % +++ sys/dev/gpio/gpiobus.c (working copy) % @@ -178,7 +178,7 @@ % sc->sc_intr_rman.rm_type = RMAN_ARRAY; % sc->sc_intr_rman.rm_descr = "GPIO Interrupts"; % if (rman_init(&sc->sc_intr_rman) != 0 || % - rman_manage_region(&sc->sc_intr_rman, 0, ~0) != 0) % + rman_manage_region(&sc->sc_intr_rman, 0, ~(bus_addr_t)0) != 0) I think ~0 gave the correct value except in the 1's complement case. Stranglely, ~0ul is more fragile than ~0. Elsewhere you spell this better as RM_MAX_END. % panic("%s: failed to set up rman.", __func__); % % if (GPIO_PIN_MAX(sc->sc_dev, &sc->sc_npins) != 0) % ... % @@ -516,7 +516,7 @@ % % if (type != SYS_RES_IRQ) % return (NULL); % - isdefault = (start == 0UL && end == ~0UL && count == 1); % + isdefault = (start == 0UL && end == ~(bus_addr_t)0UL && count == 1); Not using RM_MIN_START and RM_MAX_END gives even worse spelling than above. Both of the UL's are unnecessary, and the second one is bogus since the cast gives the correct type and it is not always UL. % rle = NULL; % if (isdefault) { % rl = BUS_GET_RESOURCE_LIST(bus, child); % Index: sys/dev/ofw/ofwbus.c % =================================================================== % --- sys/dev/ofw/ofwbus.c (revision 287189) % +++ sys/dev/ofw/ofwbus.c (working copy) % @@ -156,7 +156,7 @@ % sc->sc_mem_rman.rm_descr = "Device Memory"; % if (rman_init(&sc->sc_intr_rman) != 0 || % rman_init(&sc->sc_mem_rman) != 0 || % - rman_manage_region(&sc->sc_intr_rman, 0, ~0) != 0 || % + rman_manage_region(&sc->sc_intr_rman, 0, BUS_SPACE_MAXADDR) != 0 || Looks like a logic error. This is an rman function, so it should use an rman limit (now spelled RMAN_MAX_END), not a bus space limit. % rman_manage_region(&sc->sc_mem_rman, 0, BUS_SPACE_MAXADDR) != 0) The old code had a mixture of ~0 and BUS_SPACE_MAXADDR. This almost made sense. The first thing being managed is an interrupt id. Interrupts ids are normally small nonnegative integers. But they are still represented as u_longs or bus addresses. ~0 works for signed integers of any size and also for unsigned integers of any size except in the 1's complement case, and seems to have been used to reflect the smaller range required. % panic("%s: failed to set up rmans.", __func__); % % ... % @@ -201,7 +201,7 @@ % } % start = rle->start; % count = ulmax(count, rle->count); % - end = ulmax(rle->end, start + count - 1); % + end = qmax(rle->end, start + count - 1); This doesn't use the long long abomination, but assumes that quads are large enough. This is very fragile. No one knows what quads are, but they are currently int64_t. uquads are uint64_t, but uqmax() doesn't exist. So the above only works for 63-bit addresses. It seems to be very broken already for 64-bit bus_addr_t, since that gives the 64-bit address RMAN_MAX_END. No comment on further uses of quads. % } % % switch (type) { % ... % Index: sys/dev/pci/pci_pci.c % =================================================================== % --- sys/dev/pci/pci_pci.c (revision 287189) % +++ sys/dev/pci/pci_pci.c (working copy) % @@ -387,7 +388,7 @@ % char buf[64]; % int error, rid; % % - if (max_address != (u_long)max_address) % + if (max_address != (bus_addr_t)max_address) % max_address = ~0ul; pci uses pci_addr_t for addresses including max_address. pci_addr_t happens to be uint64_t, so it works better with a larger rman type/ Assume that the rman type is not larger. Then the above checks that the rman type is large enough to hold the current max_address, and if not it reduces to the _old_ rman limit. It should reduce to the current rman limit (RMAN_END_MAX). If the rman type is larger, then this code has no effect. Assigning RMAN_END_MAX to max_address would overflow, but is not reached since the rman type is large enough to hold any max_address. % w->rman.rm_start = 0; % w->rman.rm_end = max_address; % ... % Index: sys/kern/subr_rman.c % =================================================================== % --- sys/kern/subr_rman.c (revision 287189) % +++ sys/kern/subr_rman.c (working copy) % @@ -90,8 +90,8 @@ % TAILQ_ENTRY(resource_i) r_link; % LIST_ENTRY(resource_i) r_sharelink; % LIST_HEAD(, resource_i) *r_sharehead; % - u_long r_start; /* index of the first entry in this resource */ % - u_long r_end; /* index of the last entry (inclusive) */ % + bus_addr_t r_start; /* index of the first entry in this resource */ % + bus_addr_t r_end; /* index of the last entry (inclusive) */ Blind substitution also breaks indentation of comments. % u_int r_flags; % void *r_virtual; /* virtual address of this resource */ % struct device *r_dev; /* device which has allocated this resource */ % @@ -135,7 +135,7 @@ % } % % if (rm->rm_start == 0 && rm->rm_end == 0) RMAN_MIN_START is still spelled as 0 in many places. Since it is not very magic, this spelling is better. Let prototypes comvert plain 0 to the actual type. % ... % @@ -174,7 +174,7 @@ % % /* Skip entries before us. */ % TAILQ_FOREACH(s, &rm->rm_list, r_link) { % - if (s->r_end == ULONG_MAX) % + if (s->r_end == BUS_SPACE_MAXADDR) Why not RMAN_MAX_END? I think some cases of RMAN_MAX_END are actually magic meaning "not really the end, but a magic (default?) value". A different RMAN_FOO could be used for this. % Index: sys/powerpc/powerpc/nexus.c % =================================================================== % --- sys/powerpc/powerpc/nexus.c (revision 287189) % +++ sys/powerpc/powerpc/nexus.c (working copy) % @@ -189,13 +189,13 @@ % { % % if (type == SYS_RES_MEMORY) { % - vm_offset_t start; % + vm_paddr_t start; I think more places uses this instead of bus_addr_t or uint64_t. % void *p; % % - start = (vm_offset_t) rman_get_start(r); % + start = rman_get_start(r); The conversion is still logically non-null. % ... % Index: sys/sys/bus.h % =================================================================== % --- sys/sys/bus.h (revision 287189) % +++ sys/sys/bus.h (working copy) % @@ -29,6 +29,7 @@ % #ifndef _SYS_BUS_H_ % #define _SYS_BUS_H_ % % +#include % #include % #include % #include % @@ -292,8 +293,8 @@ % int rid; /**< @brief resource identifier */ % int flags; /**< @brief resource flags */ % struct resource *res; /**< @brief the real resource when allocated */ % - u_long start; /**< @brief start of resource range */ % - u_long end; /**< @brief end of resource range */ % + bus_addr_t start; /**< @brief start of resource range */ % + bus_addr_t end; /**< @brief end of resource range */ I think rman functions should use an rman type and not hard-code bus_addr_t. Related bus functions should then use this type. Style bugs from blind substitution can be reduced by using a less verbose name. % u_long count; /**< @brief count within range */ % }; % STAILQ_HEAD(resource_list, resource_list_entry); % ... % @@ -474,7 +475,8 @@ % static __inline struct resource * % bus_alloc_resource_any(device_t dev, int type, int *rid, u_int flags) % { % - return (bus_alloc_resource(dev, type, rid, 0ul, ~0ul, 1, flags)); % + return (bus_alloc_resource(dev, type, rid, (bus_addr_t)0ul, % + ~(bus_addr_t)0ul, 1, flags)); "ul" is now bogus, as above for UL except with a worse spelling. (Apparently, RLIM_FOO is not available here, so you have to expand it inline.) The first cast to bus_addr_t also has no affect. % } % % /* % Index: sys/sys/rman.h % =================================================================== % --- sys/sys/rman.h (revision 287189) % +++ sys/sys/rman.h (working copy) % @@ -54,6 +54,9 @@ % #define RF_ALIGNMENT_LOG2(x) ((x) << RF_ALIGNMENT_SHIFT) % #define RF_ALIGNMENT(x) (((x) & RF_ALIGNMENT_MASK) >> RF_ALIGNMENT_SHIFT) Example of normal style (except for the long line). % % +#define RM_MIN_START ((bus_addr_t)0) % +#define RM_MAX_END (~(bus_addr_t)0) Style bugs (space instead of table after define). There are no bogus ULs here. The first cast to bus_addr_t is probably unecessary here too. % @@ -108,8 +111,8 @@ % struct resource_head rm_list; % struct mtx *rm_mtx; /* mutex used to protect rm_list */ % TAILQ_ENTRY(rman) rm_link; /* link in list of all rmans */ % - u_long rm_start; /* index of globally first entry */ % - u_long rm_end; /* index of globally last entry */ % + bus_addr_t rm_start; /* index of globally first entry */ % + bus_addr_t rm_end; /* index of globally last entry */ % enum rman_type rm_type; /* what type of resource this is */ % const char *rm_descr; /* text descripion of this resource */ % }; This has mounds of new and old style bugs: new: unformatting by blind substitution old: - still uses old style of a tab instead of a space after struct, enum, and even in the middle of "const char" - avoids misindented comment for rm_list by not having one - has minimally misindented comment for rm_link after first using excessive indentation for the member name (then a space instead of a tab before the comment) - has minimally misindented comment for rm_type (after first using excessive indentation after enum, use only a space for the member name and the comment). Bruce From owner-freebsd-arch@freebsd.org Sun Aug 30 05:49:34 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2FD7F9C5970 for ; Sun, 30 Aug 2015 05:49:34 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail109.syd.optusnet.com.au (mail109.syd.optusnet.com.au [211.29.132.80]) by mx1.freebsd.org (Postfix) with ESMTP id E6AF0C04 for ; Sun, 30 Aug 2015 05:49:33 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from c211-30-166-197.carlnfd1.nsw.optusnet.com.au (c211-30-166-197.carlnfd1.nsw.optusnet.com.au [211.30.166.197]) by mail109.syd.optusnet.com.au (Postfix) with ESMTPS id C0DA1D6551D; Sun, 30 Aug 2015 15:49:30 +1000 (AEST) Date: Sun, 30 Aug 2015 15:49:29 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: Bruce Evans cc: Justin Hibbits , Adrian Chadd , "freebsd-arch@freebsd.org" , Marcel Moolenaar Subject: Re: Devices with 36-bit paddr on 32-bit system In-Reply-To: <20150830130612.L890@besplex.bde.org> Message-ID: <20150830153859.S1263@besplex.bde.org> References: <1568331.OrSoeYfXsf@ralph.baldwin.cx> <20150830130612.L890@besplex.bde.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.1 cv=L4TgOLn8 c=1 sm=1 tr=0 a=KA6XNC2GZCFrdESI5ZmdjQ==:117 a=PO7r1zJSAAAA:8 a=JzwRw_2MAAAA:8 a=kj9zAlcOel0A:10 a=xl9oN2MitZH6uz9LxZsA:9 a=CjuIK1q_8ugA:10 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 05:49:34 -0000 On Sun, 30 Aug 2015, Bruce Evans wrote: > % ... > % Index: sys/sys/bus.h > % =================================================================== > % --- sys/sys/bus.h (revision 287189) > % +++ sys/sys/bus.h (working copy) > % @@ -29,6 +29,7 @@ > % #ifndef _SYS_BUS_H_ > % #define _SYS_BUS_H_ > % % +#include > % #include > % #include > % #include > % @@ -292,8 +293,8 @@ > % int rid; /**< @brief resource identifier */ > % int flags; /**< @brief resource flags */ > % struct resource *res; /**< @brief the real resource when > allocated */ > % - u_long start; /**< @brief start of resource range > */ > % - u_long end; /**< @brief end of resource range */ > % + bus_addr_t start; /**< @brief start of resource range > */ > % + bus_addr_t end; /**< @brief end of resource range */ Mail programs (mostly mine) corrupted the formatting more competely. > I think rman functions should use an rman type and not hard-code bus_addr_t. > Related bus functions should then use this type. Style bugs from blind > substitution can be reduced by using a less verbose name. > > % u_long count; /**< @brief count within range */ > % }; Or just use uintmax_t for everything in rman. rman was written before C99 broke C by making u_long no longer the largest integer type. It used u_long because it was the largest integer type (though it actually wasn't, since FreeBSD used nonstandard extensions in Gnu C) and it is easiest to use a single non-typedefed type that is large enough for all cases. uintmax_t is C99's replacement of u_long. I don't like the bloat from using uintmax_t for everything, but rman should only used for initialization so uintmax_t for rman should only give space bloat, only on 32-bit arches. An rman typedef for this type allows re-optimizing the 32-bit arches, but brings back the problem of typedefed types being hard to use. Bruce From owner-freebsd-arch@freebsd.org Sun Aug 30 07:24:31 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 09ED99C490D for ; Sun, 30 Aug 2015 07:24:31 +0000 (UTC) (envelope-from chmeeedalf@gmail.com) Received: from mail-ig0-x22b.google.com (mail-ig0-x22b.google.com [IPv6:2607:f8b0:4001:c05::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C81F77F2 for ; Sun, 30 Aug 2015 07:24:30 +0000 (UTC) (envelope-from chmeeedalf@gmail.com) Received: by igbuu8 with SMTP id uu8so30370115igb.0 for ; Sun, 30 Aug 2015 00:24:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; bh=tNFZ9whj+Ks/z+meJb9JOXwXUbq6LBGTj+lDxXebVo4=; b=GAzGHO+m8L0lxt1bFRBn/qpVqUkwvtuPg4HUHZJcQ9Mb6ATTe9a564leBpBfuGFxHy F2akOtIUs6GRTcY7w/ihOTFtzAYOlFXXHRwNRaawrmeb8yR3mfMkHiTfKqqwSQSBOIUC G+yjwxjQ8ny1vbKIoFhxogJLRJakA/Dn6vg2OEseiIjAA3h/UPwoRNFHQdGBuGrjJYoH v4R6YWI8gk733mb6FAd08lWByd3MOksUHiQClt2WJep92WUHc/AO03fIwtCWIM0F5zE6 GjhZzcu80FMgN5rCyiQk8IwdYOWIPivbDRDM8UVBTTLgtp6Ho2+4DXQwr+5Qt1WqgGqN 2oLg== MIME-Version: 1.0 X-Received: by 10.50.62.148 with SMTP id y20mr9761242igr.17.1440919470181; Sun, 30 Aug 2015 00:24:30 -0700 (PDT) Sender: chmeeedalf@gmail.com Received: by 10.36.58.149 with HTTP; Sun, 30 Aug 2015 00:24:29 -0700 (PDT) In-Reply-To: <20150830153859.S1263@besplex.bde.org> References: <1568331.OrSoeYfXsf@ralph.baldwin.cx> <20150830130612.L890@besplex.bde.org> <20150830153859.S1263@besplex.bde.org> Date: Sun, 30 Aug 2015 00:24:29 -0700 X-Google-Sender-Auth: 60wbfbO6IaSvthlPfoYGGsknACw Message-ID: Subject: Re: Devices with 36-bit paddr on 32-bit system From: Justin Hibbits To: Bruce Evans Cc: Adrian Chadd , "freebsd-arch@freebsd.org" , Marcel Moolenaar Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 07:24:31 -0000 Hi Bruce, On Sat, Aug 29, 2015 at 10:49 PM, Bruce Evans wrote: > On Sun, 30 Aug 2015, Bruce Evans wrote: > >> % ... >> % Index: sys/sys/bus.h >> % =================================================================== >> % --- sys/sys/bus.h (revision 287189) >> % +++ sys/sys/bus.h (working copy) >> % @@ -29,6 +29,7 @@ >> % #ifndef _SYS_BUS_H_ >> % #define _SYS_BUS_H_ >> % % +#include >> % #include >> % #include >> % #include >> % @@ -292,8 +293,8 @@ >> % int rid; /**< @brief resource identifier */ >> % int flags; /**< @brief resource flags */ >> % struct resource *res; /**< @brief the real resource when >> allocated */ >> % - u_long start; /**< @brief start of resource >> range */ >> % - u_long end; /**< @brief end of resource range >> */ >> % + bus_addr_t start; /**< @brief start of resource >> range */ >> % + bus_addr_t end; /**< @brief end of resource range >> */ > > > Mail programs (mostly mine) corrupted the formatting more competely. > >> I think rman functions should use an rman type and not hard-code >> bus_addr_t. >> Related bus functions should then use this type. Style bugs from blind >> substitution can be reduced by using a less verbose name. >> >> % u_long count; /**< @brief count within range */ >> % }; > > > Or just use uintmax_t for everything in rman. rman was written before > C99 broke C by making u_long no longer the largest integer type. It > used u_long because it was the largest integer type (though it actually > wasn't, since FreeBSD used nonstandard extensions in Gnu C) and it is > easiest to use a single non-typedefed type that is large enough for all > cases. uintmax_t is C99's replacement of u_long. > > I don't like the bloat from using uintmax_t for everything, but rman > should only used for initialization so uintmax_t for rman should only > give space bloat, only on 32-bit arches. > > An rman typedef for this type allows re-optimizing the 32-bit arches, > but brings back the problem of typedefed types being hard to use. > > Bruce Thanks for the lengthy review. I am well aware of (most of) the style issues and intend to address them all before releasing a final patch. My goal with this rough early release was to solicit comments on the approach taken, and if there should be a better way to address the overall problem. I did consider your suggestion of uintmax_t across the board early on, but felt there must be a better type name to identify purpose. A 'rman_addr_t' is better than bus_addr_t, so I can easily make that mechanical change (simple sed on the diff would do it). Also, good eye on the qmax()/qmin() needing to have unsigned counterparts, like I said, I haven't yet tested on a full 64-bit platform :) Regarding casts and macros, since this was an evolutionary process, I haven't yet cleaned things up (why clean it up, if people tell me the basic approach sucks anyway?), so that'll come in round 2, but thanks for identifying them, they may have slipped through later patches. I'm not tied one way or the other to RM_MIN_START vs 0. Keeping it 0 would definitely be easier on the fingers, less rewrites, and I do have more to change for the RM_MAX_END (choose a better name?) As for the 'long long abomination', do you have anything better? I can't use PRIxPTR, because if that was possible I wouldn't need to make this change in the first place. I guess I could use %jx, or PRIxMAX, and cast to uintmax_t. So, that all being said, any suggestion on what the naming should be? I'll pose: rman_addr_t and rman_size_t, as counterparts to bus_*_t. Yes, it's more verbose, but it also shows usage intent. However, I would also yield to uintmax_t if others think that's the best option as well. - Justin From owner-freebsd-arch@freebsd.org Sun Aug 30 21:21:40 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 030719C67C7 for ; Sun, 30 Aug 2015 21:21:40 +0000 (UTC) (envelope-from hoomanfazaeli@gmail.com) Received: from mail-wi0-x235.google.com (mail-wi0-x235.google.com [IPv6:2a00:1450:400c:c05::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 91F8CCF2; Sun, 30 Aug 2015 21:21:39 +0000 (UTC) (envelope-from hoomanfazaeli@gmail.com) Received: by wicne3 with SMTP id ne3so58419725wic.0; Sun, 30 Aug 2015 14:21:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=message-id:date:from:user-agent:mime-version:to:cc:subject :references:in-reply-to:content-type:content-transfer-encoding; bh=6JHzySLlG4jGQX+Shkwu2ewVXTwSpj3uohSYUu6htAQ=; b=kkD1H4mhHmxn5fhkvErKgXIgfM846EaIfmkpOyRB1rWagFbu/CT+DQoqkPOB2dvSbF tR41UD6hyhPlIkwIbIM2Uu0NVVkRHl9ec1n3GX4CPCKN6b7Ur0z99MgoPGSsVBBBMTNV 9tZfEiMiBp+FoavBf16bI6ASwYRmqm0+W5minri6uBD3agr+2zxtc4jZwFHNDti/Yu3/ A5JbaAQTnyZdLqrJ2MqTkHgS4Wve1Q6YTDTRF1sMZQaSZy89EW631u1ytM23wGgGgPYS TDlxY9xRbJN95c8xPJLzjutnR0Uoa53r/Lpi6khZIZow5cErSx1a9uvfxPw72FvOW89s HYzQ== X-Received: by 10.180.189.108 with SMTP id gh12mr5132512wic.53.1440969697901; Sun, 30 Aug 2015 14:21:37 -0700 (PDT) Received: from [192.168.2.30] ([2.190.161.53]) by smtp.googlemail.com with ESMTPSA id bq7sm18994000wjc.31.2015.08.30.14.21.35 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 30 Aug 2015 14:21:37 -0700 (PDT) Message-ID: <55E373DC.3060306@gmail.com> Date: Mon, 31 Aug 2015 01:51:32 +0430 From: Hooman Fazaeli User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130215 Thunderbird/17.0.3 MIME-Version: 1.0 To: Sean Bruno CC: freebsd-arch@freebsd.org Subject: Re: Network card interrupt handling References: <55DDE9B8.4080903@freebsd.org> In-Reply-To: <55DDE9B8.4080903@freebsd.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 30 Aug 2015 21:21:40 -0000 On 8/26/2015 9:00 PM, Sean Bruno wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > We've been diagnosing what appeared to be out of order processing in > the network stack this week only to find out that the network card > driver was shoveling bits to us out of order (em). > > This *seems* to be due to a design choice where the driver is allowed > to assert a "soft interrupt" to the h/w device while real interrupts > are disabled. This allows a fake "em_msix_rx" to be started *while* > "em_handle_que" is running from the taskqueue. We've isolated and > worked around this by setting our processing_limit in the driver to > - -1. This means that *most* packet processing is now handled in the > MSI-X handler instead of being deferred. Some periodic interference > is still detectable via em_local_timer() which causes one of these > "fake" interrupt assertions in the normal, card is *not* hung case. > > Both functions use identical code for a start. Both end up down > inside of em_rxeof() to process packets. Both drop the RX lock prior > to handing the data up the network stack. > > This means that the em_handle_que running from the taskqueue will be > preempted. Dtrace confirms that this allows out of order processing > to occur at times and generates a lot of resets. > > The reason I'm bringing this up on -arch and not on -net is that this > is a common design pattern in some of the Ethernet drivers. We've > done preliminary tests on a patch that moves *all* processing of RX > packets to the rx_task taskqueue, which means that em_handle_que is > now the only path to get packets processed. > > > https://people.freebsd.org/~sbruno/em_interupt_to_taskqueue.diff > > My sense is that this is a slightly "better" method to handle the > packets but removes some immediacy from packet processing since all > processing is deferred. However, all packet processing is now > serialized per queue, which I think is the proper implementation. > > Am I smoking "le dope" here or is this the way forward? > > Which versions of the driver have this problem? -- Best regards Hooman Fazaeli From owner-freebsd-arch@freebsd.org Mon Aug 31 00:00:10 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2793D9C6C5A for ; Mon, 31 Aug 2015 00:00:10 +0000 (UTC) (envelope-from jmg@gold.funkthat.com) Received: from gold.funkthat.com (gate2.funkthat.com [208.87.223.18]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "gold.funkthat.com", Issuer "gold.funkthat.com" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id E1732E17; Mon, 31 Aug 2015 00:00:09 +0000 (UTC) (envelope-from jmg@gold.funkthat.com) Received: from gold.funkthat.com (localhost [127.0.0.1]) by gold.funkthat.com (8.14.5/8.14.5) with ESMTP id t7V003EE019461 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Sun, 30 Aug 2015 17:00:03 -0700 (PDT) (envelope-from jmg@gold.funkthat.com) Received: (from jmg@localhost) by gold.funkthat.com (8.14.5/8.14.5/Submit) id t7V003Yl019460; Sun, 30 Aug 2015 17:00:03 -0700 (PDT) (envelope-from jmg) Date: Sun, 30 Aug 2015 17:00:03 -0700 From: John-Mark Gurney To: Adrian Chadd Cc: Jack Vogel , Sean Bruno , "freebsd-arch@freebsd.org" Subject: Re: Network card interrupt handling Message-ID: <20150831000003.GJ33167@funkthat.com> References: <55DDE9B8.4080903@freebsd.org> <20150828184800.GE33167@funkthat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Operating-System: FreeBSD 9.1-PRERELEASE amd64 X-PGP-Fingerprint: 54BA 873B 6515 3F10 9E88 9322 9CB1 8F74 6D3F A396 X-Files: The truth is out there X-URL: http://resnet.uoregon.edu/~gurney_j/ X-Resume: http://resnet.uoregon.edu/~gurney_j/resume.html X-TipJar: bitcoin:13Qmb6AeTgQecazTWph4XasEsP7nGRbAPE X-to-the-FBI-CIA-and-NSA: HI! HOW YA DOIN? can i haz chizburger? User-Agent: Mutt/1.5.21 (2010-09-15) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (gold.funkthat.com [127.0.0.1]); Sun, 30 Aug 2015 17:00:03 -0700 (PDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Aug 2015 00:00:10 -0000 Adrian Chadd wrote this message on Fri, Aug 28, 2015 at 12:41 -0700: > [snip] > > Well, the other big reason for doing it deferred like this is to avoid > network based deadlocks because you're being fed packets faster than > you can handle them. If you never yield, you stop other NIC > processing. You snipped the part of me asking isn't the interrupt thread just the same interruptable context as the task queue? Maybe the priority is different, but that can be adjusted to be the same and still save the context switch... There is no break/moderation in the taskqueue, as it'll just enqueue itself, and when the task queue breaks out, it'll just immediately run itself, since it has a dedicated thread to itself... So, looks like you get the same spinning behavior... > People used to do run-to-completion and then complained when this > happened, so polling was a thing. Maybe when using PCI shared interrupts, but we are talking about PCIe MSI-X unshared interrupts. > So - I'm all for doing it with a fast interrupt handler and a fast > taskqueue. As long as we don't run things to completion and > re-schedule the taskqueue (so other things on that core get network > processing) then I'm okay. > > (I kinda want us to have NAPI at some point...) -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not." From owner-freebsd-arch@freebsd.org Mon Aug 31 00:24:49 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7154D9C66EF for ; Mon, 31 Aug 2015 00:24:49 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-io0-x230.google.com (mail-io0-x230.google.com [IPv6:2607:f8b0:4001:c06::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3E4761863; Mon, 31 Aug 2015 00:24:49 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: by iod35 with SMTP id 35so12334426iod.3; Sun, 30 Aug 2015 17:24:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=yMrqHCW6ifCf0bTBACmQnwqe9J1FW2+cly1O/djyelA=; b=UzY4wF9rcqF8DRgN45GQBZcD63Eyx4gFqFZMWKuuCBpjb5s0b4ZIelLmUXIorpS56p 6um9PU2NjpLds3UEuJBADtqJQvhF55nSGCiEarJ5I3YnZGUk1z+izRDUXaRLCLjS0nTe BtZWXAWH04E4hmIan1hSomi9Mmh7NnfB8NUxsazyPpo7DfAxnSOGVfCC9GVCuzO4IY5C ScpHKU0RDECe9ypSCJhWhqKUleGjHeCOtvjFr2m981eAFuz+ahEICSF3QRdZ/LDlaHZF XK5aLr9GoXNHXvPyzzHSV9xQ7Vze2uK/FcVpaF+jZhSroi6uqvk+Lia6t3I4guPF6ygR +4oA== MIME-Version: 1.0 X-Received: by 10.107.154.212 with SMTP id c203mr3370414ioe.123.1440980688639; Sun, 30 Aug 2015 17:24:48 -0700 (PDT) Received: by 10.36.28.208 with HTTP; Sun, 30 Aug 2015 17:24:48 -0700 (PDT) In-Reply-To: <20150831000003.GJ33167@funkthat.com> References: <55DDE9B8.4080903@freebsd.org> <20150828184800.GE33167@funkthat.com> <20150831000003.GJ33167@funkthat.com> Date: Sun, 30 Aug 2015 17:24:48 -0700 Message-ID: Subject: Re: Network card interrupt handling From: Adrian Chadd To: John-Mark Gurney Cc: Jack Vogel , Sean Bruno , "freebsd-arch@freebsd.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Aug 2015 00:24:49 -0000 On 30 August 2015 at 17:00, John-Mark Gurney wrote: > Adrian Chadd wrote this message on Fri, Aug 28, 2015 at 12:41 -0700: >> [snip] >> >> Well, the other big reason for doing it deferred like this is to avoid >> network based deadlocks because you're being fed packets faster than >> you can handle them. If you never yield, you stop other NIC >> processing. > > You snipped the part of me asking isn't the interrupt thread just the > same interruptable context as the task queue? Maybe the priority is > different, but that can be adjusted to be the same and still save the > context switch... > > There is no break/moderation in the taskqueue, as it'll just enqueue > itself, and when the task queue breaks out, it'll just immediately run > itself, since it has a dedicated thread to itself... So, looks like > you get the same spinning behavior... > >> People used to do run-to-completion and then complained when this >> happened, so polling was a thing. > > Maybe when using PCI shared interrupts, but we are talking about PCIe > MSI-X unshared interrupts. Well, try it and see what happens. You can still get network livelock and starvation of other interfaces with ridiculously high pps if you never yield. :P -adrian From owner-freebsd-arch@freebsd.org Mon Aug 31 15:49:29 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 488A69C7A3D for ; Mon, 31 Aug 2015 15:49:29 +0000 (UTC) (envelope-from sbruno@freebsd.org) Received: from mail.ignoranthack.me (ignoranthack.me [199.102.79.106]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2DBC024A for ; Mon, 31 Aug 2015 15:49:28 +0000 (UTC) (envelope-from sbruno@freebsd.org) Received: from [192.168.200.200] (unknown [50.136.155.142]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: sbruno@ignoranthack.me) by mail.ignoranthack.me (Postfix) with ESMTPSA id 802FD193655 for ; Mon, 31 Aug 2015 15:49:22 +0000 (UTC) Subject: Re: Network card interrupt handling To: freebsd-arch@freebsd.org References: <55DDE9B8.4080903@freebsd.org> <20150828184800.GE33167@funkthat.com> From: Sean Bruno X-Enigmail-Draft-Status: N1110 Message-ID: <55E47781.7020009@freebsd.org> Date: Mon, 31 Aug 2015 08:49:21 -0700 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0 MIME-Version: 1.0 In-Reply-To: <20150828184800.GE33167@funkthat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Aug 2015 15:49:29 -0000 -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 > I have a better question, for MSI-X, we have a dedicated interrupt > thread to do the processing, so why are we even doing any > moderation in this case? It's not any different than spinning in > the task queue.. > > How about the attached patch that just disables taskqueue > processing for MSI-X RX interrupts, and does all processing in the > interrupt thread? This is another design that I had thought of. For em(4) when using seperate ISR threads for *each* rx queue and *each* tx queue, I think that doing processing in the interrupt thread is the right thing to do. I'm unsure of what the correct thing to do when tx/rx is combined into a single handler though (igb/ix for example). This would lead to possible starvation as Adrian has pointed out. There is nothing stopping us from breaking the queues apart into seperate tx/rx threads of execution for these drivers. em(4) was my little science project to see what the behavior would be. > > Do you need to add the rx_task to the em_local_timer? If so, then > I would look at setting a flag in the _rxeof that says that > processing is happening... and in the case of the taskqueue, when > it sees this flag set, it just exits, while for the interrupt > filter case, we'd need to be more careful (possibly set a flag that > the taskqueue will inspect, and cause it to stop processing the rx > queue)... > ^^ I'll ponder this a bit further today and comment after coffee. > btw, since you're hacking on em a lot, interrested in fixing em's > jumbo frames so it doesn't use 9k clusters, but instead page sized > clusters? > > Uh ... hrm. I can look into it, but would need more details as I'm pretty ignorant of what you're referring to. Ping me off list and I'll take a look (jumbo frames is out of scope for $dayjob). sean -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQF8BAEBCgBmBQJV5Hd+XxSAAAAAAC4AKGlzc3Vlci1mcHJAbm90YXRpb25zLm9w ZW5wZ3AuZmlmdGhob3JzZW1hbi5uZXRCQUFENDYzMkU3MTIxREU4RDIwOTk3REQx MjAxRUZDQTFFNzI3RTY0AAoJEBIB78oecn5kL4wH/AmMvnZ7EKfF04qKhUxdOA90 YN5OZpyeXc8zk0QUq+INNMIHiQJN1wCHiOVd46YIuwjdWeSvHxKgnJMV1whDod55 c4QO6WG5yRcP5Wl30YN5XhjfUW48MYytYXxlAY5cC5A+TIUq/HywSNmyEVxKAooY lSw+8Zpdzaozv1LxS70bRggi9y/y5NEgcVViO1cjip+nkl3eNvYOFq5jp2KJc0vS +oe/wqbF5syRiBO4R2XnJs6PJ9BALOF73pFteHBebAe5jUwt6UD17c35/I2B4v60 +zNuM3rdNdDCOecFEdFctOQe6XDpSAu6Q7Dv88SKoKeIs+2lXHD/AJf24heTQOg= =oY0k -----END PGP SIGNATURE----- From owner-freebsd-arch@freebsd.org Mon Aug 31 21:33:33 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 62D929C6EFC for ; Mon, 31 Aug 2015 21:33:33 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 41ADD1603 for ; Mon, 31 Aug 2015 21:33:33 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 22806B95D for ; Mon, 31 Aug 2015 17:33:32 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Subject: Re: Supporting cross-debugging vmcores in libkvm (Testing needed) Date: Mon, 31 Aug 2015 14:21:19 -0700 Message-ID: <4121266.HXN5YIfUMj@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: <5166205.rtMjEmhvmo@ralph.baldwin.cx> References: <3121152.ujdxFEovO3@ralph.baldwin.cx> <5166205.rtMjEmhvmo@ralph.baldwin.cx> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 31 Aug 2015 17:33:32 -0400 (EDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Aug 2015 21:33:33 -0000 On Wednesday, August 12, 2015 10:50:20 AM John Baldwin wrote: > On Tuesday, August 04, 2015 10:56:09 AM John Baldwin wrote: > > Many debuggers (recent gdb and lldb) support cross-architecture debugging > > just fine. My current WIP port of kgdb to gdb7 supports cross-debugging for > > remote targets already, but I wanted it to also support cross-debugging for > > vmcores. > > > > The existing libkvm/kgdb code in the tree has some limited support for > > cross-debugging. It requires building a custom libkvm (e.g. libkvm-i386.a) > > and custom kgdb for each target platform. However, gdb (and lldb) both > > support multiple targets in a single binary, so I'd like to have a single > > kgdb binary that can cross-debug anything. > > > > I started hacking on libkvm last weekend and have a prototype that I've used > > (along with some patches to my kgdb port) to debug an amd64 vmcore on an > > i386 machine and vice versa. > > > > ... > > > > What I'm mostly after is comments on the API, etc. Once that is settled I > > will move forward on converting and/or stubbing the other backends (the > > stub route would be to only support other backends on native systems for > > now). > > I guess this is closer to a nuclear power plant than a bikeshed judging by the > feedback. I have ported the rest of the MD backends and verified that the > updated libkvm passes a universe build (including various static assertions > for the duplicated constants in other backends). What I have not done is any > runtime testing and I would like to ask for help with that now. In particular > I need someone to test that kgdb and/or ps works against a native core dump > on all platforms other than amd64 and i386. Note that some of the trickiness > is that the backends now have to make runtime decisions for things that were > previously compile-time decisions. The biggest one affected by this is the > MIPS backend as that backend handles three ABIs (mipso32, mipsn32, and mipsn64). > I believe I have the handling for that correct (mips[on]32 use 32-bit KSEGs > where as mipsn64 uses the extended segments and compat32 KSEGS, and mipso32 > uses 32-bit PTEs and mipsn32/n64 both use 64-bit PTEs) (plus both endians > for both in theory). The ARM backend also handles both endians (in theory). > > Another wrinkle is that sparc64 uses its own dump format instead of writing > out an ELF file. I had to convert the header structures to use fixed-width > types to be cross-friendly. It would be good to ensure that a new libkvm > can read a vmcore from an old kernel and vice versa to make sure my conversion > is correct (I added an explicit padding field that I believe was implicit > before). > > The code is currently available for review in phabric at > https://reviews.freebsd.org/D3341 > > To test, you can run 'arc patch D3341' in a clean tree to apply the patch. I've just rebased this to port aarch64's minidump support. I just need people willing and able to test on non-x86. Testing with the in-tree kgdb using an updated libkvm would be sufficient. -- John Baldwin From owner-freebsd-arch@freebsd.org Mon Aug 31 21:33:38 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7B2129C6F22 for ; Mon, 31 Aug 2015 21:33:38 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5A32817EE for ; Mon, 31 Aug 2015 21:33:38 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6F25BB98D; Mon, 31 Aug 2015 17:33:37 -0400 (EDT) From: John Baldwin To: John-Mark Gurney Cc: 'freebsd-arch' Subject: Re: Supporting cross-debugging vmcores in libkvm Date: Fri, 28 Aug 2015 16:37:56 -0700 Message-ID: <8789359.0qvzdHHvAS@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: <20150828211952.GG33167@funkthat.com> References: <3121152.ujdxFEovO3@ralph.baldwin.cx> <3887505.r5DL7PVlOf@ralph.baldwin.cx> <20150828211952.GG33167@funkthat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 31 Aug 2015 17:33:37 -0400 (EDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Aug 2015 21:33:38 -0000 On Friday, August 28, 2015 02:19:52 PM John-Mark Gurney wrote: > John Baldwin wrote this message on Fri, Aug 28, 2015 at 13:39 -0700: > > On Friday, August 28, 2015 11:27:05 AM John-Mark Gurney wrote: > > > John Baldwin wrote this message on Tue, Aug 04, 2015 at 10:56 -0700: > > > How will we prevent native only aware apps from getting confused when > > > accessing non-native cores? Will kvm_openfiles fail for non-native > > > cores? or will kvm_read fail for non-native cores? > > > > kvm_openfiles() will fail. kvm_open2() will fail for a non-native core > > if a symbol resolving routine is not supplied. > > > > One API question I had is if it would be useful to allow a void * cookie > > to be passed to the symbol resolving routine (the same cookie would be > > passed to kvm_open2() and stored internally to be passed on each resolution > > request). I think in practice we don't need that level of complexity > > though (my kgdb changes did not). > > I can't think of a reason it would be required, but that doesn't mean > someone else wouldn't need it... You need to resolve symbols to find the root of the global page table structures that let you do virtual to physical translations. > Though wouldn't the core parser provide the symbol lookup function? > > > I will need to rebase this to port the arm64 minidump support over, but > > I also need people to test this. > > I'll see what I can do to help test it... Mostly it needs testing on non-x86. -- John Baldwin From owner-freebsd-arch@freebsd.org Mon Aug 31 21:33:34 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A4C339C6F04 for ; Mon, 31 Aug 2015 21:33:34 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 829AF1625; Mon, 31 Aug 2015 21:33:34 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 925E3B913; Mon, 31 Aug 2015 17:33:33 -0400 (EDT) From: John Baldwin To: freebsd-arch@freebsd.org Cc: John-Mark Gurney , Adrian Chadd , Sean Bruno , Jack Vogel Subject: Re: Network card interrupt handling Date: Mon, 31 Aug 2015 14:18:17 -0700 Message-ID: <5699836.EEH7cPOQBG@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: <20150831000003.GJ33167@funkthat.com> References: <55DDE9B8.4080903@freebsd.org> <20150831000003.GJ33167@funkthat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 31 Aug 2015 17:33:33 -0400 (EDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 31 Aug 2015 21:33:34 -0000 On Sunday, August 30, 2015 05:00:03 PM John-Mark Gurney wrote: > Adrian Chadd wrote this message on Fri, Aug 28, 2015 at 12:41 -0700: > > [snip] > > > > Well, the other big reason for doing it deferred like this is to avoid > > network based deadlocks because you're being fed packets faster than > > you can handle them. If you never yield, you stop other NIC > > processing. > > You snipped the part of me asking isn't the interrupt thread just the > same interruptable context as the task queue? Maybe the priority is > different, but that can be adjusted to be the same and still save the > context switch... > > There is no break/moderation in the taskqueue, as it'll just enqueue > itself, and when the task queue breaks out, it'll just immediately run > itself, since it has a dedicated thread to itself... So, looks like > you get the same spinning behavior... Yes, that is true and why all the interrupt moderation stuff in the NIC drivers that I've seen has always been pointless. All it does is add extra overhead since you waste time with extra context switches back to yourself in between servicing packets. It does not permit any other NICs to run at all. (One of the goals of my other patches that I mentioned is to make it possible for multiple devices to share ithreads even when using discrete interrupts (e.g. MSI) so that the yielding done actually would give a chance for other devices to run, but currently it is all just a waste of CPU cycles). If you think this actually helps, I challenge to you capture a KTR_SCHED trace of it ever working as intended. -- John Baldwin From owner-freebsd-arch@freebsd.org Tue Sep 1 01:05:19 2015 Return-Path: Delivered-To: freebsd-arch@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C24F39C7DDF for ; Tue, 1 Sep 2015 01:05:19 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [IPv6:2001:470:1f11:75::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9AB5F68D; Tue, 1 Sep 2015 01:05:19 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from ralph.baldwin.cx (c-73-231-226-104.hsd1.ca.comcast.net [73.231.226.104]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 6E526B913; Mon, 31 Aug 2015 21:05:18 -0400 (EDT) From: John Baldwin To: "K. Macy" Cc: freebsd-arch@freebsd.org, Sean Bruno Subject: Re: Network card interrupt handling Date: Mon, 31 Aug 2015 14:41:14 -0700 Message-ID: <1709356.SnmUAQFSba@ralph.baldwin.cx> User-Agent: KMail/4.14.3 (FreeBSD/10.2-PRERELEASE; KDE/4.14.3; amd64; ; ) In-Reply-To: References: <55DDE9B8.4080903@freebsd.org> <24017021.PxBoCiQKDJ@ralph.baldwin.cx> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Mon, 31 Aug 2015 21:05:18 -0400 (EDT) X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Sep 2015 01:05:20 -0000 On Friday, August 28, 2015 06:25:53 PM K. Macy wrote: > On Aug 28, 2015 12:59 PM, "John Baldwin" wrote: > > > > On Wednesday, August 26, 2015 09:30:48 AM Sean Bruno wrote: > > > We've been diagnosing what appeared to be out of order processing in > > > the network stack this week only to find out that the network card > > > driver was shoveling bits to us out of order (em). > > > > > > This *seems* to be due to a design choice where the driver is allowed > > > to assert a "soft interrupt" to the h/w device while real interrupts > > > are disabled. This allows a fake "em_msix_rx" to be started *while* > > > "em_handle_que" is running from the taskqueue. We've isolated and > > > worked around this by setting our processing_limit in the driver to > > > -1. This means that *most* packet processing is now handled in the > > > MSI-X handler instead of being deferred. Some periodic interference > > > is still detectable via em_local_timer() which causes one of these > > > "fake" interrupt assertions in the normal, card is *not* hung case. > > > > > > Both functions use identical code for a start. Both end up down > > > inside of em_rxeof() to process packets. Both drop the RX lock prior > > > to handing the data up the network stack. > > > > > > This means that the em_handle_que running from the taskqueue will be > > > preempted. Dtrace confirms that this allows out of order processing > > > to occur at times and generates a lot of resets. > > > > > > The reason I'm bringing this up on -arch and not on -net is that this > > > is a common design pattern in some of the Ethernet drivers. We've > > > done preliminary tests on a patch that moves *all* processing of RX > > > packets to the rx_task taskqueue, which means that em_handle_que is > > > now the only path to get packets processed. > > > > It is only a common pattern in the Intel drivers. :-/ We (collectively) > > spent quite a while fixing this in ixgbe and igb. Longer (hopefully more > > like medium) term I have an update to the interrupt API I want to push in > > that allows drivers to manually schedule interrupt handlers using an > > 'hwi' API to replace the manual taskqueues. This also ensures that > > the handler that dequeues packets is only ever running in an ithread > > context and never concurrently. > > > > Jeff has a generalization of the net_task infrastructure used at Nokia > called grouptaskq that I've used for iflib. That does essentially what you > refer to. I've converted ixl and am currently about to test an ixgbe > conversion. I anticipate converting mlxen, all Intel drivers as well as the > remaining drivers with device specific code in netmap. The one catch is > finding someone who will publicly admit to owning re hardware so that I can > buy it from him and test my changes. Note that the ithread changes I refer to are for all devices (not just network interfaces) and fix some other issues as well (e.g. INTR_FILTER is always enabled and races with tearing down filters are closed, it also uses a more thread_lock()-friendly state for idle ithreads, and it also allows us to experiment with sharing ithreads among devices as well as having multiple threads service a queue of interrupt handlers if desired). It may be that this will make your life easier since you might be able to reuse the new primitives more directly rather than bypassing ithreads. I've posted the changes to arch@ a few different times over the past several years just haven't pushed them in. (They aren't perfect in that I don't yet have APIs for changing the plumbing around due to lack of use cases to build the APIs from.) -- John Baldwin