From owner-svn-src-all@freebsd.org Sun Nov 24 19:42:46 2019 Return-Path: Delivered-To: svn-src-all@mailman.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.nyi.freebsd.org (Postfix) with ESMTP id C9F4D1B8D39 for ; Sun, 24 Nov 2019 19:42:46 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-qv1-xf41.google.com (mail-qv1-xf41.google.com [IPv6:2607:f8b0:4864:20::f41]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) server-signature RSA-PSS (4096 bits) client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1O1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 47LgYd55xdz4P0l for ; Sun, 24 Nov 2019 19:42:45 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: by mail-qv1-xf41.google.com with SMTP id y18so4881136qve.2 for ; Sun, 24 Nov 2019 11:42:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=aJytov4nnx8d2U9RkRfcMod5EsMojNiBu1tVcy+lUEc=; b=iLmD4dmQtsZ0cesP0on0+IEHT6jyQQbtbeOZC7jbnONuZhF9hrLTEKNyw4XIJg1mdt bGySK/EI65jNRE2mxetZVL1j2LvOOAXVaQ/eXpC9sTqAZIQkv6nShRmgqyntmXj6+i7C RU8Bvezw7RfHbtfch8bU5EvqjqffKaRasv7YOa6gvxm/vQpYpvRanYTlbVtZiZ2scJfD jS9XlJMkspCDqcwFBySSiMohVuQCR+jDGUzC7M84l4HCcF2IaDRDSUu7xvCBWb+iZ6bm j+igs02tOKvBmXqiTtNhNB3/IXbQ/hAxJjJ/p/uI9Ehnu9HOgQXzwgxsalefkqgjtdI3 L3hw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=aJytov4nnx8d2U9RkRfcMod5EsMojNiBu1tVcy+lUEc=; b=r1XJig39COtaMzqXcWdL+LrRKlqY7xnKgeyezy+gyDiFGDj9AMuP0OHscqe4mqxo/v uLUtiiwZMDGxUD6TvGbhV+9YN7prNThWdu++ATaafe+2p2VioDPhUuUKhecf7qs+kJGa qO06B9c+CrDaO4vHS+P8ZEO44htfbJNetaGEk9UB7SEMT7bzOJ9maHAipwSM4LjUkEmc eax6HyqlMaHbpFvs3kYnU90qlK1mjwlYoDr009yH9SxzyQ1cpf+KIQaBaFDjnMMdER76 ihxpLOGLTtsGtbc2Q2kFls+cApwPYBP363YLXyI/PsrvbpA4YMMpyuch9itu7q3tRkHt Gnag== X-Gm-Message-State: APjAAAXL/mt9PAdfYys499xq6sBR7005rO6ghB7m/O20G09hoM7Mdxez WhPJpht5D5IUSztxZbYhcJspBReY1vSrZCQGmEHyXw== X-Google-Smtp-Source: APXvYqxKtS8vafKqEqRcjSfExGdeImlUrK8Ddb0aJ1yWblz7ov698GudOB/l3wvPUE6tuH9Rdq9Kj8IhMG1AVDeWi6o= X-Received: by 2002:a0c:8506:: with SMTP id n6mr12954277qva.87.1574624558245; Sun, 24 Nov 2019 11:42:38 -0800 (PST) MIME-Version: 1.0 References: <201911232343.xANNhqkQ097797@repo.freebsd.org> <20191124131010.GB2707@kib.kiev.ua> <20191124164536.GC2707@kib.kiev.ua> In-Reply-To: <20191124164536.GC2707@kib.kiev.ua> From: Warner Losh Date: Sun, 24 Nov 2019 12:42:26 -0700 Message-ID: Subject: Re: svn commit: r355037 - head/sys/dev/pci To: Konstantin Belousov Cc: Warner Losh , src-committers , svn-src-all@freebsd.org, svn-src-head@freebsd.org X-Rspamd-Queue-Id: 47LgYd55xdz4P0l X-Spamd-Bar: --- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=bsdimp-com.20150623.gappssmtp.com header.s=20150623 header.b=iLmD4dmQ; dmarc=none; spf=none (mx1.freebsd.org: domain of wlosh@bsdimp.com has no SPF policy when checking 2607:f8b0:4864:20::f41) smtp.mailfrom=wlosh@bsdimp.com X-Spamd-Result: default: False [-3.05 / 15.00]; ARC_NA(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; R_DKIM_ALLOW(-0.20)[bsdimp-com.20150623.gappssmtp.com:s=20150623]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; MIME_GOOD(-0.10)[multipart/alternative,text/plain]; PREVIOUSLY_DELIVERED(0.00)[svn-src-all@freebsd.org]; DMARC_NA(0.00)[bsdimp.com]; RCPT_COUNT_FIVE(0.00)[5]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[bsdimp-com.20150623.gappssmtp.com:+]; R_SPF_NA(0.00)[]; FORGED_SENDER(0.30)[imp@bsdimp.com,wlosh@bsdimp.com]; FREEMAIL_TO(0.00)[gmail.com]; MIME_TRACE(0.00)[0:+,1:+,2:~]; IP_SCORE(-1.06)[ip: (-0.99), ipnet: 2607:f8b0::/32(-2.28), asn: 15169(-1.96), country: US(-0.05)]; ASN(0.00)[asn:15169, ipnet:2607:f8b0::/32, country:US]; FROM_NEQ_ENVFROM(0.00)[imp@bsdimp.com,wlosh@bsdimp.com]; RCVD_TLS_ALL(0.00)[]; RCVD_COUNT_TWO(0.00)[2] Content-Type: text/plain; charset="UTF-8" X-Content-Filtered-By: Mailman/MimeDel 2.1.29 X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 24 Nov 2019 19:42:46 -0000 On Sun, Nov 24, 2019, 9:45 AM Konstantin Belousov wrote: > On Sun, Nov 24, 2019 at 07:37:01AM -0700, Warner Losh wrote: > > On Sun, Nov 24, 2019, 6:10 AM Konstantin Belousov > > wrote: > > > > > On Sat, Nov 23, 2019 at 11:43:52PM +0000, Warner Losh wrote: > > > > Author: imp > > > > Date: Sat Nov 23 23:43:52 2019 > > > > New Revision: 355037 > > > > URL: https://svnweb.freebsd.org/changeset/base/355037 > > > > > > > > Log: > > > > Push Giant down one layer > > > > > > > > The /dev/pci device doesn't need GIANT, per se. However, one > routine > > > > that it calls, pci_find_dbsf implicitly does. It walks a list that > can > > > > change when PCI scans a new bus. With hotplug, this means we could > > > > have a race with that scanning. To prevent that, take out Giant > around > > > > scanning the list. > > > > > > > > However, given that we have places in the tree that drop giant, if > > > > held when we call into them, the whole use of Giant to protect > newbus > > > > may be less effective that we desire, so add a comment about why > we're > > > > talking it out, and we'll address the issue when we lock newbus > with > > > > something other than Giant. > > > > > > > > Modified: > > > > head/sys/dev/pci/pci.c > > > > head/sys/dev/pci/pci_user.c > > > > > > > > Modified: head/sys/dev/pci/pci.c > > > > > > > > ============================================================================== > > > > --- head/sys/dev/pci/pci.c Sat Nov 23 23:41:21 2019 > (r355036) > > > > +++ head/sys/dev/pci/pci.c Sat Nov 23 23:43:52 2019 > (r355037) > > > > @@ -445,18 +445,21 @@ pci_find_bsf(uint8_t bus, uint8_t slot, uint8_t > > > func) > > > > device_t > > > > pci_find_dbsf(uint32_t domain, uint8_t bus, uint8_t slot, uint8_t > func) > > > > { > > > > - struct pci_devinfo *dinfo; > > > > + struct pci_devinfo *dinfo = NULL; > > > > > > > > + /* Giant because newbus is Giant locked revisit with newbus > > > locking */ > > > > + mtx_lock(&Giant); > > > > STAILQ_FOREACH(dinfo, &pci_devq, pci_links) { > > > > if ((dinfo->cfg.domain == domain) && > > > > (dinfo->cfg.bus == bus) && > > > > (dinfo->cfg.slot == slot) && > > > > (dinfo->cfg.func == func)) { > > > > - return (dinfo->cfg.dev); > > > > + break; > > > > } > > > > } > > > > + mtx_unlock(&Giant); > > > > > > > > - return (NULL); > > > > + return (dinfo != NULL ? dinfo->cfg.dev : NULL); > > > I do not think this change is correct. If the parallel hotplug, or > > > rather, hot-unplug event occurs, then dinfo potentially becomes invalid > > > right after the Giant unlock, which makes both this function and its > > > callers to access freed memory. Having caller to lock a newbus lock > > > around both the call and consumption of the returned data is required. > > > > > > There are many data lifetime issues. If anything the PCI user device > calls > > drops Giant and then picks it back up again we are in the same boat... I > > totally agree this is a bad situation, but can only really be fixed by > > locking newbus with a different lock than Giant and likely using some > kind > > of reference count for device_t that are handed out... > > > > In the mean time, I'll move giant back up into the ioctl routine and hope > > it isn't dropped by things it calls..m > > I think we can start at least by marking the Giant acqusitions that are > related to newbus. > > I never saw anybody talking publically why the naive translation of newbus > Giant into a sleepable lock, e.g. sx in exclusive mode, cannot work. From > my memory, one of the big issues is that many sleeps done at probe/attach, > need to drop the newbus lock. > If we are holding a topology lock, we can't drop it... otherwise someone else can acquire it and change things out from us... Also, locks are useless for device_t lifetimes. Too many places cache them in non obvious ways, so we need to add some kind of ref count. And since these relationships are often not parent / child, we'd likely need a spoilage call to release refs. You shouldn't complete your detaching if others holding references. Finally, we walk both up and down the tree. When I tried a super naive lock in the past, I ran into a bunch of LORs due to this. And what do you do if multiple nodes in the tree want to start a detach/delete at the same time? Or if there is a suspend going on at the same time... These are all solvable problems, but simply replacing giant with a sx lock won't solve them and may introduce issues because giant is special in many ways... it would, however, fix the one aspect of giant we don't want: random parts of the kernel dropping it. Warner >