From owner-freebsd-current@freebsd.org Sat Dec 7 00:17:40 2019
Subject: Re: CAM breaks USB [was Re: USB causing boot to hang]
To: Alexander Motin, sgk@troutmask.apl.washington.edu
Cc: Warner Losh, FreeBSD Current
References: <20191206202316.GA1053@troutmask.apl.washington.edu>
 <20191206223144.GA3224@troutmask.apl.washington.edu>
 <20191206225231.GA949@troutmask.apl.washington.edu>
 <20191206234105.GA1027@troutmask.apl.washington.edu>
 <3df3ff25-9f62-6f0f-7823-e846a43725eb@FreeBSD.org>
From: Hans Petter Selasky
Message-ID: <3e5ead69-b933-70a4-a183-67552d8932fb@selasky.org>
Date: Sat, 7 Dec 2019 01:16:13 +0100
In-Reply-To: <3df3ff25-9f62-6f0f-7823-e846a43725eb@FreeBSD.org>

On 2019-12-07 01:09, Alexander Motin wrote:
> On 06.12.2019 18:41, Steve Kargl wrote:
>> On Fri, Dec 06, 2019 at 06:15:32PM -0500, Alexander Motin wrote:
>>> On 06.12.2019 17:52, Steve Kargl wrote:
>>>> On Fri, Dec 06, 2019 at 03:33:09PM -0700, Warner Losh wrote:
>>>>> On Fri, Dec 6, 2019 at 3:31 PM Steve Kargl
>>>>> wrote:
>>>>>> The problem seems to be caused 355010. This is a commit to
>>>>>> fix CAM, which seems to break USB.
>>>>>>
>>>>> Yes. mav@ made this change...
>>>>>
>>>> src/UPDATING seems to be missing an entry about CAM breaking USB.
>>>
>>> And also that moon is made of cheese. :-\
>>
>> Not sure what you mean.
>
> I mean that if we are going to write there random fairy-tales, then I
> prefer my moon.
>
> If serious, then my change did not change semantics of any existing
> tunables, only the way some of them are implemented, so there was
> nothing to write in UPDATING.
>
>> You made a change, and the commit log
>> even notes that there could be an issue.
>> Yet, you want a user to waste half a day finding the root cause of
>> the problem.
>
> I am sorry that you wasted your time, but quick and ungrounded blames is
> the last thing I want to read on Friday evening after the long day.
>
>>>> The commit message for 355010 states:
>>>>
>>>>    Devices appearing on USB bus later may still require setting
>>>>    kern.cam.boot_delay, but hopefully those are minority.
>>>>
>>>> There is no statement about "where" kern.cam.boot_delay should be set.
>>>> There is no statement about "what" value(s) kern.cam.boot_delay should be.
>>>
>>> If you never needed it before, you still don't need it.
>>
>> Prior to 355010 the system just boots up. After 355010
>> the system hangs. Will kern.cam.boot_delay paper over
>> whatever (latent?) bug you've exposed?
>
> My change affected the timing of system boot process, allowing system to
> continue booting some further, not waiting for CAM to scan its buses and
> disks. If the problem is reproducible even without USB storage, then
> CAM probably does not wait for it, so it is not the problem I first
> thought about.
>
>>> If system hangs even without any USB disk attached, then I don't see a
>>> relation between CAM and USB here. My change could affect some timings
>>> of the boot process, but without closer debugging it is hard to guess
>>> something. To be sure whether USB is related I would try to disable all
>>> USB controllers either in BIOS or with set of loader tunables like
>>> hint.ehci.0.disabled=1 , hint.ohci.0.disabled=1 ,
>>> hint.xhci.0.disabled=1, ...
>>
>> Yep. Completely disabling USB allows the system to boot. I don't
>> see how this would be unexpected as umass using cam.
>
> umass uses CAM, but you've told the problem happens even without umass,
> that is why I told that I don't see any relation. Does disabling of
> _all_ USB fixes the problem? Have you tried to narrow it down to
> specific controller or device?
>
> Is there anything special in your system?
> Are you running GENERIC kernel? If not, then what do you have changed?
>
> If your kernel includes VERBOSE_SYSINIT as GENERIC does, I would try to
> set debug.verbose_sysinit=1 and see how far the boot process goes and at
> which stage it may is hanging (if we guess that hang is related to the
> stage and not asynchronous).
>

Hi,

There is an option you can compile into the kernel which will allow the
keyboard to enter the debugger.

    options ALT_BREAK_TO_DEBUGGER

Sounds to me like either a leaked refcount or that one thread is
spinning blocking execution of other threads.

--HPS
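
For reference, the loader tunables mentioned in this thread could be
collected in /boot/loader.conf roughly as follows. This is an untested
sketch, not a recommended configuration: the unit number 0 assumes a
single controller of each type, the kern.cam.boot_delay value is purely
illustrative (the thread gives none; the unit is milliseconds as far as
I know), and debug.verbose_sysinit only has an effect on a kernel built
with VERBOSE_SYSINIT.

    # Disable USB host controllers to check whether USB is involved.
    # Re-enable them one at a time to narrow the hang down to one driver.
    hint.xhci.0.disabled="1"
    hint.ehci.0.disabled="1"
    hint.ohci.0.disabled="1"

    # Print each SYSINIT as it runs, to see how far the boot gets.
    debug.verbose_sysinit="1"

    # Extra wait for late-appearing USB storage (assumed example value).
    kern.cam.boot_delay="10000"

The same assignments can also be typed at the loader prompt with "set"
before "boot", which avoids editing a file on a system that does not
come up.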