From: Warner Losh
Date: Fri, 6 Dec 2019 17:33:33 -0700
Subject: Re: CAM breaks USB [was Re: USB causing boot to hang]
To: Steve Kargl
Cc: Alexander Motin, FreeBSD Current

On Fri, Dec 6, 2019 at 4:41 PM Steve Kargl wrote:

> On Fri, Dec 06, 2019 at 06:15:32PM -0500, Alexander Motin wrote:
> > On 06.12.2019 17:52, Steve Kargl wrote:
> > > On Fri, Dec 06, 2019 at 03:33:09PM -0700, Warner Losh wrote:
> > >> On Fri, Dec 6, 2019 at 3:31 PM Steve Kargl <sgk@troutmask.apl.washington.edu>
> > >> wrote:
> > >>
> > >>> On Fri, Dec 06, 2019 at 12:23:16PM -0800, Steve Kargl wrote:
> > >>>> I updated /usr/src to r355452, and updated my kernel and
> > >>>> world. Upon rebooting, verbose boot messages suggest
> > >>>> the system is hanging when USB starts to attach. With
> > >>>> the 3-week-old kernel, verbose boot shows:
> > >>>>
> > >>>> ...
> > >>>> pcm4: Playback channel matrix is: 2.0 (unknown)
> > >>>> usbus0: 5.0Gbps Super Speed USB v3.0
> > >>>> ...
> > >>>>
> > >>>> and ends with a prompt on the console. With today's kernel,
> > >>>> boot hangs after the last pcm4: message and no usbus0
> > >>>> is displayed.
> > >>>>
> > >>>> The booting kernel/system is
> > >>>>
> > >>>> % uname -a
> > >>>> FreeBSD 13.0-CURRENT #1 r354658: Wed Nov 13 11:23:32 PST 2019, amd64
> > >>>>
> > >>>> Again, the failing kernel is r355452.
> > >>>>
> > >>>
> > >>> The problem seems to be caused by 355010. This is a commit to
> > >>> fix CAM, which seems to break USB.
> > >>>
> > >>
> > >> Yes. mav@ made this change...
> > >>
> > >
> > > src/UPDATING seems to be missing an entry about CAM breaking USB.
> >
> > And also that the moon is made of cheese. :-\
> >
>
> Not sure what you mean. You made a change, and the commit log
> even notes that there could be an issue. Yet you want a user
> to waste half a day finding the root cause of the problem.
>
> > > The commit message for 355010 states:
> > >
> > >   Devices appearing on USB bus later may still require setting
> > >   kern.cam.boot_delay, but hopefully those are minority.
> > >
> > > There is no statement about "where" kern.cam.boot_delay should be set.
> > > There is no statement about "what" value(s) kern.cam.boot_delay should be.
> >
> > If you never needed it before, you still don't need it.
>
> Prior to 355010 the system just booted up. After 355010
> the system hangs. Will kern.cam.boot_delay paper over
> whatever (latent?) bug you've exposed?
>
> > > For the record, adding kern.cam.boot_delay to /boot/loader.conf with
> > > the values 0, 1, and "1" did not allow the system to boot.
> >
> > The boot_delay value is measured in milliseconds, so values of 0 and 1 mean
> > close to nothing. You may try setting it to something like 10000 if you
> > really want to delay CAM device attach, but I doubt it will help.
>
> 0 and 1 were my guesses, assuming boot_delay was an integer representation
> of a boolean value: 0 disabling the new code, 1 enabling the new
> code. Looks like I guessed wrong given the documentation.
>
> > > The system
> > > will not boot with or without
> > >
> > > umass0 on uhub1
> > > umass0: on usbus0
> > > umass0: SCSI over Bulk-Only; quirks = 0x0100
> > > umass0:9:0: Attached to scbus9
> > > da0 at umass-sim0 bus 0 scbus9 target 0 lun 0
> > > da0: Fixed Direct Access SPC-4 SCSI device
> > > da0: Serial Number NA7PEG27
> > > da0: 400.000MB/s transfers
> > > da0: 3815447MB (7814037167 512 byte sectors)
> > > da0: quirks=0x2
> > >
> > > plugged into the port.
> >
> > If the system hangs even without any USB disk attached, then I don't see a
> > relation between CAM and USB here. My change could affect some timings
> > of the boot process, but without closer debugging it is hard to guess
> > anything. To be sure whether USB is related, I would try to disable all
> > USB controllers, either in the BIOS or with a set of loader tunables like
> > hint.ehci.0.disabled=1, hint.ohci.0.disabled=1,
> > hint.xhci.0.disabled=1, ...
>
> Yep. Completely disabling USB allows the system to boot. I don't
> see how this would be unexpected, as umass uses CAM.
>

There is a long, tangled history of multiple mechanisms being used to
control the release of mountroot() to do its thing. CAM historically used
one method, while USB used another. I've not closely reviewed this change
to see what the issue might be, but if the system booted before and doesn't
now, then a de facto bug has been introduced or exposed by this change.

Maybe it would be better to back out 355010 and have it reviewed and tested
more carefully. It's been tricky to get right in the past, and since issues
have come up, it might be best to take a more conservative approach. If we
can't get a quick resolution, I'd recommend that we go this route...

Looking at the change, I see that it is a bit weird... It ditches the 'do
all the config intr hook stuff' approach, which completes before we look at
the root holds, in favor of using only the root holds. In theory, this
should be fine... however, USB takes root holds for its uhub exploration,
which then finds umass, which needs its own enumeration before it's usable
as root. I need to see what interlocks are there, but it does look a little
like USB config and CAM config may race more now than before. I say 'might'
because I've not looked at all the places where things are held, released,
etc.

Disabling USB is a big clue, but I'm not entirely sure what it's a clue of.
I think it means it's disabling the other half of the race, but it could
also be sidestepping a deadlock between threads that could never deadlock
before.

Warner