From nobody Thu May 15 16:29:43 2025
From: John Nielsen
Date: Thu, 15 May 2025 10:29:43 -0600
Subject: isboot: help me understand what CAM is doing
To: scsi@freebsd.org
List-Id: SCSI subsystem
List-Archive: https://lists.freebsd.org/archives/freebsd-scsi

Hi all-

I’m working on a cosmetic bug in isboot-kmod. There is a global string called isboot_boot_device which is printed for informational purposes and also available via the net.isboot.device sysctl. The string is populated in this function: https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L1870.

I don’t know when this changed but historically the string comparison on https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L1904 would be called once with “pass” and once with “da” and the global isboot_boot_device would be correctly populated with e.g. “da0”.

That sometimes happens in the current code as well, but on my test machine (running 14-STABLE) it is ONLY when I have enabled debug output (by setting bootverbose or net.isboot.debug to 1 or higher). Otherwise, the string comparison is called only once with “probe” rather than “pass” or “da”. Everything still works in this case; the disk is found (at da0 or whatever) and mounted, but the isboot_boot_device is never populated (or populated with the wrong name like “probe0” if I mess with what the string comparison is looking for).

There is no functional change other than debug messages when debug output is enabled, so I’m guessing this is a race condition. But since I am still a rank amateur when it comes to kernel programming I don’t know where else to look. So my questions are twofold:

1) What is going on here? When does the “probe” name show up in ccb.cgdl.periph_name and why doesn’t the loop ever see “da” or “pass” when it does? Corollary: does isboot_cam_set_devices() operate in a safe/sane way for modern CAM?

2) What would be a safer or more reliable way to determine the correct device name so it can be written to the isboot_boot_device global variable?

Thanks!

JN

From nobody Thu May 15 20:24:57 2025
From: Warner Losh
Date: Thu, 15 May 2025 14:24:57 -0600
Subject: Re: isboot: help me understand what CAM is doing
To: John Nielsen
Cc: scsi@freebsd.org

On Thu, May 15, 2025 at 10:30 AM John Nielsen wrote:
>
> Hi all-
>
> I’m working on a cosmetic bug in isboot-kmod. There is a global string called isboot_boot_device which is printed for informational purposes and also available via the net.isboot.device sysctl. The string is populated in this function: https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L1870.
>
> I don’t know when this changed but historically the string comparison on https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L1904 would be called once with “pass” and once with “da” and the global isboot_boot_device would be correctly populated with e.g. “da0”.

Yes. We create two peripherals for each device found (well, we always create pass when pass is in the kernel, and if another driver like da or cd likes the device, we'll create a periph for that device).

I've never used this iscsi before...

> That sometimes happens in the current code as well, but on my test machine (running 14-STABLE) it is ONLY when I have enabled debug output (by setting bootverbose or net.isboot.debug to 1 or higher). Otherwise, the string comparison is called only once with “probe” rather than “pass” or “da”. Everything still works in this case; the disk is found (at da0 or whatever) and mounted, but the isboot_boot_device is never populated (or populated with the wrong name like “probe0” if I mess with what the string comparison is looking for).

So what we do is that the scsi XPT layer creates a probe device (whose name is "probe") for each device that's either scanned or that the SIM tells XPT exists. This probe device then sends a bunch of SCSI commands to the device to determine what the device is. Once that's done, it offers the device to each of the periph drivers, who either pass on the device, or create a cam_periph for that device.

> There is no functional change other than debug messages when debug output is enabled, so I’m guessing this is a race condition. But since I am still a rank amateur when it comes to kernel programming I don’t know where else to look. So my questions are twofold:

Likely we're not proceeding to create the pass or the da device because the initial commands fail somehow.

> 1) What is going on here? When does the “probe” name show up in ccb.cgdl.periph_name and why doesn’t the loop ever see “da” or “pass” when it does? Corollary: does isboot_cam_set_devices() operate in a safe/sane way for modern CAM?

Not sure which loop this is, so I can't say. But 'probe' is there while scsi_xpt is looking at the device, and then the async routines decide whether to add da or pass devices and then the probe device is removed. The last two happen in parallel, so there could be a race there if you are examining the periph lists.

From looking at the code, it looks like you may be doing racy things by rescanning the device and doing things when the rescan is done.

> 2) What would be a safer or more reliable way to determine the correct device name so it can be written to the isboot_boot_device global variable?

It should be something like da0. pass isn't going to be a block device (so you can't boot off of it). cd0 you could boot off of, but nobody exports their SCSI cd. And it's rare that the boot media is multi-volume, so it's unlikely to be da1, etc. and we don't support any other kind of boot (tape, etc).

So I'd love to help, but you're currently way too zoomed in on the problem and assuming that we have more context to what you're trying to do than I think we have. This makes it hard to know how to help.

Warner

From nobody Thu May 15 21:58:03 2025
From: John Nielsen
Date: Thu, 15 May 2025 15:58:03 -0600
Subject: Re: isboot: help me understand what CAM is doing
To: Warner Losh
Cc: scsi@freebsd.org
Message-Id: <8D8F60A2-98E2-4ED7-AB59-E1B7B3DFD10A@jnielsen.net>

> On May 15, 2025, at 2:24 PM, Warner Losh wrote:
>
> On Thu, May 15, 2025 at 10:30 AM John Nielsen wrote:
>>
>> Hi all-
>>
>> I’m working on a cosmetic bug in isboot-kmod. There is a global string called isboot_boot_device which is printed for informational purposes and also available via the net.isboot.device sysctl. The string is populated in this function: https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L1870.
>>
>> I don’t know when this changed but historically the string comparison on https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L1904 would be called once with “pass” and once with “da” and the global isboot_boot_device would be correctly populated with e.g. “da0”.
>
> Yes. We create two peripherals for each device found (well, we always create pass when pass is in the kernel, and if another driver like da or cd likes the device, we'll create a periph for that device).
>
> I've never used this iscsi before...
>
>> That sometimes happens in the current code as well, but on my test machine (running 14-STABLE) it is ONLY when I have enabled debug output (by setting bootverbose or net.isboot.debug to 1 or higher). Otherwise, the string comparison is called only once with “probe” rather than “pass” or “da”. Everything still works in this case; the disk is found (at da0 or whatever) and mounted, but the isboot_boot_device is never populated (or populated with the wrong name like “probe0” if I mess with what the string comparison is looking for).
>
> So what we do is that the scsi XPT layer creates a probe device (whose name is "probe") for each device that's either scanned or that the SIM tells XPT exists. This probe device then sends a bunch of SCSI commands to the device to determine what the device is. Once that's done, it offers the device to each of the periph drivers, who either pass on the device, or create a cam_periph for that device.
>
>> There is no functional change other than debug messages when debug output is enabled, so I’m guessing this is a race condition. But since I am still a rank amateur when it comes to kernel programming I don’t know where else to look. So my questions are twofold:
>
> Likely we're not proceeding to create the pass or the da device because the initial commands fail somehow.
>
>> 1) What is going on here? When does the “probe” name show up in ccb.cgdl.periph_name and why doesn’t the loop ever see “da” or “pass” when it does? Corollary: does isboot_cam_set_devices() operate in a safe/sane way for modern CAM?
>
> Not sure which loop this is, so I can't say. But 'probe' is there while scsi_xpt is looking at the device, and then the async routines decide whether to add da or pass devices and then the probe device is removed. The last two happen in parallel, so there could be a race there if you are examining the periph lists.
>
> From looking at the code, it looks like you may be doing racy things by rescanning the device and doing things when the rescan is done.
>
>> 2) What would be a safer or more reliable way to determine the correct device name so it can be written to the isboot_boot_device global variable?
>
> It should be something like da0. pass isn't going to be a block device (so you can't boot off of it). cd0 you could boot off of, but nobody exports their SCSI cd. And it's rare that the boot media is multi-volume, so it's unlikely to be da1, etc. and we don't support any other kind of boot (tape, etc).
>
> So I'd love to help, but you're currently way too zoomed in on the problem and assuming that we have more context to what you're trying to do than I think we have. This makes it hard to know how to help.

Thanks Warner. Apologies, let me provide some more context. This isboot code was written by Daisuke Aoyama in the FreeBSD 8 days (or earlier). He stopped updating it several years ago. I have been the port maintainer for net/isboot-kmod for a long time but am not well-versed in the code. In 2021 I created the GitHub repository where it lives today since I needed a place to host patches submitted by others. Since then I have made some minor improvements (mostly to support my own use cases) in addition to bringing in other patches to keep the code working in newer FreeBSD versions (thanks jhb).

Ideally this functionality should exist in the base system, but I don’t know enough to merge the isboot code with the kernel initiator code that’s already there. In the meantime, I’m trying to fix issues that I and others have encountered and reported on the GitHub project.

The links in my previous email are to specific line numbers in the code on the GitHub repository (and possibly off by a few lines due to commits I’ve made in the meantime…). Since I did not author this code and have not otherwise worked much with FreeBSD’s SCSI subsystem I can’t provide much more context than what is in this code itself. So if you or someone else would be willing to take a look I’d be grateful. I’m also trying to learn as I go so I can be more useful in the future.

The other open issue I’m trying to solve is where the CAM rescan fails to complete entirely. So I’m open to any type of feedback from “try getting the device name at this point instead” to “this line looks race-y because…” to “spend some time reading X or Y” (but chapter 12 of the developer handbook and your 2015 CAM scheduling BSDCan talk/slides are already on my list). Or even “look at this function in this driver as a good example of doing Z”.

Zooming out even farther, getting this functionality into the base system would be fantastic and eliminate the need for a port or this code living separately entirely. There’s a patch on Phabricator (not mine) purporting to do that at https://reviews.freebsd.org/D34477 . If that’s the way forward I’d love to see progress on it (and help if I can be useful). If not (or in parallel?), I could make a diff to bring isboot into the tree as-is, but it would need careful review before I would expect it to be merged. And removing the bits that are redundant with the existing initiator would still want to be done at some point.

In any event, thank you for your attention.

JN
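
A minimal sketch of the name selection discussed in this thread, written as userland C so it compiles on its own. It assumes the caller has already taken a snapshot of the peripherals attached to the boot device; `struct periph_entry`, `bootable_names` and `pick_boot_device()` are hypothetical names for illustration and are not part of isboot or CAM. It only illustrates two points made above: "probe" and "pass" should never be reported as the boot device, and seeing neither "da" nor "cd" most likely means probing has not finished, so the string is better left unset than filled with the wrong name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for one entry of a device's periph list, modelled
 * on the periph name / unit number pair mentioned in the thread.
 */
struct periph_entry {
	const char *name;	/* e.g. "probe", "pass", "da", "cd" */
	unsigned int unit;
};

/* Block-device periph drivers worth reporting, in order of preference. */
static const char *const bootable_names[] = { "da", "cd" };

/*
 * Write e.g. "da0" into out and return true if the snapshot contains a
 * bootable periph. Return false (with out set to "") when only
 * "probe"/"pass" or nothing is attached, so the caller can look again
 * later instead of recording a wrong name. Also returns false if the
 * name does not fit in out.
 */
static bool
pick_boot_device(const struct periph_entry *list, size_t n,
    char *out, size_t outlen)
{
	size_t nnames = sizeof(bootable_names) / sizeof(bootable_names[0]);

	assert(list != NULL || n == 0);
	assert(out != NULL && outlen > 0);

	for (size_t p = 0; p < nnames; p++) {
		for (size_t i = 0; i < n; i++) {
			if (strcmp(list[i].name, bootable_names[p]) != 0)
				continue;
			int len = snprintf(out, outlen, "%s%u",
			    list[i].name, list[i].unit);
			return (len > 0 && (size_t)len < outlen);
		}
	}
	out[0] = '\0';
	return (false);
}
```

In the kernel module the same filter would presumably sit inside the loop that walks the periph list, with a false result treated as "not ready yet" rather than as an error. How and when isboot should look again is not settled in this thread.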