Date:      Thu, 15 May 2025 14:24:57 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        John Nielsen <lists@jnielsen.net>
Cc:        scsi@freebsd.org
Subject:   Re: isboot: help me understand what CAM is doing
Message-ID:  <CANCZdfpzhu0n1EuX4YsbOkxN_qK2A6gSRfQQnqWLub%2BTPBwEWA@mail.gmail.com>
In-Reply-To: <F16B5BD7-4D1F-41F2-8091-508923F09553@jnielsen.net>
References:  <F16B5BD7-4D1F-41F2-8091-508923F09553@jnielsen.net>

On Thu, May 15, 2025 at 10:30 AM John Nielsen <lists@jnielsen.net> wrote:
>
> Hi all-
>
> I’m working on a cosmetic bug in isboot-kmod. There is a global string called isboot_boot_device which is printed for informational purposes and also available via the net.isboot.device sysctl. The string is populated in this function: https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L1870.
>
> I don’t know when this changed but historically the string comparison on https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L1904 would be called once with “pass” and once with “da” and the global isboot_boot_device would be correctly populated with e.g. “da0”.

Yes. We create two peripherals for each device found (well, we always
create pass when pass is in the kernel, and if another driver like da
or cd likes the device, we'll create a periph for that device too).

I've never used this iscsi before...

> That sometimes happens in the current code as well, but on my test machine (running 14-STABLE) it is ONLY when I have enabled debug output (by setting bootverbose or net.isboot.debug to 1 or higher). Otherwise, the string comparison is called only once with “probe” rather than “pass” or “da”. Everything still works in this case; the disk is found (at da0 or whatever) and mounted, but the isboot_boot_device is never populated (or populated with the wrong name like “probe0” if I mess with what the string comparison is looking for).

So what we do is that the scsi XPT layer creates a probe device (whose
name is "probe") for each device that's either scanned or that the SIM
tells XPT exists. This probe device then sends a bunch of SCSI
commands to the device to determine what the device is. Once that's
done, it offers the device to each of the periph drivers, who either
pass on the device, or create a cam_periph for that device.
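
As a rough illustration of that "offer the device to each periph driver"
step, here is a toy userland model. It is not CAM code: the struct, the
toy_* names, and the simplified claim rules (pass takes everything, da
takes direct-access devices, cd takes CD-ROM devices) are assumptions
made up for the sketch. Only the SCSI peripheral device type values
come from the INQUIRY definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* SCSI peripheral device type codes as reported in INQUIRY data. */
enum { TYPE_DIRECT = 0x00, TYPE_SEQUENTIAL = 0x01, TYPE_CDROM = 0x05 };

/* Toy stand-in for a periph driver: a name plus a "do I want this?" test. */
struct toy_driver {
	const char *name;
	int (*wants)(int scsi_type);
};

/* pass attaches to every device; da and cd only to the types they handle. */
static int pass_wants(int t) { (void)t; return 1; }
static int da_wants(int t) { return t == TYPE_DIRECT; }
static int cd_wants(int t) { return t == TYPE_CDROM; }

static const struct toy_driver toy_drivers[] = {
	{ "pass", pass_wants },
	{ "da", da_wants },
	{ "cd", cd_wants },
};

/*
 * Offer an already-probed device to every toy driver. Writes the names of
 * the drivers that claimed it into out (space separated) and returns how
 * many did.
 */
static size_t
toy_offer_device(int scsi_type, char *out, size_t outlen)
{
	size_t i, claimed = 0;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';
	for (i = 0; i < sizeof(toy_drivers) / sizeof(toy_drivers[0]); i++) {
		if (!toy_drivers[i].wants(scsi_type))
			continue;	/* this driver passes on the device */
		if (claimed > 0)
			strncat(out, " ", outlen - strlen(out) - 1);
		strncat(out, toy_drivers[i].name, outlen - strlen(out) - 1);
		claimed++;
	}
	return (claimed);
}
```

In this model a direct-access disk ends up with two peripherals, pass
and da, which matches the "once with pass and once with da" behavior
described earlier in the thread.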

> There is no functional change other than debug messages when debug output is enabled, so I’m guessing this is a race condition. But since I am still a rank amateur when it comes to kernel programming I don’t know where else to look. So my questions are twofold:

Likely we're not proceeding to create the pass or the da device
because the initial commands fail somehow.

> 1) What is going on here? When does the “probe” name show up in ccb.cgdl.periph_name and why doesn’t the loop ever see “da” or “pass” when it does? Corollary: does isboot_cam_set_devices() operate in a safe/sane way for modern CAM?

Not sure which loop this is, so I can't say. But 'probe' is there while
scsi_xpt is looking at the device, and then the async routines decide
whether to add da or pass devices and then the probe device is
removed. The last two happen in parallel, so there could be a race
there if you are examining the periph lists.

From looking at the code, it looks like you may be doing racy things
by rescanning the device and doing things when the rescan is done.

> 2) What would be a safer or more reliable way to determine the correct device name so it can be written to the isboot_boot_device global variable?

It should be something like da0. pass isn't going to be a block device
(so you can't boot off of it). cd0 you could boot off of, but nobody
exports their SCSI cd. And it's rare that the boot media is
multi-volume, so it's unlikely to be da1, etc., and we don't support
any other kind of boot (tape, etc).
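
One way to act on that, sketched in plain userland C: when walking
whatever periph names are visible for the boot LUN, only accept a block
periph (da, or cd) and treat a list that contains just "probe" or
"pass" as "not settled yet, try again later". The periph_entry struct
and both function names are hypothetical and exist only for this
example; how the snapshot is obtained from CAM is deliberately left
out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot entry: one periph name and unit seen on the LUN. */
struct periph_entry {
	const char *name;
	unsigned unit;
};

/* Return 1 for periph drivers that expose a bootable block device. */
static int
is_block_periph(const char *name)
{
	return (strcmp(name, "da") == 0 || strcmp(name, "cd") == 0);
}

/*
 * Scan a snapshot and format the first block periph (e.g. "da0") into out.
 * Returns 1 on success. Returns 0 and leaves out empty if only transient
 * or non-block peripherals ("probe", "pass") were seen, in which case the
 * caller should look again later instead of recording the name.
 */
static int
pick_boot_device(const struct periph_entry *list, size_t n,
    char *out, size_t outlen)
{
	size_t i;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		if (!is_block_periph(list[i].name))
			continue;
		snprintf(out, outlen, "%s%u", list[i].name, list[i].unit);
		return (1);
	}
	return (0);
}
```

With a snapshot of { "probe", 0 } this returns 0 and writes nothing, so
a name like "probe0" never lands in the global. With { "pass", 0 },
{ "da", 0 } it yields "da0".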

So I'd love to help, but you're currently way too zoomed in on the
problem and assuming that we have more context to what you're trying
to do than I think we have. This makes it hard to know how to help.

Warner


