Date:      Tue, 14 May 2019 15:45:03 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-questions@freebsd.org
Subject:   Re: Suggestions for working with unstable nvme dev names in AWS
Message-ID:  <af060036-901d-e74b-91a1-e05cb1fc4dea@denninger.net>
In-Reply-To: <eb1d290e48b4ba21ab350044b25592525e61457c.camel@smormegpa.no>
References:  <23770.10599.687213.86492@alice.local> <08660a2a-489f-8172-22ee-47aeba315986@FreeBSD.org> <23770.58821.826610.399467@alice.local> <20190514210203.3d951fb8.freebsd@edvax.de> <23771.5612.105696.170743@alice.local> <eb1d290e48b4ba21ab350044b25592525e61457c.camel@smormegpa.no>


On 5/14/2019 15:17, Matthias Oestreicher wrote:
> On Tuesday, 14.05.2019, 12:24 -0700, George Hartzell wrote:
>> Polytropon writes:
>>  > On Tue, 14 May 2019 08:59:01 -0700, George Hartzell wrote:
>>  > > Matthew Seaman writes:
>>  > >  > [...] but if you
>>  > >  > are using ZFS, then shuffling the disks around should not make any
>>  > >  > difference. 
>>  > >  > [...]
>>  > > Yes, once I have them set up (ZFS or labeled), it doesn't matter what
>>  > > device names they end up having.  For now I just do the setup by hand,
>>  > > poking around a bit.  Same trick in the Linux world, you end up
>>  > > referring to them by their UUID or ....
>>  > 
>>  > In addition to what Matthew suggested, you could use UFS-IDs
>>  > in case the disks are initialized with UFS. You can find more
>>  > information here (at the bottom of the page):
>>  > [...]
>>
>> Yes.  As I mentioned in my response to Matthew, once I have some sort
>> of filesystem/zpool on the device, it's straightforward (TMTOWTDI).
>>
>> The problem is being able to provision the system automatically
>> without user intervention.
>>
>> In the Linux world, I can use e.g. Terraform to set up a pair of
>> additional volumes and tell it to call them `/dev/sdy` and `/dev/sdz`.
>> The Linux magic happens and I get a pair of symlinks that I can use in
>> my e.g. Ansible playbooks, that point to whatever the devices came up
>> as when it booted.  I build filesystems on the devices, add them via
>> their UUID's to `/etc/fstab` and I'm off and running.
>>
>> I can't [seem to] do this in the FreeBSD world; even if I name the
>> devices `/dev/nvme1` (the fast and big one) and `/dev/nvme2` (the slow
>> and small one), there's no guarantee that they'll have those names
>> when the machine boots.
>>
>> This is a weird AWS-specific issue and their peace offering is to stash the
>> requested device name in the device/controller/"hardware" and provide
>> a tool that digs it out.
>>
>> I'm trying to figure out what I can do about it from FreeBSD.  Perhaps
>> there's already a solution.  Perhaps the nvme driver needs to be
>> extended to provide access to the magic AWS info stash and then
>> something like Amazon Linux's `ebsnvme-id` can pry it out.
>>
>> g.
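
(An aside on prying out the AWS stash: EBS presents the volume ID as the
NVMe serial number and "Amazon Elastic Block Store" as the model string,
so nvmecontrol(8) can at least map volumes to controllers before any
filesystem exists.  A rough sketch -- the grep pattern assumes the stock
identify output format:

for d in /dev/nvme?; do
        c="${d#/dev/}"
        echo "== ${c} =="
        nvmecontrol identify "${c}" | grep -E 'Serial Number|Model Number'
done

The device name AWS stashes should be in the vendor-specific bytes of the
identify data; "nvmecontrol identify -x" will hex-dump those, but nothing
in the base system decodes them the way ebsnvme-id does.)
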
> Hi,
> I'm not familiar with Amazon's AWS, but if your only problem is shifting device
> names for UFS filesystems, then on modern systems GPT labels are the way to go.
> There has been a lot of confusion over the years about the many ways to apply
> different types of labels to devices on FreeBSD, but really GEOM labels, UUIDs,
> etc. are only useful on old systems with no GPT support.
>
> GPT labels are only applied to partitions, not whole drives, but they are extremely
> flexible. They can be applied and changed at any time, even on mounted filesystems.
> In comparison to GEOM labels and all other ID types, they are never hidden
> while the device's original name (like nvd0 or nvd1) is in use.
> 'gpart show -l' will show the GPT labels you applied at any time, and they can
> be used both for manual mounts and in /etc/fstab.
> I haven't used any other label types for years, and have even disabled all the others in
>
> /boot/loader.conf
> kern.geom.label.disk_ident.enable=0
> kern.geom.label.gptid.enable=0
> kern.geom.label.ufsid.enable=0
>
> You can apply a GPT label with
> # gpart modify -l mylabel -i N /dev/nvd1
>
> and then add something like the following to /etc/fstab
> /dev/gpt/mylabel       /       ufs     rw      1       1
>
> There is one limitation with GPT labels: they don't work when you use UFS
> journaling via GEOM, as the GPT label will be the same for e.g.
> /dev/nvd0p1 and /dev/nvd0p1.journal.
>
> Another big plus is that they work with every partition type: freebsd-ufs,
> freebsd-boot, swap, EFI, freebsd-zfs...
> One label type for everything avoids some headaches, IMO.
>
> Hope that clears up some confusion.
> Matthias
>
Uh, one possible warning on that.

They *do* disappear if you boot from an encrypted partition.

For example:

root@NewFS:/dev/gpt # zpool status zsr
  pool: zsr
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:04:17 with 0 errors on Mon May 13
03:24:33 2019
config:

        NAME            STATE     READ WRITE CKSUM
        zsr             ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            da2p4.eli   ONLINE       0     0     0
            da1p4.eli   ONLINE       0     0     0
            da11p4.eli  ONLINE       0     0     0
            da0p4.eli   ONLINE       0     0     0
            da3p4.eli   ONLINE       0     0     0

errors: No known data errors

root@NewFS:/dev/gpt # gpart show -l da2
=>       40  468862048  da2  GPT  (224G)
         40       1024    1  (null)  (512K)
       1064    1048576    2  (null)  (512M)
    1049640   10485760    3  swap1  (5.0G)
   11535400  457326688    4  ssd1  (218G)

You'd think /dev/gpt/ssd1 (and the rest) would be there.  Nope.

root@NewFS:/dev/gpt # ls
backup61        rust1.eli       rust4           swap1.eli       swap4
backup61.eli    rust2           rust4.eli       swap2           swap5
backup62-2      rust2.eli       rust5           swap2.eli
backup62-2.eli  rust3           rust5.eli       swap3
rust1           rust3.eli       swap1           swap3.eli
root@NewFS:/dev/gpt #

Note that the other two pools, plus all the swap partitions (three of
which I am using with automatic encryption) *do* show up.

I don't know whether the system would in fact boot if I disabled all the
other label options; however, the loader finds the pool members via their
"native" (daX) names, attaches them all under geli, and boots from them --
and the labels do not show up under /dev/gpt.
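
As a sanity check, "geli status" lists each attached .eli provider and
the component it sits on, which confirms everything came up under the
native daX names -- roughly (column layout from memory, may differ):

# geli status
      Name   Status  Components
 da2p4.eli   ACTIVE  da2p4
 da1p4.eli   ACTIVE  da1p4
 ...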

My label settings....

root@NewFS:/dev/gpt # sysctl -a|grep kern.geom.label
kern.geom.label.disk_ident.enable: 1
kern.geom.label.gptid.enable: 0
kern.geom.label.gpt.enable: 1
kern.geom.label.ufs.enable: 1
kern.geom.label.ufsid.enable: 1
kern.geom.label.reiserfs.enable: 1
kern.geom.label.ntfs.enable: 1
kern.geom.label.msdosfs.enable: 1
kern.geom.label.iso9660.enable: 1
kern.geom.label.ext2fs.enable: 1
kern.geom.label.debug: 0

I don't know if the loader would still find the pools if I were to turn
off disk_ident.enable -- and never mind that: if I did, and then wanted
to set up a *new* disk, how would I address the bare device if the disk
identifier can't be accessed?
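
Presumably a brand-new disk would still appear under its native daX name
whatever the label knobs are set to, so initial setup could stay on the
raw device -- a sketch, with hypothetical names:

# gpart create -s gpt da12
# gpart add -t freebsd-zfs -l ssd6 -a 1m da12
# geli init /dev/gpt/ssd6

It's only the identifier-based aliases (diskid/, gptid/ and friends)
that those sysctls hide, as far as I can tell; the bare daX node is
always there.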

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
