From: Vitaliy Gusev
Subject: Re: BHYVE SNAPSHOT image format proposal
Date: Wed, 7 Jun 2023 21:25:41 +0300
To: Corvin Köhne
Cc: virtualization@freebsd.org, freebsd-hackers@freebsd.org

Hi Corvin,

> On 6 Jun 2023, at 15:59, Corvin Köhne <corvink@FreeBSD.org> wrote:
>> ...
>
> We may have different version of the format from the same produce.
> IMHO, it makes sense to have a dedicated IDENT and VERSION field to
> easily figure out
>
> 1) if the producer of the image is known
> 2) if we support that version of the producer
>
> Even if you allocated a huge amount of free space, someone would need
> more. So, what do you think about this format:
>
> +---------------------------------------------------------------------+
> | IDENT - 16 BYTES                                                    |
> +-------------------+-----------------------+-------------------------+
> | VERSION - 4 BYTES | NVLIST SIZE - 4 BYTES | NVLIST OFFSET - 8 BYTES |
> +-------------------+-----------------------+-------------------------+
> | POSSIBLE FREE SPACE (e.g. for custom data, alignment etc.)          |
> +---------------------------------------------------------------------+
> | NVLIST DATA                                                         |
> +---------------------------------------------------------------------+
> | POSSIBLE FREE SPACE (for whatever reason)                           |
> +---------------------------------------------------------------------+
> | SNAPSHOT DATA                                                       |
> +---------------------------------------------------------------------+


Note that even the simple string "BHYVE CHECKPOINT IMAGE" is 22 bytes long, so 16 bytes seems too small.

So I would rather not complicate the header at first.

I would rather describe the ideas and requirements first, and then the solutions:

  1. We need to distinguish a snapshot image file from other files.
    Solution: The header should have a "magic id".

  2. We need a barrier against resuming an image that "is not ours". The idea is to refuse to resume images from other producers.

    The reason to have this, and to use it instead of header versioning:

    Imagine that mainstream has its own implementation and a company's fork has its own implementation. How do we ensure that the versions in an image file are ours and not somebody else's?

    Solution: The header should have a "Producer id" string.

    Example: The snapshot image file has an empty Producer string, but bhyve's current Producer is "MYCOMPANY". The strings are not equal, so resume must fail.

  3. The rule above does not restrict reading or decoding data from an image file. It should still be possible to look at an image file, analyse its internals, get and decode values, etc.
    Solution: Add an option to either bhyve or bhyvectl to inspect an image file.

  4. The nvlist header data that follows should carry brief information about the image file and its internals.
    Solution: The NV HEADER can have several sections: "config", "kernel", "devices", "memory", ...

  5. Versioning of the NV HEADER. The idea is to know in advance whether an image can be resumed. In other words, before resuming, get information about the ability to resume.

    Solution: Each Section should have a "version" and a "subversion". While "version" is responsible for both types of compatibility, backward and forward, "subversion" is for forward compatibility only.

    Rules for the check (bh_version and bh_subversion are bhyve's own values; version and subversion come from the image Section):
            If bh_version == version && bh_subversion >= subversion then
                      Bhyve is able to resume the Section
            Else
                      Bhyve cannot resume the Section
            Endif

    Example 1: The Section in the image has "version=1", "subversion=5"; bhyve has "version=1", "subversion=6". That means bhyve can resume the Section.

    Example 2: The same image Section, but bhyve has "version=1", "subversion=4". Bhyve cannot resume the Section.

    Example 3: The same image Section, but bhyve has "version=2", "subversion=5". Bhyve cannot resume the Section.

    Rules for increasing versions:
      - If a code change breaks "backward" compatibility, "version" should be increased and "subversion" reset to 0.
      - If a code change breaks "forward" compatibility, "subversion" should be increased.

  6. Other versioning in the HEADER is redundant. If something changes in the format itself, the "magic id" can be changed accordingly.
    Solution: The "magic id" should be stable and remain unchanged for a long time.
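
To make the check and the increment rules from point 5 concrete, here is a minimal Python sketch. The names SectionVersion, can_resume and bump are made up for illustration only and are not existing bhyve code.

```python
from typing import NamedTuple


class SectionVersion(NamedTuple):
    """Hypothetical (version, subversion) pair attached to one Section."""
    version: int
    subversion: int


def can_resume(bhyve: SectionVersion, image: SectionVersion) -> bool:
    """Check rule: versions must match, and bhyve's subversion must not be older."""
    return bhyve.version == image.version and bhyve.subversion >= image.subversion


def bump(current: SectionVersion, breaks_backward: bool) -> SectionVersion:
    """Increment rule for a code change that breaks compatibility.

    A backward-incompatible change increases "version" and resets
    "subversion" to 0; a forward-incompatible change increases
    "subversion" only.
    """
    if breaks_backward:
        return SectionVersion(current.version + 1, 0)
    return SectionVersion(current.version, current.subversion + 1)


image = SectionVersion(1, 5)
print(can_resume(SectionVersion(1, 6), image))  # Example 1: True
print(can_resume(SectionVersion(1, 4), image))  # Example 2: False
print(can_resume(SectionVersion(2, 5), image))  # Example 3: False
```

The three calls at the end correspond to Examples 1 to 3 above.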


As a result, I would suggest giving at least 32 bytes to the "magic id" / ident and 32 bytes to the "producer id".

The format of the entire image file could be:


+-----------------------------------------------------------------+
|               HEADER MAGIC ID     - 32 BYTES                    |
+-----------------------------------------------------------------+
|               HEADER PRODUCER ID  - 32 BYTES                    |
+-----------------------------------------------------------------+
|               NVLIST HEADER SIZE  -  4 BYTES                    |
+-----------------------------------------------------------------+
|               NVLIST HEADER DATA (SECTIONS)                     |
+-----------------------------------------------------------------+
|                          SNAPSHOT DATA                          |
+-----------------------------------------------------------------+
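
As a rough illustration of how the fixed part of this layout could be written and checked before a resume, here is a Python sketch. Several things in it are my assumptions and not part of the proposal: the two id fields are NUL-padded, the size field is little-endian, and the function names are hypothetical.

```python
import struct

MAGIC = b"BHYVE CHECKPOINT IMAGE"

# Assumed encoding (not specified in the proposal): two 32-byte NUL-padded
# strings followed by a little-endian unsigned 32-bit nvlist header size.
PREFIX = struct.Struct("<32s32sI")


def pack_prefix(producer: bytes, nvlist_size: int) -> bytes:
    """Build the 68-byte fixed prefix; "32s" pads short strings with NULs."""
    return PREFIX.pack(MAGIC, producer, nvlist_size)


def check_prefix_for_resume(buf: bytes, our_producer: bytes = b"") -> int:
    """Validate magic id and producer id, then return the nvlist header size."""
    if len(buf) < PREFIX.size:
        raise ValueError("file too short for the header")
    magic, producer, nvlist_size = PREFIX.unpack_from(buf)
    if magic.rstrip(b"\0") != MAGIC:
        raise ValueError("not a bhyve checkpoint image")
    # Producer strings must be equal, otherwise resume must fail.
    if producer.rstrip(b"\0") != our_producer:
        raise ValueError("image was written by a different producer")
    return nvlist_size


prefix = pack_prefix(b"", 502)
print(len(prefix))                      # 68
print(check_prefix_for_resume(prefix))  # 502
```

An inspection tool, as opposed to resume, would simply skip the producer comparison and only check the magic id.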


MAGIC ID: should be hardcoded: "BHYVE CHECKPOINT IMAGE".

PRODUCER ID: may be empty; its value is up to the producer, i.e. the field is reserved.

NVLIST HEADER SIZE: 4 bytes gives more than enough range; in general the size is less than 4 KB.

NVLIST HEADER DATA: Packed nvlist data; contains the Sections "config", "kernel", "devices", "memory", ...:

[config]
        offset = 0x1000 (4096)
        size = 0x1f6 (502)
        type = text
        vers = 1
        subvers = 5

[kernel]
        offset = 0x11f6 (4598)
        size = 0x19a7 (6567)
        type = nvlist
        vers = 1
        subvers = 0

[devices]
        offset = 0x2b9d (11165)
        size = 0x10145ba (16860602)
        type = nvlist
        vers = 2
        subvers = 1

[memory]
        offset = 0x1200000 (18874368)
        size = 0x3ce00000 (1021313024)
        type = pages
        vers = 1
        subvers = 0
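
One thing a reader of such a section table might want to do before touching the data is check that the byte ranges do not overlap. The Python sketch below does this for the example values above; Section and find_overlaps are hypothetical names, and the check itself is my suggestion rather than something the format requires.

```python
from typing import List, NamedTuple, Tuple


class Section(NamedTuple):
    """Hypothetical in-memory form of one entry from the nvlist header."""
    name: str
    offset: int
    size: int


def find_overlaps(sections: List[Section]) -> List[Tuple[str, str]]:
    """Report each section that starts before an earlier section has ended."""
    overlaps = []
    end, end_name = 0, None
    for sec in sorted(sections, key=lambda s: s.offset):
        if end_name is not None and sec.offset < end:
            overlaps.append((end_name, sec.name))
        # Remember the section that currently reaches furthest into the file.
        if sec.offset + sec.size > end:
            end, end_name = sec.offset + sec.size, sec.name
    return overlaps


example = [
    Section("config", 0x1000, 0x1F6),
    Section("kernel", 0x11F6, 0x19A7),
    Section("devices", 0x2B9D, 0x10145BA),
    Section("memory", 0x1200000, 0x3CE00000),
]
print(find_overlaps(example))  # []
```

In the example, "config", "kernel" and "devices" are packed back to back, while "memory" starts at 0x1200000, leaving a gap after "devices".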
 



I hope this gives the whole picture.

Thanks,
Vitaliy Gusev
