Date:      Wed, 24 May 2023 19:33:48 +0200
From:      Mario Marietto <marietto2008@gmail.com>
To:        Vitaliy Gusev <gusev.vitaliy@gmail.com>
Cc:        Tomek CEDRO <tomek@cedro.info>, virtualization@freebsd.org,  freebsd-hackers@freebsd.org
Subject:   Re: BHYVE SNAPSHOT image format proposal
Message-ID:  <CA%2B1FSii9vzi-1TpCGxJBPd54HUZS0_Xa0t%2BCo5QneG2kW-fX1g@mail.gmail.com>
In-Reply-To: <AF34E648-2D8A-46C7-82A5-B88006BBB8F6@gmail.com>
References:  <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com> <CAFYkXjng1LWy5wVyTnSo0xrEWOy%2BOx9ZjLcmFqQs5EVpT8J_uA@mail.gmail.com> <AF34E648-2D8A-46C7-82A5-B88006BBB8F6@gmail.com>

@gusev.vitaliy@gmail.com: Could you explain to me how to test the new
"snapshot" feature? I'm interested in testing and stressing it on my
system. Is it ready to be used?

On Wed, May 24, 2023 at 5:11 PM Vitaliy Gusev <gusev.vitaliy@gmail.com> wrote:

> Hi Tomek,
>
> I will try to answer all the questions below; please let me know if I
> miss something important.
>
>
> On 23 May 2023, at 21:58, Tomek CEDRO <tomek@cedro.info> wrote:
>
> On Tue, May 23, 2023 at 6:06 PM Vitaliy Gusev wrote:
>
> Hi,
> Here is a proposal for bhyve snapshot/checkpoint image format improvements.
> It implies moving snapshot code to nvlist engine.
>
>
> Hey there Vitaliy :-) bhyve getting more and more traction, I am new
> user of bhyve and no expert, but new and missing features are welcome
> I guess.. there was a discussion on the mailing lists recently on
> better snapshots mechanism :-)
>
>
> Current snapshot implementation has disadvantages:
> 3 files per snapshot: .meta, .kern, vram
>
>
> No problem, unless new single file will be protected against
> corruption (filesystem, transfer, application crash) and possible to
> be easily and cheaply modified in place?
>
>
> The current snapshot implementation doesn’t have that protection either.
> I would go further: the current pkg implementation doesn’t track or
> notify when some files are changed. Binary files on a system, for
> example ELF files, can be changed without any notification.
>
> Tar doesn’t protect the data it stores either. Some filesystems, like
> ZFS, guarantee that data is not modified by the underlying disks.
>
> Protection requires more effort, and its purpose should be clearly
> defined. If the purpose is a checksum with 99.9% reliability, the NVLIST
> HEADER can be extended with a “checksum” key/value for a Section.
>
> If the purpose is cryptographic verification, I believe the sha256
> program should be your choice.
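
The per-Section checksum idea above can be sketched roughly as follows. This is Python rather than bhyve's C, and the header layout, the `<name>.checksum` key, and the helper names `make_section_header` and `verify_section` are hypothetical illustrations, not part of the proposal:

```python
import hashlib

def make_section_header(name, offset, payload):
    """Build a header entry for one Section, including a SHA-256 checksum."""
    return {
        f"{name}.offset": offset,
        f"{name}.size": len(payload),
        f"{name}.checksum": hashlib.sha256(payload).hexdigest(),  # assumed key name
    }

def verify_section(name, header, payload):
    """Return True if the payload matches the size and checksum in the header."""
    if len(payload) != header[f"{name}.size"]:
        return False
    return hashlib.sha256(payload).hexdigest() == header[f"{name}.checksum"]

payload = b"packed kernel state"  # stand-in for a packed nvlist
header = make_section_header("kernel", 0x11f6, payload)
print(verify_section("kernel", header, payload))            # intact payload
print(verify_section("kernel", header, payload + b"\x00"))  # corrupted payload
```

A checksum stored inside the image only detects accidental corruption; for tamper resistance the digest would have to be kept or signed outside the file, which is where an external tool such as sha256 fits.
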
>
>
> Binary Stream format of data.
>
>
> This is small and fast? Will new format too?
>
>
> Small is not necessarily perfect. As a first attempt, the snapshot code
> is good. But if you want to get the values for a specific device, for
> example a NIC or the HPET, you cannot get them easily. Please try :)
>
> A stream has no flexibility. It is good for well-specified,
> long-discussed protocols like XDR (NFS), where there is an RFC and each
> position in the stream is described. Example: RFC 1813.
>
> The new format with NVLIST is flexible and fast enough. Note that ZFS
> uses nvlist for keeping attributes and many other things.
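
To illustrate the difference between a positional stream and name/value pairs, here is a small Python sketch. The field names `irq` and `mtu` are made up for the example, and a plain dict stands in for an nvlist:

```python
import struct

# Positional stream: the meaning of each value depends entirely on its position.
stream_v1 = struct.pack("<II", 5, 1500)       # written as (irq, mtu)
irq, mtu = struct.unpack("<II", stream_v1)    # the reader must know the order

# If a newer writer swaps the order, the old reader silently misreads the data.
stream_swapped = struct.pack("<II", 1500, 5)  # written as (mtu, irq)
bad_irq, bad_mtu = struct.unpack("<II", stream_swapped)

# Name/value pairs: order is irrelevant because lookups are by key.
pairs = {"mtu": 1500, "irq": 5}

print(irq, mtu)                    # values as intended
print(bad_irq, bad_mtu)            # values swapped without any error
print(pairs["irq"], pairs["mtu"])  # correct regardless of insertion order
```
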
>
>
> Adding  optional variable - breaks resume
> Removing variable - breaks resume
> Changing saved order of variables - breaks resume
>
>
> Obviously need improvement :-)
>
> Hard to get information about what is saved and decode.
> Hard to debug if something goes wrong
>
>
> Additional tools missing? Will new format allow text editor interaction?
>
>
> Why do you need to modify a snapshot image? Could you describe more? Do
> you modify the current 3 snapshot files?
>
>
> No versions. If the code changes, resume of an old image can
> succeed, but with UB.
>
>
> Is new format future proof and provides backward compatibility?
>
>
> The intention of moving to the new format is to keep backward
> compatibility when some code is changed:
>
>
>    - Adding optional variable
>    - Removing variable that is not used anymore
>    - Change order of saving variables
>    - “Hot Fixes”.
>
>
> If changes are critical and incompatible, the restore stage should have
> clear information about the incompatibility and abort the resume. Ideally
> it should be able to find this out even before starting the restore
> process. For this purpose, the new format introduces versions.
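
A rough Python sketch of a restore path that tolerates added, removed, and reordered variables and checks a version before restoring anything. The `version` key, `SUPPORTED_VERSION`, the field names, and `restore_device` are assumptions made for illustration; the actual bhyve code is C and may look quite different:

```python
SUPPORTED_VERSION = 2  # hypothetical: highest image version this build understands

def restore_device(saved, defaults):
    """Restore device state by key name instead of by stream position."""
    version = saved.get("version", 1)
    if version > SUPPORTED_VERSION:
        # Incompatible image: refuse before touching any device state.
        raise ValueError(
            f"image version {version} is newer than supported {SUPPORTED_VERSION}"
        )
    state = dict(defaults)  # optional variables fall back to their defaults
    for key in defaults:
        if key in saved:
            state[key] = saved[key]
    return state  # saved keys the code no longer uses are simply ignored

defaults = {"irq": 0, "mtu": 1500}
old_image = {"version": 1, "legacy_flag": 1, "irq": 5}  # lacks "mtu", has an unused key
print(restore_device(old_image, defaults))
```

Here the old image restores cleanly: `mtu` takes its default and `legacy_flag` is skipped, while an image carrying a higher version number is rejected up front.
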
>
>
>
> The new nvlist implementation should solve all of the above. The first
> step is to improve the snapshot/checkpoint saving format. It eliminates
> the use of three files per snapshot.
>
> (..)
>
>
> So this will be new text config based format with variable = value and
> sections?
>
>
> This is the NVLIST approach with key=value, where the key is a string
> and the value can be an integer, array, string, etc.
>
>
> How much bigger will be the overall file size increase?
>
>
> Not so huge. NVLIST internals are well specified. For example, for my VM:
>
>   [kernel]
>         kernel.offset = 0x11f6 (4598)
>         kernel.size = 0x19a7 (6567)
>         kernel.type = "nvlist"
>   [devices]
>         devices.offset = 0x2b9d (11165)
>         devices.size = 0x10145ba (16860602)
>         devices.type = "nvlist"
>
> So the packed size for *kernel* is 6567 bytes, and for *devices* it is
> 16860602 bytes, including the 16MB framebuffer. Without fbuf, the packed
> nvlist devices Section has a size of 83386 bytes.
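
The numbers in the listing above can be checked with a short Python sketch. The dict layout and the helpers `section_end` and `read_section` are illustrative only, and the framebuffer is assumed to be exactly 16 MiB, which reproduces the 83386 figure:

```python
FBUF_SIZE = 16 * 1024 * 1024  # assumption: "16MB" means 16 MiB

# Offsets and sizes copied from the listing in the message.
header = {
    "kernel": {"offset": 0x11f6, "size": 0x19a7, "type": "nvlist"},
    "devices": {"offset": 0x2b9d, "size": 0x10145ba, "type": "nvlist"},
}

def section_end(section):
    """Offset of the first byte after the Section."""
    return section["offset"] + section["size"]

def read_section(image, section):
    """Slice one Section's packed bytes out of an in-memory image."""
    return image[section["offset"]:section_end(section)]

# The kernel Section ends exactly where the devices Section begins.
print(section_end(header["kernel"]) == header["devices"]["offset"])
# Size of the devices Section without the framebuffer.
print(header["devices"]["size"] - FBUF_SIZE)
```

With offset and size stored per Section, a reader can seek straight to the Section it needs without decoding the ones before it.
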
>
>
>
> How much longer it will take to decode/encode/process files?
>
>
> It is fast, just several milliseconds. NVLIST is a very fast format. It
> is already integrated into bhyve as the Config engine.
>
>
>
> What is the possibility of format change and backward/forward compatibility?
>
>
> If you are talking about compatibility of the Image format, it should be
> compatible in both directions, at least for modest format changes.
>
> If we consider overall snapshot/resume compatibility, I believe forward
> compatibility is neither the case nor the target. Indeed, why would you
> need to resume an image created by a newer version of a program?
>
> The most important thing is backward compatibility, i.e. when an image is
> created by an older version of a program but should be resumed on a newer
> one.
>
> This is the target and intention of this improvement.
>
>
> Have you considered efficiency comparison of current format, proposed
> format, and maybe using SQLITE or JSON storage/parsers?  For instance
> sqlite would be blazingly fast but hard to migrate. json would be most
> versatile but more time/memory consuming?
>
>
> Yes, I know about other formats, like JSON and others. NVLIST is the most
> effective and suitable for the current purposes.
>
>
> Maybe EFL approach of storing configuration files for limited
> resources embedded system storage that use binary storage data but can
> be decompressed in chunks that can be replaced in place?
> https://www.enlightenment.org/develop/efl/start
>
>
> There are many things that can be used, but the choice should be well
> known, easy, stable, fast and supportable. I believe NVLIST is the best
> choice.
>
>
> Sorry for asking those questions but there may be already good and
> verified solutions out there not to reinvent the wheel? :-)
>
>
> Thank you for your questions. If you would like, you can try to test the
> new implementation and give feedback.
>
> ---
> Vitaliy Gusev
>
>

-- 
Mario.



