Date: Wed, 24 May 2023 19:33:48 +0200
From: Mario Marietto <marietto2008@gmail.com>
To: Vitaliy Gusev <gusev.vitaliy@gmail.com>
Cc: Tomek CEDRO <tomek@cedro.info>, virtualization@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: BHYVE SNAPSHOT image format proposal
Message-ID: <CA%2B1FSii9vzi-1TpCGxJBPd54HUZS0_Xa0t%2BCo5QneG2kW-fX1g@mail.gmail.com>
In-Reply-To: <AF34E648-2D8A-46C7-82A5-B88006BBB8F6@gmail.com>
References: <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com> <CAFYkXjng1LWy5wVyTnSo0xrEWOy%2BOx9ZjLcmFqQs5EVpT8J_uA@mail.gmail.com> <AF34E648-2D8A-46C7-82A5-B88006BBB8F6@gmail.com>

@gusev.vitaliy@gmail.com: Could you explain to me how to test the new
"snapshot" feature? I'm interested in testing and stressing it on my
system. Is it ready to be used?

On Wed, May 24, 2023 at 5:11 PM Vitaliy Gusev <gusev.vitaliy@gmail.com> wrote:

> Hi Tomek,
>
> I will try to answer all the questions below; please let me know if I
> miss something important.
>
> On 23 May 2023, at 21:58, Tomek CEDRO <tomek@cedro.info> wrote:
>
> > On Tue, May 23, 2023 at 6:06 PM Vitaliy Gusev wrote:
> >
> > > Hi,
> > > Here is a proposal for bhyve snapshot/checkpoint image format
> > > improvements. It implies moving the snapshot code to the nvlist
> > > engine.
> >
> > Hey there Vitaliy :-) bhyve is getting more and more traction. I am
> > a new user of bhyve and no expert, but new and missing features are
> > welcome I guess.. there was a discussion on the mailing lists
> > recently about a better snapshot mechanism :-)
> >
> > > Current snapshot implementation has disadvantages:
> > > 3 files per snapshot: .meta, .kern, vram
> >
> > No problem, unless the new single file will be protected against
> > corruption (filesystem, transfer, application crash) and possible
> > to be easily and cheaply modified in place?
>
> The current snapshot implementation doesn’t have this. I would go
> further: the current pkg implementation doesn’t track or notify if
> some files are changed. Binary files on a system, for example ELF
> files, can be changed without any notification.
>
> Tar doesn’t protect the data it stores either. Some filesystems, like
> ZFS, guarantee that data is not modified by the underlying disks.
>
> Protection requires more effort, and its purpose should be clearly
> defined. If the purpose is a checksum with 99.9% reliability, the
> NVLIST HEADER can be widened to have a “checksum” key/value for a
> Section.
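
To make the per-Section checksum idea above concrete, here is a minimal
Python sketch. It is not the proposed bhyve format: the header keys
(offset, size, type, checksum), the choice of CRC-32 and both helper
names are assumptions for illustration only. A CRC catches accidental
corruption, not deliberate tampering.

```python
import zlib


def make_section_header(name, payload, offset):
    """Build a hypothetical header entry describing one packed section."""
    return {
        "name": name,
        "offset": offset,
        "size": len(payload),
        "type": "nvlist",
        # Assumed key: CRC-32 of the packed section bytes.
        "checksum": zlib.crc32(payload),
    }


def verify_section(header, image):
    """Return True if the section bytes in image match the stored checksum."""
    start = header["offset"]
    payload = image[start:start + header["size"]]
    if len(payload) != header["size"]:
        return False  # the image is truncated
    return zlib.crc32(payload) == header["checksum"]


# Usage: a fake image with 16 bytes of leading data before the section.
kernel_payload = b"packed kernel state"
prefix = b"\x00" * 16
image = prefix + kernel_payload
header = make_section_header("kernel", kernel_payload, offset=len(prefix))

ok = verify_section(header, image)

# Flip one byte inside the section and verify again.
corrupted = image[:20] + b"X" + image[21:]
bad = verify_section(header, corrupted)

print(ok, bad)
```

Storing the checksum next to offset and size would let a reader validate
one Section without parsing the others.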
>
> If the purpose is cryptographic verification, I believe the sha256
> program should be your choice.
>
> > > Binary Stream format of data.
> >
> > This is small and fast? Will the new format be too?
>
> Small is not everything. As a first attempt, the snapshot code is
> good. But if you want to get the values related to a specific device,
> for example a NIC or the HPET, you cannot get them easily. Please
> try :)
>
> A stream has no flexibility. It is good for well-specified,
> long-discussed protocols like XDR (NFS), where there is an RFC and
> each position in the stream is described. Example: RFC 1813.
>
> The new format with NVLIST is flexible and fast enough. Note that ZFS
> uses nvlist for keeping attributes and many other things.
>
> > > Adding optional variable - breaks resume
> > > Removing variable - breaks resume
> > > Changing saved order of variables - breaks resume
> >
> > Obviously needs improvement :-)
> >
> > > Hard to get information about what is saved and decode.
> > > Hard to debug if something goes wrong
> >
> > Additional tools missing? Will the new format allow text editor
> > interaction?
>
> Why do you need to modify a snapshot image? Could you describe more?
> Do you modify the current 3 snapshot files?
>
> > > No versions. If the code changes, resume of an old image can
> > > pass, but with UB.
> >
> > Is the new format future proof and does it provide backward
> > compatibility?
>
> The intention of moving to the new format is to keep backward
> compatibility when the code is changed by:
>
> - Adding an optional variable
> - Removing a variable that is not used anymore
> - Changing the order in which variables are saved
> - “Hot Fixes”.
>
> If changes are critical and incompatible, the restore stage should
> have clear information about the incompatibility and abort the
> resume. Ideally it should be informed even before starting the
> restore process. For this purpose, the new format introduces
> versions.
>
> > > New nvlist implementation should solve all things above.
> > > The first step - improve snapshot/checkpoint saving format. It
> > > eliminates the use of three files per snapshot.
> > >
> > > (..)
> >
> > So this will be a new text config based format with variable =
> > value and sections?
>
> This is the NVLIST approach with key=value, where the key is a string
> and the value can be an integer, array, string, etc.
>
> > How much bigger will be the overall file size increase?
>
> Not much bigger. NVLIST internals are well specified. For example,
> for my VM:
>
>   [kernel]
>         kernel.offset = 0x11f6 (4598)
>         kernel.size = 0x19a7 (6567)
>         kernel.type = “nvlist”
>   [devices]
>         devices.offset = 0x2b9d (11165)
>         devices.size = 0x10145ba (16860602)
>         devices.type = “nvlist”
>
> So the packed size for *kernel* is 6567 bytes, and for *devices* it
> is 16860602 bytes, including the 16 MB framebuffer. Without fbuf, the
> packed nvlist devices Section has a size of 83386 bytes.
>
> > How much longer will it take to decode/encode/process files?
>
> It is fast, just several milliseconds. NVLIST is a very fast format.
> It is already integrated into bhyve as the Config engine.
>
> > What is the possibility of format change and backward/forward
> > compatibility?
>
> If you are talking about compatibility of the image format, it should
> be compatible in both directions, at least for modest format changes.
>
> If we consider overall snapshot/resume compatibility, I believe
> forward compatibility is not a goal. Indeed, why would you need to
> resume an image created by a newer version of the program?
>
> The most important thing is backward compatibility, i.e. when an
> image is created by an older version of the program but should be
> resumed on a newer one.
>
> This is the target and intention of this improvement.
>
> > Have you considered an efficiency comparison of the current format,
> > the proposed format, and maybe using SQLITE or JSON
> > storage/parsers? For instance sqlite would be blazingly fast but
> > hard to migrate.
> > json would be most versatile but more time/memory consuming?
>
> Yes, I know about other formats, like JSON. NVLIST is the most
> effective and suitable for the current purposes.
>
> > Maybe the EFL approach of storing configuration files for limited
> > resources embedded system storage, which uses binary storage data
> > that can be decompressed in chunks and replaced in place?
> > https://www.enlightenment.org/develop/efl/start
>
> There are many things that could be used, but the choice should be
> well known, easy, stable, fast and supportable. I believe NVLIST is
> the best choice.
>
> > Sorry for asking those questions but there may already be good and
> > verified solutions out there, so as not to reinvent the wheel? :-)
>
> Thank you for your questions. If you would like, you can try to test
> the new implementation and give feedback.
>
> ---
> Vitaliy Gusev

-- 
Mario.
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CA%2B1FSii9vzi-1TpCGxJBPd54HUZS0_Xa0t%2BCo5QneG2kW-fX1g>