Date: Wed, 24 May 2023 19:33:48 +0200
From: Mario Marietto <marietto2008@gmail.com>
To: Vitaliy Gusev <gusev.vitaliy@gmail.com>
Cc: Tomek CEDRO <tomek@cedro.info>, virtualization@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: BHYVE SNAPSHOT image format proposal
Message-ID: <CA+1FSii9vzi-1TpCGxJBPd54HUZS0_Xa0t+Co5QneG2kW-fX1g@mail.gmail.com>
In-Reply-To: <AF34E648-2D8A-46C7-82A5-B88006BBB8F6@gmail.com>
References: <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com> <CAFYkXjng1LWy5wVyTnSo0xrEWOy+Ox9ZjLcmFqQs5EVpT8J_uA@mail.gmail.com> <AF34E648-2D8A-46C7-82A5-B88006BBB8F6@gmail.com>
@gusev.vitaliy@gmail.com: Could you explain to me how to test the new "snapshot" feature? I'm interested in testing and stressing it on my system. Is it ready to be used?

On Wed, May 24, 2023 at 5:11 PM Vitaliy Gusev <gusev.vitaliy@gmail.com> wrote:
> Hi Tomek,
>
> I'll try to answer all the questions below; please let me know if I missed
> anything important.
>
> > On 23 May 2023, at 21:58, Tomek CEDRO <tomek@cedro.info> wrote:
> >
> > On Tue, May 23, 2023 at 6:06 PM Vitaliy Gusev wrote:
> > > Hi,
> > > Here is a proposal for bhyve snapshot/checkpoint image format improvements.
> > > It implies moving the snapshot code to the nvlist engine.
> >
> > Hey there Vitaliy :-) bhyve is getting more and more traction. I am a new
> > user of bhyve and no expert, but new and missing features are welcome,
> > I guess. There was a discussion on the mailing lists recently about a
> > better snapshot mechanism :-)
> >
> > > Current snapshot implementation has disadvantages:
> > > 3 files per snapshot: .meta, .kern, vram
> >
> > No problem, as long as the new single file is protected against
> > corruption (filesystem, transfer, application crash) and can be
> > easily and cheaply modified in place?
>
> The current snapshot implementation doesn't have that. I would go further:
> the current pkg implementation doesn't track or report whether any of its
> files have changed. Binary files on a system, for example ELF files, can be
> changed without any notification.
>
> Tar doesn't protect the data it stores either. Some filesystems, like ZFS,
> guarantee that data is not modified by the underlying disks.
>
> Protection requires more effort, and its purpose should be clearly defined.
> If the purpose is a checksum with 99.9% reliability, the NVLIST HEADER can
> be widened to include a "checksum" key/value for a Section.
>
> If the purpose is cryptographic verification, I believe the sha256 program
> should be your choice.
>
> > > Binary Stream format of data.
> >
> > This is small and fast? Will the new format be too?
>
> Small isn't everything. As a first attempt, the snapshot code is good.
> But if you want to get values related to a specific device, for example the
> NIC or HPET, you cannot get them easily. Please try :)
>
> A stream has no flexibility. It is good for well-specified protocols that
> have been discussed for a long time, like XDR (NFS), where there is an RFC
> and each position in the stream is described. Example: RFC 1813.
>
> The new format with NVLIST is flexible and fast enough. Note that ZFS uses
> nvlists for keeping attributes and many other things.
>
> > > Adding optional variable - breaks resume
> > > Removing variable - breaks resume
> > > Changing saved order of variables - breaks resume
> >
> > Obviously needs improvement :-)
> >
> > > Hard to get information about what is saved, and hard to decode.
> > > Hard to debug if something goes wrong
> >
> > Additional tools missing? Will the new format allow text editor interaction?
>
> Why do you need to modify a snapshot image? Could you describe it in more
> detail? Do you modify the current 3 snapshot files?
>
> > > No versions. If the code changes, resuming an old image may still
> > > succeed, but with undefined behavior (UB).
> >
> > Is the new format future-proof, and does it provide backward compatibility?
>
> The intention of moving to the new format is to have backward compatibility
> when code is changed:
>
>   - Adding an optional variable
>   - Removing a variable that is no longer used
>   - Changing the order in which variables are saved
>   - "Hot fixes"
>
> If changes are critical and incompatible, the restore stage should have
> clear information about the incompatibility and abort the resume. Ideally,
> it should be able to detect this even before the restore process starts.
> For this purpose, the new format introduces versions.
>
> > > The new nvlist implementation should solve all the issues above. The first
> > > step is to improve the snapshot/checkpoint saving format. It eliminates the
> > > use of three files per snapshot.
> > >
> > > (..)
> >
> > So this will be a new text-config-based format with variable = value and
> > sections?
>
> This is the NVLIST approach with key=value, where the key is a string and
> the value can be an integer, array, string, etc.
>
> > How much bigger will the overall file size be?
>
> Not much. NVLIST internals are well specified. For example, for my VM:
>
>   [kernel]
>   kernel.offset = 0x11f6 (4598)
>   kernel.size = 0x19a7 (6567)
>   kernel.type = "nvlist"
>   [devices]
>   devices.offset = 0x2b9d (11165)
>   devices.size = 0x10145ba (16860602)
>   devices.type = "nvlist"
>
> So the packed size is 6567 bytes for *kernel* and 16860602 bytes for
> *devices*, including the 16 MB framebuffer. Without fbuf, the packed nvlist
> devices Section is 83386 bytes.
>
> > How much longer will it take to decode/encode/process files?
>
> It is fast, just several milliseconds. NVLIST is a very fast format. It is
> already integrated into bhyve as the Config engine.
>
> > What is the possibility of format changes, and of backward/forward
> > compatibility?
>
> If you are talking about compatibility of the image format, it should be
> compatible in both directions, at least for small format changes.
>
> If we consider overall snapshot/resume compatibility, I believe forward
> compatibility is not a goal. Indeed, why would you need to resume an image
> created by a newer version of the program?
>
> The most important thing is backward compatibility, i.e. when an image is
> created by an older version of the program but must be resumed on a newer
> one.
>
> This is the target and intention of this improvement.
>
> > Have you considered an efficiency comparison of the current format, the
> > proposed format, and maybe SQLite or JSON storage/parsers? For instance,
> > SQLite would be blazingly fast but hard to migrate. JSON would be the most
> > versatile but more time/memory consuming?
>
> Yes, I know about other formats, like JSON. NVLIST is the most effective
> and suitable for the current purposes.
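Vitaliy's section table can be sanity-checked with simple arithmetic. The *kernel* section ends exactly where *devices* begins (4598 + 6567 = 11165). Subtracting the fbuf-free size from the *devices* size leaves 16860602 - 83386 = 16777216 bytes, exactly 16 MiB, which is consistent with the 16 MB framebuffer he mentions. The short Python sketch below redoes these checks on the dump exactly as printed in the mail. It assumes that textual rendering is only a debugging view, not the on-disk encoding, and `parse_sections` and `check_contiguous` are hypothetical helpers, not bhyve code.

```python
import re

# Section table copied verbatim from the mail (quotes normalized).
HEADER_DUMP = """
[kernel]
kernel.offset = 0x11f6 (4598)
kernel.size = 0x19a7 (6567)
kernel.type = "nvlist"
[devices]
devices.offset = 0x2b9d (11165)
devices.size = 0x10145ba (16860602)
devices.type = "nvlist"
"""

# Matches "<section>.<field> = 0x<hex> (<decimal>)"; the "type" lines carry
# no number and are skipped.
NUM_LINE = re.compile(r"^(\w+)\.(offset|size) = 0x([0-9a-f]+) \((\d+)\)$")


def parse_sections(dump):
    """Return {section: {"offset": int, "size": int}} from the text dump."""
    sections = {}
    for line in dump.strip().splitlines():
        m = NUM_LINE.match(line.strip())
        if not m:
            continue
        name, field, hex_val, dec_val = m.groups()
        value = int(hex_val, 16)
        # Each number is printed twice; make sure both spellings agree.
        assert value == int(dec_val), f"{name}.{field}: hex/decimal mismatch"
        sections.setdefault(name, {})[field] = value
    return sections


def check_contiguous(sections):
    """True if every section starts exactly where the previous one ends."""
    spans = sorted((s["offset"], s["size"]) for s in sections.values())
    return all(off + size == next_off
               for (off, size), (next_off, _) in zip(spans, spans[1:]))


sections = parse_sections(HEADER_DUMP)
end_of_data = max(s["offset"] + s["size"] for s in sections.values())
fbuf_bytes = sections["devices"]["size"] - 83386  # 83386: devices without fbuf

print(check_contiguous(sections))                  # True
print(end_of_data)                                 # 16871767
print(fbuf_bytes, fbuf_bytes == 16 * 1024 * 1024)  # 16777216 True
```

In a real image these values would come from the header nvlist itself. The regex here only serves to reuse the numbers exactly as they were posted.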
>
> > Maybe the EFL approach of storing configuration files for
> > resource-limited embedded system storage, which uses binary data storage
> > but can be decompressed in chunks that can be replaced in place?
> > https://www.enlightenment.org/develop/efl/start
>
> There are many things that could be used, but it should be well known,
> easy, stable, fast and supportable. I believe NVLIST is the best choice.
>
> > Sorry for asking these questions, but maybe there are already good,
> > verified solutions out there, so we don't reinvent the wheel? :-)
>
> Thank you for your questions. If you would like, you can try to test the
> new implementation and give feedback.
>
> ———
> Vitaliy Gusev
>

-- 
Mario.
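The compatibility rules listed in the thread map naturally onto key/value lookup. Those rules are that an added optional variable, a removed variable, or a changed save order must not break resume, while a truly incompatible image should be rejected before restore even starts, based on a version. The Python sketch below illustrates that idea under stated assumptions. Plain dicts stand in for nvlists. The `(major, minor)` version tuple, the HPET keys `counter`, `config` and `legacy_flag`, and all function names are hypothetical and are not taken from the proposed bhyve patches.

```python
# Sketch only: dicts stand in for nvlists; every key, version number and
# function name here is hypothetical, not bhyve's actual code.

FORMAT_MAJOR = 1  # bumped only for incompatible changes


class IncompatibleSnapshot(Exception):
    """Raised before any device state is restored."""


def check_version(image):
    """Refuse an incompatible image up front, before restore begins."""
    major, minor = image["version"]
    if major != FORMAT_MAJOR:
        raise IncompatibleSnapshot(
            f"image format {major}.{minor} cannot be restored by format "
            f"{FORMAT_MAJOR}.x")


def restore_hpet(section):
    """Rebuild HPET state from a key/value section.

    Values are looked up by key, so save order is irrelevant, keys that this
    version no longer uses are ignored, and optional keys get defaults.
    """
    return {
        "counter": section["counter"],       # required in every version
        "config": section.get("config", 0),  # optional: default if absent
    }


def restore(image):
    check_version(image)  # fail before touching any device
    return {"hpet": restore_hpet(image["devices"]["hpet"])}


# An "old" image: different key order, an obsolete key, no "config".
old_image = {"version": (1, 0),
             "devices": {"hpet": {"legacy_flag": 1, "counter": 42}}}
print(restore(old_image))  # {'hpet': {'counter': 42, 'config': 0}}
```

Because values are fetched by name rather than by stream position, the old image restores even though it lacks `config` and still carries the obsolete `legacy_flag`. An image with a different major version raises `IncompatibleSnapshot` before any device is touched. A C implementation would presumably express the same checks with libnv calls such as `nvlist_exists_number()` and `nvlist_get_number()`.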
