Date:      Wed, 24 May 2023 19:33:48 +0200
From:      Mario Marietto <marietto2008@gmail.com>
To:        Vitaliy Gusev <gusev.vitaliy@gmail.com>
Cc:        Tomek CEDRO <tomek@cedro.info>, virtualization@freebsd.org,  freebsd-hackers@freebsd.org
Subject:   Re: BHYVE SNAPSHOT image format proposal
Message-ID:  <CA%2B1FSii9vzi-1TpCGxJBPd54HUZS0_Xa0t%2BCo5QneG2kW-fX1g@mail.gmail.com>
In-Reply-To: <AF34E648-2D8A-46C7-82A5-B88006BBB8F6@gmail.com>
References:  <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com> <CAFYkXjng1LWy5wVyTnSo0xrEWOy%2BOx9ZjLcmFqQs5EVpT8J_uA@mail.gmail.com> <AF34E648-2D8A-46C7-82A5-B88006BBB8F6@gmail.com>

@gusev.vitaliy@gmail.com: Could you explain to me how to test the new
"snapshot" feature? I'm interested in testing and stressing it on my
system. Is it ready to be used?

On Wed, May 24, 2023 at 5:11 PM Vitaliy Gusev <gusev.vitaliy@gmail.com>
wrote:

> Hi Tomek,
>
> I'll try to answer all the questions below; please let me know if I miss
> something important.
>
>
> On 23 May 2023, at 21:58, Tomek CEDRO <tomek@cedro.info> wrote:
>
> On Tue, May 23, 2023 at 6:06 PM Vitaliy Gusev wrote:
>
> Hi,
> Here is a proposal for bhyve snapshot/checkpoint image format improvements.
> It implies moving the snapshot code to the nvlist engine.
>
>
> Hey there Vitaliy :-) bhyve is getting more and more traction. I am a new
> user of bhyve and no expert, but new and missing features are welcome,
> I guess. There was a discussion on the mailing lists recently about a
> better snapshot mechanism :-)
>
>
> Current snapshot implementation has disadvantages:
> 3 files per snapshot: .meta, .kern, vram
>
>
> No problem, as long as the new single file is protected against
> corruption (filesystem, transfer, application crash) and can be
> modified in place easily and cheaply?
>
>
> The current snapshot implementation doesn't have that. I would go
> further: the current pkg implementation doesn't track or notify you when
> some files are changed. Binary files on a system, for example ELF files,
> can be changed without any notification.
>
> Tar offers no protection for the data it stores. Some filesystems, such
> as ZFS, guarantee that data is not modified by the underlying disks.
>
> Protection requires more effort, and its purpose should be clearly
> defined. If the purpose is a checksum with 99.9% reliability, the NVLIST
> HEADER can be widened to hold a “checksum” key/value for each Section.
>
> If the purpose is cryptographic verification, I believe the sha256
> program should be your choice.
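
A rough sketch of the two options just described, written in Python for brevity (bhyve itself is C). The header layout, the "checksum" and "size" key names, and all helper names are assumptions made for illustration; they are not taken from the proposal. CRC32 stands in for "a checksum", and hashlib's SHA-256 stands in for running the sha256 program on the file.

```python
import hashlib
import zlib


def add_section_checksum(header: dict, name: str, payload: bytes) -> None:
    """Record the size and CRC32 of one section in a (hypothetical) header."""
    header[name] = {"size": len(payload), "checksum": zlib.crc32(payload)}


def section_is_intact(header: dict, name: str, payload: bytes) -> bool:
    """Return True when the payload still matches the recorded size and CRC32."""
    entry = header[name]
    return entry["size"] == len(payload) and entry["checksum"] == zlib.crc32(payload)


def image_digest(image: bytes) -> str:
    """SHA-256 of the whole image, for cryptographic verification."""
    return hashlib.sha256(image).hexdigest()
```

A per-section checksum catches accidental corruption of one section cheaply, while a digest over the whole file is what you would compare against a trusted value.
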
>
>
> Binary Stream format of data.
>
>
> This is small and fast? Will the new format be too?
>
>
> Small is not everything. As a first attempt, the current snapshot code
> is good. But if you want to get the values related to a specific device,
> for example a NIC or the HPET, you cannot get them easily. Please try :)
>
> A stream lacks flexibility. It suits well-specified, long-discussed
> protocols such as XDR (NFS), which have an RFC that describes each
> position in the stream. Example: RFC1813.
>
> The new format with NVLIST is flexible and fast enough. Note that ZFS
> uses nvlist for storing attributes and many other things.
>
>
> Adding an optional variable - breaks resume
> Removing a variable - breaks resume
> Changing the saved order of variables - breaks resume
>
>
> Obviously needs improvement :-)
>
> Hard to get information about what is saved and to decode it.
> Hard to debug if something goes wrong
>
>
> Additional tools missing? Will the new format allow text editor interaction?
>
>
> Why do you need to modify a snapshot image? Could you describe this in
> more detail? Do you modify the current 3 snapshot files?
>
>
> No versions. If the code changes, resuming an old image can
> succeed, but with undefined behavior (UB).
>
>
> Is the new format future-proof, and does it provide backward compatibility?
>
>
> The intention of moving to the new format is to keep backward
> compatibility when the code changes:
>
>
>    - Adding an optional variable
>    - Removing a variable that is no longer used
>    - Changing the order in which variables are saved
>    - “Hot fixes”
>
>
> If changes are critical and incompatible, the restore stage should get
> clear information about the incompatibility and abort the resume. Ideally
> it should be informed even before the restore process starts. For this
> purpose, the new format introduces versions.
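
To illustrate why name/value pairs tolerate these changes where a positional stream does not, here is a minimal Python sketch. The key names ("version", "mac", "mtu", "legacy_flag"), the SUPPORTED_VERSION constant, and the restore_device helper are all hypothetical; the actual bhyve restore code and the proposal's version scheme may look quite different.

```python
SUPPORTED_VERSION = 2  # hypothetical: highest image version this build can restore


def restore_device(saved: dict) -> dict:
    """Restore device state by key lookup instead of by stream position."""
    version = saved.get("version", 1)
    if version > SUPPORTED_VERSION:
        # Incompatible image: refuse before touching any device state.
        raise ValueError(
            f"image version {version} is newer than supported {SUPPORTED_VERSION}"
        )
    return {
        "mac": saved["mac"],            # required key
        "mtu": saved.get("mtu", 1500),  # optional key added later, with a default
        # Keys that are no longer used (e.g. "legacy_flag") are never read.
    }
```

An image that lacks "mtu", still carries "legacy_flag", or stores its keys in a different order restores the same way, and an image with a newer version is rejected up front.
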
>
>
>
> The new nvlist implementation should solve all of the above. The first
> step is to improve the snapshot/checkpoint saving format. It eliminates
> the use of three files per snapshot.
>
> (..)
>
>
> So this will be a new text-config-based format with variable = value and
> sections?
>
>
> This is the NVLIST approach with key=value pairs, where the key is a
> string and the value can be an integer, array, string, etc.
>
>
> How much will the overall file size increase?
>
>
> Not so huge. NVLIST internals are well specified. For example, for my VM:
>
>   [kernel]
>         kernel.offset = 0x11f6 (4598)
>         kernel.size = 0x19a7 (6567)
>         kernel.type = "nvlist"
>
>   [devices]
>         devices.offset = 0x2b9d (11165)
>         devices.size = 0x10145ba (16860602)
>         devices.type = "nvlist"
>
> So the packed size for *kernel* is 6567 bytes, and for *devices* it is
> 16860602 bytes, including the 16MB framebuffer. Without fbuf, the packed
> nvlist devices Section is 83386 bytes.
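
The numbers in this dump are self-consistent: the kernel section ends at 4598 + 6567 = 11165, which is exactly where the devices section starts, and 16860602 - 16777216 = 83386 if the framebuffer is exactly 16 MiB. A small Python sketch of that arithmetic, and of slicing a section out of an image by offset and size, follows. The SECTIONS dictionary mirrors the dump; read_section, section_end, and the assumption that sections are plain byte ranges in a single file are illustrative only.

```python
# Offsets and sizes copied from the dump quoted in the message.
SECTIONS = {
    "kernel": {"offset": 0x11F6, "size": 0x19A7},
    "devices": {"offset": 0x2B9D, "size": 0x10145BA},
}
FBUF_SIZE = 16 * 1024 * 1024  # assumption: the framebuffer is exactly 16 MiB


def section_end(index: dict, name: str) -> int:
    """Return the first byte offset past the named section."""
    entry = index[name]
    return entry["offset"] + entry["size"]


def read_section(image: bytes, index: dict, name: str) -> bytes:
    """Slice one section out of an image, checking that it fits."""
    entry = index[name]
    end = section_end(index, name)
    if end > len(image):
        raise ValueError(f"section {name!r} extends past the end of the image")
    return image[entry["offset"]:end]
```

With a layout like this, a tool can read only the index and then fetch a single section without parsing the rest of the image.
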
>
>
>
> How much longer will it take to decode/encode/process the files?
>
>
> It is fast, just several milliseconds. NVLIST is a very fast format. It
> is already integrated into bhyve as the config engine.
>
>
>
> What is the possibility of format change and backward/forward compatibility?
>
>
> If you mean compatibility of the image format itself, it should be
> compatible in both directions, at least for moderate format changes.
>
> If you mean overall snapshot/resume compatibility, I believe forward
> compatibility is not a goal. Indeed, why would you need to resume an
> image created by a newer version of the program?
>
> The most important thing is backward compatibility, i.e. an image created
> by an older version of the program can be resumed on a newer one.
>
> This is the target and intention of this improvement.
>
>
> Have you considered an efficiency comparison of the current format, the
> proposed format, and maybe SQLITE or JSON storage/parsers? For instance,
> sqlite would be blazingly fast but hard to migrate. json would be most
> versatile but more time/memory consuming?
>
>
> Yes, I know about other formats, such as JSON. NVLIST is the most
> effective and suitable for the current purposes.
>
>
> Maybe the EFL approach of storing configuration files for embedded
> systems with limited storage, which uses binary data that can be
> decompressed in chunks and replaced in place?
> https://www.enlightenment.org/develop/efl/start
>
>
> There are many things that can be used, but it should be well known, easy,
> stable,
> fast and supportable. I believe NVLIST is the best choice.
>
>
> Sorry for asking those questions but there may be already good and
> verified solutions out there not to reinvent the wheel? :-)
>
>
> Thank you for your questions. If you would like, you can try to test the
> new implementation and give feedback.
>
> ———
> Vitaliy Gusev
>
>

-- 
Mario.

