Date: Wed, 24 May 2023 18:10:49 +0300
From: Vitaliy Gusev <gusev.vitaliy@gmail.com>
To: Tomek CEDRO <tomek@cedro.info>
Cc: virtualization@freebsd.org, freebsd-hackers@freebsd.org
Subject: Re: BHYVE SNAPSHOT image format proposal
Message-ID: <AF34E648-2D8A-46C7-82A5-B88006BBB8F6@gmail.com>
In-Reply-To: <CAFYkXjng1LWy5wVyTnSo0xrEWOy+Ox9ZjLcmFqQs5EVpT8J_uA@mail.gmail.com>
References: <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com> <CAFYkXjng1LWy5wVyTnSo0xrEWOy+Ox9ZjLcmFqQs5EVpT8J_uA@mail.gmail.com>
Hi Tomek,
I've tried to answer all of your questions below; please let me know if I missed anything important.
> On 23 May 2023, at 21:58, Tomek CEDRO <tomek@cedro.info> wrote:
>
> On Tue, May 23, 2023 at 6:06 PM Vitaliy Gusev wrote:
>> Hi,
>> Here is a proposal for bhyve snapshot/checkpoint image format improvements.
>> It implies moving snapshot code to nvlist engine.
>
> Hey there Vitaliy :-) bhyve is getting more and more traction. I am a new
> user of bhyve and no expert, but new and missing features are welcome,
> I guess. There was a discussion on the mailing lists recently about a
> better snapshot mechanism :-)
>
>
>> The current snapshot implementation has disadvantages:
>> 3 files per snapshot: .meta, .kern, vram
>
> No problem, as long as the new single file is protected against
> corruption (filesystem, transfer, application crash) and can
> be easily and cheaply modified in place?
The current snapshot implementation doesn't have that either. Moreover, the current
pkg implementation doesn't track or notify you when installed files change: binary files on a
system, for example ELF files, can be modified without any notification.
Tar doesn't protect the data it stores either. Some filesystems, like ZFS,
guarantee that data is not silently corrupted by the underlying disks.
Protection requires more effort, and its purpose should be clearly defined first. If the
purpose is a checksum with "99.9% reliability", the NVLIST HEADER can be extended
with a "checksum" key/value for each Section.
If the purpose is cryptographic verification, I believe a sha256 program should be your choice.
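A minimal sketch of the per-Section "checksum" idea, assuming the value is a CRC-32 computed over the packed bytes of each Section. The key name, the choice of CRC-32 and the function names are assumptions for illustration, not part of the proposal. A CRC catches accidental corruption (bad transfer, truncated write), but it does not protect against deliberate tampering; that is what a cryptographic hash such as SHA-256 is for.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-32 (reflected polynomial 0xEDB88320, the variant used by
 * zlib and gzip). Table-free and slow, but has no dependencies.
 */
uint32_t
section_crc32(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 1u)
				crc = (crc >> 1) ^ 0xEDB88320u;
			else
				crc >>= 1;
		}
	}
	return (crc ^ 0xFFFFFFFFu);
}

/*
 * Hypothetical restore-side check: recompute the CRC over a Section's
 * packed bytes and compare it with the value stored under an assumed
 * "checksum" key in the header.
 */
bool
section_checksum_ok(const void *packed, size_t size, uint32_t stored)
{
	return (section_crc32(packed, size) == stored);
}
```

For example, `section_crc32("123456789", 9)` yields 0xCBF43926, the standard CRC-32 check value, so a stored checksum can be validated before a Section is unpacked.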
>
>> Binary Stream format of data.
>
> This is small and fast? Will the new format be too?
Small is not everything. As a first attempt, the snapshot code is good. But if you want to get
the values related to a specific device, for example a NIC or the HPET, you cannot get them easily. Please
try :)
A stream has no flexibility. It works well for well-specified, long-discussed protocols
like XDR (NFS), where there is an RFC and each position in the stream is described, for example RFC 1813.
The new NVLIST-based format is flexible and fast enough. Note that ZFS uses nvlists to store attributes
and many other things.
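To illustrate why lookup by name is more forgiving than a positional stream, here is a minimal sketch that uses a plain array of key/value pairs as a stand-in for a decoded Section. It does not use libnv, and the key names (`hpet.config`, `hpet.counter`) are hypothetical. A positional decoder reads field N from slot N, so reordering, adding or removing a field shifts everything after it; a by-name lookup ignores order and can fall back to a default for a key an older image never saved.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for one decoded Section: an array of
 * name/number pairs. This is NOT libnv; it only models lookup by name.
 */
struct kv {
	const char *key;
	uint64_t    value;
};

/*
 * Return the value stored under `key`, or `dflt` if the image does not
 * contain it. Because lookup is by name, the order in which the saver
 * emitted the pairs is irrelevant, a field added later can be defaulted
 * when an older image is restored, and keys the restorer no longer
 * reads are simply never looked up.
 */
uint64_t
kv_get_number(const struct kv *kvs, size_t n, const char *key, uint64_t dflt)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(kvs[i].key, key) == 0)
			return (kvs[i].value);
	}
	return (dflt);
}
```

Real code would instead query the unpacked nvlist with libnv's typed accessors, for example `nvlist_exists_number()` before `nvlist_get_number()`; the array only keeps the sketch self-contained.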
>> Adding optional variable - breaks resume
>> Removing variable - breaks resume
>> Changing saved order of variables - breaks resume
>
> Obviously needs improvement :-)
>
>> Hard to get information about what is saved and to decode it.
>> Hard to debug if something goes wrong
>
> Additional tools missing? Will the new format allow text editor interaction?
Why do you need to modify a snapshot image? Could you describe your use case? Do you
modify the current 3 snapshot files?
>> No versions. If the code changes, resuming an old image may
>> succeed, but with undefined behavior (UB).
>
> Is the new format future-proof, and does it provide backward compatibility?
The intention of moving to the new format is to keep backward compatibility when the code
changes by:
  - adding an optional variable
  - removing a variable that is no longer used
  - changing the order in which variables are saved
  - "hot fixes".
If changes are critical and incompatible, the restore stage should report the incompatibility
clearly and abort the resume. Ideally, this should be detectable even before the
restore process starts. For this purpose, the new format introduces versions.
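A minimal sketch of how per-Section versions could let the restore path refuse an incompatible image before any device state is touched. The major/minor split, the struct and function names, and the policy itself (same major required, image minor not newer than the code's) are assumptions for illustration. Rejecting newer images reflects the position, stated later in this reply, that resuming an image made by a newer program is not a target.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical per-Section version: bump `minor` for compatible changes
 * (optional key added, unused key removed, order changed) and `major`
 * for changes that older restore code cannot interpret.
 */
struct section_version {
	unsigned major;
	unsigned minor;
};

/*
 * Decide up front whether a Section saved with version `img` can be
 * restored by code implementing version `cur`; explain any refusal.
 */
bool
section_restorable(const char *name, struct section_version img,
    struct section_version cur)
{
	if (img.major != cur.major) {
		fprintf(stderr, "%s: incompatible version %u.%u (supported %u.x)\n",
		    name, img.major, img.minor, cur.major);
		return (false);
	}
	if (img.minor > cur.minor) {
		fprintf(stderr, "%s: image version %u.%u is newer than %u.%u\n",
		    name, img.major, img.minor, cur.major, cur.minor);
		return (false);
	}
	return (true);
}
```

Running this check over every Section header before restoring anything gives the "informed before starting" behavior without partially restoring a VM.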
>
>> The new nvlist implementation should solve all of the above. The first step is to
>> improve the snapshot/checkpoint saving format. It eliminates the use of three files
>> per snapshot.
>>
>> (..)
>
> So this will be a new text-config-based format with variable = value and sections?
It is the NVLIST approach with key=value pairs, where the key is a string and the value can be
an integer, array, string, etc.
>
> How big will the overall file size increase be?
Not very big. NVLIST internals are well specified. For example, for my VM:
    [kernel]
    kernel.offset = 0x11f6 (4598)
    kernel.size = 0x19a7 (6567)
    kernel.type = "nvlist"
    [devices]
    devices.offset = 0x2b9d (11165)
    devices.size = 0x10145ba (16860602)
    devices.type = "nvlist"
So the packed size is 6567 bytes for kernel and 16860602 bytes for devices, including
the 16 MB framebuffer. Without the fbuf, the packed nvlist devices Section is 83386 bytes.
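The numbers in the dump above are self-consistent, which a small check can confirm: 0x11f6 + 0x19a7 = 0x2b9d, so the kernel Section ends exactly where devices begins, and 16860602 - 83386 = 16777216 bytes, exactly the 16 MiB framebuffer. The sketch below assumes each Section is described by an (offset, size) pair and that Sections are laid out back to back, as the dump suggests; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Section descriptor mirroring the header dump. */
struct section {
	const char *name;
	uint64_t    offset;	/* byte offset of the packed nvlist */
	uint64_t    size;	/* length of the packed nvlist */
};

/* True if each Section starts exactly where the previous one ends. */
bool
sections_contiguous(const struct section *s, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		if (s[i - 1].offset + s[i - 1].size != s[i].offset)
			return (false);
	}
	return (true);
}

/* Offset one past the last byte of the final Section. */
uint64_t
sections_end(const struct section *s, size_t n)
{
	return (n == 0 ? 0 : s[n - 1].offset + s[n - 1].size);
}
```

With the values from the dump, `sections_end()` gives 11165 + 16860602 = 16871767, i.e. the image is dominated by the framebuffer, while the device state proper is about 81 KiB.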
>
> How much longer will it take to decode/encode/process files?
It is fast, just several milliseconds. NVLIST is a very fast format, and it is already integrated
into bhyve as the config engine.
>
> What is the possibility of format changes, and what about backward/forward compatibility?
If you are talking about compatibility of the image format, it should be compatible in
both directions, at least for minor format changes.
If we consider overall snapshot/resume compatibility, I believe forward compatibility
is not a goal. Indeed, why would you need to resume an image created by
a newer version of the program?
The most important thing is backward compatibility, i.e. an image created
by an older version of the program should be resumable by a newer one.
That is the target and intention of this improvement.
>
> Have you considered an efficiency comparison of the current format, the proposed
> format, and maybe SQLite or JSON storage/parsers? For instance,
> SQLite would be blazingly fast but hard to migrate; JSON would be the most
> versatile but more time/memory consuming?
Yes, I know about other formats, like JSON. NVLIST is the most
efficient and suitable for the current purposes.
>
> Maybe the EFL approach to storing configuration files on resource-limited
> embedded systems, which uses binary data storage but can
> be decompressed in chunks that can be replaced in place?
> https://www.enlightenment.org/develop/efl/start
There are many options, but the choice should be well known, easy, stable,
fast and maintainable. I believe NVLIST is the best choice.
>
> Sorry for asking these questions, but there may already be good and
> proven solutions out there, so we don't have to reinvent the wheel? :-)
Thank you for your questions. If you like, you can test the new implementation and give feedback.
———
Vitaliy Gusev
