Date: Mon, 05 Jun 2023 17:32:57 +0200 From: Corvin =?ISO-8859-1?Q?K=F6hne?= <corvink@FreeBSD.org> To: Vitaliy Gusev <gusev.vitaliy@gmail.com>, virtualization@freebsd.org Cc: freebsd-hackers@freebsd.org Subject: Re: BHYVE SNAPSHOT image format proposal Message-ID: <6b98da58a5bd8e83bc466efa20b5a900298210aa.camel@FreeBSD.org> In-Reply-To: <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com> References: <67FDC8A8-86A6-4AE4-85F0-FF7BEF9F2F06@gmail.com>
next in thread | previous in thread | raw e-mail | index | archive | help
[-- Attachment #1 --] On Tue, 2023-05-23 at 19:05 +0300, Vitaliy Gusev wrote: > Hi, > > Here is a proposal for bhyve snapshot/checkpoint image format > improvements. > Hi Vitaliy, thank you very much for sending this proposal to the mailing list. > It implies moving snapshot code to nvlist engine. > > Current snapshot implementation has disadvantages: > > * 3 files per snapshot: .meta, .kern, vram > * Binary Stream format of data. > * Adding optional variable - breaks resume > * Removing variable - breaks resume > * Changing saved order of variables - breaks resume > * Hard to get information about what is saved and decode. > * Hard to debug if somethings goes wrong > * No versions. If change code, resume of an old images can be > passed, but with UB. > > New nvlist implementation should solve all things above. The first > step - > improve snapshot/checkpoint saving format. It eliminates three files > usage > per a snapshot. > > > 1. BHYVE SNAPSHOT image format: > > > +———————————————————————————————————————+ > > | HEADER PHYS - 4096 BYTES | > > +———————————————————————————————————————+ > > | | > > | DATA | > > | | > > +———————————————————————————————————————+ > > 2. HEADER PHYS format: > > 0 +—————————————————————————————————————————+ > | IDENT STRING - 64 BYTES | > 64 +—————————————————————————————————————————+ > | NVLIST SIZE - 4 BYTES | NVLIST DATA | > 72 +—————————————————————————————————————————+ > | | > | NVLIST DATA | > | | > 4096 +—————————————————————————————————————————+ > > > > > IDENT STRING - Each producer can set its own value to specify > > image. > > NVLIST SIZE - The following packed header nvlist data size. > > NVLIST DATA - Packed nvlist header data. > > > > 4KB should be enough for the HEADER to keep basic information about > > Sections. However, it can > > be enlarged lately, without breaking backward compatibility. > > I can't see an advantage of using a fixed sized header of 4KB. You have to parse the offset and size of every section anyways. If it's for alignment requirements you can still align all sections on save and set the offset accordingly. So, why complicating things by using a fixed header size? The IDENT STRING seems to be very large. Even a GUID which should be a global unique identifier uses just 16 Bytes. Additionally, it might be worth using a dedicated ident and version field for an easier version parsing. E.g.: +------------------+-------------------+ | IDENT - 56 BYTES | VERSION - 8 BYTES | +------------------+-------------------+ IDENT - "BHYVE CHECKPOINT IMAGE" VERSION - 1 (as uint64_t) Btw: I don't care but here we could leave some free space for possible forward compatibility. E.g.: +------------------+-------------------+-------------------------+ | IDENT - 16 BYTES | VERSION - 8 BYTES | _FREE_SPACE_ - 40 BYTES | +------------------+-------------------+-------------------------+ > 3. NVLIST HEADER consists of Sections in the following format: > > * Name - string > * Type: string: > - “text, - plain text, > - “nvlist” - packed nvlist, > - “binary” - raw binary data. > * Size - Size of section - uint64 > * Offset - Offset in image format - uint64 > > > Predefined sections: “config”, “devices”, “kernel”, “memory”. > LGTM. > > 4. EXAMPLE: > > > IDENT STRING: > > "BHYVE CHECKPOINT IMAGE VERSION 1" > > NVLIST HEADER: > > [config] > config.offset = 0x1000 (4096) > config.size = 0x1f6 (502) > config.type = "text" > [kernel] > kernel.offset = 0x11f6 (4598) > kernel.size = 0x19a7 (6567) > kernel.type = “nvlist" > [devices] > devices.offset = 0x2b9d (11165) > devices.size = 0x10145ba (16860602) > devices.type = "nvlist" > [memory] > memory.offset = 0x1200000 (18874368) > memory.size = 0x3ce00000 (1021313024) > memory.type = “binary" > > SECTIONS: > > [section "config" size 0x1f6 offset 0x1000]: > > memory.size=1024M > > x86.strictmsr=true > > x86.vmexit_on_hlt=true > > cpus=2 > > acpi_tables=true > > pci.0.0.0.device=hostbridge > > pci.0.31.0.device=lpc > > pci.0.4.0.device=virtio-net > > pci.0.4.0.backend=tap0 > > pci.0.7.0.device=fbuf > > pci.0.7.0.tcp=10.42.0.78:5900 > > pci.0.7.0.w=1024 > > pci.0.7.0.h=768 > > pci.0.5.0.device=ahci > > pci.0.5.0.port.0.type=cd > > pci.0.5.0.port.0.path=/ISO/ubuntu-22.04.1-live-server-amd64.iso > > lpc.bootrom=/usr/local/share/uefi-firmware/BHYVE_UEFI.fd > > checkpoint.date="Wed Jan 25 23:48:29 2023" > > name=ubuntu22 > Not sure if it's just an example for the "text" type. bhyve converts it into a nvlist, so it could be saved directly as nvlist. Btw: I would only implement the "text" type if there's an usecase that can't be solved by one of the other types. > [section "kernel" size 0x19a7 offset 0x11f6]: > [vm] > vm.vds_version = 0x1 (1) > vm.cpu0.data(BINARY): 00 00 00 00 0D 00 00 00 01 00 00 00 00 > 00 00 00 ... size=0x28 > vm.cpu1.data(BINARY): 00 00 00 00 0D 00 00 00 01 00 00 00 00 > 00 00 00 ... size=0x28 > vm.checkpoint_tsc = 0xe2e0ac6fbe456 (3991273496896598) > [hpet] > hpet.vds_version = 0x1 (1) > hpet.data(BINARY): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 ... size=0x118 > [vmx] > vmx.vds_version = 0x1 (1) > vmx.cpu_features = 0 (0) > vmx.cpu0.vmx_data(BINARY): F0 CC 15 B8 FF FF FF FF 40 B4 21 > B9 FF FF FF FF ... size=0x288 > vmx.cpu1.vmx_data(BINARY): F0 CC 15 B8 FF FF FF FF 00 00 67 > 41 D8 9B FF FF ... size=0x288 > [ioapic] > ioapic.vds_version = 0x1 (1) > ioapic.data(BINARY): 00 00 01 00 00 00 00 00 00 00 00 00 00 > 00 00 00 ... size=0x208 > [lapic] > lapic.vds_version = 0x1 (1) > lapic.cpu0.data(BINARY): 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 ... size=0x460 > lapic.cpu1.data(BINARY): 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 00 ... size=0x460 > [atpit] > atpit.vds_version = 0x1 (1) > atpit.data(BINARY): 00 00 00 00 00 00 00 00 54 AD 51 97 0F 0E > 00 00 ... size=0xa0 > [atpic] > atpic.vds_version = 0x1 (1) > atpic.data(BINARY): 01 00 00 00 00 00 00 00 00 00 00 00 01 00 > 00 00 ... size=0x84 > [pmtimer] > pmtimer.vds_version = 0x1 (1) > pmtimer.uptime = 0x26fd133e5cc (2679274464716) > [rtc] > rtc.vds_version = 0x1 (1) > rtc.data(BINARY): 0A 00 00 00 00 FE FF FF 10 35 13 3D 40 F7 > 14 00 ... size=0x98 > > — > Thanks, > Vitaliy Gusev > All in all, it looks good. Keep on your work! Regards checksum feature: We should focus on enabling this feature by default before adding advanced features. So, keep it simple and small. Regards forward compatibility: Backward compatibility is way more important than forward compatibility. Nevertheless, forward compatibility would be nice to have. So, we should keep it in mind when modifying the layout. For the moment, just focus on a format which is backward compatible. -- Kind regards, Corvin [-- Attachment #2 --] -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEgvRSla3m2t/H2U9G2FTaVjFeAmoFAmR+ACkACgkQ2FTaVjFe AmqHiA//Z+21VQdlu4imactssAT4+30yMrPULmDp98ncd7Z+jyixmXNa8+i3b0Wb XuTMBVERSGRxuaUjca2ChG98qlI3liLVgvRuvZqT2mi7VwGaQlAnMaEkL40mxq6M H+y9zUnOXKUGjTHS2nKCJflPihcCx/Nj0uif4WiEePCazmbp/KEOMXbrTlEIUNC/ 39OHjG0q2BPLp1P0c0eF4UJLXr8P0ieeeWDKWSSaLscJs59ON+H0cn8z28AbWxTk KBo3i4UwXS/Ei/VIVmM2I3Xg51CeCEXdlpg8vf+/MNqiQ4/xUvQdDBdbzgUVAmwG fhuKVds/DJa3vJPNHiS/EoLFOeyo5IXVQhowi2OVzDznwRXlmGcGHJ80/XDmHuZr +N6ekQJVOH/OUrNcsFlCWenWmWYeajIq9F2EWBOnnLy6jnieQ6qCC0Tflkn1wvIN FmS1qg2z7Y5OIxOKMOKfK8vckiaaAMrK76GehNjqBI3/7FzO7rMFqz1CcCPfkS9E atf2AaSCIFiBJpTFyj2HZwdlL7nKNh8d7raACyC3G3MRMZgCZ8Fhiyp7t+LL4Dfk F303H2zaFCo6lZR4GpRBsadmxo8QnXPw42eWLDy4kBP1WoA+TMbk8mnK/eONDvYz aYmTEOcJIdIylWK/S2p8bPs1ydeBCF9e6xbK7ErGBTbtnprVKcs= =oFeS -----END PGP SIGNATURE-----
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?6b98da58a5bd8e83bc466efa20b5a900298210aa.camel>
