# Firecracker snapshot versioning

This document describes how Firecracker persists its state across multiple
versions, diving deep into the snapshot format, encoding, compatibility and
limitations.

## Introduction

The design behind the snapshot implementation enables version-tolerant save
and restore across multiple Firecracker versions; we call this set of versions a
version space. For example, one can pause a microVM, save it to disk with
Firecracker version **0.23.0** and later load it in Firecracker version
**0.24.0**. It also works in reverse: Firecracker version **0.23.0** loads what
**0.24.0** saves.

Below is an example graph showing backward and forward snapshot compatibility.
This is the general picture, but keep in mind that once new features are added,
some version translations may no longer be possible.

![Version graph](
../images/version_graph.png?raw=true
"Version graph")

A non-exhaustive list of how cross-version snapshot support can be used:

Example scenario #1 - load snapshot from older version:

* Start Firecracker v0.23 → Boot microVM → *Workload starts* → Pause →
  CreateSnapshot(snap) → kill microVM
* Start Firecracker v0.24 → LoadSnapshot(snap) → Resume → *Workload continues*

Example scenario #2 - load snapshot in older version:

* Start Firecracker v0.24 → Boot microVM → *Workload starts* → Pause →
  CreateSnapshot(snap, "0.23") → kill microVM
* Start Firecracker v0.23 → LoadSnapshot(snap) → Resume → *Workload continues*

Example scenario #3 - resume an older snapshot, then save it for an older version:

* Start Firecracker v0.24 → LoadSnapshot(older_snap) → Resume →
  *Workload continues* → Pause → CreateSnapshot(snap, "0.23") → kill microVM
* Start Firecracker v0.23 → LoadSnapshot(snap) → Resume → *Workload continues*

## Overview

Firecracker persists the microVM state as 2 separate objects:

* a **guest memory** file
* a **microVM state** file

*The block devices attached to the microVM are not considered part of the
state and need to be managed separately.*

### Guest memory

The guest memory file contains the microVM memory saved as a dump of all pages.

### MicroVM state

In the VM state file, Firecracker stores the internal state of the VMM (device
emulation, KVM and vCPUs), with two exceptions: serial emulation and the vsock
backend.

As we continuously improve and extend Firecracker by adding new capabilities,
devices or enhancements, the microVM state file may change both structurally
and semantically with each new release. The state file therefore includes
versioning information, and each Firecracker release implements distinct
save/restore logic for each version in its supported version space.

## MicroVM state file format

A microVM state file is further split into four different fields:

| Field | Bits | Description |
|----|----|----|
| magic_id | 64 | Firecracker snapshot, architecture (x86_64/aarch64) and storage version. |
| version | 16 | The snapshot version number internally mapped 1:1 to a specific Firecracker version. |
| state | N | Bincode blob containing the microVM state. |
| crc | 64 | Optional CRC64 sum of magic_id, version and state fields. |

**Note**: the last 16 bits of `magic_id` encode the storage version, which specifies
the encoding used for the `version` and `state` fields. The current
implementation sets this field to 1, which identifies a [Serde bincode](https://github.com/servo/bincode)
compatible encoder/decoder.
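As a rough illustration of this layout, the sketch below splits a `magic_id` into its base value and its storage version, then reads the `version` field that follows it. It assumes that "last 16 bits" means the 16 least significant bits, and that both integers are stored as fixed-width little-endian values (as bincode 1.x does by default). `EXAMPLE_BASE_MAGIC_X86_64` is a made-up placeholder, not Firecracker's real constant.

```rust
use std::convert::TryInto;

/// Placeholder base magic value (hypothetical; the real constants live in
/// Firecracker's snapshot crate). Its low 16 bits are zero.
pub const EXAMPLE_BASE_MAGIC_X86_64: u64 = 0x0710_1984_8664_0000;

/// Mask for the low 16 bits, assumed to hold the storage version.
const STORAGE_VERSION_MASK: u64 = 0xFFFF;

/// Header fields read from the start of a microVM state file.
#[derive(Debug, PartialEq)]
pub struct Header {
    pub base_magic: u64,
    pub storage_version: u16,
    pub version: u16,
}

/// Reads the 64-bit `magic_id` and the 16-bit `version` that follows it.
/// Returns `None` if the buffer is too short to hold both fields.
pub fn parse_header(bytes: &[u8]) -> Option<Header> {
    if bytes.len() < 10 {
        return None;
    }
    let magic_id = u64::from_le_bytes(bytes[0..8].try_into().ok()?);
    let version = u16::from_le_bytes(bytes[8..10].try_into().ok()?);
    Some(Header {
        // Everything above the low 16 bits identifies the snapshot type/architecture.
        base_magic: magic_id & !STORAGE_VERSION_MASK,
        // The low 16 bits select the encoding (1 = bincode-compatible).
        storage_version: (magic_id & STORAGE_VERSION_MASK) as u16,
        version,
    })
}
```

A reader would reject the file if `base_magic` does not match its own architecture or if `storage_version` names an encoding it does not implement.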

### Version tolerant ser/de

Firecracker reads and writes the `state` blob of the snapshot by using separate,
per-version serialization and deserialization logic. This logic is mostly
autogenerated by a Rust procedural macro based on `struct` and `enum`
annotations; in other words, these structures are versioned.
The versioning logic is generated by parsing a structure's history log (encoded
using Rust annotations) and emitting Rust code.

Versioned serialization and deserialization are divided into two translation layers:

* field translator,
* semantic translator.

The _field translator_ implements the logic to convert between different
versions of the same Rust POD structure: it can serialize or deserialize from a
source version to a target version.
The translation is done field by field: the common fields are copied from
source to target, and the fields that are unique to the target are
(de)serialized with their default values.
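The field translator can be pictured through a hand-written equivalent. In this sketch, the hypothetical `DeviceState` gained a `features` field in structure version 2: serializing to version 1 skips it, and deserializing a version 1 blob fills it with a default. This is not the code `versionize_derive` emits; it only illustrates the same field-by-field idea, using a hand-rolled little-endian encoding.

```rust
use std::io::{self, Read, Write};

/// Hypothetical device state with a field added in structure version 2.
#[derive(Debug, Clone, PartialEq)]
pub struct DeviceState {
    pub queue_size: u16, // present since version 1
    pub activated: bool, // present since version 1
    pub features: u64,   // added in version 2
}

impl DeviceState {
    /// Default used for `features` when the source version predates it.
    fn default_features() -> u64 {
        0
    }

    /// Writes only the fields that exist at `target_version`.
    pub fn serialize<W: Write>(&self, w: &mut W, target_version: u16) -> io::Result<()> {
        w.write_all(&self.queue_size.to_le_bytes())?;
        w.write_all(&[self.activated as u8])?;
        if target_version >= 2 {
            w.write_all(&self.features.to_le_bytes())?;
        }
        Ok(())
    }

    /// Reads the fields present at `source_version` and defaults the rest.
    pub fn deserialize<R: Read>(r: &mut R, source_version: u16) -> io::Result<Self> {
        let mut b2 = [0u8; 2];
        r.read_exact(&mut b2)?;
        let mut b1 = [0u8; 1];
        r.read_exact(&mut b1)?;
        let features = if source_version >= 2 {
            let mut b8 = [0u8; 8];
            r.read_exact(&mut b8)?;
            u64::from_le_bytes(b8)
        } else {
            Self::default_features()
        };
        Ok(DeviceState {
            queue_size: u16::from_le_bytes(b2),
            activated: b1[0] != 0,
            features,
        })
    }
}
```

Round-tripping through version 1 drops `features` and restores it as 0, while round-tripping through version 2 preserves every field.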

The _semantic translator_ is only concerned with translating the semantics of
the serialized/deserialized fields.

The _field translator_ is generated automatically through a procedural macro,
while the _semantic translation methods_ have to be written by the user and
referenced through annotations on the structure.
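A semantic translator covers cases where a field keeps its layout but its meaning changes. In the hypothetical example below, structure version 2 adds an `Auto` value to an existing `mode` field. When writing version 1, which has no such value, a user-provided hook maps it to `On`. The hook name `ser_semantic` and the chosen mapping are illustrative assumptions, not versionize's annotation syntax.

```rust
use std::io::{self, Read, Write};

/// Hypothetical mode: `Off` and `On` exist since version 1,
/// `Auto` was only introduced in version 2.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Mode {
    Off = 0,
    On = 1,
    Auto = 2,
}

#[derive(Debug, PartialEq)]
pub struct State {
    pub mode: Mode,
}

impl State {
    /// Semantic hook run before writing: version 1 cannot express `Auto`,
    /// so it is translated to the closest older meaning, `On`.
    fn ser_semantic(&self, target_version: u16) -> Mode {
        match (target_version, self.mode) {
            (v, Mode::Auto) if v < 2 => Mode::On,
            (_, m) => m,
        }
    }

    /// The byte layout is the same in both versions; only the value changes.
    pub fn serialize<W: Write>(&self, w: &mut W, target_version: u16) -> io::Result<()> {
        let mode = self.ser_semantic(target_version);
        w.write_all(&[mode as u8])
    }

    /// Rejects values that are not valid at `source_version`.
    pub fn deserialize<R: Read>(r: &mut R, source_version: u16) -> io::Result<Self> {
        let mut b = [0u8; 1];
        r.read_exact(&mut b)?;
        let mode = match (b[0], source_version) {
            (0, _) => Mode::Off,
            (1, _) => Mode::On,
            (2, v) if v >= 2 => Mode::Auto,
            _ => return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid mode")),
        };
        Ok(State { mode })
    }
}
```

Mapping `Auto` to `On` loses information. A stricter translator could refuse to serialize instead, which is exactly the kind of case where translating to an older version is not possible.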

This block diagram illustrates the concept:

![Versionize](
../images/versionize.png?raw=true
"Versionize layers")

## VM state encoding

During research and prototyping we considered multiple storage formats. The
criteria used for comparing them are: performance, size, Rust support,
specification, versioning support, community and tooling. Performance, size
and Rust support are hard requirements, while all others can be the subject
of trade-offs.
More information about this comparison can be found [here](https://github.com/firecracker-microvm/firecracker/blob/9d427b33d989c3225d874210f6c2849465941dc0/docs/snapshotting/design.md#snapshot-format).

Key benefits of using *bincode*:

* Minimal snapshot size overhead
* Minimal CPU overhead
* Simple implementation

The current implementation relies on the [Serde bincode encoder](https://github.com/servo/bincode).

Versionize is compatible with Serde using the bincode backend: structures
serialized with versionize at a specific version can be deserialized with Serde,
and structures serialized with Serde can be deserialized with versionize.
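To make the compatibility claim concrete, the following sketch hand-encodes a hypothetical structure with the byte layout that bincode 1.x's `bincode::serialize` produces by default: fields in declaration order with no names or tags, fixed-width little-endian integers, `bool` as one byte, and strings and vectors prefixed with a `u64` length. Any encoder that emits these same bytes for a given structure version, versionize included, is readable by Serde with bincode, and vice versa.

```rust
/// Hypothetical state, in the field order a `#[derive(Serialize)]` would see.
pub struct NetState {
    pub id: String,
    pub mtu: u32,
    pub up: bool,
    pub mac: Vec<u8>,
}

/// Hand-rolled encoder following bincode 1.x's default fixed-int layout.
pub fn encode_bincode_like(s: &NetState) -> Vec<u8> {
    let mut out = Vec::new();
    // String: u64 little-endian byte length, then the UTF-8 bytes.
    out.extend_from_slice(&(s.id.len() as u64).to_le_bytes());
    out.extend_from_slice(s.id.as_bytes());
    // u32: four little-endian bytes.
    out.extend_from_slice(&s.mtu.to_le_bytes());
    // bool: a single byte, 0 or 1.
    out.push(s.up as u8);
    // Vec<u8>: u64 length prefix, then the elements.
    out.extend_from_slice(&(s.mac.len() as u64).to_le_bytes());
    out.extend_from_slice(&s.mac);
    out
}
```

Because no field names or version tags are written, the reader must already know which structure version it is decoding, which is why the snapshot stores the `version` field alongside the `state` blob.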

## Snapshot compatibility

### Host kernel

The minimum kernel version required by Firecracker snapshots is 4.14. Snapshots
can be saved and restored on the same kernel version without any issues.
Restoring a snapshot created on a different host kernel version may cause
issues, even when using the same Firecracker version.

SnapshotCreate and SnapshotLoad operations across different host kernels are
considered unstable in Firecracker, as the saved KVM state might have different
semantics on different kernels.
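An orchestrator might encode this guidance as a pre-restore check. The sketch below uses hypothetical helpers: `major_minor` parses the major and minor numbers from a kernel release string (for example, the output of `uname -r`), and `check_kernels` flags kernels older than 4.14 and treats any difference in the full release string as a cross-kernel restore. Requiring an exact string match is a deliberately conservative assumption.

```rust
/// Result of comparing the kernel a snapshot was created on with the
/// kernel it is about to be restored on.
#[derive(Debug, PartialEq)]
pub enum KernelCheck {
    /// At least one kernel is older than the 4.14 minimum.
    BelowMinimum,
    /// Identical release strings: the supported case.
    SameKernel,
    /// Different kernels: considered unstable.
    CrossKernel,
}

const MIN_KERNEL: (u32, u32) = (4, 14);

/// Extracts (major, minor) from a release string such as "5.10.186-179.x86_64".
pub fn major_minor(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split(|c: char| c == '.' || c == '-');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Returns `None` if either release string cannot be parsed.
pub fn check_kernels(created_on: &str, restoring_on: &str) -> Option<KernelCheck> {
    let a = major_minor(created_on)?;
    let b = major_minor(restoring_on)?;
    // Tuples compare lexicographically, so (4, 9) < (4, 14).
    Some(if a < MIN_KERNEL || b < MIN_KERNEL {
        KernelCheck::BelowMinimum
    } else if created_on == restoring_on {
        KernelCheck::SameKernel
    } else {
        KernelCheck::CrossKernel
    })
}
```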

### Device model

The current Firecracker devices are backwards compatible up to the version that
introduces them. Ideally this property would be kept over time, but there are
situations where a new version of a device exposes new features to the guest
that do not exist in an older version. In such cases, restoring the snapshot in
an older version becomes impossible without breaking the guest workload.

The microVM state file links some resources that are external to the snapshot:

* tap devices by device name,
* block devices by block file path,
* vsock backing Unix domain socket by socket name.

To successfully restore a microVM one should check that:

* tap devices are available, their names match their original names since these
  are the values saved in the microVM state file, and they are accessible to
  the Firecracker process where the microVM is being restored,
* block devices are set up at their original relative or absolute paths with
  the proper permissions, as the Firecracker process with the restored microVM
  will attempt to access them exactly as they were accessed in the original
  Firecracker process,
* the vsock backing Unix domain socket is available, its name matches the
  original name, and it is accessible to the new Firecracker process.
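A minimal pre-restore check for the first two items could look like the sketch below. `ExternalResources` and `missing_resources` are hypothetical names; the tap device names and block paths would come from your own records of the original microVM configuration. The check only tests for existence, not permissions or ownership, and it resolves relative block paths against the checking process's working directory, which must therefore match that of the restored Firecracker process. The vsock check is left out. The `/sys/class/net` directory is passed in as a parameter so the function can be exercised without real interfaces.

```rust
use std::path::{Path, PathBuf};

/// External resources a restored microVM will look up by name or path
/// (hypothetical structure, filled from your own configuration records).
pub struct ExternalResources {
    pub tap_names: Vec<String>,
    pub block_paths: Vec<PathBuf>,
}

/// Returns a description of every resource that does not exist.
/// Permissions and ownership are NOT checked here.
pub fn missing_resources(res: &ExternalResources, sys_class_net: &Path) -> Vec<String> {
    let mut missing = Vec::new();
    for tap in &res.tap_names {
        // On Linux, every network interface appears as /sys/class/net/<name>.
        if !sys_class_net.join(tap).exists() {
            missing.push(format!("tap device {}", tap));
        }
    }
    for path in &res.block_paths {
        // Relative paths resolve against this process's working directory.
        if !path.exists() {
            missing.push(format!("block device {}", path.display()));
        }
    }
    missing
}
```

On a real host the call would be `missing_resources(&res, Path::new("/sys/class/net"))`, run in the same mount namespace and working directory that the restoring Firecracker process will use.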

### CPU model

Firecracker's microVM snapshot functionality is available for Intel/AMD/ARM64
CPU models that support the hardware virtualization extensions; more details
are available [here](../../README.md#supported-platforms). Snapshots are not
compatible across CPU architectures, and are not guaranteed to be compatible
across CPU models of the same architecture either. They are only compatible if
the CPU features exposed to the guest remain invariant between saving and
restoring the snapshot. The trivial scenario is creating and restoring snapshots
on hosts that have the same CPU model.

To make snapshots more portable across Intel CPUs, Firecracker provides an API to
select an Intel CPU template: T2, T2CL, T2S or C3.
Firecracker CPU templates mask CPUID values (and, for T2CL and T2S, some MSR
values) to restrict the exposed features to a common denominator of multiple CPU
models. The T2 and C3 templates are mapped as closely as possible to AWS T2/C3
instances in terms of CPU features. The T2S template is designed to allow
securely migrating snapshots between hosts with Intel Skylake and Cascade Lake
CPUs by further restricting CPU features for the guest; however, this comes with
a performance penalty. Users are encouraged to carry out a performance
assessment if they wish to use the T2S template.
The T2CL template is mapped to be close to Intel Cascade Lake.
It is not safe to use it on Intel CPUs older than Cascade Lake (such as Skylake).

The only AMD template is T2A. It is considered safe to use with AMD Milan.

Intel T2CL and AMD T2A templates together aim to provide instruction set feature
parity between CPUs running them, so they can form a heterogeneous fleet
exposing the same instruction sets to the application.

Restoring from an Intel snapshot on AMD (or vice-versa) is not supported.

There are no templates available for ARM64.
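The effect of a template on portability can be modeled with bitmasks. In this simplified sketch, each bit stands for a made-up CPU feature and a template is reduced to a single mask applied to the host's features; real templates operate on specific CPUID leaves and, for some templates, MSRs. A snapshot is treated as portable only between hosts of the same vendor that expose identical features to the guest.

```rust
/// CPU vendor (simplified).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Vendor {
    Intel,
    Amd,
}

/// Hypothetical host description: vendor plus a feature bitmask
/// (bit assignments are invented for illustration).
pub struct Host {
    pub vendor: Vendor,
    pub features: u64,
}

/// Features the guest would see: host features filtered by the template mask.
pub fn exposed(host: &Host, template_mask: u64) -> u64 {
    host.features & template_mask
}

/// Portable only across the same vendor and identical guest-visible features.
pub fn snapshot_portable(src: &Host, dst: &Host, template_mask: u64) -> bool {
    src.vendor == dst.vendor && exposed(src, template_mask) == exposed(dst, template_mask)
}
```

A mask can only remove features, so every host in a fleet must already support everything the template leaves enabled for the exposed sets to match.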

It is important to note that guest workloads can still execute instructions
that are masked by CPUID, and saving and restoring such workloads leads to
undefined results. Firecracker retrieves the state of a discrete list of MSRs
from KVM, more specifically the MSRs corresponding to the guest-exposed
features.

## Implementation

To enable Firecracker cross version snapshots we have designed and built two
crates:

* [versionize](https://crates.io/crates/versionize) - defines the `Versionize`
  trait, implements serialization of primitive types and provides a helper
  type to map Firecracker versions to individual structure versions.
* [versionize_derive](https://crates.io/crates/versionize_derive) - exports
  a procedural macro that consumes structures and enums and their annotations
  to produce an implementation of the `Versionize` trait.
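The mapping between Firecracker versions and individual structure versions can be sketched as follows. `ToyVersionMap` is a hypothetical, simplified stand-in rather than the versionize crate's API: each snapshot version records which structures changed at that point, and a lookup returns the most recent structure version at or before the requested snapshot version, defaulting to 1.

```rust
use std::any::TypeId;
use std::collections::HashMap;

/// Hypothetical version map. Entry i describes snapshot version i + 1 and
/// holds only the structures whose version changed at that point.
pub struct ToyVersionMap {
    versions: Vec<HashMap<TypeId, u16>>,
}

impl ToyVersionMap {
    /// Starts at snapshot version 1 with every structure at version 1.
    pub fn new() -> Self {
        ToyVersionMap { versions: vec![HashMap::new()] }
    }

    /// Opens the next snapshot version.
    pub fn new_version(&mut self) -> &mut Self {
        self.versions.push(HashMap::new());
        self
    }

    /// Records that `T` is at `version` from the latest snapshot version on.
    pub fn set_type_version<T: 'static>(&mut self, version: u16) -> &mut Self {
        self.versions
            .last_mut()
            .expect("map always holds at least one version")
            .insert(TypeId::of::<T>(), version);
        self
    }

    /// Structure version of `T` for 1-based snapshot version `app_version`.
    pub fn get_type_version<T: 'static>(&self, app_version: u16) -> u16 {
        let upto = (app_version as usize).min(self.versions.len());
        self.versions[..upto]
            .iter()
            .rev()
            .find_map(|m| m.get(&TypeId::of::<T>()).copied())
            .unwrap_or(1)
    }

    /// Latest snapshot version known to this map.
    pub fn latest_version(&self) -> u16 {
        self.versions.len() as u16
    }
}
```

With such a map, saving "for 0.23" reduces to looking up the snapshot version associated with that release and passing each structure's resulting version to its serializer.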

The microVM state file format is implemented in the [snapshot crate](../../src/snapshot/src/lib.rs)
in the Firecracker repository.
All Firecracker devices implement the [Persist](../../src/snapshot/src/persist.rs)
trait, which exposes an interface for saving a device to the microVM state and
for creating a device from that state.
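The shape of such an interface can be sketched with a hypothetical `PersistSketch` trait: `save` turns a live device into a plain state value, and `restore` rebuilds the device from that state plus constructor arguments that the state does not carry. The names and the toy device are illustrative; the actual `Persist` definition in `persist.rs` may differ.

```rust
/// Hypothetical trait modeled on the description above.
pub trait PersistSketch: Sized {
    /// Plain, serializable snapshot of the device state.
    type State;
    /// Inputs needed on restore that are not part of the saved state.
    type ConstructorArgs;
    type Error;

    fn save(&self) -> Self::State;
    fn restore(args: Self::ConstructorArgs, state: &Self::State) -> Result<Self, Self::Error>;
}

/// Toy device: a guest-visible register plus a host-side backend handle.
#[derive(Debug, PartialEq)]
pub struct ToyDevice {
    pub backend: String,
    pub register: u32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToyDeviceState {
    pub register: u32,
}

impl PersistSketch for ToyDevice {
    type State = ToyDeviceState;
    // Host-side backend, supplied again on restore rather than saved.
    type ConstructorArgs = String;
    type Error = String;

    fn save(&self) -> ToyDeviceState {
        ToyDeviceState { register: self.register }
    }

    fn restore(backend: String, state: &ToyDeviceState) -> Result<Self, String> {
        if backend.is_empty() {
            return Err("missing backend".to_string());
        }
        Ok(ToyDevice { backend, register: state.register })
    }
}
```

Keeping `State` free of host handles is what allows it to be versioned and serialized independently of the host resources listed under "Device model".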
