Restic-based backup

Attention

Scribe has been renamed to VolSync!

The Scribe project has been renamed to VolSync, and it has a new home. Come join us at our new location:

Scribe supports taking backups of PersistentVolume data using the Restic-based data mover. A ReplicationSource defines the backup policy (target, frequency, and retention), while a ReplicationDestination is used for restores.

The Restic mover differs from most of Scribe’s other movers: it is not meant for synchronizing data between clusters, but specifically for data backup.

Specifying a repository

For both backup and restore operations, it is necessary to specify a backup repository for Restic. The repository and connection information are defined in a restic-config Secret.

Below is an example showing how to use a repository stored on Minio.

apiVersion: v1
kind: Secret
metadata:
  name: restic-config
type: Opaque
stringData:
  # The repository url
  RESTIC_REPOSITORY: s3:http://minio.minio.svc.cluster.local:9000/restic-repo
  # The repository encryption key
  RESTIC_PASSWORD: my-secure-restic-password
  # ENV vars specific to the chosen back end
  # https://restic.readthedocs.io/en/stable/030_preparing_a_new_repo.html
  AWS_ACCESS_KEY_ID: access
  AWS_SECRET_ACCESS_KEY: password

This Secret will be referenced for both backup (ReplicationSource) and for restore (ReplicationDestination).

Note

If necessary, the repository will be automatically initialized (i.e., restic init) during the first backup.

Configuring backup

A backup policy is defined by a ReplicationSource object that uses the restic replication method.

---
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: mydata-backup
spec:
  # The PVC to be backed up
  sourcePVC: mydata
  trigger:
    # Take a backup every 30 minutes
    schedule: "*/30 * * * *"
  restic:
    # Prune the repository (repack to free space) every 2 weeks
    pruneIntervalDays: 14
    # Name of the Secret with the connection information
    repository: restic-config
    # Retention policy for backups
    retain:
      hourly: 6
      daily: 5
      weekly: 4
      monthly: 2
      yearly: 1
    # Clone the source volume prior to taking a backup to ensure a
    # point-in-time image.
    copyMethod: Clone

Backup options

There are a number of additional configuration options not shown in the above example. Scribe’s Restic mover options closely follow those of Restic itself.

accessModes

When using a copyMethod of Clone or Snapshot, this field allows overriding the access modes for the point-in-time (PiT) volume. The default is to use the access modes from the source PVC.

capacity

When using a copyMethod of Clone or Snapshot, this allows overriding the capacity of the PiT volume. The default is to use the capacity of the source volume.

copyMethod

This specifies the method used to create a PiT copy of the source volume. Valid values are:

  • Clone - Create a new volume by cloning the source PVC (i.e., use the source PVC as the dataSource for the new volume).

  • None - Do not create a PiT copy. The Scribe data mover will directly use the source PVC.

  • Snapshot - Create a VolumeSnapshot of the source PVC, then use that snapshot to create the new volume. This option should be used for CSI drivers that support snapshots but not cloning.

storageClassName

This specifies the name of the StorageClass to use when creating the PiT volume. The default is to use the same StorageClass as the source volume.

volumeSnapshotClassName

When using a copyMethod of Snapshot, this specifies the name of the VolumeSnapshotClass to use. If not specified, the cluster default will be used.
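As a rough sketch of how the volume options above might be combined, the following ReplicationSource takes a snapshot-based PiT copy instead of a clone. It assumes these fields sit in the restic section next to copyMethod. The VolumeSnapshotClass name csi-snapclass and the StorageClass name fast-ssd are hypothetical placeholders for whatever the cluster provides.

```yaml
---
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: mydata-backup
spec:
  sourcePVC: mydata
  trigger:
    schedule: "*/30 * * * *"
  restic:
    pruneIntervalDays: 14
    repository: restic-config
    retain:
      daily: 5
    # Snapshot the source PVC, then create the PiT volume from that snapshot
    copyMethod: Snapshot
    # Hypothetical class name; omit to use the cluster default
    volumeSnapshotClassName: csi-snapclass
    # Hypothetical class name; omit to reuse the source volume's StorageClass
    storageClassName: fast-ssd
    # Override the PiT volume's access modes; omit to copy the source PVC's
    accessModes:
      - ReadWriteOnce
```

Leaving out storageClassName and accessModes falls back to the defaults described above, so only copyMethod and (optionally) volumeSnapshotClassName are needed to switch from cloning to snapshots.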

cacheCapacity

This determines the size of the Restic metadata cache volume. This volume contains cached metadata from the backup repository. It must be large enough to hold the non-pruned repository metadata. The default is 1 Gi.

cacheStorageClassName

This is the name of the StorageClass that should be used when provisioning the cache volume. It defaults to .spec.storageClassName, then to the name of the StorageClass used by the source PVC.

cacheAccessModes

This is the access mode(s) that should be used to provision the cache volume. It defaults to .spec.accessModes, then to the access modes used by the source PVC.
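The three cache options can be set together. The sketch below assumes they belong in the restic section alongside repository; the 2Gi size and the StorageClass name standard are illustrative values, not recommendations.

```yaml
---
apiVersion: scribe.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: mydata-backup
spec:
  sourcePVC: mydata
  trigger:
    schedule: "*/30 * * * *"
  restic:
    pruneIntervalDays: 14
    repository: restic-config
    copyMethod: Clone
    # Illustrative size; must hold the non-pruned repository metadata
    cacheCapacity: 2Gi
    # Hypothetical StorageClass name for the cache volume
    cacheStorageClassName: standard
    cacheAccessModes:
      - ReadWriteOnce
```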

pruneIntervalDays

This determines the number of days between running restic prune on the repository. The prune operation repacks the data to free space, but it can also generate significant I/O traffic as a part of the process. Setting this option allows a trade-off between storage consumption (from no longer referenced data) and access costs.

repository

This is the name of the Secret (in the same Namespace) that holds the connection information for the backup repository. The repository path should be unique for each PV.
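One way to keep repository paths unique is to give each PVC its own Secret whose RESTIC_REPOSITORY ends in a distinct sub-path. The sketch below reuses the Minio endpoint and credentials from the earlier example; the Secret names and the mydata and otherdata path suffixes are made up for illustration, and it assumes the back end accepts a path below the bucket name.

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: restic-config-mydata      # hypothetical name, used by the mydata backup
type: Opaque
stringData:
  # Same bucket as before, with a per-PVC sub-path (assumed layout)
  RESTIC_REPOSITORY: s3:http://minio.minio.svc.cluster.local:9000/restic-repo/mydata
  RESTIC_PASSWORD: my-secure-restic-password
  AWS_ACCESS_KEY_ID: access
  AWS_SECRET_ACCESS_KEY: password
---
apiVersion: v1
kind: Secret
metadata:
  name: restic-config-otherdata   # hypothetical name, used by a second backup
type: Opaque
stringData:
  RESTIC_REPOSITORY: s3:http://minio.minio.svc.cluster.local:9000/restic-repo/otherdata
  RESTIC_PASSWORD: my-secure-restic-password
  AWS_ACCESS_KEY_ID: access
  AWS_SECRET_ACCESS_KEY: password
```

Each ReplicationSource would then name its own Secret in the repository field.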

retain

This has sub-fields for hourly, daily, weekly, monthly, and yearly that allow setting the number of each type of backup to retain. There is an additional field, within, that can be used to specify a time period during which all backups should be retained. See Restic’s documentation on --keep-within for more information.

When the repository holds more backups than the retain settings allow, the excess backups are removed via Restic’s forget operation, and their space is reclaimed during the next prune.
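To illustrate, the retain block below keeps every backup from the last three days plus a thinned-out history beyond that. The within value is an assumption: it is written in the duration format accepted by Restic’s --keep-within (for example 3d), so the exact format should be checked against the Restic documentation.

```yaml
# Fragment of a ReplicationSource spec (illustrative values)
restic:
  retain:
    # Keep all backups made within the last 3 days (assumed duration format)
    within: 3d
    # Beyond that window, keep one backup per period
    daily: 7
    weekly: 4
    monthly: 6
```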

Performing a restore

Data from a backup can be restored using the ReplicationDestination CR. In most cases, it is desirable to perform a single restore into an empty PersistentVolume.

For example, create a PVC to hold the restored data:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: datavol
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

Restore the data into datavol:

---
apiVersion: scribe.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: datavol-dest
spec:
  trigger:
    manual: restore-once
  restic:
    repository: restic-config
    destinationPVC: datavol
    copyMethod: None

In the above example, the data will be written directly into the new PVC since it is specified via destinationPVC, and no snapshot will be created since a copyMethod of None is used.

The restore operation only needs to be performed once, so instead of using a cronspec-based schedule, a manual trigger is used. After the restore completes, the ReplicationDestination object can be deleted.

Note

Currently, Scribe only supports restoring the latest backup. However, older backups may be present in the repository (according to the retain parameters). Those can be accessed directly using the Restic utility plus the connection information and credentials from the repository Secret.

Restore options

There are a number of additional configuration options not shown in the above example.

accessModes

When Scribe creates the destination volume, this specifies the accessModes for the PVC. The value should be ReadWriteOnce or ReadWriteMany.

capacity

When Scribe creates the destination volume, this value is used to determine its size. This need not match the size of the source volume, but it must be large enough to hold the incoming data.

copyMethod

This specifies how the data should be preserved at the end of each synchronization iteration. Valid values are:

  • None - Do not create a point-in-time copy of the data.

  • Snapshot - Create a VolumeSnapshot at the end of each iteration.

destinationPVC

Instead of having Scribe automatically provision the destination volume (using capacity, accessModes, etc.), the name of a pre-existing PVC may be specified here.

storageClassName

When Scribe creates the destination volume, this specifies the name of the StorageClass to use. If omitted, the system default StorageClass will be used.

volumeSnapshotClassName

When using a copyMethod of Snapshot, this value specifies the name of the VolumeSnapshotClass to use when creating a snapshot.
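As an alternative to restoring into a pre-existing PVC, the options above can be combined so that Scribe provisions the destination volume itself and preserves the result as a snapshot. This is a minimal sketch that assumes the fields sit in the restic section like the earlier restore example; the VolumeSnapshotClass name csi-snapclass is a hypothetical placeholder.

```yaml
---
apiVersion: scribe.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: datavol-dest
spec:
  trigger:
    manual: restore-once
  restic:
    repository: restic-config
    # No destinationPVC is given, so Scribe provisions the volume
    # using the fields below
    accessModes:
      - ReadWriteOnce
    # Must be large enough to hold the restored data
    capacity: 3Gi
    # Preserve the restored data as a VolumeSnapshot
    copyMethod: Snapshot
    # Hypothetical class name for the cluster's snapshot class
    volumeSnapshotClassName: csi-snapclass
```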

cacheCapacity

This determines the size of the Restic metadata cache volume. This volume contains cached metadata from the backup repository. It must be large enough to hold the non-pruned repository metadata. The default is 1 Gi.

cacheStorageClassName

This is the name of the StorageClass that should be used when provisioning the cache volume. It defaults to .spec.storageClassName, then to the name of the StorageClass used by the source PVC.

cacheAccessModes

This is the access mode(s) that should be used to provision the cache volume. It defaults to .spec.accessModes, then to the access modes used by the source PVC.

repository

This is the name of the Secret (in the same Namespace) that holds the connection information for the backup repository. The repository path should be unique for each PV.