Kubernetes 1.20: Kubernetes Volume Snapshot Moves to GA
Authors: Xing Yang, VMware & Xiangqian Yu, Google
The Kubernetes Volume Snapshot feature is now GA in Kubernetes v1.20. It was introduced as alpha in Kubernetes v1.12, followed by a second alpha with breaking changes in Kubernetes v1.13, and promotion to beta in Kubernetes 1.17. This blog post summarizes the changes releasing the feature from beta to GA.
What is a volume snapshot?
Many storage systems (like Google Cloud Persistent Disks, Amazon Elastic Block Storage, and many on-premise storage systems) provide the ability to create a “snapshot” of a persistent volume. A snapshot represents a point-in-time copy of a volume. A snapshot can be used either to rehydrate a new volume (pre-populated with the snapshot data) or to restore an existing volume to a previous state (represented by the snapshot).
Why add volume snapshots to Kubernetes?
Kubernetes aims to create an abstraction layer between distributed applications and underlying clusters so that applications can be agnostic to the specifics of the cluster they run on and application deployment requires no “cluster-specific” knowledge.
The Kubernetes Storage SIG identified snapshot operations as critical functionality for many stateful workloads. For example, a database administrator may want to snapshot a database’s volumes before starting a database operation.
By providing a standard way to trigger volume snapshot operations in Kubernetes, this feature allows Kubernetes users to incorporate snapshot operations in a portable manner on any Kubernetes environment regardless of the underlying storage.
Additionally, these Kubernetes snapshot primitives act as basic building blocks that unlock the ability to develop advanced enterprise-grade storage administration features for Kubernetes, including application or cluster level backup solutions.
What’s new since beta?
With the promotion of Volume Snapshot to GA, the feature is enabled by default on standard Kubernetes deployments and cannot be turned off.
Many enhancements have been made to improve the quality of this feature and to make it production-grade.
-
The Volume Snapshot APIs and client library were moved to a separate Go module.
-
A snapshot validation webhook has been added to perform necessary validation on volume snapshot objects. More details can be found in the Volume Snapshot Validation Webhook Kubernetes Enhancement Proposal.
-
Along with the validation webhook, the volume snapshot controller will start labeling invalid snapshot objects that already existed. This allows users to identify, remove any invalid objects, and correct their workflows. Once the API is switched to the v1 type, those invalid objects will not be deletable from the system.
-
To provide better insights into how the snapshot feature is performing, an initial set of operation metrics has been added to the volume snapshot controller.
-
There are more end-to-end tests, running on GCP, that validate the feature in a real Kubernetes cluster. Stress tests (based on Google Persistent Disk and
hostPath
CSI Drivers) have been introduced to test the robustness of the system.
Other than introducing tightening validation, there is no difference between the v1beta1 and v1 Kubernetes volume snapshot API. In this release (with Kubernetes 1.20), both v1 and v1beta1 are served while the stored API version is still v1beta1. Future releases will switch the stored version to v1 and gradually remove v1beta1 support.
Which CSI drivers support volume snapshots?
Snapshots are only supported for CSI drivers, not for in-tree or FlexVolume drivers. Ensure the deployed CSI driver on your cluster has implemented the snapshot interfaces. For more information, see Container Storage Interface (CSI) for Kubernetes GA.
Currently more than 50 CSI drivers support the Volume Snapshot feature. The GCE Persistent Disk CSI Driver has gone through the tests for upgrading from volume snapshots beta to GA. GA level support for other CSI drivers should be available soon.
Who builds products using volume snapshots?
As of the publishing of this blog, the following participants from the Kubernetes Data Protection Working Group are building products or have already built products using Kubernetes volume snapshots.
- Dell-EMC: PowerProtect
- Druva
- Kasten K10
- NetApp: Project Astra
- Portworx (PX-Backup)
- Pure Storage (Pure Service Orchestrator)
- Red Hat OpenShift Container Storage
- Robin Cloud Native Storage
- TrilioVault for Kubernetes
- Velero plugin for CSI
How to deploy volume snapshots?
Volume Snapshot feature contains the following components:
- Kubernetes Volume Snapshot CRDs
- Volume snapshot controller
- Snapshot validation webhook
- CSI Driver along with CSI Snapshotter sidecar
It is strongly recommended that Kubernetes distributors bundle and deploy the volume snapshot controller, CRDs, and validation webhook as part of their Kubernetes cluster management process (independent of any CSI Driver).
If your cluster does not come pre-installed with the correct components, you may manually install them. See the CSI Snapshotter README for details.
How to use volume snapshots?
Assuming all the required components (including CSI driver) have been already deployed and running on your cluster, you can create volume snapshots using the VolumeSnapshot
API object, or use an existing VolumeSnapshot
to restore a PVC by specifying the VolumeSnapshot data source on it. For more details, see the volume snapshot documentation.
Dynamically provision a volume snapshot
To dynamically provision a volume snapshot, create a VolumeSnapshotClass
API object first.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: test-snapclass
driver: testdriver.csi.k8s.io
deletionPolicy: Delete
parameters:
csi.storage.k8s.io/snapshotter-secret-name: mysecret
csi.storage.k8s.io/snapshotter-secret-namespace: mysecretnamespace
Then create a VolumeSnapshot
API object from a PVC by specifying the volume snapshot class.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: test-snapshot
namespace: ns1
spec:
volumeSnapshotClassName: test-snapclass
source:
persistentVolumeClaimName: test-pvc
Importing an existing volume snapshot with Kubernetes
To import a pre-existing volume snapshot into Kubernetes, manually create a VolumeSnapshotContent
object first.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
name: test-content
spec:
deletionPolicy: Delete
driver: testdriver.csi.k8s.io
source:
snapshotHandle: 7bdd0de3-xxx
volumeSnapshotRef:
name: test-snapshot
namespace: default
Then create a VolumeSnapshot
object pointing to the VolumeSnapshotContent
object.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: test-snapshot
spec:
source:
volumeSnapshotContentName: test-content
Rehydrate volume from snapshot
A bound and ready VolumeSnapshot
object can be used to rehydrate a new volume with data pre-populated from snapshotted data as shown here:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-restore
namespace: demo-namespace
spec:
storageClassName: test-storageclass
dataSource:
name: test-snapshot
kind: VolumeSnapshot
apiGroup: snapshot.storage.k8s.io
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
How to add support for snapshots in a CSI driver?
See the CSI spec and the Kubernetes-CSI Driver Developer Guide for more details on how to implement the snapshot feature in a CSI driver.
What are the limitations?
The GA implementation of volume snapshots for Kubernetes has the following limitations:
- Does not support reverting an existing PVC to an earlier state represented by a snapshot (only supports provisioning a new volume from a snapshot).
How to learn more?
The code repository for snapshot APIs and controller is here: https://github.com/kubernetes-csi/external-snapshotter
Check out additional documentation on the snapshot feature here: http://k8s.io/docs/concepts/storage/volume-snapshots and https://kubernetes-csi.github.io/docs/
How to get involved?
This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together.
We offer a huge thank you to the contributors who stepped up these last few quarters to help the project reach GA. We want to thank Saad Ali, Michelle Au, Tim Hockin, and Jordan Liggitt for their insightful reviews and thorough consideration with the design, thank Andi Li for his work on adding the support of the snapshot validation webhook, thank Grant Griffiths on implementing metrics support in the snapshot controller and handling password rotation in the validation webhook, thank Chris Henzie, Raunak Shah, and Manohar Reddy for writing critical e2e tests to meet the scalability and stability requirements for graduation, thank Kartik Sharma for moving snapshot APIs and client lib to a separate go module, and thank Raunak Shah and Prafull Ladha for their help with upgrade testing from beta to GA.
There are many more people who have helped to move the snapshot feature from beta to GA. We want to thank everyone who has contributed to this effort:
- Andi Li
- Ben Swartzlander
- Chris Henzie
- Christian Huffman
- Grant Griffiths
- Humble Devassy Chirammal
- Jan Šafránek
- Jiawei Wang
- Jing Xu
- Jordan Liggitt
- Kartik Sharma
- Madhu Rajanna
- Manohar Reddy
- Michelle Au
- Patrick Ohly
- Prafull Ladha
- Prateek Pandey
- Raunak Shah
- Saad Ali
- Saikat Roychowdhury
- Tim Hockin
- Xiangqian Yu
- Xing Yang
- Zhu Can
For those interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system, join the Kubernetes Storage Special Interest Group (SIG). We’re rapidly growing and always welcome new contributors.
We also hold regular Data Protection Working Group meetings. New attendees are welcome to join in discussions.