Merge "Update replication v2.1 devref"

Replication
===========

How to implement replication features in a backend driver.

For backend devices that offer replication features, Cinder provides a common
mechanism for exposing that functionality on a per volume basis while still
trying to allow flexibility for the varying implementation and requirements of
all the different backend devices.

There are 2 sides to Cinder's replication feature, the core mechanism and the
driver specific functionality, and in this document we'll only be covering the
driver side of things, aimed at helping vendors implement this functionality in
their drivers in a way consistent with all other drivers.

Although we'll be focusing on the driver implementation there will also be some
mentions of deployment configurations to provide a clear picture to developers
and help them avoid implementing custom solutions to solve things that were
meant to be done via the cloud configuration.

Overview
--------

As a general rule replication is enabled and configured via the cinder.conf
file under the driver's section, and volume replication is requested through
the use of volume types.

*NOTE*: Current replication implementation is v2.1 and it's meant to solve a
very specific use case, the "smoking hole" scenario. It's critical that you
read the Use Cases section of the spec here:
https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cheesecake.html

From a user's perspective volumes will be created using specific volume types,
even if it is the default volume type, and they will either be replicated or
not, which will be reflected on the ``replication_status`` field of the volume.
So in order to know if a snapshot is replicated we'll have to check its volume.

After the loss of the primary storage site all operations on the resources will
fail and VMs will no longer have access to the data. It is then when the Cloud
Administrator will issue the ``failover-host`` command to make the
cinder-volume service perform the failover.

After the failover is completed, the Cinder volume service will start using the
failed-over secondary storage site for all operations and the user will once
again be able to perform actions on all resources that were replicated, while
all other resources will be in error status since they are no longer available.

Storage Device configuration
----------------------------

Most storage devices will require configuration changes to enable the
replication functionality, and this configuration process is vendor and storage
device specific so it is not contemplated by the Cinder core replication
functionality.

It is up to the vendors whether they want to handle this device configuration
in the Cinder driver or as a manual process, but the most common approach is to
avoid including this configuration logic in Cinder and have the Cloud
Administrators follow a specific guide to enable replication on the storage
device before configuring the cinder volume service.

Service configuration
---------------------

The way to enable and configure replication is common to all drivers and it is
done via the ``replication_device`` configuration option that goes in the
driver's specific section in the ``cinder.conf`` configuration file.

``replication_device`` is a multi dictionary option that should be specified
for each replication target device the admin wants to configure.

While it is true that all drivers use the same ``replication_device``
configuration option this doesn't mean that they will all have the same data,
as there is only one standardized and **REQUIRED** key in the configuration
entry, all others are vendor specific:

- backend_id:<vendor-identifier-for-rep-target>

Values of ``backend_id`` keys are used to uniquely identify within the driver
each of the secondary sites, although they can be reused on different driver
sections.

These unique identifiers will be used by the failover mechanism as well as in
the driver initialization process, and the only requirement is that it must
never have the value "default".

An example driver configuration for a device with multiple replication targets
is shown below::

    .....
    replication_device = backend_id:vendor-id-1,unique_key:val....
    replication_device = backend_id:vendor-id-2,unique_key:val....

In this example the result of calling
``self.configuration.safe_get('replication_device')`` within the driver is the
following list::

    [{backend_id: vendor-id-1, unique_key: val1},
     {backend_id: vendor-id-2, unique_key: val2}]

It is expected that if a driver is configured with multiple replication
targets, that replicated volumes are actually replicated on **all targets**.
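
As an illustration, a driver could cache the configured targets when it is set
up and validate the one required key. This is only a minimal sketch, assuming
the usual ``from cinder import exception`` import, that the parsing happens in
``do_setup`` and that the driver keeps the result in a hypothetical
``_replication_targets`` attribute::

    def do_setup(self, context):
        devices = self.configuration.safe_get('replication_device') or []
        for device in devices:
            # backend_id is the only standardized and required key, and it
            # must never be "default" since that value is reserved.
            if device.get('backend_id') in (None, 'default'):
                raise exception.InvalidConfigurationValue(
                    option='replication_device', value=device)
        self._replication_targets = devices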

Besides the specific replication device keys defined in ``replication_device``,
a driver may also have additional normal configuration options in the driver
section related to replication to allow Cloud Administrators to configure
things like timeouts.

Capabilities reporting
----------------------

There are 2 new replication stats/capability keys that drivers supporting
replication v2.1 should be reporting: ``replication_enabled`` and
``replication_targets``::

    stats["replication_enabled"] = True|False
    stats["replication_targets"] = [<backend-id_1>, <backend-id_2>...]

If a driver is behaving correctly we can expect the ``replication_targets``
field to be populated whenever ``replication_enabled`` is set to ``True``, and
it is expected to either be set to ``[]`` or be missing altogether when
``replication_enabled`` is set to ``False``.
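
A driver that keeps its configured targets around could report these keys in
its stats update along the following lines; this is a sketch that reuses the
hypothetical ``_replication_targets`` attribute from the previous example::

    def _update_volume_stats(self):
        data = {'volume_backend_name':
                self.configuration.safe_get('volume_backend_name')}
        # ... the usual driver stats go here ...
        targets = [t['backend_id'] for t in self._replication_targets]
        data['replication_enabled'] = bool(targets)
        if targets:
            # Only reported when replication is actually enabled.
            data['replication_targets'] = targets
        self._stats = data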

The purpose of the ``replication_enabled`` field is to be used by the scheduler
in volume types for creation and migrations.

As for the ``replication_targets`` field, it is only provided for informational
purposes so it can be retrieved through the ``get_capabilities`` admin REST
API, but it will not be used for validation at the API layer. That way Cloud
Administrators will be able to know the available secondary sites where they
can failover.

Volume Types / Extra Specs
--------------------------

The way to control the creation of volumes on a cloud with backends that have
replication enabled is, like with many other features, through the use of
volume types.

We won't go into the details of volume type creation, but suffice to say that
you will most likely want to use volume types to discriminate between
replicated and non replicated volumes and be explicit about it so that non
replicated volumes won't end up in a replicated backend.

Since the driver is reporting the ``replication_enabled`` key, we just need to
require it for replication volume types by adding ``replication_enabled='<is>
True'`` and also specifying it for all non replicated volume types with
``replication_enabled='<is> False'``.
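
For example, with two volume types called ``replicated`` and ``plain`` (the
names are just an example), the relevant extra specs would be::

    # extra specs of the "replicated" volume type
    {'replication_enabled': '<is> True'}

    # extra specs of the "plain" volume type
    {'replication_enabled': '<is> False'}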

It's up to the driver to parse the volume type info on create and set things up
as requested. While the scoping key can be anything, it's strongly recommended
that all backends utilize the same key (replication) for consistency and to
make things easier for the Cloud Administrator.

Additional replication parameters can be supplied to the driver using vendor
specific properties through the volume type's extra-specs so they can be used
by the driver at volume creation time, or retype.

It is up to the driver to parse the volume type info on create and retype to
set things up as requested. A good pattern to get a custom parameter from a
given volume instance is this::

    extra_specs = getattr(volume.volume_type, 'extra_specs', {})
    custom_param = extra_specs.get('custom_param', 'default_value')

It may seem convoluted, but we must be careful when retrieving the
``extra_specs`` from the ``volume_type`` field as it could be ``None``.

Vendors should try to avoid obfuscating their custom properties and expose them
using the ``_init_vendor_properties`` method so they can be checked by the
Cloud Administrator using the ``get_capabilities`` REST API.

*NOTE*: For storage devices doing per backend/pool replication the use of
volume types is also recommended.
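
A sketch of how a vendor specific replication property could be exposed through
``_init_vendor_properties``, assuming the ``_set_property`` helper provided by
the base driver and a made up ``acme`` prefix and property name::

    def _init_vendor_properties(self):
        properties = {}
        self._set_property(
            properties,
            'acme:replication_sync_mode',
            'Replication sync mode',
            'Mode used to replicate volumes to the secondary device.',
            'string',
            enum=['sync', 'async'],
            default='async')
        return properties, 'acme'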

Volume creation
---------------

Drivers are expected to honor the replication parameters set in the volume type
during creation, retyping, or migration.

When implementing the replication feature there are some driver methods that
will most likely need modifications (if they are implemented in the driver,
since some are optional) to make sure that the backend is replicating volumes
that need to be replicated and not replicating those that don't need to be:

- ``create_volume``
- ``create_volume_from_snapshot``
- ``create_cloned_volume``
- ``retype``
- ``clone_image``
- ``migrate_volume``

In these methods the driver will have to check the volume type to see if the
volumes need to be replicated; we could use the same pattern described in the
`Volume Types / Extra Specs`_ section::

    def _is_replicated(self, volume):
        specs = getattr(volume.volume_type, 'extra_specs', {})
        return specs.get('replication_enabled') == '<is> True'

But it is **not** the recommended mechanism, and the ``is_replicated`` method
available in volumes and volume types versioned objects instances should be
used instead.

Drivers are expected to keep the ``replication_status`` field up to date and in
sync with reality, usually as specified in the volume type. To do so, the
implementation of the above mentioned methods should use the model update
mechanism provided for each one of those methods. One must be careful since the
update mechanism may be different from one method to another.

What this means is that most of these methods should be returning a
``replication_status`` key with the value set to ``enabled`` in the model
update dictionary if the volume type is enabling replication. There is no need
to return the key with the value of ``disabled`` if it is not enabled since
that is the default value.
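
As an illustration, a driver could do something along these lines in
``create_volume_from_snapshot``; this is only a sketch where
``_create_from_snapshot`` and ``_setup_replication`` stand in for whatever
vendor specific calls the driver really uses::

    from cinder.objects import fields

    def create_volume_from_snapshot(self, volume, snapshot):
        model_update = self._create_from_snapshot(volume, snapshot)
        if volume.is_replicated():
            self._setup_replication(volume)
            model_update['replication_status'] = (
                fields.ReplicationStatus.ENABLED)
        return model_update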

In the case of the ``create_volume`` and ``retype`` methods there is no need to
return the ``replication_status`` in the model update since it has already been
set by the scheduler on creation using the extra spec from the volume type. And
on ``migrate_volume`` there is no need either since there is no change to the
``replication_status``.

*NOTE*: For storage devices doing per backend/pool replication it is not
necessary to check the volume type for the ``replication_enabled`` key since
all created volumes will be replicated, but they are expected to return the
``replication_status`` in all those methods, including the ``create_volume``
method, since the driver may receive a volume creation request without the
replication enabled extra spec and therefore the right ``replication_status``
will not have been set; the driver needs to correct this.
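
For a backend doing per backend/pool replication that could be as simple as
this sketch, where ``_create_volume_on_backend`` is a hypothetical vendor
call::

    def create_volume(self, volume):
        self._create_volume_on_backend(volume)
        # Every volume on this backend is replicated, regardless of the
        # volume type, so always report it.
        return {'replication_status': fields.ReplicationStatus.ENABLED}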

Besides the ``replication_status`` field that drivers need to update there are
other fields in the database related to the replication mechanism that the
drivers can use:

- ``replication_extended_status``
- ``replication_driver_data``

These fields are string type fields with a maximum size of 255 characters and
they are available for drivers to use internally as they see fit for their
normal replication operation. So they can be assigned in the model update and
later on used by the driver, for example during the failover.
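
For example, a driver could stash a small piece of bookkeeping about the
replica in ``replication_driver_data``; a sketch, assuming the driver
serializes it as JSON (staying under the 255 character limit) and that
``remote_ref`` is whatever vendor identifier it needs later on::

    import json

    model_update = {
        'replication_status': fields.ReplicationStatus.ENABLED,
        # Read back later, for example during failover_host.
        'replication_driver_data': json.dumps({'remote_ref': remote_ref}),
    }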

To avoid using magic strings drivers must use the values defined by the
``ReplicationStatus`` class in the ``cinder/objects/fields.py`` file, and these
are:

- ``ERROR``: When setting up the replication failed on creation, retype, or
  migrate. This should be accompanied by the volume status ``error``.
- ``ENABLED``: When the volume is being replicated.
- ``DISABLED``: When the volume is not being replicated.
- ``FAILED_OVER``: After a volume has been successfully failed over.
- ``FAILOVER_ERROR``: When there was an error during the failover of this
  volume.
- ``NOT_CAPABLE``: When we failed-over but the volume was not replicated.

The first 3 statuses revolve around the volume creation and the last 3 around
the failover mechanism.

The only status that should not be used for the volume's ``replication_status``
is the ``FAILING_OVER`` status.

Whenever we are referring to values of the ``replication_status`` in this
document we will be referring to the ``ReplicationStatus`` attributes and not a
literal string, so ``ERROR`` means
``cinder.objects.fields.ReplicationStatus.ERROR`` and not the string "ERROR".

Failover
--------

This is the mechanism used to instruct the cinder volume service to fail over
to a secondary/target device.

Keep in mind the use case is that the primary backend has died a horrible death
and is no longer valid, so any volumes that were on the primary and were not
being replicated will no longer be available.

The method definition required from the driver to implement the failover
mechanism is as follows::

    def failover_host(self, context, volumes, secondary_id=None):

There are several things that are expected of this method:

- Promotion of a secondary storage device to primary
- Generating the model updates
- Changing internally to access the secondary storage device for all future
  requests.

If no secondary storage device is provided to the driver via the
``secondary_id`` argument (it is equal to ``None``), then it is up to the
driver to choose which storage device to failover to. In this regard it is
important that the driver takes into consideration that it could be failing
over from a secondary (there was a prior failover request), so it should
discard the current target from the selection.

If the ``secondary_id`` is not a valid one the driver is expected to raise
``InvalidReplicationTarget``; for any other non recoverable errors during a
failover the driver should raise ``UnableToFailOver`` or any child of the
``VolumeDriverException`` class and revert to a state where the previous
backend is in use.

The failover method in the driver will receive a list of replicated volumes
that need to be failed over. Replicated volumes passed to the driver may have
diverse ``replication_status`` values, but they will always be one of:
``ENABLED``, ``FAILED_OVER``, or ``FAILOVER_ERROR``.

The driver must return a 2-tuple with the new storage device target id as the
first element and a list of dictionaries with the model updates required for
the volumes so that the driver can perform future actions on those volumes now
that they need to be accessed on a different location.

It's not a requirement for the driver to return model updates for all the
volumes, or for any for that matter, as it can return ``None`` or an empty list
if there's no update necessary. But if elements are returned in the model
update list then it is a requirement that each of the dictionaries contains 2
key-value pairs, ``volume_id`` and ``updates``, like this::

    [{
        'volume_id': volumes[0].id,
        'updates': {
            'provider_id': new_provider_id1,
            ...
        },
     },
     {
        'volume_id': volumes[1].id,
        'updates': {
            'provider_id': new_provider_id2,
            'replication_status': fields.ReplicationStatus.FAILOVER_ERROR,
            ...
        },
     }]

In these updates there is no need to set the ``replication_status`` to
``FAILED_OVER`` if the failover was successful, as this will be performed by
the manager by default, but it won't create additional DB queries if it is
returned. It is however necessary to set it to ``FAILOVER_ERROR`` for those
volumes that had errors during the failover.

Drivers don't have to worry about snapshots or non replicated volumes, since
the manager will take care of those in the following manner:

- All non replicated volumes will have their current ``status`` field saved in
  the ``previous_status`` field, the ``status`` field changed to ``error``, and
  their ``replication_status`` set to ``NOT_CAPABLE``.
- All snapshots from non replicated volumes will have their statuses changed to
  ``error``.
- All replicated volumes that failed on the failover will get their ``status``
  changed to ``error``, their current ``status`` preserved in
  ``previous_status``, and their ``replication_status`` set to
  ``FAILOVER_ERROR``.
- All snapshots from volumes that had errors during the failover will have
  their statuses set to ``error``.

Any model update request from the driver that changes the ``status`` field will
trigger a change in the ``previous_status`` field to preserve the current
status.

Once the failover is completed the driver should be pointing to the secondary
and should be able to create and destroy volumes and snapshots as usual, and it
is left to the Cloud Administrator's discretion whether resource modifying
operations are allowed or not.
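
Putting the previous points together, a skeleton of a ``failover_host``
implementation could look like the following sketch, where every name starting
with an underscore is a hypothetical vendor specific helper and the usual
cinder imports (``exception``, ``fields``, ``_``) are assumed::

    def failover_host(self, context, volumes, secondary_id=None):
        if secondary_id is None:
            # No target requested, so pick one ourselves, discarding the one
            # we may already be failed over to.
            secondary_id = self._pick_failover_target()
        elif secondary_id not in self._replication_target_ids():
            # This also rejects the "default" failback request, which is the
            # expected behavior for drivers that don't support failback.
            reason = _('%s is not a configured target.') % secondary_id
            raise exception.InvalidReplicationTarget(reason=reason)

        model_updates = []
        for volume in volumes:
            try:
                provider_id = self._failover_volume(volume, secondary_id)
                updates = {'provider_id': provider_id}
            except Exception:
                updates = {'replication_status':
                           fields.ReplicationStatus.FAILOVER_ERROR}
            model_updates.append({'volume_id': volume.id,
                                  'updates': updates})

        self._active_backend_id = secondary_id
        return secondary_id, model_updates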

Failback
--------

Drivers are not required to support failback, but they are required to raise an
``InvalidReplicationTarget`` exception if the failback is requested but not
supported.

The way to request the failback is quite simple: the driver will receive the
argument ``secondary_id`` with the value of ``default``. That is why it was
forbidden to use the value ``default`` in the target configuration in the
cinder configuration file.

Expected driver behavior is the same as the one explained in the `Failover`_
section:

- Promotion of the original primary to primary
- Generating the model updates
- Changing internally to access the original primary storage device for all
  future requests.

If the failback of any of the volumes fails the driver must return
``replication_status`` set to ``ERROR`` in the volume updates for those
volumes. If they succeed it is not necessary to change the
``replication_status`` since the default behavior will be to set them to
``ENABLED``, but it won't create additional DB queries if it is set.

The manager will update resources in a slightly different way than in the
failover case:

- All non replicated volumes will not have any model modifications.
- All snapshots from non replicated volumes will not have any model
  modifications.
- All replicated volumes that failed on the failback will get their ``status``
  changed to ``error``, have their current ``status`` preserved in the
  ``previous_status`` field, and their ``replication_status`` set to
  ``FAILOVER_ERROR``.
- All snapshots from volumes that had errors during the failover will have
  their statuses set to ``error``.

We can avoid using the "default" magic string by using the
``FAILBACK_SENTINEL`` class attribute from the ``VolumeManager`` class.
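
For example, a sketch of detecting the failback request, where ``_do_failback``
would be a driver specific helper::

    from cinder.volume import manager

    def failover_host(self, context, volumes, secondary_id=None):
        if secondary_id == manager.VolumeManager.FAILBACK_SENTINEL:
            return self._do_failback(context, volumes)
        ...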

Initialization
--------------

It stands to reason that a failed over Cinder volume service may be restarted,
so there needs to be a way for a driver to know on start which storage device
should be used to access the resources.

So, to let drivers know which storage device they should use, the manager
passes drivers the ``active_backend_id`` argument to their ``__init__`` method
during the initialization phase of the driver. The default value is ``None``,
meaning the default (primary) storage device should be used.

Drivers should store this value if they will need it, as the base driver is not
storing it, for example to determine the current storage device when a failover
is requested and we are already in a failover state, as mentioned above.
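
A minimal sketch of a driver storing it, with ``MyDriver`` being a made up
driver class::

    class MyDriver(driver.VolumeDriver):
        def __init__(self, *args, **kwargs):
            super(MyDriver, self).__init__(*args, **kwargs)
            # None means the default (primary) storage device is in use.
            self._active_backend_id = kwargs.get('active_backend_id')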

Freeze / Thaw
-------------

In many cases, after a failover has been completed we'll want to allow changes
to the data in the volumes as well as some operations like attach and detach
while other operations that modify the number of existing resources, like
delete or create, are not allowed.

And that is where the freezing mechanism comes in; freezing a backend puts the
control plane of the specific Cinder volume service into a read only state, or
at least most of it, while allowing the data plane to proceed as usual.

While this will mostly be handled by the Cinder core code, drivers are informed
when the freezing mechanism is enabled or disabled via these 2 calls::

    freeze_backend(self, context)
    thaw_backend(self, context)

In most cases the driver may not need to do anything, and then it doesn't need
to define any of these methods as long as it's a child class of the ``BaseVD``
class that already implements them as noops.

Raising a `VolumeDriverException` exception in any of these methods will result
in a 500 status code response being returned to the caller and the manager will
not log the exception, so it's up to the driver to log the error if it is
appropriate.

If the driver wants to give a more meaningful error response, then it can raise
other exceptions that have different status codes.

When creating the `freeze_backend` and `thaw_backend` driver methods we must
remember that this is a Cloud Administrator operation, so we can return errors
that reveal internals of the cloud, for example the type of storage device, and
we must use the appropriate internationalization translation methods when
raising exceptions; for `VolumeDriverException` no translation is necessary
since the manager doesn't log it or return it to the user in any way, but any
other exception should use the ``_()`` translation method since it will be
returned to the REST API caller.

For example, if a storage device doesn't support the thaw operation when failed
over, then it should raise an `Invalid` exception::

    def thaw_backend(self, context):
        if self.failed_over:
            msg = _('Thaw is not supported by driver XYZ.')
            raise exception.Invalid(msg)