
Ceph-to-Ceph migration for Openstack leveraging RBD mirroring

Use case

We have an Openstack cloud using a Ceph storage cluster deployed by ceph-ansible; we want to deploy another Ceph storage cluster with juju and migrate the workload to the latter cluster without redeploying the cloud.

Overview

DR scenarios

  • (A) Cloud-A/Ceph-A -- Cloud-B/Ceph-B
    • we have two independent cloud control planes and storage clusters
    • this is here only for reference: let’s say the cloud native approach
    • application level synchronization is the only way to go
    • instances are not synchronized
    • volumes are not synchronized
  • (B) Cloud-A/Ceph-A -- Cloud-B/Ceph-B,A'
    • we have two independent cloud control planes and a (partially) synchronized storage cluster
    • ceph rbd mirroring must be set up at the pool level - that’s why it’s partial
    • instances are not synchronized
    • volumes are synchronized
  • (C) Cloud-A/Ceph-A -- Cloud-A/Ceph-A'
    • we have one single cloud control plane and two independent storage clusters
    • the control plane is connected to one of the two Ceph clusters at a time
    • we mirror each pool/image of Ceph-A to Ceph-A'
    • since Ceph is robust enough on its own, this setup does not make sense except for one use case: storage migration

Openstack and Ceph

Openstack - by default - needs three Ceph pools:

  • nova: to store ephemeral disks
  • glance: to store images
  • cinder-ceph: to store volumes

Nova creates the ephemeral disks based on the setting called libvirt_image_type. If its value is rbd, the ephemeral disks will be created on Ceph (this is a very simplified explanation). In “juju language”, we say juju config nova-compute libvirt-image-backend=rbd.

It is recommended to use the raw image format because this way we can leverage Ceph’s COW (Copy-on-Write) capability:

  • we store the image in glance as raw
  • a protected (read-only) snapshot is created from the image
  • new instances will be the clones of this snapshot
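
A hedged illustration of the flow above - the image file name and the UUID placeholders are mine, the commands themselves are the standard openstack/rbd CLI:

# upload the image in raw format so Ceph can clone it with COW
openstack image create --disk-format raw --container-format bare --file cirros-raw.img cirros
# glance creates a protected snapshot called "snap" on the backing RBD image
rbd -p glance snap ls <glance-image-uuid>
# each new instance's ephemeral disk is a clone whose parent is that snapshot
rbd -p nova info <instance-uuid>_disk | grep parent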

Major steps of the Proof of Concept

  • we create the “source” Ceph cluster and we call it “ceph”
    • we can use any deployment method at this step (ceph-ansible, ceph-deploy, etc.)
    • for me, using juju was the quickest way to do it - but really, it does not matter
  • we configure our Openstack to use this cluster
    • since we have a juju-based deployment, we will use the ceph-proxy charm to integrate this “external” cluster
    • we will create a few instances and volumes
  • we create the “destination” Ceph cluster and we also call it “ceph”
    • we use juju to create it, but in this step we don’t connect (in juju terms: relate) this cluster to the cloud
    • we could use a different juju model to create this cluster and leverage cross-model relations - but we don’t want to complicate the situation now
  • we prevent any changes to the cloud
    • it means that no new instances/volumes are allowed to be created
    • however, existing instances/volumes are still working
    • we have to do it now because we will set up mirroring in the next step and new instances/volumes would be outside the scope of mirroring
  • we configure the one-way mirroring between the source and destination cluster
    • we could configure two-way mirroring as well, but this is not what we want now
    • we keep the cloud up & running so there is no downtime up to this point
  • showtime: the switch-over
    • we shut down the instances
    • we detach the volumes from the instances
    • we deactivate images
    • we set the destination cluster as primary
    • we stop mirroring between the two clusters
    • we disconnect the cloud from the source cluster and connect it to the destination cluster
    • we restore the cloud
    • we start the APIs

Create the source Ceph cluster

Well, this will be in juju style. As I said, you can choose any other method to create this cluster.

  • juju add-model ceph
  • juju model-config -m ceph default-space=all-space
  • juju deploy ./ceph-standalone.yml

You can find the ceph-standalone.yml bundle here
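
In case that link is not at hand, here is a minimal sketch of what such a standalone bundle could look like (the charm URLs, placement and osd-devices are assumptions - the bundle linked above is the authoritative one):

series: bionic
applications:
  ceph-mon:
    charm: cs:~openstack-charmers-next/bionic/ceph-mon
    num_units: 3
    options:
      monitor-count: 3
      expected-osd-count: 3
  ceph-osd:
    charm: cs:~openstack-charmers-next/bionic/ceph-osd
    num_units: 3
    options:
      osd-devices: /dev/vdb
relations:
- - ceph-osd:mon
  - ceph-mon:osd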

At the end, we should see something like this:

  • juju status -m ceph
Model  Controller  Cloud/Region     Version  SLA          Timestamp
ceph   oc-juju     onibaba/default  2.8.7    unsupported  12:57:14Z

App       Version  Status  Scale  Charm     Store       Rev  OS      Notes
ceph-mon  13.2.9   active      3  ceph-mon  jujucharms  451  ubuntu  
ceph-osd  13.2.9   active      3  ceph-osd  jujucharms  476  ubuntu  

Unit         Workload  Agent  Machine  Public address  Ports  Message
ceph-mon/0   active    idle   0        10.33.11.251           Unit is ready and clustered
ceph-mon/1   active    idle   1        10.33.31.251           Unit is ready and clustered
ceph-mon/2*  active    idle   2        10.33.21.251           Unit is ready and clustered
ceph-osd/0   active    idle   3        10.33.11.252           Unit is ready (1 OSD)
ceph-osd/1*  active    idle   4        10.33.21.252           Unit is ready (1 OSD)
ceph-osd/2   active    idle   5        10.33.31.252           Unit is ready (1 OSD)

Machine  State    DNS           Inst id  Series  AZ       Message
0        started  10.33.11.251  oc-2001  bionic  default  Deployed
1        started  10.33.31.251  oc-2201  bionic  default  Deployed
2        started  10.33.21.251  oc-2101  bionic  default  Deployed
3        started  10.33.11.252  oc-1702  bionic  default  Deployed
4        started  10.33.21.252  oc-1802  bionic  default  Deployed
5        started  10.33.31.252  oc-1902  bionic  default  Deployed

Let’s gather the data we need in the next step:

  • juju ssh -m ceph ceph-mon/0 "sudo egrep '^mon host|^fsid' /etc/ceph/ceph.conf"
mon host = 10.33.11.251 10.33.21.251 10.33.31.251
fsid = 380bbbaa-4911-11eb-92b0-525400fcc578
  • juju ssh -m ceph ceph-mon/0 "sudo grep key /etc/ceph/ceph.client.admin.keyring"
	key = AQDE3ulf6a5oJBAAt+5gBjm+u+UgHhg+bS7RDw==

Configure Openstack to use this cluster

We create a different juju model called openstack while keeping the source Ceph cluster in the model called ceph. We have to provide the data we gathered in the previous step inside the bundle file. We reference the source Ceph cluster as if it were an external cluster (e.g. one deployed by ceph-ansible); in other words, we will not relate that cluster with juju just because we created it with juju.

  • juju add-model openstack
  • juju model-config -m openstack default-space=all-space
  • juju deploy ./ceph-proxy-openstack.yml

You can find the ceph-proxy-openstack.yml bundle here

Just to highlight the interesting parts:

  • ceph-proxy
  ceph-proxy:
    charm: cs:ceph-proxy
    num_units: 1
    options:
      fsid: 380bbbaa-4911-11eb-92b0-525400fcc578
      admin-key: AQDE3ulf6a5oJBAAt+5gBjm+u+UgHhg+bS7RDw==
      monitor-hosts: '10.33.11.251:6789 10.33.21.251:6789 10.33.31.251:6789'
    bindings:
      "": *public-space
    to:
      - 'lxd:openstack-management/2'
  • nova
  nova-compute:
    charm: cs:~openstack-charmers-next/bionic/nova-compute
    num_units: 3
    constraints: "tags=compute-shared-all"
    options:
      virt-type: qemu
      libvirt-image-backend: rbd
      config-flags: default_ephemeral_format=ext4
      enable-live-migration: true
      enable-resize: true
      migration-auth-type: ssh
      openstack-origin: *openstack-origin
    bindings:
      "": *public-space

Once it’s finished, we should have a “full green” juju status output; I just highlight how ceph-proxy relates to nova, glance and cinder:

  • juju status -m openstack --relations | grep ceph-client
ceph-proxy:client                        cinder-ceph:ceph                               ceph-client         regular      
ceph-proxy:client                        glance:ceph                                    ceph-client         regular      
ceph-proxy:client                        nova-compute:ceph                              ceph-client         regular

Let’s create some instances and volumes now, using this script - it makes some assumptions, but those are outside the scope of this post.
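
A rough sketch of what such a script could do - the image, flavor and network names and the volume size come from the outputs below, everything else is an assumption:

#!/bin/bash
# hypothetical sketch - the real openstack-poc.sh is the one referenced above
. openrc
for i in 0 1 2; do
  openstack server create --image cirros --flavor m1.small --network net-internal --wait vm$i
  openstack volume create --size 4 vo$i
  openstack server add volume vm$i vo$i
done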

  • run the script and verify the results
. ./openstack-poc.sh
. openrc
openstack server list
openstack volume list
  • “stamp” the instances/volumes
vm0:> touch /root/vm0
vm0:> mkfs.ext4 /dev/vdb
vm0:> mount /dev/vdb /mnt
vm0:> touch /mnt/vo0

vm1:> touch /root/vm1
vm1:> mkfs.ext4 /dev/vdb
vm1:> mount /dev/vdb /mnt
vm1:> touch /mnt/vo1

vm2:> touch /root/vm2
vm2:> mkfs.ext4 /dev/vdb
vm2:> mount /dev/vdb /mnt
vm2:> touch /mnt/vo2
  • check ceph from a compute node
hypervisor:> rbd -p nova -n client.nova-compute ls
b2cc115f-d174-419c-8933-ad6c296775bb_disk
c02c37c6-eec0-4a5e-8426-073826afe45b_disk
ccf80d02-fc0a-4cfe-96b0-3276028d6a7d_disk

hypervisor:> rbd -p nova -n client.nova-compute info b2cc115f-d174-419c-8933-ad6c296775bb_disk
rbd image 'b2cc115f-d174-419c-8933-ad6c296775bb_disk':
	size 8 GiB in 1024 objects
	order 23 (8 MiB objects)
	id: 1218643c9869
	block_name_prefix: rbd_data.1218643c9869
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Tue Dec 29 14:02:42 2020
	parent: glance/1d854549-3069-4ae8-a14a-612324a29f5c@snap
	overlap: 112 MiB

hypervisor:> qemu-img info -f rbd rbd:nova/b2cc115f-d174-419c-8933-ad6c296775bb_disk:id=nova-compute:conf=/etc/ceph/ceph.conf
image: json:{"pool": "nova", "image": "b2cc115f-d174-419c-8933-ad6c296775bb_disk", "conf": "/etc/ceph/ceph.conf", "driver": "rbd", "user": "nova-compute"}
file format: rbd
virtual size: 8.0G (8589934592 bytes)
disk size: unavailable
cluster_size: 8388608
  • check how the KVM instances are actually configured - see the disk definitions for the ephemeral disk and for the volume attached to the instance (a sketch of how to dump this XML follows right after this list)
    • the ephemeral disk is referenced as nova/523d0de7-016f-4ef0-ac89-386dd1efa861_disk
    • the volume disk is referenced as cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <auth username='nova-compute'>
        <secret type='ceph' uuid='514c9fca-8cbe-11e2-9c52-3bc8c7819472'/>
      </auth>
      <source protocol='rbd' name='nova/523d0de7-016f-4ef0-ac89-386dd1efa861_disk'>
        <host name='10.33.11.251' port='6789'/>
        <host name='10.33.21.251' port='6789'/>
        <host name='10.33.31.251' port='6789'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none' discard='unmap'/>
      <auth username='cinder-ceph'>
        <secret type='ceph' uuid='70f89300-4656-4a38-b5a9-39218a12c830'/>
      </auth>
      <source protocol='rbd' name='cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a'>
        <host name='10.33.11.251' port='6789'/>
        <host name='10.33.21.251' port='6789'/>
        <host name='10.33.31.251' port='6789'/>
      </source>
      <target dev='vdb' bus='virtio'/>
      <serial>b4e4e70e-42ba-4479-a15f-cd2db74a755a</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
  • check how the disks look on the Ceph side
    • see the “parent” relation to the glance image snapshot
    • juju ssh -m ceph ceph-mon/0 "sudo rbd -p nova info 523d0de7-016f-4ef0-ac89-386dd1efa861_disk"
rbd image '523d0de7-016f-4ef0-ac89-386dd1efa861_disk':
	size 8 GiB in 1024 objects
	order 23 (8 MiB objects)
	id: 15e26b8b4567
	block_name_prefix: rbd_data.15e26b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Mon Dec 28 14:48:18 2020
	parent: glance/b0a23295-198b-48ac-8b99-21d2b628f5e4@snap
	overlap: 112 MiB
  • now check glance
  • juju ssh -m ceph ceph-mon/0 "sudo rbd -p glance snap ls b0a23295-198b-48ac-8b99-21d2b628f5e4"
SNAPID NAME    SIZE TIMESTAMP                
     4 snap 112 MiB Mon Dec 28 14:47:14 2020 
  • finally, check the volume - it has no parent relation of course
  • juju ssh -m ceph ceph-mon/0 "sudo rbd -p cinder-ceph info volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a"
rbd image 'volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a':
	size 4 GiB in 1024 objects
	order 22 (4 MiB objects)
	id: 168f6b8b4567
	block_name_prefix: rbd_data.168f6b8b4567
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
	op_features: 
	flags: 
	create_timestamp: Mon Dec 28 14:49:03 2020
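
As mentioned in the list above, the libvirt disk definitions were taken from the domain XML on the hypervisor; a sketch of how to dump them (the libvirt instance name is hypothetical):

hypervisor:> virsh list --all
hypervisor:> virsh dumpxml instance-00000001 | grep -B4 -A10 "protocol='rbd'"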

Create the destination Ceph cluster

This is where the tricky part begins. We deploy a Ceph cluster as if it were part of the Openstack bundle, but we do not relate it to nova, glance and cinder.

In other words, we make sure we don’t have these three relations in the bundle:

- - ceph-mon:client
  - nova-compute:ceph
- - ceph-mon:client
  - cinder-ceph:ceph
- - ceph-mon:client
  - glance:ceph

In fact, the bundle we are going to use is very similar to the ceph-standalone.yml bundle, with one difference: we deploy the monitors to existing machines as lxd containers so we must reference those machines in this bundle.

Now, just make sure we are in the right model and deploy this new cluster:

juju switch openstack
juju deploy ./ceph-openstack.yml
Resolving charm: cs:~openstack-charmers-next/bionic/ceph-mon
Resolving charm: cs:~openstack-charmers-next/bionic/ceph-osd
Resolving charm: cs:ubuntu
Executing changes:
- upload charm cs:~openstack-charmers-next/ceph-mon-451 for series bionic
- deploy application ceph-mon on bionic using cs:~openstack-charmers-next/ceph-mon-451
- upload charm cs:~openstack-charmers-next/ceph-osd-476 for series bionic
- deploy application ceph-osd on bionic using cs:~openstack-charmers-next/ceph-osd-476
- add relation ceph-osd:mon - ceph-mon:osd
- add unit ceph-mon/0 to 6/lxd/0 to satisfy [lxd:openstack-management/0]
- add unit ceph-mon/1 to 7/lxd/0 to satisfy [lxd:openstack-management/1]
- add unit ceph-mon/2 to 8/lxd/0 to satisfy [lxd:openstack-management/2]
- add unit ceph-osd/0 to new machine 9
- add unit ceph-osd/1 to new machine 10
- add unit ceph-osd/2 to new machine 11
Deploy of bundle completed.

You can find the ceph-openstack.yml bundle here

At the end of this run we will have:

  • a (source) Ceph cluster managed independently (in our case in a dedicated juju model, but it could be ceph-ansible as well)
    • check: juju status -m ceph ceph-mon ceph-osd
  • a running Openstack cloud connected to the source Ceph cluster via the ceph-proxy charm
    • check: juju status -m openstack --relations | grep ceph-client
  • a (destination) Ceph cluster deployed into the same juju model where our Openstack is deployed, currently running but disconnected from the cloud
    • check: juju status -m openstack ceph-mon ceph-osd

Prevent any changes to the cloud

This step is “left as an exercise for the reader”. There are many ways to achieve this - the most trivial is to shut down the cloud APIs.
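
One possible sketch for a juju-managed control plane - the service names are assumptions and depend on your charms and release (juju run executes the commands as root on the units):

juju run -m openstack --application nova-cloud-controller "systemctl stop nova-api-os-compute"
juju run -m openstack --application glance "systemctl stop glance-api"
# ...and similarly for cinder, neutron, heat, etc.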

Configure the one-way mirroring

Let me reference here the official rbd mirroring documentation:

RBD images can be asynchronously mirrored between two Ceph clusters. This capability is available in two modes:

  • Journal-based: This mode uses the RBD journaling image feature to ensure point-in-time, crash-consistent replication between clusters.
  • Snapshot-based: This mode uses periodically scheduled or manually created RBD image mirror-snapshots to replicate crash-consistent RBD images between clusters.

Mirroring is configured on a per-pool basis within peer clusters and can be configured on a specific subset of images within the pool.

Depending on the desired needs for replication, RBD mirroring can be configured for either one- or two-way replication:

  • One-way Replication: When data is only mirrored from a primary cluster to a secondary cluster, the rbd-mirror daemon runs only on the secondary cluster.
  • Two-way Replication: When data is mirrored from primary images on one cluster to non-primary images on another cluster (and vice-versa), the rbd-mirror daemon runs on both clusters.

We will implement the journal-based one-way replication for each pool Openstack uses: nova, glance and cinder-ceph.

The main steps are the following:

  • on the source cluster
    • enable mirroring on the pools
    • enable journaling on the images of the pools
    • create a user/credential for the mirroring
    • see how pools were created
  • on the destination cluster
    • create the pools on the destination cluster
    • enable mirroring on the pools
    • get the credentials from the source cluster
    • create a user/credential for the mirroring
    • install and configure the rbd-mirror daemon
    • configure mirroring per pool

Hereafter, we will reference the clusters by the prompt:

  • the source ceph cluster: ceph-src:>
    • to get there: juju ssh -m ceph ceph-mon/0 "sudo -i"
  • the destination ceph cluster: ceph-dst:>
    • to get there: juju ssh -m openstack ceph-mon/0 "sudo -i"

on the source cluster

enable mirroring on the pools

from the man page of rbd: mirror pool enable [pool-name] mode

Enable RBD mirroring by default within a pool. The mirroring mode can either be pool or image. If configured in pool mode, all images in the pool with the journaling feature enabled are mirrored. If configured in image mode, mirroring needs to be explicitly enabled (by mirror image enable command) on each image.

we choose pool mode:

ceph-src:> for i in glance nova cinder-ceph; do rbd mirror pool info $i; done
Mode: disabled
Mode: disabled
Mode: disabled

ceph-src:> for i in glance nova cinder-ceph; do rbd mirror pool enable $i pool; done

ceph-src:> for i in glance nova cinder-ceph; do rbd mirror pool info $i; done
Mode: pool
Peers: none
Mode: pool
Peers: none
Mode: pool
Peers: none

enable journaling on the images of the pools

ceph-src:> for i in glance nova cinder-ceph; do for j in `rbd -p $i ls`; do rbd feature enable $i/$j journaling; done; done

ceph-src:> for i in glance nova cinder-ceph; do echo pool: $i; for j in `rbd -p $i ls`; do echo image: $j; rbd -p $i info $j|grep mirroring; done; done
pool: glance
image: b0a23295-198b-48ac-8b99-21d2b628f5e4
	mirroring state: enabled
	mirroring global id: 8b228ab9-3da5-4284-97c9-6b2a53088bc1
	mirroring primary: true
pool: nova
image: 523d0de7-016f-4ef0-ac89-386dd1efa861_disk
	mirroring state: enabled
	mirroring global id: 0c0fc2b8-047c-428b-8b22-0386d96aa22e
	mirroring primary: true
image: 9af1a2bc-c91d-46b0-9e76-6011e6e7cfb0_disk
	mirroring state: enabled
	mirroring global id: 5fefe994-4892-4cdc-a823-f5ea857d1996
	mirroring primary: true
image: aff2d014-223b-441b-8110-fa34a39027b5_disk
	mirroring state: enabled
	mirroring global id: c8dd4aa8-edca-4308-8f14-edc1e490a9a9
	mirroring primary: true
pool: cinder-ceph
image: volume-b21adc28-60ca-4fee-a546-1c7f14765e8a
	mirroring state: enabled
	mirroring global id: 71913e49-240e-4f6d-9e27-8c90cec10694
	mirroring primary: true
image: volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a
	mirroring state: enabled
	mirroring global id: 95a86c89-d497-41d4-8aa0-88ef5d1f91c4
	mirroring primary: true
image: volume-d9704728-6ab6-4852-a7cf-db92b1ed0754
	mirroring state: enabled
	mirroring global id: 404fd5c9-b16f-4735-a886-e43ae7587f7b
	mirroring primary: true

create a user/credential for the mirroring

ceph-src:> ceph auth get-or-create client.rbd-mirror-src mon 'profile rbd' osd 'profile rbd' -o /etc/ceph/ceph-src.client.rbd-mirror-src.keyring

ceph-src:> ceph auth get client.rbd-mirror-src
exported keyring for client.rbd-mirror-src
[client.rbd-mirror-src]
	key = AQAXBetfm/yZFxAA3YmDIgNXEjj1GNuhNXxx6A==
	caps mon = "profile rbd"
	caps osd = "profile rbd"

see how pools were created

ceph-src:> ceph osd pool ls detail 
pool 1 'nova' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 24 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
	removed_snaps [1~3]
pool 2 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 22 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
	removed_snaps [1~3]
pool 3 'cinder-ceph' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 27 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
	removed_snaps [1~3]

on the destination cluster

create the pools on the destination cluster

ideally, this is done by juju during the relation phase - but remember: this is what we skipped intentionally; so we have to do it manually:

ceph-dst:> ceph osd pool create nova 32 32 replicated
ceph-dst:> ceph osd pool create glance 4 4 replicated
ceph-dst:> ceph osd pool create cinder-ceph 32 32 replicated
ceph-dst:> for i in nova glance cinder-ceph; do ceph osd pool application enable $i rbd; done

ceph-dst:> ceph osd pool ls detail
pool 5 'nova' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 38 flags hashpspool stripe_width 0 application rbd
pool 6 'glance' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 4 pgp_num 4 last_change 39 flags hashpspool stripe_width 0 application rbd
pool 7 'cinder-ceph' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 40 flags hashpspool stripe_width 0 application rbd

enable mirroring on the pools

ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool info $i; done
Mode: disabled
Mode: disabled
Mode: disabled

ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool enable $i pool; done

ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool info $i; done
Mode: pool
Peers: none
Mode: pool
Peers: none
Mode: pool
Peers: none

get the credentials from the source cluster

we just need a minimal ceph.conf snippet and the credential:

ceph-dst:> cat /etc/ceph/ceph-src.conf 
[global]
mon host = 10.33.11.251 10.33.21.251 10.33.31.251

ceph-dst:> cat /etc/ceph/ceph-src.client.rbd-mirror-src.keyring 
[client.rbd-mirror-src]
	key = AQAXBetfm/yZFxAA3YmDIgNXEjj1GNuhNXxx6A==
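
one way these two files could end up on the destination monitor, run from the juju client (just a sketch - any secure copy method works):

juju ssh -m ceph ceph-mon/0 "sudo cat /etc/ceph/ceph-src.client.rbd-mirror-src.keyring" > ceph-src.client.rbd-mirror-src.keyring
printf '[global]\nmon host = 10.33.11.251 10.33.21.251 10.33.31.251\n' > ceph-src.conf
juju scp -m openstack ceph-src.conf ceph-mon/0:/tmp/
juju scp -m openstack ceph-src.client.rbd-mirror-src.keyring ceph-mon/0:/tmp/
juju ssh -m openstack ceph-mon/0 "sudo mv /tmp/ceph-src.conf /tmp/ceph-src.client.rbd-mirror-src.keyring /etc/ceph/"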

check whether we can reach the source cluster properly:

ceph-dst:> ceph --cluster ceph-src -n client.rbd-mirror-src osd lspools
1 nova
2 glance
3 cinder-ceph

ceph-dst:> rbd -p nova --cluster ceph-src -n client.rbd-mirror-src ls
523d0de7-016f-4ef0-ac89-386dd1efa861_disk
9af1a2bc-c91d-46b0-9e76-6011e6e7cfb0_disk
aff2d014-223b-441b-8110-fa34a39027b5_disk

ceph-dst:> rbd -p glance --cluster ceph-src -n client.rbd-mirror-src ls
b0a23295-198b-48ac-8b99-21d2b628f5e4

ceph-dst:> rbd -p cinder-ceph --cluster ceph-src -n client.rbd-mirror-src ls
volume-b21adc28-60ca-4fee-a546-1c7f14765e8a
volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a
volume-d9704728-6ab6-4852-a7cf-db92b1ed0754

create a user/credential for the mirroring

this will be used by the rbd-mirror daemon later:

ceph-dst:> ceph auth get-or-create client.rbd-mirror-dst mon 'profile rbd' osd 'profile rbd' -o /etc/ceph/ceph.client.rbd-mirror-dst.keyring

ceph-dst:> ceph auth get client.rbd-mirror-dst 
exported keyring for client.rbd-mirror-dst
[client.rbd-mirror-dst]
	key = AQCCD+tfD5i0EhAAryk40zCdzJ9Lb4IaAVzRcQ==
	caps mon = "profile rbd"
	caps osd = "profile rbd"

install and configure the rbd-mirror daemon

The rbd-mirror daemon is responsible for pulling image updates from the remote peer cluster and applying them to the image within the local cluster.

ceph-dst:> apt install rbd-mirror
ceph-dst:> systemctl enable ceph-rbd-mirror@rbd-mirror-dst.service
ceph-dst:> systemctl start ceph-rbd-mirror@rbd-mirror-dst.service
ceph-dst:> systemctl status ceph-rbd-mirror@rbd-mirror-dst.service

configure mirroring per pool

so far, we just prepared the mirroring - now it’s time to actually configure it:

  • pool: glance
ceph-dst:> rbd mirror pool peer add glance client.rbd-mirror-src@ceph-src
98db5fc6-fc72-4c13-a3d0-c41616a23983

ceph-dst:> rbd mirror pool info glance
Mode: pool
Peers: 
  UUID                                 NAME     CLIENT                
  98db5fc6-fc72-4c13-a3d0-c41616a23983 ceph-src client.rbd-mirror-src 

ceph-dst:> rbd mirror pool status glance --verbose
health: WARNING
images: 1 total
    1 starting_replay

b0a23295-198b-48ac-8b99-21d2b628f5e4:
  global_id:   8b228ab9-3da5-4284-97c9-6b2a53088bc1
  state:       up+starting_replay
  description: starting replay
  last_update: 2020-12-29 11:20:20

ceph-dst:> rbd mirror pool status glance --verbose
health: OK
images: 1 total
    1 replaying

b0a23295-198b-48ac-8b99-21d2b628f5e4:
  global_id:   8b228ab9-3da5-4284-97c9-6b2a53088bc1
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], entries_behind_master=0
  last_update: 2020-12-29 11:21:01
  • pool: nova
ceph-dst:> rbd mirror pool peer add nova client.rbd-mirror-src@ceph-src
15d732fd-f183-4ba8-850e-5303da9056a2

ceph-dst:> rbd mirror pool info nova
Mode: pool
Peers: 
  UUID                                 NAME     CLIENT                
  15d732fd-f183-4ba8-850e-5303da9056a2 ceph-src client.rbd-mirror-src 

ceph-dst:> rbd mirror pool status nova --verbose
health: WARNING
images: 3 total
    3 syncing

523d0de7-016f-4ef0-ac89-386dd1efa861_disk:
  global_id:   0c0fc2b8-047c-428b-8b22-0386d96aa22e
  state:       up+syncing
  description: bootstrapping, IMAGE_SYNC/COPY_IMAGE
  last_update: 2020-12-29 11:23:01

9af1a2bc-c91d-46b0-9e76-6011e6e7cfb0_disk:
  global_id:   5fefe994-4892-4cdc-a823-f5ea857d1996
  state:       up+syncing
  description: bootstrapping, IMAGE_SYNC/COPY_IMAGE
  last_update: 2020-12-29 11:23:01

aff2d014-223b-441b-8110-fa34a39027b5_disk:
  global_id:   c8dd4aa8-edca-4308-8f14-edc1e490a9a9
  state:       up+syncing
  description: bootstrapping, IMAGE_SYNC/COPY_IMAGE
  last_update: 2020-12-29 11:23:01

ceph-dst:> rbd mirror pool status nova --verbose
health: WARNING
images: 3 total
    3 syncing

523d0de7-016f-4ef0-ac89-386dd1efa861_disk:
  global_id:   0c0fc2b8-047c-428b-8b22-0386d96aa22e
  state:       up+syncing
  description: bootstrapping, IMAGE_SYNC/COPY_IMAGE 60%
  last_update: 2020-12-29 11:23:36

9af1a2bc-c91d-46b0-9e76-6011e6e7cfb0_disk:
  global_id:   5fefe994-4892-4cdc-a823-f5ea857d1996
  state:       up+syncing
  description: bootstrapping, IMAGE_SYNC/COPY_IMAGE 29%
  last_update: 2020-12-29 11:23:25

aff2d014-223b-441b-8110-fa34a39027b5_disk:
  global_id:   c8dd4aa8-edca-4308-8f14-edc1e490a9a9
  state:       up+syncing
  description: bootstrapping, IMAGE_SYNC/COPY_IMAGE 65%
  last_update: 2020-12-29 11:23:37

ceph-dst:> rbd mirror pool status nova --verbose
health: OK
images: 3 total
    3 replaying

523d0de7-016f-4ef0-ac89-386dd1efa861_disk:
  global_id:   0c0fc2b8-047c-428b-8b22-0386d96aa22e
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], entries_behind_master=0
  last_update: 2020-12-29 11:24:21

9af1a2bc-c91d-46b0-9e76-6011e6e7cfb0_disk:
  global_id:   5fefe994-4892-4cdc-a823-f5ea857d1996
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], entries_behind_master=0
  last_update: 2020-12-29 11:24:21

aff2d014-223b-441b-8110-fa34a39027b5_disk:
  global_id:   c8dd4aa8-edca-4308-8f14-edc1e490a9a9
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], entries_behind_master=0
  last_update: 2020-12-29 11:24:21
  • pool: cinder-ceph
ceph-dst:> rbd mirror pool peer add cinder-ceph client.rbd-mirror-src@ceph-src
17bf9724-e481-4b7e-bd8a-78982e27ae8b

ceph-dst:> rbd mirror pool info cinder-ceph
Mode: pool
Peers: 
  UUID                                 NAME     CLIENT                
  17bf9724-e481-4b7e-bd8a-78982e27ae8b ceph-src client.rbd-mirror-src 

ceph-dst:> rbd mirror pool status cinder-ceph --verbose
health: WARNING
images: 3 total
    3 starting_replay

volume-b21adc28-60ca-4fee-a546-1c7f14765e8a:
  global_id:   71913e49-240e-4f6d-9e27-8c90cec10694
  state:       up+starting_replay
  description: starting replay
  last_update: 2020-12-29 11:25:51

volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a:
  global_id:   95a86c89-d497-41d4-8aa0-88ef5d1f91c4
  state:       up+starting_replay
  description: starting replay
  last_update: 2020-12-29 11:25:51

volume-d9704728-6ab6-4852-a7cf-db92b1ed0754:
  global_id:   404fd5c9-b16f-4735-a886-e43ae7587f7b
  state:       up+starting_replay
  description: starting replay
  last_update: 2020-12-29 11:25:51

ceph-dst:> rbd mirror pool status cinder-ceph --verbose
health: OK
images: 3 total
    3 replaying

volume-b21adc28-60ca-4fee-a546-1c7f14765e8a:
  global_id:   71913e49-240e-4f6d-9e27-8c90cec10694
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], entries_behind_master=0
  last_update: 2020-12-29 11:28:17

volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a:
  global_id:   95a86c89-d497-41d4-8aa0-88ef5d1f91c4
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], entries_behind_master=0
  last_update: 2020-12-29 11:28:17

volume-d9704728-6ab6-4852-a7cf-db92b1ed0754:
  global_id:   404fd5c9-b16f-4735-a886-e43ae7587f7b
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=1, entry_tid=3], mirror_position=[object_number=3, tag_tid=1, entry_tid=3], entries_behind_master=0
  last_update: 2020-12-29 11:28:16

Now, we have:

  • active mirroring between the source and the destination cluster
  • Openstack APIs down to prevent changes to the cloud
  • Openstack cloud instances/volumes are working; the cloud is still connected to the source cluster

Showtime: the switch-over

shut down the instances

this step is necessary because we have to force the recreation of the libvirt configuration since we will use a different set of ceph monitors

note: we just stop the instances - no need to delete / create them; basically, this is why we worked so hard so far!

openstack server stop vm0
openstack server stop vm1
openstack server stop vm2

openstack server list
+--------------------------------------+------+---------+-----------------------------------+--------+----------+
| ID                                   | Name | Status  | Networks                          | Image  | Flavor   |
+--------------------------------------+------+---------+-----------------------------------+--------+----------+
| 9af1a2bc-c91d-46b0-9e76-6011e6e7cfb0 | vm2  | SHUTOFF | net-external=10.44.2.12           | cirros | m1.small |
| aff2d014-223b-441b-8110-fa34a39027b5 | vm1  | SHUTOFF | net-internal=10.55.3.149          | cirros | m1.small |
| 523d0de7-016f-4ef0-ac89-386dd1efa861 | vm0  | SHUTOFF | net-internal=10.55.1.8, 10.44.1.8 | cirros | m1.small |
+--------------------------------------+------+---------+-----------------------------------+--------+----------+

detach the volumes from the instances

this step is necessary because we have to force the recreation of the libvirt configuration since we will use a different set of ceph monitors

note: we just detach the volumes - no need to delete / create them; basically, this is why we worked so hard so far!

openstack server remove volume vm0 vo0
openstack server remove volume vm1 vo1
openstack server remove volume vm2 vo2

openstack volume list
+--------------------------------------+------+-----------+------+-------------+
| ID                                   | Name | Status    | Size | Attached to |
+--------------------------------------+------+-----------+------+-------------+
| b21adc28-60ca-4fee-a546-1c7f14765e8a | vo2  | available |    4 |             |
| d9704728-6ab6-4852-a7cf-db92b1ed0754 | vo1  | available |    4 |             |
| b4e4e70e-42ba-4479-a15f-cd2db74a755a | vo0  | available |    4 |             |
+--------------------------------------+------+-----------+------+-------------+

deactivate images

openstack image set --deactivate cirros

openstack image list 
+--------------------------------------+--------+-------------+
| ID                                   | Name   | Status      |
+--------------------------------------+--------+-------------+
| b0a23295-198b-48ac-8b99-21d2b628f5e4 | cirros | deactivated |
+--------------------------------------+--------+-------------+

demote-on-src, promote-on-dst: set the destination cluster as primary

  • we make sure the mirroring is healthy and fully caught up (no initial sync in progress, every image is replaying)
ceph-dst:> for i in glance nova cinder-ceph; do echo pool $i:; rbd mirror pool status $i; done
pool glance:
health: OK
images: 1 total
    1 replaying
pool nova:
health: OK
images: 3 total
    3 replaying
pool cinder-ceph:
health: OK
images: 3 total
    3 replaying
  • we demote/promote at the pool level, which implicitly demotes/promotes each image in that pool in one step
  • we execute all the commands on the destination cluster since we have access to both clusters from there
ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src mirror pool demote glance
Demoted 1 mirrored images

ceph-dst:> rbd mirror pool promote glance
Promoted 1 mirrored images

ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src mirror pool demote nova
Demoted 3 mirrored images

ceph-dst:> rbd mirror pool promote nova
Promoted 3 mirrored images

ceph-dst:> rbd --cluster ceph-src -n client.rbd-mirror-src mirror pool demote cinder-ceph
Demoted 3 mirrored images

ceph-dst:> rbd mirror pool promote cinder-ceph
Promoted 3 mirrored images

let’s check the status:

ceph-dst:> rbd mirror pool status glance --verbose
health: OK
images: 1 total
    1 stopped

b0a23295-198b-48ac-8b99-21d2b628f5e4:
  global_id:   8b228ab9-3da5-4284-97c9-6b2a53088bc1
  state:       up+stopped
  description: local image is primary
  last_update: 2020-12-29 12:04:48

ceph-dst:> rbd mirror pool status nova --verbose
health: OK
images: 3 total
    3 stopped

523d0de7-016f-4ef0-ac89-386dd1efa861_disk:
  global_id:   0c0fc2b8-047c-428b-8b22-0386d96aa22e
  state:       up+stopped
  description: local image is primary
  last_update: 2020-12-29 12:04:48

9af1a2bc-c91d-46b0-9e76-6011e6e7cfb0_disk:
  global_id:   5fefe994-4892-4cdc-a823-f5ea857d1996
  state:       up+stopped
  description: local image is primary
  last_update: 2020-12-29 12:04:48

aff2d014-223b-441b-8110-fa34a39027b5_disk:
  global_id:   c8dd4aa8-edca-4308-8f14-edc1e490a9a9
  state:       up+stopped
  description: local image is primary
  last_update: 2020-12-29 12:04:48

ceph-dst:> rbd mirror pool status nova --verbose
health: OK
images: 3 total
    3 stopped

523d0de7-016f-4ef0-ac89-386dd1efa861_disk:
  global_id:   0c0fc2b8-047c-428b-8b22-0386d96aa22e
  state:       up+stopped
  description: local image is primary
  last_update: 2020-12-29 12:05:19

9af1a2bc-c91d-46b0-9e76-6011e6e7cfb0_disk:
  global_id:   5fefe994-4892-4cdc-a823-f5ea857d1996
  state:       up+stopped
  description: local image is primary
  last_update: 2020-12-29 12:05:19

aff2d014-223b-441b-8110-fa34a39027b5_disk:
  global_id:   c8dd4aa8-edca-4308-8f14-edc1e490a9a9
  state:       up+stopped
  description: local image is primary
  last_update: 2020-12-29 12:05:18

stop mirroring between the two clusters

remove peers

  • pool: glance
ceph-dst:> rbd mirror pool info glance
Mode: pool
Peers: 
  UUID                                 NAME     CLIENT                
  98db5fc6-fc72-4c13-a3d0-c41616a23983 ceph-src client.rbd-mirror-src 

ceph-dst:> rbd mirror pool peer remove glance 98db5fc6-fc72-4c13-a3d0-c41616a23983

ceph-dst:> rbd mirror pool info glance
Mode: pool
Peers: none
  • pool: nova
ceph-dst:> rbd mirror pool info nova
Mode: pool
Peers: 
  UUID                                 NAME     CLIENT                
  15d732fd-f183-4ba8-850e-5303da9056a2 ceph-src client.rbd-mirror-src 

ceph-dst:> rbd mirror pool peer remove nova 15d732fd-f183-4ba8-850e-5303da9056a2

ceph-dst:> rbd mirror pool info nova
Mode: pool
Peers: none
  • pool: cinder-ceph
ceph-dst:> rbd mirror pool info cinder-ceph
Mode: pool
Peers: 
  UUID                                 NAME     CLIENT                
  17bf9724-e481-4b7e-bd8a-78982e27ae8b ceph-src client.rbd-mirror-src 

ceph-dst:> rbd mirror pool peer remove cinder-ceph 17bf9724-e481-4b7e-bd8a-78982e27ae8b

ceph-dst:> rbd mirror pool info cinder-ceph
Mode: pool
Peers: none

disable mirroring per pool

ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool disable $i; done
2020-12-29 14:27:42.127 7ff8ae728b80 -1 librbd::api::Mirror: image_disable: mirroring is enabled on one or more children 
2020-12-29 14:27:42.143 7ff8ae728b80 -1 librbd::api::Mirror: mode_set: error disabling mirroring for image id 113c6b8b4567(16) Device or resource busy

ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool info $i; done
Mode: image
Peers: none
Mode: disabled
Mode: disabled

we have to fix the pool glance:

ceph-dst:> rbd mirror pool disable glance
2020-12-29 14:31:08.583 7efee6f4cb80 -1 librbd::api::Mirror: mode_set: failed to disable mirror mode: there are still images with mirroring enabled

ceph-dst:> rbd -p glance ls
1d854549-3069-4ae8-a14a-612324a29f5c

ceph-dst:> rbd mirror image disable glance/1d854549-3069-4ae8-a14a-612324a29f5c
Mirroring disabled

ceph-dst:> rbd mirror pool disable glance

ceph-dst:> for i in glance nova cinder-ceph; do rbd mirror pool info $i; done
Mode: disabled
Mode: disabled
Mode: disabled

disable journaling

ceph-dst:> for i in glance nova cinder-ceph; do for j in `rbd -p $i ls`; do rbd feature disable $i/$j journaling; done; done

stop and disable the rbd-mirror daemon

ceph-dst:> systemctl stop ceph-rbd-mirror@rbd-mirror-dst.service
ceph-dst:> systemctl disable ceph-rbd-mirror@rbd-mirror-dst.service

we disconnect the cloud from the source cluster and connect it to the destination cluster

We are really close to the end. But we have one more tricky step: to reconfigure the cloud to use the destination cluster.

disconnect: remove relations

it’s as simple as this:

juju status -m openstack --relations | grep ceph-client
ceph-proxy:client                        cinder-ceph:ceph                               ceph-client         regular      
ceph-proxy:client                        glance:ceph                                    ceph-client         regular      
ceph-proxy:client                        nova-compute:ceph                              ceph-client         regular      

juju remove-relation ceph-proxy:client cinder-ceph:ceph
juju remove-relation ceph-proxy:client glance:ceph
juju remove-relation ceph-proxy:client nova-compute:ceph

juju status -m openstack --relations | grep ceph-client

connect: add relations

instead of just adding those three relations, we rely on juju’s idempotency: we deploy the final bundle, hoping that juju will only add the three missing relations:

juju deploy ./openstack-bundle-default.yml 
Resolving charm: cs:~openstack-charmers-next/bionic/ceph-mon
Resolving charm: cs:~openstack-charmers-next/bionic/ceph-osd
Resolving charm: cs:~openstack-charmers-next/bionic/cinder
Resolving charm: cs:~openstack-charmers-next/bionic/cinder-ceph
Resolving charm: cs:~openstack-charmers-next/bionic/glance
Resolving charm: cs:~openstack-charmers-next/bionic/keystone
Resolving charm: cs:~openstack-charmers-next/bionic/percona-cluster
Resolving charm: cs:~openstack-charmers-next/bionic/neutron-api
Resolving charm: cs:~openstack-charmers-next/bionic/neutron-gateway
Resolving charm: cs:~openstack-charmers-next/bionic/neutron-openvswitch
Resolving charm: cs:~openstack-charmers-next/bionic/nova-cloud-controller
Resolving charm: cs:~openstack-charmers-next/bionic/nova-compute
Resolving charm: cs:ntp
Resolving charm: cs:ubuntu
Resolving charm: cs:~openstack-charmers-next/bionic/rabbitmq-server
Executing changes:
- set application options for ceph-mon
- set application options for neutron-api
- set application options for neutron-openvswitch
- set application options for nova-compute
- add relation ceph-mon:client - nova-compute:ceph
- add relation ceph-mon:client - cinder-ceph:ceph
- add relation ceph-mon:client - glance:ceph
Deploy of bundle completed.

after a while:

juju status -m openstack --relations | grep ceph-client
ceph-mon:client                          cinder-ceph:ceph                               ceph-client         regular      
ceph-mon:client                          glance:ceph                                    ceph-client         regular      
ceph-mon:client                          nova-compute:ceph                              ceph-client         regular      
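
for reference, the three relations juju just added are equivalent to running these commands by hand:

juju add-relation -m openstack ceph-mon:client nova-compute:ceph
juju add-relation -m openstack ceph-mon:client cinder-ceph:ceph
juju add-relation -m openstack ceph-mon:client glance:ceph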

restore the cloud

openstack image set --activate cirros

openstack server add volume vm0 vo0
openstack server add volume vm1 vo1
openstack server add volume vm2 vo2

openstack server start vm0
openstack server start vm1
openstack server start vm2
vm0:> mount /dev/vdb /mnt
vm0:> ls /root; ls /mnt | grep -v lost+found
vm0
vo0

vm1:> mount /dev/vdb /mnt
vm1:> ls /root; ls /mnt | grep -v lost+found
vm1
vo1

vm2:> mount /dev/vdb /mnt
vm2:> ls /root; ls /mnt | grep -v lost+found
vm2
vo2

check the xml definition of the instances; this is what we had with the source cluster:

...
      <source protocol='rbd' name='nova/523d0de7-016f-4ef0-ac89-386dd1efa861_disk'>
        <host name='10.33.11.251' port='6789'/>
        <host name='10.33.21.251' port='6789'/>
        <host name='10.33.31.251' port='6789'/>
...
      <source protocol='rbd' name='cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a'>
        <host name='10.33.11.251' port='6789'/>
        <host name='10.33.21.251' port='6789'/>
        <host name='10.33.31.251' port='6789'/>
...

this is what we have now with the destination cluster:

...
      <source protocol='rbd' name='nova/523d0de7-016f-4ef0-ac89-386dd1efa861_disk'>
        <host name='10.33.10.41' port='6789'/>
        <host name='10.33.10.42' port='6789'/>
        <host name='10.33.10.43' port='6789'/>
...
      <source protocol='rbd' name='cinder-ceph/volume-b4e4e70e-42ba-4479-a15f-cd2db74a755a'>
        <host name='10.33.10.41' port='6789'/>
        <host name='10.33.10.42' port='6789'/>
        <host name='10.33.10.43' port='6789'/>
...
  • the references to the rbd images are unchanged for both nova and cinder-ceph
  • the references to the ceph monitors are different since we replaced the storage cluster

start the APIs

If you could stop them, I’m pretty sure you can start them as well :)
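
Mirroring the stop sketch from earlier - again, the service names are deployment-specific assumptions:

juju run -m openstack --application nova-cloud-controller "systemctl start nova-api-os-compute"
juju run -m openstack --application glance "systemctl start glance-api"
# ...and whatever else you stopped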

Closure

Workflow

The workflow described above focuses on minimizing the downtime of the cloud instances/volumes. This is achieved by building the destination cluster in parallel and blocking the users only when it’s really needed:

  • prohibiting access to the APIs (to prevent changes) only once the mirroring activity begins
  • shutting down the instances/volumes only during the switch-over

However, there are some risky steps:

  • we “emulate” the juju charm actions when we create the pools on the destination cluster
  • if the juju relation to the destination cluster were triggered (e.g. by accidentally starting a pipeline) before we finished the mirroring activity, the running instances/volumes would end up in a non-deterministic state

We could choose a safer approach, but in that case we would need to shut down all the instances/volumes - and the APIs of course - at the very beginning of the activity. The workflow would be:

  • we shut down the APIs, the instances and the volumes
  • we disconnect the cloud from the source cluster (just run the three juju remove-relation commands)
  • we run the new/final pipeline/bundle and let juju build a new (destination) ceph cluster, including all the pools Openstack needs - these pools will initially be empty
  • we configure the one-way mirroring between the source and destination cluster - thus, populating the images to the new cluster’s pools
  • we set the destination cluster as primary and stop mirroring between the two clusters
  • we restore the cloud and start the APIs

Known limitations

It is the unit running the rbd-mirror daemon that ships the blocks of data between the two clusters. The performance of this transfer determines how quickly the images catch up - and therefore how much time the whole switch-over activity needs.

Caveats

It is absolutely possible to upload a qcow2 image via cinder to openstack/ceph claiming it’s raw - it will be silently accepted. The end result is the worst one can imagine: no error message, but the instance cannot boot. Why? Because the clone of the image snapshot is garbage. It took me some time to figure out what was wrong… on the other hand, I learned a lot :)
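
A quick sanity check before uploading can save hours of debugging (the file names are hypothetical):

qemu-img info cirros.img                                     # look at the "file format" line
qemu-img convert -f qcow2 -O raw cirros.img cirros-raw.img   # convert first if it is qcow2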

References