<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.0">Jekyll</generator><link href="https://www.root314.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.root314.com/" rel="alternate" type="text/html" /><updated>2021-01-05T16:26:00+01:00</updated><id>https://www.root314.com/feed.xml</id><title type="html">Root314</title><subtitle>Software Defined Engineer - Cloud, OpenStack, Network</subtitle><entry><title type="html">Docker multi stage builds</title><link href="https://www.root314.com/docker/2018/05/20/docker-multistage-builds/" rel="alternate" type="text/html" title="Docker multi stage builds" /><published>2018-05-20T20:00:00+02:00</published><updated>2018-05-20T20:00:00+02:00</updated><id>https://www.root314.com/docker/2018/05/20/docker-multistage-builds</id><content type="html" xml:base="https://www.root314.com/docker/2018/05/20/docker-multistage-builds/">&lt;p&gt;With Docker 1.17 the long awaited multi stage builds are now available. It allows you to run several stages in your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt; while reducing the size of the final container.&lt;/p&gt;

&lt;p&gt;It is typically used for compiled languages like C, Java, or Go, where you need a compiler environment to build the code but don’t need it at runtime.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Dockerfile&lt;/code&gt; below is an example showing this in action for a Go application; a single &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;docker build .&lt;/code&gt; will:&lt;/p&gt;
&lt;ol&gt;
  &lt;li&gt;First compile the source code in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;golang:alpine&lt;/code&gt; container (~250 MB)&lt;/li&gt;
  &lt;li&gt;Then copy the compiled artefact into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;scratch&lt;/code&gt; container&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The resulting runtime image is only 1.86 MB (!) and should be the &lt;em&gt;absolute minimum&lt;/em&gt; to run the application.&lt;/p&gt;

&lt;h5 id=&quot;appgo&quot;&gt;app.go&lt;/h5&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;package main

import (
        &quot;fmt&quot;
)

func main() {
        fmt.Println(&quot;Hello Docker&quot;)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;h5 id=&quot;dockerfile&quot;&gt;Dockerfile&lt;/h5&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# compiler container
FROM golang:alpine AS build
# disable cgo to produce a statically linked binary that can run in scratch
ENV CGO_ENABLED=0
COPY . .
RUN go build -o /main

# runtime container
FROM scratch
COPY --from=build /main /main
ENTRYPOINT [&quot;/main&quot;]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
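&lt;p&gt;To try it out, a build and run could look like this (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;go-hello&lt;/code&gt; tag is just an example):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# build both stages and tag the final image
docker build -t go-hello .
# run the minimal runtime image
docker run --rm go-hello
# check the size of the resulting image
docker images go-hello
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;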

&lt;p&gt;You can find this example on &lt;a href=&quot;https://github.com/RootPi314/docker-examples/tree/master/go-multistage&quot;&gt;GitHub RootPi314/docker-examples/go-multistage&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This can easily be adapted to any compiled language; more info is available in the Docker documentation.&lt;/p&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="docker" /><summary type="html">With Docker 17.05, the long-awaited multi-stage builds are now available. They allow you to run several stages in your Dockerfile while reducing the size of the final image.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.root314.com/img/container-sq.jpg" /><media:content medium="image" url="https://www.root314.com/img/container-sq.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Kubernetes All-In-One in 10 minutes</title><link href="https://www.root314.com/kubernetes/2018/04/02/Kubernetes-all-in-one-in-10min/" rel="alternate" type="text/html" title="Kubernetes All-In-One in 10 minutes" /><published>2018-04-02T20:00:00+02:00</published><updated>2018-04-02T20:00:00+02:00</updated><id>https://www.root314.com/kubernetes/2018/04/02/Kubernetes-all-in-one-in-10min</id><content type="html" xml:base="https://www.root314.com/kubernetes/2018/04/02/Kubernetes-all-in-one-in-10min/">&lt;p&gt;&lt;a href=&quot;https://github.com/kubernetes-incubator/kubespray&quot;&gt;Kubespray&lt;/a&gt; is a set of Ansible playbooks to deploy a production-ready Kubernetes cluster. But we can also use it for quick and easy test/dev Kubernetes clusters.&lt;/p&gt;

&lt;h2 id=&quot;with-vagrant&quot;&gt;With Vagrant&lt;/h2&gt;
&lt;p&gt;The easiest way is to use Vagrant: simply download the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Vagrantfile&lt;/code&gt; and run &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;vagrant up&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone https://github.com/RootPi314/kubespray-aio.git
cd kubespray-aio
vagrant up
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h2 id=&quot;cloud-init&quot;&gt;Cloud Init&lt;/h2&gt;

&lt;p&gt;If you prefer to run it with a cloud provider instead of running things locally, simply pass the &lt;a href=&quot;https://github.com/RootPi314/kubespray-aio/blob/master/cloud-init.yml&quot;&gt;cloud-init.yml&lt;/a&gt; file as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;--user-data&lt;/code&gt;.&lt;/p&gt;
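&lt;p&gt;For example with the OpenStack CLI (the flavor, image, and key names below are placeholders):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;openstack server create \
  --flavor m1.large \
  --image ubuntu-16.04 \
  --key-name mykey \
  --user-data cloud-init.yml \
  kubespray-aio
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;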

&lt;h2 id=&quot;do-it-yourself--behind-the-scene&quot;&gt;Do It Yourself &amp;amp; Behind the Scenes&lt;/h2&gt;
&lt;p&gt;Finally, if you want to get your hands dirty, you’ll need an Ubuntu 16.04 server or VM, then run:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;git clone https://github.com/RootPi314/kubespray-aio.git
cd kubespray-aio
./install.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This should take around 10 minutes depending on your computer and internet connection, after which &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kubectl get nodes&lt;/code&gt; will show you a ready cluster:&lt;/p&gt;
&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ kubectl get nodes
NAME        STATUS    ROLES         AGE       VERSION
localhost   Ready     master,node   5m       v1.9.2+coreos.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;More documentation and info available on &lt;a href=&quot;https://github.com/RootPi314/kubespray-aio&quot;&gt;GitHub.com/RootPi314/kubespray-aio&lt;/a&gt;.&lt;/p&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="kubernetes" /><category term="kubespray" /><summary type="html">Kubespray is a set of Ansible playbooks to deploy a production-ready Kubernetes cluster. But we can also use it for quick and easy test/dev Kubernetes clusters.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.root314.com/img/cube-sq.jpg" /><media:content medium="image" url="https://www.root314.com/img/cube-sq.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">OpenStack Nordic Days 2017</title><link href="https://www.root314.com/openstack/2017/11/30/openstack-nordic-days-2017-how-to-get-your-private-cloud-project-to-take-off/" rel="alternate" type="text/html" title="OpenStack Nordic Days 2017" /><published>2017-11-30T19:30:00+01:00</published><updated>2017-11-30T19:30:00+01:00</updated><id>https://www.root314.com/openstack/2017/11/30/openstack-nordic-days-2017-how-to-get-your-private-cloud-project-to-take-off</id><content type="html" xml:base="https://www.root314.com/openstack/2017/11/30/openstack-nordic-days-2017-how-to-get-your-private-cloud-project-to-take-off/">&lt;p&gt;Last month I gave a presentation at OpenStack Nordic Days 2017 in Copenhagen: &lt;a href=&quot;/presentations/osnd2017&quot;&gt;How to get your private cloud project to take off&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A video recording is now available.&lt;/p&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="openstack" /><category term="presentation" /><summary type="html">Last month I gave a presentation at OpenStack Nordic Days 2017 in Copenhagen: How to get your private cloud project to take off.</summary></entry><entry><title type="html">OpenStack Days UK 2017</title><link href="https://www.root314.com/openstack/2017/08/30/openstack-days-uk-2017-working-with-legacy-applications/" rel="alternate" type="text/html" title="OpenStack Days UK 2017" /><published>2017-08-30T20:30:00+02:00</published><updated>2017-08-30T20:30:00+02:00</updated><id>https://www.root314.com/openstack/2017/08/30/openstack-days-uk-2017-working-with-legacy-applications</id><content type="html" xml:base="https://www.root314.com/openstack/2017/08/30/openstack-days-uk-2017-working-with-legacy-applications/">&lt;p&gt;The OpenStack community is organising OpenStack Days UK on September 26, 2017 in London. I’ll be presenting how to &lt;a href=&quot;https://openstackdays.uk/2017/?schedule=working-with-legacy-applications-in-the-clouds&quot;&gt;work with legacy applications in the clouds&lt;/a&gt; (Room: London Wall @15:45).&lt;/p&gt;

&lt;p&gt;Check out the full &lt;a href=&quot;https://openstackdays.uk/2017/#schedule&quot;&gt;schedule&lt;/a&gt; and see you there!&lt;/p&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="openstack" /><category term="presentation" /><summary type="html">The OpenStack community is organising OpenStack Days UK on September 26, 2017 in London. I’ll be presenting how to work with legacy applications in the clouds (Room: London Wall @15:45).</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.root314.com/img/london-eye-sq.jpg" /><media:content medium="image" url="https://www.root314.com/img/london-eye-sq.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">VM live migration across the globe with Ceph and Openstack</title><link href="https://www.root314.com/ceph/2017/06/08/VM-live-migration-across-the-globe-with-Ceph-and-Openstack/" rel="alternate" type="text/html" title="VM live migration across the globe with Ceph and Openstack" /><published>2017-06-08T22:00:00+02:00</published><updated>2017-06-08T22:00:00+02:00</updated><id>https://www.root314.com/ceph/2017/06/08/VM-live-migration-across-the-globe-with-Ceph-and-Openstack</id><content type="html" xml:base="https://www.root314.com/ceph/2017/06/08/VM-live-migration-across-the-globe-with-Ceph-and-Openstack/">&lt;p&gt;Have you ever been in a situation where you had to migrate a VM across the globe with minimal downtime?
This post covers VM live migration across OpenStack regions.&lt;/p&gt;

&lt;h3 id=&quot;classic-approach&quot;&gt;Classic approach&lt;/h3&gt;
&lt;p&gt;The volume import/export capabilities of OpenStack are limited, so the classic approach is to turn off the VM, copy it to the destination, then attach the volume to a VM in the new region. It works, but the VM is down during the transfer, which could mean days of downtime for your application.&lt;/p&gt;

&lt;p&gt;That translates into the following OpenStack commands:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;DST_REGION&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;RegionTwo
&lt;span class=&quot;nv&quot;&gt;VOL_SIZE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;100
&lt;span class=&quot;c&quot;&gt;# Stop the VM for safety&lt;/span&gt;
openstack server stop myserver
&lt;span class=&quot;c&quot;&gt;# Send the VM snapshot into glance&lt;/span&gt;
openstack image create &lt;span class=&quot;nt&quot;&gt;--volume&lt;/span&gt; myvolume mysnapshot
&lt;span class=&quot;c&quot;&gt;# Pipe the glance image into the DST_REGION&lt;/span&gt;
glance image-download mysnapshot | glance &lt;span class=&quot;nt&quot;&gt;--os-region-name&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$DST_REGION&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; image-upload &lt;span class=&quot;nt&quot;&gt;--name&lt;/span&gt; mysnapshot
&lt;span class=&quot;c&quot;&gt;# Switch to destination region for all further commands&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;export &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;OS_REGION_NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$DST_REGION&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Create volume&lt;/span&gt;
openstack volume create &lt;span class=&quot;nt&quot;&gt;--image&lt;/span&gt; mysnapshot &lt;span class=&quot;nt&quot;&gt;--size&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$VOL_SIZE&lt;/span&gt; myvolume
&lt;span class=&quot;c&quot;&gt;# Boot new server in DST_REGION&lt;/span&gt;
openstack server create &lt;span class=&quot;nt&quot;&gt;--flavor&lt;/span&gt; m1.small &lt;span class=&quot;nt&quot;&gt;--image&lt;/span&gt; mybaseimage myserver
&lt;span class=&quot;c&quot;&gt;# Attach the volume to the new VM&lt;/span&gt;
openstack server add volume myserver myvolume
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This technique uses only the OpenStack API but requires downtime during the transfer.&lt;/p&gt;

&lt;h3 id=&quot;here-comes-ceph&quot;&gt;Here comes Ceph&lt;/h3&gt;

&lt;p&gt;Ceph’s snapshot feature allows this type of migration to be handled like a VM live migration:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Take a live snapshot of the volume&lt;/li&gt;
  &lt;li&gt;Transfer the snapshot (many GBs)&lt;/li&gt;
  &lt;li&gt;Stop the original VM&lt;/li&gt;
  &lt;li&gt;Transfer the changes since the snapshot (few MBs)&lt;/li&gt;
  &lt;li&gt;Attach volume and start the VM&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Set source region variables&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;SRC_REGION&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;RegionOne
&lt;span class=&quot;nv&quot;&gt;SRC_POOL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;volumes
&lt;span class=&quot;nv&quot;&gt;SRC_SERVER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;src-server
&lt;span class=&quot;nv&quot;&gt;SRC_VOL_ID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;00000000-1111-2222-3333-4444444444444

&lt;span class=&quot;c&quot;&gt;# Set destination region variables&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;DST_REGION&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;RegionTwo
&lt;span class=&quot;nv&quot;&gt;DST_POOL&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;volumes
&lt;span class=&quot;nv&quot;&gt;DST_SERVER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;dst-server

&lt;span class=&quot;c&quot;&gt;# Keep the volume size and name handy&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;VOL_SIZE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;openstack volume show &lt;span class=&quot;nv&quot;&gt;$SRC_VOL_ID&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--format&lt;/span&gt; value &lt;span class=&quot;nt&quot;&gt;--column&lt;/span&gt; size&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;VOL_NAME&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;openstack volume show &lt;span class=&quot;nv&quot;&gt;$SRC_VOL_ID&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--format&lt;/span&gt; value &lt;span class=&quot;nt&quot;&gt;--column&lt;/span&gt; name&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Take a snapshot of volume and keep snapshot ID&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;SRC_SNAP_ID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;openstack &lt;span class=&quot;nt&quot;&gt;--os-region-name&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$SRC_REGION&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; snapshot create &lt;span class=&quot;nv&quot;&gt;$SRC_VOL_ID&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--name&lt;/span&gt; mysnapshot &lt;span class=&quot;nt&quot;&gt;--force&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--format&lt;/span&gt; value &lt;span class=&quot;nt&quot;&gt;--column&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Create volume in destination region and keep its ID&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;DST_VOL_ID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;openstack &lt;span class=&quot;nt&quot;&gt;--os-region-name&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$DST_REGION&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; volume create &lt;span class=&quot;nt&quot;&gt;--size&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$VOL_SIZE&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--name&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$VOL_NAME&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;--format&lt;/span&gt; value &lt;span class=&quot;nt&quot;&gt;--column&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Delete the empty RBD image backing the new volume, the import below will replace it&lt;/span&gt;
ceph-dst:~# rbd &lt;span class=&quot;nb&quot;&gt;rm&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$DST_POOL&lt;/span&gt;/volume-&lt;span class=&quot;nv&quot;&gt;$DST_VOL_ID&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# export | import of the snapshot with an SSH pipe for data transfer&lt;/span&gt;
ceph-src:~# rbd &lt;span class=&quot;nb&quot;&gt;export&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$SRC_POOL&lt;/span&gt;/volume-&lt;span class=&quot;nv&quot;&gt;$SRC_VOL_ID&lt;/span&gt;@snapshot-&lt;span class=&quot;nv&quot;&gt;$SRC_SNAP_ID&lt;/span&gt; - | ssh ceph-dst rbd &lt;span class=&quot;nt&quot;&gt;--image-format&lt;/span&gt; 2 import - &lt;span class=&quot;nv&quot;&gt;$DST_POOL&lt;/span&gt;/volume-&lt;span class=&quot;nv&quot;&gt;$DST_VOL_ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The export/import can take a long time (hours or days), but since the VM is still running in the source region there is no downtime during the transfer. Also note that the data transfer is encrypted by SSH in this example.&lt;/p&gt;

&lt;p&gt;Once the base snapshot is imported to its destination, we can move on to the incremental transfer:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Unmount the file system on the volume (downtime starts)&lt;/span&gt;
src-server:~# umount /path/to/mount/point
ceph-src:~# openstack server remove volume &lt;span class=&quot;nv&quot;&gt;$SRC_SERVER&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$SRC_VOL_ID&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Create a matching snapshot in the destination region&lt;/span&gt;
ceph-dst:~# rbd snap create &lt;span class=&quot;nv&quot;&gt;$DST_POOL&lt;/span&gt;/volume-&lt;span class=&quot;nv&quot;&gt;$DST_VOL_ID&lt;/span&gt;@snapshot-&lt;span class=&quot;nv&quot;&gt;$SRC_SNAP_ID&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Then export-diff | import-diff the delta between current state and base snapshot&lt;/span&gt;
ceph-src:~# rbd export-diff &lt;span class=&quot;nt&quot;&gt;--from-snap&lt;/span&gt; snapshot-&lt;span class=&quot;nv&quot;&gt;$SRC_SNAP_ID&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$SRC_POOL&lt;/span&gt;/volume-&lt;span class=&quot;nv&quot;&gt;$SRC_VOL_ID&lt;/span&gt; - | ssh ceph-dst rbd import-diff - &lt;span class=&quot;nv&quot;&gt;$DST_POOL&lt;/span&gt;/volume-&lt;span class=&quot;nv&quot;&gt;$DST_VOL_ID&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Clean up the temporary snapshot&lt;/span&gt;
ceph-dst:~# rbd snap &lt;span class=&quot;nb&quot;&gt;rm&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$DST_POOL&lt;/span&gt;/volume-&lt;span class=&quot;nv&quot;&gt;$DST_VOL_ID&lt;/span&gt;@snapshot-&lt;span class=&quot;nv&quot;&gt;$SRC_SNAP_ID&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Attach volume to destination VM and mount the file system (downtime ends)&lt;/span&gt;
ceph-dst:~# openstack server add volume &lt;span class=&quot;nv&quot;&gt;$DST_SERVER&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;$DST_VOL_ID&lt;/span&gt;
dst-server:~# mount /dev/disk/by-id/virtio-&lt;span class=&quot;k&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;DST_VOL_ID&lt;/span&gt;:0:20&lt;span class=&quot;k&quot;&gt;}&lt;/span&gt; /path/to/mount/point
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;to-summarize&quot;&gt;To summarize&lt;/h3&gt;

&lt;p&gt;This example demonstrates how to get going with 1 incremental iteration for simplicity, but you might repeat the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;export-diff | import-diff&lt;/code&gt; a couple of times to reduce the downtime to &lt;strong&gt;less than 10 seconds&lt;/strong&gt;.
This type of live volume migration is a great tool to move critical VMs across continents.&lt;/p&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="ceph" /><summary type="html">Have you ever been in a situation where you had to migrate a VM across the globe with minimal downtime? This post covers VM live migration across OpenStack regions.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.root314.com/img/plane-sq.jpg" /><media:content medium="image" url="https://www.root314.com/img/plane-sq.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ceph hybrid storage tiers</title><link href="https://www.root314.com/ceph/2017/04/30/Ceph-hybrid-storage-tiers/" rel="alternate" type="text/html" title="Ceph hybrid storage tiers" /><published>2017-04-30T00:00:00+02:00</published><updated>2017-04-30T00:00:00+02:00</updated><id>https://www.root314.com/ceph/2017/04/30/Ceph-hybrid-storage-tiers</id><content type="html" xml:base="https://www.root314.com/ceph/2017/04/30/Ceph-hybrid-storage-tiers/">&lt;p&gt;In a previous post I showed you &lt;a href=&quot;https://www.root314.com/2017/01/15/Ceph-storage-tiers/&quot;&gt;how to deploy storage tiering for Ceph&lt;/a&gt;; today I will explain how to set up hybrid storage tiers.&lt;/p&gt;

&lt;h2 id=&quot;what-is-hybrid-storage&quot;&gt;What is hybrid storage?&lt;/h2&gt;

&lt;p&gt;Hybrid storage is a combination of two different storage tiers, like SSD and HDD. In Ceph terms, that means the copies of each object are located in different tiers - maybe 1 copy on SSD and 2 copies on HDDs.&lt;/p&gt;

&lt;p&gt;The idea is to keep 1 copy of the data on a high-performance tier (usually SSD or NVMe) and 2 additional copies on a lower-cost tier (usually HDDs), improving read performance while keeping costs down.&lt;/p&gt;

&lt;p&gt;The following diagram explains the difference between read and write I/O, when using a hybrid storage tier:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://www.root314.com/img/posts/hybrid-storage-tier.svg&quot; alt=&quot;hybrid storage tier read and write I/O&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;how-to-set-it-up&quot;&gt;How to set it up?&lt;/h2&gt;

&lt;p&gt;To get this to work in Ceph, we create a two-step storage policy:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;First step: choose the primary OSD (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;firstn 1&lt;/code&gt;) in the high-performance tier, “root-ssd” in the example&lt;/li&gt;
  &lt;li&gt;Second step: choose the rest of the OSDs (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;firstn -1&lt;/code&gt;) in the low-performance tier, “root-hdd” in the example&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Assuming a replication factor of 3, the following Ceph ruleset will place 1 copy of each object on SSD and 2 copies on HDDs.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hybrid storage policy
rule hybrid {
  ruleset 2
  type replicated
  min_size 1
  max_size 10
  step take root-ssd
  step chooseleaf firstn 1 type host
  step emit
  step take root-hdd
  step chooseleaf firstn -1 type host
  step emit
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
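&lt;p&gt;Once the rule is loaded into the CRUSHmap, you can point a pool at it. A minimal sketch, assuming an existing pool named “hybrid-pool”:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# assign the hybrid rule (ruleset 2 above) to the pool
ceph osd pool set hybrid-pool crush_ruleset 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;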

&lt;p&gt;Now that you know how to set it up, it’s up to you to combine all the storage tiers.&lt;/p&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="ceph" /><summary type="html">In a previous post I showed you how to deploy storage tiering for Ceph; today I will explain how to set up hybrid storage tiers.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.root314.com/img/disks-sq.jpg" /><media:content medium="image" url="https://www.root314.com/img/disks-sq.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Schrodinger Ceph cluster</title><link href="https://www.root314.com/ceph/2017/04/05/schrodinger-ceph-cluster/" rel="alternate" type="text/html" title="The Schrodinger Ceph cluster" /><published>2017-04-05T00:00:00+02:00</published><updated>2017-04-05T00:00:00+02:00</updated><id>https://www.root314.com/ceph/2017/04/05/schrodinger-ceph-cluster</id><content type="html" xml:base="https://www.root314.com/ceph/2017/04/05/schrodinger-ceph-cluster/">&lt;p&gt;Inspired by Schrodinger’s famous thought experiment, this is the story of a Ceph cluster that was both full and empty until reality kicked in.&lt;/p&gt;

&lt;h3 id=&quot;the-cluster&quot;&gt;The cluster&lt;/h3&gt;

&lt;p&gt;Let’s imagine a small 3-server cluster. All servers are identical and contain 10x 2 TB hard drives.
All Ceph pools are &lt;strong&gt;triple replicated on three different hosts&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The total capacity of the cluster is 60 TB raw and 20 TB usable. All is well.&lt;/p&gt;

&lt;h3 id=&quot;the-increase&quot;&gt;The increase&lt;/h3&gt;

&lt;p&gt;After some time you need more capacity and decide to add a new node, but this time with 10x 6 TB drives. The anticipated capacity is then 120 TB raw and 40 TB usable.&lt;/p&gt;

&lt;p&gt;After the installation &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceph status&lt;/code&gt; reports the expected raw capacity of 120 TB.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt; health HEALTH_OK
[...]
 osdmap e10: 40 osds: 40 up, 40 in
  pgmap v20: 256 pgs, 2 pools, 30 TB data, 1000 objects
        30 TB used, 90 TB / 120 TB avail
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;the-problem&quot;&gt;The problem&lt;/h3&gt;

&lt;p&gt;The problem with this setup is that the &lt;strong&gt;usable capacity is lower than expected&lt;/strong&gt;. If you try to fill the pool with data, you will notice that the maximum usable capacity of this cluster is 30 TB, that’s &lt;strong&gt;10 TB lower than anticipated&lt;/strong&gt;, simply because of triple replication.&lt;/p&gt;

&lt;p&gt;The table below shows the space usage when you try to fill this cluster.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Server&lt;/th&gt;
      &lt;th&gt;Disks&lt;/th&gt;
      &lt;th&gt;% of Total&lt;/th&gt;
      &lt;th&gt;Used&lt;/th&gt;
      &lt;th&gt;Available&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ceph1&lt;/td&gt;
      &lt;td&gt;10x2TB&lt;/td&gt;
      &lt;td&gt;16.7%&lt;/td&gt;
      &lt;td&gt;20 TB&lt;/td&gt;
      &lt;td&gt;0 TB&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ceph2&lt;/td&gt;
      &lt;td&gt;10x2TB&lt;/td&gt;
      &lt;td&gt;16.7%&lt;/td&gt;
      &lt;td&gt;20 TB&lt;/td&gt;
      &lt;td&gt;0 TB&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ceph3&lt;/td&gt;
      &lt;td&gt;10x2TB&lt;/td&gt;
      &lt;td&gt;16.7%&lt;/td&gt;
      &lt;td&gt;20 TB&lt;/td&gt;
      &lt;td&gt;0 TB&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ceph4&lt;/td&gt;
      &lt;td&gt;10x6TB&lt;/td&gt;
      &lt;td&gt;50%&lt;/td&gt;
      &lt;td&gt;30 TB&lt;/td&gt;
      &lt;td&gt;30 TB&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;In this situation you see that the last node (ceph4) still has 30 TB raw available: that’s the missing 10 TB usable. Since all other nodes (ceph1-3) are full, there is nowhere to store the additional copies required by the storage policy (3 copies on 3 different hosts), so that space is not usable for a triple replication pool.&lt;/p&gt;

&lt;h3 id=&quot;the-solutions&quot;&gt;The solutions&lt;/h3&gt;

&lt;p&gt;This happens when using replication with N copies (N=3 in this example) and a single node holds more than 1/N (33% in this example) of the overall cluster capacity.&lt;/p&gt;

&lt;h4 id=&quot;theoretical-solution&quot;&gt;Theoretical solution&lt;/h4&gt;

&lt;p&gt;Some would say:&lt;/p&gt;
&lt;blockquote&gt;
  &lt;p&gt;Just use N=1, then you would be guaranteed that no node will be responsible for more than 100% of the cluster capacity&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s right, but in practice that would disable the data protection (1 copy = no replication), so definitely not what we want.&lt;/p&gt;

&lt;h4 id=&quot;equilibrate-the-cluster&quot;&gt;Balance the cluster&lt;/h4&gt;
&lt;p&gt;The most practical way to address this is to balance the servers: we can swap half of the 2 TB drives in ceph3 with half of the 6 TB drives in ceph4.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Server&lt;/th&gt;
      &lt;th&gt;2 TB drives&lt;/th&gt;
      &lt;th&gt;6 TB drives&lt;/th&gt;
      &lt;th&gt;% of Total&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;ceph1&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;16.7%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ceph2&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;16.7%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ceph3&lt;/td&gt;
      &lt;td&gt;5&lt;/td&gt;
      &lt;td&gt;5&lt;/td&gt;
      &lt;td&gt;33%&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ceph4&lt;/td&gt;
      &lt;td&gt;5&lt;/td&gt;
      &lt;td&gt;5&lt;/td&gt;
      &lt;td&gt;33%&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;I recommend updating the CRUSHmap first: this will start the data re-balancing and make the new space available quickly, but it will relax your storage policy (3 copies on only 2 different servers) until you also physically swap the disks.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Move 5x2TB OSDs to ceph4&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for &lt;/span&gt;i &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;20 21 22 23 24
&lt;span class=&quot;k&quot;&gt;do
  &lt;/span&gt;ceph osd crush &lt;span class=&quot;nb&quot;&gt;set &lt;/span&gt;osd.&lt;span class=&quot;nv&quot;&gt;$i&lt;/span&gt; 2.0 &lt;span class=&quot;nv&quot;&gt;root&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;default &lt;span class=&quot;nv&quot;&gt;host&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;ceph4
&lt;span class=&quot;k&quot;&gt;done&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Move 5x6TB OSDs to ceph3&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for &lt;/span&gt;i &lt;span class=&quot;k&quot;&gt;in &lt;/span&gt;30 31 32 33 34
&lt;span class=&quot;k&quot;&gt;do
  &lt;/span&gt;ceph osd crush &lt;span class=&quot;nb&quot;&gt;set &lt;/span&gt;osd.&lt;span class=&quot;nv&quot;&gt;$i&lt;/span&gt; 6.0 &lt;span class=&quot;nv&quot;&gt;root&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;default &lt;span class=&quot;nv&quot;&gt;host&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;ceph3
&lt;span class=&quot;k&quot;&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="ceph" /><summary type="html">Inspired by Schrodinger’s famous thought experiment, this is the story of a Ceph cluster that was both full and empty until reality kicked in.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.root314.com/img/disk-and-micro-sd-sq.jpg" /><media:content medium="image" url="https://www.root314.com/img/disk-and-micro-sd-sq.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The basics of Wavelength Division Multiplexing</title><link href="https://www.root314.com/network/2017/03/05/the-basics-of-wdm/" rel="alternate" type="text/html" title="The basics of Wavelength Division Multiplexing" /><published>2017-03-05T13:00:00+01:00</published><updated>2017-03-05T13:00:00+01:00</updated><id>https://www.root314.com/network/2017/03/05/the-basics-of-wdm</id><content type="html" xml:base="https://www.root314.com/network/2017/03/05/the-basics-of-wdm/">&lt;p&gt;Wavelength Division Multiplexing (WDM) is a multiplexing method which uses different colors (or wavelengths) of light. Where traditional optics allow a single channel of communication on a fiber pair, WDM deployments can carry up to 96 channels on a single fiber pair. Channels can support different technologies and different speeds; for example, you can mix 1G Ethernet, 10G Ethernet and 4G Fibre Channel on the same fiber run.&lt;/p&gt;

&lt;h3 id=&quot;usage&quot;&gt;Usage&lt;/h3&gt;

&lt;p&gt;Since it is a multiplexing technology, we combine (mux) the transmission and split (demux) the reception. With a typical fiber pair, this is done by a passive mux/demux box as illustrated below.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;https://www.root314.com/img/posts/mux-demux.svg&quot; alt=&quot;wdm usage&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The optics used in a WDM setup are called colored optics (tuned to a specific wavelength), so the color on each end of the fiber must match.&lt;/p&gt;

&lt;h3 id=&quot;cwdm-vs-dwdm&quot;&gt;CWDM vs DWDM&lt;/h3&gt;
&lt;p&gt;WDM exists in two varieties: Coarse (CWDM) and Dense (DWDM), each using its own range of wavelengths. Since their wavelength ranges overlap, they can co-exist on the same link to some extent.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Name&lt;/th&gt;
      &lt;th&gt;Channels&lt;/th&gt;
      &lt;th&gt;Wavelength&lt;/th&gt;
      &lt;th&gt;Cost&lt;/th&gt;
      &lt;th&gt;Use case&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;CWDM&lt;/td&gt;
      &lt;td&gt;18&lt;/td&gt;
      &lt;td&gt;1270-1610 nm&lt;/td&gt;
      &lt;td&gt;Low&lt;/td&gt;
      &lt;td&gt;Short distance&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;DWDM&lt;/td&gt;
      &lt;td&gt;96&lt;/td&gt;
      &lt;td&gt;1520-1577 nm&lt;/td&gt;
      &lt;td&gt;High&lt;/td&gt;
      &lt;td&gt;Medium/long distance&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;reducing-interconnect-costs&quot;&gt;Reducing interconnect costs&lt;/h3&gt;
&lt;p&gt;WDM reduces the interconnection costs by using the same fiber pair for many links.
Let’s say we want 60 Gbps of bandwidth (6x 10 Gbps) between two points of presence (POP A and B) with dark fiber between them.&lt;/p&gt;

&lt;p&gt;To make this a fair example I will assume the following prices:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Dark fiber: 100 €/mo/fiber pair&lt;/li&gt;
  &lt;li&gt;Traditional LR optic 10Gbps SFP+: 30 €&lt;/li&gt;
  &lt;li&gt;CWDM optic 10Gbps SFP+: 100 €/unit&lt;/li&gt;
  &lt;li&gt;Mux-Demux: 700 €/unit&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;traditional-approach&quot;&gt;Traditional approach&lt;/h4&gt;
&lt;p&gt;We need 6 dark fiber pairs (600 €/mo) and 12 optics (360 €), costing a total of 21 960 € over 3 years.&lt;/p&gt;

&lt;h4 id=&quot;cwdm&quot;&gt;CWDM&lt;/h4&gt;
&lt;p&gt;Since we need 6 channels, CWDM (up to 16 channels) over a few hundred meters (different rooms) will work just fine.
We will need only 1 fiber pair (100 €/mo) and 12 CWDM optics (1 200 €).
We will also use 1 mux-demux at each location (1 400 €).&lt;/p&gt;

&lt;p&gt;This yields a total cost of 6 200 €, roughly 70% less than the traditional setup.&lt;/p&gt;
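&lt;p&gt;As a quick sanity check, the same numbers in shell arithmetic:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# traditional: 6 fiber pairs for 36 months + 12 LR optics
echo $(( 6 * 100 * 36 + 12 * 30 ))         # 21960
# CWDM: 1 fiber pair for 36 months + 12 CWDM optics + 2 mux-demux
echo $(( 100 * 36 + 12 * 100 + 2 * 700 ))  # 6200
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;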

&lt;h3 id=&quot;total-cost-of-ownership-calculator&quot;&gt;Total Cost of Ownership calculator&lt;/h3&gt;

&lt;p&gt;You can use the following calculator to estimate the TCO of a WDM deployment based on the number of channels.
The example costs are based on online prices as of March 2017.&lt;/p&gt;

&lt;form ng-controller=&quot;CalculatorController&quot; class=&quot;well&quot;&gt;
&lt;div class=&quot;row&quot;&gt;
  &lt;div class=&quot;btn-group&quot;&gt;
    &lt;button type=&quot;button&quot; class=&quot;btn btn-default dropdown-toggle&quot; data-toggle=&quot;dropdown&quot; aria-haspopup=&quot;true&quot; aria-expanded=&quot;false&quot;&gt;
    Examples &lt;span class=&quot;caret&quot;&gt;&lt;/span&gt;
    &lt;/button&gt;
    &lt;ul class=&quot;dropdown-menu&quot;&gt;
      &lt;li&gt;&lt;a ng-click=&quot;channels = 4; price_wdm = 100; price_mux = 220;&quot;&gt;4 channels CWDM&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a ng-click=&quot;channels = 4; price_wdm = 330; price_mux = 310;&quot;&gt;4 channels DWDM&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a ng-click=&quot;channels = 8; price_wdm = 100; price_mux = 390;&quot;&gt;8 channels CWDM&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a ng-click=&quot;channels = 8; price_wdm = 330; price_mux = 580;&quot;&gt;8 channels DWDM&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a ng-click=&quot;channels = 16; price_wdm = 165; price_mux = 740;&quot;&gt;16 channels CWDM&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a ng-click=&quot;channels = 16; price_wdm = 330; price_mux = 1000;&quot;&gt;16 channels DWDM&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a ng-click=&quot;channels = 40; price_wdm = 330; price_mux = 1600;&quot;&gt;40 channels DWDM&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a ng-click=&quot;channels = 80; price_wdm = 440; price_mux = 8500;&quot;&gt;80 channels DWDM&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a ng-click=&quot;channels = 96; price_wdm = 440; price_mux = 7900;&quot;&gt;96 channels DWDM&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/div&gt;
&lt;/div&gt;


&lt;div class=&quot;row&quot;&gt;
    &lt;div class=&quot;col-md-6&quot;&gt;
    &lt;h4&gt;Setup&lt;/h4&gt;
    &lt;div class=&quot;form-group&quot;&gt;
    &lt;label for=&quot;channels&quot;&gt;Channels&lt;/label&gt;
    &lt;input type=&quot;number&quot; ng-model=&quot;channels&quot; class=&quot;form-control&quot; id=&quot;channels&quot; label=&quot;Channels&quot; name=&quot;channels&quot; /&gt;
&lt;/div&gt;

    &lt;div class=&quot;form-group&quot;&gt;
    &lt;label for=&quot;period&quot;&gt;TCO (in months)&lt;/label&gt;
    &lt;input type=&quot;number&quot; ng-model=&quot;period&quot; class=&quot;form-control&quot; id=&quot;period&quot; label=&quot;TCO (in months)&quot; name=&quot;period&quot; /&gt;
&lt;/div&gt;

    &lt;/div&gt;
    &lt;div class=&quot;col-md-6&quot;&gt;
    &lt;h4&gt;Prices&lt;/h4&gt;
      &lt;div class=&quot;form-group&quot;&gt;
    &lt;label for=&quot;price_fiber&quot;&gt;Dark fiber (in &amp;euro;/mo/fiber pair)&lt;/label&gt;
    &lt;input type=&quot;number&quot; ng-model=&quot;price_fiber&quot; class=&quot;form-control&quot; id=&quot;price_fiber&quot; label=&quot;Dark fiber (in &amp;euro;/mo/fiber pair)&quot; name=&quot;price_fiber&quot; /&gt;
&lt;/div&gt;

      &lt;div class=&quot;form-group&quot;&gt;
    &lt;label for=&quot;price_optic&quot;&gt;Standard optic (in &amp;euro;/unit)&lt;/label&gt;
    &lt;input type=&quot;number&quot; ng-model=&quot;price_optic&quot; class=&quot;form-control&quot; id=&quot;price_optic&quot; label=&quot;Standard optic (in &amp;euro;/unit)&quot; name=&quot;price_optic&quot; /&gt;
&lt;/div&gt;

      &lt;div class=&quot;form-group&quot;&gt;
    &lt;label for=&quot;price_wdm&quot;&gt;WDM optic (in &amp;euro;/unit)&lt;/label&gt;
    &lt;input type=&quot;number&quot; ng-model=&quot;price_wdm&quot; class=&quot;form-control&quot; id=&quot;price_wdm&quot; label=&quot;WDM optic (in &amp;euro;/unit)&quot; name=&quot;price_wdm&quot; /&gt;
&lt;/div&gt;

      &lt;div class=&quot;form-group&quot;&gt;
    &lt;label for=&quot;price_mux&quot;&gt;Mux-Demux (in &amp;euro;/unit)&lt;/label&gt;
    &lt;input type=&quot;number&quot; ng-model=&quot;price_mux&quot; class=&quot;form-control&quot; id=&quot;price_mux&quot; label=&quot;Mux-Demux (in &amp;euro;/unit)&quot; name=&quot;price_mux&quot; /&gt;
&lt;/div&gt;

      &lt;/div&gt;

&lt;/div&gt;
&lt;h3&gt;Results&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;TCO without WDM: &lt;span ng-bind=&quot;tco_trad() | number:0&quot;&gt;&lt;/span&gt; &amp;euro;&lt;/li&gt;
  &lt;li&gt;TCO with WDM: &lt;span ng-bind=&quot;tco_wdm() | number:0&quot;&gt;&lt;/span&gt; &amp;euro;&lt;/li&gt;
&lt;/ul&gt;
&lt;/form&gt;

&lt;!-- AngularJS --&gt;
&lt;script src=&quot;//ajax.googleapis.com/ajax/libs/angularjs/1.5.6/angular.min.js&quot;&gt;&lt;/script&gt;

&lt;script&gt;
angular.module('Root314', [])
  .controller('CalculatorController', ['$scope', function($scope) {

    $scope.period = 36;
    $scope.channels = 6;

    $scope.price_fiber = 100;
    $scope.price_optic = 30;
    $scope.price_wdm = 100;
    $scope.price_mux = 700;

    // Without WDM: one dark fiber pair per channel, plus two standard optics per channel
    $scope.tco_trad = function() {
      return $scope.channels*($scope.price_fiber*$scope.period + 2*$scope.price_optic);
    };
    // With WDM: a single shared fiber pair, plus a WDM optic per channel and a mux-demux at each end
    $scope.tco_wdm = function() {
      return $scope.price_fiber*$scope.period + 2*($scope.channels*$scope.price_wdm + $scope.price_mux);
    };
  }]);
&lt;/script&gt;

&lt;h3 id=&quot;caveats&quot;&gt;Caveats&lt;/h3&gt;

&lt;p&gt;Using WDM requires additional documentation to keep track of which wavelengths are used by which links. Be mindful of the physical layer and do not put redundant links on the same fiber pair.&lt;/p&gt;

&lt;h3 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h3&gt;

&lt;p&gt;WDM is a great tool to deliver high speed (up to 960 Gbps) on a single fiber pair and to keep costs down.&lt;/p&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="network" /><category term="wdm" /><summary type="html">Wavelength Division Multiplexing (WDM) is a multiplexing method which uses different colors (or wavelengths) of light. Where traditional optics allow a single channel of communication on a fiber pair, WDM deployments can carry up to 96 channels on a single fiber pair. Channels can support different technologies and different speeds; for example, you can mix 1G Ethernet, 10G Ethernet and 4G Fibre Channel on the same fiber run.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.root314.com/img/network2-sq.jpg" /><media:content medium="image" url="https://www.root314.com/img/network2-sq.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Edit the Ceph CRUSHmap</title><link href="https://www.root314.com/2017/01/29/edit-ceph-crushmap/" rel="alternate" type="text/html" title="Edit the Ceph CRUSHmap" /><published>2017-01-29T19:00:00+01:00</published><updated>2017-01-29T19:00:00+01:00</updated><id>https://www.root314.com/2017/01/29/edit-ceph-crushmap</id><content type="html" xml:base="https://www.root314.com/2017/01/29/edit-ceph-crushmap/">&lt;p&gt;The CRUSHmap, as suggested by the name, is a map of your storage cluster. This map is necessary for the CRUSH algorithm to determine data placement. But Ceph’s CRUSHmap is stored in binary form. So how do you easily change it?&lt;/p&gt;

&lt;h3 id=&quot;native-tools&quot;&gt;Native tools&lt;/h3&gt;

&lt;p&gt;Ceph comes with a couple of native commands to do basic customizations to the CRUSHmap:&lt;/p&gt;

&lt;h4 id=&quot;reweight&quot;&gt;&lt;a href=&quot;http://docs.ceph.com/docs/master/rados/operations/crush-map/#adjust-an-osd-s-crush-weight&quot;&gt;Reweight&lt;/a&gt;&lt;/h4&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ceph osd crush reweight &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;name&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;weight&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You use this to adjust the weight of an OSD or a bucket. It’s very useful when some OSDs are getting more used than others, as it allows you to lower the weights of the busier drives or nodes.&lt;/p&gt;
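&lt;p&gt;A quick example, lowering the weight of a busy OSD (the OSD id and weights are just examples):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# lower the weight of osd.7 from 2.0 to 1.8
ceph osd crush reweight osd.7 1.8
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;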

&lt;h4 id=&quot;remove&quot;&gt;&lt;a href=&quot;http://docs.ceph.com/docs/master/rados/operations/crush-map/#remove-an-osd&quot;&gt;Remove&lt;/a&gt;&lt;/h4&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ceph osd crush remove &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;name|bucket-name&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You use this to either clean up old buckets, or when you decommission OSDs.&lt;/p&gt;

&lt;h4 id=&quot;move-or-add-set-location-and-weight&quot;&gt;&lt;a href=&quot;http://docs.ceph.com/docs/master/rados/operations/crush-map/#add-move-an-osd&quot;&gt;Move or add, set location and weight&lt;/a&gt;&lt;/h4&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ceph osd crush &lt;span class=&quot;nb&quot;&gt;set&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;id-or-name&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;weight&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;nv&quot;&gt;root&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;={&lt;/span&gt;pool-name&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[{&lt;/span&gt;bucket-type&lt;span class=&quot;o&quot;&gt;}={&lt;/span&gt;bucket-name&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; ...]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This is one of the most interesting commands. It does 3 things at once to the specified OSD or bucket:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;If the specified OSD or bucket does not exist, it creates it. So be careful with typos: oosd.0 is probably not what you meant :)&lt;/li&gt;
  &lt;li&gt;It changes the location&lt;/li&gt;
  &lt;li&gt;It changes the weight&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This command is pretty useful when you are physically moving things in your cluster.&lt;/p&gt;
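&lt;p&gt;For example, after physically moving a drive to another server (the names and weight below are examples):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# place osd.12 (a 2 TB drive, weight 2.0) under host ceph2 in root default
ceph osd crush set osd.12 2.0 root=default host=ceph2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;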

&lt;h4 id=&quot;read-and-write-the-map&quot;&gt;Read and write the map&lt;/h4&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Read&lt;/span&gt;
ceph osd getcrushmap &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;output file&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Write&lt;/span&gt;
ceph osd setcrushmap &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;input file&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;If you want to customize anything else (not covered by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceph osd crush&lt;/code&gt;), then you will need to download the CRUSHmap, edit it, and upload the new version. But since the CRUSHmap is in binary format, you have to convert it to and from human-readable text.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceph osd getcrushmap -o map.bin&lt;/code&gt; returns the map in its binary form and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;crushtool -d map.bin -o map.txt&lt;/code&gt; converts the binary file into a human readable text file.&lt;/li&gt;
  &lt;li&gt;You can edit the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;map.txt&lt;/code&gt; with your favorite text editor.&lt;/li&gt;
  &lt;li&gt;To apply your changes, you first need to convert the edited text file to binary with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;crushtool -c map.txt -o map.bin&lt;/code&gt; and then to apply your changes with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceph osd setcrushmap -i map.bin&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
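&lt;p&gt;Putting the three steps together:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ceph osd getcrushmap -o map.bin
crushtool -d map.bin -o map.txt
# edit map.txt with your favorite text editor, then:
crushtool -c map.txt -o map.bin
ceph osd setcrushmap -i map.bin
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;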

&lt;h3 id=&quot;helper-scripts&quot;&gt;Helper scripts&lt;/h3&gt;
&lt;p&gt;To make it easy for you I made a pair of helper scripts that take care of the conversion transparently:&lt;/p&gt;

&lt;h4 id=&quot;ceph-get-crushmap&quot;&gt;ceph-get-crushmap&lt;/h4&gt;
&lt;p&gt;This first one just outputs the current CRUSHmap to &lt;em&gt;stdout&lt;/em&gt;. Perfect to combine with a &lt;a href=&quot;https://en.wikipedia.org/wiki/Pipeline_(Unix)&quot;&gt;pipeline&lt;/a&gt;.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;#!/bin/bash&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Get and convert the CRUSHmap&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;mktemp&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
ceph osd getcrushmap &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$tmp&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; crushtool &lt;span class=&quot;nt&quot;&gt;-d&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$tmp&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; /dev/stdout
&lt;span class=&quot;nb&quot;&gt;rm&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$tmp&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h4 id=&quot;ceph-set-crushmap&quot;&gt;ceph-set-crushmap&lt;/h4&gt;

&lt;p&gt;This second script applies the CRUSHmap from &lt;em&gt;stdin&lt;/em&gt; to the Ceph cluster.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;#!/bin/bash&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Convert and set the CRUSHmap&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;tmp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;mktemp&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
crushtool &lt;span class=&quot;nt&quot;&gt;-c&lt;/span&gt; /dev/stdin &lt;span class=&quot;nt&quot;&gt;-o&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$tmp&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; ceph osd setcrushmap &lt;span class=&quot;nt&quot;&gt;-i&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$tmp&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;rm&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$tmp&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;h3 id=&quot;example&quot;&gt;Example&lt;/h3&gt;
&lt;p&gt;Assuming you have both scripts in your &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;$PATH&lt;/code&gt;, you could easily rename the &lt;em&gt;host&lt;/em&gt; type to &lt;em&gt;server&lt;/em&gt;.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ceph-get-crushmap | sed &lt;span class=&quot;s1&quot;&gt;'s/host/server/'&lt;/span&gt; | get-set-crushmap
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="ceph" /><summary type="html">The CRUSHmap, as suggested by the name, is a map of your storage cluster. This map is necessary for the CRUSH algorithm to determine data placement. But Ceph’s CRUSHmap is stored in binary form. So how do you easily change it?</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.root314.com/img/map-and-pen-sq.jpg" /><media:content medium="image" url="https://www.root314.com/img/map-and-pen-sq.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Deploying Ceph with storage tiering</title><link href="https://www.root314.com/2017/01/15/Ceph-storage-tiers/" rel="alternate" type="text/html" title="Deploying Ceph with storage tiering" /><published>2017-01-15T21:00:00+01:00</published><updated>2017-01-15T21:00:00+01:00</updated><id>https://www.root314.com/2017/01/15/Ceph-storage-tiers</id><content type="html" xml:base="https://www.root314.com/2017/01/15/Ceph-storage-tiers/">&lt;p&gt;You have several options to deploy storage tiering within Ceph. In this post I will show you a simple yet powerful approach to automatically update the CRUSHmap and create storage policies.&lt;/p&gt;

&lt;h3 id=&quot;some-basics&quot;&gt;Some basics&lt;/h3&gt;

&lt;p&gt;Storage tiering means having several tiers available. The classic 3 tiered approach is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;fast: all flash&lt;/li&gt;
  &lt;li&gt;medium: disks accelerated by some flash journals&lt;/li&gt;
  &lt;li&gt;slow: archive disks with collocated journals&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;tiered-crushmap&quot;&gt;Tiered CRUSHmap&lt;/h3&gt;

&lt;p&gt;First we will configure the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;crush location hook&lt;/code&gt;. It is a script invoked on OSD start to determine the OSD’s location in the CRUSHmap.
To make things simple, I use the size of the disk to determine which tier it should belong to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Bigger than 6 TB → Archive drive&lt;/li&gt;
  &lt;li&gt;Between 1.6TB and 6TB → Disk with flash journal&lt;/li&gt;
  &lt;li&gt;Smaller than 1.6TB → SSD assumed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works in most environments, but you might want to adjust the script to fit yours.&lt;/p&gt;

&lt;p&gt;Append the following code to a &lt;strong&gt;copy&lt;/strong&gt; of &lt;a href=&quot;https://github.com/ceph/ceph/blob/master/src/ceph-crush-location.in&quot;&gt;/usr/bin/ceph-crush-location&lt;/a&gt;, then specify its path with &lt;a href=&quot;http://docs.ceph.com/docs/master/rados/operations/crush-map/#custom-location-hooks&quot;&gt;osd crush location hook&lt;/a&gt; in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceph.conf&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# more than 6TB for slow&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;size_limit_slow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;6000
&lt;span class=&quot;c&quot;&gt;# more than 1.6TB for medium&lt;/span&gt;
&lt;span class=&quot;nv&quot;&gt;size_limit_medium&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;1600

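&lt;span class=&quot;c&quot;&gt;# compute the OSD size in GB (df reports 1K blocks)&lt;/span&gt;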
&lt;span class=&quot;nv&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;df&lt;/span&gt; /var/lib/ceph/osd/ceph-&lt;span class=&quot;nv&quot;&gt;$id&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'{if(NR &amp;gt; 1){printf &quot;%d&quot;, $2/1024/1024}}'&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$size&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-gt&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$size_limit_slow&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;then
  &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;tier&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;slow&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;elif&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$size&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-gt&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$size_limit_medium&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;then
  &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;tier&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;medium&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;else
  &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;tier&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;fast&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;fi
&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&quot;host=&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;hostname&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-s&lt;/span&gt;&lt;span class=&quot;si&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$tier&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt; root=&lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;$tier&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
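&lt;p&gt;For reference, a minimal &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceph.conf&lt;/code&gt; snippet wiring up the hook could look like this (the script path is only an example, point it at your copy):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;[osd]
# path to your modified copy of ceph-crush-location
osd crush location hook = /usr/local/bin/ceph-crush-location-tiered
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;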
&lt;p&gt;After a restart your OSDs will show up under a tier-specific root; the OSD tree should look like this (see the restart-and-verify commands after the listing):&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;root fast
    &lt;ul&gt;
      &lt;li&gt;host ceph-1-fast&lt;/li&gt;
      &lt;li&gt;host ceph-2-fast&lt;/li&gt;
      &lt;li&gt;host ceph-3-fast&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;root medium
    &lt;ul&gt;
      &lt;li&gt;host ceph-1-medium&lt;/li&gt;
      &lt;li&gt;host ceph-2-medium&lt;/li&gt;
      &lt;li&gt;host ceph-3-medium&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;root slow
    &lt;ul&gt;
      &lt;li&gt;host ceph-1-slow&lt;/li&gt;
      &lt;li&gt;host ceph-2-slow&lt;/li&gt;
      &lt;li&gt;host ceph-3-slow&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
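&lt;p&gt;A minimal restart-and-verify sketch for a systemd-based host (the OSD id is a placeholder, repeat for each OSD):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# restart an OSD so the location hook runs again
sudo systemctl restart ceph-osd@0
# inspect the resulting CRUSH hierarchy
ceph osd tree
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;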

&lt;h3 id=&quot;creating-rulesets&quot;&gt;Creating rulesets&lt;/h3&gt;

&lt;p&gt;Rulesets allow you to describe your storage policies. We will use rulesets to restrict each storage pool to a single tier, which you can do by editing the CRUSHmap (a decompile/recompile example follows the rules). Below is an example of rulesets for replicated pools with copies stored on different hosts.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;rule fast {
  ruleset 1
  type replicated
  min_size 1
  max_size 10
  step take fast
  step chooseleaf firstn 0 type host
  step emit
}
rule medium {
  ruleset 2
  type replicated
  min_size 1
  max_size 10
  step take medium
  step chooseleaf firstn 0 type host
  step emit
}
rule slow {
  ruleset 3
  type replicated
  min_size 1
  max_size 10
  step take slow
  step chooseleaf firstn 0 type host
  step emit
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
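&lt;p&gt;If you do not have get/set helper scripts at hand, the usual round-trip to edit the CRUSHmap looks like this (file names are arbitrary):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# export the current CRUSHmap and decompile it to text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# append the rulesets above, then recompile and inject the new map
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;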

&lt;h3 id=&quot;bring-it-all-together&quot;&gt;Bring it all together&lt;/h3&gt;

&lt;p&gt;To finish, assign the appropriate ruleset to each storage pool as shown below, and you are ready to go.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Set fast tier for the rbd-fast pool&lt;/span&gt;
ceph osd pool &lt;span class=&quot;nb&quot;&gt;set &lt;/span&gt;rbd-fast crush_ruleset 1
&lt;span class=&quot;c&quot;&gt;# Set medium tier for the rbd pool&lt;/span&gt;
ceph osd pool &lt;span class=&quot;nb&quot;&gt;set &lt;/span&gt;rbd crush_ruleset 2
&lt;span class=&quot;c&quot;&gt;# Set slow tier for the &quot;archives&quot; pool&lt;/span&gt;
ceph osd pool &lt;span class=&quot;nb&quot;&gt;set &lt;/span&gt;archives crush_ruleset 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
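&lt;p&gt;To double-check an assignment (using the pre-Luminous &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;crush_ruleset&lt;/code&gt; naming, as above):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;ceph osd pool get rbd-fast crush_ruleset
# expected output: crush_ruleset: 1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;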

&lt;h3 id=&quot;monitoring&quot;&gt;Monitoring&lt;/h3&gt;
&lt;p&gt;If you are running Ceph tiering in production, you will quickly realize that the output of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceph status&lt;/code&gt; shows the combined available and used space of all tiers.&lt;/p&gt;

&lt;p&gt;To find the used space of each storage tier, use &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ceph osd df tree&lt;/code&gt;. You can feed that into your monitoring system with the following command:&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Show percentage used, used space and total size for each tier root&lt;/span&gt;
ceph@ceph-1:~# &lt;span class=&quot;nb&quot;&gt;sudo &lt;/span&gt;ceph osd &lt;span class=&quot;nb&quot;&gt;df &lt;/span&gt;tree | &lt;span class=&quot;nb&quot;&gt;grep&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'root '&lt;/span&gt; | &lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'{print $10 &quot;:&quot;, $7 &quot;%&quot; &quot; &quot; $5 &quot;/&quot; $4}'&lt;/span&gt;
fast: 50.11% 1169G/2332G
medium: 23.28% 3059G/13142G
slow: 10.19% 6153G/60383G
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
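&lt;p&gt;Building on the same one-liner, a minimal alerting sketch (the 85% threshold is an arbitrary example):&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# warn when any tier root is more than 85% full ($7 is %USE, $10 is the name)
ceph osd df tree | grep 'root ' | awk '$7 &gt; 85 {print &quot;WARNING:&quot;, $10, &quot;at&quot;, $7 &quot;%&quot;}'
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;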

&lt;h2 id=&quot;edit&quot;&gt;Edit&lt;/h2&gt;

&lt;p&gt;Since Ceph Luminous, there is native support for storage tiering under the name &lt;a href=&quot;https://ceph.com/community/new-luminous-crush-device-classes/&quot;&gt;device classes&lt;/a&gt;.&lt;/p&gt;</content><author><name>{&quot;twitter&quot;=&gt;&quot;Miouge&quot;}</name></author><category term="ceph" /><category term="ceph" /><category term="production" /><summary type="html">You have several options to deploy storage tiering within Ceph. In this post I will show you a simple yet powerful approach to automatically update the CRUSHmap and create storage policies.</summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.root314.com/img/disks-sq.jpg" /><media:content medium="image" url="https://www.root314.com/img/disks-sq.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>