How to Set Up ElasticSearch with Replication on a Windows Azure Availability Set of VMs

Installing ElasticSearch

If you wonder why we care about availability sets, you had better read here.

Now to the final ingredient of the uber stack of technologies that will form the foundation of our awesome new product ;): ElasticSearch. Installing ElasticSearch is quite straightforward: you just download a binary package and run it. As we’ll see below, though, running ElasticSearch in a multi-node environment, and especially on a cloud infrastructure like Azure, is a bit more complicated.

Running a multi-node ElasticSearch setup in a multicast-friendly environment is also straightforward. It mainly involves configuring each node properly and giving all of them the same cluster name, which is used for auto-discovery. After that you start your nodes, ElasticSearch does its magic, and the nodes join your cluster. This only works in environments where multicast is supported, though. Windows Azure does not support multicast, so we cannot rely on it for auto-discovery.

Fear not! The people at ElasticSearch have created plugins for cloud environments where multicast is not supported, and Azure is among the supported ones. The plugin uses the Azure API for the unicast discovery mechanism, so managing an ElasticSearch cluster on Azure feels much like having multicast. Here is what is important to keep in mind if you want an ElasticSearch cluster on Windows Azure:

  • You should create your VMs with your ssh keys defined and uploaded to Azure. (Important!!)
  • As an additional step you’ll have to create a Java keystore from your ssh keys, and it has to be present on each node.
  • Regarding the ElasticSearch installation, the only additional step is to install the Azure plugin. Be sure to select the proper plugin version, which has to match the version of ElasticSearch that you will use.
  • ElasticSearch will use the Azure API to perform auto-discovery.
  • There’s a minimal configuration needed inside elasticsearch.yml related to the usage of the plugin.
  • Well, that’s all.

Regarding your ssh keys, make sure that they are “Azure compatible”. To do that you’ll have to generate X509 certificates with a 2048-bit RSA keypair, as suggested by Microsoft [3]. That guide also suggests using an OpenSSL that isn’t from MacPorts, and it mentions versions of it that do not work. Assuming everything is OK with your OpenSSL, you have to do the following in order to generate your keys:
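A minimal sketch of the key generation, following the commands documented for the Azure plugin [1]. The file names and the -subj value are assumptions chosen for illustration; -subj is only there to make the command non-interactive, so drop it if you prefer to answer the certificate prompts by hand.

```shell
# Sketch only: file names and the -subj value are illustrative assumptions.

# Self-signed X509 certificate with a 2048-bit RSA key pair, valid for one year.
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -subj "/CN=azure-es-demo" \
  -keyout azure-private.key -out azure-certificate.pem

# Keep the key material readable by the owner only.
chmod 600 azure-private.key azure-certificate.pem

# DER-encoded copy of the certificate (.cer), the format uploaded to the Azure portal.
openssl x509 -outform der -in azure-certificate.pem -out azure-certificate.cer
```

This leaves three files in the current directory: the private key, the .pem certificate and the .cer certificate.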

The .pem file that you have created is the one you upload while creating a new VM in Azure, in case you also want to connect to it using ssh keys instead of a username and password. The .cer file is uploaded to the Azure portal so that we can use the Azure API directly. To do that, log on to Windows Azure with your account, go to SETTINGS -> MANAGEMENT CERTIFICATES in the portal and press the UPLOAD button at the bottom. Then all you have to do is select the .cer file you created previously.

The next step is to create the keystore that will be used by ElasticSearch. In order to do that you do the following:
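A minimal sketch of the keystore creation, again following the approach documented for the plugin [1]. The input file names and the KEYSTORE_PASSWORD variable are assumptions for illustration; the output name azurekeystore.pkcs12 is the one used in this post. The password is passed with -passout so the export does not stop to prompt for it.

```shell
# Sketch only: input file names and KEYSTORE_PASSWORD are illustrative assumptions.
KEYSTORE_PASSWORD="${KEYSTORE_PASSWORD:-change-me}"

# The private key and the .cer file from the key-generation step are the inputs.
# A throwaway pair is created only if they are missing, so the sketch runs on its own.
if [ ! -f azure-private.key ] || [ ! -f azure-certificate.cer ]; then
  openssl req -x509 -nodes -days 365 -newkey rsa:2048 -subj "/CN=azure-es-demo" \
    -keyout azure-private.key -out azure-certificate.pem
  openssl x509 -outform der -in azure-certificate.pem -out azure-certificate.cer
fi

# Convert the private key to unencrypted PKCS#8 PEM.
openssl pkcs8 -topk8 -nocrypt -in azure-private.key -inform PEM \
  -out azure-pk.pem -outform PEM

# Convert the DER certificate back to PEM.
openssl x509 -inform der -in azure-certificate.cer -out azure-cert.pem

# Bundle certificate and key, then export them as a password-protected PKCS#12 keystore.
cat azure-cert.pem azure-pk.pem > azure.pem.txt
openssl pkcs12 -export -in azure.pem.txt -out azurekeystore.pkcs12 \
  -name azure -passout "pass:${KEYSTORE_PASSWORD}"
```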

Things to remember regarding the keystore:

  • You HAVE to provide a password and remember it as it will be used by ElasticSearch
  • You need a copy of the keystore (azurekeystore.pkcs12) in each of your nodes.

We are now done with everything related to Windows Azure. What is left is to install and configure ElasticSearch so that it forms a cluster out of the VMs we have created in the same availability set. Installing ElasticSearch is quite trivial: you just go here, select the version and the packaging you prefer, then download and install it. What is important, though, is to install the plugin for Windows Azure, which can easily be done using the following command:
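A minimal sketch of the plugin installation, assuming an ElasticSearch 1.x layout where the bin/plugin script is available. ES_HOME and PLUGIN_VERSION are hypothetical variables introduced here, and the default version 2.3.0 is an assumption for ElasticSearch 1.2; check the compatibility table in the plugin documentation [1] before using it.

```shell
# Sketch only: ES_HOME and PLUGIN_VERSION are illustrative assumptions.
# 2.3.0 is assumed to match ElasticSearch 1.2; verify against the plugin's version table.
ES_HOME="${ES_HOME:-/usr/share/elasticsearch}"
PLUGIN_VERSION="${PLUGIN_VERSION:-2.3.0}"
PLUGIN="elasticsearch/elasticsearch-cloud-azure/${PLUGIN_VERSION}"

if [ -x "${ES_HOME}/bin/plugin" ]; then
  # Downloads and installs the Azure cloud plugin into this ElasticSearch installation.
  "${ES_HOME}/bin/plugin" -install "${PLUGIN}"
else
  echo "No plugin script under ${ES_HOME}; would run: bin/plugin -install ${PLUGIN}"
fi
```

The same command has to be run on every node of the cluster.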

Keep in mind the following:

  1. You need to have Java installed, so if your VM doesn’t have it, install a JDK
  2. Use the correct version of the plugin

Now the only thing left is to configure ElasticSearch properly to use the Azure plugin. To do that, you need to put the following in your elasticsearch.yml file:
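A minimal sketch of that configuration, written as a shell snippet that appends the settings to the file. The setting names are the ones documented for the 2.x versions of the Azure plugin [1]; every value is a placeholder to replace with your own, and ES_CONFIG is a hypothetical variable pointing at your elasticsearch.yml.

```shell
# Sketch only: ES_CONFIG and all values below are placeholders, not real credentials.
ES_CONFIG="${ES_CONFIG:-elasticsearch.yml}"

# Append the Azure discovery settings to the ElasticSearch configuration file.
cat >> "${ES_CONFIG}" <<'EOF'
cloud:
    azure:
        keystore: /path/to/azurekeystore.pkcs12
        password: your_keystore_password
        subscription_id: your_azure_subscription_id
        service_name: your_azure_cloud_service_name
discovery:
    type: azure
EOF
```

The keystore path and password are the ones from the keystore step, and the file has to be updated on every node.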

Run ElasticSearch on both of your nodes and check the logs for messages showing that the nodes have discovered each other and joined the same cluster.

Success 🙂

So now, in addition to your MariaDB, you also have ElasticSearch running as a cluster on your availability set. If one of the nodes goes down the cluster will continue working and, depending on your configuration, it should rebalance its shards across the nodes as soon as the node joins the cluster again.

In case you missed the guide on how to set up MariaDB on this environment, you can find it here.

Links

[1] https://github.com/elasticsearch/elasticsearch-cloud-azure/tree/es-1.2 the elasticsearch azure plugin which has some excellent documentation.

[2] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery.html plugins for other IaaS providers, e.g. EC2

[3] http://azure.microsoft.com/en-us/documentation/articles/linux-use-ssh-key/ how to properly create ssh keys for your Azure VMs.

 
