Adding nodes to an ES cluster

As we learned previously, sharding increases our data capacity by splitting an index into smaller pieces (shards) that fit onto nodes.

However, this won't scale forever, and as data grows, eventually we will need to add new nodes to our cluster.

How to add new nodes

To add a new node to our cluster, we can simply extract another copy of the Elasticsearch .tar.gz archive we downloaded.

Assuming we have Kibana and an existing ES node in the directory elasticstack, add two new instances to that directory:

tar -xvf elasticsearch-9.0.3-darwin-x86_64.tar.gz

Do this twice into two folders: node-2 and node-3.
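
The two extractions can be sketched as a small helper. This is a minimal sketch, not the only way to do it: the function name extract_node is hypothetical, and it assumes the archive unpacks into a single top-level folder (which --strip-components=1 removes so the files land directly in node-2 or node-3).

```shell
# extract_node ARCHIVE TARGET_DIR
# Hypothetical helper: unpack one copy of the archive into TARGET_DIR.
extract_node() {
  archive="$1"
  target="$2"
  mkdir -p "$target"
  # Assumption: the archive has one top-level folder; strip it so that
  # bin/, config/ and so on sit directly inside TARGET_DIR.
  tar -xzf "$archive" -C "$target" --strip-components=1
}

# Usage, run from inside the elasticstack directory:
#   extract_node elasticsearch-9.0.3-darwin-x86_64.tar.gz node-2
#   extract_node elasticsearch-9.0.3-darwin-x86_64.tar.gz node-3
```

Each folder is then an independent installation with its own bin and config directories.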

We need to generate an enrollment token for these new nodes. The token is generated from an existing, running node in the cluster (usually the initial node):

bin/elasticsearch-create-enrollment-token -s node

Then, from the new node's directory, pass this token when starting the node for the first time:

bin/elasticsearch --enrollment-token <TOKEN>

The new node uses this enrollment token to automatically configure:

  • The cluster name
  • CA certificates
  • Initial connection settings
  • The bootstrapping of communication within the cluster

Outcomes

Take this scenario:

  • You have created a single ES node in a cluster
  • This has generated a number of internal indices
  • You have created a replica shard for an index

As we only have one node running, the internal indices consist only of primary shards, with no replicas. The replica shard we created is unassigned because there is no other node for it to go to. Our cluster will have a YELLOW status because of this.

Remember: a replica shard has to be on a different node from its primary shard.
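
One way to confirm this state from the command line is to query the cluster health and cat shards APIs. The sketch below assumes a local node at https://localhost:9200 with default security (user elastic, CA certificate at config/certs/http_ca.crt); the helper names health_status and count_unassigned are hypothetical.

```shell
# Read a _cluster/health JSON response on stdin and print its status value.
health_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[a-z]*"' | cut -d'"' -f4
}

# Read _cat/shards text output on stdin and print how many shards are UNASSIGNED.
count_unassigned() {
  awk '/UNASSIGNED/ { n++ } END { print n + 0 }'
}

# Usage against a running node (curl prompts for the elastic password):
#   curl -s --cacert config/certs/http_ca.crt -u elastic \
#     https://localhost:9200/_cluster/health | health_status
#   curl -s --cacert config/certs/http_ca.crt -u elastic \
#     https://localhost:9200/_cat/shards | count_unassigned
```

With a single node and one replica shard configured, the first command would be expected to print yellow and the second a non-zero count.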

When we add a new node to the cluster, three things will happen:

  1. The replica shard will be assigned to the new node
  2. Internal indices will get automatically generated replica shards, because internal indices have the setting index.auto_expand_replicas: 0-1.
  3. Cluster health will change from YELLOW to GREEN
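
The effect of index.auto_expand_replicas: 0-1 can be illustrated with a little arithmetic. The function below is a simplified model, not Elasticsearch's actual code: it assumes the replica count is the number of data nodes minus one, clamped to the configured range. The name expanded_replicas is hypothetical.

```shell
# expanded_replicas MIN MAX DATA_NODES
# Simplified model: replicas = DATA_NODES - 1, clamped to [MIN, MAX].
expanded_replicas() {
  er_min="$1"
  er_max="$2"
  er_wanted=$(( $3 - 1 ))
  if [ "$er_wanted" -lt "$er_min" ]; then er_wanted="$er_min"; fi
  if [ "$er_wanted" -gt "$er_max" ]; then er_wanted="$er_max"; fi
  echo "$er_wanted"
}

expanded_replicas 0 1 1   # one node: prints 0 (no replicas)
expanded_replicas 0 1 2   # two nodes: prints 1
expanded_replicas 0 1 3   # three nodes: prints 1 (capped at the upper bound)
```

Under this model, the replica appears as soon as the second node joins, and adding further nodes does not create more replicas for these indices.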

Adding a third node

Note that adding a third node to the cluster comes with a consideration: once you start a third node, the cluster can never run with fewer than two nodes from then on.

So if you ever want a single-node cluster again after starting a third node, this is technically not possible.

This has to do with how ES elects and uses master nodes, but we won't go into that detail here.

Elasticsearch automatically assigns shards to nodes and balances them for you.

When a node leaves

When a node leaves the cluster, it can take some time for its shards to be reassigned to the remaining nodes.

You can read more about this, including why this can take some time, here.
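
One documented reason for the wait is delayed allocation: Elasticsearch holds off on reallocating replicas for a departed node in case it returns, controlled by the index setting index.unassigned.node_left.delayed_timeout (one minute by default). As a rough sketch, the setting could be changed as below. The helper name delayed_timeout_body, the 5m value, and the local URL and credentials are assumptions for illustration.

```shell
# Build the JSON body for an index settings update; the timeout is the argument.
delayed_timeout_body() {
  printf '{"settings":{"index.unassigned.node_left.delayed_timeout":"%s"}}' "$1"
}

# Usage against a running node (assumed URL and default security setup):
#   curl -s --cacert config/certs/http_ca.crt -u elastic \
#     -X PUT "https://localhost:9200/_all/_settings" \
#     -H 'Content-Type: application/json' \
#     -d "$(delayed_timeout_body 5m)"
```

A longer timeout avoids needless shard copying when a node only restarts briefly; a shorter one restores full replication sooner.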