Managing data

Deleting indices

Delete an index called pages:

DELETE /pages

Adding indices

To add an index called products:

PUT /products

We can also pass a JSON object that defines the number of shards and replicas:

PUT /products
{
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 2
    }
}
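
To make the two settings concrete: number_of_replicas is counted per primary shard, so the total number of shard copies is the number of primaries times one plus the number of replicas. A minimal Python sketch of that arithmetic (the function name is made up for illustration):

```python
def total_shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Primary shards plus one replica copy of each primary per replica."""
    return number_of_shards * (1 + number_of_replicas)


# Settings from the request above: 2 primaries, each with 2 replicas.
settings = {"number_of_shards": 2, "number_of_replicas": 2}
print(total_shard_copies(**settings))  # 6
```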

Indexing documents

POST /products/_doc
{
    "name": "Coffee Maker",
    "price": 64,
    "in_stock": 10
}

Retrieving docs by id

Here the document ID is 100:

GET /products/_doc/100

Retrieving all docs in an index

Here we need to use the search endpoint:

GET /products/_search
{
    "query": {
        "match_all": {}
    }
}

Updating docs

POST /products/_update/100
{
    "doc": {
        "in_stock": 3
    }
}

Adding new fields to docs

Here we use the update API. Note that this API does not change the document in place: it retrieves the current document, applies the changes, and replaces the stored document with the result.

POST /products/_update/100
{
    "doc": {
        "tags": ["electronics"]
    }
}

Documents are immutable in Elasticsearch, meaning that an update replaces the old document with a newly indexed one.

Scripted updates

Reduce the stock count by 1. ctx is a special variable available in update scripts; ctx._source gives access to the source of the document we want to update.

POST /products/_update/ig5mHpgBorYWjN5fK4m0
{
   "script": {
        "source": "ctx._source.in_stock--"
   }
}

Set the stock value directly:

POST /products/_update/ig5mHpgBorYWjN5fK4m0
{
   "script": {
        "source": "ctx._source.in_stock = 11"
   }
}

Using params in scripted updates

We can also pass params that can be used in our updates.

For example, if a customer purchases 4 products, we can pass that value in as a param and use it in the script to reduce the stock by 4:

POST /products/_update/ig5mHpgBorYWjN5fK4m0
{
    "script": {
        "source": "ctx._source.in_stock -= params.quantity",
        "params": {
            "quantity": 4
        }
    }
}

You can add conditional logic to the script source using a multiline script, e.g. to skip the update if the stock count is already 0:

if (ctx._source.in_stock == 0) {
    ctx.op = 'noop';
}

noop means 'no operation' and will skip the update.
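
As a rough sketch of how the conditional script and the params fit into a single request body, the Python below builds the JSON for the _update endpoint. It only builds and prints the body and does not talk to a cluster. The else branch that decrements the stock and the function name build_update_body are assumptions added for illustration. Note that the script sets ctx.op by assignment (=), not comparison.

```python
import json

# Painless script: skip the update when stock is 0, otherwise
# reduce the stock by the quantity passed in params.
# The else branch is an assumption added for this sketch.
SCRIPT = """
if (ctx._source.in_stock == 0) {
    ctx.op = 'noop';
} else {
    ctx._source.in_stock -= params.quantity;
}
"""


def build_update_body(quantity: int) -> str:
    """Return the JSON body for POST /products/_update/<id>."""
    body = {
        "script": {
            "source": SCRIPT.strip(),
            "params": {"quantity": quantity},
        }
    }
    return json.dumps(body, indent=4)


print(build_update_body(4))
```

json.dumps escapes the newlines in the script, so the multiline source stays valid JSON.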

Upsert documents

Upserting means to either update or insert a document, depending on whether it already exists. If it exists, the script is run; if not, the upsert doc is indexed.

It still uses the _update API:

POST /products/_update/101
{
    "script": {
        "source": "ctx._source.in_stock++"
    },
    "upsert": {
        "name": "Blender",
        "price": 399,
        "in_stock": 5
    }
}

This example updates the stock of document 101 if it already exists, otherwise it creates a new document with the upsert body.
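
The upsert behaviour can be modelled locally with a plain dictionary. This is only an illustration of the semantics, not how Elasticsearch implements it; the store, the upsert function and increment_stock are hypothetical names for this sketch.

```python
def upsert(store: dict, doc_id: str, script, upsert_doc: dict) -> str:
    """Run the script if the doc exists, otherwise index the upsert doc."""
    if doc_id in store:
        script(store[doc_id])  # existing doc: apply the script
        return "updated"
    store[doc_id] = dict(upsert_doc)  # missing doc: insert as-is, script not run
    return "created"


def increment_stock(source: dict) -> None:
    # Stand-in for the Painless script "ctx._source.in_stock++"
    source["in_stock"] += 1


products = {}
blender = {"name": "Blender", "price": 399, "in_stock": 5}

first = upsert(products, "101", increment_stock, blender)
second = upsert(products, "101", increment_stock, blender)
print(first, second, products["101"]["in_stock"])  # created updated 6
```

Running the same request twice first creates the doc with in_stock 5, then increments it to 6.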

Replacing documents

PUT /products/_doc/101
{
    "name": "Toaster",
    "price": 500
}

Deleting documents

DELETE /products/_doc/101

Updating multiple docs by a query

Similar to an SQL UPDATE with a WHERE clause, i.e. find docs that match a condition and then update them.

We use the update_by_query endpoint with an attached query clause:

POST /products/_update_by_query
{
    "script": {
        "source": "ctx._source.in_stock--"
    },
    "query": {
        "match_all": {}
    }
}

When ES runs an update by query, it takes a snapshot of the index when the request starts and uses it to detect docs that change in the meantime. ES also has a built-in retry mechanism for rejected search or bulk requests. If an error occurs, the request returns, but the updates that have already been made remain; they are not rolled back.

As an update by query can take some time, docs may be changed by other requests while it runs. By default, the request aborts with a version conflict when that happens.

You can override this with "conflicts": "proceed", which counts the conflicts and carries on instead of aborting.
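
A simplified Python model of the conflict handling, to show the difference between aborting and proceeding. It assumes each doc carries a version number and that the snapshot records the versions seen at the start; real Elasticsearch works in batches and its response has more fields, so treat this as a sketch only. All function names here are hypothetical.

```python
def update_by_query(store, snapshot, script, conflicts="abort"):
    """Apply script to each doc whose version still matches the snapshot."""
    updated = 0
    version_conflicts = 0
    for doc_id, seen_version in snapshot.items():
        doc = store[doc_id]
        if doc["version"] != seen_version:
            # The doc changed after the snapshot was taken.
            version_conflicts += 1
            if conflicts == "abort":
                break  # stop here; earlier updates are NOT rolled back
            continue  # "proceed": count the conflict and carry on
        script(doc["source"])
        doc["version"] += 1
        updated += 1
    return {"updated": updated, "version_conflicts": version_conflicts}


def make_store():
    return {str(i): {"version": 1, "source": {"in_stock": 10}} for i in (1, 2, 3)}


def decrement_stock(source):
    source["in_stock"] -= 1


def run(conflicts):
    store = make_store()
    snapshot = {doc_id: doc["version"] for doc_id, doc in store.items()}
    store["2"]["version"] += 1  # another request changes doc 2 after the snapshot
    return update_by_query(store, snapshot, decrement_stock, conflicts), store


print(run("abort")[0])    # {'updated': 1, 'version_conflicts': 1}
print(run("proceed")[0])  # {'updated': 2, 'version_conflicts': 1}
```

With "abort", doc 1 keeps its update even though the request stopped at doc 2; with "proceed", doc 3 is updated as well.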


Deleting multiple docs by a query

POST /products/_delete_by_query
{
    "query": {
        "match_all": {}
    }
}

Bulk CURL request

curl --cacert ~/elasticstack/elasticsearch/config/certs/http_ca.crt -u elastic -H "Content-Type: application/x-ndjson" -XPOST https://localhost:9200/products/_bulk --data-binary "@products-bulk.json"

Index vs create actions:

  • The create action fails if a doc with the same ID already exists
  • The index action adds the doc if it doesn't exist, otherwise it replaces the existing doc
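
The contents of products-bulk.json are not shown in these notes. As a sketch of the NDJSON format the _bulk endpoint expects, the Python below builds a payload with one action line followed by one source line per doc, ending with a newline. The IDs and documents are made up for illustration, and build_bulk_payload is a hypothetical helper.

```python
import json


def build_bulk_payload(actions) -> str:
    """Build an NDJSON bulk body from (action, doc_id, source) tuples."""
    lines = []
    for action, doc_id, source in actions:
        # Action line; the index name comes from the URL (/products/_bulk).
        lines.append(json.dumps({action: {"_id": doc_id}}))
        # Source line with the document itself.
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical example docs: one index action, one create action.
payload = build_bulk_payload([
    ("index", "200", {"name": "Espresso Machine", "price": 199, "in_stock": 5}),
    ("create", "201", {"name": "Milk Frother", "price": 149, "in_stock": 14}),
])
print(payload, end="")
```

Writing this string to a file gives something that can be sent with --data-binary as in the curl command above.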