Open Distro is an open-source, enterprise-grade Elasticsearch distribution with many advanced features, including robust security, alerting, integrated event monitoring, performance analysis, and more. It also offers developers who are already familiar with SQL an alternative way to hit the ground running writing Elasticsearch queries.
Open Distro is an Apache 2.0-licensed extension of the open-source version of Elasticsearch (ES) and Kibana. It is currently developed and maintained by AWS as part of its open-source effort, born of the need to keep new and advanced ES features available as truly open-source software.
Open Distro allows you to interact with a cluster using SQL and to perform operations for which you would normally have to use the Elasticsearch domain-specific language (DSL). This feature was initially based on the elasticsearch-sql plugin, which is now deprecated and being gradually phased out.
In this tutorial, we are going to focus on how to query Elasticsearch data using SQL with the help of this open-source tool.
To easily follow along with this tutorial, it is advisable to have basic knowledge of Elasticsearch and of writing simple ES queries with the DSL. This is important so that we have a reference point when we write similar queries with SQL.
However, if you are already familiar with basic SQL commands, then this should not be too difficult to follow. For a primer on working with Elasticsearch, we can check out this awesome tutorial on Understanding Elasticsearch query builder in Node.js.
Since we are interested in Open Distro and its SQL feature, we are going to install the Open Distro SQL plugin as a standalone plugin on our machine, the same way we would install and use any other ES plugin on a compatible cluster.
Note: Open Distro for ES and Kibana is also available as a Docker image, among other installation options. More details in the next section.
In order to get started with Open Distro for Elasticsearch, let's begin by installing and configuring it for development. It's important to note, though, that there are a variety of options available for installing and getting started with Open Distro.
As we mentioned earlier, we are going to be installing the standalone plugin and developing on our machine. Note that if we already use AWS and want this feature, we can make use of the cloud-hosted version offered as a service.
Note: For more information on other available options for installing and configuring this tool, we can check this section on installation and configuration in the documentation.
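For instance, if we would rather use containers, the docs describe a Docker image that bundles the full distribution. A minimal single-node sketch might look like the below (the image name is as published on Docker Hub; the 1.1.0 tag is an assumption chosen to match the plugin version we use later):

# run a single-node Open Distro for Elasticsearch cluster in Docker
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" amazon/opendistro-for-elasticsearch:1.1.0

Note that this ships the whole distribution (security, alerting, SQL, and so on), whereas in this tutorial we install only the standalone SQL plugin.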
Let’s make sure we have a compatible ES version on our machine. Details about plugin compatibility with ES versions can be found in the documentation.
Here, we will be using Elasticsearch version 7.1.1 and Open Distro SQL plugin version 1.1.0.0, since the SQL feature is all we are interested in for now. To install the plugin, we can run the following command in our terminal/command prompt:
sudo bin/elasticsearch-plugin install https://d3g5vo6xdbdb9a.cloudfront.net/downloads/elasticsearch-plugins/opendistro-sql/opendistro_sql-1.1.0.0.zip
Before running this command, we should ensure that we have Elasticsearch version 7.1.1 installed on our machine. For macOS, we can run the following command to download the gzipped archive:
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.1.1-darwin-x86_64.tar.gz
After the download is complete, we can run this command to extract the archive:
tar -xzvf elasticsearch-7.1.1-darwin-x86_64.tar.gz
Then we can navigate into the folder to test our installation. We can run the commands below:
cd elasticsearch-7.1.1   # navigate into the extracted Elasticsearch distribution
./bin/elasticsearch      # test the installation by starting Elasticsearch
ls                       # list all files and folders inside the distribution
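With Elasticsearch running, we can quickly confirm that the cluster is reachable and that we are on the expected version before installing the plugin. A minimal check (the version.number field in the response should read 7.1.1):

# ping the local cluster; the JSON response includes the version number
curl -X GET "http://localhost:9200/?pretty"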
The output after the plugin installation should be similar to the below:
-> Downloading https://d3g5vo6xdbdb9a.cloudfront.net/downloads/elasticsearch-plugins/opendistro-sql/opendistro_sql-1.1.0.0.zip
[=================================================] 100%
-> Installed opendistro_sql
Note: We can confirm our plugin installation is successful by running this command in the ES folder:
sudo bin/elasticsearch-plugin list
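If the installation succeeded, the listing should include the SQL plugin's name, along these lines:

opendistro_sql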
Now that we are done setting up open_distro_sql for development, let's look at how we can ingest data into our ES cluster with a simple Node.js script. After that, we can start using SQL statements to write our queries.
From our earlier tutorial on working with Elasticsearch, we can take a look at the section on how to ingest data into our ES cluster with Node.js. For reference, here is a similar script that does the same thing:
'use strict'

const fs = require('fs'); // Node.js file system module
require('dotenv').config(); // for environment variables
const { Client } = require('@elastic/elasticsearch'); // ES client
const client = new Client({ node: 'http://localhost:9200' }); // ES client connection

// import JSON from a file to ingest into our cluster
const data = JSON.parse(fs.readFileSync(__dirname + '/demo.json'));
const index = process.env.ES_INDEX; // ES index

const writeDataToES = async (index, data) => {
  for (let i = 0; i < data.length; i++) {
    try {
      const res = await client.create({
        refresh: true,
        index: index,
        id: i,
        body: data[i]
      });
      console.log("Successfully imported data", res);
    } catch (error) {
      console.error("Failed to import data to ES", error);
      return;
    }
  }
};

writeDataToES(index, data);

// to run the script, create a new index.js file and copy the above code snippet
// then run <node index.js> from the terminal
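Since the script reads the index name from the environment via dotenv, we also need a .env file next to index.js. A minimal sketch (cars is the index name assumed by the queries later in this tutorial):

# .env file, loaded by require('dotenv').config()
ES_INDEX=cars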
Note that we can give the ES index any name we choose. Also, make sure we have installed all the imported dependencies for the script to run successfully.
After running the script with the above command, we can confirm that the data has been successfully written to the cluster by visiting this URL in the browser or querying it with curl:
curl -X GET "http://127.0.0.1:9200/index_name/_search?pretty"
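To double-check how many documents made it in, we can also hit the standard _count API (a quick sketch, assuming the index is named cars):

# count the documents in the cars index
curl -X GET "http://127.0.0.1:9200/cars/_count?pretty"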
Note: In this tutorial, our queries will be run against this same JSON data, available here.
As reviewed earlier, Open Distro for Elasticsearch SQL lets us write queries in SQL rather than the DSL. In order to use this feature, all we need to do is send our requests to the http://<host:port>/_opendistro/_sql URI or endpoint.
While calls to this endpoint support sending the query either as a request parameter or in the request body, the latter is usually the recommended approach for making calls to this URI.
Note: POST requests are the recommended approach because there is no limitation on content length. Also, other parameters can be passed to the plugin when making a call for extra functionality, like prepared statements, for example.
Additionally, the explain endpoint is often used for query translation and troubleshooting. The endpoints include:
GET – here, we can send regular HTTP GET requests with our query embedded in a URL parameter (curl's --data-urlencode flag handles the URL encoding of the query for us). An example is shown below:
curl -H 'Content-Type: application/json' -G "localhost:9200/_opendistro/_sql" --data-urlencode "sql=SELECT * FROM cars"
POST – we can also send an HTTP POST request with our query in the request body, as shown below:
curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql -d '{"query" : "SELECT * FROM cars"}'
explain – to translate a query, we can make use of the explain endpoint. The output is usually the ES DSL in JSON format.
Note: We can also copy and paste the output into the console to run it against our ES cluster. Let's see an example below:
curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql/_explain -d '{"query" : "SELECT Origin,Horsepower FROM cars WHERE Cylinders > 4 LIMIT 5"}'
The result:
{ "from": 0, "size": 5, "query": { "bool": { "filter": [{ "bool": { "must": [{ "range": { "Cylinders": { "from": 4, "to": null, "include_lower": false, "include_upper": true, "boost": 1.0 } } }], "adjust_pure_negative": true, "boost": 1.0 } }], "adjust_pure_negative": true, "boost": 1.0 } }, "_source": { "includes": ["Origin", "Horsepower"], "excludes": [] } }
As an aside, note that the features or commands supported by native SQL queries are also supported by Open Distro for SQL. They include SQL Select, Delete, Where, Order By, Group By, Inner Join, Show, and so on.
For example, using the WHERE conditional statement is as easy as this:
curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql -d '{"query": "SELECT * FROM cars WHERE Acceleration >= 20 LIMIT 5"}'
Note: The WHERE clause indicates we want to filter our data to include only rows where the specified or given conditions are true.
Here’s the result of the query above:
{ "took": 3, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 8, "relation": "eq" }, "max_score": 0.0, "hits": [{ "_index": "cars", "_type": "_doc", "_id": "39", "_score": 0.0, "_source": { "Name": "volkswagen super beetle 117", "Miles_per_Gallon": null, "Cylinders": 4, "Displacement": 97, "Horsepower": 48, "Weight_in_lbs": 1978, "Acceleration": 20, "Year": "1971-01-01", "Origin": "Europe" } }, { "_index": "cars", "_type": "_doc", "_id": "382", "_score": 0.0, "_source": { "Name": "amc concord dl", "Miles_per_Gallon": 23, "Cylinders": 4, "Displacement": 151, "Horsepower": null, "Weight_in_lbs": 3035, "Acceleration": 20.5, "Year": "1982-01-01", "Origin": "USA" } }, { "_index": "cars", "_type": "_doc", "_id": "63", "_score": 0.0, "_source": { "Name": "plymouth cricket", "Miles_per_Gallon": 26, "Cylinders": 4, "Displacement": 91, "Horsepower": 70, "Weight_in_lbs": 1955, "Acceleration": 20.5, "Year": "1971-01-01", "Origin": "USA" } }, { "_index": "cars", "_type": "_doc", "_id": "359", "_score": 0.0, "_source": { "Name": "ford escort 2h", "Miles_per_Gallon": 29.9, "Cylinders": 4, "Displacement": 98, "Horsepower": 65, "Weight_in_lbs": 2380, "Acceleration": 20.7, "Year": "1982-01-01", "Origin": "USA" } }, { "_index": "cars", "_type": "_doc", "_id": "25", "_score": 0.0, "_source": { "Name": "volkswagen 1131 deluxe sedan", "Miles_per_Gallon": 26, "Cylinders": 4, "Displacement": 97, "Horsepower": 46, "Weight_in_lbs": 1835, "Acceleration": 20.5, "Year": "1970-01-01", "Origin": "Europe" } }] } }
Note: In this tutorial, we make use of the curl command line to make all our requests. We can install curl on macOS with Homebrew by running brew install curl.
Just as with the DSL, we can also make use of the SQL feature for aggregations, queries, joins, show, and many more.
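For instance, a simple aggregation that groups the cars by their number of cylinders and counts each bucket could be written like this (a sketch in the same spirit as the samples in the GitHub repo):

# group rows by the Cylinders column and count the cars in each group
curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql -d '{"query" : "SELECT Cylinders, COUNT(Name) FROM cars GROUP BY Cylinders"}'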
WHERE – this clause filters our dataset to return only rows where a given conditional statement is true. A sample query and result for the WHERE clause can be found here on the GitHub repo.

ORDER BY – this clause allows us to sort our data by a particular column (in this case, we are using the _id column) numerically or alphabetically. Here, we are doing so numerically and in ascending order. A sample query and result for the ORDER BY clause can be found here on the GitHub repo.

GROUP BY – this clause is used with aggregate functions, e.g., the range function, so as to gather rows with similar attributes into a single bucket (as in the sketch above). A sample query and result for the GROUP BY clause can be found here on the GitHub repo.

COUNT() – as we can see from the sample query on GitHub, the COUNT() function takes the name of a column (in this case, Miles_per_Gallon) as an argument and counts the number of rows where it is not NULL.

MIN() – the MIN() function takes the name of a column as an argument (Horsepower, in our sample query) and returns the smallest value in that column.

SHOW – a command that returns the indices and mappings that match the pattern of an index name. Note that we can use either * or % as wildcards. A sample query and result for the SHOW command can be found here on GitHub.

AVG() – this is an aggregate function that returns the average value for a numeric column. In our sample query, that is Weight_in_lbs.

DATE_FORMAT() – the DATE_FORMAT function takes a timestamp column as an argument, along with the format we intend to apply, and returns the formatted response for that column. A sample query and result for the DATE_FORMAT function can be found in the GitHub repo here.

FLOOR() – this function rounds a value in a specified numeric column down to the nearest whole number. Note that it can be combined with other aggregate functions. In our sample query and result, we combine it with AS.

BETWEEN – this is used to filter out data within a certain range. Note that the conditions can be numbers, text, or dates. In our sample query, we are making use of numbers.

COUNT(DISTINCT …) – this is used to select only unique values of a column. A sample query and result for this command can be found in the GitHub repo.

IN – this clause returns results based on multiple values in a common column. A sample query and result for the IN clause can be found in the GitHub repo here.

As for the protocol, the SQL plugin provides multiple response formats for different purposes, while the request format is the same for all. Among them, the JDBC format is widely used because it provides schema information and additional functionality, such as pagination. Besides the JDBC driver, various clients can benefit from the detailed and well-formatted response.
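To pick a response format, the plugin accepts a format parameter on the same endpoint. For example, requesting the JDBC-style response might look like this (the format=jdbc parameter is per the plugin's protocol documentation; treat the exact shape of the response as version-dependent):

# ask the SQL plugin for the JDBC response format (schema plus datarows)
curl -H 'Content-Type: application/json' -X POST "localhost:9200/_opendistro/_sql?format=jdbc" -d '{"query" : "SELECT Origin, Horsepower FROM cars LIMIT 5"}'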
For example, the body of an HTTP POST request can take a few other fields along with our SQL query:
curl -H 'Content-Type: application/json' -X POST localhost:9200/_opendistro/_sql -d '{"query" : "SELECT Origin, Horsepower, Miles_per_Gallon FROM cars LIMIT 5","filter" : {"range" :{"Weight_in_lbs" : {"lt" : 4000}}}}'
The result:
{ "took": 6, "timed_out": false, "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 }, "hits": { "total": { "value": 339, "relation": "eq" }, "max_score": 0.0, "hits": [{ "_index": "cars", "_type": "_doc", "_id": "182", "_score": 0.0, "_source": { "Origin": "Europe", "Horsepower": 70, "Miles_per_Gallon": 29 } }, { "_index": "cars", "_type": "_doc", "_id": "128", "_score": 0.0, "_source": { "Origin": "USA", "Horsepower": 150, "Miles_per_Gallon": 15 } }, { "_index": "cars", "_type": "_doc", "_id": "156", "_score": 0.0, "_source": { "Origin": "Japan", "Horsepower": 97, "Miles_per_Gallon": 24 } }, { "_index": "cars", "_type": "_doc", "_id": "170", "_score": 0.0, "_source": { "Origin": "USA", "Horsepower": 95, "Miles_per_Gallon": 18 } }, { "_index": "cars", "_type": "_doc", "_id": "57", "_score": 0.0, "_source": { "Origin": "Europe", "Horsepower": 90, "Miles_per_Gallon": 28 } }] } }
Note: We can make use of the filter field to add more conditions to the ES DSL directly. A sample query and result for this extra field feature can be found in the GitHub repo here.
When Elasticsearch bootstraps, the SQL plugin registers a few settings in the ES cluster settings. Most of these settings can be changed dynamically, which gives us some control over the behavior of our installed plugin without the need to touch our cluster configurations.
They include:
1.) opendistro.sql.enabled – we can enable or disable the SQL plugin; when disabled, it rejects all incoming requests.
We can update the setting with a new value like this:
curl -H 'Content-Type: application/json' -X PUT localhost:9200/_cluster/settings -d '{"transient" : {"opendistro.sql.enabled" : false}}'
Here’s the result:
{ "acknowledged": true, "persistent": {}, "transient": { "opendistro": { "sql": { "enabled": "false" } } } }
2.) opendistro.sql.query.analysis.enabled – we can also enable or disable the query analyzer if we want to bypass strict syntactic and semantic analysis.
We can update this setting with a new value like this:
curl -H 'Content-Type: application/json' -X PUT localhost:9200/_cluster/settings -d '{"transient" : {"opendistro.sql.query.analysis.enabled" : false}}'
Result:
{ "acknowledged": true, "persistent": {}, "transient": { "opendistro": { "sql": { "query": { "analysis": { "enabled": "false" } } } } } }
By making use of the /stats endpoint, we can collect metrics for our plugin within a particular time duration. Note that only node-level statistics are currently supported; in essence, we only get the metrics for the node we can access. Let's see an example below:
curl -H 'Content-Type: application/json' -X GET localhost:9200/_opendistro/_sql/stats
Result:
{ "failed_request_count_cb": 0, "failed_request_count_cuserr": 0, "circuit_breaker": 0, "request_total": 12, "request_count": 0, "failed_request_count_syserr": 0 }
From our earlier tutorial, we have learned that Elasticsearch is a fully distributed search and analytics engine. It’s also schemaless since it offers support for both structured and unstructured data. With Open Distro, it is now possible for users who are already comfortable with SQL to interact with an Elasticsearch cluster and to perform similar operations as they would with the DSL.
In this tutorial, we have mainly covered the SQL feature, which affords an easy way to integrate with other SQL-compliant systems. But this is just the tip of the iceberg; to learn about other capabilities of Open Distro for ES not covered in this tutorial, do not hesitate to check out the awesome documentation.
Finally, the repository for this tutorial is available here on GitHub.