In my second week as a Software Developer & Data Scientist, I met Elasticsearch. Elasticsearch is an open search and analytics engine that answers queries in near real time. It provides features such as autocomplete, geolocation-based filters, and multilevel aggregations.
Our company scrapes the web 🐱💻. Scraping unstructured data, such as any web page, and transforming it into structured data therefore seems to fit Elasticsearch and our work perfectly.
Basic Concepts
The structure of Elasticsearch can be compared to the structure of a SQL database. The next table lists the equivalent terms.
Document
Do you remember that our company scrapes the web? Elasticsearch stores data as JSON documents. JSON is very flexible and easy for humans to read, so Elasticsearch supports our work: we grab unstructured data from websites and store it as JSON. The equivalent of a row (SQL) in Elasticsearch is a document (Figure 1).
In Figure 2, we created a “Users” type that shows us the documents (rows): Luke, Petra, 20, 21, male, and so on. In Figure 2, therefore, our elements are:
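To make the step from an unstructured web page to a structured document concrete, here is a minimal sketch using only Python's standard library. The `TitleParser` class, the `page_to_document` helper, the URL, and the HTML string are all hypothetical and exist only for illustration; a real scraper would extract far more than the page title.

```python
import json
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self._in_title:
            self.title += data


def page_to_document(url, html):
    """Turn raw HTML into a small structured document (a plain dict)."""
    parser = TitleParser()
    parser.feed(html)
    return {"url": url, "title": parser.title.strip()}


# Hypothetical page content, used only for illustration.
html = "<html><head><title> Futbol-ball </title></head><body>...</body></html>"
document = page_to_document("https://example.com/ball", html)
print(json.dumps(document))
```

The resulting dictionary serializes directly to JSON, which is the shape Elasticsearch stores.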
Type = USERS.
Document = Luke, Petra, 20, 21, 1, 2, Male, Female, etc.
Field = ID, Name, Age, Gender, Email.
It is simply a new way of naming our data. A document in JSON format would look like this:
{
"id": 1,
"name": "Luke",
"gender": "M",
"email": "luke@gmail.com"
}
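The row-to-document mapping can be sketched in a few lines of Python. This is only an illustration: the pairing of values into rows (Luke with 1, 20, Male and Petra with 2, 21, Female) is an assumption read off the element list, and the lowercase field names are chosen to match the JSON example.

```python
import json

# Field names, as listed for the "Users" type.
fields = ["id", "name", "age", "gender"]

# Rows as a SQL table would hold them. The pairing of values is an assumption.
rows = [
    (1, "Luke", 20, "Male"),
    (2, "Petra", 21, "Female"),
]

# Each row becomes one document: a mapping from field name to value.
documents = [dict(zip(fields, row)) for row in rows]

for doc in documents:
    print(json.dumps(doc))
```

Each printed line is one document, the Elasticsearch counterpart of one SQL row.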
Index
An index in Elasticsearch is the equivalent of a database in SQL. Do not confuse it with a database index. Our data is stored in Elasticsearch indexes much as you store data in databases. An Elasticsearch index name must be lowercase and unique.
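The two naming rules can be expressed as a small check. The sketch below is a hypothetical helper, not part of any Elasticsearch client, and it covers only the lowercase and uniqueness rules mentioned here; a real cluster enforces further restrictions on index names.

```python
def can_create_index(name, existing_names):
    """Return True if `name` is lowercase and not already taken.

    Covers only the two rules named in the text; a real cluster
    enforces further naming restrictions that are not modelled here.
    """
    is_lowercase = name == name.lower()
    is_unique = name not in existing_names
    return is_lowercase and is_unique


existing = {"users", "articles"}  # hypothetical index names
print(can_create_index("comments", existing))  # True
print(can_create_index("Comments", existing))  # False: contains an uppercase letter
print(can_create_index("users", existing))     # False: name already exists
```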
Type
A type is the equivalent of a database table. In Figure 1, we created a “USERS” type. Different types separate different kinds of data. An index can therefore contain more than one type, and documents of different types can relate to one another. Let's see how they look as JSON and how they correlate:
{
"articleid": 1,
"name": "Futbol-ball"
}
This document belongs to the article type. The next block shows a document of the comment type.
{
"commentid": "RxftPwUwere-rTs",
"articleid": 1,
"comment": "Best price-quality ball"
}
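The article and comment documents correlate through their shared articleid field. As a rough sketch of that relationship, the following Python resolves each comment to the name of its article in memory. The helper name `comments_with_article_name` is hypothetical, and this is plain dictionary lookup, not an Elasticsearch query.

```python
# The two example documents, one per type.
articles = [{"articleid": 1, "name": "Futbol-ball"}]
comments = [
    {
        "commentid": "RxftPwUwere-rTs",
        "articleid": 1,
        "comment": "Best price-quality ball",
    }
]

# Look-up table from articleid to the article document.
articles_by_id = {a["articleid"]: a for a in articles}


def comments_with_article_name(comments, articles_by_id):
    """Attach the article name to each comment via the shared articleid."""
    result = []
    for c in comments:
        article = articles_by_id.get(c["articleid"])
        result.append({
            "comment": c["comment"],
            "article": article["name"] if article else None,
        })
    return result


joined = comments_with_article_name(comments, articles_by_id)
print(joined)
```

A comment whose articleid matches no article gets `None` as its article name instead of raising an error.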
We have entered the beautiful world of Elasticsearch and are now experts in its terminology. With that terminology in hand, we can move on to the basics of this powerful analytical search engine. That is the topic of the next post.