Nested Objects in Elasticsearch

Pramod Shehan
Jan 13, 2024


1. Indexing an array of objects without the nested type

a) Create index

First we need to create an index.

curl -X PUT http://localhost:9200/news

b) Add documents

Add a document that contains an array of objects.

curl -X POST -H 'Content-Type: application/json' -d '{
  "title": "News 1",
  "description": "This is news 01",
  "id": 1,
  "category": "Sports",
  "location": "Sri Lanka",
  "comments": [
    {
      "user": "pramod",
      "comment": "this is okay",
      "rate": 5
    },
    {
      "user": "pramod",
      "comment": "this is okay",
      "rate": 4
    },
    {
      "user": "shehan",
      "comment": "this is cool",
      "rate": 2
    }
  ]
}' http://localhost:9200/news/_doc

c) Search documents

Image 01 shows how this data is indexed in Elasticsearch. Note that the document contains a comments array of objects.

Image 01

d) Filter documents

Here we try to find documents that have a comment by the user shehan with a rate of at least 3.

curl -X GET "localhost:9200/news/_search?pretty" -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "comments.user": "shehan"
          }
        },
        {
          "range": {
            "comments.rate": {
              "gte": 3
            }
          }
        }
      ]
    }
  }
}'

One document matches these criteria.

However, according to Image 01 the result should be empty, because no single comment has both the user shehan and a rate of at least 3. The rate of the comment by shehan is 2.

e) Index mapping

curl -X GET 'http://localhost:9200/news/_mapping?pretty'

{
  "news" : {
    "mappings" : {
      "properties" : {
        "category" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "comments" : {
          "properties" : {
            "comment" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "rate" : {
              "type" : "long"
            },
            "user" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "description" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "long"
        },
        "location" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

We didn’t add any explicit field mapping, so Elasticsearch generated a dynamic mapping for us automatically. Note that the comments field is mapped as an object field: it has properties but no explicit type, and object is the default.

f) How the document is stored internally

When a JSON array is mapped as an object field, the document is flattened internally: the values for each object key are grouped together and indexed as an array.

This is the reason for the unexpected result of the previous query. The “comments.user” array contains “shehan” and the “comments.rate” array contains values of at least 3. The problem is that the ratings 5 and 4 belong to other comments, not to the comment by the user “shehan”.
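
The flattening described above can be imitated in a few lines of Python. This is only a rough sketch of the idea, not how Elasticsearch is implemented: the helper names flatten and matches_flat are made up for this illustration, and text analysis is ignored.

```python
from collections import defaultdict

# The comments array from the example document.
comments = [
    {"user": "pramod", "comment": "this is okay", "rate": 5},
    {"user": "pramod", "comment": "this is okay", "rate": 4},
    {"user": "shehan", "comment": "this is cool", "rate": 2},
]


def flatten(field, objects):
    """Group the values of each key into one list (hypothetical helper)."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def matches_flat(flat, user, min_rate):
    """Check each condition against the whole list, independently."""
    has_user = user in flat["comments.user"]
    has_rate = any(rate >= min_rate for rate in flat["comments.rate"])
    return has_user and has_rate


flat = flatten("comments", comments)
print(flat["comments.user"])            # ['pramod', 'pramod', 'shehan']
print(flat["comments.rate"])            # [5, 4, 2]
print(matches_flat(flat, "shehan", 3))  # True
```

The last line prints True even though shehan never gave a rate of 3 or more, which mirrors the unexpected search hit.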

g) Problem

  • When arrays of objects are indexed as object fields, the relationships between the values within each object are not maintained.
  • Queries can yield unexpected results.

2. Indexing an array of objects with the nested type

If you need to index arrays of objects and to maintain the independence of each object in the array, use the nested data type instead of the object data type.

a) Create index with nested data type

The news index from section 1 still exists, so delete it first (curl -X DELETE http://localhost:9200/news); otherwise this request fails because the index already exists. After creating the new index, add the document from step 1 b) again.

curl -X PUT http://localhost:9200/news -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "description": {
        "type": "text"
      },
      "id": {
        "type": "integer"
      },
      "category": {
        "type": "text"
      },
      "location": {
        "type": "text"
      },
      "comments": {
        "type": "nested",
        "properties": {
          "user": {
            "type": "text"
          },
          "comment": {
            "type": "text"
          },
          "rate": {
            "type": "integer"
          }
        }
      }
    }
  }
}'

b) Get mapping

curl -X GET 'http://localhost:9200/news/_mapping?pretty'

The comments field is now reported with "type" : "nested".

c) How these documents are indexed internally

  • Internally, each object in the array is indexed as a separate hidden Lucene document, stored alongside the root document.
  • If a nested object matches the search, the nested query returns the root parent document.
  • Each nested object can be queried independently of the others.
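
The effect of keeping each object separate can be sketched in Python. This is a simplified model under the assumption that a match means all conditions hold inside one object. The function matches_nested is a hypothetical name and does not reflect Elasticsearch internals.

```python
# The comments array from the example document.
comments = [
    {"user": "pramod", "comment": "this is okay", "rate": 5},
    {"user": "pramod", "comment": "this is okay", "rate": 4},
    {"user": "shehan", "comment": "this is cool", "rate": 2},
]


def matches_nested(objects, user, min_rate):
    """Return True only if one single object satisfies both conditions."""
    return any(
        obj["user"] == user and obj["rate"] >= min_rate for obj in objects
    )


print(matches_nested(comments, "shehan", 3))  # False
print(matches_nested(comments, "shehan", 2))  # True
```

Because both conditions are evaluated per object, the ratings 5 and 4 from the other comments can no longer satisfy the condition for shehan.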

d) Filter documents

Here we again try to find documents that have a comment by the user shehan with a rate of at least 3. This time there is no result, because the nested type maintains the independence of each object in the array.

A query for the user shehan together with a rate of 2, on the other hand, returns the document, because shehan and 2 are in the same nested object.
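
The request bodies for these two searches are not shown here, so the following Python sketch builds plausible ones with the nested query, which takes a path and an inner query. The helper nested_comment_query is a made-up name, and using a term query for the rate of 2 is an assumption. The printed JSON can be sent as the -d body of a _search request like the one in section 1.

```python
import json


def nested_comment_query(user, rate_condition):
    """Build a search body that matches user and rate in the same comment."""
    return {
        "query": {
            "nested": {
                "path": "comments",  # the field mapped as nested
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"comments.user": user}},
                            rate_condition,
                        ]
                    }
                },
            }
        }
    }


# Rate of at least 3: expected to return no hits for the example document.
at_least_3 = nested_comment_query(
    "shehan", {"range": {"comments.rate": {"gte": 3}}}
)

# Rate of exactly 2 (assumed form of the second query): expected to match.
exactly_2 = nested_comment_query("shehan", {"term": {"comments.rate": 2}})

print(json.dumps(at_least_3, indent=2))
print(json.dumps(exactly_2, indent=2))
```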

e) Problems

  • Performance hit: indexing and querying nested fields is more expensive than indexing and querying object fields.
  • Storage: a separate Lucene document is created for each object in the array.
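
To get a feel for the storage cost, the number of Lucene documents behind one indexed document can be estimated as one for the root plus one per nested object. The sketch below assumes a single level of nesting, and lucene_doc_count is a hypothetical helper, not an Elasticsearch API.

```python
def lucene_doc_count(document, nested_fields):
    """Estimate Lucene documents: one root plus one per nested object."""
    return 1 + sum(len(document.get(field, [])) for field in nested_fields)


# Trimmed version of the example document with its three comments.
news = {
    "title": "News 1",
    "comments": [{"rate": 5}, {"rate": 4}, {"rate": 2}],
}

print(lucene_doc_count(news, ["comments"]))  # 4
```

A news item with hundreds of comments therefore turns into hundreds of Lucene documents, which is where the extra indexing and storage cost comes from.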

Reference

https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html#:~:text=The%20nested%20query%20searches%20nested,returns%20the%20root%20parent%20document.
