Nested Objects in Elasticsearch
1. Index array object without nested
a) Create index
First we need to create an index.
curl -X PUT http://localhost:9200/news
b) Add documents
Add a document that contains an array of objects.
curl -X POST -H 'Content-Type: application/json' -d '{
  "title": "News 1",
  "description": "This is news 01",
  "id": 1,
  "category": "Sports",
  "location": "Sri Lanka",
  "comments": [
    {
      "user": "pramod",
      "comment": "this is okay",
      "rate": 5
    },
    {
      "user": "pramod",
      "comment": "this is okay",
      "rate": 4
    },
    {
      "user": "shehan",
      "comment": "this is cool",
      "rate": 2
    }
  ]
}' http://localhost:9200/news/_doc
c) Search documents
Image 01 shows how this data is indexed in Elasticsearch. Note that the document contains a comments array of objects.
d) Filter documents
Here we try to find documents that have a comment by user shehan with a rate of at least 3.
curl -X GET "localhost:9200/news/_search?pretty" -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "comments.user": "shehan"
          }
        },
        {
          "range": {
            "comments.rate": {
              "gte": 3
            }
          }
        }
      ]
    }
  }
}'
One document matches these criteria.
However, according to Image 01 the result should be empty, because no comment has both user shehan and a rate of at least 3; the rate on shehan's comment is 2.
e) Index mapping
curl -X GET "http://localhost:9200/news/_mapping?pretty"
{
  "news" : {
    "mappings" : {
      "properties" : {
        "category" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "comments" : {
          "properties" : {
            "comment" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            },
            "rate" : {
              "type" : "long"
            },
            "user" : {
              "type" : "text",
              "fields" : {
                "keyword" : {
                  "type" : "keyword",
                  "ignore_above" : 256
                }
              }
            }
          }
        },
        "description" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "id" : {
          "type" : "long"
        },
        "location" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
We did not define an explicit field mapping, so Elasticsearch generated a dynamic mapping for us automatically. Note that the comments field is mapped as an object field: it has "properties" but no "type", and object is the default type for such fields.
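As a rough illustration of how to read that response, the sketch below works on a trimmed, hand-copied part of the mapping above. The helper name field_type is hypothetical, not part of any Elasticsearch client. It relies only on the fact that an object field is returned without a "type" entry.

```python
# Trimmed copy of the mapping response shown above (only two fields kept).
mapping_response = {
    "news": {
        "mappings": {
            "properties": {
                "comments": {
                    "properties": {
                        "rate": {"type": "long"},
                        "user": {"type": "text"},
                    }
                },
                "id": {"type": "long"},
            }
        }
    }
}


def field_type(response, index, field):
    """Return the mapped type; a field without "type" is an object field."""
    properties = response[index]["mappings"]["properties"]
    return properties[field].get("type", "object")


print(field_type(mapping_response, "news", "comments"))  # object
print(field_type(mapping_response, "news", "id"))  # long
```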
f) How the document is stored internally
When a JSON array of objects is mapped as an object field, the document is flattened internally: the values for each object key are grouped together and indexed as an array.
This explains the unexpected result of the previous query. The "comments.user" array contains "shehan" and the "comments.rate" array contains values of at least 3, but the ratings 5 and 4 belong to other comments, not to the one by user "shehan".
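The flattening can be imitated in a few lines of Python. This is only a sketch of the idea, not Elasticsearch code: the helpers flatten_comments and flat_query_matches are hypothetical names, and exact string comparison stands in for the analyzed match query.

```python
# Illustrative sketch only: mimics how an object-mapped array is flattened.
comments = [
    {"user": "pramod", "comment": "this is okay", "rate": 5},
    {"user": "pramod", "comment": "this is okay", "rate": 4},
    {"user": "shehan", "comment": "this is cool", "rate": 2},
]


def flatten_comments(items):
    """Group the values of each key into one multi-value field."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(f"comments.{key}", []).append(value)
    return flat


def flat_query_matches(flat, user, min_rate):
    """Each clause is checked against its own field, independently."""
    user_ok = user in flat["comments.user"]
    rate_ok = any(rate >= min_rate for rate in flat["comments.rate"])
    return user_ok and rate_ok


flat = flatten_comments(comments)
print(flat["comments.user"])  # ['pramod', 'pramod', 'shehan']
print(flat["comments.rate"])  # [5, 4, 2]
print(flat_query_matches(flat, "shehan", 3))  # True
```

The last line prints True even though no single comment satisfies both conditions, which is the false match seen in step d).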
g) Problem
- When arrays of objects are indexed as object fields, the relationships between values within each object are not maintained.
- Queries can therefore return unexpected results.
2. Index array object with nested
If you need to index arrays of objects and to maintain the independence of each object in the array, use the nested data type instead of the object data type.
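a) Create index with nested data type
The news index from section 1 already exists, and the mapping of an existing field cannot be changed from object to nested. Delete the index first (curl -X DELETE http://localhost:9200/news) or use a different index name, then add the document again as in step 1 b) once the index has been created.
curl -X PUT http://localhost:9200/news -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "description": {
        "type": "text"
      },
      "id": {
        "type": "integer"
      },
      "category": {
        "type": "text"
      },
      "location": {
        "type": "text"
      },
      "comments": {
        "type": "nested",
        "properties": {
          "user": {
            "type": "text"
          },
          "comment": {
            "type": "text"
          },
          "rate": {
            "type": "integer"
          }
        }
      }
    }
  }
}'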
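b) Get mapping
Fetch the mapping in the same way as in section 1. The comments field should now be reported with "type" : "nested".
curl -X GET "http://localhost:9200/news/_mapping?pretty"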
c) How these documents are indexed internally
- Internally, each object in a nested array is indexed as a separate hidden Lucene document, stored alongside the root document.
- If a nested object matches the search, the nested query returns the root parent document.
- The nested objects can be queried independently of each other.
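The difference from the flattened representation can be sketched by treating every comment as its own small document. This is an illustration of the matching rule only, not Elasticsearch code; nested_query_matches is a hypothetical helper name.

```python
# Illustrative sketch only: every nested object is evaluated on its own.
comments = [
    {"user": "pramod", "comment": "this is okay", "rate": 5},
    {"user": "pramod", "comment": "this is okay", "rate": 4},
    {"user": "shehan", "comment": "this is cool", "rate": 2},
]


def nested_query_matches(items, user, min_rate):
    """All clauses must hold within one and the same nested object."""
    return any(
        item["user"] == user and item["rate"] >= min_rate for item in items
    )


print(nested_query_matches(comments, "shehan", 3))  # False
print(nested_query_matches(comments, "shehan", 2))  # True
print(nested_query_matches(comments, "pramod", 3))  # True
```

When any nested object matches, the root document is the one returned, which is why a single matching comment is enough.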
d) Filter documents
Here again we try to find documents that have a comment by user shehan with a rate of at least 3, this time with a nested query. Now there is no result, because the independence of each object in the array is maintained.
A query for user shehan with a rate of 2, on the other hand, does return the document, because shehan and 2 are in the same nested object.
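The two searches described above could be expressed with the nested query, which takes a path and an inner query. The sketch below only builds the request bodies and does not contact a cluster. The helper name build_nested_comment_query is hypothetical, and using a range clause for the rate of 2 is an assumption, since the original query is not shown here.

```python
import json


def build_nested_comment_query(user, min_rate):
    """Build a nested query on the comments path (hypothetical helper)."""
    return {
        "query": {
            "nested": {
                "path": "comments",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"comments.user": user}},
                            {"range": {"comments.rate": {"gte": min_rate}}},
                        ]
                    }
                },
            }
        }
    }


# Expected to match nothing: shehan has no comment rated 3 or higher.
no_hit_body = build_nested_comment_query("shehan", 3)
# Expected to match the document: shehan's comment has a rate of 2.
hit_body = build_nested_comment_query("shehan", 2)

print(json.dumps(no_hit_body, indent=2))
```

Either printed body can be sent to news/_search with curl -d, in the same way as the query in section 1.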
e) Problems
- Performance hit: indexing and querying nested fields is much more expensive.
- Storage: a separate Lucene document is created for each object in the array.
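To get a feel for the storage cost, the number of Lucene documents behind one indexed document can be estimated as one for the root plus one per nested object. The helper lucene_doc_count below is a hypothetical name and covers only a single level of nesting.

```python
def lucene_doc_count(document, nested_field):
    """One Lucene document for the root plus one per nested object."""
    return 1 + len(document.get(nested_field, []))


news = {
    "title": "News 1",
    "comments": [{"rate": 5}, {"rate": 4}, {"rate": 2}],
}

print(lucene_doc_count(news, "comments"))  # 4
```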
Reference
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html