Supercharging Elasticsearch with Transformers

09 Aug 2021 by dzlab

The Elasticsearch Query DSL lets you compute the score of returned documents with custom logic through the script_score query. In this article, we will combine this feature with Sentence Transformers to improve search results.

First, we load a Sentence Transformers model and use it to compute the embedding of each document in the corpus. In this case, we load the documents from a JSON file and process each one individually:

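import json
from sentence_transformers import SentenceTransformer

# The model name is an assumption; any Sentence Transformers model works
model = SentenceTransformer('all-MiniLM-L6-v2')

with open('data.json') as f:
    documents = json.load(f)

for doc in documents:
    # Encode the document text into a dense vector (a NumPy array)
    embeddings = model.encode(doc['text'])
    doc['embeddings'] = embeddings.tolist()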

Note: we convert each embedding from a NumPy array into a list of floats so that it can be serialized to JSON and sent as the payload of the Elasticsearch index API.
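To see why the conversion matters: `model.encode` returns a NumPy array by default, and Python's `json` module has no encoder for NumPy arrays. The sketch below uses a made-up three-dimensional vector in place of a real embedding to show the failure and the fix.

```python
import json

import numpy as np

# Stand-in for the output of model.encode(); the values are made up
embedding = np.array([0.12, -0.5, 0.33], dtype=np.float32)

serialization_failed = False
try:
    json.dumps({"embeddings": embedding})
except TypeError as err:
    # The json module cannot serialize numpy.ndarray objects
    serialization_failed = True
    print("cannot serialize:", err)

# .tolist() turns the array (and its float32 items) into plain Python floats
payload = json.dumps({"embeddings": embedding.tolist()})
print(payload)
```

Because the array holds float32 values, the serialized floats may show a few extra digits (for example 0.1199999...), which is harmless for scoring.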

Second, we store the documents, along with their computed embeddings, in the test index:

from elasticsearch import Elasticsearch

# Assumes a cluster reachable at localhost:9200
es = Elasticsearch("http://localhost:9200")

for idx, doc in enumerate(documents):
    res = es.index(index="test", id=idx+1, body=doc)
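The loop above relies on dynamic mapping, which stores a list of numbers as an ordinary `float` field, whereas the `cosineSimilarity` script function expects a field mapped as `dense_vector`. A minimal sketch of an explicit mapping, meant to be created before running the indexing loop; the helper name `build_index_mapping` is hypothetical, and the dimension 384 is an assumption (the output size of all-MiniLM-L6-v2) that must match the model actually used.

```python
def build_index_mapping(dims: int) -> dict:
    """Hypothetical helper: index body for documents with text and embeddings.

    `dims` must equal the length of the vectors produced by the model.
    """
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                # dense_vector is the field type the vector script functions work on
                "embeddings": {"type": "dense_vector", "dims": dims},
            }
        }
    }


# 384 is an assumption: the embedding size of all-MiniLM-L6-v2
mapping = build_index_mapping(384)
print(mapping)
```

With a 7.x Python client, the index could then be created with `es.indices.create(index="test", body=mapping)`; with an 8.x client, pass `mappings=mapping["mappings"]` instead of `body`. Rather than hard-coding the dimension, it can be read from `model.get_sentence_embedding_dimension()`. Any other fields in the documents are still mapped dynamically.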

Now we are ready to call the search API. But first, we need to compute the embedding of the search query the same way we did for each indexed document:

query = "..."
query_vector = model.encode(query).tolist()

Finally, we use the cosineSimilarity function to find the documents whose embedding vectors are closest to the embedding vector of the query:

script_query = {
    "script_score": {
        "query": {"match_all": {}},
        "script": {
            "source": "cosineSimilarity(params.embeddings, doc['embeddings']) + 1.0",
            "params": {"embeddings": query_vector}
        }
    }
}
search_body = {
    "size": 10,
    "query": script_query,
    "_source": {"excludes": ['embeddings']}
}
result = es.search(index="test", body=search_body)

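Note how we pass query_vector as a script parameter instead of hard-coding it in the script source, which lets Elasticsearch compile and cache the script once, and how we use the cosineSimilarity function as the scoring method. We add 1.0 to the cosine similarity because it ranges from -1 to 1, while script_score does not accept negative scores.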
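For a small corpus, the same ranking can be reproduced in memory, which is handy for sanity-checking what Elasticsearch returns. The sketch below computes the cosine similarity cos(q, d) = q·d / (‖q‖ ‖d‖) between the query and every document, adds 1.0 exactly like the script, and returns the best-scoring document indices. The function name `script_score_top_k` and the toy vectors are hypothetical, and the code assumes no vector is all zeros.

```python
import numpy as np


def script_score_top_k(query_vector, doc_vectors, k=10):
    """Rank documents like the script_score query above:
    cosineSimilarity(query, doc) + 1.0, highest score first."""
    q = np.asarray(query_vector, dtype=np.float64)
    d = np.asarray(doc_vectors, dtype=np.float64)
    # Cosine similarity between the query and every document row
    cosine = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    # Shift from [-1, 1] to [0, 2] so that no score is negative
    scores = cosine + 1.0
    # Indices of the k highest scores, best first
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]


# Toy 2-dimensional vectors standing in for real embeddings
docs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(script_score_top_k([1.0, 0.1], docs, k=2))
```

A brute-force pass like this is exactly what the match_all plus script_score query does on the server: every document is scored, so the cost grows linearly with the corpus size.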
