Prompt engineering for question answering with LangChain

Large language models (LLMs) like GPT-3 can produce human-like text from an initial prompt. They can also be customized to perform a wide variety of natural language tasks such as translation, summarization, and question answering.

This customization step requires tweaking the prompts given to the language model to maximize its effectiveness. The tweaking usually takes many attempts and modifications to the prompt, and is therefore known as prompt engineering. In the rest of this article we will explore how to use LangChain for a question-answering application on a custom corpus. LangChain is a Python library that makes the customization of models like GPT-3 more approachable by wrapping the prompt engineering needed for a specific task in an API.

Enter LangChain


LangChain provides prompt templates per task (e.g. question answering) and Data Augmented Generation, which augments the knowledge of the LLM by supplying more contextual data. For instance, the template for question answering looks like this:

Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.

QUESTION: {question}
Content: ...
Source: ...

You can see that the template:

  1. starts with a general instruction (Given the following extracted parts ...), then
  2. highlights the question with QUESTION, then
  3. enumerates a sequence of Content and Source clauses, and finally
  4. highlights the right answer with FINAL ANSWER and SOURCES (omitted from the abbreviated template above).

This is a concrete example of what the earlier prompt template looks like in practice:

QUESTION: Which state/country's law governs the interpretation of the contract?
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an injunction or other relief to protect its Intellectual Property Rights.
Source: 28-pl

Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other) right or remedy.\n\n11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation in force of the remainder of the term (if any) and this Agreement.\n\n11.8 No Agency. Except as expressly stated otherwise, nothing in this Agreement shall create an agency, partnership or joint venture of any kind between the parties.\n\n11.9 No Third-Party Beneficiaries.
Source: 30-pl

Content: (b) if Google believes, in good faith, that the Distributor has violated or caused Google to violate any Anti-Bribery Laws (as defined in Clause 8.5) or that such a violation is reasonably likely to occur,
Source: 4-pl
FINAL ANSWER: This Agreement is governed by English law.
SOURCES: 28-pl
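
To make the structure concrete, here is a minimal sketch of how such a prompt could be assembled with plain Python string formatting. It does not use LangChain: the build_prompt helper and the (content, source) tuple layout are assumptions made for illustration, and the library builds its prompt through its own template classes.

```python
# Hypothetical helper, not part of LangChain: fills a QA-with-sources style
# prompt from a question and a list of (content, source) pairs.
HEADER = (
    "Given the following extracted parts of a long document and a question, "
    'create a final answer with references ("SOURCES").\n'
    "If you don't know the answer, just say that you don't know. "
    "Don't try to make up an answer.\n"
    'ALWAYS return a "SOURCES" part in your answer.\n'
)


def build_prompt(question, parts):
    """Return the prompt text for `question` and `parts` [(content, source), ...]."""
    blocks = [f"Content: {content}\nSource: {source}" for content, source in parts]
    body = "\n\n".join(blocks)
    # The prompt ends with "FINAL ANSWER:" so that the model completes the answer.
    return f"{HEADER}\nQUESTION: {question}\n{body}\nFINAL ANSWER:"


prompt = build_prompt(
    "Which state/country's law governs the interpretation of the contract?",
    [("This Agreement is governed by English law.", "28-pl")],
)
print(prompt)
```

Ending the prompt on FINAL ANSWER: nudges the model to continue with an answer followed by a SOURCES line, mirroring the example above.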


Using LangChain is straightforward. First, we need to install the dependencies:

$ pip install langchain requests transformers faiss-cpu

Next, import the LangChain modules, specifically a QA chain and a language model (e.g. OpenAI GPT-3).

Note: LangChain supports other language models (e.g. HuggingFacePipeline or Cohere), but support for the question answering task may not be available as of now.

from langchain.llms import OpenAI
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.docstore.document import Document
import requests

Now, we instantiate an OpenAI client to use as our language model:

llm = OpenAI(temperature=0)

Note: the OPENAI_API_KEY environment variable must be set for the OpenAI client to work; you can get a key from your OpenAI account.
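
A small check like the one below can catch a missing key before any request is made. This is only a sketch: has_api_key is a hypothetical helper, not something LangChain or the OpenAI client provides.

```python
import os


def has_api_key(environ=os.environ, name="OPENAI_API_KEY"):
    """Return True when `name` is set to a non-empty value in `environ`."""
    return bool(environ.get(name, "").strip())


if not has_api_key():
    print("OPENAI_API_KEY is not set; export it before creating the client.")
```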

Then wrap the language model in a Question-Answering chain as follows:

chain = load_qa_with_sources_chain(llm)

For the question answering example we will use data from Wikipedia to build a toy corpus. The following helper function fetches articles from Wikipedia and creates LangChain Documents.

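def query_wikipedia(title, first_paragraph_only=True):
  # Root URL of the Wikipedia site to query; its value was lost from this
  # listing and must be filled in before the function can run.
  base_url = ""
  url = f"{base_url}/w/api.php?format=json&action=query&prop=extracts&explaintext=1&titles={title}"
  if first_paragraph_only:
    url += "&exintro=1"  # only return the content before the first section
  data = requests.get(url).json()
  # The response nests pages by page id; take the text of the first page.
  page = next(iter(data["query"]["pages"].values()))
  return Document(
    page_content=page["extract"],
    metadata={"source": f"{base_url}/wiki/{title}"},
  )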
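
The extracts query of the MediaWiki API nests the article text under query, then pages, then a page id, then extract. The sketch below walks that structure on a hand-written sample payload so it runs offline; the page id and text are made up, and first_extract is a hypothetical helper.

```python
# Hand-written sample mimicking the shape of a MediaWiki "extracts" response.
# The page id and text are made up for illustration.
sample_response = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "title": "Example",
                "extract": "Example is a placeholder article.",
            }
        }
    }
}


def first_extract(data):
    """Return the plain-text extract of the first page in the response."""
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        # Missing pages carry no "extract" key, so fall back to an empty string.
        return page.get("extract", "")
    return ""


print(first_extract(sample_response))
```

Falling back to an empty string keeps a misspelled title from raising a KeyError, at the cost of silently producing an empty document.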

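Now we can download some articles:

sources = [
  # One query_wikipedia("<article title>") call per article; the titles
  # used originally were lost from this listing.
]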

Finally, we put everything together in a helper function that returns the language model's answer given the document sources and a question:

def qa(chain, question):
  inputs = {"input_documents": sources, "question": question}
  outputs = chain(inputs, return_only_outputs=True)["output_text"]
  return outputs

Now we can test using a simple question

qa(chain, "Who wrote Les Misérables?")

or a more complicated question like this

qa(chain, "What are the main differences between Victor Hugo and Alexandre Dumas writing styles?")
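
The chain returns a single string. Assuming the model followed the template and ended its reply with a SOURCES: line, the answer and the cited sources can be separated as sketched below. The split_answer helper is hypothetical, and the sample output is made up in the shape the template asks for.

```python
def split_answer(output_text):
    """Split 'answer ... SOURCES: a, b' into (answer, [sources])."""
    answer, sep, tail = output_text.partition("SOURCES:")
    # Without a SOURCES marker the whole text is the answer.
    sources = [s.strip() for s in tail.split(",") if s.strip()] if sep else []
    return answer.strip(), sources


# Made-up output in the shape the prompt template asks for.
sample = " Victor Hugo wrote Les Misérables.\nSOURCES: 28-pl, 30-pl"
print(split_answer(sample))
```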

Handling a large corpus

The previous simple chain works for a small corpus or small documents, but will not work for larger ones. For instance, OpenAI enforces a size limit on the prompt, which means we cannot send requests with a large text body.

LangChain provides a couple of workarounds for these limitations. Let’s examine them in the following subsections.
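
As a rough illustration of why the limit matters, the sketch below checks whether a question and a set of documents would fit in a single prompt. The real limit is counted in tokens rather than characters, so the character budget here is only a stand-in; fits_in_prompt and the max_chars value are assumptions, not part of any API.

```python
def fits_in_prompt(question, documents, max_chars):
    """Return True if the question plus all documents stay within `max_chars`."""
    total = len(question) + sum(len(doc) for doc in documents)
    return total <= max_chars


# Three dummy documents of 5000 characters each.
docs = ["a" * 5000, "b" * 5000, "c" * 5000]
print(fits_in_prompt("Who wrote Les Misérables?", docs, max_chars=12000))
```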

Using a map-reduce chain

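When creating a chain we can pass a chain_type argument that takes one of the following values (see documentation): stuff (the default, which puts all documents into a single prompt), map_reduce (which queries the model on each document separately and then combines the results), refine (which builds the answer iteratively, one document at a time), and map_rerank (which answers from each document separately and keeps the highest-scoring answer).

We can test one of those chain types as follows: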

mapred_chain = load_qa_with_sources_chain(llm, chain_type="map_reduce")
qa(mapred_chain, "your question here")
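
The idea behind the map_reduce type can be sketched in a few lines of plain Python. This is not LangChain's implementation (its prompts and batching differ); map_reduce_answer and fake_llm are hypothetical names, and the fake model only records prompts so the example runs offline.

```python
def map_reduce_answer(llm, question, documents):
    """Answer `question` by querying `llm` once per document, then once more
    to combine the partial answers."""
    # Map step: one call per document.
    partials = [
        llm(f"QUESTION: {question}\nContent: {doc}\nRelevant text:")
        for doc in documents
    ]
    # Reduce step: a single call over the (much shorter) partial answers.
    summaries = "\n".join(partials)
    return llm(f"QUESTION: {question}\n{summaries}\nFINAL ANSWER:")


# Stand-in for a real model: it records each prompt and returns a canned reply.
calls = []


def fake_llm(prompt):
    calls.append(prompt)
    return f"reply {len(calls)}"


answer = map_reduce_answer(fake_llm, "your question here", ["doc a", "doc b", "doc c"])
print(answer, len(calls))
```

With three documents the model is called four times, three for the map step and one for the reduce step, which is why this chain type costs more requests than putting everything in one prompt.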

Using a vector store

You may notice that using anything other than the stuff type results in more queries to the underlying language model. This may lead to longer response times with long documents or a large corpus. To speed things up, LangChain allows us to combine language models with search engines (e.g. FAISS), so that only the documents most relevant to the question are sent to the model.

We can build a search index with FAISS as follows:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS

vector_store = FAISS.from_documents(sources, OpenAIEmbeddings())

Note: we are using the OpenAI API to create embeddings for each document.

Finally, we can use the search index to look up answers as follows:

def qa_vector_store(chain, question):
  inputs = {
    # Only pass the 4 documents most similar to the question
    "input_documents": vector_store.similarity_search(question, k=4),
    "question": question
  }
  response = chain(inputs, return_only_outputs=True)
  outputs = response["output_text"]
  return outputs

Now, we can test everything with questions

qa_vector_store(chain, "your question here")
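
Conceptually, a similarity search embeds the question and returns the k documents whose embeddings lie closest to it. The sketch below shows that idea with toy two-dimensional vectors and cosine similarity. It is an illustration only: FAISS uses optimized index structures and may use a different distance metric, and the vectors, cosine, and top_k are made up for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec, indexed, k=4):
    """Return the k texts whose vectors are most similar to `query_vec`.

    `indexed` is a list of (text, vector) pairs.
    """
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy 2-D "embeddings", made up for illustration.
index = [
    ("about novels", [1.0, 0.0]),
    ("about operating systems", [0.0, 1.0]),
    ("about novelists", [0.9, 0.1]),
]
print(top_k([1.0, 0.05], index, k=2))
```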

Using a text splitter

Very large documents may still pose problems. For these we can use a text splitter to chunk them into multiple smaller documents. LangChain provides a text_splitter module to do this, and we can leverage it to chunk our Wikipedia documents as follows:

from langchain.text_splitter import CharacterTextSplitter

def chunk(sources):
  splitter = CharacterTextSplitter(separator=" ", chunk_size=1024, chunk_overlap=0)
  chunks = []
  for src in sources:
    # Each piece keeps the metadata (source URL) of the document it came from
    for text in splitter.split_text(src.page_content):
      chunks.append(Document(page_content=text, metadata=src.metadata))
  return chunks

In the previous function, CharacterTextSplitter is configured to split documents on whitespace and create chunks of at most 1024 characters. LangChain supports other types of splitters that may work better; check the documentation.
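
To show what splitting on whitespace with a size cap amounts to, here is a simplified sketch in plain Python. It approximates the behaviour described above and is not the library's implementation; split_on_whitespace is a hypothetical helper and does not support chunk overlap.

```python
def split_on_whitespace(text, chunk_size=1024):
    """Greedily pack whitespace-separated words into chunks of at most
    `chunk_size` characters (a single longer word becomes its own chunk)."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            # The next word would overflow the chunk: close it and start anew.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


pieces = split_on_whitespace("the quick brown fox jumps over the lazy dog", chunk_size=15)
print(pieces)
```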

We can also fill the FAISS vector store with chunks instead of the full documents as follows:

vector_store = FAISS.from_documents(chunk(sources), OpenAIEmbeddings())

That’s all folks

I hope you enjoyed this article. Feel free to leave a comment or reach out on Twitter @bachiirc.