Prompt engineering for question answering with LangChain
02 Jan 2023 by dzlab
Large language models (LLMs) like GPT-3 can produce human-like text given an initial text as a prompt. They can also be customized to perform a wide variety of natural language tasks such as translation, summarization, question answering, etc.
This customization step requires tweaking the prompts given to the language model to maximize its effectiveness. This tweaking process involves many attempts and modifications to the prompt, and hence is also known as prompt engineering. In the rest of this article we will explore how to use LangChain for a question-answering application on a custom corpus. LangChain is a Python library that makes the customization of models like GPT-3 more approachable by creating an API around the prompt engineering needed for a specific task.
Enter LangChain
Introduction
LangChain provides prompt templates per task (e.g. question answering) and Data Augmented Generation to augment the knowledge of the LLM by providing more contextual data. For instance, for question answering the template can be found here and looks like this:
Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES").
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
QUESTION: {question}
=========
Content: ...
Source: ...
...
=========
FINAL ANSWER:
SOURCES:
You can see that the template:
- starts with a general prompt (Given the following extracted parts ...),
- then highlights the question with QUESTION,
- then enumerates a sequence of Content and Source clauses,
- and finally highlights the expected answer with FINAL ANSWER and SOURCES.
This is a concrete example of what the earlier prompt template looks like in practice:
QUESTION: Which state/country's law governs the interpretation of the contract?
=========
Content: This Agreement is governed by English law and the parties submit to the exclusive jurisdiction of the English courts in relation to any dispute (contractual or non-contractual) concerning this Agreement save that either party may apply to any court for an injunction or other relief to protect its Intellectual Property Rights.
Source: 28-pl
Content: No Waiver. Failure or delay in exercising any right or remedy under this Agreement shall not constitute a waiver of such (or any other) right or remedy.\n\n11.7 Severability. The invalidity, illegality or unenforceability of any term (or part of a term) of this Agreement shall not affect the continuation in force of the remainder of the term (if any) and this Agreement.\n\n11.8 No Agency. Except as expressly stated otherwise, nothing in this Agreement shall create an agency, partnership or joint venture of any kind between the parties.\n\n11.9 No Third-Party Beneficiaries.
Source: 30-pl
Content: (b) if Google believes, in good faith, that the Distributor has violated or caused Google to violate any Anti-Bribery Laws (as defined in Clause 8.5) or that such a violation is reasonably likely to occur,
Source: 4-pl
=========
FINAL ANSWER: This Agreement is governed by English law.
SOURCES: 28-pl
Usage
Using LangChain is straightforward. First, we need to install the dependencies:
$ pip install langchain requests transformers faiss-cpu
Next, import the LangChain modules; specifically, a QA chain and a language model (e.g. OpenAI GPT-3).
Note: LangChain supports other language models (e.g. HuggingFacePipeline or Cohere), but support for the question answering task may not be available for them as of now.
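As an illustration only (not used in the rest of this article), a locally hosted HuggingFace model could be wrapped in a similar way. Treat this as a sketch: the model name is a placeholder and the exact wrapper API may differ across LangChain versions.
from transformers import pipeline
from langchain.llms import HuggingFacePipeline

# Build a local text-generation pipeline and wrap it so LangChain chains can call it.
# "gpt2" is only a placeholder model; any local causal LM could be used instead.
hf_pipeline = pipeline("text-generation", model="gpt2")
local_llm = HuggingFacePipeline(pipeline=hf_pipeline)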
from langchain.llms import OpenAI
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.docstore.document import Document
import requests
Now, we instantiate an OpenAI client to use as our language model:
llm = OpenAI(temperature=0)
Note: we need to set the OPENAI_API_KEY environment variable to be able to use the OpenAI client. You can get a key at https://beta.openai.com/account/api-keys
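For example, the key can be provided by setting the environment variable from Python (before the OpenAI client above is instantiated); the value below is a placeholder, not a real key:
import os

# Make the key visible to the OpenAI client; replace the placeholder with your own key.
os.environ["OPENAI_API_KEY"] = "sk-..."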
Then wrap the language model in a Question-Answering chain as follows:
chain = load_qa_with_sources_chain(llm)
For the question answering example we will use data from Wikipedia to build a toy corpus. The following helper function fetches articles from Wikipedia and creates LangChain Documents:
def query_wikipedia(title, first_paragraph_only=True):
    base_url = "https://en.wikipedia.org"
    url = f"{base_url}/w/api.php?format=json&action=query&prop=extracts&explaintext=1&titles={title}"
    if first_paragraph_only:
        url += "&exintro=1"
    data = requests.get(url).json()
    return Document(
        metadata={"source": f"{base_url}/wiki/{title}"},
        page_content=list(data["query"]["pages"].values())[0]["extract"],
    )
Now we can download some articles:
sources = [
    query_wikipedia("Michelangelo"),
    query_wikipedia("Claude_Monet"),
    query_wikipedia("Alexandre_Dumas"),
    query_wikipedia("Victor_Hugo"),
]
Finally, we put everything together in a helper function that returns the language model's answer given the document sources and a question:
def qa(chain, question):
    inputs = {"input_documents": sources, "question": question}
    outputs = chain(inputs, return_only_outputs=True)["output_text"]
    return outputs
Now we can test it with a simple question:
qa(chain, "Who wrote Les Misérables?")
or with a more complicated question like this:
qa(chain, "What are the main differences between Victor Hugo and Alexandre Dumas writing styles?")
Handling a large corpus
The previous simple chain works for a small corpus or small documents, but will not work for larger sets. For instance, OpenAI enforces a size limit on the prompt, which means we cannot send requests with a large text body.
LangChain provides a couple of workarounds for these limitations. Let’s examine them in the following subsections.
Using a map-reduce chain
When creating a chain we can pass a chain_type argument that takes one of the following values (see the documentation):
- stuff: the default value; the chain simply combines all of the input sources into the prompt.
- map_reduce: maps over the input sources and summarizes them, then uses the summaries when building the prompt.
- refine: iterates over the input sources and queries the language model for answers, refining the answer with each new source.
We can test one of these chain types as follows:
mapred_chain = load_qa_with_sources_chain(llm, chain_type="map_reduce")
qa(mapred_chain, "your question here")
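Similarly, we could try the refine type with the same qa helper; a quick sketch, with the question left as a placeholder:
refine_chain = load_qa_with_sources_chain(llm, chain_type="refine")
qa(refine_chain, "your question here")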
Using a vector store
You may notice that using anything other than the stuff type will result in more queries to the underlying language model. This may lead to longer response times when using long documents or a large corpus.
To speed up search, LangChain allows us to combine language models with search engines (e.g. FAISS) as follows:
- Ahead of time, index all sources using a traditional search engine.
- At query time, use the question to query the search index and select the top k (e.g. 2) results.
- The selected documents are used as sources for a chain of type stuff.
We can build a search index with FAISS as follows:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores.faiss import FAISS
vector_store = FAISS.from_documents(sources, OpenAIEmbeddings())
Note: we are using the OpenAI API to create embeddings for each document.
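To illustrate the query-time step, we can inspect which sources the index returns for a question; a quick sketch, reusing one of the earlier example questions:
docs = vector_store.similarity_search("Who wrote Les Misérables?", k=2)

# Print the Wikipedia URL of each retrieved document.
for doc in docs:
    print(doc.metadata["source"])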
Finally, we can use the search index to look up answers as follows:
def qa_vector_store(chain, question):
    inputs = {
        "input_documents": vector_store.similarity_search(question, k=4),
        "question": question
    }
    response = chain(inputs, return_only_outputs=True)
    outputs = response["output_text"]
    return outputs
Now, we can test everything with questions:
qa_vector_store(chain, "your question here")
Using a text splitter
Very large documents may still pose problems; to handle them we can use a text splitter to chunk them into multiple smaller documents. LangChain provides a text_splitter module to do this, and we can leverage it to chunk our Wikipedia documents as follows:
from langchain.text_splitter import CharacterTextSplitter
def chunk(sources):
    splitter = CharacterTextSplitter(separator=" ", chunk_size=1024, chunk_overlap=0)
    chunks = []
    for src in sources:
        # Split the document text and wrap each piece in a new Document
        # that keeps the original source metadata.
        for piece in splitter.split_text(src.page_content):
            document = Document(page_content=piece, metadata=src.metadata)
            chunks.append(document)
    return chunks
In the previous function, CharacterTextSplitter is configured to split documents on whitespace and create chunks with a maximum size of 1024 characters. LangChain supports other types of splitters that may work better; check the documentation.
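For example, a recursive splitter that tries progressively finer separators could be swapped into the chunk helper above. This is only a sketch, assuming RecursiveCharacterTextSplitter is available in your LangChain version:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Tries larger separators (e.g. paragraphs) first and falls back to smaller ones,
# keeping each chunk under 1024 characters.
splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=0)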
We can also fill the FAISS vector store with the chunks instead of the full documents as follows:
chunks = chunk(sources)
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())
That’s all folks
I hope you enjoyed this article. Feel free to leave a comment or reach out on Twitter @bachiirc.