Llama Index

Hello everyone,

I have read all the articles on the LlamaIndex website, and I apologize for asking what may seem like a noob question.
My purpose is to perform question answering or semantic searching through a large set of articles (tens of thousands).

OpenAI's cookbook (https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb) walks through the following steps:

  1. Turn articles into embeddings using the embedding API
  2. Turn our query into embeddings using the embedding API
  3. Compare the similarity between the two vectors
  4. Find the article that matches the query the most
  5. Inject the article into the prompt as context and send it along with the completion/chat API call (a rough sketch of these steps is below)
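Roughly, my understanding of those steps in code looks something like this (just a sketch; the model names, the tiny placeholder corpus, and the cosine_similarity helper are my own assumptions, not copied from the cookbook):

Plain Text
import numpy as np
import openai  # assumes OPENAI_API_KEY is set in the environment

# Placeholder corpus; in reality this would be tens of thousands of articles
articles = ["First article text ...", "Second article text ..."]

def embed(text):
    # Steps 1 & 2: turn text (article or query) into an embedding vector
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

def cosine_similarity(a, b):
    # Step 3: compare two embedding vectors
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

article_embeddings = [embed(a) for a in articles]  # step 1, done once up front

def answer(query):
    # Step 4: find the article that matches the query the most
    q_emb = embed(query)
    best = max(range(len(articles)),
               key=lambda i: cosine_similarity(q_emb, article_embeddings[i]))
    # Step 5: inject that article into the prompt as context and call the chat API
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{articles[best]}\n\nQuestion: {query}"},
        ],
    )
    return resp["choices"][0]["message"]["content"]
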
My questions are:

  1. Can I think of LlamaIndex as a convenience wrapper for these steps?
  2. What is the main difference between following the steps from OpenAI's cookbook and using LlamaIndex if I want to achieve question answering or semantic search?
Initially, I thought LlamaIndex was a convenience wrapper for these steps, but it appears that LlamaIndex transforms my data into an index. I'm still not entirely sure how I can leverage this index data structure.

Thank you all!
10 comments
  1. Yea, it is sort of a wrapper on those functions
  2. The main difference with llama index is that it gives you a lot more flexibility while taking care of a lot of edge cases. Plus, there are integrations for several popular vector stores (Pinecone, Qdrant, etc.)
This explanation assumes you are using a vector index.

You can input any document format (or read data from an external source), and the text is broken up into chunks and indexed. If you are using a vector index, each chunk is also embedded. This can be saved and loaded from disk, allowing you to persist and grow your own knowledge base.

At query time, the closest matching top_k chunks are retrieved. The answer to the query is then refined across several LLM calls if all the retrieved text does not fit in a single prompt.
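As a rough sketch with the basic vector index (assuming the same llama_index version as the save_to_disk/load_from_disk examples further down this thread; similarity_top_k controls how many chunks get retrieved):

Plain Text
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Chunk + embed everything in ./data and build a vector index
documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex.from_documents(documents)

# Persist it so you don't pay to re-embed next time
index.save_to_disk('index.json')
index = GPTSimpleVectorIndex.load_from_disk('index.json')

# Retrieve the top_k closest chunks and refine an answer across them
response = index.query("What did the author do growing up?", similarity_top_k=3)
print(response)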

Furthermore, llama index provides some other index structures (lists, keywords, trees) that have other use cases. You can even wrap several vector indexes with a top-level index to help route queries to the proper data.

One last advantage of llama index is that the indexes you create can be directly integrated as "tools" with langchain, allowing for some pretty cool use cases
@Logan M this is a great explanation
Thanks @Logan M for the detailed explanation!!! I'd like to ask one more question if you don't mind

Question: What would be your recommended best practice for storing and retrieving the "index"?

As the tutorial states, you can store the index as a JSON file and read it back later:
Plain Text
# save to disk
index.save_to_disk('index.json')
# load from disk
index = GPTSimpleVectorIndex.load_from_disk('index.json')

Let's say I want to do semantic search across 100K different articles, and assume I've created the index by reading those articles from a directory:
Plain Text
documents = SimpleDirectoryReader('data').load_data()

Is it best practice (in terms of performance) to store the index to disk as JSON and retrieve it later by reading the JSON file, even when my input is large (in my example, 100K articles)?

Since we don't want to create the index every time we want to query it,
I'm wondering if this is the right approach. I haven't tested this yet, as it may cost money 😅, so I want to make sure I'm on the right track before proceeding.

Thank you!!
Yea it's best practice to save/load it! In an application or server API setting though, normally you'd just load it once when the server/app starts and keep it in memory.

If you are embedding 100k articles though, I would look into the vector store integrations like pinecone or qdrant, so you don't get too many slowdowns 👍
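For the server case, a minimal sketch of the load-once pattern (Flask here is just an example framework, and the /query endpoint is made up):

Plain Text
from flask import Flask, jsonify, request
from llama_index import GPTSimpleVectorIndex

app = Flask(__name__)

# Load the index once at startup and keep it in memory
index = GPTSimpleVectorIndex.load_from_disk('index.json')

@app.route('/query')
def query():
    # Reuse the same in-memory index for every request
    question = request.args.get('q', '')
    response = index.query(question)
    return jsonify({'answer': str(response)})

if __name__ == '__main__':
    app.run()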
Thanks @Logan M

I have been experimenting with vector stores over the last few days, and I'm not sure whether what I'm doing is correct. Would you kindly take a look?
I'm using Qdrant as the vector store; this is what I am currently doing.

1) Load in documents and build the index. This step also saves the index to Qdrant
Plain Text
documents = SimpleDirectoryReader('data').load_data()
index = GPTQdrantIndex.from_documents(documents, collection_name=collection_name, client=client)


2) Query the documents I saved to Qdrant earlier, to avoid creating the index again
Plain Text
reader = QdrantReader(host="host", https=True, api_key="key")
documents = reader.load_data(collection_name=collection_name, query_vector=vector)

index = GPTQdrantIndex.from_documents(documents, collection_name=collection_name, client=client)
response = index.query("What did the author do growing up?")

Here query_vector is the query's embedding: I transform my query into an embedding using OpenAI's embedding API.
Then I build the GPTQdrantIndex from the documents I just loaded and call index.query() to get the results.

Question:

1) Should the query string passed to index.query() be the same query that I embedded as query_vector for the load_data() call? Is that correct?
2) Am I doing the whole process correctly?

Thank you so much for taking time to answer my question!
Actually, once you've already saved the documents to Qdrant (the first code block), you don't need to pass them back in again. The second code block could be reduced to:
Plain Text
index = GPTQdrantIndex([], collection_name=collection_name, client=client)
response = index.query("What did the author do growing up?")


Or you can save/load from disk as well (it won't save the documents to disk, just some metadata)

Plain Text
index.save_to_disk("index_qdrant.json")
index = GPTQdrantIndex.load_from_disk("index_qdrant.json", client=client)
Hi @Logan M, could you kindly tell me how to apply metadata filters with GPTQdrantIndex?
Thanks
I'm actually pretty sure the qdrant index doesn't support metadata filters (yet). Happy to have a PR to support this though!
yep very open to PR's here!