LangchainLLM
I believe the correct method is something like this

Plain Text
from llama_index import ServiceContext, VectorStoreIndex

llm = <create llm from langchain, i.e. bedrock>
service_context = ServiceContext.from_defaults(llm=llm)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
i tried what you suggested, but it's still expecting an openai key.
AuthenticationError: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>', or you can set the environment variable OPENAI_API_KEY=<API-KEY>). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = <PATH>'. You can generate API keys in the OpenAI web interface. See https://platform.openai.com/account/api-keys for details.
Ah, because there are two models in llama index that both default to openai -- the llm and the embed_model
https://gpt-index.readthedocs.io/en/latest/how_to/customization/embeddings.html#custom-embeddings

You can run local embeddings if you want to skip openai. This example shows how to use huggingface (if you don't provide a model_name, it defaults to mpnet-v2)
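For reference, a rough sketch (not from the thread) of what that local-embedding setup could look like, assuming langchain's HuggingFaceEmbeddings wrapper, which defaults to sentence-transformers/all-mpnet-base-v2:

Plain Text
# rough sketch -- local embeddings via langchain's HuggingFaceEmbeddings wrapper
from llama_index import ServiceContext, LangchainEmbedding, VectorStoreIndex
from langchain.embeddings import HuggingFaceEmbeddings

# no model_name given, so this falls back to sentence-transformers/all-mpnet-base-v2
embed_model = LangchainEmbedding(HuggingFaceEmbeddings())
service_context = ServiceContext.from_defaults(embed_model=embed_model)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)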
I want to use bedrock embeddings or some top ones from https://huggingface.co/spaces/mteb/leaderboard, and eventually store them in a vector store. so, for now I can't use VectorStoreIndex? (right?)
You can still use the vector store index. Just setup like this

Plain Text
from llama_index import ServiceContext, LangchainEmbedding, VectorStoreIndex
from langchain.embeddings.bedrock import BedrockEmbeddings

llm = <create llm from langchain, i.e. bedrock>
embed_model = LangchainEmbedding(BedrockEmbeddings(...))
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)
Or you can use the huggingface embeddings to use any embed model from huggingface

You can use any embeddings from langchain, just need to provide that wrapper
getting closer. I tried what you said, but it seems there is an issue between what bedrock embeddings expect and how the abstractions work across langchain/llamaindex. I get the error below
ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid
What's the full error/traceback? I wonder if that's a langchain or llama index error
it's definitely langchain, but wondering if we can override it in LangchainEmbedding?!
It sounds like you just didn't initialize the bedrock embeddings properly? 🤔
yes, I have that working fine!
embeddings = BedrockEmbeddings(client=bedrockClient) works fine but embeddings = LangchainEmbedding(BedrockEmbeddings(client=bedrockClient)) doesn't!
i mean the error doesn't happen with those statements. but the moment i wrap the embeddings in LangchainEmbedding and pass them to the service context, it has issues.
i tried embeddings.embed_query("This is a content of the document") with the first statement and it's fine.
hmmm, does this work?

Plain Text
embed_model = LangchainEmbedding(BedrockEmbeddings(...))
embed_model.get_text_embedding("test string")
yes that works!
this worked: embeddings = LangchainEmbedding(BedrockEmbeddings(client=bedrockClient)); embeddings.get_text_embedding("test string")
but when passing it through the service context it didn't work
🤔

Ok, one last attempt to make this work lol

Set a global service context, and then don't worry about passing it in

Plain Text
from llama_index import ServiceContext, set_global_service_context

service_context = ServiceContext.from_defaults(embed_model=embed_model, ...)
set_global_service_context(service_context)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embeddings) is my service context
let me try set_global_service_context
I have a feeling that the path of the file I am passing is not valid in sagemaker jupyter notebook!
no, i am loading fine! it's the VectorStoreIndex that fails!
Plain Text
if not index_loaded:
    # load data 
    
    seller_guide = SimpleDirectoryReader(input_files=["./aws-marketplace-ug.pdf"]).load_data()
    print(len(seller_guide)) # this did print the length fine
    # build index
    seller_index = VectorStoreIndex.from_documents(seller_guide, service_context=service_context) #this one failed
Yea seems like the embeddings are causing an issue for some reason 🤔
i am gonna try a different embedding model and see
I'm pretty stumped haha
yea that sounds good!
I know the huggingface one works well
that's what I am trying now. do you have any recommendation? hkunlp/instructor-xl?
Yea that one seems to perform pretty well on the leaderboard
I haven't actually tried to use it yet haha but I know others have!
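A hedged sketch of plugging that model in through langchain's HuggingFaceInstructEmbeddings wrapper (assumes the InstructorEmbedding package is installed; this wasn't tested in the thread):

Plain Text
# rough sketch -- instructor-xl via langchain, requires the InstructorEmbedding package
from llama_index import ServiceContext, LangchainEmbedding
from langchain.embeddings import HuggingFaceInstructEmbeddings

embed_model = LangchainEmbedding(
    HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)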
when using SubQuestionQueryEngine, how do i pass the llm? it's expecting openai by default
btw, the huggingface embeddings worked
and when querying the engine, why are the answers super short (one sentence or word)?
I would just set the global service context to avoid worrying about how to pass everything in -- it really simplifies a lot
Hmmm, not sure! You are using the bedrock llm right?
i am using bedrock llm and passing that in service context. but for embeddings i am using huggingface
individual engines worked fine
i am experimenting with QueryEngineTool
and trying to figure out how to ask a question that spans across docs so it can come up with an answer
s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
now, need to ask question
when i ask, it gives an error: AuthenticationError: No API key provided. You can set your API key in code using 'openai.api_key = <API-KEY>', or you can set the environment variable OPENAI_API_KEY=<API-KEY>). If your API key is stored in a file, you can point the openai module at it with 'openai.api_key_path = <PATH>'. You can generate API keys in the OpenAI web interface. See https://platform.openai.com/account/api-keys for details.
If you don't set the global service context, then you'll also need to pass it in there too (unclear if that's still an issue lol)

Not sure on the short responses though, I've never used bedrock. Do you know the max input size for bedrock?
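For the earlier point about passing the service context in, a hedged sketch of what that could look like (the tool name and description are placeholders, not from the thread):

Plain Text
# rough sketch -- placeholder tool name/description
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

query_engine_tools = [
    QueryEngineTool(
        query_engine=seller_index.as_query_engine(),
        metadata=ToolMetadata(name="seller_guide", description="AWS Marketplace seller guide"),
    ),
]

s_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,  # not needed once a global service context is set
)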
i am going to re-run the global context (maybe i didn't set it before) and see
it seems the global context worked. but now I'm getting another error in the sub query even though it says it's generating sub questions
Yea that happens when using async in a notebook, easy fix
Plain Text
import nest_asyncio

nest_asyncio.apply()
run that first
ah, i thought i had those. interesting that it didn't say it couldn't find the module. that fixed it
why do i get this error sometimes? OutputParserException: Got invalid return object. Expected markdown code snippet with JSON object, but got:
This is because the LLM did not generate a valid response when generating sub-questions

The LLM has to generate a json containing which sub-index to query and which question to ask. But if it doesn't write proper json, then that happens
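Roughly, the question generator is expected to emit a markdown code snippet containing JSON along these lines (the shape is illustrative; the exact wrapping depends on the sub question prompt in your llama-index version):

Plain Text
[
    {
        "sub_question": "What does the seller guide say about listing a product?",
        "tool_name": "seller_guide"
    },
    {
        "sub_question": "What does the buyer guide say about private offers?",
        "tool_name": "buyer_guide"
    }
]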
my experience so far with generating sub questions is not that great. some invalid questions. I will probably get better responses if I switch to openai. I am using claude for now. tried titan. what is the ideal querying technique for querying multiple pdf documents?!
do you know how to solve this using other LLMs? guidance (https://github.com/microsoft/guidance)?
yea guidance or using the openai function calling api will improve this somewhat 🙏
i actually missed seeing guidance. this is the notebook i followed. i don't have much flexibility as I want to use models like claude. i see limited llm support in guidance. https://github.com/microsoft/guidance/tree/main/guidance/llms/transformers
well, i used a similar one, not the same! (the one that does 10k analysis)
Yea, guidance sadly doesn't have great llm support (booo microsoft)
ok, so here is my question. for a simple RAG approach, without hallucinations, which type of querying is best for searching across multiple docs and coming up with an answer?! I understand the LLM dictates what answer it comes up with (I will adjust temperature etc and do some prompt engineering to get a more detailed/exact response)
I am looking at ContextRetrieverOpenAIAgent, but need a non-openai alternative
right now it seems most of the features are influenced by/dependent on openai.
Openai definitely pushes the features, because that's just what everyone uses 😅

Does it not work well to just use a single vector index and go from there? Or does that not achieve what you expect?

The sub question engine or the router query engine would be the next "level up" from there I think
But how reliable the level up options are (or really any approach is) depends on how smart the LLM is that you are using 😅
You could also adjust the internal prompt templates for the normal index queries, to better match whichever LLM you are using
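For example, a hedged sketch of overriding the QA prompt on a query engine (the template wording here is only an illustration, not a recommended prompt):

Plain Text
# rough sketch -- custom QA template passed to the query engine
from llama_index import Prompt

qa_template = Prompt(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using only the context above, give a detailed answer to the question: {query_str}\n"
)

query_engine = index.as_query_engine(text_qa_template=qa_template)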
agreed on openai being the superior option here. but my plan is to evaluate multiple models (open, proprietary) to make a better decision for customers.
i have different types of docs (buyer, seller, api docs, youtube video transcripts etc) and created an index for each instead of combining all into one. that's why i was thinking to create a query engine tool and pass it to the llm to query the appropriate doc, then summarize at the end based on the consensus. I haven't tried the router yet. i will try that. I also want the source doc in the response, apart from the response text. what param is that available in?!
You can access the list of node(s) that were used to create the response in response.source_nodes

The ID of the document that the node came from shows up in source_node.node.ref_doc_id

Additionally, any metadata set on the input documents is inherited by the nodes created from that document
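A quick sketch of pulling those fields out of a response object (attribute names per the explanation above; the metadata access is an assumption about the node API):

Plain Text
# rough sketch -- inspecting which nodes/documents produced a response
response = query_engine.query("How do I list a product on the marketplace?")

for source_node in response.source_nodes:
    print(source_node.node.ref_doc_id)  # ID of the source document
    print(source_node.score)            # similarity score, if available
    print(source_node.node.metadata)    # metadata inherited from the input document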
I see. I will explore the documentation on that. If you think there are any specific links that are beneficial for my use case, please pass it.
multistep is openai based, so that didn't work for me.
the Router one is throwing an error saying KeyError: 'choice'.
the sub questions being generated are invalid and going off track.
what api gives the source doc reference?!
in my experience so far, i get the best results if the query is run directly on individual indexes. am I doing anything wrong in any of the advanced querying techniques to not get the best results?! (it gets worse)
I take that back. if i load all docs into one, it doesn't give me better results either, especially since the docs started getting big (500+ pages). need to figure out a way to make the similarity search work better!
nvm. found it.
Yea, with everything in a single index like that, you might need to increase the top k, and maybe also decrease the chunk size a bit at the same time?
for now, I am facing difficulty even with single file indexing. basically it's kind of using one page when the information spans multiple pages. so, should i increase the chunk size? what's the default chunk size in the service context?
the default chunk size is 1024, and the default top k is 2
are you asking about top_k or similarity_top_k?
what is the default overlap?!
where can we find the default values in the service context?!
similarity_top_k yes
overlap is not really important tbh. But the default is 20 tokens. the default chunk size is 1024
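A hedged sketch of where those knobs live (the values shown are arbitrary examples, and chunk_overlap as a from_defaults kwarg is an assumption about this llama-index version):

Plain Text
# rough sketch -- adjusting the defaults mentioned above
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=512,    # default is 1024
    chunk_overlap=20,  # default is 20 tokens
)

query_engine = index.as_query_engine(similarity_top_k=5)  # default is 2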
does the LLM summarize across all results returned or just top 1?
It will be all nodes returned
so in the case of content spread across multiple pages, should i be increasing the chunk size?! or decreasing it?! sometimes when i ask the llm to provide a list of things, it provides only half of what is present, as the data is spread across multiple pages
Ok, let's take a step back and ensure we both understand the flow of how things work

When documents are put into llama-index, they are chunked into nodes that are 1024 tokens by default, with 20 tokens of overlap by default.

At query time, it depends on which index you used. There are two main indexes in llama-index, a VectorStoreIndex and a ListIndex

A VectorStoreIndex will embed your query, retrieve the top 2 (by default) matching nodes. Then it sends those nodes and your query to the LLM to answer the query. If you are missing information, you need to either increase the top k, or increase the chunk size.

A list index will not use embeddings, and instead return every node in the index. This is usually useful for queries that need to read everything in the index to answer, or for generating summaries with something like index.as_query_engine(response_mode="tree_summarize").query("Summarize the context.")

You can probably tell there are times when a query is best answered by a vector index, and other times by a list index. Using a router query engine, you can best support both use cases
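A hedged sketch of that router setup (assumes a vector_index and a list_index already built over the same documents; tool descriptions are placeholders). Note the single selector also relies on the LLM writing parseable output, which is where errors like the earlier KeyError: 'choice' can come from:

Plain Text
# rough sketch -- routing between a vector index and a list index
from llama_index.tools import QueryEngineTool
from llama_index.query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_index.as_query_engine(),
    description="Useful for answering specific questions about the documents",
)
list_tool = QueryEngineTool.from_defaults(
    query_engine=list_index.as_query_engine(response_mode="tree_summarize"),
    description="Useful for summarizing the documents",
)

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, list_tool],
)
response = query_engine.query("Summarize the seller guide")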