Find answers from the community

AmitKhandey
Offline, last seen 3 months ago
Joined September 25, 2024
Could you please advise on how to query Chroma DB, via LlamaIndex, to retrieve the names of documents for which embeddings are available?
4 comments
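A minimal sketch of one way to do this, assuming the documents were indexed through LlamaIndex's ChromaVectorStore (which copies node metadata such as the source file name into the collection); the store path, collection name, and the "file_name" metadata key are assumptions:

import chromadb

# Assumed: a persistent Chroma store that LlamaIndex populated.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("my_collection")

# Fetch only the metadata; SimpleDirectoryReader-loaded documents carry
# the originating file name under the "file_name" key.
records = collection.get(include=["metadatas"])
doc_names = {m.get("file_name") for m in records["metadatas"] if m}
print(doc_names)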
Hi, I want to use Chroma DB instead of the in-memory InMemoryDocumentStore for generating test data with RAGAS.

# Imports assumed for the legacy ragas 0.1.x testset API (module paths may vary by version)
from llama_index.core import SimpleDirectoryReader
from langchain.text_splitter import TokenTextSplitter
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset.extractor import KeyphraseExtractor
from ragas.testset.docstore import InMemoryDocumentStore

documents = SimpleDirectoryReader(
    input_files=[
        "../data/Xceed Fraud Detection Reference Manual_01March2023_Track Changes.docx",
        "../data/FraudDESK Guide - ACH_v6.pdf",
        "../data/FraudDESK Guide_Online Banking_v6.pdf",
        "../data/FraudUseCases_13July2023.xlsx",
    ]
).load_data()

azure_model = LangchainLLMWrapper(AzureChatOpenAI(
    model=config.chatgpt_model,
    azure_deployment=config.openai_deployment_id,
    api_key=config.openai_api_key,
    azure_endpoint=config.openai_api_base,
    api_version=config.openai_api_version,
))

embed_model = LangchainEmbeddingsWrapper(AzureOpenAIEmbeddings(
    model=config.embed_model,
    azure_deployment=config.embed_model_deployment_id,
    api_key=config.openai_api_key,
    azure_endpoint=config.openai_api_base,
    api_version=config.openai_api_version,
))

generator_llm = azure_model
critic_llm = azure_model

splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=100)
keyphrase_extractor = KeyphraseExtractor(llm=generator_llm)
docstore = InMemoryDocumentStore(
    splitter=splitter,
    embeddings=embed_model,
    extractor=keyphrase_extractor,
)
from ragas.testset import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

test_generator = TestsetGenerator(
    generator_llm=generator_llm,
    critic_llm=critic_llm,
    embeddings=embed_model,
    docstore=docstore,
)

testset = test_generator.generate_with_llamaindex_docs(
    documents=documents[5:6],
    test_size=2,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
print(testset)
1 comment
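The legacy TestsetGenerator in ragas 0.1.x is wired to its own docstore, so one workaround (a sketch under that assumption, not an official ragas integration) is to keep InMemoryDocumentStore for test-set generation and persist the retrieval index itself to Chroma via LlamaIndex; the path "./chroma_db" and the collection name are assumptions:

import chromadb
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# Persist the embeddings to Chroma instead of keeping them only in memory.
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("fraud_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)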
L
Please help: the error below appeared while I was evaluating RAG using a LlamaDataset from LlamaHub:

    raise mapped_exc(message) from exc
httpx.LocalProtocolError: Illegal header value b'Bearer '

During handling of the above exception, another exception occurred:

    raise APIConnectionError(request=request) from err
openai.APIConnectionError: Connection error.
3 comments
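The b'Bearer ' value with nothing after the scheme means the OpenAI client built its Authorization header from an empty API key. A minimal sanity check, assuming the key is meant to come from the environment:

import os

# An "Illegal header value b'Bearer '" error means the Authorization header
# was assembled from an empty key; confirm it is set before building the client.
api_key = os.environ.get("OPENAI_API_KEY", "")
assert api_key, "OPENAI_API_KEY is empty or unset"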
And this error appeared:

File "C:\Users\akhandey.NICEDEN\copilot\copilot\document_reader.py", line 45, in run_query
    index = VectorStoreIndex.from_documents(nodes, storage_context=storage_context)
File "C:\Users\akhandey.NICEDEN\AppData\Roaming\Python\Python310\site-packages\llama_index\indices\base.py", line 97, in from_documents
    docstore.set_document_hash(doc.get_doc_id(), doc.hash)
AttributeError: 'TextNode' object has no attribute 'get_doc_id'. Did you mean: 'ref_doc_id'?

Please help.
2 comments
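The traceback already hints at the fix: from_documents() expects Document objects and calls get_doc_id() on each one, which TextNode does not have. Nodes go through the VectorStoreIndex constructor instead (a sketch against the llama_index API):

# from_documents() hashes Documents via get_doc_id(); TextNodes lack it.
# Pass nodes through the constructor instead:
index = VectorStoreIndex(nodes=nodes, storage_context=storage_context)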
The LlamaIndex library you are using needs openai>=1.0.1, while the existing copilot code needs openai==0.27.6.
This creates a conflict. Does anyone have a solution?
2 comments
How do I use Azure OpenAI with LlamaIndex?
17 comments
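A minimal sketch using llama-index's Azure OpenAI integrations; the deployment names, endpoint, model names, and API version below are all placeholders:

from llama_index.core import Settings
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

# Substitute your own deployments, endpoint, key, and API version.
Settings.llm = AzureOpenAI(
    model="gpt-4",
    deployment_name="my-gpt4-deployment",
    api_key="<key>",
    azure_endpoint="https://my-resource.openai.azure.com/",
    api_version="2024-02-15-preview",
)
Settings.embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="my-embedding-deployment",
    api_key="<key>",
    azure_endpoint="https://my-resource.openai.azure.com/",
    api_version="2024-02-15-preview",
)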
I have several Chroma DB collections, and I want to route queries to the most relevant collection based on a similarity score. Ideally, the document with the highest similarity score within a collection would be retrieved. Can LlamaIndex leverage its retrieval capabilities to achieve this?
1 comment
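One way to sketch it: build a retriever per collection, take the top hit from each, and route to whichever scores highest. Here build_index_for() is a hypothetical helper that wraps a Chroma collection in a VectorStoreIndex (as in the earlier snippets), and the collection names are assumptions:

retrievers = {
    name: build_index_for(name).as_retriever(similarity_top_k=1)
    for name in ["fraud_guides", "use_cases", "manuals"]  # assumed names
}

def route(query: str):
    # Retrieve the best node from each collection and keep the highest score.
    best_name, best_hit = None, None
    for name, retriever in retrievers.items():
        hits = retriever.retrieve(query)  # list of NodeWithScore
        if hits and (best_hit is None or hits[0].score > best_hit.score):
            best_name, best_hit = name, hits[0]
    return best_name, best_hit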
I want to log the cosine similarity of the RAG response while retrieving from Chroma DB. Please refer to the code below:

vector_store = self.setup_chroma(collectionName)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader(
    input_files=files
).load_data()

node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes, storage_context=storage_context)

query_engine = index.as_query_engine()
query_engine.query(query)
1 comment
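The scores ride along on the response's source nodes, so logging them is a small change (a sketch; whether the score is a true cosine similarity depends on the distance metric the Chroma collection was created with):

response = query_engine.query(query)
for source in response.source_nodes:  # NodeWithScore objects
    # .score is the retriever's similarity for that retrieved chunk.
    print(source.node.node_id, source.score)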
AmitKhandey · Retrieval

but this works very well:

vector_store = self.setup_chroma(collectionName)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader(
    input_files=files
).load_data()

node_parser = SentenceSplitter(chunk_size=180, chunk_overlap=80)
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes, storage_context=storage_context)
15 comments
My requirement is: when updating the RAG index, only generate embeddings for newly added documents, rather than regenerating embeddings for all documents in Chroma DB, using LlamaIndex.
4 comments
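LlamaIndex's IngestionPipeline supports this when given a docstore next to the vector store: documents whose hash is already recorded are skipped on later runs. A sketch, where the store path, collection name, and embed_model (assumed to be a LlamaIndex embedding such as AzureOpenAIEmbedding) are assumptions:

import chromadb
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.PersistentClient(path="./chroma_db")  # assumed path
collection = client.get_or_create_collection("rag_docs")  # assumed name

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=50), embed_model],
    vector_store=ChromaVectorStore(chroma_collection=collection),
    docstore=SimpleDocumentStore(),
)
# Re-running this only embeds documents the docstore has not seen before.
pipeline.run(documents=documents)

For the deduplication to survive restarts, the docstore itself also needs to be persisted and reloaded between runs.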