Find answers from the community

kev
Automated retrieval with llama-index is blocked. Is there a workaround?

Plain Text
from llama_index.core import SummaryIndex
from llama_index.readers.web import SimpleWebPageReader
from IPython.display import Markdown, display
import os

documents = SimpleWebPageReader(html_to_text=True).load_data(["https://www.xyz.com"])
documents


[MyHomepage] Main\nContent Main Navigation\n\n## Page not available\n\nYour access to website has been blocked because you are using an\nautomated process to retrieve content\n\nReason: Automated retrieval by user agent "python-requests/2.31.0".\n\nURL:
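One possible workaround, assuming the site's terms of use actually allow automated access: fetch the pages yourself with requests and a browser-like User-Agent, then wrap the responses in Document objects (SimpleWebPageReader does not appear to expose the request headers). A minimal sketch, with the URL and header string as placeholders:

Plain Text
import requests
from llama_index.core import Document, SummaryIndex

# The default "python-requests/x.y.z" user agent is what the site rejects,
# so send a browser-style one instead (placeholder string below).
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"}

urls = ["https://www.xyz.com"]
documents = []
for url in urls:
    resp = requests.get(url, headers=headers, timeout=30)
    resp.raise_for_status()
    # Wrap the raw HTML; run it through an HTML-to-text step afterwards if
    # you want the equivalent of html_to_text=True.
    documents.append(Document(text=resp.text, metadata={"url": url}))

index = SummaryIndex.from_documents(documents)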
1 comment
Error adding to a collection in ChromaDB:

Plain Text
collection_name = "name"
vector_store = ChromaVectorStore(chroma_collection=collection_name)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

raw_index = VectorStoreIndex.from_documents(
parsed_docs,
storage_context=storage_context,
embed_model=Settings.embed_model
)




---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-41-79eec0778777> in <cell line: 7>()
      5 storage_context = StorageContext.from_defaults(vector_store=vector_store)
      6 
----> 7 raw_index = VectorStoreIndex.from_documents(
      8     parsed_docs,
      9     storage_context=storage_context,

6 frames
/usr/local/lib/python3.10/dist-packages/llama_index/vector_stores/chroma/base.py in add(self, nodes, **add_kwargs)
    263             documents.append(node.get_content(metadata_mode=MetadataMode.NONE))
    264 
--> 265         self._collection.add(
    266             embeddings=embeddings,
    267             ids=ids,

AttributeError: 'str' object has no attribute 'add'
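For reference, the traceback comes from passing the collection name (a string) where ChromaVectorStore expects an actual chromadb collection object, so the store ends up calling .add() on "name". A sketch of the usual setup, with the storage path as a placeholder:

Plain Text
import chromadb
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# Create (or open) the collection first, then hand the collection object
# -- not its name -- to ChromaVectorStore.
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("name")

vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# parsed_docs: the document list from the snippet above
raw_index = VectorStoreIndex.from_documents(
    parsed_docs,
    storage_context=storage_context,
    embed_model=Settings.embed_model,
)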
19 comments
kev
Encode

When we load an image using SimpleDirectoryReader, does it encode the image? If so, what encoding type does it use?

img = SimpleDirectoryReader("/content/drive/images").load_data()
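One way to see for yourself what the reader stores is to inspect the returned documents; image files generally come back as ImageDocument objects carrying the file path and, depending on the reader options and llama-index version, an optional base64 payload. A small inspection sketch:

Plain Text
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader("/content/drive/images").load_data()
for doc in docs:
    print(type(doc).__name__)                          # e.g. ImageDocument for image files
    print("image_path:", getattr(doc, "image_path", None))
    print("base64 payload present:", bool(getattr(doc, "image", None)))
    print("metadata:", doc.metadata)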
1 comment
Is there an example notebook showcasing the use of approximate metadata filtering? For example, I am using workflows for RAG and I'd like to include approximate metadata filtering for better retrieval accuracy.

Plain Text
custom_index = VectorStoreIndex.from_documents(
                                               documents,
                                               storage_context=storage_context        
                                              )


class RAGWorkflow(Workflow):
    
    @step
    async def ingest(self, ctx: Context, ev: StartEvent) -> StopEvent | None:
        """Entry point - ingest documents"""
        
        index = custom_index
        
        return StopEvent(result=index)

    @step
    async def retrieve(self, ctx: Context, ev: StartEvent) -> RetrieverEvent | None:
        "Entry point for RAG, triggered by a StartEvent with `query`."
        query = ev.get("query")
        index = ev.get("index")

        if not query:
            return None

        # store the query in the global context
        await ctx.set("query", query)
        await ctx.set("index", index)

        # get the index from the global context
        if index is None:
            print("Index is empty, load some documents before querying!")
            return None

        retriever = index.as_retriever(similarity_top_k=10)
        nodes = await retriever.aretrieve(query)
    
        return RetrieverEvent(nodes=nodes)
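If "approximate metadata filtering" means constraining retrieval by node metadata, one option is to pass MetadataFilters into as_retriever inside the retrieve step. A sketch assuming the documents carry a hypothetical year metadata key (inferring the filter values from the query itself would need an extra step, e.g. an auto-retrieval style LLM call):

Plain Text
from llama_index.core.vector_stores import (
    FilterOperator,
    MetadataFilter,
    MetadataFilters,
)

# Hypothetical filter: only retrieve nodes whose metadata has year == "2023".
filters = MetadataFilters(
    filters=[MetadataFilter(key="year", operator=FilterOperator.EQ, value="2023")]
)

retriever = index.as_retriever(similarity_top_k=10, filters=filters)
nodes = await retriever.aretrieve(query)   # inside the async retrieve step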
7 comments
What's the best way to use llama-index to retrieve row(s) and cell value from a pandas dataframe based on a natural language user query?
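One common approach is PandasQueryEngine, which asks an LLM to write a pandas expression for the query and then executes it against the dataframe; note it lives in the separate llama-index-experimental package in recent releases and runs generated code, so use it with care. A minimal sketch with an illustrative dataframe:

Plain Text
import pandas as pd
from llama_index.experimental.query_engine import PandasQueryEngine

# Illustrative data; replace with your own dataframe.
df = pd.DataFrame(
    {
        "city": ["Toronto", "Tokyo", "Berlin"],
        "population": [2_930_000, 13_960_000, 3_645_000],
    }
)

query_engine = PandasQueryEngine(df=df, verbose=True)
response = query_engine.query("Which city has the highest population?")
print(response)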
41 comments
Is there a good example / cookbook for multi-vector / recursive retriever + multi-modal RAG using llama-index?

Here's a LangChain example: https://github.com/langchain-ai/langchain/blob/master/cookbook/Multi_modal_RAG.ipynb
21 comments
Hi there, I used llama-parse and implemented RAG on a set of financial documents. Similar to the example in this notebook [1], we built a raw index and a recursive index. To my surprise, the results from raw_index.as_query_engine are more accurate than the recursive one. I am trying to get an intuition for why this might be. For context, we have tables with financial data, and a sample query might look like: what was the total rent for Property A in 2023? What is the key difference between the two indices, and how do they work under the hood?

[1] https://github.com/run-llama/llama_parse/blob/main/examples/demo_advanced.ipynb
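For intuition, a rough paraphrase of how the two indices in that notebook differ (simplified, not the exact notebook code): the raw index embeds the parsed chunks directly, while the recursive index pulls tables out as IndexNode "objects" whose embedded text is an LLM-written table summary; retrieval matches against those summaries and then follows the reference back to the full table node. One common reason the raw index can come out ahead is that a summary may not mention a specific line item such as the 2023 rent for Property A, so the recursive retriever misses the table even though the raw chunk would have matched the query terms directly.

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import MarkdownElementNodeParser

# Raw index: embed and retrieve the parsed markdown chunks as-is.
raw_index = VectorStoreIndex.from_documents(documents)
raw_query_engine = raw_index.as_query_engine(similarity_top_k=15)

# Recursive index: split out tables as IndexNode objects wrapping an
# LLM-generated summary; matching happens on the summaries, and the
# reference is resolved back to the full table before synthesis.
node_parser = MarkdownElementNodeParser(num_workers=8)
nodes = node_parser.get_nodes_from_documents(documents)
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

recursive_index = VectorStoreIndex(nodes=base_nodes + objects)
recursive_query_engine = recursive_index.as_query_engine(similarity_top_k=15)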
2 comments