no_dice
Offline, last seen 3 months ago
Joined September 25, 2024
Hi - I am currently trying to wrap an agent around the free Polygon.io (financial markets data) API, and so far the agent is having a hard time parsing the response it gets back. I have fiddled a lot with prompting it to better parse the data, but we're running into an error that I believe is in the RequestsToolSpec object:

My code:
Plain Text
# Wrap the Polygon API spec with LoadAndSearchToolSpec
wrapped_tools = LoadAndSearchToolSpec.from_defaults(
    api_spec.to_tool_list()[0],
).to_tool_list()

agent = ReActAgent.from_tools(
    [*wrapped_tools, requests_spec.to_tool_list()[0]]
    , verbose=True
    , llm=llm
    , context=CONTEXT
    , max_iterations=20
)

agent.chat("What are all the exchanges you have access to?")


Error:
Plain Text
File /workspaces/pye/pye/.venv/lib/python3.10/site-packages/llama_hub/tools/requests/base.py:75, in RequestsToolSpec._get_headers_for_url(self, url)
     74 def _get_headers_for_url(self, url: str) -> dict:
---> 75     return self.domain_headers[self._get_domain(url)]

KeyError: 'api.polygon.io'


It looks like it's trying to look up the key api.polygon.io in a headers mapping even though I've never prompted it to do so. It also looks like this lookup is hardcoded in the RequestsToolSpec object?

Any ideas on how to resolve this? Modifying the prompt doesn't seem to do anything.
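One direction worth trying (not confirmed from the thread): the traceback shows RequestsToolSpec indexing a domain_headers dict with the request's domain, so constructing the spec with an entry for api.polygon.io should at least avoid the KeyError. A minimal sketch, where the Polygon auth header format and the POLYGON_API_KEY variable are assumptions:

Plain Text
requests_spec = RequestsToolSpec(
    domain_headers={
        "api.polygon.io": {
            "Authorization": f"Bearer {POLYGON_API_KEY}",  # assumed auth scheme; adjust to what Polygon expects
            "User-Agent": "llama-index-agent",
        }
    }
)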
1 comment
LlamaIndex CLI error

Whenever I try to create a new llama-pack I run into this weird error:

Plain Text
llamaindex-cli new-package --kind "packs" --name "pack-test"

Plain Text
Traceback (most recent call last):
  File "/workspaces/llama_index/.venv/bin/llamaindex-cli", line 6, in <module>
    sys.exit(main())
  File "/workspaces/llama_index/.venv/lib/python3.10/site-packages/llama_index/cli/command_line.py", line 269, in main
    args.func(args)
  File "/workspaces/llama_index/.venv/lib/python3.10/site-packages/llama_index/cli/command_line.py", line 263, in <lambda>
    new_package_parser.set_defaults(func=lambda args: handle_init_package(**vars(args)))
  File "/workspaces/llama_index/.venv/lib/python3.10/site-packages/llama_index/cli/command_line.py", line 26, in handle_init_package
    init_new_package(integration_name=name, integration_type=kind, prefix=prefix)
  File "/workspaces/llama_index/.venv/lib/python3.10/site-packages/llama_index/cli/new_package/base.py", line 120, in init_new_package
    shutil.copyfile(common_path + "/BUILD", pkg_path + "/BUILD")
  File "/usr/local/python/3.10.13/lib/python3.10/shutil.py", line 254, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/workspaces/llama_index/.venv/lib/python3.10/site-packages/llama_index/cli/new_package/common/BUILD'

I believe this is also the cause of an error my PR is currently hitting. I'm not sure how to interpret these errors.
7 comments
LlamaPack dependency question..

I am writing a llama pack, and poetry/.toml files are a bit new to me. I think I've got the gist of it, but I'm not sure where I should add a dependency that's only needed for my tests. In my test fixtures I use the wikipedia reader to load some simple documents into a vector store. That reader is its own llama pack, so I figure I should add it as a dependency. Where might I do that? I see multiple potential places when I look at other example packs.
3 comments
Simple local vector store index that supports hybrid search?

Alright, I've got a weird problem trying to wrap up a llama-pack:

I NEED a vector store index object that holds both text and vector representations of its data. How can I build a simple vector store with a small corpus of local data (it really doesn't have to be much, just enough to answer 1-2 questions) that supports HYBRID SEARCH? Most of the guides I've seen online build from TextNodes or Documents directly; none of those work because the index in those examples does NOT support hybrid queries. I can't use free or small instances of services like Pinecone either, because these are just test fixtures and I can't expect the llama-index repo to have my credentials (nor is that best practice).

ValueError: Invalid query mode: hybrid
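One possible direction (a sketch under assumptions, not a confirmed recipe): the Qdrant integration can run fully in memory with no credentials and exposes an enable_hybrid flag, which should satisfy vector_store_query_mode="hybrid". Assumes llama-index>=0.10 with the llama-index-vector-stores-qdrant and qdrant-client packages installed; hybrid mode also pulls in a sparse embedding model on first use.

Plain Text
import qdrant_client
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Purely local, in-memory Qdrant instance - no external service or credentials.
client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(
    client=client,
    collection_name="test_fixture",
    enable_hybrid=True,  # keeps both dense vectors and sparse text representations
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents([Document.example()], storage_context=storage_context)

retriever = index.as_retriever(vector_store_query_mode="hybrid", similarity_top_k=2)
nodes = retriever.retrieve("What are LLMs good at?")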
13 comments
Is anyone else having issues with the CLI tool for upgrading your imports to v0.10+?

It runs fine with no errors, and I get the following output:
Plain Text
(.venv) @no-dice-io ➜ /workspaces/koda-retriever (main) $ llamaindex-cli upgrade .
Module not found: VectorStoreQueryMode
Switching to core
Module not found: BaseNodePostprocessor
Switching to core
Module not found: QueryType
Switching to core
Module not found: Settings
Switching to core
New installs:
pip install llama-index-llms-openai
pip install llama-index-vector-stores-pinecone
pip install llama-index-embeddings-openai


But none of the imports have actually changed. Am I not using this tool correctly? I did reinstall my .venv with the new packages above.
6 comments
Retriever tests

I am building a retriever tool for Llama Hub and was looking to see if there are any standard tests an object built from BaseRetriever should be able to pass. I took a look at the /tests/retrievers/ folder in the Llama Index repo, but I only see one test in there. I can certainly follow that test, but I worry that's not enough.

If anyone has ideas for tests I could run against my Retriever, please let me know! I'm looking to get my PR ready by this weekend 😄
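For what it's worth, a minimal sketch of the kind of contract check such a retriever could pass, assuming a pytest fixture (here called setup) that returns the retriever; names are illustrative, not a standard LlamaIndex test suite, and the import path assumes llama-index>=0.10:

Plain Text
from llama_index.core.schema import NodeWithScore

def test_retrieve_returns_scored_nodes(setup):
    """Sanity-check the BaseRetriever contract: retrieve() returns a list of NodeWithScore."""
    retriever = setup.get("retriever")
    results = retriever.retrieve("What are LLMs good at?")

    assert isinstance(results, list)
    assert all(isinstance(r, NodeWithScore) for r in results)
    assert all(r.score is None or isinstance(r.score, float) for r in results)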
4 comments
Potential bug? Postprocessor error when vector search yields 0 results

Exception: IndexError: list index out of range

I think there's a decent chance this is a bug. I'm fairly certain this happens because I just moved to a new Vector Database that has no data in it, but the index itself has been created.

My code was working fine and my tests were passing, but when I cleared my DB this error occurred. I believe it's happening because the vector search yields no results, so the postprocessor (SentenceTransformerRerank) has nothing to rerank. I would expect that when there are zero results, the reranker should either not run or simply return nothing.

I've attached my code and full stack-trace in the thread.
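Until the underlying behavior is confirmed or patched, a hedged workaround is simply to skip the postprocessor when retrieval comes back empty, so SentenceTransformerRerank never sees an empty list:

Plain Text
def rerank_safe(reranker, nodes, query_bundle):
    """Return reranked nodes, or an empty list if there is nothing to rerank."""
    if not nodes:
        return []
    return reranker.postprocess_nodes(nodes, query_bundle=query_bundle)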
8 comments
Having trouble actually running any queries on my query engine..

I get various errors like a 404 or an APIConnectionError depending on how I query the query engine (or when it's wrapped by a Context Augmented Agent). I've attached my code here in a text file because I don't think it'll fit. (The traceback is within the code as well.)

PLEASE TAKE NOTE OF MY COMMENTS IF YOU READ MY CODE
YOU WILL ALSO NEED TO DOWNLOAD THE FILE, COMPANY SECURITY IS DOING WEIRD STUFF TO THE PREVIEW

Agent Error: Exception: APIConnectionError: Connection error.

I went ahead and tested my connection info against my AzureOpenAI class/wrapper via LangChain, and it works fine on its own when I create the object and prompt it in a simple notebook. But when it's wrapped in an index/engine it starts to have connection issues, as shown in my code/traceback.
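One frequent cause of this pattern (plain LLM calls work, the engine doesn't) is that only the LLM is pointed at Azure while the embedding model silently falls back to the default OpenAI endpoint. A hedged sketch, mirroring the kwargs used elsewhere in these posts (the settings object and deployment names are placeholders from that code, and the imports assume llama-index 0.9.x):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import AzureOpenAI
from llama_index.embeddings import AzureOpenAIEmbedding

llm = AzureOpenAI(
    model="gpt-4",
    azure_deployment="gpt-4",
    azure_endpoint=str(settings.azure_openai_api_base),
    api_version=str(settings.azure_openai_api_version),
    api_key=str(settings.azure_openai_api_key),
)
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    azure_deployment="text-embedding-ada-002",
    azure_endpoint=str(settings.azure_openai_api_base),
    api_version=str(settings.azure_openai_api_version),
    api_key=str(settings.azure_openai_api_key),
)

# Pass BOTH models in, then build the engine from this service context so nothing
# falls back to api.openai.com defaults. `index` is whatever index you built.
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
query_engine = index.as_query_engine(service_context=service_context)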
20 comments
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column data_vector_embeddings.text_search_tsv does not exist

I set up the vector index with LlamaIndex, so whatever columns are supposed to be there should be there in my PGVector Store.

Any idea how I'd fix this? It originated from this code:

Plain Text
from cerebro.cbcore.utils.vector_store import CerebroVectorStore
#from asyncio import run
from os import name

query = "Who does Paul Graham think of with the word schtick"
vector_store = CerebroVectorStore()
#if name == 'nt':
#    from asyncio import set_event_loop_policy, WindowsSelectorEventLoopPolicy
#    set_event_loop_policy(WindowsSelectorEventLoopPolicy()) # this fixed it

vector_store.hybrid_search(query)
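For context, in LlamaIndex's Postgres integration the text_search_tsv column is only created when the store is built with hybrid search enabled, so if the table was created without it, hybrid queries have nothing to hit. A hedged sketch of how the store is typically constructed for hybrid use (connection parameters are placeholders; the import path shown is the 0.9-style one, it moves under llama_index.vector_stores.postgres in 0.10). The existing table may need to be dropped/recreated so the column gets added:

Plain Text
from llama_index.vector_stores import PGVectorStore

vector_store = PGVectorStore.from_params(
    database="vectordb",
    host="localhost",
    port="5432",
    user="postgres",
    password="password",
    table_name="vector_embeddings",   # LlamaIndex prefixes the actual table with "data_"
    embed_dim=1536,                   # must match the embedding model's dimension
    hybrid_search=True,               # creates/uses the text_search_tsv column
    text_search_config="english",
)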
4 comments
AttributeError: 'NoneType' object has no attribute 'send'

Plain Text
AttributeError: 'NoneType' object has no attribute 'send'
Exception ignored in: <function _SSLProtocolTransport.__del__ at 0x000001951368DB40>
Traceback (most recent call last):
...character limit...
Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.3056.0_x64__qbz5n2kfra8p0\lib\asyncio\base_events.py", line 515, in _check_closed
RuntimeError: Event loop is closed


Relevant code: (defined in a custom class I wrote to link the vector store and ingestion pipeline into a single object where I can access both)
Plain Text
    async def ingest(self, chunks: list[dict]):
        '''Ingests a list of chunks into the vector store asynchronously'''
        
        if not hasattr(self, 'ingestion_pipeline'):
            self.init_ingestion_pipeline()
        print('################ vector store and ingestion pipeline initialized')
        
        processed_chunks = await process_chunks(chunks)
        print('################ chunks processed')

        return await self.ingestion_pipeline.arun(documents=processed_chunks)


Entrypoint:
Plain Text
vector_store = PFVectorStore() #custom class I was referring to
run(vector_store.ingest(test_data)) #asyncio


Pretty certain this is an async error, or something related to it?
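The commented-out WindowsSelectorEventLoopPolicy lines in the earlier snippet point in the same direction: on Windows the default proactor loop often raises "Event loop is closed" while async HTTP clients are being torn down after asyncio.run() finishes. A hedged sketch of the usual workaround:

Plain Text
import asyncio
import sys

# On Windows, switch to the selector event loop before running any async code;
# this commonly avoids the "Event loop is closed" teardown error.
if sys.platform.startswith("win"):
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

vector_store = PFVectorStore()              # custom class from this post
asyncio.run(vector_store.ingest(test_data))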
38 comments
Async ingestion pipeline loads no data after completion

It's not returning any errors...

My code:
Plain Text
    def init_ingestion_pipeline(self) -> IngestionPipeline:
        '''Initializes the ingestion pipeline for the vector store'''

        pipeline = IngestionPipeline(
            transformations=[
                AzureOpenAIEmbedding(
                    model="text-embedding-ada-002"
                    , azure_deployment="text-embedding-ada-002"
                    , azure_endpoint=str(settings.azure_openai_api_base)
                    , api_version=str(settings.azure_openai_api_version)
                    , api_key=str(settings.azure_openai_api_key)
                )
            ]
            , vector_store=self.vector_store
        )

        self.ingestion_pipeline = pipeline

        return pipeline

    async def ingest(self, chunks: list[dict]):
        '''Ingests a list of chunks into the vector store asynchronously'''
        
        if not hasattr(self, 'ingestion_pipeline'):
            self.init_ingestion_pipeline()
        print('################ vector store and ingestion pipeline initialized')
        
        processed_chunks = await process_chunks(chunks)
        print('################ chunks processed')

        #await self.ingestion_pipeline.arun(documents=processed_chunks)
        nodes = self.ingestion_pipeline.run(documents=processed_chunks, show_progress=True)
        print('################ ingestion pipeline completed')


The code above is in a class that I'm calling:

Plain Text
vector_store = PFVectorStore() #Custom class, not a vector store from LlamaIndex
run(vector_store.ingest(test_data)) #Referencing the ingest function from an ingestion pipeline, ingest is a custom wrapper over the ingestion pipeline
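A hedged debugging step (not a diagnosis): check what the pipeline actually returns and whether the attached vector store can serve a query afterwards; if run() returns zero nodes, the problem is upstream of the store. Assumes 0.9-style imports and the attribute names from the class above, with processed_chunks as produced inside ingest():

Plain Text
from llama_index import VectorStoreIndex

nodes = vector_store.ingestion_pipeline.run(documents=processed_chunks, show_progress=True)
print(f"pipeline returned {len(nodes)} nodes")   # 0 here means nothing was embedded/inserted

# Query straight off the attached store to confirm whether anything landed.
# Pass your own service_context here if you are not on the default OpenAI embeddings.
index = VectorStoreIndex.from_vector_store(vector_store=vector_store.vector_store)
print(index.as_retriever().retrieve("test query"))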
99 comments
Plain Text
pydantic.error_wrappers.ValidationError: 1 validation error for IngestionPipeline
transformations -> 0
  value is not a valid dict (type=type_error.dict)


Plain Text
ingestion_pipeline = IngestionPipeline(
    transformations=[AzureOpenAIEmbedding]
    , vector_store=vector_store
)

ingestion_pipeline.run(documents=new_docs, show_progress=True)

The code above throws the error at the top. Not sure why, and I haven't been able to find much online. I am not super familiar with pydantic, so that's probably why I'm struggling. Any help?
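The usual cause of this particular pydantic error is that the class itself is in the transformations list rather than an instance of it, so pydantic tries (and fails) to coerce the class object. A hedged sketch with placeholder Azure values:

Plain Text
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    azure_deployment="text-embedding-ada-002",
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_version="2023-07-01-preview",                           # placeholder
    api_key="<your-key>",                                       # placeholder
)

ingestion_pipeline = IngestionPipeline(
    transformations=[embed_model],   # an instance, not the AzureOpenAIEmbedding class
    vector_store=vector_store,
)
ingestion_pipeline.run(documents=new_docs, show_progress=True)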

17 comments
I've been experimenting with fine-tuning lately and want some people's thoughts.

At what point does fine-tuning NOT help in exposing an LLM to additional data for a specific task? I recognize that fine-tuning is great for structured outputs, edge cases, or formatting responses from an LLM.

But I've also seen people refer to fine-tuning as a way to extend the underlying data an LLM has to work with.

To what degree is that true, and to what degree is it not? In my research it seems to be true only insofar as the new data shows the model how it should respond in a specific use case. Any thoughts?
7 comments
Quick q:

gpt-3.5-turbo-0125 and gpt-4-turbo-preview are both listed by OpenAI as models trained for function calling. Have any additional models been trained for function calling out of the box since then?

https://platform.openai.com/docs/guides/function-calling
3 comments
Is it possible to allow a ReAct Agent to be primed w/ context via a query engine to leverage that context to select a tool?
4 comments
Hi, does LlamaIndex currently have any implementation/support for OpenAI's API specs? Specifically, this would be useful as an input to a tool used within an agent - these API specs provide rich context on how to use the API. LangChain's implementation seems to demonstrate how useful this can be.

https://platform.openai.com/docs/plugins/getting-started

https://github.com/finnelliott/langchain-spotify-assistant/blob/main/spotify_assistant.py
7 comments
How does the Raptor Retriever in LlamaIndex handle updates to a corpus of data? More specifically in this scenario:
  • I have persisted and clustered data in a vector index that was created previously via RAPTOR
  • I want to ADD more data to this persisted and clustered vector store, but I want to be sure this new data is included in the existing clusters
Does RAPTOR accommodate the second scenario at all? Or do I need to truncate that table and just re-cluster all of the data to ensure everything is clustered together?
6 comments
Has anyone built any Agents with DBRX in LlamaIndex? How does it perform? Are there any considerations or issues with the LI abstractions? (I understand that for a while most of LangChain and LlamaIndex were tilted towards OpenAI.)
4 comments
Does anyone have any easily accessible datasets that are good for RAG evaluation? Preferably ones that might be integrated into Llama Index? I looked on LlamaHub, but with the recent 1.0 update I'm not able to find any on the website itself, and I figure the imports have changed recently anyway.
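One option, to the best of my knowledge (0.10-era import path): LlamaHub's downloadable "llama datasets" can still be pulled programmatically even if the website listing is hard to browse, e.g. the Paul Graham essay RAG dataset:

Plain Text
from llama_index.core.llama_dataset import download_llama_dataset

# Downloads the labelled RAG dataset plus its source documents into ./data.
rag_dataset, documents = download_llama_dataset("PaulGrahamEssayDataset", "./data")
print(rag_dataset.to_pandas().head())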
2 comments
Llm

Does LI have any dummy LLMs that fit the llama-index interface? I'm looking to potentially use one if so.
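For reference, llama-index does ship MockLLM and MockEmbedding objects that satisfy the standard interfaces (import paths below are the 0.10 ones; on 0.9 they live under the top-level llama_index packages):

Plain Text
from llama_index.core.llms import MockLLM
from llama_index.core import MockEmbedding

llm = MockLLM(max_tokens=64)                 # returns placeholder text, no API calls
embed_model = MockEmbedding(embed_dim=1536)  # returns fixed-size dummy vectors

print(llm.complete("hello"))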
4 comments
ValueError: Invalid query mode: hybrid

I am running into an error where I cannot seem to query/retrieve a Vector Store Index with the query mode "hybrid". I believe it may be because I'm just using a simple vector store built from the Document.example() function.

My code:
Plain Text
## setup
@pytest.fixture
def setup() -> dict:

    os.environ["OPENAI_API_KEY"] = str(settings.openai_api_key)

    service_context = ServiceContext.from_defaults(
        embed_model=OpenAIEmbedding(
            model="text-embedding-ada-002"
        ),
        llm=OpenAI(
            model="gpt-3.5-turbo"
        )
    )

    shots = AlphaMatrix(data=DEFAULT_CATEGORIES)

    vector_index = VectorStoreIndex.from_documents(
        [Document.example()]
        , service_context=service_context
    )

    reranker = LLMRerank(service_context=service_context)

    retriever = CustomRetriever(
        index=vector_index,
        llm=service_context.llm,
        reranker=reranker,
        matrix=shots,
        verbose=True,
    )

    return {
        "retriever": retriever,
        "service_context": service_context,
        "vector_index": vector_index,
        "matrix": shots,
    }

#Where the error occurs:
retriever = VectorIndexRetriever(
            index=index,
            vector_store_query_mode="hybrid",
            alpha=default_alpha,
            **kwargs,  # filters, etc, added here
        )

def test_retrieve(setup):

    retriever = setup.get("retriever")
    query = "What are LLMs good at?"
    results = retriever.retrieve(query)


Error:
Plain Text
ValueError: Invalid query mode: hybrid
18 comments
Are there any dummy/mock objects I can use for testing? For example, a dummy vector store that I can test retrievers on?
15 comments
Is there a way to dynamically adjust the alpha parameter of a Hybrid Retriever that has already been created? Or can this only be done at instantiation? (I'm currently digging into the docs to find this answer myself but figured I'd ask in case someone else had encountered this)
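In case it helps while digging: retrievers are thin wrappers around the index, so one workable pattern (a sketch, not necessarily the official answer) is simply to rebuild the retriever with a different alpha rather than mutating an existing one:

Plain Text
def make_hybrid_retriever(index, alpha: float, top_k: int = 5):
    """Build a fresh hybrid retriever with the given dense/sparse weighting."""
    return index.as_retriever(
        vector_store_query_mode="hybrid",
        alpha=alpha,
        similarity_top_k=top_k,
    )

dense_leaning = make_hybrid_retriever(vector_index, alpha=0.8)
sparse_leaning = make_hybrid_retriever(vector_index, alpha=0.2)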
7 comments

Llm

AttributeError: 'LLMPredictor' object has no attribute '_llm'. Did you mean: 'llm'

I am trying to reference the LLM provided to a ServiceContext object, but when I reference it from the service context to perform a simple complete call, I get the following error:

manager.service_context.llm.complete('Hi what is 5+5?') (Manager is just a class I wrote that wraps some LI objects, including a ServiceContext object.)

My service context object: (created in the Manager class where I use type-hints and set a default)
Plain Text
    service_context: ServiceContext = ServiceContext.from_defaults(
        embed_model= AzureOpenAIEmbedding(
            model="text-embedding-ada-002"
            , azure_deployment="text-embedding-ada-002"
            , azure_endpoint=str(settings.azure_openai_api_base)
            , api_version=str(settings.azure_openai_api_version)
            , api_key=str(settings.azure_openai_api_key)
        )
        , llm = AzureOpenAI(
            model="gpt-4"
            , azure_deployment="gpt-4"
            , azure_endpoint=str(settings.azure_openai_api_base)
            , api_version=str(settings.azure_openai_api_version)
            , api_key=str(settings.azure_openai_api_key)
        )
    )


Error:
AttributeError: 'LLMPredictor' object has no attribute '_llm'. Did you mean: 'llm'?

I have been able to reference the LLM from a service context in the past... not sure why all of a sudden it's not working now. I am running llama-index 0.9.39.
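A hedged workaround (not an explanation of the underlying 0.9.39 behavior): keep a direct handle on the AzureOpenAI instance inside the Manager, so simple completions don't have to go through the ServiceContext's LLMPredictor wrapper at all. embed_model and settings below refer to the objects from the snippet above:

Plain Text
llm = AzureOpenAI(
    model="gpt-4",
    azure_deployment="gpt-4",
    azure_endpoint=str(settings.azure_openai_api_base),
    api_version=str(settings.azure_openai_api_version),
    api_key=str(settings.azure_openai_api_key),
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

# Call the LLM directly instead of via service_context.llm
print(llm.complete('Hi what is 5+5?'))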
12 comments
I'm passing a system_prompt kwarg into my ReAct Agent when creating it from tools, but it doesn't seem to have any effect on the agent. When I dig into the source code, I'm having a hard time discerning whether ReAct agents actually accept this kwarg and do anything with it.

Are we able to update the prompt or system prompt of a ReAct agent?
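I can't confirm whether the system_prompt kwarg is wired through for ReAct agents, but on recent versions the prompts are exposed via get_prompts()/update_prompts(), and the ReAct header usually sits under the 'agent_worker:system_prompt' key. A sketch (the key name may differ between versions, so print the keys first):

Plain Text
from llama_index.core import PromptTemplate

prompts = agent.get_prompts()
print(prompts.keys())  # look for something like 'agent_worker:system_prompt'

# Prepend extra instructions to the existing ReAct header instead of replacing it,
# so the required {tool_desc}/{tool_names} template variables stay intact.
current = prompts["agent_worker:system_prompt"].get_template()
agent.update_prompts(
    {"agent_worker:system_prompt": PromptTemplate("Always answer concisely.\n\n" + current)}
)
agent.reset()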
22 comments