HK-RantTest-HarisRashid
Joined September 25, 2024
Not sure why I'm asking a LangChain question here; the LangChain community is probably a bit too slow to respond, but this seems really urgent and no one has any answers to it.

---

In brief, I'm trying to cite the original source of information that is retrieved by a LangChain agent with access to a Tool like SerpAPI. However, I can't do that using callbacks, since Tools are str -> str; I can only access the tool's text output. It would be great if there were a way to tap into the tool's raw output. (SerpAPI is just an example; I want to do this with drug tools, PubMed, medical literature, etc.)

I'll probably have to write my own tool and find a way to inject a callback each time the tool is called. Is there an established pattern for achieving this?
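Something like the sketch below is what I have in mind: wrap the search in a custom tool that stashes the provider's raw structured response on the side, so citation metadata survives even though the agent only ever sees a string. SerpAPI is just the example backend, the class and attribute names are placeholders, and I'm not sure this is the idiomatic way:

Plain Text
from langchain.agents import Tool
from langchain.utilities import SerpAPIWrapper

class CitingSearchTool:
    """Wraps SerpAPI so the raw, structured results survive for citation."""

    def __init__(self):
        self.search = SerpAPIWrapper()
        self.raw_results = []  # one raw payload per tool invocation

    def run(self, query: str) -> str:
        raw = self.search.results(query)   # full JSON payload, incl. source links
        self.raw_results.append(raw)       # keep it around for citations
        return self.search.run(query)      # plain string the agent actually sees

citing_search = CitingSearchTool()
tools = [
    Tool(
        name="Search",
        func=citing_search.run,
        description="Useful for answering questions about current events.",
    )
]
# after agent.run(...), citing_search.raw_results holds the sources to cite

(This does hit SerpAPI twice per call; deriving the string from the raw payload instead would avoid that.)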
9 comments
For a single inference using agent_chain.run( ... ) in LangChain, is there a way to add metadata to the HumanMessage + AIMessage blocks that get persisted to memory?

By metadata I mean a timestamp, etc.

Any pointers?
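The only thing I've come up with so far is a subclass along these lines, relying on the fact that messages carry an additional_kwargs dict; not sure it's the intended way:

Plain Text
from datetime import datetime, timezone
from typing import Any, Dict

from langchain.memory import ConversationBufferMemory

class TimestampedBufferMemory(ConversationBufferMemory):
    """Stamps a timestamp onto the HumanMessage/AIMessage pair it just saved."""

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        super().save_context(inputs, outputs)
        stamp = datetime.now(timezone.utc).isoformat()
        # the last two messages are the HumanMessage/AIMessage just persisted
        for message in self.chat_memory.messages[-2:]:
            message.additional_kwargs["timestamp"] = stamp

memory = TimestampedBufferMemory(memory_key="chat_history", return_messages=True)
# pass `memory` to the agent as usual; each stored message now carries a timestamp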
4 comments
Any ideas on how to speed up the index-creation load time for a SQL database? It seems to take a lot of time doing this at query time, which is not optimal. Is it possible to do it beforehand?

The way I'm doing this seems very naive; I guess I should pre-index it and load it up at inference. Any idea how much this would reduce inference time?
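For context, this is the build-once / load-at-inference pattern I had in mind, assuming the slow part is a regular llama_index index built over data exported from the SQL database (the persist directory is a placeholder):

Plain Text
from llama_index import StorageContext, VectorStoreIndex, load_index_from_storage

# offline: build the index once and persist it to disk
index = VectorStoreIndex.from_documents(documents)   # `documents` loaded beforehand
index.storage_context.persist(persist_dir="./storage")

# at inference: load the prebuilt index instead of rebuilding it per query
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
response = index.as_query_engine().query("example question")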
8 comments
You seem to be playing on both fields (LangChain and LlamaIndex). I'm trying to stream output from an agent that uses ChatOpenAI as an LLM, and send it back as a stream from FastAPI. There's no guide on how to do this. I've found some trails on GitHub on ways to do it with Streamlit, but none with FastAPI.

https://github.com/hwchase17/chat-langchain/issues/39

I suppose most people are building with a FastAPI - LangChain - Next.js stack. Have you come across anything similar? I'm having to dig into the LangChain codebase; any pointers on streaming would be helpful.

I'm not sure why, but it's also throwing:

Plain Text
from langchain.chat_models import ChatOpenAI
from langchain.callbacks.streaming_stdout_final_only import FinalStreamingStdOutCallbackHandler

llm = ChatOpenAI(
    model='gpt-3.5-turbo',
    streaming=True,
    callbacks=[FinalStreamingStdOutCallbackHandler()],
    temperature=0,
)


Plain Text
raise OutputParserException(f"Could not parse LLM output: `{text}`")


It seems like I need to pass in prefix_tokens, but there's literally no detailed guide on how to deal with streaming.
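For reference, the closest I've gotten is the plain-LLM case (no agent, so no final-answer prefix filtering yet), using AsyncIteratorCallbackHandler with FastAPI's StreamingResponse; the endpoint name and query parameter are just placeholders:

Plain Text
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

app = FastAPI()

@app.get("/stream")
async def stream(q: str):
    handler = AsyncIteratorCallbackHandler()
    llm = ChatOpenAI(model="gpt-3.5-turbo", streaming=True, callbacks=[handler], temperature=0)

    async def token_gen():
        # run the generation in the background while draining the token iterator
        task = asyncio.create_task(llm.agenerate([[HumanMessage(content=q)]]))
        async for token in handler.aiter():
            yield token
        await task

    return StreamingResponse(token_gen(), media_type="text/plain")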
19 comments
Hi there,

I just need some quick help / suggestions if you guys can offer any. I'm planning to shift our app's architecture to MongoDB's Vector Search instead of Pinecone, since it's much easier to manage vectors in one location.

Above is a simple script that accesses one database called Indexes; it successfully indexes all the data, but querying it returns an empty response. Is this normal, or am I missing something?
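For comparison, this is roughly the minimal LangChain-side query I'd expect to work against Atlas Vector Search (connection string, db/collection, and index name are placeholders); I suspect an empty result can simply mean index_name or the embedding field doesn't match the Atlas Search index definition:

Plain Text
from pymongo import MongoClient
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import MongoDBAtlasVectorSearch

client = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")
collection = client["Indexes"]["documents"]   # placeholder db / collection names

store = MongoDBAtlasVectorSearch(
    collection=collection,
    embedding=OpenAIEmbeddings(),
    index_name="default",        # must match the Atlas Search index you created
    text_key="text",             # field that stores the raw text
    embedding_key="embedding",   # field that stores the vector
)

print(store.similarity_search("clinical oncology", k=3))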
36 comments
Hi, I have 2 questions.

These questions are in the context of using the SQL Join Query Engine, where I've noticed a pitfall where SQL queries fail.

Q1:
What should I do if the SQL query fails for some reason and throws an exception? Is there an option to make it retry? I've also noticed that the SQL query fails when it doesn't find an exact match between the input and what's in the record for that field. What if we could pull data when it only roughly matches? There could be a typo, but the system should still be able to work out which record to pull.

For example: "Can you tell me if there are any doctors who are experts in clinical oncology?" The query fails because the SQL database stores the value as "Clinical Oncology", which is not an exact match. There must be a way for it to know, before the SQL query is written, which values are possible for certain fields, e.g. "Doctors who have expertise in heart related issues" -> "WHERE expertise LIKE 'Cardiology'".

Q2:
How can I inject my own prompt templates when querying the SQL index? I think for each data source I need to inject a set of sample commands tailored to that data source for it to perform well. Can I do that for each different index?
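On Q2, this is roughly what I'm picturing: a custom text-to-SQL prompt that spells out the allowed values for certain columns, so "heart related issues" gets mapped to something like LIKE 'Cardiology'. I'm assuming the query engine accepts a text_to_sql_prompt argument in my llama_index version, and the table and column names are made up, so please correct me if that's wrong. (For Q1, I suppose a try/except retry loop around query() is the naive fallback.)

Plain Text
from sqlalchemy import create_engine
from llama_index import SQLDatabase
from llama_index.prompts import Prompt
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine

TEXT_TO_SQL_TMPL = (
    "Given an input question, write a syntactically correct {dialect} query.\n"
    "Only use tables and columns from the schema below.\n"
    "The 'expertise' column only contains values such as 'Cardiology', "
    "'Clinical Oncology', 'Neurology'. Map vague phrases to the closest value "
    "and prefer LIKE over = so near-matches still return rows.\n\n"
    "Schema: {schema}\n"
    "Question: {query_str}\n"
    "SQLQuery: "
)

engine = create_engine("sqlite:///doctors.db")                  # placeholder database
sql_database = SQLDatabase(engine, include_tables=["doctors"])
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["doctors"],
    text_to_sql_prompt=Prompt(TEXT_TO_SQL_TMPL),
)
response = query_engine.query("Doctors who have expertise in heart related issues")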
1 comment
We're building an app that transforms the initial data and performs redaction on what kind of data can and cannot be exposed, and the result ends up in an index. This data can come from various sources. In the simplest dummy case, imagine two sources: a SQL database (index) + a list of webpages (index).

I'm trying to put this into a graph structure so I can query heterogeneous data sources, and I'm not sure what the problem is here.

Plain Text
File ~/miniconda3/lib/python3.10/site-packages/llama_index/indices/query/base.py:34, in BaseQueryEngine.retrieve(self, query_bundle)
     33 def retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
---> 34     raise NotImplementedError(
     35         "This query engine does not support retrieve, use query directly"
     36     )

NotImplementedError: This query engine does not support retrieve, use query directly


It's asking me to use query directly, which I can't do without first building a query engine.
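For context, the workaround I'm currently considering is to skip the ComposableGraph and route between per-source query engines instead, which should avoid the retrieve() call on the SQL engine altogether. A rough sketch, with made-up names and descriptions (exact imports probably depend on the llama_index version):

Plain Text
from llama_index.query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector
from llama_index.tools import QueryEngineTool

sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_index.as_query_engine(),   # the SQL (structured) index
    description="Answers questions about the structured records in the SQL database.",
)
web_tool = QueryEngineTool.from_defaults(
    query_engine=web_index.as_query_engine(),   # the webpages (vector) index
    description="Answers questions using the scraped webpage content.",
)

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[sql_tool, web_tool],
)
response = query_engine.query("example question touching either source")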
41 comments
@Logan M What's the best solution you've found for handling this output parsing error, which I believe happens when the model fails to start its final response with "AI: "? This is one of the only failure points that can make a generation fail. I've seen you discuss this before. Any heads up on how I can just get rid of it?

One of the only ways I can get rid of it is by clearing the memory, so the prompt is much cleaner and the model sticks to the original instruction. Maybe we can reinforce the LLM's "belief" by hammering it with "You MUST start your response with AI: ", so this message doesn't just get lost in the prompt and get overlooked by the LLM.

Any other approaches you've seen?
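The closest thing I've found so far is the handle_parsing_errors option on LangChain's AgentExecutor (passing a string makes that string the observation fed back to the model), combined with the "You MUST start with AI:" reminder. A rough sketch, with the tools list left as a placeholder:

Plain Text
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
tools = []  # your existing tools go here

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    handle_parsing_errors="Reminder: you MUST start your final response with 'AI: '",
    verbose=True,
)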
10 comments
Can anyone explain what the id_to_text_map is? Also, I suppose the query vector is the embedding of the prompt given to the index. I'm also thinking this will present a challenge in many cases where the query is complex and needs further breaking down before it reaches the index.

Plain Text
from llama_index import download_loader
import os

PineconeReader = download_loader('PineconeReader')

# id_to_text_map maps each vector ID stored in Pinecone to the text it was
# embedded from; every ID the query can return needs an entry here so the
# reader can rebuild Documents from the matches.
id_to_text_map = {
    "id1": "text blob 1",
    "id2": "text blob 2",
}
# ...
# query_vector is the pre-computed embedding of the query (not the raw text).
query_vector=[n1, n2, n3, ...]

reader = PineconeReader(api_key=api_key, environment="us-west1-gcp")
documents = reader.load_data(
    index_name='quickstart',
    id_to_text_map=id_to_text_map,
    top_k=3,
    vector=query_vector,
    separate_documents=True
)


Can anyone shed some light here? It would be super useful.
10 comments
I have a question. When I'm building the index, everything is fine; the docs explain it really well. But say I want to build a graph-based index over many types of source documents. Let's say:

  • SQL DBs
  • Spreadsheets
  • PDFs
  • Webpages
In my case, I'm ingesting them in one go using llama_index: loading documents -> creating an index -> making a query engine on top -> querying it. That's easy so far.

However, at inference time, when I'm not building the index and just want to connect directly to the Pinecone vector DB and query it, there's no straightforward way. Everywhere the docs mention using load_data from documents, but what if I don't want nodes from local storage, but from a remote index? I think I'm missing a key conceptual piece of llama_index. How do I make it work?
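To make it concrete, this is the kind of thing I'm hoping exists: connect straight to the already-populated Pinecone index and build the index object from the vector store, without loading documents. The index name and environment are placeholders, and I'm guessing at from_vector_store, so correct me if that's not the right call:

Plain Text
import os
import pinecone
from llama_index import VectorStoreIndex
from llama_index.vector_stores import PineconeVectorStore

pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-west1-gcp")
pinecone_index = pinecone.Index("quickstart")   # the existing, already-populated index

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

query_engine = index.as_query_engine()
response = query_engine.query("example question over the remote index")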
10 comments