Find answers from the community

What approach would you use to extract table data from PDF files that contain text, images, and other elements? Is there any app out there doing this?
1 comment
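One option from this ecosystem is LlamaParse, which handles PDFs mixing text, images, and tables. A minimal sketch, assuming a LLAMA_CLOUD_API_KEY is set in the environment and a hypothetical input file:
Python
from llama_parse import LlamaParse

# result_type="markdown" makes tables come back as markdown tables
parser = LlamaParse(result_type="markdown")
documents = parser.load_data("./report_with_tables.pdf")  # hypothetical file
print(documents[0].text[:500])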
Azure OpenAI Embedding not working for me
14 comments
Hi everyone, I created a knowledge graph index and saved it using the SimpleGraphStore and StorageContext, but I can't figure out how to load it from file and create a retriever. I couldn't find anything in the docs. Would love any help on this! 🙂
2 comments
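A minimal sketch, assuming the index was persisted earlier with index.storage_context.persist(persist_dir="./storage"):
Python
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild the storage context from the persisted directory, then reload
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
retriever = index.as_retriever()
nodes = retriever.retrieve("my query")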
@Logan M is there a private Go adaptation of LlamaIndex that you know of and would recommend?
2 comments
I am having a problem with saving the vector store to the database. I have attached the class I worked on. Could you help me improve it?
2 comments
I'm using SQLAutoVectorQueryEngine and am curious to know if there is a supported method to return in-line citations with this implementation. CitationQueryEngine seems to be standalone and separate from SQLAutoVectorQueryEngine and unable to be integrated with this module. But I could be wrong. Also, if in-line citation support does not exist for this module, what would be a recommended approach to provide in-line citations with functionality similar to SQLAutoVectorQueryEngine? Break this into separate workflows? Any assistance would be appreciated. Thanks!
2 comments
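One untested approach: since SQLAutoVectorQueryEngine takes its vector side as a QueryEngineTool, the CitationQueryEngine could be wrapped as that tool, so vector-path answers carry [n]-style citations (the SQL path would still be uncited). A hedged sketch, assuming vector_index and sql_query_engine already exist:
Python
from llama_index.core.query_engine import CitationQueryEngine, SQLAutoVectorQueryEngine
from llama_index.core.tools import QueryEngineTool

citation_engine = CitationQueryEngine.from_args(vector_index, similarity_top_k=3)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=citation_engine,
    description="Semantic search over the documents; answers carry [n] citations",
)
sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    description="Translates questions into SQL over the tables",
)
query_engine = SQLAutoVectorQueryEngine(sql_tool, vector_tool)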
I have a slight problem trying to process a document in AWS S3. I am able to upload it, but the S3 reader is unable to work with the file I'm sending it. I'm getting this error:
Plain Text
Failed to load file ncl-staging/organizations/c28bd98c-5a1c-4bad-94be-718be1a32ec9/documents/knowledge.pdf with error: The input file_path must be a string or a list of strings.. Skipping...

I tried setting it up in a local environment and tested it out there and it worked, but for some reason it fails in production.
Here is the code:
Plain Text
# Process document
loader = S3Reader(
    bucket=bucket_name,
    key=object_key,
    aws_access_id=settings.AWS_ACCESS_KEY_ID,
    aws_access_secret=settings.AWS_SECRET_ACCESS_KEY,
    s3_endpoint_url=get_s3_endpoint(settings.AWS_REGION),
    file_extractor=self.file_extractors,
)
llama_documents = await loader.aload_data()

if not llama_documents:
    raise ValueError("No documents were processed from the input file")
6 comments
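The error suggests that whatever reaches the underlying file loader isn't a plain string, so logging type(object_key) in production is worth a check. As a workaround sketch to isolate whether S3Reader itself is the problem, the object can be downloaded with boto3 and read locally (bucket_name, object_key, and self.file_extractors are assumed from the code above):
Python
import os
import tempfile

import boto3
from llama_index.core import SimpleDirectoryReader

s3 = boto3.client("s3")
with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, os.path.basename(object_key))
    s3.download_file(bucket_name, object_key, local_path)
    llama_documents = SimpleDirectoryReader(
        input_files=[local_path], file_extractor=self.file_extractors
    ).load_data()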
I'm using the LlamaCloudIndex with similarity_top_k=3, I noticed that when no match is found the source_nodes contain one chunk only and that chunk is the whole document. Is that expected behaviour? I'm surprised as I thought there would always be source nodes as we're looking at top similarities, i.e. it would return the most similar nodes even though they might not be similar at all.
5 comments
I am making a RAG app using Milvus DB and Ollama's LLMs. I have an API with these functions: /upload (takes a PDF and indexes it in Milvus) and /query (searches a particular DB). I want to include chat history functionality as well; how do I achieve this?
4 comments
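A minimal sketch, assuming index is the Milvus-backed VectorStoreIndex: a ChatMemoryBuffer holds the running conversation, with one buffer per user/session persisted between requests.
Python
from llama_index.core.memory import ChatMemoryBuffer

# One memory buffer per user/session; store its state between API calls
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
response = chat_engine.chat("What does the uploaded PDF say about pricing?")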
Hello! Is there a way to log the actual prompt that was sent to the LLM? I.e. including chunks from the vector store etc?
2 comments
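The "simple" global handler does this: it prints every LLM input and output to stdout, including the final prompt with the retrieved chunks stitched in.
Python
import llama_index.core

# Prints each LLM call's full prompt (with retrieved context) and response
llama_index.core.set_global_handler("simple")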
Hi, I'm looking for something I can use to construct a pipeline where the LLM constructs SQL queries and executes them, saves the results into in-memory pandas dataframes, and then constructs and executes code to produce new dataframe(s) or results from the generated dataframes.

For example:
"Calculate net income by month for 2024 and show me the months where I lost money along with the loss"
  • LLM constructs a plan for two sql queries to get revenue by month, expense by month
  • Execute queries and save into two separate dataframes
  • LLM can construct Pandas code, given the two dataframes, to join them and get the loss for each month
  • Execute the code, get the result dataframe
Is this possible with LlamaIndex?
1 comment
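Yes, one hedged way to sketch this is a ReAct agent with two function tools sharing an in-memory dict of dataframes: one tool materializes SQL results, the other runs pandas code over them. The database file is hypothetical, Settings.llm is assumed configured, and eval of LLM-generated code is unsafe outside a sandbox:
Python
import sqlite3

import pandas as pd
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool

conn = sqlite3.connect("finance.db")  # hypothetical database
frames: dict[str, pd.DataFrame] = {}  # shared in-memory dataframe store

def run_sql(name: str, query: str) -> str:
    """Run a SQL query and store the result as dataframe `name`."""
    frames[name] = pd.read_sql_query(query, conn)
    return f"{name}: {len(frames[name])} rows, columns={list(frames[name].columns)}"

def run_pandas(code: str) -> str:
    """Evaluate a pandas expression over the stored dataframes."""
    # NOTE: eval of LLM-generated code is unsafe outside a sandbox
    return str(eval(code, {"pd": pd}, dict(frames)))

agent = ReActAgent.from_tools(
    [FunctionTool.from_defaults(fn=run_sql), FunctionTool.from_defaults(fn=run_pandas)],
    verbose=True,  # uses Settings.llm
)
agent.chat("Calculate net income by month for 2024 and show months with a loss.")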
Hello devs, I'm a bit lost on how to use the Supabase VectorStore. I need to process the document once at upload, and then use the vector store to answer questions. Most of the examples didn't explain how this can be done.
Before, I was creating an embeddings table, saving the embedded document into it, and using SQL similarity search inside Supabase.
5 comments
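A minimal sketch of the two paths, assuming documents comes from the upload handler and the connection string is yours: index once at upload time, then reattach to the same store for querying without re-processing anything.
Python
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.supabase import SupabaseVectorStore

vector_store = SupabaseVectorStore(
    postgres_connection_string="postgresql://...",  # your Supabase connection string
    collection_name="documents",
)

# Upload path: embed and store once
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Query path (later, separate process): reattach to the existing store
index = VectorStoreIndex.from_vector_store(vector_store)
answer = index.as_query_engine().query("...")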
It looks like the BASE_NODE_LABEL was introduced a couple of months after that blog post was written, in this commit:
https://github.com/run-llama/llama_index/commit/77bd4c3fc6db725ffe04dbf778b1d7a3f9e63baa

I see some logic for excluding both the __Entity__ and __Node__ labels in some queries but I still see them when I run:
Plain Text
CALL db.schema.visualization()


Should I just expect these base labels of __Entity__ and __Node__ are meant to be there and filter them out in my queries?
3 comments
I'm building a multi-agent app that has a financial data agent, web research agent, report writing agent, etc., each with access to a set of tools. The handoff/transfer between these agents is not deterministic and the LLM should be making the calls (open to adding a supervisor agent if necessary to make it work). Would love some guidance on the best way to achieve this with llama_index.
1 comment
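In recent llama_index versions, AgentWorkflow is built for exactly this: handoff is LLM-driven, and can_handoff_to constrains the transfer graph. A hedged sketch, with finance_tools and research_tools assumed to exist:
Python
from llama_index.core.agent.workflow import AgentWorkflow, FunctionAgent

finance_agent = FunctionAgent(
    name="finance",
    description="Answers financial data questions",
    system_prompt="You analyze financial data.",
    tools=finance_tools,
    can_handoff_to=["research", "writer"],
)
research_agent = FunctionAgent(
    name="research",
    description="Does web research",
    system_prompt="You research topics on the web.",
    tools=research_tools,
    can_handoff_to=["writer"],
)
writer_agent = FunctionAgent(
    name="writer",
    description="Writes the final report",
    system_prompt="You write the report.",
    tools=[],
    can_handoff_to=["finance", "research"],
)
workflow = AgentWorkflow(
    agents=[finance_agent, research_agent, writer_agent],
    root_agent="finance",
)
response = await workflow.run(user_msg="Report on Q3 performance")  # inside async code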
Hi, it seems like I can use either Agents (FnAgentWorker / OpenAIAgent, etc.) or Workflows. I'm a bit confused about what would make sense for me since they are at different levels of abstraction. Is there a llama_index-recommended place to start?
1 comment
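For orientation: Workflows are the lower-level, event-driven building block; the agent classes are higher-level abstractions on top. A minimal Workflow sketch, to show the shape of the API:
Python
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class AnswerFlow(Workflow):
    @step
    async def answer(self, ev: StartEvent) -> StopEvent:
        # retrieval / LLM calls would go here
        return StopEvent(result=f"answered: {ev.query}")

result = await AnswerFlow(timeout=60).run(query="hello")  # inside async code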
I’m a blockchain & front-end developer with expertise in smart contracts, decentralized applications (dApps), and NFTs, looking for exciting projects to collaborate on. I’m passionate about driving innovation and delivering impactful, high-quality solutions.

Key Skills:

Frontend: React, Next.js, Vue.js, TypeScript
Smart Contracts: DeFi platforms (Uniswap, PancakeSwap), staking solutions, NFT marketplaces, token launchpads
Trading Bots: Ethereum, Solana, Binance Smart Chain
dApp Development: Extensive experience building secure and scalable dApps across multiple protocols like Ethereum, Polkadot, and Solana, with seamless integration into the frontend
Blockchain Infrastructure: Design and deployment of custom blockchain networks, layer 2 scaling, and decentralized infrastructure solutions
Blockchain Games: Development of NFT-based games and play-to-earn systems with fair and transparent smart contracts

If you're looking to build cutting-edge blockchain solutions, let’s connect and discuss how I can contribute to your project. Reach out anytime!👍👍👍
1 comment
I am following this example: https://github.com/run-llama/llama_parse/blob/main/examples/demo_advanced_weaviate.ipynb
I am using Poetry and got this error when trying to install FlagEmbedding:
Plain Text
Because flagembedding (1.3.2) @ git+https://github.com/FlagOpen/FlagEmbedding.git@HEAD depends on datasets (2.19.0)
 and project-1 depends on datasets (^3.1.0), flagembedding is forbidden.
So, because project-1 depends on flagembedding (1.3.2) @ git+https://github.com/FlagOpen/FlagEmbedding.git, version solving failed.
2 comments
Hi everyone,
I’m working with CSV files and exploring the best way to generate and save embeddings for them. I noticed that PagedCSVReader creates one embedding per row, which can be time-consuming for large files.

Could you recommend a more efficient approach to generate embeddings while maintaining accuracy for Retrieval-Augmented Generation (RAG)? I'm looking for something that balances embedding granularity and performance, especially for structured tabular data.

Thanks in advance for your insights!
9 comments
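One common approach is to batch N rows per Document instead of one per row, trading retrieval granularity for far fewer embedding calls. A minimal sketch with a hypothetical file:
Python
import pandas as pd
from llama_index.core import Document

df = pd.read_csv("data.csv")  # hypothetical file
rows_per_chunk = 50  # tune: larger = fewer embeddings, coarser retrieval
documents = [
    Document(text=df.iloc[i : i + rows_per_chunk].to_csv(index=False))
    for i in range(0, len(df), rows_per_chunk)
]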
Hi all, I'm using CitationQueryEngine to provide citations in my responses. On my UI, I want to be able to hover over a citation (e.g. [2]) and display links to the relevant part of each source.

Using response.source_nodes, what is the best way to know which source belongs to which citation? The text field contains the citation number (e.g. "Source 2: blah blah"), but I would have to parse this to extract the citation number.

I can't see any other field that just has the citation number itself. Is there one?
7 comments
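As far as I know there is no dedicated field, so parsing the "Source N:" prefix that CitationQueryEngine writes into each node's text is the practical route. A small sketch, assuming response is the query response:
Python
import re

citations = {}
for sn in response.source_nodes:
    m = re.match(r"Source (\d+):", sn.node.get_text())
    if m:
        citations[int(m.group(1))] = sn  # citation number -> source node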
@Logan M can I get the currently available context size, the memory used, and the max tokens used at runtime, so that when it approaches the limit I can reset the variables and the chat engine before it hits the limit and breaks with an error?
1 comment
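TokenCountingHandler tracks token usage at runtime, which can drive a reset. A sketch, with the tokenizer model and threshold as assumptions to match your own setup:
Python
import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode  # match your model
)
Settings.callback_manager = CallbackManager([token_counter])

# ...after some chat turns:
if token_counter.total_llm_token_count > 100_000:  # hypothetical threshold
    token_counter.reset_counts()
    # reset your memory / chat engine here as well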
When installing "pip install llama-index-embeddings-huggingface", I ran into version conflict issues. I tried installing a package version <0.1.3 as well and ran into another issue, not sure why. I'm just trying to do https://github.com/run-llama/python-agents-tutorial/blob/main/3_rag_agent.py with non-OpenAI embeddings, as I am running LlamaIndex locally. Can anyone guide me?
12 comments
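Installing into a fresh virtualenv alongside a current llama-index usually avoids the pin conflict. Then a local model can be swapped in globally; a minimal sketch (the model downloads from the HF hub on first use):
Python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Runs fully locally, no OpenAI key needed
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")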
Hey gang! I've been using LlamaParse for the past couple weeks, loving it so far

I want to be able to get the page number for each of the text chunks I store in the DB for later reference. I saw that it's in the JSON object when I do it through the web sandbox; is it possible with the library? If not, is it planned to be launched?
2 comments
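The library's JSON result mode appears to expose per-page objects, so page numbers can be attached to chunks yourself. A hedged sketch with a hypothetical file:
Python
from llama_parse import LlamaParse

parser = LlamaParse()
json_results = parser.get_json_result("my_file.pdf")  # hypothetical file
for page in json_results[0]["pages"]:
    print(page["page"], page["text"][:80])  # page number + start of its text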
Hi all, I used PagedCSVReader along with ChromaDB, and when I query with similarity_top_k of 10, none of the returned documents are relevant to the query, even though I'm directly specifying important keywords in the query. What can I do to improve RAG with CSV data?
10 comments
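Since the queries lean on exact keywords, a hybrid of BM25 (keyword) and vector retrieval often helps. A hedged sketch, assuming nodes and the Chroma-backed index already exist (needs the llama-index-retrievers-bm25 package):
Python
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.retrievers.bm25 import BM25Retriever

vector_retriever = index.as_retriever(similarity_top_k=10)
bm25_retriever = BM25Retriever.from_defaults(nodes=nodes, similarity_top_k=10)
retriever = QueryFusionRetriever(
    [vector_retriever, bm25_retriever],
    similarity_top_k=10,
    num_queries=1,  # disable LLM query generation, fuse the two result lists
)
results = retriever.retrieve("important keyword query")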
Hello, we have a little problem with LlamaIndex. When we try to load a PDF file into a database (Postgres on Neon) with Mistral's embed model, we get an error message about going over the token limit. We tried splitting the document at every page and using the TokenTextSplitter with no good result. The only "solution" was to set the insert_batch_size parameter lower (21 max), but that should only have an impact on the DB, not on the embed model, right? 😅
3 comments
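Not quite: insert_batch_size controls how many nodes are embedded and written per insert pass, so lowering it likely reduced the embedding volume per pass as a side effect. The knob that directly bounds tokens per embedding request is embed_batch_size on the embed model itself. A hedged sketch combining that with smaller chunks:
Python
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.mistralai import MistralAIEmbedding

# Smaller chunks and a smaller embed batch keep each API request under the limit
Settings.embed_model = MistralAIEmbedding(model_name="mistral-embed", embed_batch_size=8)
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)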
I'm trying to use BM25Retriever w/ Chroma and this is the documentation that I follow: https://docs.llamaindex.ai/en/stable/examples/retrievers/bm25_retriever/#hybrid-retriever-with-bm25-chroma
When I try to execute this line: index = VectorStoreIndex(nodes=nodes, storage_context=storage_context), my jupyter lab kernel always dies and restarts.
I'm only using a very small PDF so there shouldn't be memory overflow issues.
Do you have any ideas of what's going on? Thanks!
6 comments