Sort of a general RAG question (using llama-index) for anyone. Say you have some sample text data:
I have a corpus of documents that I have broken down into chunks. Each chunk is about 20 sentences long. I also chunked these documents with a sliding window to maintain context. I used the OpenAI embeddings model to create a vector for each chunk of text. Currently, when the user submits a query, the app will embed this query, perform a semantic search against the vector database, then provide the GPT model the top 10 chunks of text with the user query, and GPT then provides an answer to the query.
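For concreteness, the current setup looks roughly like this in llama-index (a minimal sketch, assuming the newer `llama_index.core` package layout; model names, chunk sizes, and the corpus path are placeholders for the "20 sentences + sliding window" scheme):

```python
# Sketch of the pipeline described above, assuming llama-index >= 0.10 imports.
# Model names, chunk sizes, and the corpus path are placeholders.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Fixed-size chunks with overlap approximate the sliding-window chunking.
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-4o-mini")
Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)

documents = SimpleDirectoryReader("./corpus").load_data()
index = VectorStoreIndex.from_documents(documents)

# Embed the query, retrieve the top 10 chunks, and let the LLM answer from them.
query_engine = index.as_query_engine(similarity_top_k=10)
print(query_engine.query("What does the corpus say about X?"))
```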
You can use embeddings w/ llama-index's tools to semantically split this into different chunks (there's a sketch after the example chunks below).
Let's say that returns you 3 chunks:
Chunk 1: I have a corpus of documents that I have broken down into chunks. Each chunk is about 20 sentences long. I also chunked these documents with a sliding window to maintain context.
Chunk 2: I used the OpenAI embeddings model to create a vector for each chunk of text.
Chunk 3: Currently, when the user submits a query, the app will embed this query, perform a semantic search against the vector database, then provide the GPT model the top 10 chunks of text with the user query, and GPT then provides an answer to the query.
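Something like llama-index's `SemanticSplitterNodeParser` can produce that kind of split (a sketch; whether you actually get 3 chunks back depends on the breakpoint threshold and the embedding model):

```python
# Sketch of semantic chunking with llama-index's SemanticSplitterNodeParser.
# The breakpoint threshold controls how many chunks come back, so 3 isn't guaranteed.
from llama_index.core import Document
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

paragraph = (
    "I have a corpus of documents that I have broken down into chunks. "
    "Each chunk is about 20 sentences long. I also chunked these documents "
    "with a sliding window to maintain context. I used the OpenAI embeddings "
    "model to create a vector for each chunk of text. Currently, when the "
    "user submits a query, the app will embed this query, perform a semantic "
    "search against the vector database, then provide the GPT model the top "
    "10 chunks of text with the user query, and GPT then provides an answer "
    "to the query."
)

splitter = SemanticSplitterNodeParser(
    buffer_size=1,
    breakpoint_percentile_threshold=95,
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
)
nodes = splitter.get_nodes_from_documents([Document(text=paragraph)])
for i, node in enumerate(nodes, 1):
    print(f"Chunk {i}: {node.get_content()}")
```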
What I'm wondering is basically: what does the tradeoff look like for these smaller semantic chunks as opposed to one large chunk?
In my head, if you store that initial paragraph as 1 vector vs. 3 vectors (1 for each chunk), your retrieval ability should be higher with the second approach. Each vector, to me, will be less 'diluted' in terms of info. But what happens when the information in 1 semantic unit depends on the previous one? For example, if chunk 2 only makes sense after reading chunk 1, are you SOL?
I guess I can't seem to figure out (either mathematically or logically) what that tradeoff looks like in terms of IR accuracy.
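To make the dilution intuition concrete, one toy check is to score a query against a single embedding of the whole paragraph vs. the best-scoring of the three per-chunk embeddings (the query string is a made-up placeholder and the exact scores will vary by embedding model):

```python
# Toy illustration of the "dilution" question: one big-chunk embedding vs. the
# best of three per-chunk embeddings. Query and model are placeholders.
import numpy as np
from llama_index.embeddings.openai import OpenAIEmbedding

chunk_1 = (
    "I have a corpus of documents that I have broken down into chunks. "
    "Each chunk is about 20 sentences long. I also chunked these documents "
    "with a sliding window to maintain context."
)
chunk_2 = "I used the OpenAI embeddings model to create a vector for each chunk of text."
chunk_3 = (
    "Currently, when the user submits a query, the app will embed this query, "
    "perform a semantic search against the vector database, then provide the "
    "GPT model the top 10 chunks of text with the user query, and GPT then "
    "provides an answer to the query."
)

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
query = "Which embedding model is used to vectorize the chunks?"

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q_vec = embed_model.get_query_embedding(query)
big_score = cosine(q_vec, embed_model.get_text_embedding(" ".join([chunk_1, chunk_2, chunk_3])))
small_scores = [cosine(q_vec, embed_model.get_text_embedding(c)) for c in (chunk_1, chunk_2, chunk_3)]

print(f"one big vector:    {big_score:.3f}")
print(f"best small vector: {max(small_scores):.3f}")
```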