hey guys im trying to load in my slack

hey guys, I'm trying to load in my Slack channel but I keep getting this error:

'TextNode' object has no attribute 'get_doc_id'. Did you mean: 'ref_doc_id'?

Logan M

from_documents is for documents 👍

Try this

index = VectorStoreIndex(nodes, storage_context=storage_context)

Steven

now i get this:

tenacity.RetryError: RetryError[<Future at 0x105e41420 state=finished raised AuthenticationError>]

@Logan M

Logan M

Are you running in a notebook? API keys have been weird lately with LangChain LLMs

In addition to an env variable, try setting it directly on the module

import openai

openai.api_key = "sk...."
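
A rough sketch of the same idea, in case it helps: check that a key is actually available before building the index, so a missing key fails with a clear message instead of a RetryError. The helper name `resolve_openai_key` is made up for illustration; `OPENAI_API_KEY` is the environment variable the openai package reads.

```python
import os


def resolve_openai_key(explicit_key=None, env=os.environ):
    """Return an API key, preferring an explicit value over the environment.

    Hypothetical helper for illustration; not part of openai or llama_index.
    """
    key = explicit_key or env.get("OPENAI_API_KEY")
    if not key or not key.strip():
        raise RuntimeError(
            "No OpenAI API key found; set OPENAI_API_KEY or pass one explicitly."
        )
    return key.strip()


# Assumed usage, before building the index:
# import openai
# openai.api_key = resolve_openai_key()
```

This only checks that a key is present. A key that is present but invalid would still raise an authentication error on the first API call.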

Steven

so weird

Steven

thank u sm for ur help @Logan M

Steven

I'm having a lot of trouble understanding a few things:

Weaviate: why would I need to use Weaviate? Does it just simply offer cloud storage for indexes created by LlamaIndex?
What is the structure / file format that the loaders load the documents in? Surely it's not just pure text? That is what it seems like, but I don't get the point of using these loaders if it's just text. What is so great about them?

Steven

is there just like one still up-to-date tutorial on how to most effectively load data in, and most effectively query it?

Logan M

Weaviate (and other vector store integrations) are mostly needed when either a) your index is very large and the default in-memory solution starts to slow down, or b) yes, you want hosted storage

It really is just text, but with some extra sugar on top (i.e. metadata) that some loaders set up for you. Really, you can skip loaders and create your own Document objects if you want. One thing to keep in mind is that documents are broken into chunks when put inside an index, and those chunks/nodes inherit the metadata of the source document
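
To make the "chunks inherit metadata" point concrete, here is a toy sketch in plain Python. It is not LlamaIndex's actual splitter (the real one is more involved than a fixed character window); the `Doc` and `Chunk` classes and `split_into_chunks` are made-up names that only mirror the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a loaded document: text plus loader-provided metadata."""
    text: str
    metadata: dict = field(default_factory=dict)
    doc_id: str = "doc-0"


@dataclass
class Chunk:
    """Stand-in for a node: a slice of text that remembers its source."""
    text: str
    metadata: dict
    ref_doc_id: str


def split_into_chunks(doc, chunk_size=20):
    """Cut the text into fixed-size character windows (simplified on purpose)."""
    chunks = []
    for start in range(0, len(doc.text), chunk_size):
        chunks.append(
            Chunk(
                text=doc.text[start:start + chunk_size],
                metadata=dict(doc.metadata),  # each chunk gets a copy of the source metadata
                ref_doc_id=doc.doc_id,        # and a pointer back to the source document
            )
        )
    return chunks


doc = Doc(
    text="Deploy finished at noon. Rollback is not needed today.",
    metadata={"source": "slack", "channel": "general"},
    doc_id="slack-general",
)
chunks = split_into_chunks(doc, chunk_size=20)
```

Every chunk carries the same `source` and `channel` values as the document it came from, which is what lets you filter or cite by metadata later.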

Not sure what you mean by most effective. A lot depends on your data and use case. But in general, tossing a bunch of documents into an index and querying it will get you pretty far. If you have a lot of documents, you can adjust the top k, or consider more complex query engines with multiple indexes (router query engines, sub question query engines, SQL query engines, agents, etc.)
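
For anyone unsure what "top k" means here, a minimal sketch of the idea: score every chunk against the query embedding and keep the k best. The vectors below are invented toy numbers, and `cosine` and `top_k` are illustrative helpers, not LlamaIndex functions. In LlamaIndex itself the equivalent knob is the `similarity_top_k` argument, e.g. `index.as_query_engine(similarity_top_k=5)`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy embeddings, made up for illustration only.
chunk_vecs = {
    "deploy notes": [0.9, 0.1, 0.0],
    "lunch plans": [0.0, 0.2, 0.9],
    "rollback steps": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

print(top_k(query, chunk_vecs, k=2))  # ['deploy notes', 'rollback steps']
```

Raising k gives the LLM more context to work with at the cost of more tokens per query; lowering it does the opposite.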

Steven

do you have any really comprehensive videos / tutorials I can follow?

Steven

almost every single one is either broken, no longer relevant, or so simple that it doesn't really explain anything of use

Logan M

No video for this yet really. Maybe the closest is this?

https://youtu.be/GT_Lsj3xj1o

If any of our official docs are broken, please let me know. We do our best to maintain anything in our docs, but things also move fast.

There is a wealth of knowledge in the docs on these subjects though. Some examples may be simple, but I think they do a good job of getting across how to leverage certain features

Here are a few notebooks I personally find helpful for reference

https://gpt-index.readthedocs.io/en/latest/examples/query_engine/RouterQueryEngine.html

https://gpt-index.readthedocs.io/en/latest/examples/query_engine/sub_question_query_engine.html

https://gpt-index.readthedocs.io/en/latest/examples/agent/openai_agent_with_query_engine.html

Logan M

Tbh though u should note the docs update when the source code updates 😅

Maybe use the docs that are pinned to a release version, biiiig updates coming tomorrow

Logan M

https://gpt-index.readthedocs.io/en/v0.6.38.post1/index.html

Steven

thank u so much!!
