Find answers from the community

Updated 2 years ago

Hi All I need help as I am really

At a glance

The community member is struggling to get Streamlit's file_uploader to work with llamaIndex. They have encountered several errors, including a ValidationError related to the text_splitter parameter and an AttributeError related to the get_content() method of the Document object.

The community members have tried various solutions, such as using the default settings for SimpleNodeParser and removing the square brackets around the docs list. However, they are still encountering issues.

The community members have shared their code on a Gist and requested help from another community member, @Emanuel Ferreira, who has agreed to take a look and provide assistance. They have also discussed the possibility of using a LangChain loader to format the documents correctly.

There is no explicitly marked answer in the comments, but the community members are working together to find a solution to the issue.

Hi All, I need help as I am really struggling to get Streamlit's file_uploader to work with LlamaIndex.
Plain Text
def configure_qa_chain(uploaded_files):
    docs = []
    with tempfile.TemporaryDirectory() as temp_dir:
        for file in uploaded_files:
            with tempfile.NamedTemporaryFile(delete=False, dir=temp_dir) as tmp_file:
                tmp_file.write(file.getvalue())
                tmp_file_path = tmp_file.name
                docs.append(tmp_file_path)
        loader = PyPDFLoader(file_path=tmp_file_path)
        docs = loader.load()

    nodes = SimpleNodeParser(text_splitter=" ").get_nodes_from_documents([docs])

    MONGO_URI = os.environ["MONGO_URI"]
    MONGODB_DATABASE = "LlamaIndex_Chat"
    docstore = MongoDocumentStore.from_uri(uri=MONGO_URI)
    docstore.add_documents(nodes)
    storage_context = StorageContext.from_defaults(
        docstore=MongoDocumentStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE),
        index_store=MongoIndexStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE),
    )
    docstore = MongoDocumentStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE)
    nodes = list(docstore.docs.values())

    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    llm = ChatOpenAI(
        model_name="gpt-3.5-turbo",
        temperature=0,
        openai_api_key=openai.api_key,
        streaming=True,
    )
    llm_predictor = LLMPredictor(llm=llm)
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

    list_index = GPTListIndex(nodes, storage_context=storage_context, service_context=service_context)
    vector_index = GPTVectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)
    keyword_table_index = GPTSimpleKeywordTableIndex(nodes, storage_context=storage_context, service_context=service_context)

    vector_response = vector_index.as_query_engine().query(user_query)
    return vector_response
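A minimal, self-contained sketch of the temp-file step in the function above, using io.BytesIO as a stand-in for Streamlit's UploadedFile (both expose getvalue() returning bytes), so it can run without Streamlit:

```python
import io
import os
import tempfile

# Stand-in for Streamlit's UploadedFile: both expose getvalue() -> bytes.
uploaded_files = [io.BytesIO(b"%PDF-1.4 fake pdf bytes")]

paths = []
with tempfile.TemporaryDirectory() as temp_dir:
    for file in uploaded_files:
        # delete=False keeps the file on disk after close, so a loader
        # such as PyPDFLoader can reopen it by path.
        with tempfile.NamedTemporaryFile(delete=False, dir=temp_dir) as tmp_file:
            tmp_file.write(file.getvalue())
            paths.append(tmp_file.name)
    # The paths are only valid while the TemporaryDirectory is alive;
    # everything under temp_dir is removed when the with-block exits.
    assert all(os.path.exists(p) for p in paths)
```

Note that any loading (PyPDFLoader etc.) has to happen inside the TemporaryDirectory block, because the files are deleted on exit.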
35 comments
What's the error?
ValidationError: 1 validation error for SimpleNodeParser text_splitter value is not a valid dict (type=type_error.dict)
nodes = SimpleNodeParser(text_splitter=" ") .get_nodes_from_documents([docs])

text splitter can't be a space, it has to be an actual object 👀

You can likely just use the defaults

nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents([docs])
Thanks @Logan M, tried it but it's now yielding AttributeError: 'list' object has no attribute 'get_content'
whoops one more error


Plain Text
nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents(docs)
No need for square brackets, it's already a list
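A tiny illustration of why the extra square brackets break things, using a stub Document class (not the real llama_index one) that has the same get_content() method:

```python
# Stub with the same get_content() method the real llama_index Document has;
# this only illustrates the list-of-lists mistake, not the real API.
class Document:
    def __init__(self, text):
        self.text = text

    def get_content(self):
        return self.text

docs = [Document("page 1"), Document("page 2")]  # loader.load() returns a list

# Wrapping the list in brackets hands the parser a list of lists, so it
# ends up calling get_content() on the inner list -> AttributeError.
wrapped = [docs]
assert isinstance(wrapped[0], list) and not hasattr(wrapped[0], "get_content")

# Passing docs directly gives the parser Document objects, as expected.
contents = [d.get_content() for d in docs]
```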
I guess something else might be going wrong; it's now yielding another error: AttributeError: 'Document' object has no attribute 'get_content'
I have been struggling with Streamlit's file_uploader ever since, as I plan to query my own documentation not via a path but by uploading files from wherever they are located. I understand the Path approach is too static and cumbersome.
I'd really appreciate support here, or any existing code snippet that uploads documents via Streamlit and then feeds them into nodes, storage_context, service_context, and the like.
Hmm that error is kind of wild, the Document object definitely has a get_content() method lol
How shall I go about it? I have been struggling with this for ages and I can't get it to work. Any suggestions on the path forward are very welcome, please.
which version are you using?
llama-index 0.8.16
can you provide us a repo with a reproducible version? will facilitate a lot to help you
Honestly @Emanuel Ferreira, at the stage I'm at now it is just the .py file that I can share with you. I haven't put it in a repo on GitHub yet. If that's fine, I can send the file across just now or DM you.
@Emanuel Ferreira this is my very first time doing this, hope it works fine. There it goes: https://gist.github.com/achilela/5fcd85d690cfe4680a27e6f99f4bc226
Please let me know if you were able to access it
Really appreciate @Emanuel Ferreira
can you send me the requirements file?
pip freeze > requirements.txt
then I can have the same versions/packages as you
Not sure if I did it correctly: pip freeze > requirements.txt ~/LLMWorkshop/ExperimentalLama_QA_Retrieval/llamaIndex_streamlit_chat.py
I see you aren't using a venv, so it gets all your packages (even the ones which aren't in the project)
but that's ok, I can find a way here
Really appreciate @Emanuel Ferreira
Hi @Emanuel Ferreira, I don't mean to push, I was just wondering whether there are any updates on the uploaded file.
Hi! I'll try to take a look at it today
@Ataliba Miguel Your issue is that the document coming from the PDF doesn't follow the format that the node parser expects
Plain Text
loader = PyPDFLoader(file_path=tmp_file_path)
docs = loader.load()

then sending your docs through something like this can solve it:
Plain Text
from llama_index import Document

def parse_document(docs):
    documents = []
    for document in docs:
        # Manually create a new metadata dictionary and exclude specific keys
        documents.append(
            Document(
                text=document.page_content if hasattr(document, 'page_content') else "",
                id_=document.metadata.get("id", None),  # id_ is the LlamaIndex Document field name
                metadata=document.metadata,
            )
        )

    return documents
I didn't run all the code until it worked, because there are a lot of moving parts and envs, but you can definitely go to the next steps with that
I'll mention @Logan M in case he wants to correct me or add something
@Ataliba Miguel

Your issue is that it's a LangChain document

Easier than that, @Logan M suggested using the built-in LangChain conversion
Plain Text
from llama_index import Document

document = Document.from_langchain_format(langchain_document)

so you can go through your docs array and convert each one to a LlamaIndex document
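The shape of that conversion, sketched with stand-in classes so it runs anywhere (the real call is llama_index's Document.from_langchain_format; LangchainDoc and LlamaDoc below are illustrative only):

```python
# Stand-ins showing the mapping: LangChain documents carry
# page_content/metadata, LlamaIndex documents carry text/metadata.
class LangchainDoc:
    def __init__(self, page_content, metadata):
        self.page_content = page_content
        self.metadata = metadata

class LlamaDoc:
    def __init__(self, text, metadata):
        self.text = text
        self.metadata = metadata

    @classmethod
    def from_langchain_format(cls, doc):
        # Mirrors what llama_index's Document.from_langchain_format does:
        # copy the body and carry the metadata across unchanged.
        return cls(text=doc.page_content, metadata=doc.metadata)

lc_docs = [LangchainDoc("hello world", {"source": "report.pdf", "page": 0})]
documents = [LlamaDoc.from_langchain_format(d) for d in lc_docs]
```

With the real libraries you would loop over loader.load() output the same way and pass the resulting documents to get_nodes_from_documents.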
Thanks @Emanuel Ferreira. I will try it out in approx 1/2 hour. Just finishing work now. Will keep you posted. Appreciate.
","dateCreated":"2023-09-11T17:59:21.126Z","dateModified":"2023-09-11T17:59:21.126Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":19,"upvoteCount":0},{"@type":"Comment","text":"Pls let know if it worked fine you accessing it","dateCreated":"2023-09-11T17:59:43.543Z","dateModified":"2023-09-11T17:59:43.543Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":20,"upvoteCount":0},{"@type":"Comment","text":"https://gist.github.com/achilela/5fcd85d690cfe4680a27e6f99f4bc226worked, I will do some tests and let you know","dateCreated":"2023-09-11T18:01:24.237Z","dateModified":"2023-09-11T18:01:24.237Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":21,"upvoteCount":0},{"@type":"Comment","text":"Really appreciate @Emanuel Ferreira","dateCreated":"2023-09-11T18:01:44.996Z","dateModified":"2023-09-11T18:01:44.996Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba 
Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":22,"upvoteCount":0},{"@type":"Comment","text":"can you send me the requirements file?pip freeze > requirements.txt","dateCreated":"2023-09-11T18:07:18.991Z","dateModified":"2023-09-11T18:07:18.991Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":23,"upvoteCount":0},{"@type":"Comment","text":"then I can have the same versions/packages as you","dateCreated":"2023-09-11T18:07:24.103Z","dateModified":"2023-09-11T18:07:24.103Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":24,"upvoteCount":0},{"@type":"Comment","text":"Not sure if done it correctly pip freeze > requirements.txt ~/LLMWorkshop/ExperimentalLama_QA_Retrieval/llamaIndex_streamlit_chat.py","dateCreated":"2023-09-11T18:22:29.971Z","dateModified":"2023-09-11T18:22:29.971Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":25,"upvoteCount":0},{"@type":"Comment","text":"I see you aren't using venv, so it get all your packages (even the 
ones which isn't on the project)","dateCreated":"2023-09-11T18:59:09.827Z","dateModified":"2023-09-11T18:59:09.827Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":26,"upvoteCount":0},{"@type":"Comment","text":"but that's ok, I can find a way here","dateCreated":"2023-09-11T18:59:57.689Z","dateModified":"2023-09-11T18:59:57.689Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":27,"upvoteCount":0},{"@type":"Comment","text":"Really appreciate @Emanuel Ferreira","dateCreated":"2023-09-11T21:23:52.898Z","dateModified":"2023-09-11T21:23:52.898Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":28,"upvoteCount":0},{"@type":"Comment","text":"Hi @Emanuel Ferreira, do not meant to push, was just wondering are there any updates on the uploaded_file.","dateCreated":"2023-09-13T17:50:49.759Z","dateModified":"2023-09-13T17:50:49.759Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba 
Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":29,"upvoteCount":0},{"@type":"Comment","text":"Hi! I'll be trying to take a look on it today yet","dateCreated":"2023-09-13T17:58:16.280Z","dateModified":"2023-09-13T17:58:16.280Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":30,"upvoteCount":0},{"@type":"Comment","text":"@Ataliba Miguel Your issue is that your current document coming from the PDF doesn't follow a format that the node parse requestloader = PyPDFLoader(file_path=tmp_file_path)\ndocs = loader.load()then sending your docs to something like that, can solvefrom llama_index import Document def parse_document(docs): documents = [] for document in docs: # Manually create a new metadata dictionary and exclude specific keys documents.append( Document( text=document.page_content if hasattr(document, 'pagecontent') else \"\", id=document.metadata.get(\"id\", None), metadata=document.metadata, ) ) return documents","dateCreated":"2023-09-13T23:57:53.289Z","dateModified":"2023-09-13T23:57:53.289Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":31,"upvoteCount":0},{"@type":"Comment","text":"I didn't run all the code until make it work, because there's a lot of things and envs, but you definetly can go to the next steps 
with that","dateCreated":"2023-09-13T23:58:16.899Z","dateModified":"2023-09-13T23:58:16.899Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":32,"upvoteCount":0},{"@type":"Comment","text":"will mention @Logan M if want to correct me or complement with something","dateCreated":"2023-09-13T23:58:27.161Z","dateModified":"2023-09-13T23:58:27.161Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":33,"upvoteCount":0},{"@type":"Comment","text":"@Ataliba Miguel Your issue is that it's a langchain document@Logan M suggested easier than that is to use a langchain Loaderfrom llama_index import Document document = Document.from_langchain_format(langchain_document)so you can go through your docs array and format to a llamaindex document","dateCreated":"2023-09-14T15:25:04.779Z","dateModified":"2023-09-14T15:25:04.779Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":34,"upvoteCount":0},{"@type":"Comment","text":"Thanks @Emanuel Ferreira. I will try it out in approx 1/2 hour. Just finishing work now. Will keep you posted. 
Appreciate.","dateCreated":"2023-09-14T15:38:16.133Z","dateModified":"2023-09-14T15:38:16.133Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":35,"upvoteCount":0}],"author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"interactionStatistic":{"@type":"InteractionCounter","interactionType":{"@type":"LikeAction"},"userInteractionCount":0}}]