Find answers from the community

Updated 2 years ago

Hi All I need help as I am really

At a glance

The community member is struggling to get Streamlit's file_uploader to work with llamaIndex. They have encountered several errors, including a ValidationError related to the text_splitter parameter and an AttributeError related to the get_content() method of the Document object.

The community members have tried various solutions, such as using the default settings for SimpleNodeParser and removing the square brackets around the docs list. However, they are still encountering issues.

The community members have shared their code on a Gist and requested help from another community member, @Emanuel Ferreira, who has agreed to take a look and provide assistance. They have also discussed the possibility of using a LangChain loader to format the documents correctly.

There is no explicitly marked answer in the comments, but the community members are working together to find a solution to the issue.

Hi All, I need help as I am really struggling to get Streamlit's file_uploader to work with LlamaIndex.
Plain Text
def configure_qa_chain(uploaded_files):
    docs = []
    with tempfile.TemporaryDirectory() as temp_dir:
        for file in uploaded_files:
            with tempfile.NamedTemporaryFile(delete=False, dir=temp_dir) as tmp_file:
                tmp_file.write(file.getvalue())
                tmp_file_path = tmp_file.name
                docs.append(tmp_file_path)
        loader = PyPDFLoader(file_path=tmp_file_path)
        docs = loader.load()

    nodes = SimpleNodeParser(text_splitter=" ").get_nodes_from_documents([docs])

    MONGO_URI = os.environ["MONGO_URI"]
    MONGODB_DATABASE = "LlamaIndex_Chat"
    docstore = MongoDocumentStore.from_uri(uri=MONGO_URI)
    docstore.add_documents(nodes)
    storage_context = StorageContext.from_defaults(
        docstore=MongoDocumentStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE),
        index_store=MongoIndexStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE),
    )
    docstore = MongoDocumentStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE)
    nodes = list(docstore.docs.values())

    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    llm = ChatOpenAI(
        model_name="gpt-3.5-turbo",
        temperature=0,
        openai_api_key=openai.api_key,
        streaming=True,
    )
    llm_predictor = LLMPredictor(llm=llm)
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

    list_index = GPTListIndex(nodes, storage_context=storage_context, service_context=service_context)
    vector_index = GPTVectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)
    keyword_table_index = GPTSimpleKeywordTableIndex(nodes, storage_context=storage_context, service_context=service_context)

    vector_response = vector_index.as_query_engine().query(user_query)
    return vector_response
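A minimal, self-contained sketch of the temp-file step in the function above, using io.BytesIO as a stand-in for Streamlit's UploadedFile (both expose getvalue() returning bytes), so it can run without Streamlit:

```python
import io
import os
import tempfile

# Stand-in for Streamlit's UploadedFile: both expose getvalue() -> bytes.
uploaded_files = [io.BytesIO(b"%PDF-1.4 fake pdf bytes")]

paths = []
with tempfile.TemporaryDirectory() as temp_dir:
    for file in uploaded_files:
        # delete=False keeps the file on disk after close, so a loader
        # such as PyPDFLoader can reopen it by path.
        with tempfile.NamedTemporaryFile(delete=False, dir=temp_dir) as tmp_file:
            tmp_file.write(file.getvalue())
            paths.append(tmp_file.name)
    # The paths are only valid while the TemporaryDirectory is alive;
    # everything under temp_dir is removed when the with-block exits.
    assert all(os.path.exists(p) for p in paths)
```

Note that any loading (PyPDFLoader etc.) has to happen inside the TemporaryDirectory block, because the files are deleted on exit.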
35 comments
What's the error?
ValidationError: 1 validation error for SimpleNodeParser text_splitter value is not a valid dict (type=type_error.dict)
nodes = SimpleNodeParser(text_splitter=" ") .get_nodes_from_documents([docs])

text splitter can't be a space, it has to be an actual object 👀

You can likely just use the defaults

nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents([docs])
Thanks @Logan M, tried it but it's now yielding AttributeError: 'list' object has no attribute 'get_content'
whoops one more error


Plain Text
nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents(docs)
No need for square brackets, it's already a list
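A tiny illustration of why the extra square brackets break things, using a stub Document class (not the real llama_index one) that has the same get_content() method:

```python
# Stub with the same get_content() method the real llama_index Document has;
# this only illustrates the list-of-lists mistake, not the real API.
class Document:
    def __init__(self, text):
        self.text = text

    def get_content(self):
        return self.text

docs = [Document("page 1"), Document("page 2")]  # loader.load() returns a list

# Wrapping the list in brackets hands the parser a list of lists, so it
# ends up calling get_content() on the inner list -> AttributeError.
wrapped = [docs]
assert isinstance(wrapped[0], list) and not hasattr(wrapped[0], "get_content")

# Passing docs directly gives the parser Document objects, as expected.
contents = [d.get_content() for d in docs]
```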
I guess something else might be going wrong; it's now yielding another error: AttributeError: 'Document' object has no attribute 'get_content'
I have been struggling with Streamlit's file_uploader ever since, as I plan to query my own documentation not via a path but by uploading files from wherever they are located. I understand the Path approach is too static and cumbersome.
I'd really appreciate support here, or any existing code snippet that uploads documents via Streamlit and then feeds them into nodes, storage_context, service_context, and the like.
Hmm that error is kind of wild, the Document object definitely has a get_content() method lol
How shall I go about it? I have been struggling with this for ages and I can't get it to work. Any suggestions on the path forward are very welcome, please.
which version are you using?
llama-index 0.8.16
can you provide us a repo with a reproducible version? will facilitate a lot to help you
Honestly @Emanuel Ferreira, at the stage I'm at now it is just the .py file that I can share with you. I haven't put it in a repo on GitHub yet. If that's fine, I can send the file across just now or DM you.
@Emanuel Ferreira this is my very first time doing this, hope it works fine. There it goes: https://gist.github.com/achilela/5fcd85d690cfe4680a27e6f99f4bc226
Please let me know if you were able to access it
Really appreciate @Emanuel Ferreira
can you send me the requirements file?
pip freeze > requirements.txt
then I can have the same versions/packages as you
Not sure if I did it correctly: pip freeze > requirements.txt ~/LLMWorkshop/ExperimentalLama_QA_Retrieval/llamaIndex_streamlit_chat.py
I see you aren't using a venv, so it gets all your packages (even the ones which aren't in the project)
but that's ok, I can find a way here
Really appreciate @Emanuel Ferreira
Hi @Emanuel Ferreira, I don't mean to push, I was just wondering whether there are any updates on the uploaded file.
Hi! I'll try to take a look at it today
@Ataliba Miguel Your issue is that the document coming from the PDF doesn't follow the format that the node parser expects
Plain Text
loader = PyPDFLoader(file_path=tmp_file_path)
docs = loader.load()

then sending your docs through something like this can solve it:
Plain Text
from llama_index import Document

def parse_document(docs):
    documents = []
    for document in docs:
        # Manually create a new metadata dictionary and exclude specific keys
        documents.append(
            Document(
                text=document.page_content if hasattr(document, 'page_content') else "",
                id_=document.metadata.get("id", None),  # id_ is the LlamaIndex Document field name
                metadata=document.metadata,
            )
        )

    return documents
I didn't run all the code until it worked, because there are a lot of moving parts and envs, but you can definitely go to the next steps with that
I'll mention @Logan M in case he wants to correct me or add something
@Ataliba Miguel

Your issue is that it's a LangChain document

Easier than that, @Logan M suggested using the built-in LangChain conversion
Plain Text
from llama_index import Document

document = Document.from_langchain_format(langchain_document)

so you can go through your docs array and convert each one to a LlamaIndex document
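The shape of that conversion, sketched with stand-in classes so it runs anywhere (the real call is llama_index's Document.from_langchain_format; LangchainDoc and LlamaDoc below are illustrative only):

```python
# Stand-ins showing the mapping: LangChain documents carry
# page_content/metadata, LlamaIndex documents carry text/metadata.
class LangchainDoc:
    def __init__(self, page_content, metadata):
        self.page_content = page_content
        self.metadata = metadata

class LlamaDoc:
    def __init__(self, text, metadata):
        self.text = text
        self.metadata = metadata

    @classmethod
    def from_langchain_format(cls, doc):
        # Mirrors what llama_index's Document.from_langchain_format does:
        # copy the body and carry the metadata across unchanged.
        return cls(text=doc.page_content, metadata=doc.metadata)

lc_docs = [LangchainDoc("hello world", {"source": "report.pdf", "page": 0})]
documents = [LlamaDoc.from_langchain_format(d) for d in lc_docs]
```

With the real libraries you would loop over loader.load() output the same way and pass the resulting documents to get_nodes_from_documents.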
Thanks @Emanuel Ferreira. I will try it out in approx 1/2 hour. Just finishing work now. Will keep you posted. Appreciate.
","dateCreated":"2023-09-11T17:59:21.126Z","dateModified":"2023-09-11T17:59:21.126Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":19,"upvoteCount":0},{"@type":"Comment","text":"Pls let know if it worked fine you accessing it","dateCreated":"2023-09-11T17:59:43.543Z","dateModified":"2023-09-11T17:59:43.543Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":20,"upvoteCount":0},{"@type":"Comment","text":"https://gist.github.com/achilela/5fcd85d690cfe4680a27e6f99f4bc226worked, I will do some tests and let you know","dateCreated":"2023-09-11T18:01:24.237Z","dateModified":"2023-09-11T18:01:24.237Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":21,"upvoteCount":0},{"@type":"Comment","text":"Really appreciate @Emanuel Ferreira","dateCreated":"2023-09-11T18:01:44.996Z","dateModified":"2023-09-11T18:01:44.996Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba 
Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":22,"upvoteCount":0},{"@type":"Comment","text":"can you send me the requirements file?pip freeze > requirements.txt","dateCreated":"2023-09-11T18:07:18.991Z","dateModified":"2023-09-11T18:07:18.991Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":23,"upvoteCount":0},{"@type":"Comment","text":"then I can have the same versions/packages as you","dateCreated":"2023-09-11T18:07:24.103Z","dateModified":"2023-09-11T18:07:24.103Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":24,"upvoteCount":0},{"@type":"Comment","text":"Not sure if done it correctly pip freeze > requirements.txt ~/LLMWorkshop/ExperimentalLama_QA_Retrieval/llamaIndex_streamlit_chat.py","dateCreated":"2023-09-11T18:22:29.971Z","dateModified":"2023-09-11T18:22:29.971Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":25,"upvoteCount":0},{"@type":"Comment","text":"I see you aren't using venv, so it get all your packages (even the 
ones which isn't on the project)","dateCreated":"2023-09-11T18:59:09.827Z","dateModified":"2023-09-11T18:59:09.827Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":26,"upvoteCount":0},{"@type":"Comment","text":"but that's ok, I can find a way here","dateCreated":"2023-09-11T18:59:57.689Z","dateModified":"2023-09-11T18:59:57.689Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":27,"upvoteCount":0},{"@type":"Comment","text":"Really appreciate @Emanuel Ferreira","dateCreated":"2023-09-11T21:23:52.898Z","dateModified":"2023-09-11T21:23:52.898Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":28,"upvoteCount":0},{"@type":"Comment","text":"Hi @Emanuel Ferreira, do not meant to push, was just wondering are there any updates on the uploaded_file.","dateCreated":"2023-09-13T17:50:49.759Z","dateModified":"2023-09-13T17:50:49.759Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba 
Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":29,"upvoteCount":0},{"@type":"Comment","text":"Hi! I'll be trying to take a look on it today yet","dateCreated":"2023-09-13T17:58:16.280Z","dateModified":"2023-09-13T17:58:16.280Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":30,"upvoteCount":0},{"@type":"Comment","text":"@Ataliba Miguel Your issue is that your current document coming from the PDF doesn't follow a format that the node parse requestloader = PyPDFLoader(file_path=tmp_file_path)\ndocs = loader.load()then sending your docs to something like that, can solvefrom llama_index import Document def parse_document(docs): documents = [] for document in docs: # Manually create a new metadata dictionary and exclude specific keys documents.append( Document( text=document.page_content if hasattr(document, 'pagecontent') else \"\", id=document.metadata.get(\"id\", None), metadata=document.metadata, ) ) return documents","dateCreated":"2023-09-13T23:57:53.289Z","dateModified":"2023-09-13T23:57:53.289Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":31,"upvoteCount":0},{"@type":"Comment","text":"I didn't run all the code until make it work, because there's a lot of things and envs, but you definetly can go to the next steps 
with that","dateCreated":"2023-09-13T23:58:16.899Z","dateModified":"2023-09-13T23:58:16.899Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":32,"upvoteCount":0},{"@type":"Comment","text":"will mention @Logan M if want to correct me or complement with something","dateCreated":"2023-09-13T23:58:27.161Z","dateModified":"2023-09-13T23:58:27.161Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":33,"upvoteCount":0},{"@type":"Comment","text":"@Ataliba Miguel Your issue is that it's a langchain document@Logan M suggested easier than that is to use a langchain Loaderfrom llama_index import Document document = Document.from_langchain_format(langchain_document)so you can go through your docs array and format to a llamaindex document","dateCreated":"2023-09-14T15:25:04.779Z","dateModified":"2023-09-14T15:25:04.779Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/f8bc593f-906c-4311-9751-20723458b663","name":"Emanuel Ferreira","identifier":"f8bc593f-906c-4311-9751-20723458b663","image":"https://cdn.discordapp.com/avatars/312981680027729942/384656b32a291d58050347a38a3e9a0f.webp?size=256"},"commentCount":0,"comment":[],"position":34,"upvoteCount":0},{"@type":"Comment","text":"Thanks @Emanuel Ferreira. I will try it out in approx 1/2 hour. Just finishing work now. Will keep you posted. 
Appreciate.","dateCreated":"2023-09-14T15:38:16.133Z","dateModified":"2023-09-14T15:38:16.133Z","author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"commentCount":0,"comment":[],"position":35,"upvoteCount":0}],"author":{"@type":"Person","url":"https://community.llamaindex.ai/members/958dd987-cf2d-4f41-8237-a797246122fd","name":"Ataliba Miguel","identifier":"958dd987-cf2d-4f41-8237-a797246122fd","image":"https://cdn.discordapp.com/avatars/913036801697001522/2b42c2e44ffa2f09c78c057a78c9b646.webp?size=256"},"interactionStatistic":{"@type":"InteractionCounter","interactionType":{"@type":"LikeAction"},"userInteractionCount":0}}]