
Updated last year


Hi All, I need help as I am really struggling to get Streamlit's file_uploader to work with LlamaIndex.

Plain Text
def configure_qa_chain(uploaded_files):
    docs = []
    with tempfile.TemporaryDirectory() as temp_dir:
        for file in uploaded_files:
            with tempfile.NamedTemporaryFile(delete=False, dir=temp_dir) as tmp_file:
                tmp_file.write(file.getvalue())
                tmp_file_path = tmp_file.name
                docs.append(tmp_file_path)
            loader = PyPDFLoader(file_path=tmp_file_path)
            docs = loader.load()

    nodes = SimpleNodeParser(text_splitter=" ").get_nodes_from_documents([docs])

    MONGO_URI = os.environ["MONGO_URI"]
    MONGODB_DATABASE = "LlamaIndex_Chat"
    docstore = MongoDocumentStore.from_uri(uri=MONGO_URI)
    docstore.add_documents(nodes)

    storage_context = StorageContext.from_defaults(
        docstore=MongoDocumentStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE),
        index_store=MongoIndexStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE),
    )
    docstore = MongoDocumentStore.from_uri(uri=MONGO_URI, db_name=MONGODB_DATABASE)
    nodes = list(docstore.docs.values())

    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    llm = ChatOpenAI(
        model_name="gpt-3.5-turbo",
        temperature=0,
        openai_api_key=openai.api_key,
        streaming=True,
    )
    llm_predictor = LLMPredictor(llm=llm)
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

    list_index = GPTListIndex(nodes, storage_context=storage_context, service_context=service_context)
    vector_index = GPTVectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)
    keyword_table_index = GPTSimpleKeywordTableIndex(nodes, storage_context=storage_context, service_context=service_context)

    vector_response = vector_index.as_query_engine().query(user_query)
    return vector_response
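For the upload step on its own, here is a minimal sketch of writing uploaded files to temp files, separated out from the rest of the pipeline. `io.BytesIO` stands in for Streamlit's `UploadedFile` here, since both expose `getvalue()`; the function name and `.pdf` suffix are illustrative, not from the thread.

```python
import io
import os
import tempfile

def save_uploads_to_temp(uploaded_files, temp_dir):
    """Write each uploaded file's bytes to its own temp file; return the paths."""
    paths = []
    for file in uploaded_files:
        # UploadedFile (and BytesIO) both expose getvalue() for the raw bytes
        with tempfile.NamedTemporaryFile(delete=False, dir=temp_dir, suffix=".pdf") as tmp:
            tmp.write(file.getvalue())
            paths.append(tmp.name)
    return paths

with tempfile.TemporaryDirectory() as temp_dir:
    fake_uploads = [io.BytesIO(b"%PDF-1.4 demo bytes")]
    paths = save_uploads_to_temp(fake_uploads, temp_dir)
    print(len(paths), os.path.exists(paths[0]))  # 1 True
```

Keeping this step in its own function also makes it testable without running Streamlit at all.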
35 comments
What's the error?
ValidationError: 1 validation error for SimpleNodeParser text_splitter value is not a valid dict (type=type_error.dict)
nodes = SimpleNodeParser(text_splitter=" ").get_nodes_from_documents([docs])

text splitter can't be a space, it has to be an actual object 👀

You can likely just use the defaults

nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents([docs])
Thanks @Logan M, tried it, but now it yields AttributeError: 'list' object has no attribute 'get_content'
whoops one more error


Plain Text
nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents(docs)
No need for square brackets, it's already a list
I guess something else might be going wrong; now it yields another error: AttributeError: 'Document' object has no attribute 'get_content'
I have been struggling to use Streamlit's file_uploader ever since. I plan to query my own documentation not via a file path but by uploading files from wherever they happen to be located; I find the Path approach too static and cumbersome.
I'd really appreciate support here, or any existing code snippet that uploads documents via Streamlit and then feeds them into nodes, storage_context, service_context and the like.
Hmm that error is kind of wild, the Document object definitely has a get_content() method lol
How shall I go about it? I've been struggling with this for ages and I can't get it to work. Any suggestions on the path forward are very welcome, please.
which version are you using?
llama-index 0.8.16
can you provide us a repo with a reproducible version? it will make it a lot easier to help you
Honestly @Emanuel Ferreira, at the stage I'm at now it's just a .py file that I can share with you. I haven't gotten to repos on GitHub yet. If that's fine, I can send the file across just now or DM you.
@Emanuel Ferreira this is my very first time on this, hope it works fine. There it goes <script src="https://gist.github.com/achilela/5fcd85d690cfe4680a27e6f99f4bc226.js"></script>
Please let me know if you could access it fine
Really appreciate @Emanuel Ferreira
can you send me the requirements file?
pip freeze > requirements.txt
then I can have the same versions/packages as you
Not sure if I did it correctly: pip freeze > requirements.txt ~/LLMWorkshop/ExperimentalLama_QA_Retrieval/llamaIndex_streamlit_chat.py
I see you aren't using a venv, so it grabbed all your packages (even the ones that aren't part of the project)
but that's ok, I can find a way here
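For reference, here's a minimal sketch of the venv workflow being hinted at, so `pip freeze` only captures the project's own dependencies (the `.venv` directory name is just a common convention):

```shell
# Create a project-local virtual environment, activate it,
# and freeze only the packages installed inside it
python3 -m venv .venv
. .venv/bin/activate
python -m pip freeze > requirements.txt
```

After installing the project's packages inside the activated venv, re-running the freeze produces a requirements.txt that matches the project exactly.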
Really appreciate @Emanuel Ferreira
Hi @Emanuel Ferreira, I don't mean to push, I was just wondering whether there are any updates on the uploaded_file.
Hi! I'll try to take a look at it later today
@Ataliba Miguel Your issue is that the documents coming from the PDF don't follow the format that the node parser expects
Plain Text
loader = PyPDFLoader(file_path=tmp_file_path)
docs = loader.load()

then sending your docs through something like this should solve it
Plain Text
from llama_index import Document

def parse_document(docs):
    documents = []
    for document in docs:
        # Build a llama_index Document from each langchain document
        documents.append(
            Document(
                text=document.page_content if hasattr(document, 'page_content') else "",
                id=document.metadata.get("id", None),
                metadata=document.metadata,
            )
            )
        )

    return documents
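Worth noting: langchain documents expose their text as `page_content` (with an underscore), so the `hasattr` guard has to match that exact name. A tiny stand-in class (purely illustrative, no langchain required) demonstrates the guard:

```python
class FakeLangchainDoc:
    """Minimal stand-in for a langchain Document (illustration only)."""
    def __init__(self, page_content, metadata):
        self.page_content = page_content
        self.metadata = metadata

def extract_text(document):
    # Note the exact attribute name: 'page_content', not 'pagecontent'
    if hasattr(document, "page_content"):
        return document.page_content
    return ""

doc = FakeLangchainDoc("hello from page 1", {"source": "file.pdf"})
print(extract_text(doc))            # hello from page 1
print(hasattr(doc, "pagecontent"))  # False
```

If the guard checks a misspelled name, it silently falls through to the empty-string branch for every document, which is easy to miss.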
I didn't run all the code through to completion, because there are a lot of moving parts and env variables, but you can definitely move on to the next steps with that
I'll mention @Logan M in case he wants to correct me or add something
@Ataliba Miguel

Your issue is that it's a langchain document

@Logan M suggested something even easier for a langchain Loader's output: convert it directly
Plain Text
from llama_index import Document

document = Document.from_langchain_format(langchain_document)

so you can loop through your docs array and convert each one to a LlamaIndex document
Thanks @Emanuel Ferreira. I will try it out in approx 1/2 hour. Just finishing work now. Will keep you posted. Appreciate.