
Updated 3 months ago

Doc types

Hello. I saw that LangChain offers many ways to index different types of documents (PowerPoint, Notion, Markdown...). Is there a way to make LangChain's Document object compatible with LlamaIndex?
5 comments
LlamaIndex supports many document types as well! Is there a particular type you are worried about?
I've been taking a look at the Document object, and it seems to accept only text as a parameter. Is there any way to add other types of files? Images, Excel, CSV...
Using SimpleDirectoryReader, you can load a ton of different formats. We handle extracting the text into document objects: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/readers/file/base.py#L19

There is also support for 3rd party documents like notion, see here: https://gpt-index.readthedocs.io/en/latest/how_to/data_connectors.html
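For anyone landing here later, a minimal sketch of the SimpleDirectoryReader approach described above. The import path is an assumption and depends on the installed version (older releases used the `gpt_index` package name), and `load_folder` is just a hypothetical helper name for illustration.

```python
from pathlib import Path


def load_folder(input_dir):
    """Load the files under input_dir into LlamaIndex Document objects."""
    # Fail early with a clear error if the folder does not exist.
    if not Path(input_dir).is_dir():
        raise NotADirectoryError(str(input_dir))

    # Imported lazily. ASSUMPTION: this import path matches your installed
    # version; adjust it if your release exposes the reader elsewhere.
    from llama_index import SimpleDirectoryReader

    # The reader extracts the text of each file into Document objects.
    return SimpleDirectoryReader(input_dir=str(input_dir)).load_data()
```

Calling `load_folder("./data")` on a folder of mixed files should return a list of Document objects that can be passed to an index. Which formats are actually parsed depends on the version and on optional parser dependencies being installed.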
@Manu Lorenzo Also, if your doc is already in LangChain format, you can call Document.from_langchain_format(langchain_document) to convert it to LlamaIndex format.
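A small sketch of that conversion applied to a whole list of documents. `Document.from_langchain_format` is the method named above; the import path and the helper name `to_llama_documents` are assumptions for illustration and may need adjusting for your installed version.

```python
def to_llama_documents(langchain_docs):
    """Convert an iterable of LangChain documents to LlamaIndex documents."""
    docs = list(langchain_docs)
    if not docs:
        # Nothing to convert, so the library is not needed at all.
        return []

    # Imported lazily. ASSUMPTION: Document is importable from this path
    # in your installed version.
    from llama_index import Document

    # Convert each LangChain document with the classmethod from the thread.
    return [Document.from_langchain_format(doc) for doc in docs]
```

So if `lc_docs` came from any LangChain loader (PowerPoint, Notion, Markdown...), `to_llama_documents(lc_docs)` should give a list that can be indexed with LlamaIndex.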