
Updated 3 months ago

did llama hub get rid of the json loader

Did llama hub get rid of the json loader? And is there spaCy Doc integration with llama index custom docs?
12 comments
oh sick and do we just pass that into the simple directory?
This is all I'm seeing
[Attachment: image.png]
yea we need to flesh out the readmes 😅
You should be able to put that in the file_extractor dict for SimpleDirectoryReader, yea
Plain Text
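from llama_index.core import SimpleDirectoryReader  # older releases: from llama_index import SimpleDirectoryReader
from llama_index.readers.json import JSONReader

file_extractor = {".json": JSONReader()}

SimpleDirectoryReader(..., file_extractor=file_extractor)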
Thank you! Question: is there a way to integrate a spaCy Doc into a llama index custom doc? I like the NLP annotations spaCy has.
I don't think there's a way at the moment 👀 You'd need to convert spaCy docs to llama-index docs/nodes
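For anyone landing here later, the conversion mentioned above could look roughly like this. It's only a sketch: `spacy_doc_to_payload` is a made-up helper name, not part of spaCy or llama-index, and the stand-in object just mimics the `text`, `ents` and `cats` attributes of a spaCy Doc so the snippet runs without spaCy installed.

```python
from types import SimpleNamespace


def spacy_doc_to_payload(doc):
    """Flatten a spaCy-style Doc into plain text plus a metadata dict.

    Hypothetical helper: it only reads doc.text, doc.ents and doc.cats.
    """
    metadata = {
        # Named entities as [text, label] pairs, kept as plain lists.
        "entities": [[ent.text, ent.label_] for ent in doc.ents],
        # Text-classifier scores; empty when the pipeline has no textcat.
        "classification": dict(doc.cats),
    }
    return {"text": doc.text, "metadata": metadata}


# Stand-in for a real spaCy Doc (assumption: same attribute names).
fake_doc = SimpleNamespace(
    text="Apple is opening an office in Paris.",
    ents=[
        SimpleNamespace(text="Apple", label_="ORG"),
        SimpleNamespace(text="Paris", label_="GPE"),
    ],
    cats={"business": 0.9},
)

payload = spacy_doc_to_payload(fake_doc)
```

With the real libraries you would presumably pass `payload["text"]` and `payload["metadata"]` into a llama-index `Document` (the keyword is `metadata` in recent releases and `extra_info` in older ones), but check that against the version you have installed.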
Cool. That's why I was asking about the json haha
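import os

from llmsherpa.readers import LayoutPDFReader
from llama_index.core import Document  # older releases: from llama_index import Document

# nlp (a loaded spaCy pipeline) and get_extra_info() (assumed to return a dict)
# are defined elsewhere.

def process_pdfs(pdf_directory):
    parser_api_url = "http://localhost:5010/api/parseDocument?renderFormat=all"
    pdf_reader = LayoutPDFReader(parser_api_url)

    data = []
    for filename in os.listdir(pdf_directory):
        if filename.endswith(".pdf"):
            pdf_path = os.path.join(pdf_directory, filename)
            print(f"\nProcessing document: {filename}")
            # A trailing comma after this call would wrap the result in a tuple.
            doc = pdf_reader.read_pdf(pdf_path)

            extra_info_user = get_extra_info()

            for chunk in doc.chunks():
                chunk_text = chunk.to_text(include_children=True, recurse=True)
                docs = nlp(chunk_text)
                # docs.text is the chunk's own text, not a generated summary.
                extra_info_cats = {"summary": docs.text, "classification": docs.cats}

                # Merge the two dicts; {a, b} would try to build a set of dicts and raise.
                extra_info = {**extra_info_user, **extra_info_cats}

                document = Document(
                    text=chunk_text,
                    extra_info=extra_info,
                )
                data.append(document)
    return data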
summary of each chunk and classification
[Attachment: image.png]
i have no clue if that does anything lol