Updated 2 years ago

Is there a retrieval-only advanced tutorial?

At a glance

The community members discuss how to parse Markdown files while preserving their structure, and how to use post-processors to include adjacent nodes before building the prompt. They are looking for a tutorial or example on creating a graph structure with headers as parents and paragraphs as child nodes. The discussion also touches on the structure of the llama-index library, with one community member suggesting that retrieval should be a separate documentation section from the query module, and that the query and chat engines should be built on top of the retriever instead of the index. Some resources are shared, such as a custom Markdown parser and a low-level API example, but there is no explicitly marked answer to the original question.

Useful resources
Is there a retrieval-only advanced tutorial on how to parse a Markdown file while preserving its structure, and use a post-processor to include the adjacent nodes before I build the prompt, so that I can potentially use retrieval differently based on the query? @Logan M @jerryjliu0
35 comments
Thanks. Do you happen to know how I can create a graph for one Markdown document, with headers as parents and list nodes capturing the paragraphs under each header?
Sounds like you just need to create a custom markdown parser that creates the Document objects the way you want to πŸ€·β€β™‚οΈ
Is there a tutorial or something that I can learn from?
Here's a custom loader I wrote for that video, it ended up being a tad complicated though lol https://github.com/run-llama/llama_docs_bot/blob/main/llama_docs_bot/markdown_docs_reader.py
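The core of that loader can be sketched in plain Python. This is a minimal, hypothetical parser (not llama-index's `Document`/`Node` API, and far simpler than the linked reader) that attaches each paragraph as a child of the nearest preceding header:

```python
# Minimal sketch: parse Markdown into a header tree, with paragraphs as
# children of the nearest preceding header. Hypothetical structure, not
# llama-index's Document/Node API.
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    level: int = 0          # 0 = paragraph, 1+ = header depth
    children: list = field(default_factory=list)

def parse_markdown(md: str) -> Node:
    root = Node("ROOT")
    stack = [root]          # stack[-1] is the innermost open header
    for block in md.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            level = len(block) - len(block.lstrip("#"))
            node = Node(block.lstrip("# "), level)
            # Close headers at the same or deeper level, then attach.
            while len(stack) > 1 and stack[-1].level >= level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            # Paragraph: child of the most recent header (or root).
            stack[-1].children.append(Node(block))
    return root

tree = parse_markdown("# A\n\nintro\n\n## B\n\npara under B")
```

A real loader would also need to handle fenced code blocks, Setext headers, and lists, which is where the complexity in the linked reader comes from.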
@Logan M thanks, I will take a look. And by the way, maybe it is just me, but I think retrieval is such a core part of what llamaindex does that it deserves a separate section from "query module",
and leave the query/chat engines as high-level interfaces.
The above video focuses on data loading though. Not sure which part you meant to refer to haha
Where is the example where I can construct tree-structured nodes (a graph) based on the syntax of the document? @Logan M
I think what I am trying to say is: retrieval is low level, and query engine is high level.
I understand the high-level API will help many, but the core value of llamaindex is the low level.
So, at least for me, separating these two into two sections in the documentation would make it easier to focus.
Also, it would be better if the query engine and chat engine were built on top of the retriever instead of the index.
I know I am being too picky.
They actually are πŸ™‚
I know, but all the examples in the docs say a query engine can be obtained from an index...
When you do index.as_query_engine() it's actually a quick way of setting up this object
https://github.com/jerryjliu/llama_index/blob/main/llama_index/query_engine/retriever_query_engine.py#L24
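The layering that shorthand hides can be sketched with toy classes. These are illustrative stand-ins (not llama-index's real `Index`/`Retriever`/`QueryEngine` implementations) showing how an index produces a retriever, and a query engine is built on the retriever:

```python
# Toy sketch of the layering that index.as_query_engine() hides.
# Hypothetical classes, not llama-index's actual API.

class Index:
    """Holds the stored nodes (here: plain strings)."""
    def __init__(self, nodes):
        self.nodes = nodes

    def as_retriever(self, top_k=2):
        return Retriever(self, top_k)

    def as_query_engine(self):
        # The one-line shorthand: build the retriever, then the engine.
        return QueryEngine(self.as_retriever())

class Retriever:
    """Low level: given a query, return matching nodes."""
    def __init__(self, index, top_k):
        self.index, self.top_k = index, top_k

    def retrieve(self, query):
        hits = [n for n in self.index.nodes if query.lower() in n.lower()]
        return hits[: self.top_k]

class QueryEngine:
    """High level: built on a retriever, synthesizes a response."""
    def __init__(self, retriever):
        self.retriever = retriever

    def query(self, query):
        return " | ".join(self.retriever.retrieve(query))

index = Index(["cats purr", "dogs bark", "cats nap"])
# Shorthand and explicit construction behave the same:
short = index.as_query_engine().query("cats")
explicit = QueryEngine(index.as_retriever()).query("cats")
```

The point of the explicit form is that you can swap in a custom retriever before building the engine, which the one-liner doesn't surface.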
I understand, but that one-line shorthand actually doesn't help in understanding how things work.
I have to think to get it.
Again, being too picky.
Anyhow, you guys might have different priorities.
Can you point me to some example where I can build some structure on the list of nodes? Also, if there are only next/prev relationships, how do we build the tree structure?
I think that low-level API example is exactly what you are looking for πŸ˜… In actuality, every component in llama-index is customizable

We follow a thing called progressive disclosure -- a simple API at the start, but you are still able to drill down into the base components (retriever, response synthesizer, node postprocessor...)
There are prev/next and parent/child relationships. These relationships are not really exploited at all though, unless you've built a custom retriever or custom node-postprocessor

That code sample above for the markdown file makes every code block a child of its parent text block, for example. There's no automatic way to do this though; you have to build the structure yourself as you parse the file
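Tying this back to the original question, the prev/next idea plus a node-postprocessor can be sketched in plain Python. This is a hypothetical illustration (not llama-index's actual `NodeWithScore`/postprocessor API): nodes carry prev/next links, and a post-processing step expands each retrieved node with its neighbors before the prompt is built:

```python
# Hypothetical sketch: nodes linked by prev/next, plus a postprocessor
# that expands each retrieved node with its adjacent neighbors.
# Not llama-index's actual node or postprocessor API.

def link_nodes(texts):
    """Turn a list of texts into dicts carrying prev/next indices."""
    return [
        {"text": t,
         "prev": i - 1 if i > 0 else None,
         "next": i + 1 if i < len(texts) - 1 else None}
        for i, t in enumerate(texts)
    ]

def include_adjacent(nodes, retrieved_indices):
    """Postprocess: add each hit's prev/next neighbor, dedupe, keep order."""
    expanded = []
    for i in retrieved_indices:
        for j in (nodes[i]["prev"], i, nodes[i]["next"]):
            if j is not None and j not in expanded:
                expanded.append(j)
    return [nodes[j]["text"] for j in sorted(expanded)]

nodes = link_nodes(["intro", "setup", "usage", "faq"])
context = include_adjacent(nodes, [2])  # suppose the retriever matched "usage"
```

Because prev/next only give a linear chain, a tree needs the parent/child links to be set explicitly during parsing, as in the markdown loader above.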
Understood. My issue really is: by getting the query engine from the index, you expose the link between query and index but hide the retrieval that sits in between.
Just my two cents: it would be better to create the retriever (not the index) from disk, and get the query/chat engine from the retriever; there is no gap that way.
Again, I am trying to be picky.
But at the same time, I feel that llamaindex's core value is the index/retrieval.
Indexing produces an index, which the retriever uses, which the query engine uses. πŸ™‚
@Logan M thanks for the help.