The community member is looking for a LlamaIndex counterpart to LangChain's HTMLHeaderTextSplitter. They have tried LlamaIndex's HTMLNodeParser, but they are not satisfied with its output. They have also tried wrapping the LangChain splitter with LangchainNodeParser, but that fails because LlamaIndex expects the splitter to return a list of strings, while HTMLHeaderTextSplitter returns a list of LangChain Document objects.
The replies suggest two custom approaches: run the LangChain splitter and convert its output into LlamaIndex nodes/documents, or write a custom transformation component for the IngestionPipeline. A link to the LlamaIndex documentation on implementing custom transformations was also provided.
I have tried wrapping it with LangchainNodeParser, but it fails because LlamaIndex expects a list of strings while LangChain is returning a list of Document objects.
If it's just splitting by tag, that sounds pretty easy to implement yourself (or just use the LangChain splitter and convert the output to LlamaIndex nodes/documents).
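A minimal sketch of the do-it-yourself route, using only Python's standard-library html.parser. The function split_html_by_headers and its (metadata, text) return shape are hypothetical, chosen to mirror the header-path metadata that HTMLHeaderTextSplitter attaches; the sketch assumes header keys are h1 to h6 and does not skip script/style content or try to repair malformed HTML.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

Chunk = Tuple[Dict[str, str], str]  # (header path metadata, section text)


class _HeaderSplitter(HTMLParser):
    """Collects the text between header tags, tagged with the active header path."""

    def __init__(self, headers: Dict[str, str]) -> None:
        super().__init__()
        self.headers = headers  # e.g. {"h1": "Header 1"}; keys assumed to be h1..h6
        self.current: Dict[str, str] = {}  # metadata key -> current header title
        self.chunks: List[Chunk] = []
        self._buf: List[str] = []
        self._in_header: Optional[str] = None
        self._title: List[str] = []

    def _flush(self) -> None:
        # Collapse whitespace; emit a chunk only if real text was collected
        text = " ".join("".join(self._buf).split())
        if text:
            self.chunks.append((dict(self.current), text))
        self._buf = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag in self.headers:
            self._flush()  # a new section starts, so close the previous one
            self._in_header = tag
            self._title = []

    def handle_endtag(self, tag: str) -> None:
        if tag == self._in_header:
            level = int(tag[1])
            # Drop this header level and any deeper ones from the path
            for t, key in self.headers.items():
                if int(t[1]) >= level:
                    self.current.pop(key, None)
            self.current[self.headers[tag]] = " ".join("".join(self._title).split())
            self._in_header = None

    def handle_data(self, data: str) -> None:
        # Text inside a header becomes its title; everything else is section body
        (self._title if self._in_header else self._buf).append(data)

    def close(self) -> None:
        super().close()  # delivers any text the parser is still holding
        self._flush()


def split_html_by_headers(html: str, headers: Dict[str, str]) -> List[Chunk]:
    parser = _HeaderSplitter(headers)
    parser.feed(html)
    parser.close()
    return parser.chunks


html = "<h1>Intro</h1><p>Hello</p><h2>Details</h2><p>World</p><h1>Next</h1><p>Bye</p>"
for meta, text in split_html_by_headers(html, {"h1": "Header 1", "h2": "Header 2"}):
    print(meta, text)
```

Each (metadata, text) pair can then be wrapped as Document(text=text, metadata=meta) from llama_index.core before it goes into an IngestionPipeline.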