
Updated 2 months ago

User is experiencing issues with LlamaParse reparsing documents repeatedly

I implemented LlamaParse like this, but for some reason it always reparses the document. I would have expected the document to be parsed only once. @Logan M, can you tell me what I am doing wrong here? It tries to reparse even before the 48-hour mark.

def get_file_documents(config: FileLoaderConfig):
    parser = llama_parse_parser()
    files_info = fetch_file_list()
    logger.info(
        f"List of files ready for download. Number of files to download: {len(files_info)}"
    )

    if config.use_llama_parse:
        file_paths = []
        for file_info in files_info:
            resource_url = file_info["resourceURL"]
            file_name = file_info["fileName"]
            file_path = os.path.join(config.data_dir, file_name)
            if not os.path.exists(file_path):
                download_file(resource_url, file_path)
                logger.info(
                    f"Successfully downloaded file: {file_name} and saved it on the server."
                )
            file_paths.append(file_path)
        
        documents = []  
        for file_number, file_path in enumerate(file_paths, 1):
            file_name = os.path.basename(file_path)
            json_representation = parser.get_json_result(file_path)
            document = parser.load_data(
                file_path=file_path,
                extra_info={
                    "file_name": file_name,
                    "file_number": file_number,
                    "pages": json_representation[0]["pages"]
                }
            )

            # load_data returns a list of documents, so extend rather than append
            documents.extend(document)
5 comments
What does llama_parse_parser() do?
It creates an instance of LlamaParse:
def llama_parse_parser():
    if os.getenv("LLAMA_CLOUD_API_KEY") is None:
        raise ValueError(
            "LLAMA_CLOUD_API_KEY environment variable is not set. "
            "Please set it in .env file or in your shell environment then run again!"
        )
    parser = LlamaParse(verbose=True, language="de", result_type=ResultType.MD)
    return parser
How do you know it's being reparsed? For me, the second run takes much less time (seconds compared to minutes).
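A rough way to check this from code is to time the same call twice and compare. This is only a sketch: `time_call` and `looks_cached` are hypothetical helper names, and the 5x speed-up threshold is an arbitrary assumption, not something LlamaParse defines.

```python
import time
from typing import Any, Callable


def time_call(fn: Callable[[], Any]) -> float:
    """Return the wall-clock seconds a single call to fn takes."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def looks_cached(fn: Callable[[], Any], speedup: float = 5.0) -> bool:
    """Call fn twice; report True if the second call was at least
    `speedup` times faster than the first (assumed threshold)."""
    first = time_call(fn)
    second = time_call(fn)
    return second * speedup < first
```

With the parser from the question, this could be called as `looks_cached(lambda: parser.get_json_result(file_path))`. Network jitter can skew a single measurement, so treat the result as a hint only.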
It seems to be cached. Thanks! I thought it was being reparsed, but it just creates a new job in the web UI with no credits being used.
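If the extra jobs showing up in the web UI are a nuisance, one option is to keep a local copy of each parse result keyed by the file's content hash, so an unchanged file is never submitted again. This is a minimal sketch, not part of LlamaParse: `file_digest`, `cached_parse`, and the cache directory are hypothetical, and it assumes the parse result is JSON-serializable (as the list returned by `get_json_result` in the question appears to be).

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_parse(
    file_path: str, cache_dir: str, parse_fn: Callable[[str], Any]
) -> Any:
    """Return the stored result for file_path if its content was seen
    before; otherwise call parse_fn once and store the result as JSON."""
    src = Path(file_path)
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    cache_file = cache / f"{file_digest(src)}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = parse_fn(str(src))  # only reached on a cache miss
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

In the loop from the question this would look like `cached_parse(file_path, ".parse_cache", parser.get_json_result)`, where `.parse_cache` is an assumed directory name. Keying on the content hash rather than the file name means a changed file with the same name is parsed again.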