Updated 4 months ago

User is experiencing issues with LlamaParse reparsing documents repeatedly

At a glance

A community member implemented LlamaParse and expected each document to be parsed only once, but it appeared to be reparsed on every run, even within the 48-hour cache window. They asked @Logan M what they were doing wrong. The replies point out that the result is in fact cached: the second run takes seconds rather than minutes. The community member confirmed that the document is not reparsed; each run only creates a new job in the WebUI, and no credits are used.

I implemented LlamaParse like this, but for some reason it always reparses the document. I expected the document to be parsed only once. @Logan M, can you tell me what I am doing wrong here? It tries to reparse even before the 48h breakpoint.

Plain Text
def get_file_documents(config: FileLoaderConfig):
    parser = llama_parse_parser()
    files_info = fetch_file_list()
    logger.info(
        f"List of files ready for download. Number of files to download: {len(files_info)}"
    )

    if config.use_llama_parse:
        file_paths = []
        for file_info in files_info:
            resource_url = file_info["resourceURL"]
            file_name = file_info["fileName"]
            file_path = os.path.join(config.data_dir, file_name)
            if not os.path.exists(file_path):
                download_file(resource_url, file_path)
                logger.info(
                    f"Successfully downloaded file: {file_name} and saved it on the server."
                )
            file_paths.append(file_path)
        
        documents = []  
        for file_number, file_path in enumerate(file_paths, 1):
            file_name = os.path.basename(file_path)
            json_representation = parser.get_json_result(file_path)
            document = parser.load_data(
                file_path=file_path,
                extra_info={
                    "file_name": file_name,
                    "file_number": file_number,
                    "pages": json_representation[0]["pages"]
                }
            )

            documents.append(document)
5 comments
What does llama_parse_parser() do?
It creates an instance of LlamaParse:
Plain Text
import os

from llama_parse import LlamaParse, ResultType


def llama_parse_parser():
    if os.getenv("LLAMA_CLOUD_API_KEY") is None:
        raise ValueError(
            "LLAMA_CLOUD_API_KEY environment variable is not set. "
            "Please set it in .env file or in your shell environment then run again!"
        )
    parser = LlamaParse(verbose=True, language="de", result_type=ResultType.MD)
    return parser
How do you know it's being reparsed? For me, the second run takes much less time (seconds compared to minutes).
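One way to check this yourself is to time two consecutive runs on the same file. The snippet below is a minimal sketch, not part of the original thread: `timed_call` and `fake_parse` are hypothetical names, and `fake_parse` only stands in for the real `parser.load_data(file_path)` call so the example runs without an API key.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_parse(path: str) -> str:
    # Hypothetical stand-in for parser.load_data(path); swap in the real call.
    time.sleep(0.05)
    return f"parsed:{path}"


if __name__ == "__main__":
    # If the second run is much faster than the first, the result was
    # most likely served from the cache rather than parsed again.
    for attempt in (1, 2):
        _, seconds = timed_call(fake_parse, "example.pdf")
        print(f"run {attempt}: {seconds:.2f}s")
```

With the real parser, a large gap between the first and second timing would suggest the cached result was returned.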
It seems to be cached. Thanks! I thought it was being reparsed, but it just creates a new job in the WebUI with no credits being used.
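If the extra jobs in the WebUI are unwanted, one option is to keep a local cache so that unchanged files are never submitted again. The following is a rough sketch under stated assumptions, not LlamaParse's own caching: `file_sha256`, `parse_with_local_cache`, and the cache directory are hypothetical, and `parse_fn` is any callable that takes a file path and returns JSON-serializable data.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def file_sha256(path: Path) -> str:
    """Hash the file contents so the cache key changes when the file changes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_with_local_cache(
    path: Path, cache_dir: Path, parse_fn: Callable[[str], Any]
) -> Any:
    """Return the cached result for this file, calling parse_fn only on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{file_sha256(path)}.json"
    if cache_file.exists():
        # Cache hit: no call to parse_fn, so no new job is submitted.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = parse_fn(str(path))
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result
```

In the code from the question, this might be called as `parse_with_local_cache(Path(file_path), Path(".parse_cache"), parser.get_json_result)`, assuming the JSON result is what needs to be reused. Unlike the server-side cache, this one never expires on its own, so stale entries have to be cleared manually.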