User is experiencing issues with LlamaParse reparsing documents repeatedly

At a glance

A community member implemented LlamaParse expecting each document to be parsed only once, but it appeared to be reparsed on every run, even before the 48-hour breakpoint. They asked @Logan M what they were doing wrong. The replies show that the document is in fact cached: the second run takes far less time than the first (seconds rather than minutes). The community member then confirmed that nothing is reparsed; each run simply creates a new job in the WebUI without using any credits.

Saltuk

I implemented LlamaParse like this, but for some reason it always reparses the document. I would have expected the document to be parsed only once. @Logan M, can you tell me what I am doing wrong here? It tries to reparse even before the 48-hour breakpoint.

def get_file_documents(config: FileLoaderConfig):
    parser = llama_parse_parser()
    files_info = fetch_file_list()
    logger.info(
        f"List of files ready for download. Number of files to download: {len(files_info)}"
    )

    if config.use_llama_parse:
        file_paths = []
        for file_info in files_info:
            resource_url = file_info["resourceURL"]
            file_name = file_info["fileName"]
            file_path = os.path.join(config.data_dir, file_name)
            if not os.path.exists(file_path):
                download_file(resource_url, file_path)
                logger.info(
                    f"Successfully downloaded file: {file_name} and saved it on the server."
                )
            file_paths.append(file_path)
        
        documents = []
        for file_number, file_path in enumerate(file_paths, 1):
            file_name = os.path.basename(file_path)
            json_representation = parser.get_json_result(file_path)
            document = parser.load_data(
                file_path=file_path,
                extra_info={
                    "file_name": file_name,
                    "file_number": file_number,
                    "pages": json_representation[0]["pages"]
                }
            )

            documents.append(
                document
            )

5 comments

Logan M

What does llama_parse_parser() do?

Saltuk

It creates an instance of LlamaParse:

import os

from llama_parse import LlamaParse, ResultType


def llama_parse_parser():
    if os.getenv("LLAMA_CLOUD_API_KEY") is None:
        raise ValueError(
            "LLAMA_CLOUD_API_KEY environment variable is not set. "
            "Please set it in .env file or in your shell environment then run again!"
        )
    parser = LlamaParse(verbose=True, language="de", result_type=ResultType.MD)
    return parser

Logan M

How do you know it's being reparsed? For me, the second run takes much less time (seconds compared to minutes).
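
One way to check this locally is to time two consecutive calls on the same file. The helper below is only a sketch and not part of LlamaParse: `time_call` and `compare_runs` are hypothetical names, and `parse_fn` stands in for whatever call you make, for example `parser.get_json_result`. It assumes a cached result comes back noticeably faster than a fresh parse.

```python
import time
from typing import Any, Callable, Tuple


def time_call(parse_fn: Callable[[str], Any], file_path: str) -> Tuple[Any, float]:
    """Run parse_fn(file_path) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = parse_fn(file_path)
    return result, time.perf_counter() - start


def compare_runs(parse_fn: Callable[[str], Any], file_path: str) -> Tuple[float, float]:
    """Call parse_fn twice on the same file and return both durations."""
    _, first = time_call(parse_fn, file_path)
    _, second = time_call(parse_fn, file_path)
    return first, second
```

If the second duration is a small fraction of the first, the result was most likely served from the cache rather than parsed again.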

Saltuk

It seems to be cached. Thanks! I thought it was being reparsed, but it just creates a new job in the WebUI with no credits being used.

Logan M

Yup ✅
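
If the extra jobs in the WebUI are unwanted, one option is to keep a small cache on your own disk so that an unchanged file is never submitted twice. This is a minimal sketch, not a LlamaParse feature: `file_digest`, `cached_parse`, and the `.parse_cache` directory are hypothetical names, `parse_fn` stands in for a call such as `parser.get_json_result`, and the result is assumed to be JSON-serializable.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def file_digest(file_path: str) -> str:
    """Return the SHA-256 hex digest of the file's bytes."""
    h = hashlib.sha256()
    with open(file_path, "rb") as f:
        # Read in 1 MiB chunks so large files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_parse(
    file_path: str,
    parse_fn: Callable[[str], Any],
    cache_dir: str = ".parse_cache",
) -> Any:
    """Return the stored result for this file's content; call parse_fn only on a miss."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    entry = cache / f"{file_digest(file_path)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    result = parse_fn(file_path)  # assumed to be JSON-serializable
    entry.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Keying on the file's content rather than its name means a renamed file still hits the cache, while an edited file is parsed again.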
