
Updated last year

how to store simpledirectoryreader indexes in aws s3 storage

At a glance

The post asks how to store SimpleDirectoryReader indexes in AWS S3 storage. The comments provide a detailed answer, which includes the following steps:

1. Install the LlamaIndex package using pip.

2. Import the necessary modules and set up logging.

3. Load the environment variables and set up the S3 filesystem.

4. Use SimpleDirectoryReader to load the documents.

5. Create a VectorStoreIndex from the documents and save it to the S3 storage.

6. Load the index from the S3 storage using the StorageContext.

The community members also provide links to the SimpleIndexOnS3.ipynb notebook and the LlamaIndex documentation as additional resources.

@kapa.ai how to store simpledirectoryreader indexes in aws s3 storage
3 comments
Please stand by @dhruva as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To store indexes created using SimpleDirectoryReader in AWS S3 storage, you can follow these steps:

  1. Install the LlamaIndex package if you haven't already. You can do this using pip:
Plain Text
!pip install llama-index

  2. Import the necessary modules and set up logging:
Plain Text
import logging
import sys
import dotenv
import s3fs
import os
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
    StorageContext,
)

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

  3. Load your environment variables and set up your S3 filesystem:
Plain Text
dotenv.load_dotenv("../../../.env")

AWS_KEY = os.environ["AWS_ACCESS_KEY_ID"]
AWS_SECRET = os.environ["AWS_SECRET_ACCESS_KEY"]
R2_ACCOUNT_ID = os.environ["R2_ACCOUNT_ID"]

assert AWS_KEY is not None and AWS_KEY != ""

s3 = s3fs.S3FileSystem(
    key=AWS_KEY,
    secret=AWS_SECRET,
    endpoint_url=f"https://{R2_ACCOUNT_ID}.r2.cloudflarestorage.com",
    s3_additional_kwargs={"ACL": "public-read"},
)
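Note that the snippet above points at Cloudflare R2 via a custom endpoint_url. For plain AWS S3, which is what the original question asks about, the endpoint can simply be omitted; a minimal sketch (the credentials are read from the same environment variables as above):

```python
import os

import s3fs

# For AWS S3 itself, no custom endpoint_url is needed; the default
# endpoint resolves to AWS. s3fs can also pick these credentials up
# from the environment automatically if key/secret are not passed.
s3 = s3fs.S3FileSystem(
    key=os.environ["AWS_ACCESS_KEY_ID"],
    secret=os.environ["AWS_SECRET_ACCESS_KEY"],
)
```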

  4. Use SimpleDirectoryReader to load your documents:
Plain Text
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()

  5. Create an index from your documents:
Plain Text
index = VectorStoreIndex.from_documents(documents, fs=s3)

  6. Save the index to your S3 storage:
Plain Text
index.set_index_id("vector_index")
index.storage_context.persist("llama-index/storage_demo", fs=s3)

  7. You can then load the index from your S3 storage using the following code:
Plain Text
sc = StorageContext.from_defaults(
    persist_dir="llama-index/storage_demo", fs=s3
)
index2 = load_index_from_storage(sc, index_id="vector_index")

Please replace "./data/paul_graham/" with the path to your directory of documents, and "llama-index/storage_demo" with your S3 bucket name and key prefix.
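Once reloaded, the index can be queried like any other; a short sketch (this assumes an LLM API key, e.g. OPENAI_API_KEY, is configured, and the question string is illustrative):

```python
# Query the index that was reloaded from S3. This makes an LLM call,
# so an API key must be available in the environment.
query_engine = index2.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
```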
Sources:
  • SimpleIndexOnS3.ipynb
  • LlamaIndex Documentation
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------