----------------------
To store an index built from documents loaded with SimpleDirectoryReader in AWS S3 (or any S3-compatible storage), you can follow these steps:
- Install the LlamaIndex, s3fs, and python-dotenv packages if you haven't already. You can do this using pip:
pip install llama-index s3fs python-dotenv
- Import the necessary modules and set up logging:
import logging
import sys
import dotenv
import s3fs
import os
from llama_index import (
VectorStoreIndex,
SimpleDirectoryReader,
load_index_from_storage,
StorageContext,
)
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
- Load your environment variables and set up your S3 filesystem. This example points endpoint_url at Cloudflare R2, an S3-compatible store; omit endpoint_url to use AWS S3 directly:
dotenv.load_dotenv("../../../.env")
AWS_KEY = os.environ["AWS_ACCESS_KEY_ID"]
AWS_SECRET = os.environ["AWS_SECRET_ACCESS_KEY"]
R2_ACCOUNT_ID = os.environ["R2_ACCOUNT_ID"]
assert AWS_KEY is not None and AWS_KEY != ""
s3 = s3fs.S3FileSystem(
key=AWS_KEY,
secret=AWS_SECRET,
endpoint_url=f"https://{R2_ACCOUNT_ID}.r2.cloudflarestorage.com",
s3_additional_kwargs={"ACL": "public-read"},
)
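If you prefer not to add a python-dotenv dependency, the same effect can be approximated with the standard library. The sketch below defines a hypothetical helper (not part of LlamaIndex or python-dotenv) that reads simple KEY=VALUE lines into os.environ; it does not handle quoting, `export` prefixes, or multi-line values the way python-dotenv does:

```python
import os

def load_env_file(path: str) -> None:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    A minimal stand-in for python-dotenv's load_dotenv(): skips blank
    lines and comments, and never overwrites variables already set.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```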
- Use
SimpleDirectoryReader
to load your documents:
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
- Create an index from your documents:
index = VectorStoreIndex.from_documents(documents)
- Save the index to your S3 storage:
index.set_index_id("vector_index")
index.storage_context.persist("llama-index/storage_demo", fs=s3)
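For orientation, persist() serializes the index into a handful of JSON files under the persist directory (in recent LlamaIndex versions these typically include docstore.json, index_store.json, and vector_store.json; the exact set varies by version). The sketch below uses only the standard library and placeholder contents, purely to illustrate the kind of layout that ends up in your bucket:

```python
import json
import tempfile
from pathlib import Path

# Placeholder layout mimicking what persist() typically writes;
# file names are LlamaIndex defaults, contents here are empty stubs.
persist_dir = Path(tempfile.mkdtemp()) / "storage_demo"
persist_dir.mkdir(parents=True)

for name in ("docstore.json", "index_store.json", "vector_store.json"):
    (persist_dir / name).write_text(json.dumps({}))

print(sorted(p.name for p in persist_dir.iterdir()))
```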
- You can then load the index from your S3 storage using the following code:
sc = StorageContext.from_defaults(
persist_dir="llama-index/storage_demo", fs=s3
)
index2 = load_index_from_storage(sc, index_id="vector_index")
Please replace "./data/paul_graham/" with the path to your documents directory, and "llama-index/storage_demo" with your own bucket name and key prefix.
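Note that with s3fs the first path segment of the persist directory is the bucket name and the remainder is a key prefix inside that bucket. A tiny sketch (the helper name is hypothetical, not an s3fs or LlamaIndex API) of how such a path decomposes:

```python
def split_s3_path(persist_dir: str) -> tuple:
    """Split an s3fs-style path into (bucket, key_prefix).

    e.g. "llama-index/storage_demo" -> ("llama-index", "storage_demo")
    """
    bucket, _, prefix = persist_dir.partition("/")
    return bucket, prefix

print(split_s3_path("llama-index/storage_demo"))
```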