To extract data from GitHub using LlamaIndex, you can use the `GithubRepositoryReader` class. Here's a step-by-step guide:
- Install the necessary packages. You'll need `llama-index`, `nest_asyncio`, and `httpx`.

```python
!pip install llama-index
!pip install nest_asyncio httpx

import nest_asyncio

# The GitHub reader makes async HTTP calls; nest_asyncio lets them run
# inside the event loop that Jupyter already has running.
nest_asyncio.apply()
```
- Import the necessary modules.

```python
from llama_index import VectorStoreIndex, GithubRepositoryReader
import os
```
- Set your GitHub token and OpenAI API key as environment variables.

```python
%env OPENAI_API_KEY=your_openai_api_key
%env GITHUB_TOKEN=your_github_token
```
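Note that the `%env` magic only works inside Jupyter/IPython. If you're running this as a plain Python script instead, you can set the same variables with `os.environ` (the values below are placeholders, not real credentials):

```python
import os

# Placeholders: substitute your actual credentials before running.
os.environ["OPENAI_API_KEY"] = "your_openai_api_key"
os.environ["GITHUB_TOKEN"] = "your_github_token"
```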
- Use the `GithubRepositoryReader` to load data from the GitHub repository.

```python
github_token = os.environ.get("GITHUB_TOKEN")
owner = "jerryjliu"
repo = "llama_index"
branch = "main"

documents = GithubRepositoryReader(
    github_token=github_token,
    owner=owner,
    repo=repo,
    use_parser=False,
    verbose=False,
    ignore_directories=["examples"],  # skip the examples directory
).load_data(branch=branch)
```
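Before building the index, it can help to sanity-check what was loaded. Here's a minimal sketch, assuming each returned `Document` exposes a `.text` attribute (as it does in recent llama-index releases):

```python
# Quick sanity check on the loaded documents
print(f"Loaded {len(documents)} documents from {owner}/{repo}")

# Peek at the start of the first document's contents
if documents:
    print(documents[0].text[:300])
```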
- Create an index from the documents.

```python
index = VectorStoreIndex.from_documents(documents)
```
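Embedding a whole repository takes time and OpenAI tokens, so you may want to persist the index and reload it later rather than rebuilding it on every run. A sketch using the storage API available in recent llama-index versions (the `./storage` directory is just an example path):

```python
from llama_index import StorageContext, load_index_from_storage

# Save the index to disk so the repository doesn't need to be re-embedded next time
index.storage_context.persist(persist_dir="./storage")

# Later, reload it instead of calling VectorStoreIndex.from_documents again
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```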
- You can now use the `query_engine` to query the index.

```python
query_engine = index.as_query_engine()
response = query_engine.query("Your query here")
```
- Display the response.

```python
from IPython.display import Markdown, display

display(Markdown(f"<b>{response}</b>"))
```
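If you want to see which repository files the answer was drawn from, the response object typically carries the retrieved chunks as `source_nodes`. A minimal sketch:

```python
# Inspect which chunks the answer was based on
for source_node in response.source_nodes:
    print(f"Score: {source_node.score}")
    print(source_node.node.get_text()[:200])
    print("---")
```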
Please replace `your_openai_api_key` and `your_github_token` with your actual OpenAI API key and GitHub token.