Updated last year

how to extract data from github?

@kapa.ai how to extract data from github?
Please stand by @Matthews as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To extract data from GitHub using LlamaIndex, you can use the GithubRepositoryReader class. Here's a step-by-step guide:

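  1. Install the necessary packages: llama-index, nest_asyncio, and httpx. In a notebook, also apply nest_asyncio so the reader's async requests can run inside the notebook's existing event loop.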
Plain Text
!pip install llama-index
!pip install nest_asyncio httpx
import nest_asyncio
nest_asyncio.apply()


  2. Import the necessary modules.
Plain Text
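# Import paths as used by older llama_index releases; newer releases (0.10+)
# expose these from llama_index.core and llama_index.readers.github instead.
from llama_index import VectorStoreIndex, GithubRepositoryReader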
import os


  3. Set your GitHub token and OpenAI API key as environment variables.
Plain Text
%env OPENAI_API_KEY=your_openai_api_key
%env GITHUB_TOKEN=your_github_token


  4. Use the GithubRepositoryReader to load data from the GitHub repository.
Plain Text
github_token = os.environ.get("GITHUB_TOKEN")
owner = "jerryjliu"
repo = "llama_index"
branch = "main"

documents = GithubRepositoryReader(
    github_token=github_token,
    owner=owner,
    repo=repo,
    use_parser=False,
    verbose=False,
    ignore_directories=["examples"],
).load_data(branch=branch)


  5. Create an index from the documents.
Plain Text
index = VectorStoreIndex.from_documents(documents)


  6. Create a query engine from the index and use it to run a query.
Plain Text
query_engine = index.as_query_engine()
response = query_engine.query(
    "Your query here",
    verbose=True,
)


  7. Display the response.
Plain Text
from IPython.display import Markdown, display
display(Markdown(f"<b>{response}</b>"))
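
The ignore_directories argument in step 4 tells the reader to skip files under the listed directories. As a rough illustration of that idea only, here is a minimal sketch of directory-based path filtering. The helper is_ignored and the sample paths are hypothetical and are not taken from the library; the reader's actual matching rules may differ.

```python
from pathlib import PurePosixPath


def is_ignored(path: str, ignore_directories: list[str]) -> bool:
    """Return True if `path` lies inside any directory in `ignore_directories`.

    Hypothetical helper for illustration; not part of llama_index.
    """
    parts = PurePosixPath(path).parts
    for directory in ignore_directories:
        dir_parts = PurePosixPath(directory).parts
        # Match whole path components, so "examples_extra" is not treated
        # as being inside "examples".
        if len(parts) > len(dir_parts) and parts[: len(dir_parts)] == dir_parts:
            return True
    return False


# Made-up repository paths, used only to show the filtering behaviour.
paths = [
    "README.md",
    "examples/demo.ipynb",
    "llama_index/readers/github.py",
    "examples_extra/notes.md",
]
kept = [p for p in paths if not is_ignored(p, ["examples"])]
print(kept)
```

Comparing whole path components rather than raw string prefixes avoids accidentally dropping sibling directories whose names merely start with the ignored name.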


Please replace your_openai_api_key and your_github_token with your actual OpenAI API key and GitHub token.
You can find more details in the GithubRepositoryReaderDemo notebook on GitHub.
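
The %env lines in step 3 only work inside a notebook. If you run the same steps as a plain Python script, one option is to export the variables in your shell and check them up front. The sketch below assumes that setup. The helper name require_env is hypothetical, and the check for a "your_" prefix is just a guard against the placeholder values shown above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail early.

    Hypothetical helper for illustration; not part of llama_index.
    """
    value = os.environ.get(name, "").strip()
    # Treat a missing value or an unreplaced placeholder such as
    # "your_github_token" as a configuration error.
    if not value or value.startswith("your_"):
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value


# Example usage (commented out so the sketch runs without real credentials):
# github_token = require_env("GITHUB_TOKEN")
# openai_api_key = require_env("OPENAI_API_KEY")
```

Failing early like this gives a clearer error than letting the GitHub or OpenAI request fail later with an authentication error.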
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------