Hey,

I'm experimenting with Replicate to compare it with OpenAI's GPT-3.5 Turbo, and it seems quite reasonable. My use case is to provide a knowledge base and have the model respond to queries using only that knowledge, with no prior knowledge.

Which model category should I look into to have the best experience for my use case?

Thank you!
LlamaIndex provides an LLM compatibility tracking page comparing performance on different tasks. You can check it to find the LLM that suits your needs.

https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html#llm-compatibility-tracking
@WhiteFang_Jr thanks for checking! It seems OpenAI is the best option.
Yeah, but an open-source model like Zephyr is also looking good: https://colab.research.google.com/drive/1UoPcoiA5EOBghxWKWduQhChliMHxla7U?usp=sharing
It won't be as good as OpenAI, though.
@WhiteFang_Jr I did a quick test with Zephyr and found that the response/answer is trimmed.
"...This command will download the required files and perform the"

It ends at "the" for some odd reason
@WhiteFang_Jr do you have any idea why that'd be?

Plain Text
from pathlib import Path
from typing import Union

from fastapi import FastAPI
from llama_index import ServiceContext, SummaryIndex, download_loader
from llama_index.llms import Replicate
from llama_index.prompts import PromptTemplate

app = FastAPI()

# Zephyr 7B beta served via Replicate
llm = Replicate(
    model="tomasmcm/zephyr-7b-beta:961cd6665b811d0c43c0b9488b6dfa85ff5c7bfb875e93b4533e4c7f96c7c526"
)
service_context = ServiceContext.from_defaults(llm=llm)

# Load the knowledge base from a local Markdown file
MarkdownReader = download_loader("MarkdownReader")
loader = MarkdownReader()

# Custom QA prompt: answer strictly from the provided knowledge
template = (
    "We have provided knowledge below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the provided knowledge and no prior knowledge, "
    "answer the query, including the commands and the documentation URL.\n"
    "The answer should only contain accurate information from the provided knowledge.\n"
    "If there is no answer, ask the user to visit the main blog and documentation website.\n"
    "The query is: {query_str}\n"
)
qa_template = PromptTemplate(template)


@app.get("/ping")
def read_root():
    return "pong"


@app.get("/query")
def query(question: Union[str, None] = None):
    documents = loader.load_data(file=Path("./knowledge.md"))
    index = SummaryIndex.from_documents(documents, service_context=service_context)
    query_engine = index.as_query_engine()
    # Swap in the custom QA prompt on the response synthesizer
    query_engine.update_prompts(
        {"response_synthesizer:text_qa_template": qa_template}
    )
    answer = query_engine.query(question)

    print(answer)

    return {"answer": str(answer)}
Maybe the total tokens are getting consumed before it is able to generate the full answer.
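Truncation like that usually means the model hit its output-token limit. A minimal sketch of raising it; I'm assuming this Zephyr build exposes a max_new_tokens input on Replicate, so check the model's input schema if the name differs:

Plain Text
from llama_index.llms import Replicate

# Assumption: this Zephyr build accepts a max_new_tokens input on Replicate;
# verify the parameter name in the model's schema on replicate.com.
llm = Replicate(
    model="tomasmcm/zephyr-7b-beta:961cd6665b811d0c43c0b9488b6dfa85ff5c7bfb875e93b4533e4c7f96c7c526",
    additional_kwargs={"max_new_tokens": 512},  # raise the output-token budget
)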
I see! I'm actually using the free version just for testing, maybe that's why?
You can try interacting with the LLM directly to see if it is working fine or not.

Plain Text
print(llm.complete("Hey how are you?"))
@WhiteFang_Jr thanks, I'll check it out