Updated 10 months ago

Hi, this is a more general question - does anyone care about typing in the python/AI world at all? As my app is getting more users, stability is becoming increasingly important to me. Yet, I see that most libraries do not bother to provide explicit type definitions, llama_index among them. This leaves me with the option to either stub the types for each library (plenty of work) or ignore them altogether. Am I alone in this? I come from a Java/Scala background and this lack of type strictness is driving me nuts. Typescript seems miles ahead in terms of enabling maintainable code. I'm looking for general recommendations on how to approach this
Attachment: image.png
23 comments
I care. Mostly I just use pydantic and annotate the parameter and return types. But I don't want to break my head over types (unlike Java, where it plain sucks if dealing with non-String, non-primitive types).
Yeah I agree it's a double-edged sword - when you need the flexibility Java sucks. Typescript has a good balance imo
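To make the "annotate parameters and return types" approach concrete, here is a minimal sketch. It uses a stdlib dataclass so it runs without dependencies; with pydantic you would subclass `BaseModel` instead to also get runtime validation on top of the static annotations. The names (`PromptResult`, `run`) are illustrative, not from any library.

```python
# Sketch of annotating a boundary function; names are hypothetical.
# With pydantic, PromptResult would be a BaseModel for runtime checks.
from dataclasses import dataclass


@dataclass
class PromptResult:
    name: str
    text: str


def run(name: str, text: str) -> PromptResult:
    # mypy/pyright can now check every call site against this signature
    return PromptResult(name=name, text=text)


result = run("summary", "hello")
print(result.name)  # → summary
```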
When you're defining new methods/objects that's all fine, but how do I solve the case I posted above?
Typescript is just an abstraction though. There are no types in JS world.
Afaik you can only define stubs
ofc, but for practical purposes it does get the job done
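For anyone following along, a stub file (`.pyi`) just declares signatures with `...` bodies; type checkers pick it up from next to the package or from a stubs directory on mypy's `mypy_path`. A hypothetical stub for an untyped third-party API might look like this (all names here are illustrative, not from a real library):

```python
# Hypothetical contents of mylib.pyi — declarations only, no logic.
from typing import Any


class QueryEngine:
    def query(self, prompt: str) -> str: ...


def as_query_engine(index: Any) -> QueryEngine: ...
```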
or send PRs to the corresponding libraries. As far as I can see, most of the types are annotated in the method definitions for langchain and llama-index.
of course old libraries like pandas may have types missing in some places.
Also use some good linter and enable type checking rules. Should capture most issues in the codebase.
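For reference, a hypothetical `pyproject.toml` fragment turning on strict-ish checking with mypy (option names are from mypy's documented settings; tune to taste):

```toml
# Hypothetical pyproject.toml fragment — a starting point, not a prescription.
[tool.mypy]
strict = true
ignore_missing_imports = true  # silence third-party libs that ship no types
```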
Yeah I'm currently trying to navigate a solid mypy/pylance setup
Have you tried ruff yet?
Nope, will check it out
Looks nice, thanks
Holy fuck ruff is 🔥
@Łukasz as_query_engine is a gigantic catchall and wrapper around every retriever + retriever query engine.

Listing all the args here would a) be a list of 20+ and b) need to be written for every index

Is it a great design? Nah not really. But the actual retrievers and query engines themselves are fully typed, as well as the rest of the library
Also yes, ruff is great, we use it in our CI
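For anyone setting ruff up, a hypothetical `pyproject.toml` fragment; the `ANN` rule set (flake8-annotations) flags missing parameter/return annotations, which fits this thread's concern:

```toml
# Hypothetical pyproject.toml fragment for ruff; adjust rules to taste.
[tool.ruff]
line-length = 100

[tool.ruff.lint]
# E/F = pycodestyle/pyflakes, ANN = flake8-annotations (missing type hints)
select = ["E", "F", "ANN"]
```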
@Logan M Indeed, I've figured it out for now - part of the problem is that the tooling doesn't get the job done for me, e.g. inferring types from usage - as far as I know VS Code is too dumb to achieve this (haven't tried PyCharm in a while). Same goes for importing - unless I explicitly inspect the source code & write the import myself, there are plenty of types that aren't "importable" oob. I've no idea why that is but that's a different story
Ex. I've had to manually import both of these at the top:
Plain Text
from llama_index.core.base.response.schema import RESPONSE_TYPE, Response
from utils.promptlayer import PromptName, get_prompt_from_promptlayer

# (plus the stdlib/llama_index imports the snippet needs to actually run)
import logging
import time
from typing import cast

from llama_index.core import VectorStoreIndex

logger = logging.getLogger(__name__)


async def run_prompt(nodes, name: PromptName) -> str:
    logger.info(f"Running prompt {name}")
    start_time = time.time()

    index = VectorStoreIndex(nodes=nodes, show_progress=True)
    query_engine = index.as_query_engine()
    prompt = get_prompt_from_promptlayer(name)
    result: RESPONSE_TYPE = await query_engine.aquery(prompt)
    typed_response = cast(Response, result)
    return typed_response.response or ""
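As an aside on the `cast(Response, result)` line above: `cast` only silences the checker, it does nothing at runtime. An `isinstance` check narrows the union safely instead. A minimal sketch with stand-in types (in the snippet above these would be llama_index's `Response` / `RESPONSE_TYPE`):

```python
# Pattern: narrow a union return type with isinstance instead of cast().
# Response here is a stand-in dataclass, not the llama_index class.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    response: Optional[str]


def get_text(result: object) -> str:
    if isinstance(result, Response) and result.response is not None:
        return result.response  # checker knows this is str here
    raise TypeError(f"unexpected response type: {type(result).__name__}")


print(get_text(Response("hello")))  # → hello
```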
Yes, the imports are kinda messy.
I do like the attempts to clean up the code base in the 0.10.x releases, but more often than not I use GitHub or Google search to find what I need.
I do wish Copilot would adapt to these changes faster, but unless they rearchitect to search local code in some form rather than relying on pre-training data, these issues will remain for the time being.
I thought it was just me being clunky with python dev but if that's just how things are I can live with that 😄 Thanks for the feedback guys
yea I 100% use ctrl-f with the source code open lol
Glad you like the release! It was a ton of work (and still some kinks). Hoping this will make the codebase more scalable 🙂