RAGy: A simple RAG framework for Python

Mar 12, 2026

Motivation

RAG (Retrieval-Augmented Generation) is a powerful way to inject our own data into LLMs.

Think about it. OpenAI's latest model was probably not trained on your company's data. So when you ask "What does the company's operations manual say I must do in this situation?", you'll get a big "I don't know".

RAG is a method that retrieves the relevant pieces of our additional data and passes them to the LLM along with the question, so the responses can be more relevant to our context as a company or individual.

But RAG is not that easy. To implement a good RAG system, you must consider:

  • Embeddings
  • Chunking
  • Overlapping
  • Vector databases
  • Raw document source
  • Caching embeddings to reduce $$$

And many more things.

That's why I built RAGy.

RAGy in action

RAGy is a simple RAG framework for Python. So simple that you can implement your own RAG in ~30 lines of code.

Don't believe me? Let's check it out.

First, install it in your Python project.

pip install ragy

Then, you can use it.

from ragy.rag import RAG
from ragy.reasoning import OpenAIEmbeddingModel, OpenAIGPTEngine
from ragy.rawdoc import DirectoryRawDocumentRetriever
from ragy.vector import ChromaVectorStore

# Create a RAG interface with the necessary components
system_prompt = """You are a helpful assistant that provides accurate and concise answers to user queries based on the retrieved documents."""

embedding_model = OpenAIEmbeddingModel(model='text-embedding-3-small')
raw_document_retriever = DirectoryRawDocumentRetriever(dir='./docs')
vector_store = ChromaVectorStore(path="./chroma", collection_name='my_collection')
ai_engine = OpenAIGPTEngine(model='gpt-5.2')

rag = RAG(
    system_prompt=system_prompt,
    embedding_model=embedding_model,
    raw_document_retriever=raw_document_retriever,
    vector_store=vector_store,
    ai_engine=ai_engine
)

# Use the RAG interface to ingest documents into the vector store
rag.ingest(chunk_size=512, chunk_overlap=128)

# Use the RAG interface to generate a response to a query
response = rag.generate('What is the capital of France?')
print(response)

And that's it!

Simple, as it should be.
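If you are wondering what `chunk_size` and `chunk_overlap` mean in the `rag.ingest` call, here is a minimal sketch of overlapping chunking. It is not RAGy's actual implementation: the `chunk_text` function is hypothetical, and it assumes sizes are measured in characters (a real framework may count tokens instead).

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Each window shares its first chunk_overlap characters with the end
    of the previous one. Hypothetical helper, not part of RAGy.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in at least one chunk, at the price of embedding some text twice.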

Next steps

This is a new project and there's a lot of stuff to improve.

In the short term I want to:

  • Add built-in support for many other vector databases, raw document sources and AI engines
  • Add a cache for embeddings and responses using similarity detection
  • Add a toolkit for LLM evals (yes, you must evaluate your models with datasets before shipping them to production)

RAGy is open source. You can find the repository at https://github.com/mvrcoag/ragy.

Contributions to RAGy are welcome, except for fully AI-generated ones. Why? Because of this.

Stay tuned for updates on this new framework.

-- Marco AG