Within this article, I share some of the basics of creating an LLM-driven web application using various technologies, such as Python, FastAPI, Pydantic, VertexAI and more. You will learn how to create such a project from the very beginning and get an overview of the underlying concepts, including Retrieval-Augmented Generation (RAG).
The best way to share this knowledge is through a practical example. Hence, I’ll use my project Gemini Movie Detectives to cover the various aspects. The project was created as part of the Google AI Hackathon 2024, which is still running while I am writing this.
Gemini Movie Detectives is a project aimed at leveraging the power of the Gemini Pro model via VertexAI to create an engaging quiz game using the latest movie data from The Movie Database (TMDB).
Part of the project was also to make it deployable with Docker and to create a live version. Try it yourself: movie-detectives.com. Keep in mind that this is a simple prototype, so there might be unexpected issues. Also, I had to add some limitations in order to control costs that might be generated by using GCP and VertexAI.
The project is fully open-source and is split into two separate repositories:
The focus of the article is the backend project and underlying concepts. It will therefore only briefly explain the frontend and its components.
In the following video, I also give an overview of the project and its components:
Growing up as a passionate gamer and now working as a Data Engineer, I’ve always been drawn to the intersection of gaming and data. With this project, I combined two of my greatest passions: gaming and data. Back in the ’90s, I always enjoyed the video game series You Don’t Know Jack, a delightful blend of trivia and comedy that not only entertained but also taught me a thing or two. The use of games for educational purposes is another concept that fascinates me.
In 2023, I organized a workshop to teach kids and young adults game development. They learned about the mathematical concepts behind collision detection, yet they had fun because everything was framed in the context of gaming. It was eye-opening to see that gaming is not only a huge market but also holds great potential for knowledge sharing.
With this project, called Movie Detectives, I aim to showcase the magic of Gemini, and AI in general, in crafting engaging trivia and educational games, and to show how game design can profit from these technologies.
By feeding the Gemini LLM with accurate and up-to-date movie metadata, I could greatly improve the accuracy of the questions Gemini generates. This matters because without this Retrieval-Augmented Generation (RAG) approach of enriching queries with real-time metadata, there is a risk of propagating misinformation, which is a typical pitfall when using AI for this purpose.
Another game-changer lies in the modular prompt generation framework I’ve crafted using Jinja templates. It’s like having a Swiss Army knife for game design – effortlessly swapping show master personalities to tailor the game experience. And with the language module, translating the quiz into multiple languages is a breeze, eliminating the need for costly translation processes.
From a business standpoint, this modularization opens doors to a wider customer base, transcending language barriers without the need for expensive translation processes. And personally, I’ve experienced firsthand the transformative power of these modules. Switching from the default quiz master to the dad-joke quiz master was a riot: a nostalgic nod to the heyday of You Don’t Know Jack, and a testament to the versatility of this project.
Movie Detectives - Example: Santa Claus personality
Before we jump into details, let’s get an overview of how the application was built.
Tech Stack: 🚀 Backend
Tech Stack: 🖥️ Frontend
Essentially, the application fetches up-to-date movie metadata from an external API (TMDB), constructs a prompt based on different modules (personality, language, …), enriches this prompt with the metadata and then uses Gemini to start a movie quiz in which the user has to guess the correct title.
The backend infrastructure is built with FastAPI and Python, employing the Retrieval-Augmented Generation (RAG) methodology to enrich queries with real-time metadata. Utilizing Jinja templating, the backend modularizes prompt generation into base, personality, and data enhancement templates, enabling the generation of accurate and engaging quiz questions.
The frontend is powered by Vue 3 and Vite, supported by daisyUI and Tailwind CSS for efficient frontend development. Together, these tools provide users with a sleek and modern interface for seamless interaction with the backend.
In Movie Detectives, quiz answers are interpreted by the Large Language Model (LLM) once again, allowing for dynamic scoring and personalized responses. This showcases the potential of integrating LLMs with RAG in game design and development, paving the way for truly individualized gaming experiences. Furthermore, it demonstrates the potential of LLMs for creating engaging trivia quizzes or educational games. Adding or changing personalities or languages is as easy as adding more Jinja template modules. With very little effort, this can change the full game experience, which reduces the work for developers.
Movie Detectives - System Overview
As can be seen in the overview, Retrieval-Augmented Generation (RAG) is one of the essential ideas of the backend. Let’s have a closer look at this particular paradigm.
In the realm of Large Language Models (LLM) and AI, one paradigm becoming more and more popular is Retrieval-Augmented Generation (RAG). But what does RAG entail, and how does it influence the landscape of AI development?
At its essence, RAG enhances LLM systems by incorporating external data to enrich their predictions. This means that you pass relevant context to the LLM as an additional part of the prompt. But how do you find relevant context? Usually, this data is retrieved automatically from a database using vector search, or from a dedicated vector database. Vector databases are especially useful because they store data as embedding vectors, so that similar data can be found quickly. The LLM then generates the output based on both the query and the retrieved documents.
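To make the retrieval step more concrete, here is a minimal, dependency-free sketch of vector search. Each document is stored with an embedding vector, and documents are ranked by cosine similarity to the query vector. The best matches are then added to the prompt. The documents and their hand-written three-dimensional vectors are made-up assumptions. A real system would compute embeddings with a model and keep them in a vector database.

```python
import math

# Toy "vector store": (document text, embedding) pairs. The 3-dimensional
# vectors are invented for illustration; real embeddings come from a model.
DOCUMENTS = [
    ("Inception (2010) is a heist film set inside dreams.", [0.9, 0.1, 0.0]),
    ("Toy Story (1995) follows toys that come to life.", [0.1, 0.9, 0.1]),
    ("Interstellar (2014) is about space travel through a wormhole.", [0.7, 0.0, 0.6]),
]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector: list[float], k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are most similar to the query."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: cosine_similarity(query_vector, doc[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, query_vector: list[float]) -> str:
    """Augment the user's question with the retrieved context."""
    context = "\n".join(retrieve(query_vector))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Gemini Movie Detectives itself skips this similarity search and takes its context directly from the TMDB API.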
Picture this: you have an LLM capable of generating text based on a given prompt. RAG takes this a step further by infusing additional context from external sources, like up-to-date movie data, to enhance the relevance and accuracy of the generated text.
Let’s break down the key components of RAG:
In the Gemini Movie Detectives project, the prompt is enhanced with external API data from The Movie Database. Typical RAG systems, in contrast, work with much more complex documents and much larger amounts of data, so they usually rely on vector indexes to streamline retrieval. These indexes act like signposts, guiding the system to relevant external sources quickly.
This project therefore implements only a mini version of RAG. It still shows the basic idea, though, and demonstrates the power of external data to augment LLM capabilities.
In more general terms, RAG is a very important concept, especially when crafting trivia quizzes or educational games with LLMs like Gemini. It reduces the risk of false positives, of asking wrong questions and of misinterpreting users’ answers.
Here are some open-source projects that might be helpful when approaching RAG in one of your projects:
Of course, given the potential value of this approach for LLM-based applications, there are many more open- and closed-source alternatives. Still, these projects should be enough to get your research on the topic started.
Now that the main concepts are clear, let’s have a closer look at how the project was created and how dependencies are managed with Poetry.
Poetry helps you with three main tasks: building, publishing and tracking. The idea is to have a deterministic way to manage dependencies, share your project and track dependency states.
Poetry also handles the creation of virtual environments for you. By default, they are stored in a centralized folder on your system. However, if you prefer to keep a project’s virtual environment in the project folder, like I do, it is a simple config change:
You can then create a new Python project with poetry new. Poetry manages a virtual environment for it that is based on your system’s default Python. If you combine this with pyenv, you get a flexible way to create projects with specific Python versions. Alternatively, you can tell Poetry directly which Python version to use: poetry env use /full/path/to/python.
Once you have a new project, you can use poetry add to add dependencies to it.
With this, I created the project for Gemini Movie Detectives:
The metadata about your project, including the dependencies and their respective versions, is stored in the pyproject.toml and poetry.lock files. I added more dependencies later, which resulted in the following pyproject.toml for the project:
FastAPI is a Python framework that allows for rapid API development. Built on open standards, it offers a seamless experience without new syntax to learn. With automatic documentation generation, robust validation, and integrated security, FastAPI streamlines development while ensuring great performance.
To implement the API for the Gemini Movie Detectives project, I simply started from a Hello World application and extended it from there. Here is how to get started:
Assuming you also keep the virtual environment within the project folder as .venv/ and use uvicorn, this is how to start the API with the reload feature enabled. With reload on, code changes are picked up without a manual restart:
If you have not installed jq yet, I highly recommend doing it now. I might cover this wonderful JSON Swiss Army knife in a future article. This is what the response looks like:
From here, you can develop your API endpoints as needed. For example, this is what the API endpoint to start a movie quiz in Gemini Movie Detectives looks like:
Within this code, you can already see three of the main components of the backend:
tmdb_client: A client I implemented using httpx to fetch data from The Movie Database (TMDB).
prompt_generator: A class that helps to generate modular prompts based on Jinja templates.
gemini_client: A client to interact with the Gemini LLM via VertexAI in Google Cloud.
We will look at these components in detail later, but first, here are some more helpful insights on using FastAPI.
FastAPI makes it really easy to define the HTTP method and the data to be transferred to the backend. For this particular function, I expect a POST request, because it creates a new quiz. This can be done with the post decorator:
I also expect some data in the request body, sent as JSON: in this case, an instance of QuizConfig. I simply defined QuizConfig as a subclass of BaseModel from Pydantic (covered later). With that, I can declare it as a parameter of the API function and FastAPI does the rest:
Furthermore, you might notice two custom decorators:
I implemented these to reduce duplicate code. They wrap the API function to retry it in case of errors and to enforce a global limit on how many movie quizzes can be started per day.
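A retry decorator of this kind could be sketched as follows. This illustration rests on assumptions and is not the project’s implementation. The names retry, max_attempts and delay_seconds are hypothetical, and the decorator only handles synchronous functions. An async def endpoint would need an async wrapper.

```python
import functools
import time
from typing import Any, Callable


def retry(max_attempts: int = 3, delay_seconds: float = 0.0) -> Callable:
    """Call the wrapped function up to max_attempts times until it succeeds."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)  # keeps name and signature visible to FastAPI
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: re-raise the last error
                    time.sleep(delay_seconds)  # optional pause before retrying

        return wrapper

    return decorator
```

Because this version catches every Exception, it would also retry an intentionally raised HTTPException. A production variant would probably retry only specific error types. With FastAPI, place the route decorator on top so that FastAPI registers the wrapped function. functools.wraps keeps the original signature visible to FastAPI’s parameter parsing.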
Another thing I personally liked is error handling in FastAPI. You can simply raise an HTTPException and give it the desired status code. The user then receives a proper response, for example, if no movie could be found with a given configuration:
With this, you should have an overview of creating an API like the one for Gemini Movie Detectives with FastAPI. Keep in mind: all code is open-source, so feel free to have a look at the API repository on GitHub.
One of the main challenges with today’s AI/ML projects is data quality. This applies not only to ETL/ELT pipelines, which prepare datasets for model training or prediction, but also to the AI/ML application itself. Python, for example, usually enables Data Engineers and Data Scientists to get a reasonable result with little code. However, because it is (mostly) dynamically typed, Python lacks data validation when used in a naive way.
That is why in this project, I combined FastAPI with Pydantic, a powerful data validation library for Python. The goal was to make the API lightweight but strict and robust when it comes to data quality and validation. Instead of plain dictionaries, for example, the Movie Detectives API strictly uses custom classes inherited from the BaseModel provided by Pydantic. This is the configuration for a quiz, for example:
This example illustrates how Pydantic not only ensures the correct type but also applies further validation to the actual values.
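The following hypothetical QuizConfig uses Pydantic’s Field constraints to illustrate the idea. The field names and bounds are assumptions, not the project’s actual values:

```python
from pydantic import BaseModel, Field, ValidationError


# Hypothetical fields: each one is type-checked and range-checked.
class QuizConfig(BaseModel):
    vote_avg_min: float = Field(default=5.0, ge=0.0, le=9.0)
    vote_count_min: float = Field(default=1000.0, ge=0.0)
    popularity: int = Field(default=1, ge=1, le=3)


# Compatible input is coerced to the declared type ("7.5" becomes 7.5) ...
config = QuizConfig(vote_avg_min="7.5")

# ... while out-of-range values are rejected with a descriptive error.
try:
    QuizConfig(popularity=4)
except ValidationError as error:
    print(error.errors()[0]["loc"])  # ('popularity',)
```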
Furthermore, up-to-date Python features like StrEnum are used to distinguish certain types, such as personalities:
Duplicate code is also avoided by defining custom decorators. For example, the following decorator limits the number of quiz sessions per day to keep GCP costs under control:
It is then simply applied to the related API function:
The combination of up-to-date Python features and libraries, such as FastAPI, Pydantic or Ruff, makes the backend less verbose but still very stable. It also enforces a certain level of data quality, so that the LLM output has the expected quality.
The TMDB client class uses httpx to perform requests against the TMDB API.
httpx is a rising star in the world of Python libraries. While requests has long been the go-to choice for making HTTP requests, httpx offers a valid alternative. One of its key strengths is asynchronous functionality: httpx allows you to write code that handles multiple requests concurrently, potentially leading to significant performance improvements in applications that deal with a high volume of HTTP interactions. Additionally, httpx aims for broad compatibility with requests, making it easier for developers to pick it up.
In the case of Gemini Movie Detectives, there are two main requests:
get_movies: Get a list of random movies based on specific settings, like the average number of votes.
get_movie_details: Get details for a specific movie to be used in a quiz.
To reduce the number of external requests, the latter uses the lru_cache decorator, which stands for “Least Recently Used cache”. It caches the results of function calls, so that if the same inputs occur again, the function doesn’t have to recompute the result. Instead, it returns the cached result, which can significantly improve the performance of the program, especially for functions with expensive computations. In our case, we cache the details for 1024 movies, so if two players get the same movie, we do not need to make the request again:
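The following minimal sketch covers the caching part. It assumes a synchronous, hypothetical get_movie_details that stands in for the real TMDB request:

```python
from functools import lru_cache

# Records how often the (hypothetical) expensive lookup really runs.
REQUEST_LOG: list[int] = []


@lru_cache(maxsize=1024)
def get_movie_details(movie_id: int) -> dict:
    """Hypothetical stand-in for a TMDB request; results are cached per movie_id."""
    REQUEST_LOG.append(movie_id)
    return {"id": movie_id, "title": f"Movie {movie_id}"}
```

You can call get_movie_details.cache_info() to see hits and misses, which is handy for checking that caching works. There are two caveats. First, the cached dict is shared, so callers should not mutate it. Second, lru_cache does not fit async def functions, because it would cache the coroutine object, which can only be awaited once.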
Accessing data from The Movie Database (TMDB) is free for non-commercial usage: you can simply generate an API key and start making requests.
Before Gemini via VertexAI can be used, you need a Google Cloud project with VertexAI enabled and a Service Account with sufficient access together with its JSON key file.
Create project
After creating a new project, navigate to APIs & Services –> Enable APIs and services –> search for Vertex AI API –> Enable.
Enable API
To create a Service Account, navigate to IAM & Admin –> Service Accounts –> Create service account. Choose a proper name and go to the next step.
Create Service Account
Now make sure to assign the predefined role Vertex AI User to the account.
Assign role
Finally, you can generate and download the JSON key file by clicking on the new service account –> Keys –> Add Key –> Create new key –> JSON. With this file, you are good to go.
Create JSON key file
Using Gemini from Google with Python via VertexAI starts by adding the necessary dependency to the project:
With that, you can import vertexai and initialize it with your JSON key file. You can also load a model, like the newly released Gemini 1.5 Pro model, and start a chat session like this:
You can now use chat.send_message() to send a prompt to the model. However, when streaming is enabled, you get the response in chunks of data. I therefore recommend a little helper function that returns the full response as one string:
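Such a helper could look like the following sketch. It assumes a vertexai ChatSession-like object whose send_message(prompt, stream=True) yields chunks with a .text attribute. The function name get_chat_response is illustrative:

```python
from typing import Any


def get_chat_response(chat: Any, prompt: str) -> str:
    """Send a prompt with streaming enabled and join the chunks into one string.

    `chat` is expected to behave like a vertexai ChatSession: send_message()
    with stream=True returns an iterable of chunks, each with a .text attribute.
    """
    chunks = chat.send_message(prompt, stream=True)
    return "".join(chunk.text for chunk in chunks)
```

A fake object with the same interface lets you unit test the helper without calling VertexAI.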
A full example can then look like this:
Running this, Gemini gave me the following response:
You are awesome
I agree with Gemini:
Eres increíble
Another hint: you can also configure the model generation by passing a configuration to the generation_config parameter of the send_message function. For example:
I am using this in Gemini Movie Detectives to set the temperature to 0.5, which gave me the best results. In this context, temperature controls how random, and therefore how creative, Gemini’s responses are. Values closer to 0.0 produce more focused and deterministic output, while higher values produce more creative output. The accepted range depends on the model version. Gemini 1.5 models, for instance, accept values up to 2.0.
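Here is a sketch of passing the temperature through generation_config. It assumes a vertexai ChatSession-like object, and the helper name send_with_temperature is hypothetical:

```python
from typing import Any


def send_with_temperature(chat: Any, prompt: str, temperature: float = 0.5) -> str:
    """Send a prompt with a custom sampling temperature and return the text.

    Assumes a vertexai ChatSession-like object whose send_message() accepts a
    generation_config and, without streaming, returns a response with .text.
    """
    # A plain dict here; the SDK also offers a GenerationConfig class.
    generation_config = {"temperature": temperature}
    response = chat.send_message(prompt, generation_config=generation_config)
    return response.text
```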
Apart from sending a prompt and receiving the reply from Gemini, one of the main challenges is parsing the reply to extract the relevant information.
One learning from the project is:
Specify a format for Gemini that does not rely on exact words but uses key symbols to separate information elements
For example, the question prompt for Gemini contains this instruction:
The naive approach would be to parse the answer by looking for a line that starts with Question:. However, if we use another language, like German, the reply would start with Frage: instead.
Instead, focus on the structure and key symbols. Read the reply like this:
With this approach, the reply can be parsed independently of the language. This is my implementation in the actual client:
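A minimal language-independent parser could look like the sketch below. It assumes a reply format with one “Label: value” line each for the question and two hints, in a fixed order. The dictionary keys are illustrative, and this is not the project’s actual client code:

```python
def parse_reply(reply: str) -> dict[str, str]:
    """Parse 'Label: value' lines by position, ignoring the label text.

    Assumes the prompt asked for three lines in a fixed order: question,
    hint 1, hint 2. Labels may be in any language, so only the text after
    the first ':' of each line is used.
    """
    lines = [line for line in reply.strip().splitlines() if ":" in line]
    if len(lines) < 3:
        raise ValueError(f"Unexpected reply format: {reply!r}")
    values = [line.split(":", 1)[1].strip() for line in lines[:3]]
    return {"question": values[0], "hint1": values[1], "hint2": values[2]}
```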
In the future, parsing responses will become even easier. During the Google Cloud Next ’24 conference, Google announced that Gemini 1.5 Pro is now publicly available. Google also announced some new features, including a JSON mode that returns responses in JSON format. Check out this article for more details.
Apart from that, I wrapped the Gemini client into a configurable class. You can find the full open-source implementation on GitHub.
The Prompt Generator is a class that combines and renders Jinja2 template files to create a modular prompt.
There are two base templates: one for generating the question and one for evaluating the answer. Apart from that, there is a metadata template to enrich the prompt with up-to-date movie data. Furthermore, there are language and personality templates, organized in separate folders with a template file for each option.
Movie Detectives - Prompt Generator
Jinja2 enables advanced features like template inheritance, which is used for the metadata.
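The following sketch shows include and extends with jinja2’s DictLoader instead of template files. include pulls in the selected personality and language modules, and a child template uses extends to fill the metadata block of the base question template. All template names and texts are hypothetical:

```python
from jinja2 import DictLoader, Environment

# Hypothetical templates; the real project keeps them as files in
# separate folders for personalities and languages.
TEMPLATES = {
    "personality/default.jinja": "You are a friendly quiz master.",
    "personality/christmas.jinja": "You are Santa Claus hosting a movie quiz.",
    "language/english.jinja": "Answer in English.",
    "language/german.jinja": "Answer in German.",
    # Base template: includes the selected modules and defines a block
    # that child templates can fill with metadata.
    "question.jinja": (
        "{% include personality %}\n"
        "{% include language %}\n"
        "Ask a question about the movie described below.\n"
        "{% block metadata %}{% endblock %}"
    ),
    # Child template: inherits everything and only fills the metadata block.
    "movie_metadata.jinja": (
        '{% extends "question.jinja" %}'
        "{% block metadata %}Title: {{ movie.title }}\n"
        "Release year: {{ movie.year }}{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(TEMPLATES))


def generate_question_prompt(personality: str, language: str, movie: dict) -> str:
    """Render the prompt from the chosen personality and language modules."""
    template = env.get_template("movie_metadata.jinja")
    return template.render(
        personality=f"personality/{personality}.jinja",
        language=f"language/{language}.jinja",
        movie=movie,
    )
```

Adding a personality or language in this setup only means adding another template entry.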
This makes it easy to extend this component, not only with more options for personalities and languages, but also to extract it into its own open-source project to make it available for other Gemini projects.
The Gemini Movie Detectives frontend is split into four main components and uses vue-router to navigate between them.
The Home component simply displays the welcome message.
The Quiz component displays the quiz itself and talks to the API via fetch. To create a quiz, it sends a POST request with the desired settings to api/quiz. The backend then selects a random movie based on the user settings and builds the prompt with the modular prompt generator. It uses Gemini to generate the question and hints, and finally returns everything to the component so that the quiz can be rendered.
Additionally, the backend assigns each quiz a session ID and stores it in a size-limited LRU cache.
For debugging purposes, this component fetches data from the api/sessions endpoint, which returns all active sessions from the cache.
This component displays statistics about the service. So far, however, it shows only one category of data: the quiz limit. To limit the costs of VertexAI and GCP usage in general, there is a daily limit of quiz sessions, which resets with the first quiz of the next day. The data is retrieved from the api/limit endpoint.
Movie Detectives - Vue Components
Of course, the frontend is a nice way to interact with the application, but you can also use the API directly.
The following example shows how to start a quiz via the API using the Santa Claus / Christmas personality:
Movie Detectives - Example: Santa Claus personality
This example shows how to change the language for a quiz:
And this is how to answer a quiz via an API call:
After I finished the basic project, adding more personalities and languages was so easy with the modular prompt approach that I was impressed by the possibilities it opens up for game design and development. By adding another personality, I could turn this purely educational movie game into a comedy trivia game in the style of “You Don’t Know Jack” within a minute.
Also, combining up-to-date Python functionality with validation libraries like Pydantic is very powerful and can be used to ensure good data quality for LLM input.
And there you have it, folks! You’re now equipped to craft your own LLM-powered web application.
Feeling inspired but need a starting point? Check out the open-source code for the Gemini Movie Detectives project:
The future of AI-powered applications is bright, and you’re holding the paintbrush! Let’s go make something remarkable. And if you need a break, feel free to try https://movie-detectives.com/.