DAY 04 · MODULE 2

Now, AI that creates. ✨

Module 1 gave you the foundation — now the exciting part. How AI creates — foundation models, prompts, tuning, and the agents that act for you.

Module 2 — 7 Chapters 🗺️

This is the official Generative AI course, distilled. From the models underneath, all the way to building AI agents.

✨

1 · What is Gen AI

Creates & acts · the stack

💎

2 · Foundation Models

Gemini family & friends

🛠️

3 · Idea to App

Vertex AI Studio · prompts

🎛️

4 · Prompt Engineering

Temperature, Top-K, Top-P

🔗

5 · Grounding & Tuning

RAG & fine-tuning

🤖

6 · AI Agents

Chatbot → agent → agentic

🏗️

7 · Build Agents on Google Cloud

Gemini Enterprise · Agent Builder · ADK

Seven chapters, but they connect. Once you get foundation models, the rest just stacks on top. Let's go.

Chapter 1 · What is Gen AI

Two Things, Not One ✨

We met this in Module 1 — generative AI creates and acts. Let's open that up.

🎨 It creates content

multimodal output

Text, code, images, speech, video, even 3D
From a prompt — a question or instruction
Summaries, reports, Q&A chatbots, images & video

🦾 It takes action

through AI agents

Autonomous, goal-oriented action on your behalf
Automate workflows, book travel, schedule
We reach agents at the end of this module

Definition to keep: Gen AI is a type of AI that generates content and takes action for you.

The Gen AI Stack — 3 Layers 🏛️

Just like the AI architecture in Module 1, but now for generative AI. Top to bottom:

· Gen AI Applications Gemini Enterprise · NotebookLM · no-code
· Gen AI Development Vertex AI Studio · Agent Builder · Model Garden
· Foundation Models the intelligence — language, image, video

Exam fact: the Transformer (Google, 2017) is the architecture behind most modern Gen AI models; Gemini (2023) is a Transformer-based model built to be multimodal.

Foundation models are the brain at the bottom. Development tools in the middle. Ready-made apps on top. The whole module follows this stack, bottom up.

Chapter 2 · Foundation Models

What Is a Foundation Model? 💎

The backbone of every Gen AI app. Here's the idea in plain terms:

📚

Trained on a LOT

Learns from massive existing text, images, video. That learning = training.

🔢

Huge parameters

From millions → trillions. More parameters = more capacity to learn.

🧩

General-purpose

Pre-trained for broad use, then specialised later.

One line: A foundation model is a large model (lots of parameters, lots of training data, lots of compute) that becomes the intelligence behind Gen AI apps.

Google's Models — Tap to Explore 💎

🕹️ Tap each card to open it. Goal: know which model to reach for, whether Gemini Pro/Flash or a specialist. (Explore only, no points.)

💎 Gemini Pro

GENERAL

The most capable Gemini. For complex tasks needing advanced reasoning.

⚡ Gemini Flash

GENERAL

Optimised for speed & low latency — high-volume, real-time apps like chatbots.

🪶 Gemini Flash-Lite

GENERAL

The most cost-effective — high-volume tasks where time isn't critical (batch translation, summarising).

🖼️ Imagen / 🎬 Veo

SPECIALTY

Imagen = image generation. Veo = video. Also Chirp (voice), Lyria (music).

🔎 Embeddings

SPECIALTY

Turns content into vectors for semantic search & data representation. Powers RAG (Chapter 5).

💡 The big idea


Because Gemini is multimodal, it can often replace several specialists — handling text, image & video in one model.
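
To make the Embeddings card concrete, here is a minimal Python sketch of semantic search over vectors. The three-number vectors and document names are made up for illustration; a real embeddings model would return much longer vectors, and this is not how any particular Google API is called.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-written "embeddings" (real ones come from an embeddings model).
DOCS = {
    "home insurance claim form": [0.9, 0.1, 0.0],
    "car insurance renewal": [0.7, 0.6, 0.1],
    "office lunch menu": [0.0, 0.1, 0.9],
}

def semantic_search(query_vector, docs=DOCS):
    """Return document names sorted from most to least similar to the query."""
    return sorted(
        docs,
        key=lambda name: cosine_similarity(query_vector, docs[name]),
        reverse=True,
    )

# A query vector pointing in roughly the same direction as the home-insurance document.
ranking = semantic_search([1.0, 0.2, 0.0])
print(ranking)
```

The point of the sketch: search compares meaning (vector direction), not exact keywords, which is why embeddings show up again under RAG in Chapter 5.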

Multimodal · Pre-trained vs Fine-tuned 🧠

Multimodal = one model that takes in and creates across text, images, audio & video. e.g. show Gemini a cookie photo → it generates a recipe video.

🎓 Pre-trained

broad foundation

Trained on a huge general dataset
Like 12 years of school — literate, general
Horizontal AI — works across industries

🩺 Fine-tuned

specialised

Further trained on a small, field-specific dataset
Like medical school — a specialist
Vertical AI — retail, finance, healthcare

Same analogy as a doctor: general schooling first (pre-trained), then specialist training (fine-tuned). We go deeper on tuning in Chapter 5.

Try It · Chapter 2

Match the Model to the Job 🧩

🕹️ Drag each task onto the model you'd use. Goal: match the job to Pro, Flash, or a specialist. Correct → +5⭐.

Complex task needing deep reasoning

Real-time chatbot, high volume

Generate a product image

Create a short video

Power a semantic search

💎 Gemini Pro

⚡ Gemini Flash

🖼️ Imagen

🎬 Veo

🔎 Embeddings

Try it yourself first, ya. Drag each card to where you think it belongs.

Chapter 3 · Idea to App

Meet Bea, Ann & Ian 👩‍💼👨‍💻

Three people at Cymbal Insurance — our guides for the rest of the module.

👩‍💼

Bea · Analyst

No tech background. Wants to prototype an idea fast.

👩‍💻

Ann · AI Developer

Wants to design & manage prompts.

👨‍🔧

Ian · ML Engineer

Wants to deploy & fine-tune at scale.

Their gateway: Vertex AI Studio. One workshop to test prompts, tune models with your own data, ground with real info, and deploy, in low-code or even no-code.

Try It · Chapter 3

Anatomy of a Prompt — Tap Each Part 🧪

🕹️ Tap each highlighted line to see which prompt part it is. Goal: see the parts a strong prompt is built from (Task, Context, Examples). Each new part → +2⭐.

You are a business analyst at Cymbal Insurance. Conduct a housing risk assessment for southern Los Angeles. Rate each risk 1–5, classify by type, and follow this report template…

Tap a highlighted line above to see which prompt component it is.

Task is a must. Context and examples are optional but powerful. No examples = zero-shot; with examples = few-shot.
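
A small Python sketch of the three prompt parts, assuming a plain-text layout of our own choosing. The function names, labels, and the example ratings are hypothetical; the only rule taken from the slide is that the task is required and examples decide zero-shot vs few-shot.

```python
def build_prompt(task, context=None, examples=None):
    """Assemble a prompt from its parts. Only the task is required."""
    parts = []
    if context:
        parts.append(f"Context: {context}")
    for example_input, example_output in examples or []:
        parts.append(f"Example input: {example_input}\nExample output: {example_output}")
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)

def shot_type(examples=None):
    """No examples means zero-shot; with examples it is few-shot."""
    return "few-shot" if examples else "zero-shot"

TASK = "Rate each housing risk from 1 to 5."
# Made-up examples, only to show the shape of a few-shot prompt.
EXAMPLES = [("Wildfire near the property", "5"), ("Minor roof wear", "2")]

zero_shot_prompt = build_prompt(TASK)
few_shot_prompt = build_prompt(
    TASK,
    context="You are a business analyst at Cymbal Insurance.",
    examples=EXAMPLES,
)

print(shot_type(), "->", zero_shot_prompt)
print(shot_type(EXAMPLES), "->")
print(few_shot_prompt)
```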

Design vs Engineering · Idea → App 🛠️

✍️ Prompt Design

writing one good prompt

Crafting a prompt to get the response you want
Be direct & specific · use structure · iterate

🔁 Prompt Engineering

the whole iterative loop

Designing, refining & optimising prompts over time
Explore few-shot, chain-of-thought, RAG

The magic moment: In Vertex AI Studio, Bea & Ann click Build with Code → Deploy as App, and a working web app is generated. Idea → app, just like that.

Chapter 4 · Prompt Engineering

The Knobs You Can Turn 🎛️

After the prompt, you tune how the model picks its words. Four settings:

🤖

Model

Pick the right one — Gemini Flash/Pro, or specialists. Vertex AI Studio even hosts Claude, Llama, GPT.

🌡️

Temperature

Controls randomness. Low = safe & typical. High = creative & unusual.

#️⃣

Top-K

Sample from the K most-likely words, weighted by their probabilities. K=2 → choose from the top 2.

🎯

Top-P

Pick from the smallest set of most-likely words whose probabilities add up to at least P. P=0.9 → choose from the words that together cover 90% of the probability.

You usually don't fuss with Top-K and Top-P. Temperature is the one you'll actually reach for most.
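
For the curious, Top-K and Top-P can be sketched in a few lines of Python. The word probabilities below are invented for the garden sentence used later in this chapter, and real models work on tokens rather than whole words, so treat this as an illustration rather than the exact procedure any model uses.

```python
import random

def top_k_filter(probs, k):
    """Keep the k most likely words, then renormalise so they sum to 1."""
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {word: p / total for word, p in kept}

def top_p_filter(probs, p):
    """Keep the smallest set of most likely words whose probabilities reach p."""
    kept, cumulative = [], 0.0
    for word, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept.append((word, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {word: prob / total for word, prob in kept}

def sample(probs, rng=random):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Hypothetical next-word probabilities for "The garden was full of beautiful ..."
NEXT_WORD = {"flowers": 0.5, "trees": 0.25, "herbs": 0.125, "bugs": 0.125}

print(sorted(top_k_filter(NEXT_WORD, 2)))    # candidates left after Top-K with K=2
print(sorted(top_p_filter(NEXT_WORD, 0.8)))  # candidates left after Top-P with P=0.8
print(sample(top_k_filter(NEXT_WORD, 2)))    # one sampled word from the Top-K set
```

Both filters shrink the candidate list before the random pick; Top-K uses a fixed count, Top-P uses a probability budget.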

Try It · Chapter 4

Feel the Temperature 🌡️

🕹️ Drag the temperature slider: the next word shifts from safe “flowers” (low) to wild “bugs” (high). Goal: feel how temperature trades predictability for creativity. Touch both ends → +5⭐.

The garden was full of beautiful flowers.

Slider range: ❄️ 0.0 to 2.0 🔥

Low temperature → typical, safe answers (great for Q&A). High → creative, unexpected (great for brainstorming).
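
One common way to picture temperature is as a divisor applied to the model's raw scores before they become probabilities. The sketch below assumes that softmax-style formulation and uses invented scores for three candidate words; treating 0.0 as "always take the top word" is also an assumption made here for simplicity.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; temperature rescales the scores first."""
    if temperature <= 0:
        # Treat 0 as "always pick the top word" (greedy choice).
        best = max(scores, key=scores.get)
        return {word: 1.0 if word == best else 0.0 for word in scores}
    scaled = {word: score / temperature for word, score in scores.items()}
    highest = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(value - highest) for word, value in scaled.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}

# Hypothetical raw scores for the next word after "beautiful".
SCORES = {"flowers": 3.0, "trees": 2.0, "bugs": 0.0}

for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(SCORES, t)
    print(t, {word: round(p, 2) for word, p in probs.items()})
```

As the temperature rises, the gap between "flowers" and "bugs" narrows, so the unusual word gets picked more often.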

Evaluate & refine in Vertex AI Studio: compare prompts side by side, add your own ground truth answer, and save reusable prompt templates with variables — like a function in plain language.
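
The "function in plain language" idea can be sketched with Python's standard string.Template. The variable names risk_type and region are hypothetical and this is not Vertex AI Studio's own template syntax; it only shows one prompt being reused with different values.

```python
from string import Template

# A reusable prompt with two replaceable variables.
RISK_TEMPLATE = Template(
    "You are a business analyst at Cymbal Insurance. "
    "Conduct a $risk_type risk assessment for $region. "
    "Rate each risk 1-5."
)

def render(template, **variables):
    """Fill in the variables; raises KeyError if one is missing."""
    return template.substitute(**variables)

prompt = render(RISK_TEMPLATE, risk_type="housing", region="southern Los Angeles")
print(prompt)

# Same template, different values: like calling a function with new arguments.
print(render(RISK_TEMPLATE, risk_type="flood", region="northern San Diego"))
```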

Chapter 5 · Grounding & Tuning

Keeping Answers Accurate 🔗

Foundation models are pre-trained — their knowledge can be outdated. Two fixes:

🔗 Grounding

the WHAT

Connect the model to trusted, current data
Answers get verified against the latest info
Ground with Google Search or your own data

📚 RAG

the HOW

Retrieval-Augmented Generation
The method that implements grounding
Retrieves relevant data, then generates

Remember it as a pair: Grounding = the what. RAG = the how.
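
The retrieve-then-generate flow can be sketched in plain Python. Everything here is a stand-in: the documents are invented, retrieval uses simple word overlap instead of embeddings, and generate() is a placeholder rather than a real model call.

```python
def tokenize(text):
    """Lower-case the text and split it into a set of words."""
    return set(text.lower().replace("?", " ").replace(".", " ").split())

def retrieve(question, documents, top_n=1):
    """Retrieval: rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_n]

def build_grounded_prompt(question, documents):
    """Augmentation: put the retrieved text in front of the question."""
    sources = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"

def generate(prompt):
    """Generation: a placeholder for a real model call (hypothetical)."""
    return f"[model response to a {len(prompt)}-character grounded prompt]"

# Hypothetical internal documents.
DOCUMENTS = [
    "The flood policy deductible is 500 dollars.",
    "Claims must be filed within 30 days.",
    "The office is closed on public holidays.",
]

QUESTION = "What is the flood policy deductible?"
grounded_prompt = build_grounded_prompt(QUESTION, DOCUMENTS)
print(grounded_prompt)
print(generate(grounded_prompt))
```

The model never has to "know" the deductible; the retrieved source carries the current fact into the prompt.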

The Tuning Spectrum 🎚️

Want to improve the model itself? Options run from light to heavy:

LIGHTEST

Prompt Design

Guide with words. Doesn't change the model.

→

MIDDLE

Parameter-Efficient (Adapter) Tuning

Updates a small subset of parameters.

→

HEAVIEST

Full Fine-Tuning

Updates ALL parameters. Best quality, most compute.

Grounding vs fine-tuning: fine-tuning refines the model's internal knowledge & skill; grounding adds external, real-time facts. Different jobs.
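
To get a feel for "light vs heavy", here is some back-of-envelope arithmetic in Python. It assumes one common flavour of parameter-efficient tuning, a low-rank adapter that trains two small added matrices while the original weights stay frozen; the slide does not name a specific method, and the layer size and rank are made-up numbers.

```python
def full_finetune_params(d_in, d_out):
    """Full fine-tuning updates every weight in a d_in x d_out layer."""
    return d_in * d_out

def low_rank_adapter_params(d_in, d_out, rank):
    """A low-rank adapter trains two small matrices: d_in x rank and rank x d_out."""
    return rank * (d_in + d_out)

# Hypothetical layer size and adapter rank.
full = full_finetune_params(4096, 4096)
adapter = low_rank_adapter_params(4096, 4096, rank=8)

print(full)                       # 16777216 trainable weights
print(adapter)                    # 65536 trainable weights
print(f"{adapter / full:.2%}")    # 0.39% of the full count
```

Under these assumptions the adapter trains well under 1% of the weights for that layer, which is why it sits in the middle of the spectrum for compute.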

Supervised Fine-Tuning 🏷️

The tuning Vertex AI supports today. You teach the model a new skill with labelled examples.

🏷️

Labelled pairs

Hundreds of input → desired output examples, in a JSONL file.

🎯

Good for

Classification, summarising, extraction, chat — well-defined tasks.

📦

Result

A new tuned model in the Model Registry, ready to deploy.

{"input": "The room was terrible. It needs major rework.", "output": "negative"}
{"input": "Great interior layout, architecturally interesting.", "output": "positive"}

Each row = a prompt and the answer you want. The model learns to copy that behaviour. Same idea as supervised learning in Module 1 — just on a foundation model.
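
A short Python sketch for preparing and checking such a file. It follows the simplified input/output fields shown above; the exact schema a tuning service expects may differ, so check its documentation before uploading. The function names are our own.

```python
import json

EXAMPLES = [
    {"input": "The room was terrible. It needs major rework.", "output": "negative"},
    {"input": "Great interior layout, architecturally interesting.", "output": "positive"},
]

def to_jsonl(records):
    """Serialise records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records) + "\n"

def validate_jsonl(text, required_keys=("input", "output")):
    """Parse each line and check it has the required non-empty string fields."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        record = json.loads(line)
        for key in required_keys:
            if not isinstance(record.get(key), str) or not record[key].strip():
                raise ValueError(f"line {line_number}: missing or empty '{key}'")
        records.append(record)
    return records

jsonl_text = to_jsonl(EXAMPLES)
print(jsonl_text, end="")
print(len(validate_jsonl(jsonl_text)), "valid examples")
```

Validating before a tuning job is cheap insurance: one malformed line is easier to fix here than after a failed run.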

Chapter 6 · AI Agents

From Chatbot to Agentic AI 🤖

Gen AI is evolving. A chatbot answers. An agent acts. Agentic AI coordinates many agents.

ASK

Chatbot

You prompt, it answers. Conversational.

→

ACT

AI Agent

Connects to tools & data, takes action, observes feedback.

→

COORDINATE

Agentic AI

Multiple agents reasoning together on multi-step tasks.

Why agents matter: Foundation models alone can't reach your internal docs or other apps. An agent connects to them and takes action. That's the value it adds.

What's Inside an Agent? 🧩

Three components working together — picture a body:

🧠

Model — the brain

The reasoning centre. Thinks, plans, decides the steps to reach the goal.

🦾

Tools — hands & senses

APIs (GET, POST…) to act on the world — send an email, fetch the weather.

🔄

Orchestration — nervous system

The cyclical loop: take the decision → use a tool → feed the result back.

An AI agent = Model + Tools + Orchestration, all coordinated toward a goal.
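
The three components can be sketched as a toy loop in Python. This is an illustration only: the "model" is a hand-written rule rather than a real foundation model, the tools return canned data instead of calling APIs, and none of this is the ADK or Agent Builder interface.

```python
def get_weather(city):
    """Tool: a stand-in for a real weather API call (hypothetical data)."""
    return {"Kuala Lumpur": "rain"}.get(city, "unknown")

def send_email(message):
    """Tool: a stand-in for a real email API call; returns a confirmation."""
    return f"sent: {message}"

TOOLS = {"get_weather": get_weather, "send_email": send_email}

def model_decide(goal, observations):
    """Model (brain): a rule-based stand-in that picks the next step."""
    if "get_weather" not in observations:
        return ("get_weather", goal["city"])
    if "send_email" not in observations:
        return ("send_email", f"Weather in {goal['city']}: {observations['get_weather']}")
    return None  # goal reached, nothing left to do

def run_agent(goal, max_steps=5):
    """Orchestration (nervous system): decide, call a tool, feed the result back."""
    observations = {}
    for _ in range(max_steps):
        decision = model_decide(goal, observations)
        if decision is None:
            break
        tool_name, argument = decision
        observations[tool_name] = TOOLS[tool_name](argument)
    return observations

result = run_agent({"city": "Kuala Lumpur"})
print(result)
```

Notice that the second decision depends on the first tool's result; that feedback loop is what separates an agent from a chatbot that answers once.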

Try It · Chapter 6

Match the Agent Component 🧩

🕹️ Drag each role onto its component: Model, Tools, or Orchestration. Goal: lock in brain vs hands vs nervous system. Correct → +5⭐.

Reasoning & decision centre

Calls an API to send an email

Manages the cycle of actions

Fetches live weather data

Plans the steps to reach the goal

Carries feedback back to the brain

🧠 Model

🦾 Tools

🔄 Orchestration

Try it first, ya — brain, hands, or nervous system?

Chapter 7 · Build Agents on Google Cloud

The Agent Tool Stack 🏗️

Same three layers as before, now for building agents. Top to bottom:

· Applications Gemini Enterprise · Customer Engagement Suite · no-code
· Development Vertex AI Agent Builder · ADK · Agent Engine · Agent Garden
· Foundation Models Vertex AI Model Garden — the agent's brain

Model Garden = access to Google's & third-party models (the brain). Agent Builder = build agents end-to-end. Gemini Enterprise = no-code agents for business users.

Which Tool? Ease vs Flexibility 🧭

Pick by how much code you want to write versus how much control you need.

Tool	Code	Best for
Gemini Enterprise	No-code	Business users — ready-to-use, minimal setup
Agent Garden + Builder	Low-code	Analysts — start from a sample & customise
ADK (Agent Dev Kit)	Pro-code	Engineers — full control, deep integrations

NotebookLM — a standout agent in this stack: your personal AI research assistant. Add sources (PDFs, Drive, YouTube), then chat, summarise, or generate a podcast & study guide. Free for everyone.

More ease → less flexibility. More flexibility → more code. Choose by who's building and how custom it must be.

Knowledge Check · Module 2

Lock It In 🧪

Q1 · What makes a model a "foundation model"?

It only does images

Large — many parameters, huge data, lots of compute

It runs on your laptop

It needs no training

Q2 · Which architecture (Google, 2017) underpins modern Gen AI?

BigQuery

TPU

K-means

The Transformer

Q3 · The iterative loop of refining & optimising prompts is:

Prompt design

Prompt engineering

Deployment

Clustering

Q4 · A reusable prompt with replaceable variables is a:

Ground truth

Prompt template

Foundation model

Endpoint

Q5 · Which tuning updates ALL the model's parameters?

Parameter-efficient tuning

Full fine-tuning

Prompt design

Grounding

Q6 · Which component is the agent's "brain"?

The model

The tools

The orchestration layer

The database

Q7 · A business user wants a no-code agent. Best choice?

Gemini Enterprise

ADK

Full fine-tuning

Compute Engine

Q8 · As you move from Gemini Enterprise → ADK, you gain:

More flexibility, but write more code

Less control

No-code simplicity

Fewer options

Module 2 — You Can Now Explain… ✅

Creates + Acts | Transformer (2017) | Gen AI Stack | Foundation Model

Gemini Pro / Flash | Multimodal | Pre-trained vs Fine-tuned | Task · Context · Examples

Zero / Few-shot | Temperature / Top-K / Top-P | Grounding & RAG | Fine-tuning

Model + Tools + Orchestration | Chatbot → Agent → Agentic | Gemini Enterprise · ADK

Finish Module 2 in 2 steps: do the hands-on lab first, then take the graded quiz.

🧪 Step 1 · Lab — Gemini Multimodal with Vertex AI Studio

Build an app from a prompt, apply prompt best practices, and generate multimodal media. Open the lab →

📝 Step 2 · Official Module 2 Quiz →

After the lab, take the graded quiz on Skills Boost. skills.google › course 593 › quiz 617895

From models to agents — you've got it. ✨

MODULE 2 COMPLETE

Next: AI Development Options. 🛤️

You've seen how Gen AI is built. Next we compare the four ways to build any AI — pre-trained APIs, BigQuery ML, AutoML, and custom training — and how to choose.

🛤️

M3 · AI Dev Options

Start now →

🔄

M4 · AI Dev Workflow

Coming soon

🏅

Then: the exam

You're getting there
