USING AI

AI Image and Video Creation

From patient education to practice marketing—visual AI tools that work, how to use them effectively, and the limitations you need to understand.

~30 min read · Practical guide

Core Question

How can AI-generated images and videos enhance patient education, practice communication, and medical teaching—while avoiding the pitfalls unique to visual content?

Introduction: Why Visual AI Matters in Healthcare

In our previous modules, we explored how large language models process text, the importance of effective prompting, and how clinical decision support tools augment your practice. Now we turn to something that might feel more surprising: the ability to create images and videos from simple text descriptions.

You might reasonably ask, "Why would a physician need to generate images?" The answer spans more territory than you might expect:

Patient education materials explaining conditions and procedures
Practice marketing assets—logos, social media graphics, presentation visuals
Educational content for learners
Conceptual illustrations for research presentations
Visual aids during patient conversations

The technology has matured remarkably over the past year. What once produced uncanny, obviously artificial images now generates visuals that can be genuinely useful—sometimes indistinguishable from professional photography or illustration. And video generation, which seemed like science fiction just two years ago, now produces clips that can convey motion, demonstrate concepts, and tell visual stories.

This module covers the leading tools available today, how to use them effectively, and—critically—the limitations and ethical considerations that should guide your use. We'll focus on practical applications relevant to clinical practice and medical education, with hands-on prompt examples you can try immediately.

Prioritize This!

If you want to start creating AI images and videos today, Google Gemini with Nano Banana is your best starting point. It has a free tier, its Pro model renders some of the most accurate text available in AI images (critical for patient education), and Gemini also provides video generation through Veo 3.1. Open gemini.google.com and try: "Create a simple diagram showing how an inhaler delivers medication to the lungs, patient education style, with labels pointing to key structures."

For finished materials (flyers, social posts, handouts), pair Gemini with Canva—Canva now integrates Nano Banana Pro directly, so you can generate and design in one place.

Building on What We've Learned

Before diving into visual AI, let's connect this to the concepts we've established throughout this course.

The Prompting Principles Still Apply

Remember our analogy between AI prompting and taking a patient history? The same principles transfer directly to visual generation. A vague prompt yields vague results. Specificity and context matter enormously.

Consider the parallel:

Poor History

"The patient has chest pain."

Poor Image Prompt

"Make a picture of a heart."

Rich History

"67-year-old male with sudden-onset substernal chest pressure radiating to left arm, associated diaphoresis, onset 2 hours ago while at rest, history of hypertension and hyperlipidemia, currently on lisinopril and atorvastatin."

Rich Image Prompt

"A detailed anatomical cross-section illustration of a human heart showing the coronary arteries, educational medical illustration style, clean white background, labeled structures, suitable for patient education materials, professional medical textbook quality."

The specificity you bring to a prompt directly shapes the quality and usefulness of the output.

Models Are Not Artists—They're Pattern Synthesizers

Just as language models don't "understand" in the human sense, image generators don't "see" or "imagine." They've learned statistical patterns from vast image-text datasets. When you describe something, the model generates pixels that match the patterns associated with your description.

This has practical implications. The models excel at generating images that resemble their training data. They struggle with novel combinations, unusual perspectives, or highly specific technical accuracy. A request for a "cross-section of the brachial plexus" will produce something anatomically plausible but potentially inaccurate in details—much like how a language model might generate confident-sounding but subtly wrong medical information.

Verification Remains Essential

Critical Reminder

We've emphasized throughout this course that AI outputs require human verification. This applies doubly to visual content, especially anything medical or scientific. An AI-generated anatomical diagram might look convincing while containing errors that could mislead patients or learners. Always review generated visuals with your clinical knowledge, and never use AI-generated medical images as authoritative references for clinical decisions.

The Current Landscape: Major Tools and Their Capabilities

The AI image and video generation space has evolved rapidly. Let's survey the major tools available today, organized by their primary modality.

Image Generation Tools

Google Nano Banana (via Gemini)

Google's Nano Banana, the image generation model built into Gemini, has become one of the most widely used AI image tools and is particularly well suited to healthcare applications. It went viral in August 2025 when users discovered it could transform selfies into 3D figurines, but its real power lies in professional applications: creating patient education materials, infographics, and labeled diagrams with accurate, legible text.

Nano Banana (Fast Mode)

Character consistency: Maintain consistent characters across multiple images for rich storytelling—useful for patient education series.
Image blending: Combine multiple images into a single coherent image.
Natural language editing: Make targeted transformations by describing what you want changed.
Top-rated editing: Google describes it as the highest-rated image editing model in the world, based on public leaderboard rankings at its launch.

Access: In Gemini, select "🍌Create images" from tools and "Fast" from the model menu. Free tier available.

Nano Banana Pro (Thinking Mode)

Best-in-class text rendering: The best model available for generating images with correctly rendered, legible text—from short taglines to long paragraphs. Essential for infographics, menus, diagrams, and educational materials.
High resolution: Built-in generation at 1K, 2K, and 4K resolution.
Multi-person consistency: Upload up to 14 reference images and maintain resemblance of up to 5 people.
Real-time data: Can use Google Search to verify facts and generate imagery based on current information.
Platform integrations: Now integrated into Adobe, Figma, and Canva.

Access: In Gemini, select "🍌Create images" and "Thinking" from the model menu. Also available in NotebookLM, Vertex AI, and Google Workspace.

Why Nano Banana Matters for Healthcare

Most AI image generators struggle with text, producing garbled letters or misspellings. Nano Banana Pro largely solves this, making it well suited to patient education materials where accurate labels and instructions are essential; you should still proofread every label. All images are watermarked with SynthID, Google's invisible watermark for identifying AI-generated content.

Example: This Module as a Whiteboard Summary

The image below was generated by Nano Banana Pro using a single prompt containing the contents of this web page. It demonstrates the model's ability to render complex text accurately, organize information visually, and produce near-finished educational materials.

Figure: AI-generated whiteboard summary of this module, created with Nano Banana Pro from the contents of this page. Note the accurate text rendering across headings, bullet points, and the tools table.

ChatGPT Images (GPT Image 1.5)

On December 16, 2025, OpenAI released GPT Image 1.5, its new flagship image generation model and the most significant upgrade to ChatGPT's image capabilities to date. The release came as part of OpenAI's competitive response to Google's Gemini 3 and Nano Banana Pro.

What's New in GPT Image 1.5

GPT Image 1.5 generates images up to 4x faster than previous versions while delivering stronger instruction following and more precise editing. It's OpenAI's most capable general-purpose text-to-image model, with more expressive transformations, improved dense text rendering, and more natural-looking results.

Key Capabilities

4x faster generation: Dramatically reduced wait times for image creation.
Precise editing with consistency: Makes edits while keeping important details like facial likeness consistent across changes.
Stronger instruction following: Better at producing exactly what you describe in your prompts.
Improved dense text rendering: More accurate text in images—important for diagrams, infographics, and patient education materials.
More natural results: Images look less "AI-generated" with better lighting, proportions, and details.
One-time likeness upload: Upload your appearance once and reuse it across future creations—useful for personalized content.

Access: Rolling out to all ChatGPT users. Also available in the API as "GPT Image 1.5." Business and Enterprise access coming soon.

New Images Experience

Alongside the model upgrade, OpenAI introduced a dedicated Images experience in the ChatGPT sidebar. This includes preset filters and trending prompts to help you get started quickly, plus the one-time likeness upload feature for consistent self-representation across images.

Video Generation Tools

Google Veo 3.1

Veo 3.1, released in October 2025, moved AI video much closer to a production-ready tool. It generates videos with native audio, from natural conversations to synchronized sound effects, directly from text prompts.

Key Capabilities

Native synchronized audio: Generate context-appropriate soundscapes, sound effects, dialogue with lip-sync, and even multi-person conversations—all from a single text prompt.
Extended duration: Generate 4-, 6-, or 8-second clips from text or images, then use Scene Extension to lengthen them to as much as 148 seconds (about 2.5 minutes). Each extension maintains visual continuity with the preceding footage, including background audio.
High resolution: 720p or 1080p output at 24fps.
Ingredients to Video: Use multiple reference images to control characters, objects, and style—create scenes exactly as you envision them.
Frames to Video: Provide starting and ending images; Veo generates seamless transitions between them.
Insert & Remove: Add objects to scenes or delete elements/characters with natural physics.

Access: Available through Google Flow, Gemini API, and Vertex AI. Pricing: $0.15/second (Fast) to $0.40/second (Standard).
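To budget a project, the per-second pricing above can be turned into a rough estimate. The Python sketch below is a hypothetical helper, not part of any Google SDK. It uses the $0.15 and $0.40 per-second rates quoted above and makes two assumptions you should verify against Google's current documentation: that each Scene Extension adds about 7 seconds (a figure consistent with an 8-second clip growing to the 148-second maximum, since 8 + 20 × 7 = 148), and that extended seconds are billed at the same rate as the base clip.

```python
import math

# Per-second rates quoted in this module (USD); confirm current pricing before use.
VEO_RATES = {"fast": 0.15, "standard": 0.40}

BASE_CLIPS = (4, 6, 8)  # initial clip lengths Veo 3.1 offers, in seconds
MAX_LENGTH = 148        # longest video reachable with Scene Extension, in seconds
EXTENSION_SECONDS = 7   # ASSUMPTION: seconds added per extension (8 + 20 * 7 = 148)


def plan_veo_video(target_seconds: int, tier: str = "fast") -> dict:
    """Hypothetical helper (not part of any Google SDK).

    Estimates how many Scene Extensions a target length needs and what it
    might cost, ASSUMING every generated second is billed at the same rate.
    """
    if target_seconds > MAX_LENGTH:
        raise ValueError(f"Target exceeds the {MAX_LENGTH}-second maximum")
    # Use the shortest base clip that covers the target; otherwise start at 8 s.
    base = next((c for c in BASE_CLIPS if c >= target_seconds), BASE_CLIPS[-1])
    # Round up, since (under the assumption above) each extension has a fixed length.
    extensions = math.ceil(max(0, target_seconds - base) / EXTENSION_SECONDS)
    total = base + extensions * EXTENSION_SECONDS
    return {
        "base_seconds": base,
        "extensions": extensions,
        "total_seconds": total,
        "cost_usd": round(total * VEO_RATES[tier], 2),
    }


print(plan_veo_video(30, "standard"))
```

For example, plan_veo_video(30, "standard") suggests an 8-second base clip plus four extensions (36 seconds in total), or roughly $14.40; you would then trim the extra seconds in an editor.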

Google Flow: AI Filmmaking

Flow is Google's AI filmmaking interface that brings Veo 3.1, Nano Banana, and Gemini together. Use natural language to describe shots, manage story "ingredients" (cast, locations, objects, styles) in one place, and create cinematic sequences. This is where video generation is heading—and it's available now.

OpenAI Sora 2

OpenAI describes Sora 2 as potentially "the GPT-3.5 moment for video," a major capability jump that makes physically realistic video generation widely accessible. Released at the end of September 2025, it focuses on realistic motion and physics simulation.

Key Capabilities

Exceptional physics: Handles difficult scenarios like Olympic gymnastics, backflips on paddleboards (modeling buoyancy and rigidity), and accurate ball rebounds. If a basketball player misses, the ball bounces realistically rather than teleporting.
Synchronized audio: Generates sophisticated dialogue, music, and sound effects matched to video content.
Cameos: Scan your own likeness and insert yourself into generated videos—useful for personalized patient education content.
Multi-shot control: Type instructions for camera movement and composition; maintains visual consistency across shot sequences.
Extended duration: Up to 15 seconds per clip for most users and up to 25 seconds for Pro subscribers (as of late 2025), while maintaining quality and coherence.

Access: Included with ChatGPT Plus ($20/month) with limited generations. ChatGPT Pro ($200/month) includes higher limits and resolution. Free tier gets 30 video generations per day. Additional packs of 10 generations available for $4.

Canva: The Accessible Middle Ground

For healthcare professionals who need practical visual design without becoming AI experts, Canva deserves special attention. While Midjourney, ChatGPT, and Veo are primarily AI generation tools, Canva is a comprehensive design platform that has integrated AI features thoughtfully—making it ideal for everyday practice needs.

Magic Studio: Canva's AI Suite

Canva's "Magic Studio" bundles multiple AI capabilities into its familiar drag-and-drop interface:

Magic Design

Describe what you want to create—"a patient education handout about diabetes management"—and Canva generates multiple complete design layouts to choose from.

Magic Media

Text-to-image and text-to-video generation directly within Canva. Generated assets drop right into your design workspace.

Magic Edit

Select any part of an image and describe changes—add objects, change colors, swap backgrounds.

Magic Eraser

Remove unwanted objects from photos with a simple brush selection.

Background Remover

Instantly isolate subjects from backgrounds—useful for creating professional photos or headshots.

Magic Switch

Instantly resize any design for different platforms or translate into different languages.

Why Canva Works for Healthcare

The advantage of Canva over pure AI generation tools is integration with practical workflow. You're not just generating images—you're creating finished materials. Need a vaccination reminder postcard? A waiting room poster about preventive screenings? An Instagram post announcing flu shot availability? Canva provides templates specifically designed for these use cases, and AI features enhance rather than replace this template-based approach.

Access and Pricing

Free tier: Basic design tools, 2+ million templates, limited AI features, 5 GB storage.
Canva Pro: $14.99/month or $120/year. Full Magic Studio access, premium templates, Brand Kit for maintaining consistent visual identity.
Canva for Teams: $29.99/month for the first 5 users. Collaboration features, shared brand assets, role-based permissions.
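If you are choosing between monthly and annual billing for Canva Pro, the arithmetic is easy to check by hand, but a short Python sketch makes the comparison explicit. It uses the prices listed above, which may change; the function name is hypothetical.

```python
# Canva Pro prices listed above (USD); these change, so confirm before deciding.
MONTHLY_PRICE = 14.99
ANNUAL_PRICE = 120.00


def compare_canva_billing(monthly: float = MONTHLY_PRICE,
                          annual: float = ANNUAL_PRICE) -> dict:
    """Compare a year of monthly billing with one annual payment."""
    year_of_monthly = round(monthly * 12, 2)
    return {
        "year_of_monthly": year_of_monthly,
        "annual": annual,
        "annual_savings": round(year_of_monthly - annual, 2),
        "effective_monthly_on_annual": round(annual / 12, 2),
    }


print(compare_canva_billing())
```

At those prices, twelve monthly payments come to $179.88, so annual billing saves $59.88 a year and works out to $10.00 a month.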

Practical Applications for Healthcare

Let's move from tool descriptions to practical applications. Here are specific use cases where visual AI can enhance your practice, with example prompts you can adapt.

Patient Education Materials

Creating clear visual explanations of conditions, procedures, and treatments is perhaps the highest-value application for clinical practice.

Explaining Asthma to Parents

Poor Prompt

"Picture of lungs with asthma"

Better Prompt

"A side-by-side educational illustration comparing normal and asthmatic airways. Left side shows a healthy bronchiole with open airway and thin smooth muscle. Right side shows an asthmatic bronchiole with constricted smooth muscle, thickened airway walls, and excess mucus partially blocking the passage. Clean, simple medical illustration style suitable for patient education. Light blue and coral color palette. Labels pointing to key structures: 'Normal airway,' 'Inflamed airway,' 'Mucus,' 'Constricted muscle.' White background."

Demonstrating Proper Inhaler Technique (Video)

Video Prompt (Veo or Sora)

"An educational video showing proper metered-dose inhaler technique. A person shakes the inhaler, exhales fully, places the inhaler at their lips, begins slow inhalation while pressing the canister, continues inhaling for 3-5 seconds, then holds breath with closed mouth. Clean clinical setting, well-lit, shot from the side to show technique clearly. No audio commentary, just ambient sound."

Practice Branding and Logos

Creating a professional visual identity for your practice used to require expensive design agencies. AI tools can generate solid starting points.

Pediatric Practice Logo

Logo Prompt

"A modern, friendly logo for 'Riverside Pediatrics.' Incorporate a subtle river wave element and a simple, warm representation of a child or family. Color palette: soft teal, coral, and white. Clean vector style, minimalist, would work at small sizes. No text—just the icon/symbol portion of the logo."

Note: Generate the symbol separately from text. AI tools are improving at text, but logos require precise typography—often better handled in Canva or a dedicated design tool where you have exact control over fonts and spacing.

Social Media and Marketing Content

Flu Season Awareness Post

Social Media Prompt

"A warm, inviting image for a social media post promoting flu vaccinations. Show a friendly, diverse family (parents and two children of different ages) in cozy autumn clothing, looking healthy and happy. Soft fall colors in background—golden leaves, warm lighting. Photorealistic style. Leave clear space at top for text overlay. 1080x1080 square format."

Practice Open House Announcement (Video)

Video Prompt

"A short welcoming video showing the exterior of a modern medical office building, then transitioning through the front door into a bright, clean waiting room with comfortable seating and friendly natural lighting. Camera moves smoothly in a walking motion. Warm, inviting atmosphere. Morning light. 5 seconds."

Presentation Visuals

Conference Presentation on Childhood Obesity

Presentation Prompt

"An abstract conceptual illustration representing the multifactorial nature of childhood obesity. Interconnected circular elements suggesting: physical activity (motion/movement shapes), nutrition (simple food icons), genetics (DNA helix), environment (home/school building silhouettes), and mental health (brain outline). Professional infographic style, purple and teal gradient color scheme, suitable for a medical presentation slide. Clean white background."

Medical Education Content

Case-Based Learning Scenario Image

Education Prompt

"A middle-aged woman sitting in a doctor's office examination room, appearing fatigued and concerned. She's dressed casually, seated on an exam table. Her body language suggests she's describing symptoms—hand gesturing toward her chest area. The physician (visible from behind, wearing white coat) is seated and listening attentively. Professional medical setting, warm but clinical lighting. Realistic style, suitable for a medical education case vignette."

Prompting Strategies for Visual AI

Drawing on our prompting module, here are specific strategies optimized for image and video generation.

The Anatomy of an Effective Image Prompt

Effective prompts typically include several elements:

1. Subject

What is the main focus? Be specific about who or what appears.

2. Action/State

What is happening? Is the subject doing something or in a particular state?

3. Setting/Context

Where is this taking place? What's the environment?

4. Style

What aesthetic are you going for? Photorealistic, illustration, diagram, infographic?

5. Technical Specs

Aspect ratio, resolution, color palette, lighting.

6. Purpose Qualifier

What will this be used for? "Suitable for patient education," "social media post format."
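The six elements above work well as a checklist. The Python sketch below is one hypothetical way to apply it: it assembles whichever elements you fill in, in the order listed, and reports which ones you left blank. The class and field names are illustrative and not part of any image tool's API; you would paste the resulting text into Gemini, ChatGPT, or Canva as usual.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """Hypothetical checklist for the six prompt elements described above."""
    subject: str
    action: str = ""
    setting: str = ""
    style: str = ""
    technical_specs: str = ""
    purpose: str = ""

    def missing(self) -> list[str]:
        # Names of elements left blank, in checklist order.
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self) -> str:
        # Join the filled-in elements as sentences, subject first, purpose last.
        sentences = []
        for value in vars(self).values():
            text = value.strip().rstrip(".")
            if text:
                sentences.append(text[0].upper() + text[1:] + ".")
        return " ".join(sentences)


prompt = ImagePrompt(
    subject="a side-by-side illustration of a normal and an asthmatic bronchiole",
    setting="clean white background",
    style="simple medical illustration style",
    technical_specs="light blue and coral color palette",
    purpose="suitable for patient education",
)
print(prompt.render())
print("Still missing:", prompt.missing())
```

Here the checklist flags "action" as missing, a reminder to describe what the asthmatic airway is doing (constricted muscle, excess mucus) before generating.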

Style Keywords That Work

Certain keywords reliably influence output style:

"Medical illustration style" or "medical textbook quality" — Clean, educational aesthetic
"Photorealistic" — Aims for photography-like output
"Infographic style" — Data visualization, clean graphics
"Vector illustration" — Clean lines, scalable, logo-appropriate
"Warm and friendly" — Patient-facing, approachable
"Professional" or "clinical" — Appropriate for healthcare context
"Clean white background" — Useful for materials you'll composite later

Iteration and Refinement

Rarely will your first generation be exactly what you need. Build in time for iteration:

Start broad, then narrow: Your first prompt establishes the general direction. Subsequent prompts refine.
Identify what's working: When you get a result, note which elements you like before requesting changes.
Be specific about changes: "Make the background lighter" is better than "I don't like the background."
Use built-in editing: Most tools now allow selective editing—often faster than regenerating entirely.

Video-Specific Prompting

Video prompts need additional considerations:

Describe motion explicitly: "Camera slowly pans left," "subject walks toward camera," "gentle zoom in."
Specify pacing: "Slow, contemplative motion" vs. "quick, dynamic movement."
Consider audio: If using Veo or Sora with audio, describe the soundscape: "ambient office sounds," "no dialogue, just environmental audio."
Keep it simple: Current video AI handles single continuous actions better than complex multi-step sequences.
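The same checklist idea extends to video. This hypothetical Python helper appends the motion, pacing, and audio details listed above to a scene description and refuses to build the prompt if any of them is missing. It is a writing aid only and does not call Veo or Sora.

```python
def build_video_prompt(scene: str, motion: str, pacing: str, audio: str) -> str:
    """Hypothetical helper: add the video-specific details listed above to a scene.

    Raises ValueError if motion, pacing, or audio is blank, since those are
    the details an image prompt never needed.
    """
    details = {"motion": motion, "pacing": pacing, "audio": audio}
    blank = [name for name, value in details.items() if not value.strip()]
    if blank:
        raise ValueError("Missing video details: " + ", ".join(blank))
    # Each part becomes one sentence; trailing periods are normalized.
    parts = [scene, motion, pacing, audio]
    return " ".join(part.strip().rstrip(".") + "." for part in parts)


print(build_video_prompt(
    "A bright, clean waiting room with comfortable seating",
    "Camera slowly pans left",
    "Slow, calm motion",
    "Ambient office sounds, no dialogue",
))
```

Keeping the scene to a single continuous action, as recommended above, is still up to you; the helper only checks that the motion, pacing, and audio details are present.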

Limitations and Critical Considerations

What These Tools Cannot Reliably Do

Being clear about limitations helps you use these tools appropriately:

Anatomical accuracy: AI-generated anatomical images may contain errors. Never use them as clinical references or for diagnostic education without expert review.
Consistent characters: Even with the consistency features described earlier, keeping exactly the same person across multiple images remains challenging. Reference images help but don't guarantee consistency.
Complex text: While newer models have improved dramatically, long passages of text still risk errors. Always verify any text in generated images.
Specific real people: Most tools restrict generation of identifiable real individuals. This is a safety feature, not a bug.
Complex hands and poses: Though improved, hands remain a weak point. Check carefully when hands are prominent.
Video complexity: Video AI currently handles simple scenes well but struggles with complex multi-person interactions or rapid action sequences.

Ethical Considerations

Representation and Bias

AI models reflect biases in their training data. When generating images of people, be intentional about diversity and representation. Don't default to narrow demographics in your prompts.

Transparency about AI origin: Patients and colleagues should know when images are AI-generated, particularly for educational or clinical materials. Major tools now embed invisible watermarks or provenance metadata, but visible disclosure is also appropriate.

Copyright and commercial use: AI-generated images raise unresolved copyright questions. Most tools grant commercial use rights for images you generate through their platforms. However, the underlying legal framework remains unsettled—multiple lawsuits are ongoing. For high-stakes commercial use, consider consulting with legal counsel.

Patient Privacy Reminder

Never upload patient-identifiable information (photos, names, medical record numbers, or any combination of details that could identify a patient) to consumer AI image tools. These platforms are not HIPAA-compliant for processing patient data. If you need to describe a clinical scenario for educational content, use fictional composite cases, not real patients.

Deepfakes and misuse: The same tools that create useful content can also generate misinformation. As a healthcare professional, you carry particular credibility, so be thoughtful about how AI-generated content from your practice could be misused if shared out of context.

Getting Started: Your First Projects

The best way to learn these tools is to use them. Here are concrete starting projects appropriate for different experience levels.

Beginner: Practice Logo Concept

Goal: Generate 3-5 logo concepts for a hypothetical (or your real) practice.

Tool: ChatGPT (free), Gemini, or Midjourney

Steps:

Write down 3 words that describe the feeling you want your practice to convey (e.g., "warm," "professional," "modern")
Identify any visual elements relevant to your specialty or location
Choose 2-3 colors that feel appropriate
Generate initial concepts using a prompt incorporating these elements
Iterate on the most promising result with refinement prompts

Starter Prompt Template

"A modern, [adjective] logo icon for a [specialty] medical practice. The design should suggest [concept/feeling]. Color palette: [colors]. Clean vector style, minimalist, professional, works at small sizes. No text—icon only."

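If you plan to generate several logo variations, filling the template's bracketed slots programmatically keeps each attempt consistent. The Python sketch below is a hypothetical helper that substitutes each [placeholder] and stops with an error if one is left unfilled; the example values are illustrative.

```python
import re

# The starter template above, with its [bracketed] placeholders unchanged.
LOGO_TEMPLATE = (
    "A modern, [adjective] logo icon for a [specialty] medical practice. "
    "The design should suggest [concept/feeling]. Color palette: [colors]. "
    "Clean vector style, minimalist, professional, works at small sizes. "
    "No text, icon only."
)


def fill_template(template: str, values: dict) -> str:
    """Replace each [placeholder] with values[placeholder]; fail if any is unfilled."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"No value for placeholder [{key}]")
        return values[key]

    return re.sub(r"\[([^\]]+)\]", substitute, template)


print(fill_template(LOGO_TEMPLATE, {
    "adjective": "warm",
    "specialty": "pediatric",
    "concept/feeling": "gentle, family-centered care",
    "colors": "soft teal, coral, and white",
}))
```

To explore variations, change one value at a time (for example, the adjective) so you can tell which change produced which result.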
Intermediate: Patient Education Handout

Goal: Create a one-page patient education handout on a common condition that includes AI-generated illustrations.

Tools: Image generator of choice + Canva for layout

Steps:

Choose a condition you frequently explain to patients
Outline 3-4 key points patients need to understand
Identify 1-2 concepts that would benefit from visual explanation
Generate illustrations for those concepts
Use Canva to combine text and images into a polished handout
Have a colleague review for clarity and accuracy

Advanced: Short Educational Video

Goal: Create a 15- to 30-second educational video demonstrating a patient self-care technique.

Tools: Veo or Sora

Steps:

Choose a simple technique (e.g., nasal saline irrigation, proper splint application, wound care)
Break the technique into 3-4 distinct steps
Generate a short clip for each step
Use a video editor (or Canva) to combine clips and add text overlays
Review for accuracy—AI may generate incorrect technique
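To hit the 15- to 30-second goal, you need to decide how long each step's clip should be. The Python sketch below is a hypothetical planner. It assumes you generate every step with Veo 3.1, whose base clips are 4, 6, or 8 seconds long (as listed earlier), and that every step gets the same length; it picks the shortest length whose combined total lands in the target range. The step names are placeholders.

```python
CLIP_LENGTHS = (4, 6, 8)  # Veo 3.1 base clip lengths listed earlier, in seconds


def plan_clips(steps, min_total=15, max_total=30):
    """Hypothetical planner: give every step the same clip length.

    Picks the shortest Veo clip length whose combined total reaches
    min_total without exceeding max_total.
    """
    if not steps:
        raise ValueError("Need at least one step")
    for length in CLIP_LENGTHS:
        total = length * len(steps)
        if min_total <= total <= max_total:
            return [(step, length) for step in steps]
    raise ValueError("No single clip length fits; adjust the number of steps")


plan = plan_clips(["Prepare supplies", "Demonstrate technique", "Show cleanup"])
for step, seconds in plan:
    print(f"{seconds} s  {step}")
print("Total:", sum(seconds for _, seconds in plan), "s")
```

With three steps it assigns 6 seconds each (18 seconds in total); with four steps, 4 seconds each (16 seconds). Equal lengths keep the pacing uniform, which suits a step-by-step demonstration, but real techniques often need uneven timing, which you can adjust by trimming clips in the editor.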

Choosing the Right Tool

With multiple capable options, here's practical guidance on tool selection:

Use Case	Recommended Tool	Why
Best starting point	Gemini (Nano Banana)	Free tier, best text accuracy, powers both images and video
Images with text/labels	Nano Banana Pro	Best-in-class text rendering for infographics, diagrams, and patient education
Aesthetic quality	Nano Banana Pro	High-resolution output (up to 4K) with exceptional detail and style control
Complete design projects	Canva (with Nano Banana)	Nano Banana Pro integrated directly into Canva's design platform
Video with audio	Google Veo 3.1	Native audio, Scene Extension up to about 2.5 minutes, best overall video tool
Realistic video physics	OpenAI Sora 2	Best for complex motion, liquids, object interactions, sports movements
Conversational iteration	ChatGPT (GPT Image 1.5)	Natural conversation to refine images while exploring ideas
Patient education handouts	Canva + Nano Banana	Generate labeled illustrations, compose in healthcare templates

Resources for Further Learning

Official Documentation

Google: Image Generation with Gemini (Nano Banana)

Official Gemini API documentation for Nano Banana and Nano Banana Pro

Google DeepMind Veo 3.1 Model Page

Technical overview of Veo 3.1 capabilities including Flow integration

OpenAI: The New ChatGPT Images Is Here

Official announcement of GPT Image 1.5 (December 2025)

OpenAI: Creating Images in ChatGPT

Official guide to GPT-4o image generation

OpenAI Sora

Official Sora 2 documentation and examples

Canva Magic Studio

AI features overview including Nano Banana Pro integration

YouTube Channels & Video Tutorials

Olivio Sarikas

Comprehensive tutorials on AI image generation tools. Hands-on walkthroughs with prompt engineering tips for better results.

Matthew Berman

AI news and tutorials covering ChatGPT, generative art, and emerging tools. Good for keeping up with rapid developments.

Research & Healthcare Applications

"Advancing Innovation in Medical Presentations: A Guide for Medical Educators to Use AI-Generated Images"

PMC, 2024 · 12 practical tips for using AI image generation in medical education, from prompt templates to ethical considerations.

"Generative Artificial Intelligence: Enhancing Patient Education in Cardiovascular Imaging"

PMC, 2024 · How generative AI enables personalized multimedia content for patient education through natural language interactions.

Communities and Prompt Libraries

PromptHero

Searchable database of prompts and results across multiple AI image generators

Key Takeaways

Visual AI Has Matured

The tools available today can genuinely enhance patient education, practice marketing, and educational content creation.

Prompting Principles Transfer

Specificity, context, and clear communication produce better results—just like patient histories.

Each Tool Has Strengths

Nano Banana for text/labels, ChatGPT for conversation, Canva for finished projects, Veo 3.1 for video with audio, Sora 2 for physics.

Verification Is Essential

AI-generated medical content requires expert review before use—especially anatomical or clinical images.

Ethics Matter

Be intentional about representation, transparent about AI origin, and vigilant about patient privacy.

Learn By Doing

Start with a small project and iterate. The technology rewards experimentation.

The Bottom Line

Visual AI represents another dimension of how artificial intelligence can augment healthcare practice. Like the language models we've discussed throughout this course, these tools work best when humans bring clinical judgment, creativity, and ethical awareness to the collaboration. The images and videos these tools generate are starting points—raw material that you refine, verify, and deploy in service of better patient care and communication.

Learning Objectives

Identify appropriate use cases for AI image and video generation in healthcare settings
Construct effective prompts for visual AI tools using specificity and context
Evaluate the capabilities and limitations of major image and video generation platforms
Apply verification practices to AI-generated visual content before clinical or educational use
Navigate ethical considerations including representation, transparency, and patient privacy
Create practical visual content for patient education, practice marketing, and presentations