Sunday, 18 January 2026

You can upload all these file types to Gemini to get feedback on your speaking, including your pronunciation: MP3, WAV, AAC, M4A, OGG, FLAC, MP4, MOV, AVI, WMV, WebM


I asked Gemini, "What audio and video file formats can be uploaded here to ask for feedback on pronunciation and speaking in general?"

To help you and your students use this workflow effectively, here is a breakdown of the supported formats.

Since your focus is on analyzing speech patterns and pronunciation, audio files are usually the most efficient, but video files also work well if the student wants to show their facial expressions or mouth movements.

Supported Audio Formats

These are the most common formats for voice recordings:

  • MP3 (The standard format for most phones and voice recorders)
  • WAV (High quality, but larger file sizes)
  • AAC / M4A (Common for iPhone "Voice Memos")
  • OGG 
  • FLAC

Supported Video Formats

If a student records themselves on camera, these formats are supported:

  • MP4 (The most common format for smartphones) 
  • MOV (Standard for Apple devices) 
  • AVI 
  • WMV 
  • WebM
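
For anyone collecting recordings from a whole class, the two lists above can be turned into a quick pre-upload check. This is only a minimal sketch: the function name classify_recording is made up for illustration, and it looks at the file extension alone, so it cannot confirm that Gemini will actually accept a given file.

```python
from pathlib import Path

# Extensions taken from the audio and video lists above.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".aac", ".m4a", ".ogg", ".flac"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".wmv", ".webm"}


def classify_recording(filename: str) -> str:
    """Return "audio", "video", or "unsupported" based on the file extension."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: ".M4A" == ".m4a"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"
```

For example, classify_recording("homework.M4A") returns "audio", while a ".docx" file comes back as "unsupported".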

Key Technical Limits

  • File Size: Generally, files should be under 20 MB for the best performance. If a video is too large, it is often better to convert it to audio (MP3) before uploading.
  • Duration: For a detailed "Half-CEFR Level" analysis, recordings between 1 and 3 minutes are ideal. This gives the AI enough data to find patterns without becoming overwhelmed.
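
The two limits above could be checked before uploading with something like the sketch below. Several things here are assumptions rather than anything Gemini stated: the function names are hypothetical, 20 MB is counted as 20 * 1024 * 1024 bytes, the duration is supplied by you (for example, read off the recorder app), and the conversion step assumes the separate ffmpeg tool is installed. The second function only builds the command; it does not run it.

```python
from pathlib import Path

MAX_BYTES = 20 * 1024 * 1024  # 20 MB guideline (assumes 1 MB = 1024 * 1024 bytes)
MIN_SECONDS = 60              # 1 minute
MAX_SECONDS = 180             # 3 minutes


def check_recording(size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of warnings; an empty list means both guidelines are met."""
    warnings = []
    if size_bytes >= MAX_BYTES:
        warnings.append("File is 20 MB or larger; consider converting it to MP3.")
    if duration_seconds < MIN_SECONDS:
        warnings.append("Recording is shorter than 1 minute.")
    elif duration_seconds > MAX_SECONDS:
        warnings.append("Recording is longer than 3 minutes.")
    return warnings


def mp3_conversion_command(video_path: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream (-vn) and writes an MP3."""
    output_path = str(Path(video_path).with_suffix(".mp3"))
    return ["ffmpeg", "-i", video_path, "-vn", output_path]
```

The file size can be read with os.path.getsize(path), and the list returned by mp3_conversion_command("lesson.mp4") can be passed to subprocess.run if ffmpeg is available on the machine.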

Pro-Tip for your Students:

If a student is using a smartphone, I recommend the following:

  1. iOS: Use the Voice Memos app (saved as M4A).
  2. Android: Use the built-in Voice Recorder (usually saved as MP3 or M4A).
  3. Environment: Remind them to record in a quiet room. Background noise (like a TV or wind) can cause the AI to create an incorrect transcript, which will ruin the bracket [ ] error-finding step.
