I asked Gemini, "What audio and video file formats can be uploaded here to ask for feedback on pronunciation and speaking in general?"
To help you and your students use this workflow effectively, here is a breakdown of the supported formats.
Since your focus is on analyzing speech patterns and pronunciation, audio files are usually the most efficient, but video files also work well if the student wants to show their facial expressions or mouth movements.
Supported Audio Formats
These are the most common formats for voice recordings:
- MP3 (The standard format for most phones and voice recorders)
- WAV (High quality, but larger file sizes)
- AAC / M4A (Common for iPhone "Voice Memos")
- OGG
- FLAC
Supported Video Formats
If a student records themselves on camera, these formats are supported:
- MP4 (The most common format for smartphones)
- MOV (Standard for Apple devices)
- AVI
- WMV
- WebM
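For anyone collecting recordings from a whole class, a quick extension check can catch unsupported files before upload. The sketch below is only an illustration: the extension sets are copied from the two lists above (they are not an official Gemini list), and the function name `classify_recording` is made up for this example.

```python
from pathlib import Path

# Extensions copied from the lists in this post; treat them as an assumption,
# not as an authoritative list published by Google.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".aac", ".m4a", ".ogg", ".flac"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".wmv", ".webm"}


def classify_recording(filename: str) -> str:
    """Return "audio", "video", or "unsupported" based on the file extension."""
    suffix = Path(filename).suffix.lower()  # ".M4A" and ".m4a" are treated alike
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"


if __name__ == "__main__":
    for name in ["memo.M4A", "speaking_task.mov", "essay.docx"]:
        print(name, "->", classify_recording(name))
```

The check looks only at the file name, so a renamed file could still fail on upload.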
Key Technical Limits
- File Size: Generally, files should be under 20MB for the best performance. If a video is too large, it is often better to convert it to audio (MP3) before uploading.
- Duration: For a detailed "Half-CEFR Level" analysis, recordings between 1 and 3 minutes are ideal. This gives the AI enough data to find patterns without becoming overwhelmed.
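The two limits above can be turned into a small pre-upload check. This is a rough sketch under stated assumptions: "under 20MB" is read as 20 MiB, the 1 to 3 minute window is treated as inclusive, and all function names are hypothetical. The last helper only builds an ffmpeg command for the video-to-MP3 conversion mentioned above; it assumes ffmpeg is installed with the libmp3lame encoder and does not run anything itself.

```python
# Thresholds taken from the "Key Technical Limits" notes; the exact byte count
# (20 MiB) and the inclusive bounds are assumptions made for this sketch.
MAX_BYTES = 20 * 1024 * 1024
MIN_SECONDS = 60
MAX_SECONDS = 180


def size_ok(size_bytes: int) -> bool:
    """True if the file is under the suggested 20 MB size."""
    return size_bytes < MAX_BYTES


def duration_ok(seconds: float) -> bool:
    """True if the recording is between 1 and 3 minutes long."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def build_ffmpeg_audio_command(video_path: str, mp3_path: str) -> list[str]:
    """Build (but do not run) an ffmpeg command that extracts MP3 audio."""
    return [
        "ffmpeg",
        "-i", video_path,           # input video file
        "-vn",                      # drop the video stream
        "-codec:a", "libmp3lame",   # encode the audio as MP3
        "-q:a", "4",                # variable bitrate, mid-range quality
        mp3_path,
    ]


if __name__ == "__main__":
    print(size_ok(35 * 1024 * 1024))   # a 35 MiB video is over the suggestion
    print(duration_ok(120))            # a 2-minute recording is in range
    print(" ".join(build_ffmpeg_audio_command("student.mp4", "student.mp3")))
```

The printed command can be pasted into a terminal, or passed to `subprocess.run` if you want to automate the conversion.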
Pro-Tip for your Students:
If a student is using a smartphone, I recommend the following:
- iOS: Use the Voice Memos app (saved as M4A).
- Android: Use the built-in Voice Recorder (usually saved as MP3 or M4A).
- Environment: Remind them to record in a quiet room. Background noise (like a TV or wind) can cause the AI to create an incorrect transcript, which will ruin the bracket [ ] error-finding step.
More Details on File Sizes
Here are the upload limits for audio and video files in the standard Gemini web app (gemini.google.com) and mobile app:
Video
- Maximum File Size: 2 GB per file.
- Length Limit (Free Version): Up to 5 minutes total duration per prompt.
- Length Limit (Gemini Advanced): Up to 1 hour total duration per prompt.
Audio
- Maximum File Size: Generally up to 100 MB (often treated under the general file limit).
- Length Limit (Free Version): Up to 10 minutes total duration per prompt.
- Length Limit (Gemini Advanced): Up to 3 hours total duration per prompt.
Key Notes:
- You can upload up to 10 files in a single prompt.
- The limits apply to the total duration in a single conversation turn (e.g., if you are on the free plan, you could upload two 2.5-minute videos, but not two 3-minute videos).
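The "total duration per prompt" rule is simple arithmetic, and it can be sketched as below. The numbers are the ones quoted in this post and may change over time; the plan labels "free" and "advanced" and the function name `fits_in_one_prompt` are invented for this example.

```python
# Per-prompt duration limits in minutes, as quoted in this post.
# These values are assumptions copied from the text, not read from any API.
LIMITS_MINUTES = {
    ("video", "free"): 5,
    ("video", "advanced"): 60,
    ("audio", "free"): 10,
    ("audio", "advanced"): 180,
}
MAX_FILES_PER_PROMPT = 10


def fits_in_one_prompt(kind: str, plan: str, durations_minutes: list[float]) -> bool:
    """Check the file count and the combined duration for a single prompt."""
    if len(durations_minutes) > MAX_FILES_PER_PROMPT:
        return False
    return sum(durations_minutes) <= LIMITS_MINUTES[(kind, plan)]


if __name__ == "__main__":
    # The example from the notes: two 2.5-minute videos fit, two 3-minute videos do not.
    print(fits_in_one_prompt("video", "free", [2.5, 2.5]))
    print(fits_in_one_prompt("video", "free", [3, 3]))
```

The sketch assumes all files in a prompt are of the same kind; it does not try to model a mix of audio and video in one upload.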
