Friday, 9 January 2026

You can upload mp3 files to Gemini to get feedback on your speaking AND your pronunciation


I didn’t know this worked, so when Frank Andrade posted about it, I immediately uploaded an mp3 file on my Android mobile phone. Gemini gave me feedback, but I then asked, “Is your feedback on my pronunciation based on the sounds I produced or on typical errors learners make with these sounds?”

The reply began, “My feedback is a combination of both: I analyse the specific acoustic signals in your recording while also referencing common error patterns to provide meaningful context.
How the Analysis Works
When you upload an audio file, I use advanced speech processing to break down your speech into individual sounds (phonemes) and compare them against standard models.
1. Direct Sound Analysis: I analyse the physical properties of your voice—such as pitch, duration, and intensity—to identify exactly where your pronunciation deviates from the reference. For example, if you shorten a vowel that should be long, I detect that specific timing difference.”

I then decided to test it again on my iPad, so I uploaded a different file and asked, “Can you give me feedback on my speaking including my pronunciation?”

 

Next, I asked, “Can you make a corrected version of what I wanted to say?”

 

The third prompt was, “Can you make a more colloquial version of this?”

 


The fourth and final prompt was, “Can you make a new version based on the corrected one but half a CEFR level up?”

 


My reflections on the replies from Gemini:

I was very impressed by the feedback on the two students’ pronunciation, although it found three very common errors made by Spanish and Catalan learners of English.

The feedback was very detailed, maybe too detailed. Maybe the prompt should be, “Can you give me some brief feedback on my speaking including my pronunciation?”

Under Grammatical Accuracy, Gemini said, “Since you are telling a story in the past, try to keep all verbs in the past tense.” However, when asked to produce a corrected version, it wrote it all in the present simple and included the comment, “In English, when we summarize the plot of a book or a movie, we almost always use the present tense.”

The more colloquial version was great, but at B2/B2+ it is a lot higher than the student’s B1+ level as assigned by Gemini.

The version half a CEFR level up was definitely overcomplicated for a student at her level and would certainly end up with her asking things like, “What does frosty mean?” It was supposed to be at B2 level as Gemini claimed it was, but it was actually at C2, which is about two levels higher, rather than half a level higher!

What to do next:

I wanted to try the same prompts with other GenAI tools like ChatGPT, Claude, Copilot, Perplexity and DeepSeek.

Claude, Copilot and DeepSeek simply won’t accept mp3 uploads.

ChatGPT allowed me to upload the same mp3 file, but then left me waiting for a few hours after saying, “Please wait a moment while I listen and prepare everything.” When I complained, it asked me to upload the file again as it was no longer ‘available’. I was told to upgrade or wait 21 hours to be able to upload the file again without paying. It also suggested that I should upload files of one to two minutes next time. Mine was just over 3 minutes long. So I think we can discount ChatGPT on a free account.

Perplexity was the only other tool that allowed me to upload an mp3 file and immediately gave me feedback on it, but it admitted that its feedback on pronunciation was not based on the audio, but on the transcript and typical Spanish errors. A further problem with Perplexity, whether on the webpage or in the app, is that there is no ‘Read Aloud’, so students can’t listen to anything. As I was using Microsoft’s Edge browser, I tried the ‘Read Aloud’ incorporated in it, but it was not very satisfactory, sounding very robotic in comparison to the ‘Read Aloud’ available with Gemini.

This is all a great pity as the different versions of the transcriptions made by Perplexity were much closer to what I had been hoping for. For example, the feedback was B2/B2+, which for a B1+ student is probably alright, compared with Gemini, which was B2+/C1. This happened with the different versions, apart from the corrected versions, which were very similar, at B1/B1+.


I want to see what subtle changes can be made to the four prompts to get feedback that the student could engage with more easily.

1. Can you show me where my errors are without telling me how to correct them?

2. Can you give me some brief feedback at my level on my speaking including my pronunciation?

3. Can you make a corrected version of what I wanted to say?

4. Can you make a more colloquial version of this without increasing the sophistication more than half a CEFR level?

5. Can you make a new more sophisticated version based on the corrected one but less than half a CEFR level up?

Ideally, students can be given a choice of which prompts to use and help with how to engage with the feedback in different ways.
