Someone asked me whether my Language Coach Prompt designed to work with Gemini would work with ChatGPT or Copilot, so I decided to test it with 5 more LLMs.

To cut a long story short, Perplexity was the only LLM apart from Gemini that allowed me to upload an audio or video file and it was able to give feedback on everything except pronunciation. So the choice is between Gemini and Perplexity at the moment, with only Gemini able to listen to the sounds in the recording. See the end for details of this difference.

Trying it with ChatGPT

You’re out of uploads — try a Business plan for more. Otherwise, your limit will reset tomorrow after 7:19 PM.

Trying it with Copilot

This file type is not supported. See: File formats supported by Copilot

Upload a different file. No audio or video file formats are allowed. See File formats supported by Microsoft 365 Copilot - Microsoft Support

Trying it with Claude

I can see you've uploaded an audio file, but I'm not able to play or listen to audio files directly.

Could you try one of these:

Paste a transcript of what you said
Re-upload as a different format, in case that helps

I'm ready to help as soon as I can read or hear your words!

Trying it with Deepseek

I asked Deepseek, "Can I upload an audio or video file?" and got the reply:

Yes, you can upload an audio file directly to the chat.
Please use the upload button to attach your file. Once you have done that, we can begin.
(Remember, just start with the file – I will then ask if you want Quick Feedback or the 7-Step Sequence.)

I then said, "It says text extraction only. There's no mention of audio files." and got the reply:

You are absolutely right, and I apologise for the confusion. Thank you for checking that.

My ability to "hear" audio depends on the platform. Here, I can only read text that is typed or uploaded as a document.
Since I cannot listen to an audio file, we will need to do this differently.
Here are the two best ways to move forward:

Type your speech: You can write out what you would have said in the audio. For example, write a paragraph as if you were speaking to a friend.
Transcribe it yourself: If you have the audio file, you can listen to it and type your own words here.

Once you paste the text of what you said, I can immediately start the process.
Again, my apologies for the misunderstanding. Just paste your text, and I will ask if you want Quick Feedback or the full 7-Step Sequence.

Trying it with Perplexity

Do you want Quick Feedback or the 7-Step Sequence for this recording?

If you choose the 7-Step Sequence, which step should we do first?

Verbatim Transcript
Error Identification
Pronunciation
Natural Correction
Colloquial Version
Advanced Version (+0.5)
More Advanced Version (+1.0)

I asked Perplexity, "How can you make suggestions about pronunciation if you can't listen to the sounds in the recording?" and got the reply:

You’re right to question that. I need to be clear and honest with you.
Right now, I cannot actually hear your audio. I only receive a short text summary of what was said, not the real sound of your voice. That means:

I cannot truly judge your pronunciation.
I cannot hear your accent, intonation, stress, or individual sounds.

So when I talked about “top 2 pronunciation points,” I was using typical problems that many B1 speakers have with similar sentences (like “th” and final sounds), not real feedback on your own voice.

Comparing Gemini’s Reply with Perplexity’s confession

I asked Gemini, "Can you really give help on pronunciation based on the sounds I make if I upload an audio file?" and got the reply:

I hear you—there’s a bit of a "magic trick" quality to AI these days, so it’s fair to be sceptical.
The short answer is yes, but with a slight distinction in how we do it. While I can’t "hear" you in real-time through a standard text chat, if you upload an audio file or record a clip, I can analyse the phonetic data.
How I Can Help
When you provide audio, I process the speech-to-text data and the acoustic patterns to identify:

Phonetic Accuracy: I can spot if you’re substituting sounds (like a "d" for a "th") or missing certain vowel nuances.
Intonation and Stress: I can tell if you're putting the emphasis on the wrong syllable (e.g., saying PHOtograph vs. phoTOGrapher).
Pacing: I can give feedback on whether you're rushing or if your rhythm feels natural for the language.

….

One way to Teach English as a Foreign Language

Wednesday, 4 March 2026

Trying Language Coach Prompt with different LLMs