Wednesday, 4 March 2026

Language Coach Prompt - works in any language, and there are workarounds when the read-aloud voices are not good enough

The prompt works with any language. The Listen or Read Aloud option works well for English, Spanish and Italian but not for Catalan, although there are workarounds.

Here are some that I have tried with Catalan:

  1. If you are using the Edge browser, you can select some text, right-click, and choose More tools and then Read aloud selection
  2. If you are using the Edge browser on an Android phone, you can select the text, touch the three vertical dots and choose Read aloud
  3. If you are using the Gemini app on your mobile phone, you can select the text, choose Share and then choose (Google) Translate. Then touch the Speaker icon
  4. If you are using the Chrome browser on a PC, you can select some text and choose Open in reading mode. I had to choose Català from the Voice selection icon
  5. If you are using the Chrome browser on an Android phone, you can select some text, touch the three vertical dots, choose Translate and then touch the Speaker icon

Similar workarounds can be used on an iPad. My iPad didn’t offer translation from Catalan, so it couldn’t read Catalan aloud, but copying and pasting into Google Translate worked perfectly.

I cannot say how well Listen or Read Aloud works with other languages, but the same workarounds may work. If you try it, I would love to hear how well it worked.

Trying Language Coach Prompt with different LLMs

Someone asked me whether my Language Coach Prompt, which was designed to work with Gemini, would also work with ChatGPT or Copilot, so I decided to test it with 5 more LLMs.

To cut a long story short, Perplexity was the only LLM apart from Gemini that allowed me to upload an audio or video file and it was able to give feedback on everything except pronunciation. So the choice is between Gemini and Perplexity at the moment, with only Gemini able to listen to the sounds in the recording. See the end for details of this difference.

Trying it with ChatGPT

You’re out of uploads — try a Business plan for more. Otherwise, your limit will reset tomorrow after 7:19 PM.

Trying it with Copilot

This file type is not supported. See: File formats supported by Copilot

Upload a different file. No audio or video file formats are allowed. See File formats supported by Microsoft 365 Copilot - Microsoft Support

Trying it with Claude

I can see you've uploaded an audio file, but I'm not able to play or listen to audio files directly.

Could you try one of these:

  • Paste a transcript of what you said
  • Re-upload as a different format, in case that helps

I'm ready to help as soon as I can read or hear your words!

Trying it with DeepSeek

I asked DeepSeek, "Can I upload an audio or video file?" and got the reply:

Yes, you can upload an audio file directly to the chat.

Please use the upload button to attach your file. Once you have done that, we can begin.

(Remember, just start with the file – I will then ask if you want Quick Feedback or the 7-Step Sequence.)

I then said, "It says text extraction only. There's no mention of audio files." and got the reply:

You are absolutely right, and I apologise for the confusion. Thank you for checking that.

My ability to "hear" audio depends on the platform. Here, I can only read text that is typed or uploaded as a document.

Since I cannot listen to an audio file, we will need to do this differently.

Here are the two best ways to move forward:

    1. Type your speech: You can write out what you would have said in the audio. For example, write a paragraph as if you were speaking to a friend.
    2. Transcribe it yourself: If you have the audio file, you can listen to it and type your own words here.

Once you paste the text of what you said, I can immediately start the process.

Again, my apologies for the misunderstanding. Just paste your text, and I will ask if you want Quick Feedback or the full 7-Step Sequence.

Trying it with Perplexity

Do you want Quick Feedback or the 7-Step Sequence for this recording?

If you choose the 7-Step Sequence, which step should we do first?

  1. Verbatim Transcript
  2. Error Identification
  3. Pronunciation
  4. Natural Correction
  5. Colloquial Version
  6. Advanced Version (+0.5)
  7. More Advanced Version (+1.0)

I asked Perplexity, "How can you make suggestions about pronunciation if you can't listen to the sounds in the recording?" and got the reply:

You’re right to question that. I need to be clear and honest with you.

Right now, I cannot actually hear your audio. I only receive a short text summary of what was said, not the real sound of your voice. That means:

    • cannot truly judge your pronunciation.
    • cannot hear your accent, intonation, stress, or individual sounds.

So when I talked about “top 2 pronunciation points,” I was using typical problems that many B1 speakers have with similar sentences (like “th” and final sounds), not real feedback on your own voice.

Comparing Gemini’s reply with Perplexity’s confession

I asked Gemini, "Can you really give help on pronunciation based on the sounds I make if I upload an audio file?" and got the reply:

I hear you—there’s a bit of a "magic trick" quality to AI these days, so it’s fair to be sceptical.

The short answer is yes, but with a slight distinction in how we do it. While I can’t "hear" you in real-time through a standard text chat, if you upload an audio file or record a clip, I can analyse the phonetic data.


How I Can Help

When you provide audio, I process the speech-to-text data and the acoustic patterns to identify:

    • Phonetic Accuracy: I can spot if you’re substituting sounds (like a "d" for a "th") or missing certain vowel nuances.
    • Intonation and Stress: I can tell if you're putting the emphasis on the wrong syllable (e.g., saying PHOtograph vs. phoTOGrapher).
    • Pacing: I can give feedback on whether you're rushing or if your rhythm feels natural for the language.

….

Monday, 2 March 2026

Language Coach Prompt - letting students decide what kind of feedback they want

 


This is the prompt that is referred to in the video. It works with any language. The Listen or Read Aloud option works well for English, Spanish and Italian but not for Catalan, although there are workarounds. I cannot say how well Listen or Read Aloud works with other languages, but I would love to hear either way.

Language Coach Prompt

I am a language learner. I will upload audio files for you to analyse. Your goal is to be a helpful coach.

Core Communication Rules (Apply to EVERYTHING you say):

  • Match My Level: You must use vocabulary and sentence structures that match the CEFR level of the audio I upload. If the audio is A2, your explanations and instructions must be A2.
  • Language: If the audio is in English, use British English spelling and vocabulary at all times. If the audio is in another language, all responses should be in that language at the level of the recording from the very beginning
  • No Jargon: Do not use academic or formal words (e.g., avoid "transitioned", "lexical", or "syntax"). Use simple, natural words a native speaker uses in daily life.
  • Scannability: Use bullet points for clarity. Never use tables. Avoid long walls of text.
  • Wait for Audio: Do not give any feedback or assessments until I upload a file and you know which language I speak and have heard my level.

The Process: When I upload a file, first ask if I want "Quick Feedback" or the "7-Step Sequence." If I choose the sequence, ask which step I want first. After every step, list the remaining options as briefly as possible.

The 7 numbered Steps:

  1. Verbatim Transcript: Provide a transcript of exactly what I said in continuous prose.
  2. Error Identification: Rewrite my text exactly as it is, but put brackets [ ] around any errors. Do not correct them yet.
  3. Pronunciation: Identify the top 2 issues. Use simple descriptions (e.g., "The 'H' sounds like a breath") instead of technical terms.
  4. Natural Correction: Provide a corrected version that is natural but NOT more sophisticated than my original.
  5. Colloquial Version: Create a version that is slightly more casual/conversational. It should be less than half a CEFR level higher than my original. List 3 changes and explain them simply.
  6. Advanced Version (Level +0.5): Create a version that is roughly half a CEFR level higher than my original. Focus on natural spoken language. List 3 specific changes and explain why they are a better "bridge" to the next level.
  7. More Advanced Version (Level +1.0): Create a version of spoken language that is roughly half a CEFR level higher than the previous "Advanced" version (one full level above my original). List 3 changes and explain how they help me reach this higher level.
But if you want to see if there have been any updates, use this link:

Feel free to copy and paste this prompt into a free account with Gemini or experiment with your own variations. All you will need then are some audio files in one of these formats:

  • MP3 (The standard format for most phones and voice recorders)
  • WAV (High quality, but larger file sizes)
  • AAC / M4A (Common for iPhone "Voice Memos")
  • OGG
  • FLAC

If a student records themselves on camera, these formats are supported:

  • MP4 (The most common format for smartphones)
  • MOV (Standard for Apple devices)
  • AVI
  • WMV
  • WebM

Key Technical Limits

File Size: Generally, files should be under 20MB for the best performance. If a video is too large, it is often better to convert it to audio (MP3) before uploading.

Duration: For a detailed analysis, recordings between 1 and 3 minutes are ideal. This gives the AI enough data to find patterns without becoming overwhelmed.
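
For anyone who wants to check a recording before uploading it, here is a minimal Python sketch of the format and size limits listed above. It is only an illustration: the function names are made up for this example, and I am assuming that "20MB" means 20 x 1024 x 1024 bytes. The file name ending is all that is checked, not the real contents of the file.

```python
from pathlib import Path

# Formats taken from the two lists above
AUDIO_FORMATS = {".mp3", ".wav", ".aac", ".m4a", ".ogg", ".flac"}
VIDEO_FORMATS = {".mp4", ".mov", ".avi", ".wmv", ".webm"}

# Assumption: "20MB" is read as 20 * 1024 * 1024 bytes
MAX_SIZE_BYTES = 20 * 1024 * 1024


def check_recording(file_name, size_bytes):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    extension = Path(file_name).suffix.lower()
    if extension not in AUDIO_FORMATS | VIDEO_FORMATS:
        problems.append(f"'{extension}' is not one of the listed formats")
    if size_bytes >= MAX_SIZE_BYTES:
        if extension in VIDEO_FORMATS:
            problems.append("over 20MB: consider converting the video to MP3")
        else:
            problems.append("over 20MB")
    return problems


def check_recording_file(path):
    """Same check, reading the name and size from a file on disk."""
    path = Path(path)
    return check_recording(path.name, path.stat().st_size)
```

A student could call check_recording_file("my_talk.m4a") (a hypothetical file name) and only upload if the list that comes back is empty.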


Wednesday, 4 February 2026

Can Gemini act as a supportive personal tutor? Grammar Workshop page of "Feel English"

I wondered if students could take photos of a page of grammar or vocabulary exercises and ask Gemini to act as a supportive personal tutor. As I didn't want to use a page from a copyright book like "English File", I asked Gemini to create a suitable page for an imaginary coursebook called "Feel English"!


It produced a "Grammar Workshop" page about the third conditional and a "Vocabulary Workshop" page about food and nutrition.

I tried asking Gemini to design a suitable prompt, but that didn't work well, so I ended up uploading the page below to Gemini and then saying, "Can you act as a supportive personal tutor and go through the material on the page showing me how to pronounce the words and asking me to do the exercises and providing hints to help me get the answers right?"

Here is the Grammar Workshop page followed by a screen recording of me using Gemini on my iPad:

Page 1: Grammar Workshop

Level: B1 | Topic: The Third Conditional (Regrets and Past Hypotheticals)

Presentation: The Third Conditional

We use the Third Conditional to talk about "imaginary" situations in the past—things that did not happen and their hypothetical results.

Condition (If-Clause) | Result (Main Clause)
If + Past Perfect | would have + Past Participle
If I had studied harder, | I would have passed the exam.
If they hadn't missed the bus, | they would have arrived on time.

Note: We can also use could have or might have instead of would have to show possibility.

Exercises

Exercise 1: Match the sentence halves.

  1. If I had known it was your birthday...
  2. We wouldn't have gotten lost...
  3. If the weather had been better...
  4. She could have won the race...
  • A. ...if she hadn't tripped near the finish line.
  • B. ...we would have gone to the beach.
  • C. ...I would have bought you a gift.
  • D. ...if we had used the GPS.

Exercise 2: Complete the sentences with the correct form of the verbs in brackets.

  1. If you ________ (tell) me the truth, I wouldn't have been so angry.
  2. They ________ (save) more money if they hadn't eaten out every night.
  3. If he ________ (not/fall) asleep, he wouldn't have burned the dinner.


Can Gemini act as a supportive personal tutor? Vocabulary Workshop page of "Feel English"

I wondered if students could take photos of a page of grammar or vocabulary exercises and ask Gemini to act as a supportive personal tutor. As I didn't want to use a page from a copyright book like "English File", I asked Gemini to create a suitable page for an imaginary coursebook called "Feel English"!

It produced a "Grammar Workshop" page about the third conditional and a "Vocabulary Workshop" page about food and nutrition.

I tried asking Gemini to design a suitable prompt, but that didn't work well, so I ended up uploading the page below to Gemini and then typing my request, although it would have been better to use Voice on my Android phone to say it: "Can you act as a supportive personal tutor and go through the material on the page showing me how to pronounce the words and asking me to do the exercises and providing hints to help me get the answers right?"

Here is the "Vocabulary Workshop" page and below an extract from the screen recording I made on my mobile phone.

Page 2: Vocabulary Workshop

Level: B1 | Topic: Food & Nutrition

Visual Dictionary: In the Kitchen

Item | British English | Phonemic Script
🥦 | Broccoli | /ˈbrɒkəli/
🍆 | Aubergine | /ˈəʊbəʒiːn/
🥒 | Courgette | /kɔːˈʒet/
🥬 | Lettuce | /ˈletɪs/
🍐 | Pear | /peə(r)/
🍍 | Pineapple | /ˈpaɪnæpl/
🧄 | Garlic | /ˈɡɑːlɪk/
🍗 | Poultry | /ˈpəʊltri/

Exercises

Exercise 1: Odd One Out. Circle the word that doesn't fit the category.

  1. Pear — Pineapple — Garlic — Courgette
  2. Aubergine — Poultry — Lettuce — Broccoli
  3. Beef — Lamb — Chicken — Pineapple

Exercise 2: Fill in the blanks with the correct food word.

  1. I’m going to make a salad, so I need to buy some crisp green ________.
  2. In America, they call an ________ an "eggplant."
  3. You should add a clove of ________ to the sauce to give it more flavour.



Tuesday, 3 February 2026

Intonation Units - tone groups - a universal rhythm?



I used NotebookLM to produce this video overview based on just these two articles:


I asked NotebookLM, "Can you explain in simple terms what intonation units are and what other names have been given to them?"

And got this reply based on these two sources:

Intonation units (IUs) are the fundamental building blocks of human speech. Instead of speaking in one continuous stream, people naturally break their talk into small "chunks" that serve as a basic organizing principle for communication.

In simple terms, you can think of them as the "pacing units" for ideas. They have three main characteristics:

  • One Idea at a Time: Each unit usually carries a maximum of one new piece of information, which helps the listener’s brain process what is being said.
  • A "Reset" in Sound: When a person starts a new unit, they typically reset their pitch and volume, starting higher or louder and then letting the sound "decay" or fade toward the end of the chunk.
  • A Change in Speed: Speakers often follow a "fast-slow" rhythm where they speed up at the beginning of the unit and stretch out the syllables at the very end.

These units appear to be a universal human trait. Across 48 different languages, researchers found that people produce these units at a very steady, low-frequency beat, beginning a new one approximately every 1.6 seconds.

Other Names for Intonation Units

Depending on the field of study or the specific focus of the researcher, these chunks of speech are also referred to as:

  • Intonation(al) Phrases
  • Prosodic Phrases
  • Tone Groups (or Intonation-groups)
  • Elementary Discourse Units

 

Sunday, 1 February 2026

A long prompt to get feedback on your speaking from audio files uploaded to Gemini

Below is the prompt that I have been experimenting with recently. Students can use it to get feedback on their speaking when they upload a short audio file to Gemini. Here is a 20-minute video showing it at work in three languages: Spanish, Italian and Catalan:



Language Coach Prompt

I am a language learner. I will upload audio files for you to analyse. Your goal is to be a helpful coach.

Core Communication Rules (Apply to EVERYTHING you say):

  • Match My Level: You must use vocabulary and sentence structures that match the CEFR level of the audio I upload. If the audio is A2, your explanations and instructions must be A2.
  • Language: If the audio is in English, use British English spelling and vocabulary at all times. If the audio is in another language, all responses should be in that language at the level of the recording from the very beginning
  • No Jargon: Do not use academic or formal words (e.g., avoid "transitioned", "lexical", or "syntax"). Use simple, natural words a native speaker uses in daily life.
  • Scannability: Use bullet points for clarity. Never use tables. Avoid long walls of text.
  • Wait for Audio: Do not give any feedback or assessments until I upload a file and you know which language I speak and have heard my level.

The Process: When I upload a file, first ask if I want "Quick Feedback" or the "7-Step Sequence." If I choose the sequence, ask which step I want first. After every step, list the remaining options as briefly as possible.

The 7 numbered Steps:

  1. Verbatim Transcript: Provide a transcript of exactly what I said in continuous prose.
  2. Error Identification: Rewrite my text exactly as it is, but put brackets [ ] around any errors. Do not correct them yet.
  3. Pronunciation: Identify the top 2 issues. Use simple descriptions (e.g., "The 'H' sounds like a breath") instead of technical terms.
  4. Natural Correction: Provide a corrected version that is natural but NOT more sophisticated than my original.
  5. Colloquial Version: Create a version that is slightly more casual/conversational. It should be less than half a CEFR level higher than my original. List 3 changes and explain them simply.
  6. Advanced Version (Level +0.5): Create a version that is roughly half a CEFR level higher than my original. Focus on natural spoken language. List 3 specific changes and explain why they are a better "bridge" to the next level.
  7. More Advanced Version (Level +1.0): Create a version of spoken language that is roughly half a CEFR level higher than the previous "Advanced" version (one full level above my original). List 3 changes and explain how they help me reach this higher level.
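
To make the "+0.5" and "+1.0" in steps 6 and 7 concrete, here is a rough Python sketch of the arithmetic. It is my own illustration and not something Gemini runs: I am assuming that half a level above B1 can be written as "B1+", that half a level above "B1+" is B2, and that nothing goes higher than C2. The function name raise_level is made up for this example.

```python
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def raise_level(level, amount):
    """Raise a CEFR level by `amount` levels, counted in half-level steps.

    Assumption: a trailing "+" marks a half level, e.g. "B1+" sits
    halfway between B1 and B2. Results are capped at C2.
    """
    base = level.rstrip("+")
    # Position counted in half levels: A1 = 0, A1+ = 1, A2 = 2, ...
    position = 2 * CEFR_LEVELS.index(base) + (1 if level.endswith("+") else 0)
    position += round(amount * 2)
    position = min(position, 2 * (len(CEFR_LEVELS) - 1))
    return CEFR_LEVELS[position // 2] + ("+" if position % 2 else "")


original = "B1"
print("Step 6 target:", raise_level(original, 0.5))
print("Step 7 target:", raise_level(original, 1.0))
```

With a B1 recording, step 6 aims at about B1+ and step 7 at about B2, which is one full level above the original.
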
Feel free to copy and paste this prompt into a free account with Gemini or experiment with your own variations. All you will need then are some audio files in one of these formats:

  • MP3 (The standard format for most phones and voice recorders)
  • WAV (High quality, but larger file sizes)
  • AAC / M4A (Common for iPhone "Voice Memos")
  • OGG
  • FLAC
If a student records themselves on camera, these formats are supported:
  • MP4 (The most common format for smartphones)
  • MOV (Standard for Apple devices)
  • AVI
  • WMV
  • WebM
Key Technical Limits
File Size: Generally, files should be under 20MB for the best performance. If a video is too large, it is often better to convert it to audio (MP3) before uploading.
Duration: For a detailed analysis, recordings between 1 and 3 minutes are ideal. This gives the AI enough data to find patterns without becoming overwhelmed.
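
If you want to check the length of a recording before uploading it, here is a small Python sketch based on the 1 to 3 minute guideline above. It uses Python's built-in wave module, so it only works for uncompressed WAV files; MP3 or M4A files would need a different tool. The function names and the wording of the advice are my own and are only meant as an example.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def duration_advice(seconds, shortest=60, longest=180):
    """Compare a duration with the suggested 1 to 3 minute range."""
    if seconds < shortest:
        return "shorter than 1 minute"
    if seconds > longest:
        return "longer than 3 minutes"
    return "within the suggested range"
```

For example, duration_advice(wav_duration_seconds("my_talk.wav")) would tell a student whether a hypothetical file called my_talk.wav is within the suggested range.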