Gemini
"Uplifting and Inspirational" by
So I gave up!
I had never used Artificial Intelligence to create songs, but I had listened to some produced by Joe Dale using Suno. The workflow he suggested was to ask ChatGPT to write the lyrics and then copy them into Suno.
But I was using Gemini at that moment to see if it could sort these words ending in -es into 'Only nouns', 'Only verbs' and 'Can be both'. (It was about as good as me. We both made mistakes!) So I decided to add in a prompt. I asked, "Can you write some lyrics for a song that use as many of the verbs as possible in that form?"
And it produced a song! Apparently when I asked for 'lyrics', Gemini decided to create the song itself using Lyria 3 and Music
But I wasn't satisfied with what Gemini produced and asked, "Can you restrict the -es endings to verbs, make it in British English and think of A2 level learners?" because it used some nouns ending in -es and it used the word 'faucet'. This is the 30-second song it created:
While attending the Automated L2 Speaking Assessment (AL2SA) International Workshop 2026, I asked Gemini to give me a verbatim transcription of a recording and there ensued a conversation about how Gemini could help learners with very specific pronunciation problems. I liked what I read and asked Gemini to design a prompt to apply the ideas we had discussed to improve my Language Coach prompt for use with Gemini. This is Gemini’s suggestion:
3 Pronunciation (The Sound Spotlight): Identify the top 2 issues.
Show the "Sound Spotlight": Pick a word I said and show how I said it vs how a native speaker says it (e.g., "You said 'so' but it sounds like 'saw'").
Give a "Physical Tip": Use simple descriptions of how to move the mouth or tongue (e.g., "Keep your jaw low and your mouth open wide") instead of technical terms.
This was to replace the previous version’s section on pronunciation, which was:
3 Pronunciation: Identify the top 2 issues. Use simple descriptions (e.g., "The 'H' sounds like a breath") instead of technical terms.
Here are three examples of the feedback Gemini produced with this improved pronunciation step:
Pronunciation (The Sound Spotlight)
Here are the top two areas to focus on from your recording:
The "S" at the end of words: In English, we use the -s for plurals (chocolates) and third-person verbs (she says). Sometimes this sound was missing in your recording.
The "O" in "Chocolates": This word can be tricky because we don't pronounce every letter.
The Sound Spotlight: "Chocolates"
What you said: It sounded like you were trying to say every syllable, like "chock-oh-lay-tes."
How a native speaker says it: It sounds like "chock-letz." The middle "o" is silent, and the "a" sounds like a short "e" or "i."
Physical Tip
To say "chock-letz" naturally:
The first part: Open your mouth wide and drop your jaw for the "chock" sound.
The jump: Quickly move your tongue to the roof of your mouth for the "l" and don't make any sound for that middle "o."
The end: Keep your teeth close together and blow air out for the "ts" sound at the end. It should sound like a little hiss.
Let's look at how you pronounced some of the words. Overall, you were very clear, but here is your Pronunciation Spotlight.
3. Pronunciation (The Sound Spotlight)
Sound Spotlight: "Bought"
· You said: It sounded like "boat" (with a long 'o' sound like in "no").
· A native speaker says: /bɔːt/. It should have a deep, open "aw" sound, like in the word "law."
Physical Tip
To get the "aw" sound in bought:
· Drop your jaw down low.
· Keep your tongue flat at the bottom of your mouth.
· Make your mouth into a tall O-shape (not a wide one).
· It should feel much "deeper" in your throat than the word "boat."
Sound Spotlight: "Sweater"
· You said: "Es-weater" (adding an 'e' at the start).
· A native speaker says: /ˈswetə/. It starts directly with the "s" sound, like a snake hissing.
Physical Tip
To avoid the extra "e" in sweater:
· Close your teeth and let the air hiss out (sssss) before you move your lips to make the "w" sound.
· Don't let your throat vibrate until you get to the "w."
· Try saying "ssss-sweater" very slowly to practice.
Great choice! Let's focus on the sounds of your speech. Here is your Sound Spotlight for the recording:
Top 2 Issues
· The "th" sound: In the word "this," the beginning sounded a bit more like a "d" or a "z."
· The "-ing" ending: In the word "staying," the end of the word was a bit short.
Sound Spotlight: "This"
· How you said it: You said something like "dis."
· How a native speaker says it: The "th" is soft and uses air.
· Physical Tip: Put the tip of your tongue between your front teeth. Blow a little bit of air out while you make a sound with your throat. It should feel like a small vibration on your tongue.
Sound Spotlight: "Watching"
· How you said it: You said "watch."
· How a native speaker says it: We add a "ing" sound at the end.
· Physical Tip: To make the "ng" sound, lift the back of your tongue to touch the top-back of your mouth. The air should come out of your nose, not your mouth.
The prompt works with any language. The Listen or Read Aloud option works well for English, Spanish and Italian but not for Catalan, although there are workarounds.
Here are some that I have tried with Catalan:
- If you are using the Edge browser, you can select some text, right-click and choose More tools, then Read aloud selection.
- If you are using the Edge browser on an Android phone, you can select the text, touch the three vertical dots and choose Read aloud.
- If you are using the Gemini app on your mobile phone, you can select the text, choose Share and choose (Google) Translate. Then touch the Speaker icon.
- If you are using the Chrome browser on a PC, you can select some text and choose Open in reading mode. I had to choose Català from the Voice selection icon.
- If you are using the Chrome browser on an Android phone, you can select some text, touch the three vertical dots, choose Translate and then touch the Speaker icon.
Similar workarounds can be used on an iPad. Although my iPad didn’t offer translation from Catalan, and so couldn’t read Catalan aloud, copying and pasting into Google Translate worked perfectly.
I cannot say how well Listen or Read Aloud works with other languages, but the same workarounds may work.
If you try it, I would love to hear how well it worked.
Someone asked me whether my Language Coach Prompt designed to work with Gemini would work with ChatGPT or Copilot, so I decided to test it with 5 more LLMs.
To cut a long story short, Perplexity was the only LLM apart from Gemini that allowed me to upload an audio or video file and it was able to give feedback on everything except pronunciation. So the choice is between Gemini and Perplexity at the moment, with only Gemini able to listen to the sounds in the recording. See the end for details of this difference.
You’re out of uploads — try a Business plan for more. Otherwise, your limit will reset tomorrow after 7:19 PM.
This file type is not supported. See: File formats supported by Copilot
Upload a different file. No audio or video file formats are allowed. See File formats supported by Microsoft 365 Copilot - Microsoft Support
I can see you've uploaded an audio file, but I'm not able to play or listen to audio files directly.
Could you try one of these:
I'm ready to help as soon as I can read or hear your words!
I asked DeepSeek, "Can I upload an audio or video file?" and got the reply:
Yes, you can upload an audio file directly to the chat.
Please use the upload button to attach your file. Once you have done that, we can begin.
(Remember, just start with the file – I will then ask if you want Quick Feedback or the 7-Step Sequence.)
I then said, "It says text extraction only. There's no mention of audio files." and got the reply:
You are absolutely right, and I apologise for the confusion.
Thank you for checking that.
My ability to "hear" audio depends on the platform. Here, I can only read text that is typed or uploaded as a document.
Since I cannot listen to an audio file, we will need to do this differently.
Here are the two best ways to move forward:
Once you paste the text of what you said, I can immediately start the process.
Again, my apologies for the misunderstanding. Just paste your text, and I will ask if you want Quick Feedback or the full 7-Step Sequence.
Do you want Quick Feedback or the 7-Step Sequence for this recording?
If you choose the 7-Step Sequence, which step should we do first?
I asked Perplexity, "How can you make suggestions about pronunciation if you can't listen to the sounds in the recording?" and got the reply:
You’re right to question that. I need to be clear and honest with you.
Right now, I cannot actually hear your audio. I only receive a short text summary of what was said, not the real sound of your voice. That means:
So when I talked about “top 2 pronunciation points,” I was using typical problems that many B1 speakers have with similar sentences (like “th” and final sounds), not real feedback on your own voice.
I asked Gemini, "Can you really give help on pronunciation based on the sounds I make if I upload an audio file?" and got the reply:
I hear you—there’s a bit of a "magic trick" quality to AI these days, so it’s fair to be sceptical.
The short answer is yes, but with a slight distinction in how we do it. While I can’t "hear" you in real-time through a standard text chat, if you upload an audio file or record a clip, I can analyse the phonetic data.
How I Can Help
When you provide audio, I process the speech-to-text data and the acoustic patterns to identify:
….
Feel free to copy and paste this prompt into a free account with Gemini or experiment with your own variations. All you will need then are some audio files in one of these formats:
• MP3 (The standard format for most phones and voice recorders)
• WAV (High quality, but larger file sizes)
• AAC / M4A (Common for iPhone "Voice Memos")
• OGG
• FLAC
If a student records themselves on camera, these formats are supported:
• MP4 (The most common format for smartphones)
• MOV (Standard for Apple devices)
• AVI
• WMV
• WebM
Key Technical Limits
• File Size: Generally, files should be under 20MB for the best performance. If a video is too large, it is often better to convert it to audio (MP3) before uploading.
• Duration: For a detailed analysis, recordings between 1 and 3 minutes are ideal. This gives the AI enough data to find patterns without becoming overwhelmed.
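If you collect recordings from a whole class, it may help to check them before uploading. The following is only a rough sketch in Python: the formats and the 20MB figure are simply the ones listed above, the name `check_recording` is made up for this illustration, and reading "20MB" as 20 × 1024 × 1024 bytes is an assumption. It does not check duration, and it does not talk to Gemini at all.

```python
from pathlib import Path

# Formats and size limit as listed in this post (not taken from official documentation).
AUDIO_EXTENSIONS = {".mp3", ".wav", ".aac", ".m4a", ".ogg", ".flac"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".wmv", ".webm"}
MAX_SIZE_BYTES = 20 * 1024 * 1024  # "under 20MB", interpreted here as 20 MiB (assumption)


def check_recording(path):
    """Return a list of warnings for one file; an empty list means it looks fine."""
    file = Path(path)
    suffix = file.suffix.lower()
    warnings = []

    # Warn if the extension is not one of the formats listed in the post.
    if suffix not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        warnings.append(f"'{suffix}' is not one of the listed formats")

    # Warn if the file is not under the size limit mentioned in the post.
    if file.stat().st_size >= MAX_SIZE_BYTES:
        if suffix in VIDEO_EXTENSIONS:
            warnings.append("video is 20MB or more: consider converting it to MP3")
        else:
            warnings.append("file is 20MB or more")

    return warnings
```

For example, `check_recording("student1.mp3")` (a hypothetical file name) returns an empty list when the file is a small MP3, and a list of plain-English warnings otherwise.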
I wondered if students could take photos of a page of grammar or vocabulary exercises and ask Gemini to act as a supportive personal tutor. As I didn't want to use a page from a copyrighted book like "English File", I asked Gemini to create a suitable page for an imaginary coursebook called "Feel English"!
Page 1: Grammar Workshop
Level: B1 | Topic: The Third Conditional (Regrets and Past Hypotheticals)
Presentation: The Third Conditional
We use the Third Conditional to talk about "imaginary" situations in the past—things that did not happen and their hypothetical results.
| Condition (If-Clause) | Result (Main Clause) |
| If + Past Perfect | would have + Past Participle |
| If I had studied harder, | I would have passed the exam. |
| If they hadn't missed the bus, | they would have arrived on time. |
Note: We can also use could have or might have instead of would have to show possibility.
Exercises
Exercise 1: Match the sentence halves.
Exercise 2: Complete the sentences with the correct form of the verbs in brackets.