Wednesday, 21 January 2026

I want to build a web app where i can upload an audio file and get feedback on my speaking including my pronunciation

SpeakSmart AI

Try it here yourself

Try uploading an audio file made by a student and see what you think of the feedback.


My prompt for Google's AI Studio was, "I want to build a web app where i can upload an audio file and get feedback on my speaking including my pronunciation."

Sunday, 18 January 2026

You can upload all these file types to Gemini to get feedback on your speaking, including your pronunciation: MP3, WAV, AAC, M4A, OGG, FLAC, MP4, MOV, AVI, WMV, WebM


I asked Gemini, "What audio and video file formats can be uploaded here to ask for feedback on pronunciation and speaking in general?"

To help you and your students use this workflow effectively, here is a breakdown of the supported formats.

Since your focus is on analyzing speech patterns and pronunciation, audio files are usually the most efficient, but video files also work well if the student wants to show their facial expressions or mouth movements.

Supported Audio Formats

These are the most common formats for voice recordings:

  • MP3 (The standard format for most phones and voice recorders)
  • WAV (High quality, but larger file sizes)
  • AAC / M4A (Common for iPhone "Voice Memos")
  • OGG 
  • FLAC

Supported Video Formats

If a student records themselves on camera, these formats are supported:

  • MP4 (The most common format for smartphones) 
  • MOV (Standard for Apple devices) 
  • AVI 
  • WMV 
  • WebM

 Key Technical Limits

  • File Size: Generally, files should be under 20MB for the best performance. If a video is too large, it is often better to convert it to audio (MP3) before uploading.
  • Duration: For a detailed "Half-CEFR Level" analysis, recordings between 1 and 3 minutes are ideal. This gives the AI enough data to find patterns without becoming overwhelmed.

Pro-Tip for your Students:

If a student is using a smartphone, I recommend the following:

  1. iOS: Use the Voice Memos app (saved as M4A).
  2. Android: Use the built-in Voice Recorder (usually saved as MP3 or M4A).
  3. Environment: Remind them to record in a quiet room. Background noise (like a TV or wind) can cause the AI to create an incorrect transcript, which will ruin the bracket [ ] error-finding step.
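
The formats and limits above could be turned into a quick check to run before uploading. What follows is only a minimal sketch: the function name check_recording is hypothetical, the figures are simply the ones quoted in Gemini's reply (treated as guidance rather than hard limits), and 20MB is assumed to mean 20 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Formats taken from the two lists in Gemini's reply.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".aac", ".m4a", ".ogg", ".flac"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".wmv", ".webm"}

# Limits quoted in the reply; assumed here to be guidance, not hard limits.
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumption: "20MB" read as 20 MiB
IDEAL_MIN_SECONDS = 60
IDEAL_MAX_SECONDS = 180


def check_recording(filename, size_bytes, duration_seconds):
    """Return a list of warnings for a recording (hypothetical helper)."""
    warnings = []
    ext = Path(filename).suffix.lower()
    if ext not in AUDIO_EXTENSIONS | VIDEO_EXTENSIONS:
        warnings.append(f"{ext or 'no extension'}: format not in the lists above")
    elif size_bytes > MAX_SIZE_BYTES:
        if ext in VIDEO_EXTENSIONS:
            warnings.append("video over 20 MB: consider converting it to MP3 first")
        else:
            warnings.append("file over 20 MB")
    if not IDEAL_MIN_SECONDS <= duration_seconds <= IDEAL_MAX_SECONDS:
        warnings.append("duration outside the 1 to 3 minute range")
    return warnings


if __name__ == "__main__":
    # A two-minute voice memo of about 3 MB raises no warnings.
    print(check_recording("student_talk.m4a", 3_000_000, 120))
```

The size and duration have to be supplied by whoever calls the function; reading the duration from the file itself would need an extra audio library, which this sketch deliberately leaves out.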

Tuesday, 13 January 2026

Comments on Jeff Jordan's article on input, noticing, intake and other dubious constructs.

I've been reading Jeff Jordan's article on input, noticing, intake and other dubious constructs. I hadn't realised before that the distinction between input and intake isn't really enough on its own. I think the word exposure is worth using: it's exposure to language. Then there is input and then there is intake (if there is noticing in between input and intake, so be it). I was also reminded of something that Kunihide Sakai, a Japanese specialist in extensive reading, talked about regarding 'swells'.


He said that 'swells' are just the sounds you hear around you, and it's strange because he was talking about extensive reading. But what he was saying was that as babies learn a language, all they're aware of is this sort of washing of sounds around them. That cannot be input; it is exposure. It is a potential stimulus—it isn't a stimulus unless something happens as a result of it. It's sort of background noise, shall we say, with no attempt at interpreting that there is meaning involved in it.

This is a rather complicated way of expressing the ideas, but what I mean is that there is exposure to a raft of noises and sounds. Obviously, babies will focus their attention on particularly strident sounds because these will stand out. They will also begin to detect repeated sounds and attend to them simply because they're interesting as repeated sounds in the environment, not because they're necessarily attaching meaning to them. These sounds gain prominence because they've been identified as the same sound, and they could be words like "baby" or sounds like a clicking of the tongue.

• I think that paying attention to any sounds will be the beginning of some form of learning, which the baby is undertaking from the environment—whether it be spoken language or any other sounds in the environment.

• Should it only be sounds, or is it perhaps also things they see? I think it is appropriate to say that noticing is one of the parts of this learning process. It could be noticing visual stimuli as well as auditory stimuli.

A baby begins to learn about the world around them by paying attention to things that stand out, things that draw their attention for whatever reason. I think this is prior to considering it as language—identifying that some of the noises made by human beings in the environment are attempts to attract the child's attention is perhaps the first realization of intentionality. In this way, they begin to understand that people are trying to achieve something with these sounds they're making.

The first step for a baby must be to experiment with making sounds themselves, with the intention of getting the attention of the people around them. None of this is clearly language—it is just responding to the environment. Language learning and eventually language output is a much slower process. It must depend on beginning to recognize associations between the sounds they hear and physical realities, objects, and things in the world around them.

For example:

• People saying "doggy, doggy, doggy"

• The child, being interested in this hairy animal that's looking at them

• It all has to do with attention and its relation to objects, sounds, and then the intentionality of the person who is saying "doggy, doggy, doggy"

• The child is realizing that "doggy, doggy, doggy" is being said with the intention of drawing their attention to what they later discover is called a dog

That's a rather complex exposition of my thoughts on the matter.

It has nothing to do directly with second language acquisition, but it helped me clarify my own thoughts on exposure, attention, intentionality, input and intake as precursors to L1 acquisition.

Friday, 9 January 2026

You can upload mp3 files to Gemini to get feedback on your speaking AND your pronunciation


I didn’t know this worked, and when Frank Andrade posted about it, I immediately uploaded an mp3 file on my Android mobile phone. Gemini gave me feedback, but I then asked, “Is your feedback on my pronunciation based on the sounds I produced or on typical errors learners make with these sounds?”

The reply began, “My feedback is a combination of both: I analyse the specific acoustic signals in your recording while also referencing common error patterns to provide meaningful context.
How the Analysis Works
When you upload an audio file, I use advanced speech processing to break down your speech into individual sounds (phonemes) and compare them against standard models.
1. Direct Sound Analysis: I analyse the physical properties of your voice—such as pitch, duration, and intensity—to identify exactly where your pronunciation deviates from the reference. For example, if you shorten a vowel that should be long, I detect that specific timing difference.”

I then decided to test it again on my iPad, so I uploaded a different file and asked, “Can you give me feedback on my speaking including my pronunciation?”

 

Next, I asked, “Can you make a corrected version of what I wanted to say?”

 

The third prompt was, “Can you make a more colloquial version of this?”

 


The fourth and final prompt was, “Can you make a new version based on the corrected one but half a CEFR level up?”

 


My reflections on the replies from Gemini:

I was very impressed by the feedback on the two students’ pronunciation, although it found three very common errors made by Spanish and Catalan learners of English.

The feedback was very detailed, maybe too detailed. Maybe the prompt should be, “Can you give me some brief feedback on my speaking including my pronunciation?”

Under Grammatical Accuracy, Gemini said, “Since you are telling a story in the past, try to keep all verbs in the past tense.” However, when asked to produce a corrected version, it was all in the present simple and included the comment, “In English, when we summarize the plot of a book or a movie, we almost always use the present tense.”

The more colloquial version was great but at B2/B2+ it is a lot higher than the student’s B1+ level as assigned by Gemini.

The version half a CEFR level up was definitely overcomplicated for a student at her level and would certainly end up with her asking things like, “What does frosty mean?” It was supposed to be at B2 level as Gemini claimed it was, but it was actually at C2, which is about two levels higher, rather than half a level higher!

What to do next:

I wanted to try the same prompts with other GenAI tools like ChatGPT, Claude, Copilot, Perplexity and DeepSeek.

Claude, Copilot and DeepSeek simply won’t accept mp3 uploads.

ChatGPT allowed me to upload the same mp3 file, but then left me waiting for a few hours after saying, “Please wait a moment while I listen and prepare everything.” And when I complained it asked me to upload the file again as it was not ‘available’ any more. I was told to upgrade or wait 21 hours to be able to upload the file again without paying. It also suggested that I should upload 1-2-minute-long files next time. Mine was just over 3 minutes long. So I think we can discount ChatGPT on a free account.

Perplexity was the only one that allowed me to upload an mp3 file and immediately gave me feedback on it, but it admitted that its feedback on pronunciation was not based on the audio, but on the transcript and typical Spanish errors. A further problem with Perplexity, whether using the webpage or the app, is that there is no ‘Read Aloud’, so students can’t listen to anything. As I was using Microsoft’s Edge browser, I tried the ‘Read Aloud’ incorporated in it, but it was not very satisfactory, sounding very robotic in comparison to the ‘Read Aloud’ available with Gemini.

This is all a great pity, as the different versions of the transcriptions made by Perplexity were much closer to what I had been hoping for. For example, the feedback was at B2/B2+, which for a B1+ student is probably alright, whereas Gemini’s was at B2+/C1. The same was true of the other versions, apart from the corrected versions, which were very similar, at B1/B1+.


I want to see what subtle changes can be made to the four prompts to get feedback that the student could engage with more easily.

1. Can you show me where my errors are without telling me how to correct them?

2. Can you give me some brief feedback at my level on my speaking including my pronunciation?

3. Can you make a corrected version of what I wanted to say?

4. Can you make a more colloquial version of this without increasing the sophistication more than half a CEFR level?

5. Can you make a new more sophisticated version based on the corrected one but less than half a CEFR level up?

Ideally, students can be given a choice of which prompts to use and help with how to engage with the feedback in different ways.

Monday, 5 January 2026

Repurposing Eric Curts' reader "The Indestructible Moss Piglet" in Catalan or any other language

Eric Curts has produced a whole series of graded readers for learners of English (25 to date) by giving very detailed instructions to NotebookLM.

You can access them here: Control Alt Achieve: Making Graphic Novels with Gemini and NotebookLM

As you will see, Eric Curts has produced and shared a wide range of other resources for language learning, but for the moment let me just show you a short extract from the instructions for NotebookLM:

The Indestructible Moss Piglet


[COVER PAGE]

  • Title: The Indestructible Moss Piglet
  • Visual Description: A stunning 16:9 wide composition. A split background: The bottom left is a lush, green, textured moss forest painted in soft gouache strokes. The top right fades into a deep, velvety dark blue space with soft white splatter stars. In the center, bridging both worlds, is Terry the Tardigrade (a chubby, squishy, microscopic water bear, painted in soft pastel teal and mint green with visible brush strokes, eight stubby legs, and a round snout). He looks happy and is floating as if swimming. The title is written in playful, hand-cut paper letters with soft edges.
  • Style Note: Soft Gouache & Pastel, textured collage, visible brush strokes, no hard black outlines, dreamy atmosphere.

I’d be interested to know how Gemini can help with drafting these very detailed prompts! If you want to see the complete instructions, they are here: https://docs.google.com/document/d/1aQgasTqUvQGubFJIu-8zVbvCuHWMV7H9k0mUn43EUwk/edit?usp=sharing

Teachers of English are lucky that, as Eric says, there are hundreds of professionally written, illustrated and published readers, but in many less studied languages there are far fewer.

For example, in Catalan, although there are some excellent publishers of graded readers, there are nothing like as many as there are for English. So, I wondered if Eric Curts’ readers could be repurposed for learners of Catalan.

Here is his original: https://drive.google.com/file/d/1PHPk9kBJ5owGAG93XZ9-pIHdcJJHihTI/view?usp=sharing

Here is the result of asking NotebookLM (Studio - Slide Deck), “Can you make a version in Catalan? Using the same graphics”. It produced a PDF file like the original, but I am sharing it as a 30-second video crediting Eric Curts.



Tuesday, 30 December 2025

Trying to post the downloaded web page from a storyline: I pasted the HTML code for the downloaded web page into the HTML view in Blogger

Visual Storyboard
Scene 1
One of the problems with the app at this stage is that it doesn't have a consistent character on every page.
Scene 2
I think I've seen this scene before, but possibly not with a scientist in it.
Scene 3
And now you see the problem? Umm, the scientist is now a woman. It was a man in both the previous scenes.
Scene 4
It always seems to be pouring with rain outside this cafe.

Trying to post the downloaded web page from a storyline, I pasted the HTML code for the downloaded web page into the HTML view in Blogger. I had to remove one CSS declaration (overflow: hidden;) because it was preventing the page from scrolling.
If you want to see the four pages, click on the picture.
If you want to listen to my recordings, click on the 'play' icon.
I also changed the colour in the caption to black: caption { color: #000000; }