I wondered how SpeakSmart AI arrived at the scores in its Analysis, so I asked it in the chat:
After 24 seconds, I got this reply:
The scores in the SpeakSmart AI analysis are generated by the Gemini 3 Pro model through a multi-modal analysis of your audio recording.
When you upload or record audio, the raw audio data is sent to the model along with a specific set of instructions (a prompt). The AI arrives at these scores by evaluating several characteristics of the speech:
1. Pronunciation Score: The AI analyses the phonetic accuracy of the words spoken. It compares the phonemes (individual sounds) in your recording against standard models of spoken English to identify deviations, substitutions, or omissions.
2. Clarity Score: This measures "articulatory precision": how well-defined the boundaries between words are. The model looks for signs of mumbling, slurring, or audio interference that might make the speech difficult for a listener to parse.
3. Pacing Score: The model evaluates the tempo and rhythm of your speech. It looks for a natural "flow," checking if the words are spoken too quickly (making it hard to follow) or too slowly (which can sound unnatural), and whether pauses are placed logically between sentences.
4. Overall Score: This is a holistic assessment derived from the combination of the metrics above, reflecting the general effectiveness and professional quality of the communication.
Because Gemini is a "native multi-modal" model, it doesn't just transcribe the text and then grade it; it "listens" to the nuances of the audio signal itself, including tone, stress, and intonation, to provide feedback that goes beyond just the words used.
But I wasn't 100% convinced, so I used it to evaluate my own recordings in Catalan over a period of time, hoping that it would show a clear improvement, but it didn't.
I then analysed the same recording, made in class by a pre-intermediate student, three times.
The first and most important problem is the arbitrary nature of the Analysis Results. They are not helpful at all and may give a totally false assessment of the student’s pronunciation.
| Analysis Results | First time | Second time | Third time |
|---|---|---|---|
| Overall | 45 | 52 | 40 |
| Pronunciation | 50 | 55 | 45 |
| Clarity | 50 | 50 | 40 |
| Pacing | 35 | 45 | 35 |

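To put a number on how much the scores move between runs, here is a minimal Python sketch that computes the mean and the range (highest minus lowest) of each metric across the three analyses. The scores are copied from the results above; the names `runs`, `spread` and `summary` are my own and have nothing to do with how SpeakSmart AI works internally.

```python
from statistics import mean

# Scores from the three analyses of the SAME recording (first, second, third).
runs = {
    "Overall": [45, 52, 40],
    "Pronunciation": [50, 55, 45],
    "Clarity": [50, 50, 40],
    "Pacing": [35, 45, 35],
}


def spread(scores):
    """Return (mean, range) for one metric across repeated runs."""
    return mean(scores), max(scores) - min(scores)


# Hypothetical summary structure: metric name -> (mean, range).
summary = {metric: spread(scores) for metric, scores in runs.items()}

for metric, (avg, rng) in summary.items():
    print(f"{metric:<14} mean {avg:5.1f}  range {rng:2d} points")
```

On these figures, every metric varies by 10 to 12 points between runs, even though the audio was identical each time.
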
The third Transcript was very different from the first two: it included all the ums and uhs, so it read like a verbatim transcript. Here is an extract:
"Hana was driving her car and suddenly one men across the... across round. Uh, after that, eh, she eh stop... the car just... [Spanish interaction: She stopped the car?]... Hana stopped eh the car and eh... she eh... see a man... uh, and the man eh was Jamie."
In the Strengths, only one phrase was similar across the three versions: “You successfully conveyed the main plot points of the story”, together with some reference to self-correction.
In General Advice, all three versions insisted on not translating word-for-word from Spanish, and on using the correct past forms of irregular (or regular) verbs. They each included one or two other useful suggestions.
The Specific Improvements were very similar and covered, albeit in different orders:
- the pronunciation of the -ed ending in ‘stopped’
- the use of the false friend ‘history’ instead of ‘story’
- the fact that ‘explained’ is followed by ‘to’ if the person is included
- the correct past tense of ‘see’ is ‘saw’
- the use of the preposition ‘across’ instead of the verb ‘cross’


