Sunday, 12 January 2025

A Comparison of how ChatGPT, GSE Text Analyzer, Text Inspector and Write&Improve evaluated a series of texts

 


My ideas about how students can use transcripts produced by tools like Turboscribe.ai and Rev.com combined with ChatGPT to get suggestions for ways to upgrade their language depend on ChatGPT grading their original transcript correctly and then producing progressively more complex versions of it for them to read, listen to, study and take notes on. They would then repeat the speaking task with a different listener, hopefully benefitting from the exposure to the emergent language.

So, I have tried examining how ChatGPT's grading compares with other tools that claim to do the same. The chart above is based on only one B1 student's original transcript. There are vast differences between the four different tools.

To convert the CEFR scale to numbers I used this conversion chart:

It must be said that both Text Inspector and Write&Improve only claim to evaluate written English.
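
A conversion like this can be sketched in a few lines of Python. The numbers below are illustrative assumptions (whole numbers for A1 to C2, with a "+" adding half a level) and are not necessarily the values in my chart; the function name `cefr_to_number` is also just for illustration.

```python
# Hypothetical CEFR-to-number conversion. The values are illustrative
# assumptions, not the ones from the conversion chart used for the comparison.
BASE_LEVELS = {"A1": 1.0, "A2": 2.0, "B1": 3.0, "B2": 4.0, "C1": 5.0, "C2": 6.0}


def cefr_to_number(label: str) -> float:
    """Convert a label such as 'B1' or 'A2+' to a number ('+' adds half a level)."""
    label = label.strip().upper()
    plus = label.endswith("+")
    base = label.rstrip("+")
    if base not in BASE_LEVELS:
        raise ValueError(f"Unknown CEFR label: {label!r}")
    return BASE_LEVELS[base] + (0.5 if plus else 0.0)


if __name__ == "__main__":
    for level in ["A2", "A2+", "B1", "B2+"]:
        print(level, cefr_to_number(level))
```

Putting every tool's result on one numeric scale like this is what makes it possible to plot them together in a single chart.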

As there is a lot of data packed into the chart, I decided to ask ChatGPT to compare the accuracy of the four. Am I being cynical when I say, as expected, ChatGPT rated itself to be one of the most accurate?

The conversation with ChatGPT on the subject is far too long to include here, but if you are interested you can read a summary of it here.

This is the conclusion, which means I am reassured that ChatGPT is doing a 'good enough' job of rating and producing progressively more difficult versions.



Sunday, 29 December 2024

An experiment with a 5-step prompt with 5 types of GenAI

I decided that it must be possible to combine a routine sequence of prompts into one rather long prompt.

I wanted to get:

  1. A student transcript with the errors marked so they could try to see their errors
  2. A corrected version of the transcript
  3. An improved version at the student's level (A2, in this case)
  4. An improved version at the next level (B1, in this case)
  5. An improved version at two levels up (B2, in this case)

The secret was to include the five steps in the prompt but to instruct GenAI to wait for the prompt “Next” before moving on to the next step.

I then took a short recording made by a very good A2 student and used Rev.com to get the transcript. The idea is that the student has to copy the long prompt from a WhatsApp group, for example, and paste it into the chosen GenAI and then copy the transcript from Turboscribe.ai or Rev.com and paste it into the same GenAI.

I did this with these 5 types of GenAI:

  1. ChatGPT
  2. Gemini
  3. Claude
  4. Copilot
  5. Deepseek

I then copied the output from each GenAI for each step and pasted it into Pearson’s GSE Text Analyzer ( https://www.english.com/gse/teacher-toolkit/user/textanalyzer ), which gave me GSE and CEFR levels for all 25 versions of the transcript.

Apart from looking carefully at the English used in each version to try to understand what language had determined the levels given for them all, I also made an Excel spreadsheet with the two sets of levels. With these I was able to produce two graphs. The first one shows what happened with each type of GenAI:


At first glance Claude was the best at producing increasingly sophisticated versions of the student’s transcript, although they were always half a CEFR level too high. Mark you, the original transcript was already half a level higher as she was a very good student.

Equally obvious is the fact that Copilot was useless!

ChatGPT, Gemini and Deepseek failed to produce increasingly sophisticated versions across the four levels, so they didn’t do what I had intended.

In fact, as can be seen in this second chart, the whole idea didn’t work on average:


Once again, this may be a result of the student’s original transcript being higher than expected (A2+ rather than A2). Maybe the lesson to be learnt from this is that instead of using fixed levels for each step up, the different GenAIs may be able to produce a new version of the transcript one level higher on the CEFR scale. This would have the added advantage that the same long prompt could be useful for all students in a class and across all courses.

Here is the original version of the long prompt derived with help from Turboscribe and ChatGPT’s advice. (I hope the concept of including steps and a trigger word will be useful):

“You will be provided a transcript as well as instructions about what to do with that transcript. Unless otherwise specified, your response should be in the same language as the transcript.[1] Here are your instructions, which you must follow:

Instructions:

  1. I will provide a transcript. For each step, you should follow the instructions carefully.
  2. For each task, do not include timestamps unless specifically requested.
  3. After each step, wait for me to say "next" before proceeding. 

Steps:

  1. Mark the errors in the transcript in bold, but do not correct them.
  2. Correct the errors, marking the changes in bold and leaving the original errors in brackets.
  3. Improve the transcript for A2 level students, marking improvements in bold.
  4. Improve the transcript to a B1 level, marking changes in bold.
  5. Finally, enhance the transcript to a B2 level, marking improvements in bold. 

[Sample transcript of an A2 student’s recording:]

The history start in the summer of 2011. Anna went on holiday with some friends on island. The photo was taken on hill. Hill is a little mountain and called Ana. The photo is important for her because the stone is a mysterious for her. And she put your hands around it, the stone, and, and she was sleeping. And this photo there are in other places because in your mobile phone, computer, et cetera.”


[1] These two sentences came from Turboscribe’s Custom prompt
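
The idea of replacing fixed levels with relative ones could be sketched like this in Python. This is only a sketch under my own assumptions: it assumes the student's starting level is known, and the helper names `levels_from` and `build_prompt` are made up for illustration. The wording of the instructions and steps is taken from the prompt above.

```python
# Sketch: build the five-step prompt with levels worked out from the
# student's own level instead of the fixed A2/B1/B2 used above.
CEFR_ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]


def levels_from(start: str, steps_up: int = 2) -> list:
    """Return the starting level plus up to `steps_up` levels above it (capped at C2)."""
    i = CEFR_ORDER.index(start)
    return CEFR_ORDER[i : i + steps_up + 1]


def build_prompt(student_level: str, transcript: str) -> str:
    """Assemble the long prompt, ending with the student's transcript."""
    levels = levels_from(student_level)
    steps = [
        "Mark the errors in the transcript in bold, but do not correct them.",
        "Correct the errors, marking the changes in bold and leaving the original errors in brackets.",
        f"Improve the transcript for {levels[0]} level students, marking improvements in bold.",
    ]
    # One extra step for each level above the student's own level.
    steps += [f"Improve the transcript to {lvl} level, marking changes in bold." for lvl in levels[1:]]
    numbered = "\n".join(f"  {n}. {s}" for n, s in enumerate(steps, start=1))
    return (
        "Instructions:\n"
        "  1. I will provide a transcript. For each step, you should follow the instructions carefully.\n"
        "  2. For each task, do not include timestamps unless specifically requested.\n"
        '  3. After each step, wait for me to say "next" before proceeding.\n\n'
        f"Steps:\n{numbered}\n\n"
        f"Transcript:\n{transcript}"
    )


if __name__ == "__main__":
    print(build_prompt("A2", "The history start in the summer of 2011."))
```

A simpler alternative that needs no code at all would be to word the steps relatively ("one CEFR level higher than the original"), which is what would let the same prompt serve a whole class.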

Thursday, 12 December 2024

Warning – very long post! (Remixing) Artifacts with Claude

“Artifacts are content made with Claude like code snippets, text documents, graphics, diagrams, or website designs, that appear in a dedicated window alongside a chat with Claude.

This Artifact was made and published using Claude.ai, a next generation AI assistant made by Anthropic that’s built for work and trained to be safe, accurate, & secure.

You can also remix this Artifact with Claude using your own Claude.ai account to evolve and build on the ideas of the publisher.”

Apart from the fact that Claude can help you code a web app (Artifact) from scratch, anyone using the artifact can also remix it, not only the creator. So you can do it, too, if you have a free account with Claude!

For example, my original vocabulary training artifact didn’t allow the learner to choose to go from English to Catalan or vice versa, but it was just a matter of asking Claude to add that feature, and it was done. (See the prompts I used below.)

Here is the improved version:

  • Claude Artifact (Animals, Journeys, Verbs – English and Catalan)

I later decided that more people would be able to use the same artifact with translations to and from Spanish rather than Catalan. So, I remixed it. This is the one I shared on social media:

  • Claude Artifact (Animals, Journeys, Verbs – English and Spanish)

Next, I wondered how easy it would be to change the three categories and the lists of words and discovered that I could do that as well. Here are two more versions of the same artifact:

  • Claude Artifact (Travel and Tourism, Health and Fitness, Crime and Safety – English and Catalan)
  • Claude Artifact (Shopping and Consumerism, Social and Cultural Activities, Education and Learning – English and Catalan)

Here is a list of the prompts I used to remix the original artifact and Claude’s replies in italics:

·    Can you include a toggle between English to Catalan and Catalan to English?
I'll modify the artifact to include a toggle for learning direction between English to Catalan and Catalan to English.

·    Could I change the languages to be Spanish and English?
I'll modify the vocabulary data and labels to use Spanish and English instead of Catalan and English.

·    Can I change the vocabulary lists in this artifact?
Yes, you can modify the vocabulary lists in the artifact!

·    Can you remix the artifact with these three topics and vocabulary lists?
Travel and Tourism: tourism, sightseeing, destination, route, adventure, urban, jungle, mountain, trail, city, beach, wetlands, hotel, hostel, tent, exploring, hiking, cycling, running, flights, arrival, connection, Christ the Redeemer, museum, landmark, river, Amazon, Pantanal
Health and Fitness: cold, flu, virus, bacteria, infection, handwashing, soap, warm water, sneezing, fitness, gym, yoga, spinning, Pilates, swimming, tennis, basketball, fruit, vegetables, fish, healthy, fried food, fizzy drink, brain, puzzle, reading, learning, exercise, sleep, oxygen, muscle
Crime and Safety: robbery, theft, burglary, stealing, thief, burglar, pickpocket, grab, faint, disappear, escape, report, police, investigation, suspect, connection, safety, lock, stranger, attention, train, theatre, bridge, house, glasses, hat, blonde, brown
I'll modify the vocabulary Data object with the new vocabulary lists you've provided. I'll translate them to Catalan to maintain the bilingual learning approach of the original artifact.

·    Can you make a new version with this vocabulary? I'd like to be able to send links to the two versions.
Shopping and Consumerism: shopping, sale, gift, Secret Santa, delivery, jewellery, furniture, accessories, leather, clothing, vegan, organic, handmade, vintage, buy, sell, borrow, advertise, pay, fast fashion, marketplace
Social and Cultural Activities: art, gallery, studio, café, urban farm, artist, family, friends, socialising, meeting, performance, opera, singer, judge, audience, festival, concert, event
Education and Learning: school, term, course, schedule, reading, studying, practising, learning, revise, memory, thinking, lesson, assignment, classroom, holiday, start, complete
I'll create a new version of the Vocabulary Trainer with the new vocabulary lists you've provided.
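
To give an idea of what is being changed in these remixes, here is a minimal Python sketch of a vocabulary list organised by category, with a switch for the learning direction. It is not the artifact's actual code, which I have not reproduced here; the names `VOCABULARY` and `make_cards` and the handful of sample words are my own assumptions for illustration.

```python
# Hypothetical data in the spirit of the trainer:
# category -> list of (English, Catalan) pairs.
VOCABULARY = {
    "Animals": [("dog", "gos"), ("cat", "gat"), ("horse", "cavall")],
    "Verbs": [("to eat", "menjar"), ("to sleep", "dormir")],
}


def make_cards(category: str, english_first: bool = True) -> list:
    """Return (prompt, answer) pairs; the pairs are flipped when english_first is False."""
    pairs = VOCABULARY[category]
    return [(en, ca) if english_first else (ca, en) for en, ca in pairs]


if __name__ == "__main__":
    # English to Catalan, then the reverse direction.
    print(make_cards("Animals"))
    print(make_cards("Animals", english_first=False))
```

Seen this way, changing the languages or the topics only means swapping the data, and the toggle only changes which side of each pair is shown first.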

 

Please experiment with the artifacts I have shared here and make changes to the language your students might want, and change the categories and vocabulary lists to suit your classes. You could even try remixing the artifact to increase or decrease the number of categories!

Remember that you have to Publish your new artifacts to be able to share them with your students, who don’t need to be 18 or have an account with Claude.

The first time I remixed the original artifact, Claude offered these alternatives:

I'll help you explore variations of this vocabulary trainer. First, let me reproduce the artifact to ensure we're working with it properly.

Here are three interesting ways we could remix this vocabulary trainer:

1. Spaced Repetition System (SRS)

We could enhance the learning algorithm by implementing a proper spaced repetition system. Instead of just reviewing unknown words once, we could:

- Add confidence levels (1-5) instead of just Yes/No

- Track when each word was last reviewed

- Calculate the next review date based on performance

- Show daily review queues

- Add a progress streak system

 

2. Multi-directional Learning Mode

We could add flexibility in how users practice:

- Toggle between English→Catalan and Catalan→English

- Add multiple-choice mode where users select from 4 options

- Include a typing mode where users must type the translation

- Add audio pronunciation (text-to-speech)

- Include example sentences for context

 

3. Gamification and Social Features

We could make it more engaging by adding:

- Points system based on correct answers and speed

- Daily challenges with specific word sets

- Achievement badges for milestones (100 words learned, 7-day streak, etc.)

- Leaderboards for different categories

- Option to create and share custom vocabulary lists

- Study groups where friends can track each other's progress

 

Would you like me to implement any of these variations? I can focus on the specific aspects you find most interesting.

I only took up the "Toggle between English→Catalan and Catalan→English" option. You might like to try some of the others!
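
For anyone curious about the first suggestion, the core of a spaced repetition system can be sketched in a few lines of Python. This is a simplified illustration, not what Claude would necessarily build: the review intervals for each confidence level are values I have assumed, and the names `WordCard`, `review` and `due_today` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Assumed gap in days before the next review for confidence levels 1-5
# (illustrative values only).
INTERVAL_DAYS = {1: 1, 2: 2, 3: 4, 4: 7, 5: 14}


@dataclass
class WordCard:
    word: str
    last_reviewed: Optional[date] = None
    next_review: Optional[date] = None


def review(card: WordCard, confidence: int, today: date) -> WordCard:
    """Record a review and schedule the next one based on the learner's confidence."""
    if confidence not in INTERVAL_DAYS:
        raise ValueError("confidence must be between 1 and 5")
    card.last_reviewed = today
    card.next_review = today + timedelta(days=INTERVAL_DAYS[confidence])
    return card


def due_today(cards: list, today: date) -> list:
    """Return the cards never reviewed or whose next review date has arrived."""
    return [c for c in cards if c.next_review is None or c.next_review <= today]


if __name__ == "__main__":
    card = review(WordCard("gos"), confidence=3, today=date(2024, 12, 12))
    print(card.next_review)
```

The less confident the learner is about a word, the sooner it comes back in the daily review queue.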

Just a footnote about a mistake I made. I unpublished an artifact and failed to read the following warning.

Warning:
Once Artifact is unpublished, it cannot be republished.
A new Artifact must be created.

 

 

Monday, 2 December 2024

Elevenlabs now allows students to make podcasts based on things they have spoken or written

Students can import their work by cutting and pasting text, sharing a link to a recording, importing a file or scanning a document.



This is only possible using the apps for Android or iOS on a phone or tablet. Get them here

I've tried it three times and this is what I found out about the length, speed and sophistication of the three podcasts:

 

| Type | Shortened Title | Number of words | Length | Speed in words per minute | GSE Level | CEFR Level |
|---|---|---|---|---|---|---|
| imported *.txt file | Titanic | 964 | 6’ 6” | 158 | 65-69 | B2+ |
| link to same Turboscribe recording | Chocolate 1 | 1328 | 8’ 9” | 163 | 62-66 | B2 |
| link to same Turboscribe recording | Chocolate 2 | 1091 | 7’ 47” | 140 | 61-65 | B2 |

Click on the Shortened Titles to see screen recordings of what the student would see.

The speed of the podcast can be reduced to 80% although at an average of 154 wpm this may not be as necessary as I feel it is with NotebookLM’s podcasts, which average 183 wpm and don't show a transcript.
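
For anyone who wants to check the speeds, the calculation is simply the number of words divided by the length in minutes. Here is a small Python sketch using the word counts and lengths from the table above; the function name `words_per_minute` is my own.

```python
def words_per_minute(words: int, minutes: int, seconds: int) -> float:
    """Speaking speed: word count divided by the length expressed in minutes."""
    total_minutes = minutes + seconds / 60
    if total_minutes <= 0:
        raise ValueError("length must be greater than zero")
    return words / total_minutes


# (words, minutes, seconds) for the three Elevenlabs podcasts in the table.
PODCASTS = {
    "Titanic": (964, 6, 6),
    "Chocolate 1": (1328, 8, 9),
    "Chocolate 2": (1091, 7, 47),
}

speeds = {title: round(words_per_minute(*data)) for title, data in PODCASTS.items()}
average = round(sum(speeds.values()) / len(speeds))

if __name__ == "__main__":
    print(speeds)
    print("average:", average)
```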

As you will have seen at the end of two of the screen recordings, clicking on ‘Share’ throws you out of the app. This is why I had to resort to using screen recordings as I could find no other way to share the podcasts.

The level of the language in the podcast about the Titanic is perhaps too high for a B1 student, and the fact that all three podcasts bring in a lot of new information makes them less useful than podcasts from NotebookLM and Wondercraft. I imagine that for B2 students these podcasts would provide useful further exposure to comprehensible input on a subject they have already spoken or written about.

I haven’t yet tried cutting and pasting text or scanning a document (handwritten doesn't work) as the source of these podcasts, but will try to do so in the next few days.

Elevenlabs offers voices in 22 languages other than English, including French, German, Italian and Spanish so it should work for modern language learners. On the other hand, it is only available for users over 18.

Friday, 29 November 2024

Seven podcasts based on a B1 student’s recording about a Spanish Couple on the Titanic

 

| Podcast (click name to listen) | Level GSE | Level CEFR | Words | Length | Words per minute |
|---|---|---|---|---|---|
| NotebookLM | 57-61 | B2 | 823 | 4:27 | 185 |
| Lettercast | 62-66 | B2 | 309 | 2:06 | 147 |
| Upod.ai | 60-64 | B2 | 333 | 1:56 | 172 |
| Wondercraft | 59-63 | B2 | 246 | 1:26 | 172 |
| Elevenlabs | 65-69 | B2+ | 964 | 6:06 | 158 |
| Monica.im | 62-66 | B2 | 575 | 3:52 | 149 |
| Inpodcast.ai | 64-68 | B2 | 1489 | 9:22 | 159 |

Remember that students can slow the podcasts down to 75% of the original speed.
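
To see what that means in practice, the slowed-down speed is just the original words per minute multiplied by the playback rate. A short Python sketch, using the speeds from the table above (the name `slowed_wpm` is my own, and it assumes slowing the audio changes nothing but the pace):

```python
# Words per minute at normal speed, taken from the table above.
SPEEDS = {
    "NotebookLM": 185,
    "Lettercast": 147,
    "Upod.ai": 172,
    "Wondercraft": 172,
    "Elevenlabs": 158,
    "Monica.im": 149,
    "Inpodcast.ai": 159,
}


def slowed_wpm(wpm: float, rate: float = 0.75) -> float:
    """Effective words per minute when the audio is played at `rate` times normal speed."""
    if not 0 < rate <= 1:
        raise ValueError("rate should be between 0 and 1 for slowing down")
    return wpm * rate


at_75_percent = {name: round(slowed_wpm(wpm)) for name, wpm in SPEEDS.items()}

if __name__ == "__main__":
    for name, wpm in at_75_percent.items():
        print(name, wpm)
```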

Which do you like best?

Which do you think would be the best one to listen to and read for the B1 student who told the story?

The links to the podcasts will take you to the transcripts with the recordings on Turboscribe.ai. You won't need to log in to listen to them, but you will need to move the page down to get the transcript and the recording synchronised.

Transcript of spoken conversation with Copilot fed into Claude to get a podcast script to use in Wondercraft to produce a podcast

This is really part 2 of a post about a wonderful conversation I had with Copilot using voice. You can read part 1, which ended with me getting a podcast produced by Google's NotebookLM, here.

I wanted to experiment as well with Wondercraft, which offers various ways to create a podcast. One way is to provide it with a script and you can then choose from different voices and add music.

I thought Claude would be a good way to get a script for a podcast based on my conversation with Copilot and that proved to be true. I uploaded a Word doc of my conversation and said, "Can you make this into a podcast?" The script it produced was for one speaker and it suggested when to have pauses and when to insert music at the beginning and end.

I gave the script to Wondercraft and chose a woman's voice, Abby's, for the podcast and some music for the intro and outro. I exported the .wav file and uploaded it to Rev.com to get a transcript that would be synchronised with the recording. You can see the results here as I made a screen recording of it playing on my iPad (4 minutes):


Here is the screen recording of the podcast produced by NotebookLM (9 minutes):



While both podcasts give a good account of my conversation with Copilot about a presentation I'm planning to give, my real interest is in how students might benefit from creating such podcasts based on their recorded speaking or their writing.


Friday, 22 November 2024

Transcript of spoken conversation with Copilot fed into NotebookLM

It all started off with a wonderful conversation with Copilot using voice. I wanted to try out an idea I had heard about in a webinar (GenAI tools for independent learning) yesterday of using GenAI to help you polish your presentation.

It didn’t quite work out as I had imagined, and I only wish I had recorded the conversation on Rev.com on my phone, which I had beside me. It turned into a to-and-fro conversation about what I planned to talk about in the presentation. It was actually constructive and I clarified some doubts I had about what to include and how to handle the fast pace of development in speech to text and GenAI in general.

You can’t share conversations on Copilot, believe it or not, so I copied and pasted the transcript of the conversation into Google’s NotebookLM to generate an 8-minute podcast.

Of course, I downloaded the podcast from NotebookLM and uploaded it to Rev.com to get a synchronised transcript.

I then used the screen recorder on my Android phone to record how that looks.



You may not be that interested in the content, but I hope you can see how the same ideas can be used to help learners of English who are 18 and over get content to enhance their language development.

  1. They can have a voice conversation with Copilot and record it with either a screen recorder or another device. 
  2. They can upload the recording or the transcript of the conversation to NotebookLM.
  3. They can generate a podcast based on the conversation.
  4. They can download the audio file of the podcast and upload it to Rev.com.
  5. They can listen to the podcast at a slightly slower speed while following the synchronised transcript on Rev.com.