Sunday, 12 January 2025

A Comparison of how ChatGPT, GSE Text Analyzer, Text Inspector and Write&Improve evaluated a series of texts

My ideas about how students can combine transcripts produced by tools like Turboscribe.ai and Rev.com with ChatGPT to get suggestions for upgrading their language depend on ChatGPT doing two things well: grading their original transcript correctly, and then producing progressively more complex versions of it for them to read, listen to, study and take notes on. They would then repeat the speaking task with a different listener, hopefully benefitting from the exposure to the emergent language.
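For anyone who would rather script that step than paste transcripts into the ChatGPT website, here is a minimal sketch using the OpenAI Python library. The prompt wording, the function name grade_and_upgrade and the gpt-4o model choice are illustrative assumptions on my part, not part of the workflow described above.

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

def grade_and_upgrade(transcript: str, target_level: str) -> str:
    """Ask the model to estimate the CEFR level of a transcript and
    to rewrite it at a higher level while keeping the speaker's ideas."""
    prompt = (
        "Here is a transcript of a learner speaking English.\n"
        "1. Estimate its CEFR level (A1-C2).\n"
        f"2. Rewrite it at {target_level} level, keeping the speaker's ideas "
        "but using more complex vocabulary and grammar.\n\n"
        f"Transcript:\n{transcript}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Example: ask for a B2 version of a B1 student's transcript
# print(grade_and_upgrade(open("transcript.txt").read(), "B2"))
```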

So, I have tried examining how ChatGPT's grading compares with that of other tools that claim to do the same. The chart above is based on only one B1 student's original transcript. There are vast differences between the four tools.

To convert the CEFR scale to numbers, I used this conversion chart:
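The exact scores vary from chart to chart, but the idea is simply to give each CEFR band a number so that the four tools' ratings can be plotted on the same scale. A rough sketch in Python follows; the values in it are illustrative assumptions, not necessarily the ones in my chart.

```python
# Illustrative mapping from CEFR bands to numbers so ratings can be charted.
# The values are assumptions; the half-points stand for the "+" sub-bands
# (e.g. B1+) that some of the tools report.
CEFR_TO_NUMBER = {
    "A1": 1, "A2": 2, "A2+": 2.5,
    "B1": 3, "B1+": 3.5,
    "B2": 4, "B2+": 4.5,
    "C1": 5, "C2": 6,
}

def cefr_to_number(level: str) -> float:
    """Convert a CEFR band label such as 'B1+' to its numeric score."""
    return CEFR_TO_NUMBER[level.strip().upper()]

# Example: cefr_to_number("B1+") returns 3.5
```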

It must be said that both Text Inspector and Write&Improve claim to evaluate only written English.

As there is a lot of data packed into the chart, I decided to ask ChatGPT to compare the accuracy of the four tools. Am I being cynical when I say that, as expected, ChatGPT rated itself as one of the most accurate?

The conversation with ChatGPT on the subject is far too long to include here, but if you are interested you can read a summary of it here.

This is the conclusion, which reassures me that ChatGPT is doing a 'good enough' job of rating transcripts and producing progressively more difficult versions.


