I had an audio clip (32 minutes 21 seconds long) that I ran through both Google Cloud’s Speech-to-Text model and Google Cloud’s Video-to-Text model. Here are the results; accuracy is computed as 1 - (errors / total words).
Total # of words: 3,956
Total # of characters: 20,970
Total # of pages: 7
Speech Model
Total # of errors: 102
Accuracy: 97.42%
Video Model
Total # of errors: 9
Accuracy: 99.77%
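Both accuracy figures match the share of words not counted as errors, out of the 3,956 total. The short sketch below reproduces them. The helper name `word_accuracy` is hypothetical, and it assumes errors are counted per word, as the numbers above imply.

```python
# Hypothetical helper: word-level accuracy, i.e. the share of words
# not counted as errors, expressed as a percentage.
def word_accuracy(total_words: int, errors: int) -> float:
    """Return accuracy in percent, rounded to two decimals."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= errors <= total_words:
        raise ValueError("errors must be between 0 and total_words")
    return round(100 * (1 - errors / total_words), 2)


TOTAL_WORDS = 3956  # total word count from the results above

for name, errors in [("Speech model", 102), ("Video model", 9)]:
    print(f"{name}: {word_accuracy(TOTAL_WORDS, errors):.2f}%")
```

Running it prints 97.42% for the speech model and 99.77% for the video model, the same values reported above.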