How accurate are vowel counts anyway?

by Derek LeBlond

Version 2 - Create Date: 2022-04-29, Last Update: 2022-08-26

Words like colonel fail immediately, yet the vowel-counting rule persists. It persists so strongly that vowels are the primary mechanism for syllable counts in almost any natural language processing system. And yet words like lazy or dune get incorrect counts under this paradigm.

The immediate reason for its use is speed. Checking vowels is far faster than looking each term up in a dictionary. What is the tradeoff in accuracy, though?

That is what I set out to find here.

The initial goal was an analysis of the 624 lexical words from the Essential Word List in Machine Readable Wordlists on GitHub. The Essential Word List was first published in: Dang, T. N. Y., & Webb, S. (2016). Making an essential word list for beginners. In I. S. P. Nation, Making and Using Word Lists for Language Learning and Testing (pp. 153-167, 188-195). Amsterdam: John Benjamins.

Each word got two checks:

  • How many vowels appear in the word?

  • What is the syllable count of the primary Merriam-Webster pronunciation?
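
The two checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say exactly how vowels were counted, so the sketch assumes each letter a, e, i, o, or u counts once and that y is not a vowel. The `REFERENCE_SYLLABLES` table is a hypothetical, hand-entered stand-in for a dictionary lookup, not data pulled from Merriam-Webster.

```python
# Assumption: vowels are the letters a, e, i, o, u; 'y' is not counted.
VOWELS = set("aeiou")

def count_vowels(word: str) -> int:
    """Check 1: count vowel letters in a word, case-insensitively."""
    return sum(1 for ch in word.lower() if ch in VOWELS)

# Hypothetical stand-in for check 2 (a dictionary lookup).
# Syllable counts here are hand-entered for illustration only.
REFERENCE_SYLLABLES = {"colonel": 2, "lazy": 2, "dune": 1, "cat": 1}

def compare(word: str) -> tuple[int, int, int]:
    """Return (vowel count, reference syllables, difference) for one word."""
    vowels = count_vowels(word)
    syllables = REFERENCE_SYLLABLES[word]
    return vowels, syllables, vowels - syllables

for word in REFERENCE_SYLLABLES:
    vowels, syllables, diff = compare(word)
    print(f"{word}: vowels={vowels}, syllables={syllables}, difference={diff}")
```

A difference of zero means the vowel count agrees with the reference. Under these assumptions, colonel, lazy, and dune all give a nonzero difference, which matches the exceptions named at the top of the article.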

No difference was found between the vowel counts and the pronunciation syllable counts.

    No. Difference.

The presumption here is that if the words are in common usage, this method is accurate and precise.

What if we go for a bigger word set?

This analysis covered 2,284 words from the General Service List in Machine Readable Wordlists on GitHub. Per Machine Readable Wordlists on GitHub, this is “A list of vocabulary families reflecting the 2,000 most frequent words in English and representing an average of ‘around 82 per cent coverage’ of various types of texts (Nation & Waring, 1997, p.15). Used as the basis for many graded readers and other ESL/EFL materials.”

Same checks… Same result! No difference was found between vowel counts and pronunciation.

One more time, with fancier words.

This time, I went after the 91-word Academic Word List by Averil Coxhead, Professor of Applied Linguistics and TESOL, School of Linguistics and Applied Language Studies, New Zealand. Per Machine Readable Wordlists on GitHub, this is a list of words not in the General Service List, “…but that have wide range in academic texts, across disciplines (based on corpus research in arts, commerce, law, and science). Further divided into 10 sublists that reflect frequency and range.”

Still no difference between vowel counts and pronunciation.

In conclusion, save yourself the time. The vowel method for syllable counts is precise and accurate for 2,375 words in common (and even academic) usage. While this may represent only about 1% of the more than 250,000 words in the English language, usage is a key consideration. Your likelihood of encountering a word outside the paradigm dictates the risk of a less precise or accurate model. Given this analysis, more than 80% of a text has precise and accurate vowel coverage. In short, while their discrepancies are real, colonel, lazy, and dune are not as common in usage as you may think. They should, therefore, not drive decision making for syllable count models, especially at the cost of speed.
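
For readers who want to check the arithmetic, here is a short sketch. It assumes the 2,375 total is the General Service List count plus the Academic Word List count, which is what the numbers add up to. The article does not state this outright. The 250,000 figure is the article's "over 250,000" used as a round lower bound.

```python
# Word counts as reported in the article.
GSL_WORDS = 2_284        # General Service List
AWL_WORDS = 91           # Academic Word List
ENGLISH_WORDS = 250_000  # "over 250,000" in the article; treated as a lower bound

# Assumption: the article's total is the sum of these two lists.
analyzed = GSL_WORDS + AWL_WORDS
share = analyzed / ENGLISH_WORDS

print(f"{analyzed:,} words analyzed, {share:.2%} of {ENGLISH_WORDS:,}")
```

The share comes out just under 1%, in line with the figure quoted above.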
