Voice Recognition Software Is Ineffective For Children In The Classroom

Popular voice assistants like Alexa, Siri, and Google use speech recognition software to answer your questions, but that software struggles with children's speech, which is complex and unpredictable.

Children over-enunciate sentences, elongate certain syllables, punctuate each word as they think aloud, and skip other words entirely. They also spend up to five hours every day in front of devices, so the technology must be able to understand them.

Before the pandemic, children made up more than 40% of new internet users. Current estimates suggest that children's screen time has since increased by 60% or more, with children aged 12 and under spending up to five hours per day on screens (with all of the associated benefits and perils).

Although it's tempting to admire digital natives' technological prowess, educators (and parents) are well aware that young "remote learners" sometimes struggle to manage the keyboards, menus, and interfaces required to fulfill the promise of educational technology.

In this context, voice AI and digital assistants in children's education offer the prospect of a more frictionless relationship with technology. However, while children like asking Alexa or Siri to beatbox, tell jokes, or make animal sounds, parents and teachers know that these systems struggle to understand children as soon as they depart from typical requests.

Have You Read Anything?

The problem is that the speech recognition software that powers popular voice assistants like Alexa, Siri, and Google was never built for use with children, whose voices, vocabulary, and behavior are significantly more complex than adults'.

Not only are children's voices squeakier, but their vocal tracts are thinner and shorter, their vocal folds are smaller, and their larynx is not yet fully developed. This produces speech patterns substantially different from those of an older child or an adult.

As the graph below makes clear, merely shifting the pitch of adult voices to train speech recognition fails to replicate the richness of information needed to understand a child's speech. Language structures and patterns vary widely among children. They make leaps in syntax, pronunciation, and grammar that the natural language processing component of a speech recognition system must account for. Inter-speaker variability among children at different developmental stages adds a further layer of complexity that adult speech data cannot capture.

Children's speaking behavior is not only more variable than that of adults but also wildly erratic. They over-enunciate sentences, elongate certain syllables, punctuate each word as they think aloud, and skip other words entirely. Their speech does not follow the familiar cadences that systems designed for adults expect. As adults, we have learned how to interact with these technologies as effectively as possible. We straighten up, formulate the request in our heads, adjust it based on learned behavior, take a deep breath, and say our request aloud: "Alexa …" Kids, by contrast, blurt out unfiltered requests as if Siri or Alexa were human, and they almost always get an incorrect or canned response.

In an educational setting, speech recognition must deal not only with ambient noise and the unpredictability of the classroom but also with changes in a child's speech over the course of the year and the diversity of accents and dialects in a typical primary school. The physical, language, and behavioral differences between children and adults grow dramatically the younger the child is. That means the youngest learners, who stand to benefit the most from speech recognition, are the hardest to build for.

Accounting for and understanding the highly varied quirks of children's language requires speech recognition systems that are purposefully built to learn from the ways children speak. Children's speech is fundamentally and practically different from adult speech, and it keeps changing as children develop physically and in their language skills.

Unlike in other consumer contexts, accuracy has profound implications for children. A system that tells a child they're wrong when they're right (a false negative) hurts their confidence; one that tells them they're right when they're wrong (a false positive) risks socioemotional (and psychometric) harm. In entertainment settings, such as apps, gaming, robotics, and smart toys, these false negatives and false positives lead to frustrating experiences. In schools, errors, misunderstandings, and canned responses can have far more severe educational and equity consequences.
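To make the two error types concrete, here is a minimal Python sketch of how they might be counted for a reading exercise. Everything in it is an assumption for illustration: the `Attempt` record, the `error_rates` helper, and the sample data are hypothetical and do not come from any particular speech recognition product. It also assumes a human annotator has already labeled whether each attempt was really correct.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One reading attempt (hypothetical record, for illustration only)."""
    child_was_correct: bool    # ground truth, assumed to come from a human annotator
    system_said_correct: bool  # verdict returned by the speech recognition system


def error_rates(attempts):
    """Return (false_negative_rate, false_positive_rate).

    False negative: the child was right, but the system said wrong.
    False positive: the child was wrong, but the system said right.
    """
    right = [a for a in attempts if a.child_was_correct]
    wrong = [a for a in attempts if not a.child_was_correct]
    false_negatives = sum(1 for a in right if not a.system_said_correct)
    false_positives = sum(1 for a in wrong if a.system_said_correct)
    # Guard against empty groups so the helper never divides by zero.
    fnr = false_negatives / len(right) if right else 0.0
    fpr = false_positives / len(wrong) if wrong else 0.0
    return fnr, fpr


# Made-up sample data: four correct attempts and two incorrect ones.
attempts = [
    Attempt(True, True),
    Attempt(True, False),   # false negative
    Attempt(True, True),
    Attempt(True, True),
    Attempt(False, True),   # false positive
    Attempt(False, False),
]
fnr, fpr = error_rates(attempts)
print(f"false negative rate: {fnr:.2f}")  # 1 of 4 correct attempts rejected
print(f"false positive rate: {fpr:.2f}")  # 1 of 2 incorrect attempts accepted
```

Reporting the two rates separately, rather than as a single accuracy figure, keeps the two kinds of harm described above visible: a system could look accurate overall while still rejecting a large share of children's correct answers.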

Speech recognition technology has the potential to improve classroom fairness. Human assessment of reading is, after all, highly subjective; recent research found that assessor bias can cause differences of up to 18%. Today's child-centered, high-accuracy speech recognition overcomes human bias by ensuring that every child's voice is understood regardless of accent or dialect.

-- Abdul Alim - 2022-05-22
