Tuesday, February 10, 2009

Why Speech Recognition Is No Longer Sufficient

Speech recognition has been around for over 30 years and has been part of our consciousness since the mid-1960s, but it is only in the last 3-4 years that we have seen the technology really start to deliver some value to the much-beleaguered and overworked clinician. There are innumerable studies that demonstrate the savings linked to the efficiencies possible with faster report turnaround. Unfortunately, producing more reports faster is not always the best answer, and oftentimes it simply makes the patient information haystack larger. This tsunami of data is overwhelming even the best-organized clinicians, and many are struggling to keep up with it alongside the explosion of diagnostic and treatment choices. Keeping up with medical knowledge would be a full-time job if anyone had the time, but they don’t.

Clinicians want to give great care - that's a universal maxim for the profession, and anything that enables or facilitates this will be successful. But that's not what has been going on with speech recognition, which has required clinicians to change their behavior: enunciate in special ways, dictate commands, speak slowly and add punctuation. In the ultimate punishment, it requires the highly skilled and time-pressured expert to review and correct poorly drafted content. The output is a blob of text that the electronic medical record (EMR) cannot read or interpret, since it is not machine readable.

The last innovation in speech recognition came in 1993, when continuous speech recognition was rolled out. Since then the technology has stagnated, and while allowing clinicians to type with their tongue has provided some efficiencies and improvements, speech recognition has failed to address the underlying challenges facing clinicians today. So now that we have reached this point, what’s next?

The real goal is the capture of structured clinical data that can automatically feed the EMR. Achieving this requires an alternative approach to speech recognition: not just recognizing the words but actually understanding the meaning and context. Comprehending normal human speech is not a word recognition process but a speech understanding process. Its input is not just the phonemes or parts of words but the complete context of a conversation, including the intonation, the subject matter and relevant prior information, all of which is applied to the conversation as a whole. It is this process that enables humans to exhibit the “cocktail party effect”, which allows us to follow one conversation in a noisy room while still picking up on others in which we are not fully participating. The added knowledge allows us to infer missed words, and understanding the content allows us to complete the picture, producing a fully understood interpretation of the speech. Speech understanding is the next frontier of innovation in clinical documentation.

This content can be stored as part of the full story - the Healthstory - that contains the computer-interpretable data AND the fine detail in the narrative. That narrative is the essence of clinical insight and judgment, and it is essential to the transmission and flow of useful clinical information between all the team members delivering care in our multidisciplinary model.
