Every year, the consulting firm Gartner publishes an assessment of how far emerging technologies have progressed towards delivering real benefits. The report is designed to help strategists and planners assess the maturity, business benefits and future direction of new technologies. Gartner uses a model it calls the Hype Cycle to plot the maturity, adoption and social application of specific technologies.
Over the years, speech recognition (SR) technology has steadily climbed the Slope of Enlightenment, the part of the Hype Cycle that leads to Gartner's Plateau of Productivity. Once a technology reaches the Plateau, it starts to make a broad impact on business.
This year, as shown in Gartner's 2013 Hype Cycle Special Report (visit www.gartner.com/technology/research/hype-cycles), SR reached the Plateau. So, if SR really is that good now, what does this progress mean for law firms?
Speech technology is application driven
Since Apple incorporated Siri into its iOS devices in 2011, public adoption of SR has been strongly influenced by the SR features rolled out by Apple and its competitors. There is, however, growing interest in other SR applications, including business and personal productivity.
Speech technology is already used in numerous areas: for example, creating legal transcripts, taking a patient's medical history, talking to a call centre, commanding a car and sending a text message on a smartphone.
In each case, the underlying principles are similar, but the technology is tailored to the application and therefore varies. Systems differ in how much language they need to understand, and in the range of speakers and audio environments they must handle.
For instance, mobile applications are speaker-independent: unlike speaker-dependent applications, they are not tailored to each user's voice. SR is also computationally intensive, so it requires significant processing and electrical power.
Mobile devices, with their limited computing, memory and battery resources, rely on servers running in the cloud for the heavy lifting. This limits how much speech can be transcribed at once and, because mobile internet connections are inherently unreliable, means that users may experience SR outages.
SR interfaces are not yet ubiquitous, nor do they work the same way in every situation. Implementing the technology appropriately in different environments requires some understanding and expertise.
Legal productivity advances
The legal fraternity is already seeing productivity gains from SR. Sending text messages and emails quickly by voice on smartphones are two examples.
For core business workflows, speaker-dependent SR has been delivering a solid return on investment for 10 years. As Gartner recognises, these returns have coincided with new versions of Dragon NaturallySpeaking and continuing improvements in computer hardware.
Dragon desktop software runs on the PC and delivers the best accuracy because it learns how each user speaks and what they write. It does not require an internet connection for transcription, and users can talk as fast and for as long as they like.
The emergence of two dictators
It is also increasingly apparent in the speech solutions industry that there are two kinds of legal dictators.
The first group dictates into digital devices, and support staff then process the dictated files.
For these people, traditional workflow solutions relieve the dictator of responsibility for transcribing and formatting the text. They see themselves as most productive when they are speaking, not typing.
The second group of legal dictators tends to be younger. They have grown up using keyboards and work directly on their computers to produce documents. They are most content and productive when they can dictate straight into the document they are working on, see the results on screen, and finalise content and formatting in one session.
The first group argues that its approach is more efficient and productive because its members do not spend time entering and formatting documents. The second group argues that it does not need expensive support staff and that documents are prepared immediately.
However, neither group is entirely right, because it is human factors that ultimately determine the adoption rate and effectiveness of any technical system. The key is to match the technology to the person.
SR on both sides
In technology terms, the first group is likely to be content with solutions such as those supplied by BigHand, Winscribe, Philips and Olympus. These systems manage the flow of documents through the workflow, so the document's author needs to be involved only at the beginning and the end.
Speech recognition can easily be integrated into these environments to transcribe speech, either on a server or from a digital recorder, without involving the speaker.
The second group prefers to create its own documents. For these people, Windows or Macintosh systems equipped with SR software provide the fastest and most accurate way of entering text into the computer.
Both approaches can be supported on the same technology platform without compromise on either side, regardless of the mix of dictators within the firm.
The desktop software can be installed on PCs and integrated with enterprise document management systems. Most people can speak at least three times faster than they can type, so converting speech directly to text is far more productive than typing.
Speech recognition is ideal for document creation, whether at the front end, with Dragon running as a supporting application, or at the back end, where a combination of machine and human transcription converts speech to text.
Derek Austin is Interactive Technologies Manager for Nuance Communications.