ScummVM could easily do something like this for their Windows (10) binary... it didn't require any changes to the game.
I propose that it be on by default, along with undithering. j/k
I build a grammar out of the words in the game, but unfortunately Unity's API forces it to be loaded from an XML file. That means the same set of words is active everywhere. If I used the Windows APIs directly, I think I could build a grammar at runtime, which would make it easier to tailor the words to the specific room in question (and make the recognizer more accurate).
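To make the per-room idea concrete, here's a rough sketch in Python rather than the actual Unity/C# code. It assumes the XML grammar is in the W3C SRGS format, and the verb list, room names, and the `build_room_grammar` helper are all made up for illustration. It only shows the shape of it: a small global verb list plus whatever nouns exist in the current room.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical vocabulary: a real game would extract these from its resources.
GLOBAL_VERBS = ["look", "open", "take"]
ROOM_NOUNS = {
    "kitchen": ["door", "table", "knife"],
    "garden": ["gate", "flower"],
}


def build_room_grammar(room: str) -> str:
    """Return an SRGS XML grammar accepting '<verb> <noun>' for one room."""
    # Emit SRGS as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", SRGS_NS)

    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {"version": "1.0", f"{{{XML_NS}}}lang": "en-US", "root": "command"},
    )
    rule = ET.SubElement(
        grammar, f"{{{SRGS_NS}}}rule", {"id": "command", "scope": "public"}
    )

    # Two alternatives lists in sequence: any global verb, then any room noun.
    for words in (GLOBAL_VERBS, ROOM_NOUNS[room]):
        one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
        for word in words:
            item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
            item.text = word

    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_room_grammar("kitchen"))
```

With something like this, the recognizer in the kitchen would only ever be choosing among kitchen words, so a noun from another room can't be misheard there. Whether the grammar is handed over as a file or built through a runtime API is a separate question; this only covers generating the word list per room.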