Athena: voice-computer interaction

Text and binary ARPA models with IRSTLM (2015-06-26)

Intro:

In an earlier blog post I wrote a little about IRSTLM and how it was useful in analyzing language models.

Recently I ran into a problem with one of the IRSTLM tools, 'compile-lm', so I am noting it to my own file. The problem concerned the option that converts a regular text iarpa version to a full arpa version, and whether the output should be in text or binary format.

The issue is discussed on other pages on the web, where I picked up the information that compile-lm can be run with the --text option set to 'yes'. A suggested command might therefore be:

compile-lm --text yes path_to_source path_to_output

However, on my system (Linux openSUSE) this gives the warning 'WARNING: No value specified for parameter "text"', prints the help info, and produces no output. After some experimenting I found that the required command is:

compile-lm --text=yes path_to_source pat...
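
The only difference between the two invocations is whether the option and its value are joined by an equals sign. As a minimal sketch, a script that calls the tool could build the argument list in the working form as shown here. The helper name `compile_lm_args` is hypothetical, the paths are the placeholders used above, and the sketch only builds the command; it does not run compile-lm.

```python
def compile_lm_args(source, output):
    """Build an argument list for compile-lm using the --text=yes form."""
    # The option and its value form ONE argument joined by '='.
    # Passing them as two arguments ("--text", "yes") is the form that
    # produced the 'No value specified' warning described above.
    return ["compile-lm", "--text=yes", source, output]


# Placeholder paths, as in the commands above.
print(" ".join(compile_lm_args("path_to_source", "path_to_output")))
```

The resulting list could be handed to `subprocess.run` on a system where compile-lm is installed.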

Keywords: arpa, iarpa, compile, irstlm, text, binary


About (2014-12-06)

Intro: This blog is the work of Colin Beckingham who can be contacted at ...

Keywords: beckingham colbecstartca beforehand - contrasts opensuse julius htk arecord aplay mplayer ffmpeg audacity simon-listens linuxcom developerworks recognizer voxforge responses/comments tastefulness about+ in inconsistencies apologize the adverbs the000
Dictionary files with Julius: Grammar versus Language Model (2014-08-15)

Intro: It's an exciting process working through the Voxforge how-to to get your first speech recognition models working. And when you are done you have the means to build tools that are cost-effective and helpful. It is probably wise for Voxforge to take the "Grammar" route to start you off; the results are fixed by the grammar, so you get positive, accurate feedback....

Keywords: voxforge how-to successes julius predictable senses recognizer mkdfapl sampledict automaton complains cannot ltsgt lt/sgt dict workaround irstlm float8 eliminated brackets resembles sphinx adapt accompany dic spreadsheet delimiter /usr/bin/python -u -- utf-8 lensysargv = exitwrong args sysargv[1] openkey+dictr openkey+dicw lin freadlines linsplit\t lenx == x1 x[1]lstrip[ x2 x1rstrip] gwritex2+\t+x[2] sampledic sed -f23 | s/[][]//g redirected
Laboratory data entry: microscope slides with voice (2014-08-07)

Intro: Voice data input is especially helpful where either both hands are busy or you are using special equipment. In a previous post I discussed the use of voice data input in a beekeeping context where the beekeeper is in a suit and veil and has both hands busy examining a frame of bees. Another example might be the laboratory technician tasked with counting cells in a series of microscope slides....

Keywords: beekeeper veil bees tasked microscope http//youtube/vaffxxl3wj8 waits spreadsheet dictate backend htk recognizer julius colbec
Recursive grammars and language models as diagnostics (2014-07-31)

Intro: One of the great things about detecting more than one word at a time is that it reveals much more about how well the detector is working than a fixed grammar can do. If say your grammar can accept ALPHA BRAVO or BRAVO CHARLIE then of course it will generally do quite well separating those possibilities....

Keywords: bravo recursion recognizer problematic clarify julius myriad ivr linguistics replete waveform re-the lexicon phoneme exploit recursive checkers
From fixed grammar to language model with Julius dictation and IRSTLM (2014-07-28)

Intro: Up to this point, all of my speech models have been of the fixed grammar type. It would be a new challenge to explore the continuous recognition approach, for example dictation. Quite often the tools exist in the open source world to achieve this objective, but the process to get there might be a bit obscure. So this is just a note to file of how I set up and ran a toy example of Julius running as a continuous speech recognizer; a point of reference to myself for a few years down the road when I need to nudge my memory....

Keywords: dictation obscure julius recognizer nudge htk irstlm jconf voca ns_b ns_e nums ---------- ltsgt sil lt/sgt ay ao ey ax ih iy mkdfapl dict theres lmbtxtgz http//sourceforgenet/projects/irstlm/ evaporate /bin/bash path=path/your executables here/bin irstlm=/path build-lmsh -i gunzip -c -o lmbilmgz -k gzipped subdivide m_fusion initialize append [ltsgt] [lt/sgt] -nlr left-to-right n-gram -v grammar/lmbdict -nrl right-to-left voxforge quickstart respects one-gram no-gram
Postgresql and interactive voice (2014-07-27)

Intro: I'm a great believer in the usefulness of a relational database system when developing voice interaction materials. The wonderful Voxforge site has some very informative pages on getting started with voice interaction, but once that is working you are soon looking to modify details and expand your resources. For example your speaker-dependent approach might need to adjust the lexicon to suit your accent (S EH V IH N or S EH V AX N?) and you might need to boost the prompt samples. You may start by making lots of different little models, but it soon dawns on you that you need a central repository which can generate base materials as required....

Keywords: believer usefulness relational voxforge speaker-dependent lexicon ih ax dawns utterances prompts librivox swamped interlinked phoneme pronounced spoiled dominate http//gutenbergorg/ http//librivoxorg/ http//wwwpostgresqlorg/ http//voxforgeorg/ http//wwwpgadminorg/ http//pythonorg
Interaction and etymology (2014-07-26)

Intro: The elusive goal in the voice interaction world is that of a "conversational" type experience with a computer. We can already use dialog managers efficiently to manage strictly defined interactions, or we can use the computer as a domain-specific dictation machine; for many purposes these issues have been solved. Conversations remain an open challenge....

Keywords: elusive float8 conversational domain-specific dictation cannot insist con+versation sense  - servant etymology listens interchange analyzes directs duologue dismantle interlocution interlocutors colloquy http//wwwetymonlinecom
The duologue continues (2014-07-25)

Intro: It has been about two years since I last posted here related to voice interaction. During that two-year period I have had a great deal of fun tackling various courses offered online. Most of them were Coursera courses, mostly about Mathematics and math-related subjects, but there have also been a number of history and economics-related items. In those two years I have accumulated a number of "statements of accomplishment" as they are called, in order to distinguish them from certificates, which actually certify something. I attempted to pay back the free access to education by volunteering as a community teaching assistant for Andrew Ng's Machine Learning course. I was CTA for three sessions in a row; it was great fun working with my fellow volunteers and with my contacts at Coursera, who were amazingly efficient and professional in their efforts to keep the machinery going. At the end of that period when I let Coursera know I would not be participating in the following session I was ver...

Keywords: coursera economics-related accumulated accomplishment distinguish certify ngs cta - neuroscience forwards opensuse from1 to3 subsystem pulseaudio voxforge grammar-based re-launch
New Approaches To Recognition And Analysis (2012-08-26)

Intro: To get a wider appreciation for the subject of audio interaction with computers I have decided to follow ...

Keywords: courseraorg mfc broaden htk-julius
Person Computer Audio Interaction - Literature Review (1) (2012-08-13)

Intro: One of the important things in any discussion of any topic is what others have found before. This becomes encoded into the literature, what experts have written and published for others to read. Audio interaction between people and computers is sort of new, but you don't realize how new until you review what has been written about it....

Keywords: encoded tore jansons readable differs mcwhorters acknowledges dismisses stimulate janson phonemes graphemes bounds ceases enunciation immutable dante rethink athena pidgin conveys dialect subset paralanguage crystals allophones cmu/htk tts emoticons exclamation neurolinguistic distinguish deserves
Review, Refactoring And Rationalization (2012-08-07)

Intro: With a number of sort-of-completed projects done (twenty questions, bee inspections, Spanish, French, Portuguese simple dialog managers, verbal sudoku, setting alarms, vocal text search and Roget lookups) it is time for a bit of review and rationalization. There is another reason for this timing, and that is the upcoming release of OpenSUSE 12.2 which will take my main system into Python 3, which might cause a few issues in my dialog managers. Moving to Python 3 is something best left to the OpenSUSE guys. I learned from an earlier attempt that Python 3 was at that time not compatible with a number of system utilities. Best to let everything move up as a system....

Keywords: sort-of-completed sudoku alarms roget lookups rationalization opensuse2 opensuse programmed ported similarities merged commonalities codified voicexml settles readable dialogs contemplate lexicon pls postgresql rdbms hinges
Voice Interactive Beekeeping (5) - The Faraday Cage (2012-08-02)

Intro: In full bee suit with wire cloth veil for protection, there is no transmission of bluetooth signals between the headset and the smartphone.  The bluetooth headset is in fact inside an effective Faraday Cage. I should have seen this coming, but it never occurred to me....

Keywords: veil smartphone  faraday weave bees deform smartphone earbud/microphone earbuds earbud yankee nineteen quasi
Voice Interactive Beekeeping (4) - In The Field (2012-07-30)

Intro: I have just made my first successful report in the field. But there were a few problems. These can be classed as:...

Keywords: classed postgresql asterisk apiary repeater ddwrt smartphone initiate bluetooth-smartphone-repeater-hostap annoyance interrupted feasible robustness
Asterisk AGI Festival - The Telephone Voice (2012-07-28)

Intro: I spent a bit of time yesterday stuck on an issue with remote server based delivery of voice. It's probably a good idea to make a quick record for my own use at least as to what the issue was and how it was resolved....

Keywords: trivial straightforward athena decides text2wave agi tweak pronunciations trotted linuxcom inconvenient agiverbose asterisk ssh cli -eval
Voice Interactive Beekeeping (3) (2012-07-26)

Intro: The beekeeping inspection routine is now working through the bluetooth/smartphone combination in the lab....

Keywords: bluetooth/smartphone anomalies earthquakes cracks recognizer zulu lengthy rationalization
Voice Interactive Beekeeping (2) (2012-07-23)

Intro: The first goal in getting a voice interactive inspection procedure is to add a testing routine to my main dialog manager. This helps since the DM contains a lot of error checking and the language and audio models are quite well established. Once the main framework has been tested in Athena's dialog manager, the code can be transferred to the Asterisk server for refactoring as a standalone AGI process. A standalone script will not need the extra training of prompts that merely select out the needed subroutines, since I can just call a different phone number or SIP address....

Keywords: athenas asterisk standalone agi prompts subroutines athena rebuilt grammars ogg illustrates centigrade completeness colonies bounds iterates a1 a2 a3 alphanumeric terminated tbd sub-loop bees postgresql timestamps reorganize refactor
Voice Interactive Beekeeping (2012-07-20)

Intro: As a beekeeper, one of the activities I perform is a regular inspection of colonies. Based on the results of the inspection I make decisions on what operations are necessary to maximize control over apiary development....

Keywords: beekeeper colonies apiary veil hive dismantled astronaut utter ouch decides bees transcribe contextual imaginary swhich bcolony sokay swhat entrances bbottom inactive ssame removes bframes slast bempties sthis interactivity communicates smartphone contexts athena repeater
Python And WordNet (2012-07-18)

Intro: During my experiments with the disambiguation of Twenty Questions entries I came across an issue with my database installation of WordNet. The problem was that my database version showed different results from the same intended query using the CLI. Specifically, given one word, say 'cat', when looking for the hypernyms of all the senses of cat my database would find hypernyms for some senses but not others, and for some senses the hypernym set would be incomplete. It's probably an issue with my semlinks table or my SQL statement, but so far I have not found it....

Keywords: disambiguation wordnet cli hypernyms senses hypernym semlinks explored avenues pywn -h --help non-standard subprocess check_output exits searchword cannot -1 behave workaround wnc recompile recalled lemma nltk corpora version0 downsides fingertips
Twenty Questions (5) - Definitions and CDV (2012-07-17)

Intro: As discussed in ...

Keywords: 4 levenshtein simpler scans wordnet occurrences theq growingq dismal szymański duch nlq disambiguation disambiguated inq senses orth distinguish rises accumulated cdv hypernyms myq accumulate edible felis synonyms feline felid carnivore placental mammal eutherian mammalian vertebrate craniate chordate animate brute fauna organism hombre bozo mortal causal grownup accumulation discrepancies anyq incompatible
Twenty Questions (4) - Disambiguation (2012-07-14)

Intro: When I start a game of 20 questions with Athena and I have in mind, say, a 'car', then if I mean the thing that is variously called an automobile, auto, car, or more remotely wagon, jalopy, ride and so on, then to be strictly fair to Athena I can't change my mind halfway through the game and start thinking of the wagon pulled behind a locomotive which can also be called a car, as in a baggage car....

Keywords: athena jalopy halfway locomotive baggage wordnet synsetid synonyms senses meanings trivial y/n edible lemma the659 synsetids disambiguated levenshtein are  bony synset levdistanceextend bonyarar pedal extremity ankle vertebrates organs locomotion invertebrates prosody syllables poetic qualifier myq begs aq disambiguation asleep disambiguate
Twenty Questions (3) - Voice Interactive (2012-07-12)

Intro: Here is ...

Keywords: 3 secs5 ogg athena prompts empties backend emotion volition roget magnetizes zenith stumbles recognizer enunciation fetches taster
Twenty Questions (2) (2012-07-11)

Intro: The technique of having a computer learn a body of knowledge using a twenty questions method is proving very interesting indeed....

Keywords: http//www20qnet/ faults downsides disagreements unambiguous contradictions unavoidable learns invites - differentiates abstraction learner subsets diminishes inherits accumulation
Twenty Questions (2012-07-03)

Intro: Turing's 1950 paper speculates that it might be possible for a computer (machine) to learn in the same way as a child. This idea was represented in the movie "2001 - A Space Odyssey" when, as Hal's memory banks are disconnected, the computer regresses to a childlike state....

Keywords: turings speculates 2001 - odyssey hals disconnected regresses childlike punishments voice-interactive adapt speculation distinguish proves suffice accumulates rationalizes pythonqxpy guess|exit|clearguess tangible inorganic ironn iscopper kby ironreddish reddish irony guess|exit|clear
Verbal Sudoku (2) (2012-06-29)

Intro: This is certainly proving to be an educational experience....

Keywords: lexicon accounted disadvantage brazilian/portuguese lateral diagonal diagonal_positive diagonal_negative 19 64 21 wraps warp
Verbal Sudoku (2012-06-27)

Intro: In the movie "Blade Runner", the character Deckard navigates in a holographic image using verbal commands. Cool stuff....

Keywords: deckard navigates holographic cockpit developerworks xdotool sudoku harmless adventurous gnome-sudoku repetition internalization incarnation cannot ctrl+w float8 recognizer hears contextual destructive audio-visual peck opensuse distros /usr/lib/python27/site-packages/gnome_sudoku/gsudokupy divides thex9 intox3 = gdkcolor_parsecolor[1] subscripted gdkcolor_parsecolor blackx3
Speaker authentication (2012-06-25)

Intro: One interesting aspect of voice interaction is: who is actually speaking to the computer?...

Keywords: wondered distinguish decipher speaker-dependent athena julius encoded triphones float8 cmscore beckingham beckingham000000000000 cmu again000000000 one000000000 beckingham000000997000 one000997000 zulu000000000 silences to000 decode convince than000 zulu invariant enunciation hears profitable
Voice Query Roget's Thesaurus (2012-06-22)

Intro: One worthy little project for a voice consultation is asking for information from Roget's Thesaurus. While writing, you wish to use the word 'touch', but since you have used the word several times already you want to use an alternate expression. Roget might have some suggestions....

Keywords: rogets roget bucket navigable gutenberg noun [sensation pressure] -- tact taction tactility palpation palpability contrectation manipulation [organ touch] forefinger paw feeler palpus rogetlist ogg format6 mb5 athena interprets buffers alphabet terminated verb adjective adverb interjection confirms waits repeats enunciated archaic tts substituted reinstated
Regular Testing Of The Audio Model (2012-06-20)

Intro: The proof of the audio model is in the recognition. So I begin each day by running tests on my audio model. The test consists of taking all the possible prompts in the grammar, putting them in a bucket, shuffling the bucket contents and randomly checking my enunciation of that prompt....
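
The bucket-and-shuffle step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `shuffled_prompts` is hypothetical, and the three prompts are examples rather than the actual grammar.

```python
import random


def shuffled_prompts(prompts, seed=None):
    """Return the prompts in random order, leaving the input list unchanged."""
    bucket = list(prompts)               # put every prompt in the bucket
    random.Random(seed).shuffle(bucket)  # shuffle the bucket contents in place
    return bucket


# Example prompts only; a real run would load every prompt in the grammar.
for prompt in shuffled_prompts(["YES", "NO", "COUNTRY GERMANY"]):
    print(prompt)
```

Each prompt comes up exactly once per run, so a full pass tests the whole grammar while the order stays unpredictable.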

Keywords: prompts bucket enunciation randomized inclined tedious to-5 loudness blip athena re-test float8 remedial intuitive fixable annoyances yes|no festivalrc lexicon
Yes And No (2012-06-19)

Intro: Yes and No are two of the most important words in the grammar. You need them to be correctly recognized as much as possible to allow confirmations to proceed smoothly ("Are you sure you want to do this? Yes|No"). False responses (Yes heard when you meant No and vice versa) are costly in terms of undoing, and repeating, steps....

Keywords: confirmations yes|no versa recognizer prompts julius misses plosives conversational disparate lexicon iy ee-yes ax ow prefixed
Julius "Hypothesis Stack Exhausted" (2012-06-17)

Intro: On and off I have had problems with R sounds. The strangest visible sign is that on its first pass Julius gets the right answer, and then on the second pass completely loses its way and runs out of ideas....

Keywords: strangest julius loses voxforge decipher htk misses phoneme ax announcer lexicon popped exhausted phonemes ow iy float8 generalize non-ax-prefixed ax-prefixed problematic
Attentive Listening (2012-06-16)

Intro: The CBC programme "Spark" recently had an ...

Keywords: cbc spark attentive dnd e-mail athena asterisk announcer primitive statistic burma - xxxx athena2 yyyyy
Silent Sounds And Julius Echoes (2012-06-14)

Intro: Julius, the speech recognition engine, provides some very useful feedback regarding what it heard. This is most helpful in diagnosing some recognition errors....

Keywords: julius eliminated prompts guinea_bissau guinea_conakry clarify glottal prefixed ax overdone habit lexicon ih iy geminate consonant whichever hears echoes libya
Refining Athena's Interactive Testing Routine (2012-06-11)

Intro: Following the issues described in the previous post I decided to do a deeper test and found that my prompt COUNTRY GERMANY was only detected once in about 20 attempts....

Keywords: inclination geminate consonants phonemes triphones dialectic pronunciations julius now  athenas prompts deficiency bucket pulls non-while athena enunciation spelled ~400 ogg illustrates cleans hears illustrate verifies g-e-r-m-a completes re-tested problematic
Fricative sibilants and other stuff (2012-06-10)

Intro: As my audio model grows with the addition of a broader range of voices and samples, so the ability to recognize prompts changes quite subtly. It is necessary to run Athena's testing routine regularly to reveal those grammar entries which are suddenly confused by the new data presented....

Keywords: prompts athenas recognizer other+ pronounced problematic prejudice tricky gambia/zambia iraq/iran emerge fricative sibilant geminate consonant phoneme ih ax ao remediation diphone s+ih|y
Error detection by HTK (2012-06-08)

Intro: It seems that HTK lets me get away with some important errors....

Keywords: htk librivox segmentation inaccurate complain rebuild fade audacity abrupt severance tedious
New Librivox Transliteration Project: Barr - Valmont (2012-06-05)

Intro: I have started a new project, the transliteration of another Gutenberg text to fit with the Librivox sound version. I think 'transliteration' is the correct term since I am transliterating one text (the Gutenberg original) into a new version which is line segmented according to my subdivisions of the sound file. According to modern hermeneutics the transliteration could also refer to the process of making a text version directly from the sound segments. This is a practical necessity when the audio file differs from the text version....

Keywords: transliteration gutenberg librivox segmented subdivisions hermeneutics differs distinctive lancashire accents pronounce lexicon pronunciation confuse hypotheses
Lucky Thirteen (2012-06-04)

Intro: For some time the word THIRTEEN has been one of the words that, let's say, have been slightly more problematic in recognition than other words. The recognizer has seen SEVENTEEN most often in response to my vocal prompt of THIRTEEN. Clearly Julius was getting the TEEN and doing its best with the rest....

Keywords: thirteen problematic recognizer seventeen julius phonemes lexicon phoneme iy - ipa θ recompiled
Randomness in the dialog manager (2012-06-02)

Intro: After a number of days exploring various details of voice interaction not involving my main dialog manager, I had cause to restart the main Athena DM to ask her to add some new samples to increase the training for some words that were proving problematic....

Keywords: restart athena problematic bland randomness athenas predictable automaton prompted paralyzed chooses contextual unpredictable parol 1950 mind-460 arithmetic tuned
Compound Input Devices (3) - Even More Testing (2012-06-01)

Intro: Having added more audio and interacted with this new model created with the compound device over a period of a couple of hours, the impression I am getting is that I am retraining a model from scratch even though about six other devices are contributing data. I am reminded of the experience of starting out with no audio samples at all. Many of the problems this audio model has can be ironed out by speaking a certain way, for example with longer or shorter silences between words. Then the model works again. This may be an issue with Julius start-up parameters; I am looking into that. This is an entirely new experience, quite different from simply using a different Bluetooth device, where the data gained is pretty much complementary to the previous devices used. In this case the new device needs comparatively little training to make the model useful. I don't have enough audio yet to create a model exclusively using this one device. But clearly I need a lot more audio with greater vari...

Keywords: interacted reminded all many ironed silences julius start-up that this complementary useful i training update smartphone earpiece
Compound Input Devices (2) - More Testing (2012-05-30)

Intro: Since the results with the untrained model were not good (...

Keywords: untrained wep475 agi close-to-perfect untested reversible waveform punctuated interruptions pops amplitude unhelpful discarded wander weeds
Compound Input Devices (2012-05-28)

Intro: Until now, I have been using simple input devices to send audio into my computer systems. Either a wired headset or a wireless unit communicating directly with a USB dongle of some kind. The most adventurous connection has been with a smartphone communicating with an Asterisk server using the Asterisk Gateway Interface. In this mode, you use a telephone to talk to the dialog manager through WiFi, SIP, and RTP protocols....

Keywords: dongle adventurous smartphone asterisk rtp wondered cannot straightforward wep475 activates agi wep earpiece inferior pulseaudio nokia+samsung nokwep
Athena Context Manager (2012-05-27)

Intro: If I ask Athena ACCOUNTS SUMMARY, there are two things that Athena needs to consider before replying: what kind of response is required, and should Athena do anything at all?...

Keywords: athena two-word concise cryptic posed recognizer ooc implicit extremes continuum human-like postgresql julius tractable loosen downside duplication contexts grammars disadvantage chords binomial straightforward captures comma refreshes superfluous
Homographs (2012-05-25)

Intro: If I want to record a voice note I could ask Athena for RECORD ONE.  In this case the ONE signifies a chunk of time, in seconds, that the message might last, say 15 secs. TWO could mean twice that, say 30 seconds. This is useful when I cannot use the silence command with sox to start and stop recording. It's a hint to Athena as to how much time to spend waiting for input....
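
The chunk arithmetic can be sketched as follows. The 15-second chunk is the "say 15 secs" figure from the text, and the names `CHUNK_SECONDS`, `COUNT_WORDS` and `record_duration` are hypothetical; this is not Athena's actual code.

```python
CHUNK_SECONDS = 15  # assumed chunk length ("say 15 secs")

# Spoken count words mapped to a number of chunks (illustrative subset).
COUNT_WORDS = {"ONE": 1, "TWO": 2}


def record_duration(prompt):
    """Return the recording time in seconds for a prompt like 'RECORD ONE'."""
    command, count_word = prompt.split()
    if command != "RECORD":
        raise ValueError(f"not a record prompt: {prompt!r}")
    return COUNT_WORDS[count_word] * CHUNK_SECONDS


print(record_duration("RECORD ONE"))
print(record_duration("RECORD TWO"))
```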

Keywords: athena one  signifies chunk secs cannot implicit itis verb _look up_ noun iy ao julius lexicon recordn recordv phoneme enunciate ax pronunciation creep supplementary prompts pls the  lexeme recognizer
The Blizzard Challenge (2012-05-23)

Intro: I just completed the ...

Keywords: blizzard about-3 wavs rectangle kaffeine workaround compilers xine gxine triggered tts
Yet more Librivox (2012-05-22)

Intro: My third Librivox project produced over 750 prompts. At first it looked like about 500 would be the total, but during the analysis many longer prompts broke easily into multiple shorter fragments....

Keywords: librivox prompts fragments audacity example2425332435 originalwav overwritten truewav aswav stray belonged audiotext fine000 supplementary julius pronunciation ih ax phoneme lexicon incompatible iy arose fourteen ow sour ao discrepancy
Pre-processing Librivox text (2012-05-20)

Intro: One of the tasks involved in preparing Librivox audio for analysis by an audio model builder such as HTK is the preparation of the text to match the audio. The more exposure you have to hand-editing the text, the more patterns start to emerge, calling for something like a Perl, PHP or Python script to do most of the work....

Keywords: librivox htk emerge maclean voxforge unknowns pitfalls numerals napoleon transliterations gutenberg punctuation
Subdirectories in the training directory (2012-05-18)

Intro: Once your collection of audio samples starts to grow, the issue of file management arises....

Keywords: arises directory/folder nautilus re-is experimented subdirectories manageable subdirectory hled phones0mlf phones1mlf prompts lexicon step4 voxforge / // slashes truncated phones0 phones1 imagined /train/wav repopulate theres thunar nimble
Reshaping audio (2012-05-16)

Intro: I came across an interesting issue while evaluating a potential public domain audio file. The audio seemed a bit quiet; this can lead to issues later with HTK unable to extract information. So I tried to normalize it, but the waveform did not change. Usually, if there is room to increase the audio then the waveform becomes fatter and the volume louder, but this did not happen....

Keywords: htk normalize waveform fatter louder audacity reshape
Transcribing audio (2) - O. Henry and the vernacular (2012-05-14)

Intro: To try to solidify in my own mind the process of segmenting public domain audio, I decided to have a go at a second large public domain audio file from Librivox. I could have simply proceeded to do the second chapter of my original choice, but I wanted a different voice. (It so happened that chapter 2 of my original choice used the same reader; I don't know if this is standard Librivox practice.)...

Keywords: solidify librivox proceeded willa cathers prose vernacular latters abound contractions slang coney idioms hammers tintype lexicon subroutine queried phoneme tis prompts htk escaped http//speechtechiewordpresscom/ http//wwwlingohio-stateedu/~bromberg/htk_problemshtml - punctuated enunciate pronounce segar interpret irishmen malleable breathiness forgiveable float8 tts breathy endings confuse segmentation kwrite silences x+2 x+1 audacity as2425252627 or242627 suffixes 23252525a
Transcription of audio files (2012-05-12)

Intro: I have spent the last few days experimenting with the transcription of public domain audio for the benefit of the Voxforge collection. The product has been delivered successfully after I tested the additional audio prompts on my own English model. There were many things learned along the way....

Keywords: voxforge prompts librivox htk cleaned punctuation ogg audacity silences control+b insertinga andb kwrite numbered waveform pronounced plugged lexicon analyzer corresponds reformatted codetrainscp /train/wav/ augmented slew mkdfapl mlf /interim_files/wordsmlf uninitialized fname substitution s/// //htk_scripts/prompts2mlf concatenation complained [+1232] numparts cannot - hled underscore postgresql htk_compile_modelsh julius triphone
Review of my recent language experiments (2012-05-09)

Intro: It was a chance idea that took me into exploring a dialog manager in Spanish, and another chance idea that started the 'telephone as a device' path. Both were instructive and successful adventures....

Keywords: instructive easiest gut prompts utterances - asterisk e71 loudspeaker ogg format 00 beeps digits athena julius exits cleans terminates tango predominant yes|no true|false from5 to0 outlier a6 cron siri multilingual
Interaction in Brazilian Portuguese (2) - Telephone adaptation (2012-05-08)

Intro: This process was a simple repeat of that followed for French. Pretty straightforward except for:...

Keywords: straightforward gra2as por_favor misses grunty holdout
Interaction in Brazilian Portuguese (1) - Audio Model (2012-05-07)

Intro: Many years ago I followed a couple of University courses in Brazilian Portuguese. Since then I have tried to keep up a basic familiarity with little success. I have never been to Brazil and only spoken to a few native speakers; most of my continuing association is via FTA satellite broadcasts. It's a beautiful language....

Keywords: familiarity fta broadcasts brasileiro falado bf voxforge phonemes lexicons tts lexicon phoneme gleaned emerge pronunciation accents accented true/false graÇas nÃo obrigado unaccented equivalents suffice htk unicode unfamiliar illustrate alphabet cannot nao n1o gracas gra2as sent-start sent-end sent_start sent_end chamar camÕes telefonar lhasa binomial chamar_para para_camÕes cero nove ogg format00 apologies fetches waits julius +1 deliberate messed dois correto prompts
Interaction in French (4) - TTS (2012-05-06)

Intro: French is not pronounced exactly as it is written, and there are lots of complications in the phrasing of sentences. This leads to complex rules which make it much harder to develop a voice for Festival in French, which is probably why I can't immediately find an open source voice with sufficient quality to suit my experiments. This is not a problem in my application, since I have a limited number of prompts to express; not every response must be generated on the fly by Festival. I can prepare complete wav files beforehand and play them from a database. This is inconvenient, but is a practical workaround....

Keywords: pronounced prompts beforehand inconvenient workaround server-end lexaddentry quixote listener distinctive intonation tempo subroutine saytextannounce_thisusing_this_voiceother_parameters saytext audacity programmatic qualitative normalization deletion semitones retains rule-based transformed bonjour suis serviteur français à tout lheure thesounds = { francaisintroogg lheurefinogg } ogg asterisk --with-ogg= --with-vorbis=
Interaction in French (3) - Telephone adaptation (2012-05-05)

Intro: Adapting my French wep475 audio model to telephone input is proving significantly more difficult than the same process in Spanish. The first step was to copy the wep475 model to the Asterisk server and see how accurate the model was when used with telephone devices. The accuracy was quite poor. Longer, more distinct prompts were not an issue, but the all-important prompt OUI D_ACCORD was consistently seen as HUIT. Single-word prompts such as numbers were quite inaccurate. This is not helpful since the dialog manager that harvests new prompts remotely on the Asterisk server relies on the AM to get confirmations. When collecting my new prompts using the Snom phone in loudspeaker mode, the DM only once recognized OUI D_ACCORD correctly. The rest of the time I had to signal true and false by pressing digits on the phone. It's not a blocker, just slightly inconvenient. Here is an ...

Keywords: wep475 spanish the asterisk prompts all-important oui d_accord huit inaccurate harvests relies confirmations snom loudspeaker digits blocker inconvenient here ogg format305 illustrate events fade keypad appelez georges agiverbose beeps beep discards hears recherchez reseau_mondial notes in blaze dahdi rebuild devices initial on update t/f merci holdout deux neuf hardest grunt
Interaction in French (2) - Audio model (2012-05-04)

Intro: Within 2 hours of finalizing the lexicon I have my first audio model. The process worked more or less as anticipated, except for:...

Keywords: lexicon sent-start sent-end d_accord underscore htk cannot athena postgresql regenerated replaceword_z asc phonemes merci blocker julius recognizes
Interaction in French (1) - Setup (2012-05-03)

Intro: Now I am interested in interaction in French, to see how this will be different from preparation in English and Spanish. I will use the same process as with Spanish, and try to generalize my processes to deal with other languages like Portuguese....

Keywords: generalize phoneme phonemes phonetics ipa htk julius non-standard lexicon lexicons accent-biased bypass appelez georges oui daccord merci voca prompts
More audio samples lead to less accurate AM? - Resolved (2012-05-02)

Intro: 8:00 a.m. After spending a day or so setting up a script to harvest my own audio from telephone sources for my Spanish model, I finally had enough data to rebuild the audio model using four different sources: the WEP475 bluetooth on my main machine, and three telephone sources recorded via the Asterisk server (Nokia E71, wireless analog and Snom 360 in loudspeaker mode). The resulting audio model is not accurate at all, with any of the devices. In fact it is totally unusable in the first raw tests....

Keywords: 800 rebuild wep475 asterisk e71 snom loudspeaker unusable improves athena muddy refinements muddled work30 omitted resample intact prompts /train/wav/ wavs venerable voxforge htk_compile_model uno ocho disappeared resampled htk handset speakerphone interchanged ache
Interaction in Spanish (4) - Telephone (2012-04-28)

Intro: Interactive voice response via telephone is an interesting application of speech detection and management by a dialog manager. You have a language and audio model, so how to apply this in a telephone context? It is of course nothing new, my bank already uses such a system for telephone voice print identification setup....

Keywords: restrictive htk julius tts text2wave voice_el_diphone asterisk agi ogg format5 snom loudspeaker unbiased beep initiate responds discard atena termine cue athena detects hasta luego hangs e71 dfa dict hmmdefs tiedlist prompts fuller juntadeandalucia fills narrower violate dialplan processor6 rises from05 steady20 runaway - fastagi rebuild handset dect nbsp
Interaction in Spanish (3) - Dialog Manager (2012-04-25)

Intro: The final stage of the process is the dialog manager, where I try to interact with the audio model in some kind of intelligent and useful manner....

Keywords: dialogs tts voice_el_diphone pkgsorg hispavoces voice_juntadeandalucia_es_pa_diphone voice_juntadeandalucia_es_sf_diphone slowed enunciation slower ogg juntadeandalucia diphone mono307 prompts julius initialize shuffles randomized evaluates correcto exhausted goodbye terminates uno ocho tres seis ranged from/29 to/29 llame llesha phonemes
Interaction in Spanish (2) - HTK audio model (2012-04-24)

Intro: There are a number of steps in building the audio model. I followed the process detailed in the Voxforge auto tutorial for HTK/Julius, and this resulted in a surprisingly accurate final result. It is worth giving it a shot....

Keywords: voxforge htk/julius phonemes lexicon emulate - abchdefgiklllmnnyoprrrsshtthux pronounced accounted phoneme tilde completeness lopsided pequenyo   [pequeÑo] illustrates accents unicode htk julius mkdfapl accented vowels /lexicon/ voxforge_lexicon arises /scripts/htk_compile_modelsh rename htk_compile_modelsh 4 parameterize spanish_lexicon prompts codetrainscp filenames nbsp
Interaction in Spanish (1) - Festival TTS (2012-04-23)

Intro: I have recently spent some time attempting to build an interactive system for the Spanish language using Festival, HTK, Julius and Voxforge tools. The following comments are more or less notes to file regarding problems encountered along the way, mostly as a reminder to myself....

Keywords: htk julius voxforge phoneme audible human-but phonetic festivalgt lexlookup incendio ih iy ow non-existent phonemes accustomed lexica voice_el_diphone e1 i0 castilian enunciation accents  - vowels tilde dieresis dots pronunciation gu enunciated saytext unicode maría lt-- accented deletions forgets cli habra pare hacia modico mu~neca verguenza enunciations speaker-dependent
Natural Language? (2012-04-20)

Intro: There are at least two schools of thought on how to speak to a computer. One says that a user should be able to say anything and the machine must do its best to decipher, as fully as possible, what was said. Another says that there should be a fixed set of recognizable commands that the computer will be expected to understand. The first is termed "Natural Language". I don't know that the other end of the spectrum has a label. Perhaps it is the domain of fixed grammars. The latter approach is what Athena uses. The set of recognizable commands is defined in the grammar, and the recognizer will choose from amongst those possibilities....

Keywords: computer  decipher recognizable termed grammars athena recognizer anticipates listens untrained sharpened lexicon extremes tendency signify equates okey dokey bailiwick adapt nbsp
Vocal text searches - 'assize' in Thomas Hardy? (2012-04-18)

Intro: One of the useful attributes of a computer is its ability to scan and report back information about documents. A practical example is that you know that a word is used in the works of an author but you can't remember exactly which book and chapter and the context in which the author used it....

Keywords: hardy assize gutenberg hardys tedious picky ogg format2 athena julius buffers spelled others-9 a+s+s+i assizes manageable contradiction searchable occurrences distinguish noun adverb regex synset wordnet enunciation tts
An example alert routine (2012-04-16)

Intro: This audio file (OGG format, mono, 4 min) is an example session where I set an alarm or alert. An example might be that I need to make a call at a certain time and day, or fetch a cake from the oven; Athena can help by giving me a phone call or sending an email with a recorded message, at a specific time on a stated day. This is a straight run through, warts and all, with no editing apart from synchronizing the input and output tracks and merging them.

The basic process is straightforward. Athena is in command and the alarm manager probes me for criteria:

  1. I change to context alarms - this gives me the ability to create a new alarm, browse existing alarms in the queue to edit, deactivate, activate, delete or simply list alarms

  2. I ask for alarm new; this puts the alarm manager in charge

  3. I spell out a label, in this case "l-a-b-e", then the autocomplete routine kicks in, offe...

Keywords: ogg fetch athena warts straightforward probes alarms - deactivate activate l-a-b-e autocomplete kicks terminate enunciate timestamp countdown inserts cannot string- alarm/alert asterisk secs acted crontab don_diphone clearer slower smallish of+ waits unfortunate merits selects silences enunciation alphanumeric lookups interject interrupt supervisory volts
Audio Hardware in speech recognition (2012-04-15)

Intro: In my opinion, a critical element in the overall system is the type of audio hardware used. This is because the hardware chain has distinctive qualities when creating audio samples, either for direct recognition or creating or modifying a model. Changing any one of the computer+adapter+headset+speaker stages can have significant effects on recognition ability....

Keywords: distinctive computer+adapter+headset+speaker cannot re-record wears mikes donkeys workhorses replaceable stray hum disadvantage headsets beard strands  repairable fray over-the-head cowls alsa pulseaudio insensitive mouthpiece wep475 bluez sspmode hciconfig underrun julius suspends arecord prompts symptom alsa/arecord listeners fuzziness synthesizer degradation
NATO phonetic alphabet (2012-04-14)

Intro: For some time I have been concerned about the use of the NATO phonetic alphabet (A alpha, B bravo, C charlie) in a voice recognition context. The issue is that for short words enunciated outside of the context of a sentence, as in spelling out a registration plate, errors cannot be smoothed over at the receiving end by the use of the word in a context. In "Let's climb the ...a tomorrow", the word sierra can be inferred. You can't say the same for "Lima charlie ...a zulu." The NATO version has undergone substantial changes over the years in response to problems encountered....

Keywords: phonetic alphabet bravo contextthe enunciated cannot smoothed inferred lima zulu undergone noisy contexts athena ironed phoneme vertical_bar infer problematic hears confidence8 semicolon julius perennial phonemes fulfill under-used frequently-used
Stumbling blocks (2012-04-12)

Intro: Speech technology is hard. There are many things that can go wrong. Errors can be related to input accuracy or misunderstanding and incorrect action....

Keywords: enunciation shallow athena distinguish hears adapt wer say+ selects assesses swer tendency lexicon pronunciations tanzania[tanzania]t ey iy ax julius clues phonemic derives lexicons phonemes dialect tedious htk complain cannot aborts poke salient pops anther contexts prompts ~450 oov recognizer detections ooc contextual postgresql rdbms grandstand clever
The Athena Framework (2012-04-11)

Intro: I call my system Athena, after a voice-interactive computer in one of the books by Arthur C. Clarke....

Keywords: athena - linuxcom julius developerworks srgs sisr readable recognizer parsed elementtree hears contexts inaccurate postgresql clever mitigate prompts assesses pounce input/output software/hardware shifts float8 brains reside
Voice recognition and response - who, what and why (2012-04-10)

Intro: My name is Colin Beckingham. This blog is designed to clarify my experiments with voice control and interaction with a computer. I'm hoping that keeping a kind of diary will help me organize thoughts as this project grows....

Keywords: beckingham clarify ive part-time mashed adaptable cmu sphinx kiku distros responds contexts braille leaps bounds theres abstruse joystick rudder wand scalpel counterpoint a-core headsets - o/s opensuse htk ffmpeg audacity arecord aplay voxforge julius recognizer assorted ogg

Last modified: July 21 2015 11:37