A few questions..

Hi, my name is Ryan. I am 30, and although I was born in Texas and grew up down south, I have spent the past 8 years living closer to family in the Poconos of Pennsylvania. I am a complete computer geek and have been since I got my first PC with Windows 95 on it. Unfortunately, I did not have the advantage of learning from earlier versions of Windows or starting with DOS. To this day I am still caught in the 2D realm of front-end programs. I would love to get into Linux and learn C++. I have spent the past few years trying to learn to code and get it to make sense to me. Hopefully I have retained at least a fundamental grasp of the concepts. I did not graduate high school: although I never got any grade below a B- throughout school, in my senior year I failed Algebra 2 and also became a father. I got my GED and a job instead of going to summer school.

Skipping many years of trial and error and learning life's lessons as they came and went...

I am not sure how to go about these questions, so I am just going to lay it all out on the line. I want to make an audio programming language. I will get more into that if there is interest and momentum behind the concept, but for now what I am really looking for is code that I can type into a C++ compiler that will allow me to do the following, and a few like-minded geeks to help in the cause.

- allow an index of verbally activated keywords or phrases to be translated from speech to text.

- allow that index to specify special attributes for certain keywords or phrases that would allow it to do the desired task (e.g. create a new folder or search within a container).

- write the output of recordings based off the previously defined index to both text and speech as an actual voice recording.
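
A keyword index like the one described in the list above could be sketched roughly as follows. This is only a minimal illustration and assumes the speech has already been turned into text by some recognizer; `KeywordIndex`, `Command`, `takesArgument` and `normalize` are hypothetical names made up for this example, not part of any existing library.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Lower-case a phrase so "New Folder" and "new folder" hit the same entry.
std::string normalize(std::string phrase) {
    std::transform(phrase.begin(), phrase.end(), phrase.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return phrase;
}

// One entry in the index (hypothetical layout): the "special attributes"
// of a keyword plus the task it should perform.
struct Command {
    bool takesArgument;                              // does the next utterance belong to this keyword?
    std::function<void(const std::string&)> action;  // task to run, given that argument
};

// Maps recognized phrases (as text) to commands.
class KeywordIndex {
public:
    void add(const std::string& phrase, Command cmd) {
        commands_[normalize(phrase)] = std::move(cmd);
    }

    // Returns the command for a phrase, or nullptr if the phrase is not indexed.
    const Command* find(const std::string& phrase) const {
        auto it = commands_.find(normalize(phrase));
        return it == commands_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, Command> commands_;
};
```

Each keyword would be registered once with `add`, and every phrase coming back from the recognizer would be passed to `find`. Keeping the actions as `std::function` objects means end users could later attach their own tasks without the index itself changing.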


What I would like to do is create a program that can be used for any language, and of course an app that will allow the end user to trigger predetermined events such as the following:

AS PROGRAM RUNS.
-- user verbally says phrase in index "New Folder"
--- this action triggers the corresponding indexed event to write the next output received to a new folder.
---- for example "New Folder" (5 seconds allowed) "ideas" would create a new folder named Ideas.

-- user verbally says keyword in index "Switch To"
--- this action triggers the corresponding indexed event to switch to a new wildcard folder.
---- for example "Switch To" (5 seconds reserved for speech recording) "Ideas" would result in the program switching to the Ideas folder and creating all of its new output there unless told otherwise.

-- user verbally says keyword "Record"
--- this action triggers the corresponding indexed event to start recording the output as audio and text.
---- for example, from the main directory of the original index, if the verbal commands are
"New Folder" "Ideas" (a new folder named "Ideas" will be created)
"Switch To" "Ideas" (the working directory will change to the folder named "Ideas")
"Record" (the raw speech and text will be written out under a predetermined set of variables, such as a timestamp (date/hour/min/sec), or other variables configurable by the end user)

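The "New Folder" / "Switch To" / "Record" flow above could be modelled as a small state machine. The sketch below is only an illustration under several assumptions: it needs C++17 for `std::filesystem`, it takes utterances that have already been converted to text, it matches keywords exactly, it ignores the 5-second window, and it saves only the text transcript (no audio). `VoiceSession`, `hear` and `timestampName` are hypothetical names, and creating folders directly under the root directory is my own simplification.

```cpp
#include <chrono>
#include <ctime>
#include <filesystem>
#include <fstream>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Build a file name from the current local time, e.g. "2024-01-31_14-05-09.txt".
std::string timestampName() {
    std::time_t now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%d_%H-%M-%S", std::localtime(&now));
    return std::string(buf) + ".txt";
}

class VoiceSession {
public:
    explicit VoiceSession(fs::path root) : root_(std::move(root)), cwd_(root_) {}

    // Feed one recognized utterance (already text) into the session.
    void hear(const std::string& text) {
        switch (state_) {
        case State::Idle:  // waiting for a keyword from the index
            if (text == "New Folder")     state_ = State::AwaitNewFolderName;
            else if (text == "Switch To") state_ = State::AwaitSwitchName;
            else if (text == "Record")    state_ = State::AwaitRecording;
            break;
        case State::AwaitNewFolderName:  // this utterance is the folder name
            fs::create_directories(root_ / text);
            state_ = State::Idle;
            break;
        case State::AwaitSwitchName:     // switch only if the folder exists
            if (fs::is_directory(root_ / text)) cwd_ = root_ / text;
            state_ = State::Idle;
            break;
        case State::AwaitRecording: {    // save the transcript under a timestamp
            lastFile_ = cwd_ / timestampName();
            std::ofstream out(lastFile_, std::ios::app);
            out << text << '\n';
            state_ = State::Idle;
            break;
        }
        }
    }

    const fs::path& currentFolder() const { return cwd_; }
    const fs::path& lastFile() const { return lastFile_; }

private:
    enum class State { Idle, AwaitNewFolderName, AwaitSwitchName, AwaitRecording };
    fs::path root_;      // main directory of the index
    fs::path cwd_;       // folder that new output goes into
    fs::path lastFile_;  // most recently written transcript
    State state_ = State::Idle;
};
```

Calling `hear("New Folder")`, `hear("Ideas")`, `hear("Switch To")`, `hear("Ideas")`, `hear("Record")` and then `hear("some note")` would walk through the same steps as the example. A real app would also need a timeout to return to the idle state if nothing is said.
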
In closing, this is just the beginning. I have more ideas revolving around this than you could shake a stick at. The keywords and phrases I have mentioned are nothing compared to what I have in mind. Support and momentum go a long, long way. Let's get the ball rolling and stir things up a bit, shall we?
*bump*
Voice recognition is an entire field of study, and even the best software is still "flaky" depending on the user (everyone speaks a little differently).

Wanting this to work in "any language" is especially ambitious. I recommend you forget about that entirely for now and pick one language and focus on that before you even consider moving to others.

The problem here is basically in 2 parts:

1) Doing the actual voice recognition
2) Applying the spoken words to do what they want.


If you don't care about #1 and just want to do #2, then look into existing voice recognition libs. There probably are a couple you can work with -- though it's probable that the good ones will not be free. Start small -- like do a speech to text program first, to make sure you are capturing the words correctly. Then actually start applying them to do different things.
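
For part #2, one small first step could be splitting whatever text the recognizer returns into a keyword and the rest of the utterance. The sketch below assumes the recognizer hands back one plain transcript string such as "new folder Ideas"; `splitCommand` and `toLower` are hypothetical helper names, not functions from any speech library.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Lower-case copy of a string, used for case-insensitive matching.
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// If the transcript starts with one of the keywords, return
// (keyword, remainder of the transcript); otherwise return nothing.
std::optional<std::pair<std::string, std::string>>
splitCommand(const std::string& transcript, const std::vector<std::string>& keywords) {
    const std::string lower = toLower(transcript);
    for (const std::string& kw : keywords) {
        const std::string key = toLower(kw);
        // The match has to end on a word boundary, so "recorder" is not "record".
        if (lower.compare(0, key.size(), key) == 0 &&
            (lower.size() == key.size() || lower[key.size()] == ' ')) {
            std::string rest =
                lower.size() == key.size() ? "" : transcript.substr(key.size() + 1);
            return std::make_pair(kw, rest);
        }
    }
    return std::nullopt;
}
```

With the keywords "New Folder", "Switch To" and "Record", the transcript "new folder Ideas" would come back as the pair ("New Folder", "Ideas"). Typing test transcripts in by hand is a cheap way to get this half working before any recognizer is wired in.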



If you do care about #1 (like you want to do this from the ground up), then that's a whole other can of worms. I have no idea how speech recognition actually works, but I'm sure it's a pretty involved subject. The first step I would take would be to read existing literature on the subject. You might be able to find online PDFs or something describing known methods. Try googling for "software speech recognition techniques" or something like that.
Thanks, I am doing a lot of research... I would really love to use Google's speech-to-text API, but I am having a hard time locating it on Google Code.

So after some thought, I guess with as little experience as I have, what I am really trying to do is make a customizable voice recognition app for my phone. Eventually I wouldn't mind releasing it in the app store, but I would want months to write and alpha test it. I have a difficult time with my memory, and I am looking for a voice recorder with a kick of my own customizations for what I can actually control. This would certainly be a good start for me at least.
Doing voice recognition requires significantly more math and computer science education than you can pick up just wandering the net.

On Windows, Microsoft's Speech Recognition software is free, and it's not too difficult to hook (via COM/OLE/whatever they call it these days).

Life's a bit messier for Linux. See http://en.wikipedia.org/wiki/Speech_recognition_software_for_Linux for pointers.