You're missing the point. Speech recognition is a notoriously difficult task. Even superior software, developed by people with millions of dollars to pour into it, typically requires calibration to a particular speaker's voice.
Use a library of some kind. Your application will actually get finished some day if you do.
I assume you are looking to make something like Dragon from Nuance (on a much smaller scale, of course). As others have pointed out, if you are asking how to do it, you don't really have the programming skill to take on such a task right now.
Especially if you want to make something better than what is already out there (like Microsoft Speech Recognition, Dragon, Siri, etc.), all of which had teams of hundreds of highly experienced programmers working on them and budgets of millions of dollars.
We aren't trying to deter you from what you want to do, but we are just being realistic. This is not something that a beginner would be able to accomplish and I doubt even a highly experienced programmer could accomplish it without a massive budget and a big team.
Though if you just want to learn about speech recognition in general, CMU Sphinx might be a good open source library to check out: http://cmusphinx.sourceforge.net/ There is also the Microsoft SAPI: http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx
Be warned, though: you need a pretty good grounding in whatever language you choose to use. If you have that, you can come up with a nice little speech recognition program, but don't expect it to be in the same league as Dragon, Siri and the others.
If you did want to try to do this without a library, it would be tough, but not impossible. This would be done in several sections:
1. Get a stream from a microphone. You need to interface with a driver, use your OS, or a library like OpenAL to start recording from the microphone. You'll get a stream of data.
2. Recognize when to split up your stream. If you can recognize the gap between words, or between sentences, package that data as a single sound clip, pass it on to the next step, and then record your next clip.
3. Convert the sample to text. This is the toughest part and you'll really need to think about how to do this. One way could be to pre-record some words, then calculate a similarity ratio between each recording and the incoming clip. Another way could be to pre-record your words and study the waveforms in a graphical format such as what is offered with Audacity. If you can find patterns for specific syllables, letters, or words, and you can associate those patterns with your stream, then you've got it and you've probably just made $1,000,000.
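To make steps 2 and 3 a bit more concrete, here is a rough sketch in Python (standard library only). It is a toy, not a real recognizer: the function names, the frame size, the energy threshold and the sine-wave "words" are all made up for illustration, and it assumes step 1 has already given you a plain list of float samples. Splitting is done by looking for runs of low-energy frames, and the "similarity ratio" is a plain cosine similarity on the raw samples, which is very sensitive to timing and phase, so real speech would need something far more robust.

```python
import math

def frame_energies(samples, frame_size):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(s * s for s in frame) / frame_size)
    return energies

def split_on_silence(samples, frame_size=100, threshold=0.01, min_silence_frames=3):
    """Step 2: cut the stream into clips wherever enough quiet frames occur in a row.

    frame_size, threshold and min_silence_frames are illustrative guesses,
    not tuned values.
    """
    clips, current, pending = [], [], []
    for i, energy in enumerate(frame_energies(samples, frame_size)):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        if energy >= threshold:
            current.extend(pending)  # keep short pauses that fall inside a clip
            current.extend(frame)
            pending = []
        elif current:
            pending.extend(frame)
            if len(pending) >= min_silence_frames * frame_size:
                clips.append(current)  # long pause: the clip is finished
                current, pending = [], []
    if current:
        clips.append(current)
    return clips

def similarity(a, b):
    """Step 3: cosine similarity of two clips, zero-padded to equal length."""
    n = max(len(a), len(b))
    a = a + [0.0] * (n - len(a))
    b = b + [0.0] * (n - len(b))
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize(clip, templates):
    """Return the label of the pre-recorded template most similar to the clip."""
    return max(templates, key=lambda word: similarity(clip, templates[word]))

def tone(freq, n, rate=8000):
    """Hypothetical stand-in for a recorded word: a pure sine tone."""
    return [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

# Fake "pre-recorded words" and a fake microphone stream with pauses between them.
templates = {"low": tone(200, 800), "high": tone(500, 800)}
silence = [0.0] * 400
stream = silence + tone(500, 800) + silence + tone(200, 800) + silence

clips = split_on_silence(stream)
print([recognize(clip, templates) for clip in clips])
```

With these synthetic tones the stream splits into two clips that match "high" and then "low". Real speech will not line up this neatly: the same word spoken twice differs in length, pitch and volume, which is exactly why step 3 is the hard part.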
Of course, doing it yourself allows you to choose whatever language you want.
Sorry to barge in on this thread guys, but I want to build a castle. Can anyone tell me what to build it out of? I don't want to use bricks, but if I have to, which bricks are the best? Also, I want the castle to be able to transform like Megatron.
Interesting you should mention that, xismn. I'm currently digging a new river, on my own, with a spade. It's going to be roughly the size of the Yangtze, without the use of explosives, machinery or extra manpower. Anyway, it would not be much trouble for me to redirect a bit of that water to give you a moat for your castle. :)