I agree that Vertex Buffer Objects are a good starting point for optimization. They require you to change how you render all of your objects, but it will be worth it in the long run. I'm not going to go too far into details and will instead let you learn about them at locations that have already done a good job explaining how to use them:
That will get you into the basics of how to work with a VBO. Once you get your own little demo of it up and running, go ahead and start making a system to handle your data. As Ogoyant has described, optimal use of a VBO can be a complex thing. I wouldn't worry about it too much until you have your system set up and you know VBOs inside and out.
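To give a rough idea of what "a system to handle your data" could start as, here is a minimal CPU-side sketch. The names `ChunkMesh`, `addQuad`, and the `dirty` flag are my own placeholders, not anything from the tutorials, and it assumes position-only vertices drawn as plain triangles. It only collects the floats; the actual upload needs a live GL context and is described in the comments.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU-side mesh for one chunk of voxels.
// Positions are packed as x,y,z floats, three vertices per triangle,
// so the whole vector can go to the GPU in one upload.
struct ChunkMesh {
    std::vector<float> vertices;
    bool dirty = false;  // true when the VBO no longer matches this data

    // Append one quad (corners in winding order) as two triangles.
    void addQuad(const float (&corners)[4][3]) {
        static const int order[6] = {0, 1, 2, 0, 2, 3};
        for (int i : order) {
            vertices.insert(vertices.end(), corners[i], corners[i] + 3);
        }
        dirty = true;
    }

    std::size_t vertexCount() const { return vertices.size() / 3; }
    std::size_t byteSize() const { return vertices.size() * sizeof(float); }
};

// With a GL context and a buffer id `vbo` (both assumed, not shown here),
// the upload and draw would look roughly like:
//   glBindBuffer(GL_ARRAY_BUFFER, vbo);
//   glBufferData(GL_ARRAY_BUFFER, mesh.byteSize(),
//                mesh.vertices.data(), GL_STATIC_DRAW);
//   glDrawArrays(GL_TRIANGLES, 0, mesh.vertexCount());
```

The point of the `dirty` flag is that you only re-upload a chunk when its voxels actually change, instead of resending geometry every frame.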
After the VBO implementation, I still recommend storing voxel data using octrees and merging faces from there. The gist of it has already been discussed, so I won't go into it again. I'd open a separate thread for use of an octree if you ever decide to try it. As Telion stated, it's not something that's easy to set up, but I still believe that it could provide immense performance gains whenever large regions of the world share the same block type.
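For anyone who wants something concrete to poke at, here is a small sketch of the octree idea. It is just one way to do it, not the approach anyone in this thread has committed to: `VoxelOctree` and its members are made-up names, block types are assumed to fit in a byte with 0 meaning empty, and the world size is assumed to be a power of two. A node is either a uniform leaf or has eight children, and children that end up identical are collapsed back into their parent.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical voxel octree covering the cube [0, size)^3.
class VoxelOctree {
public:
    explicit VoxelOctree(int size) : size_(size), root_(std::make_unique<Node>()) {
        assert(size > 0 && (size & (size - 1)) == 0);  // power of two
    }

    void set(int x, int y, int z, std::uint8_t type) { setIn(*root_, size_, x, y, z, type); }
    std::uint8_t get(int x, int y, int z) const { return getIn(*root_, size_, x, y, z); }
    int leafCount() const { return countLeaves(*root_); }

private:
    struct Node {
        std::uint8_t type = 0;  // meaningful only for leaves; 0 = empty
        std::array<std::unique_ptr<Node>, 8> children;
        bool isLeaf() const { return children[0] == nullptr; }
    };

    // Pick the octant of a node with half-size `half` that contains (x, y, z).
    static int childIndex(int half, int x, int y, int z) {
        return (x >= half ? 1 : 0) | (y >= half ? 2 : 0) | (z >= half ? 4 : 0);
    }

    static void setIn(Node& n, int size, int x, int y, int z, std::uint8_t type) {
        if (n.isLeaf()) {
            if (n.type == type) return;                 // region already has this type
            if (size == 1) { n.type = type; return; }   // single voxel
            for (auto& c : n.children) {                // split a uniform region
                c = std::make_unique<Node>();
                c->type = n.type;
            }
        }
        const int half = size / 2;
        setIn(*n.children[childIndex(half, x, y, z)], half, x % half, y % half, z % half, type);

        // Merge: if all eight children are leaves of one type, collapse them.
        const std::uint8_t first = n.children[0]->type;
        for (const auto& c : n.children) {
            if (!c->isLeaf() || c->type != first) return;
        }
        for (auto& c : n.children) c.reset();
        n.type = first;
    }

    static std::uint8_t getIn(const Node& n, int size, int x, int y, int z) {
        if (n.isLeaf()) return n.type;
        const int half = size / 2;
        return getIn(*n.children[childIndex(half, x, y, z)], half, x % half, y % half, z % half);
    }

    static int countLeaves(const Node& n) {
        if (n.isLeaf()) return 1;
        int total = 0;
        for (const auto& c : n.children) total += countLeaves(*c);
        return total;
    }

    int size_;
    std::unique_ptr<Node> root_;
};
```

The connection to merged faces is that a uniform leaf covering an s×s×s region can be drawn as one big box instead of s³ small cubes, so the leaf count gives a rough feel for how much geometry you would be sending.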
EDIT: Framework - That covers a lot of things that we've been discussing here; the sample pseudo-code will be useful. Thanks.