make video timed to music out of images

I came from a Python background, and installed C++/MinGW/Code::Blocks two days ago. My skill level: I can make Pac-Man in Python/Pygame, but I'm new and fumbly in C++. I want to make music videos playable on a site like YouTube. I make music in ModPlug Tracker and usually save to WAV, then convert to MP3. I now want to make animations play at the same tempo as the music's beats per minute (maybe making a circle's radius increase on every snare hit, things like that), playable on YouTube.

So I looked up the video formats YouTube accepts, and it appears "Matroska" is the most free/open-source/non-proprietary. I sort of half-read this and speed-read the rest:
It still wasn't clear whether you can feed it a long sequence of JPGs or something like that and, ta-da, it becomes a movie. Or maybe Matroska only contains a "video" and "audio" track, not a list of individually manipulable image frames.
I came to C++ because "moviepy" and a lot of things in Python seemed uninstallable and unusable, so that's sort of my experience with packages and whatnot: they don't work (in Python, which seems plagued with "pip" and wheel files), except Pygame. I mean, I have the Matroska specs right there; I should somehow be able to code up my own solution that takes images, a frame rate, and an MP3 and turns them into a movie, if I'm not hopeless? Or I could try some sort of C++ package that does this already, and be dependent on it and not self-sufficient, like a domesticated chicken, if it's humanly possible to install and use it.
You said you're new to C++. Why don't you buy a book, or learn from this website's tutorial? C++ is a hard language to master.
I wrote a live music visualizer sometime last year, crammed into a tiny microcontroller. The project ended up in an effects pedal for use while performing.

Obviously, the video signal my visualizer emits is quite rudimentary, quite unlike most useful HD video codecs. But the idea is the same.

In order to pack output data into a real video, an external tool is your best bet. I suggest ffmpeg, which I think is indispensable when prototyping signal-processing stuff. In this case, you could use ffmpeg and your program in a pipeline like this:
<music-file to audio frames> | <audio frames to image data> | <image data to video>
Where the first and last components are handled by ffmpeg, leaving your code the task of analysing the audio frames on stdin and emitting image frames to stdout.
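To make that middle stage concrete, here is a minimal C++ sketch of the kind of thing your program could do. Everything here is my own invention, not part of any library: the sample rate, frame rate, loudness-to-radius mapping, and function names are all assumptions you would tune to your song.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Root-mean-square amplitude of one block of 16-bit PCM samples,
// normalized to 0.0..1.0. One block would be the samples covering one
// video frame (e.g. 44100 / 30 = 1470 samples at 44.1 kHz and 30 fps).
double rms(const std::vector<int16_t>& samples) {
    if (samples.empty()) return 0.0;
    double sum = 0.0;
    for (int16_t s : samples) {
        double v = s / 32768.0;  // scale to -1.0..1.0
        sum += v * v;
    }
    return std::sqrt(sum / samples.size());
}

// Map loudness to a circle radius in pixels (arbitrary mapping I made
// up: quiet -> small circle, loud -> big circle).
int radius_for(double loudness, int max_radius) {
    return static_cast<int>(loudness * max_radius);
}

// Fill a grayscale frame (w*h bytes, one byte per pixel) with a white
// circle of the given radius centered in the frame; the rest is black.
std::vector<uint8_t> draw_circle(int w, int h, int radius) {
    std::vector<uint8_t> frame(static_cast<size_t>(w) * h, 0);
    int cx = w / 2, cy = h / 2;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if ((x - cx) * (x - cx) + (y - cy) * (y - cy) <= radius * radius)
                frame[static_cast<size_t>(y) * w + x] = 255;
    return frame;
}
```

A `main()` would then read 1470-sample blocks from stdin and write each frame's bytes to stdout, with ffmpeg on both ends of the pipe, something like `ffmpeg -i song.mp3 -f s16le -ac 1 -ar 44100 - | visualizer | ffmpeg -f rawvideo -pix_fmt gray -s 320x240 -r 30 -i - out.mkv` (flags from memory; double-check against the ffmpeg docs).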

This is basically how I prototyped my music visualizer last year, except the pipeline I used looked somewhat like
<music-file to raw audio frames> | <audio frames to output frames> | <output visualizer>
Where the first program was ffmpeg and the last two were written by me.

I'm happy to answer any questions, seeing as I've apparently done a similar project before.
You didn't have to edit out the "today I learned", ha ha, I thought it was funny. So in Python I had Pygame to create and save image files (ideally I should learn C++'s SDL and start doing everything in C++ instead of Python, but I'm going to have to spend lots of time reading the beginner forum here before doing that, to de-n00b). Then I make songs in ModPlug Tracker; say one might be 128 rows at 8.3333 rows per second. I could have a program draw a monster face to an image, and then on every bass-drum hit the eye radius gets larger, or maybe on every sample-4 note the monster's hair gets fuzzier, something weird like that, so the whole image animates to the music. I can currently create the images in Python/Pygame and then use GIMP to save them as a GIF, but that's not playable on YouTube and has no music.

Well, I was just trying to use "ffmpeg" in Python, and it didn't work, and I got so sick of being unable to make things work in Python that I switched to C++ to see if it's better. Maybe just to start, as a test: say I have a folder with a black filled image and a white filled image, and I want to make a video go in the pattern black, white, white, white, white, white, white, white (black hits every 8th frame), timed to a song that runs at 8.3333333 rows/second. Maybe I could use Matroska, and someone else said they successfully used AVI. (I mean, I'm fumbling around blindly; if I choose a wrong format, too much secret proprietary code or stuff that's too confusing might make it impossible, and ffmpeg seemed to fall into that camp.) Or I guess ffmpeg is another possibility; maybe it's the easiest one after I get past the hurdle of actually making it work and understanding it. I don't know if you call ffmpeg within your program or use the Windows command prompt, etc., or what all the little "-r" or whatever parameters do, and I read the alienspeak, non-noob-friendly ffmpeg page. All that did was make me ask "how do you use ffmpeg?".
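The timing arithmetic for that black/white test is small enough to sketch. This is only an illustration under assumptions of my own (an output frame rate of 30 fps, and "black" landing on every row divisible by 8); the function names are made up, and the 8.3333 rows/second comes from the post above.

```cpp
#include <cassert>

// Which tracker row is playing at video frame n?
// rows_per_sec comes from the song (8.3333 in the example above);
// fps is whatever frame rate you tell the encoder (30, say).
int row_at_frame(int frame, double fps, double rows_per_sec) {
    double seconds = frame / fps;
    return static_cast<int>(seconds * rows_per_sec);
}

// The black-every-8th pattern: true -> show the black image for this
// video frame, false -> show the white one.
bool show_black(int frame, double fps, double rows_per_sec) {
    return row_at_frame(frame, fps, rows_per_sec) % 8 == 0;
}
```

A render loop would call `show_black` for each frame number and hand the chosen image to whatever does the encoding; the same row-lookup trick works for snare hits or any other per-row event.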
So you're trying to programmatically create a video with music that can then be uploaded to a place like YouTube, right? +1, good question, seems like fun. And by fun I mean living hell if you're trying to figure out the video format from scratch. But luckily, you probably don't have to.

I don't have much experience with this, but I hope your question gets more attention.

I would check out Matroska on GitHub to look at their libraries that can parse/write .mkv files.
That's just a guess of a suggestion, though, because I haven't used their parsing utilities. I skimmed through the code, and some parts of their interface seem very complicated, even for a non-beginner.

I have used ffmpeg to roughly stitch still animation frames together to make a video.
I believe I started by searching the question on Stack Overflow and got a result probably similar to this question:
so I would check that link out.

The results aren't always pretty: unless you use the right settings, the video tends to look atrocious when YouTube re-encodes it. But I was probably just using bad settings, so experiment with different options until you get what you like. The other problem is that storing those images, especially if we're working in the 30+ fps range, can really add up for high-def videos. It's also slow to work with, since you have to write each image to a file before calling ffmpeg (at least the way I was doing it).
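For the write-each-frame-to-a-file approach, one low-effort option is binary PPM (P6), a format simple enough to emit by hand with no image library: a short text header followed by raw RGB bytes. A sketch (the function name and error handling are my own; nothing here comes from ffmpeg):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Write one RGB frame as a binary PPM (P6): "P6\n<w> <h>\n255\n"
// followed by w*h*3 raw RGB bytes. Returns false if pixels has the
// wrong size or the file can't be opened.
bool write_ppm(const std::string& path, int w, int h,
               const std::vector<uint8_t>& pixels) {
    if (pixels.size() != static_cast<size_t>(w) * h * 3) return false;
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    std::fprintf(f, "P6\n%d %d\n255\n", w, h);
    std::fwrite(pixels.data(), 1, pixels.size(), f);
    std::fclose(f);
    return true;
}
```

You would name the files frame0001.ppm, frame0002.ppm, and so on, then stitch them with something like `ffmpeg -framerate 30 -i frame%04d.ppm -i song.mp3 -pix_fmt yuv420p out.mp4` (flags from memory, so double-check them).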

Professional video-creating software like Adobe Premiere uses multi-core rendering and GPU acceleration, which I have no idea is possible with ffmpeg. It might be.

And then there are more commands in ffmpeg to mux in an audio channel or subtitles, I'm almost positive (but I haven't used those commands).

<audio frames to image data> | <image data to video>

So does that mean you had to write each image frame to file in order to stitch it together with ffmpeg? Is that the standard way of doing this? Or did you do it in some memory-only way similar to how Windows Media Player has its visualizer?
I don't know if you call ffmpeg within your program or use the windows command prompt etc

Either. There's a command line tool, which I was talking about, but there's also a set of libraries if you prefer to use the API directly.

My suggestion should allow you to focus on processing the audio, without having to worry about things like the input or output formatting. You can use a tool of your choice to convert the input of your program to something easy to process, and convert the output of your program from something easy to emit to something you can publish. This is the basis of my suggestion - there's no need to do everything yourself, or all in one step.

So does that mean you had to write each image frame to file in order to stitch it together with ffmpeg?

In my case, I didn't write frames to disk; the output control signal was instead sent directly to hardware.

Regardless, each video frame can be generated in some raw format (or any one of many supported formats) and sent through a shell pipeline directly into the ffmpeg utility. There's no need to touch the disk. Of course, if you use the APIs, a pipe isn't necessary; I'm just assuming the command-line utilities keep things simpler.
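The no-disk version is even simpler than the PPM one, because raw video has no per-frame header at all: the encoder is told the width, height, pixel format, and frame rate on its command line, so the bytes alone suffice. A sketch, with made-up names and a solid-color frame standing in for real drawing code:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <vector>

// Build one solid-color RGB24 frame (w*h*3 bytes); a stand-in for
// whatever actually draws your visuals.
std::vector<uint8_t> solid_frame(int w, int h,
                                 uint8_t r, uint8_t g, uint8_t b) {
    std::vector<uint8_t> px(static_cast<size_t>(w) * h * 3);
    for (size_t i = 0; i < px.size(); i += 3) {
        px[i] = r; px[i + 1] = g; px[i + 2] = b;
    }
    return px;
}

// Send one raw frame down a pipe (stdout, when piping into ffmpeg).
// No container, no compression. Returns true if fully written.
bool emit_frame(std::FILE* out, const std::vector<uint8_t>& rgb) {
    return std::fwrite(rgb.data(), 1, rgb.size(), out) == rgb.size();
}
```

A `main()` would loop calling `emit_frame(stdout, ...)` once per frame, and you'd run something like `generator | ffmpeg -f rawvideo -pix_fmt rgb24 -s 320x240 -r 30 -i - out.mkv` (again, check the exact flags against ffmpeg's documentation).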

The extent to which ffmpeg supports hardware acceleration depends on its configuration - some details here: