They're point plots of a sound wave.
Imagine a grid.
The top of the grid is 32767 (the largest value a signed 16-bit sample can hold).
The bottom of the grid is -32768 (the smallest).
The center line of the grid is 0.
The left side of the grid is 0 (the very first sample).
The right side of the grid is boundless (how far it goes depends on how long the tune plays).
The X axis is time.
The Y axis is the output level at that particular point in time.
Each sample plots a point on that grid. The X coord is the sample's index (which sample it is), and the Y coord is the sample's actual value.
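If it helps to see that as code, here's a rough sketch in Python. The function name sample_points and the 44100 samples-per-second rate are my own assumptions for illustration (your audio may use a different rate); all it does is pair each sample with its X position and the time that position corresponds to.

```python
# Assumptions: signed 16-bit samples, 44100 samples per second.
SAMPLE_RATE = 44100
SAMPLE_MIN, SAMPLE_MAX = -32768, 32767

def sample_points(samples, sample_rate=SAMPLE_RATE):
    """Return (index, time_in_seconds, value) for each sample."""
    points = []
    for index, value in enumerate(samples):
        # Anything outside the grid's top/bottom can't be a 16-bit sample.
        if not SAMPLE_MIN <= value <= SAMPLE_MAX:
            raise ValueError(f"sample {index} is outside the 16-bit range: {value}")
        # X is the sample index; dividing by the rate converts it to seconds.
        points.append((index, index / sample_rate, value))
    return points

for index, seconds, value in sample_points([0, 1, 2, 2, 2, -1, -3]):
    print(f"x={index} (t={seconds:.6f}s)  y={value}")
```

The index is the X coord and the value is the Y coord; the time in seconds is just the index divided by however many samples are played per second.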
A simple example:
If you have the samples 0, 1, 2, 2, 2, -1, -3, it might look like this:
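You can draw it yourself with a few lines of code. This is only a rough sketch in Python (plot_samples is a name I made up, and it only shows the rows the samples actually touch rather than the full -32768 to 32767 grid). It prints one column per sample, with a star where each point lands and dashes along the center line.

```python
def plot_samples(samples):
    """Return a text grid: one column per sample, one row per value."""
    # Always include the center line (0), even if no sample sits on it.
    top = max(max(samples), 0)
    bottom = min(min(samples), 0)
    rows = []
    for y in range(top, bottom - 1, -1):
        cells = []
        for value in samples:
            if value == y:
                cells.append("*")   # the plotted point for this sample
            elif y == 0:
                cells.append("-")   # the center line
            else:
                cells.append(" ")
        rows.append((f"{y:>3} " + " ".join(cells)).rstrip())
    return "\n".join(rows)

print(plot_samples([0, 1, 2, 2, 2, -1, -3]))
```

For those seven samples it prints:

```
  2     * * *
  1   *
  0 * - - - - - -
 -1           *
 -2
 -3             *
```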
Connect the dots and it forms a sound wave. This translates roughly into the physical motion that the speakers will perform in order to recreate the sound.
As for how it can contain multiple tones, that's a whole other topic. I can get into the basics if you're really interested.