Sound and time are inextricably linked.
Like motion and time, sound/hearing and time are inseparable; there is no one without the other. Sound only exists because of the passage of time. Without time – and by extension motion [1] – sound could not exist. We have no mechanism for perceiving sound independently of time. This is unlike vision, where we can view a photograph, or a still frame from a film, independent of time. We can slice time into frozen moments of sight, but not of sound.
The thing we see need not be in motion for us to view it; not so the thing we hear.
In essence, our perceptual systems are ‘difference detectors’ which continually respond to changes in the stimuli to which they are addressed. Due to neural adaptation, no change means nothing to perceive; if there’s no perceptible differentiation, there is by definition no percept.
One of the physiological differences between visual and aural perception is that the eye itself makes small, rapid movements to maintain visibility of fixed (stationary) objects. These small ‘fixational’ movements help refresh neural stimulation, without which we would not be able to see motionless objects (assuming the viewer is also stationary) [2].
In other words, through this built-in mechanism for ‘refreshing’ neural activity, the visual system itself provides a source of constant differentiation enabling visual perception of non-moving objects. The ear has no such mechanism, which means that the ears must receive continual (meaning continually changing) external stimulation; no change means no sound.
Sound perception requires change/differentiation, which requires time. Hence, we can see motionless (stationary) things but we cannot hear motionless things.
There is no aural analog of the photograph or still frame from a film. Sound is, at its core, inherently a time-dependent phenomenon. In this sense, at least, ‘phonography’ [3] is a poor analogy to photography. Music and sound art are essentially time arts in ways that even film and video are not.
If, as Cage asserted [4], the proper basis for structure in music is not pitch, timbre, and harmony, but rather sound, silence, and duration, this is actually nothing more than an affirmation of the underlying condition of aural perception: regardless of the nature of a sound itself (or lack thereof), duration is the fundamental principle enabling its perception.
Time, motion, sound…and listening. More on listening later…
1. Current understandings in physics posit that motion, time, and space are inseparable. See, for example, <http://plato.stanford.edu/entries/newton-stm/>, <http://en.wikipedia.org/wiki/Absolute_time_and_space>, and <http://fqxi.org/data/forum-attachments/Relation_between_Time_Space_And_Motion_Sorli__2009.pdf>.
2. See, for example, <http://www.neuralcorrelate.com/smc/files/publications/martinez-conde_et_al_nrn04.pdf> and <http://www.pnas.org/content/108/39/E765.full>.
3. See, for example, <http://en.wikipedia.org/wiki/Field_recording> and <http://www.phonography.org/>.
4. See, for example, John Cage, “Forerunners of Modern Music,” in Silence, Wesleyan University Press, 1961, and <http://www.kim-cohen.com/Assets/CourseAssets/Texts/Kahn_Cage-Silence%20and%20Silencing%20(1997).pdf>.