ALC (Alternate levels clustering) constitutes the “heart” of the project’s contribution. This page offers an online demo and, below, a conceptual presentation of the algorithm.
The principle of ALC comes from the observation that audio scenes can be approached simultaneously from two complementary viewpoints.
The first is that of their sequential organization: following the temporal axis, sound fragments coalesce into perceptual objects according to perceptual rules of proximity, continuity, closure, etc. This is a “short-sighted” viewpoint that has the benefit of being capable of taking into account fine-grained properties of objects.
The second corresponds to the relationships that appear between objects by virtue of their similarities and dissimilarities, independently of their temporal location, throughout the scene or even across scenes. We call this axis “conceptual” since it builds generalizations from individual objects, and is not meant to account for detailed object properties.
It is worth noting that these two axes roughly correspond to the “syntagmatic” and “paradigmatic” dimensions of linguistic analysis.
ALC considers these two points of view jointly in order to overcome the limitations of each. Because of its limited field of view and the simplicity of the rules it considers, the temporal approach cannot build complex objects on its own. The conceptual approach, by “abstracting” many individual objects into a limited number of classes, makes it possible to identify recurring patterns in a scene, patterns whose stability justifies combining their constituent elements into higher-level complex objects. Conversely, by building more complex objects along the temporal axis, we allow the conceptual analysis to work on a reduced number of better-defined objects, which brings even more complex patterns within its reach.
Rather than tackling the difficult problem of jointly optimizing the temporal and conceptual criteria defined above, we adopt an alternating strategy (which gives the algorithm its name): objects are built alternately on one axis, then the other, in order to produce an increasingly high-level representation of the studied scene.
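The alternation can be illustrated with a toy sketch in Python. This is not the ALC implementation, only a simplified stand-in under stated assumptions: the conceptual step is reduced to a hypothetical `classify` function that maps each object to a class label, and the temporal step is reduced to merging the most frequent pair of adjacent classes, in place of the perceptual grouping rules described above. All function and parameter names are illustrative.

```python
from collections import Counter


def conceptual_step(objects, classify):
    """Abstract each object into a class label, ignoring its position in time."""
    return [classify(obj) for obj in objects]


def temporal_step(labels, min_count=2):
    """Merge occurrences of the most frequent adjacent pair of labels.

    A pair that recurs at least `min_count` times is treated as a stable
    pattern, and its two elements are combined into one compound object.
    Returns the new sequence and a flag telling whether anything was merged.
    """
    pairs = Counter(zip(labels, labels[1:]))
    if not pairs:
        return labels, False
    (left, right), count = pairs.most_common(1)[0]
    if count < min_count:
        return labels, False
    merged, i = [], 0
    while i < len(labels):
        if i + 1 < len(labels) and labels[i] == left and labels[i + 1] == right:
            merged.append((left, right))  # compound, higher-level object
            i += 2
        else:
            merged.append(labels[i])
            i += 1
    return merged, True


def alternate(scene, classify, max_levels=10):
    """Alternate conceptual and temporal steps until no pattern recurs."""
    objects = list(scene)
    for _ in range(max_levels):
        labels = conceptual_step(objects, classify)
        objects, changed = temporal_step(labels)
        if not changed:
            break
    return objects


def classify(obj):
    """Toy classifier (assumption): raw values are split by a threshold,
    compound objects and existing labels are kept as their own class."""
    if isinstance(obj, (tuple, str)):
        return obj
    return "low" if obj < 600 else "high"


# A scene of eight alternating feature values (e.g. frequencies in Hz).
scene = [440, 880] * 4
print(alternate(scene, classify))
# [(('low', 'high'), ('low', 'high')), (('low', 'high'), ('low', 'high'))]
```

On this scene, the first pass abstracts the eight events into two classes and merges each recurring low/high pair. The second pass then works on four identical compound objects and merges them two by two, which shows how each axis feeds the other with fewer, better-defined objects.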