Information site for ANR project HOULE


Download: https://bitbucket.org/mlagrange/simscene

SimScene is a tool dedicated to the thorough evaluation of analysis systems, allowing the user to easily generate a large collection of soundscapes from a generic high-level description, such as “one minute of urban hubbub with a pedestrian walking nearby and cars driving every 10 seconds on average”. Events and textures are stored as several instances in the tool’s sound database, so repeated occurrences will not sound exactly the same. The scene description formalism allows one to specify the energy and positioning of sounds, as well as the amount of randomness to introduce into them. In addition to the audio file corresponding to the scene, SimScene generates a precise and reliable ground-truth description that can be used for rigorous evaluation, as well as graphical representations such as the colored spectrogram below.

Annotated spectrogram generated by SimScene. The scene features the sound of sea waves in the background (light green), splashing noises (red) and seagull calls (blue).


To allow users to create sound environments, a generative model was designed that takes morphological and perceptual considerations into account. In this model, the basic elements are classes of sounds rather than individual sounds: a class of sounds is a collection of semantically identical sounds, such as the classes “passing-car” or “passing-scooter”. To ease the creation of soundscapes, the simulator adopts a perceptually inspired distinction between the sounds populating the sonic world, which may be summed up by Nelken and de Cheveigné’s characterization of soundscapes as “a skeleton of events on a bed of texture”. The distinction between so-called sound events and sound textures is perceptually motivated, as several studies point out that these two types of sounds engage two distinct cognitive processes. Roughly speaking, short and salient sounds are treated as events (“car passing”, “male yelling”, “bird singing”), while long and amorphous sounds are treated as textures (“wind”, “rain”, “street hubbub”, “urban square hubbub”).
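The class-based model above can be sketched as follows. This is an illustrative Python sketch only (SimScene itself is a Matlab tool), and the database entries, field names and file names are all hypothetical; it merely shows the idea of classes as collections of interchangeable recordings, tagged as events or textures, with a random instance drawn for each occurrence.

```python
import random

# Hypothetical sound database: each class is a collection of semantically
# identical recordings, tagged as a short, salient "event" or a long,
# amorphous "texture". All names below are invented for illustration.
SOUND_DB = {
    "passing-car":   {"kind": "event",   "instances": ["car_01.wav", "car_02.wav", "car_03.wav"]},
    "bird-singing":  {"kind": "event",   "instances": ["bird_01.wav", "bird_02.wav"]},
    "street-hubbub": {"kind": "texture", "instances": ["hubbub_01.wav", "hubbub_02.wav"]},
}

def draw_instance(class_name, rng=random):
    """Pick one recording at random so repeated occurrences never sound identical."""
    return rng.choice(SOUND_DB[class_name]["instances"])

def is_event(class_name):
    """True for event classes, False for texture classes."""
    return SOUND_DB[class_name]["kind"] == "event"
```

Because `draw_instance` samples from the whole collection, two occurrences of “passing-car” in the same scene will generally use different recordings, which is what keeps repeated events from sounding exactly the same.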

For all sounds, the following parameters can be controlled:

  • energy (specifying average value and variance)
  • start time (time when it appears in the scene)
  • end time (in the scene)
  • fade in time
  • fade out time

In addition, sound events can be repeated, in a manner controlled by an extra parameter:

  • repeat period (specifying average value and variance)
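One plausible way to picture how the parameters listed above combine is sketched below in Python (SimScene’s actual Matlab interface surely differs, and every field name here is an assumption): a per-sound specification holding energy statistics, start/end times, fades and an optional repeat period, plus the linear gain envelope those fade parameters imply.

```python
from dataclasses import dataclass

@dataclass
class SoundSpec:
    # Hypothetical field names mirroring the parameters listed in the text.
    energy_mean: float       # average energy
    energy_var: float        # variance of the energy
    start: float             # time when the sound appears in the scene (s)
    end: float               # time when it disappears (s)
    fade_in: float           # fade-in duration (s)
    fade_out: float          # fade-out duration (s)
    repeat_mean: float = 0.0 # events only: average repeat period (s)
    repeat_var: float = 0.0  # events only: variance of the repeat period

def gain_at(spec, t):
    """Linear fade-in/fade-out envelope implied by start/end/fade parameters."""
    if t < spec.start or t > spec.end:
        return 0.0
    if t < spec.start + spec.fade_in:
        return (t - spec.start) / spec.fade_in       # ramping up
    if t > spec.end - spec.fade_out:
        return (spec.end - t) / spec.fade_out        # ramping down
    return 1.0                                       # sustained portion
```

For example, a sound with `start=2`, `end=10` and two-second fades is silent before t=2, at half gain at t=3, at full gain between t=4 and t=8, and fades back out by t=10.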

Given the randomness in the parameters, and the fact that actual sound instances are always selected at random from the collections in the database that match the selection criteria, a single scene description can be used to produce a potentially unlimited number of scene variants, making SimScene a powerful tool for generating evaluation benchmarks.
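The variant-generation idea can be made concrete with a small Python sketch (again an illustration, not SimScene’s actual code): the average repeat period and its spread define a distribution of onset times, so two renderings of the same description, such as “cars every 10 seconds on average over one minute”, yield different but statistically equivalent scenes.

```python
import random

def sample_occurrences(start, scene_end, period_mean, period_std, rng):
    """Sample onset times for a repeated event: each successive period is
    drawn around the average, so every realization of the scene differs."""
    times, t = [], start
    while t < scene_end:
        times.append(round(t, 3))
        # Gaussian jitter around the mean period, clamped to stay positive.
        t += max(0.1, rng.gauss(period_mean, period_std))
    return times

# Two variants drawn from the same high-level description.
variant_a = sample_occurrences(0.0, 60.0, 10.0, 2.0, random.Random(1))
variant_b = sample_occurrences(0.0, 60.0, 10.0, 2.0, random.Random(2))
```

Both variants place roughly six car passages in the minute, but at different instants; since the exact onsets are known to the generator, the ground truth for each variant is exact by construction.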

Download the SimScene Matlab tool, with user documentation: https://bitbucket.org/mlagrange/simscene