ISiS: Ircam Singing Synthesis

ISiS is a command line application for singing synthesis: it generates a singing signal from a melody and lyrics.
ISiS is the result of the French national project ChaNTeR (Project ANR-13-CORD-011), which was carried out in collaboration with Acapela, LIMSI, and Dualo. It is distributed free of charge to members of the IRCAM Forum.

ISiS performs offline synthesis: it reads the score and lyrics from data files and renders the result into an output sound file. The current version supports synthesis with three French singing voices:

  1. RT: a male tenor pop singer,
  2. MS: a female mezzo-soprano pop singer, and
  3. EL: a female soprano lyrical singer.

Installation

To work with ISiS, both the software and the singing voices need to be installed. Please see the detailed instructions for installing the command line application and the singing voices in the following section.

The ISiS command line

First steps

Once the installation steps are complete, you are ready for the first synthesis. Open your terminal application and type the following command (the > character represents the shell prompt and should not be typed):

> isis.sh -v

If all went well, the ISiS version is displayed in the terminal:

ISiS version::1.2.7
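The version check can also be scripted, for example at the start of a batch job. The following is a minimal sketch that assumes the version line has exactly the form shown above; parse_isis_version is a hypothetical helper and not part of ISiS.

```shell
# Extract the version number from a line such as "ISiS version::1.2.7".
# parse_isis_version is a hypothetical helper, not part of ISiS.
# Prints nothing if no line of the input matches the expected form.
parse_isis_version() {
    printf '%s\n' "$1" \
        | sed -n 's/^ISiS version::\([0-9][0-9.]*\).*$/\1/p' \
        | head -n 1
}

# Usage (requires isis.sh to be installed and on the PATH):
#   version=$(parse_isis_version "$(isis.sh -v 2>&1)")
#   if [ -n "$version" ]; then echo "found ISiS $version"; fi
```

An empty result means that the command did not print a recognisable version line, which is the error case described next.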

If you receive an error message instead, at least one of the steps described so far has not been performed correctly. Please check all of them, and if you cannot find the problem, contact IRCAM Forum support, sending the following pieces of information:

  1. The error message you received
  2. The content of the .bashrc, .bash_profile, .tcshrc, and .cshrc files you modified
  3. The output you receive when you type
> echo $SHELL

in your terminal.
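The three items can be gathered into a single text report before contacting support. The sketch below is only an illustration: collect_isis_diagnostics is a hypothetical helper, not part of ISiS, and it simply prints the value of SHELL and the content of those shell startup files that exist. The error message itself still has to be copied by hand.

```shell
# Print the information requested by support as one text report.
# collect_isis_diagnostics is a hypothetical helper, not part of ISiS.
# Optional argument: the directory holding the startup files (default: $HOME).
collect_isis_diagnostics() {
    home_dir="${1:-$HOME}"
    echo "SHELL=${SHELL:-unknown}"
    for rc in .bashrc .bash_profile .tcshrc .cshrc; do
        if [ -f "$home_dir/$rc" ]; then
            echo "----- $rc -----"
            cat "$home_dir/$rc"
        else
            echo "----- $rc (not present) -----"
        fi
    done
}

# Usage: collect_isis_diagnostics > isis_report.txt
```

Startup files can contain private settings, so it is worth reading the report before sending it.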

Synthesising the default song

The default song is a short extract from the French song Les feuilles mortes. We will first experiment a little with singing synthesis on the command line, and then describe in more detail the parameters you can manipulate in the score cfg file.

As a first trial please run the command

> isis.sh -o defsong.wav

After about 20 seconds (the time depends on the power of your computer) and a long list of cryptic output, the command prompt will reappear. The last lines should display

#################################################
PaN voiced synthesis
#################################################
#################################################
PaN unvoiced synthesis
#################################################
#################################################
apply post-processing treatments
#################################################
Create: defsong.wav using wav format!
=======================================
computed in 28.277118921279907s

This means that all went well, and you can now listen to the song by running

> open defsong.wav

in the terminal. This opens the default application for wav sound files so that you can listen to the result. In case you installed MS as default voice you should get defsong_MS.wav
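When ISiS is run from a script, the closing log lines shown above can be used to confirm that a sound file was written. The sketch below assumes that the log ends with a "Create:" line and a "computed in" line in exactly the format shown, and that output file names contain no spaces; summarize_isis_log is a hypothetical helper, not part of ISiS.

```shell
# Summarise an ISiS log read from standard input.
# Prints the created file and the computation time, or returns a
# non-zero status if no "Create:" line is found.
# summarize_isis_log is a hypothetical helper, not part of ISiS.
summarize_isis_log() {
    awk '
        /^Create: /     { file = $2 }
        /^computed in / { secs = $3; sub(/s$/, "", secs) }
        END {
            if (file == "") { print "no output file reported"; exit 1 }
            printf "created %s in %.1f s\n", file, secs
        }
    '
}

# Usage (assumption: the log is written to stdout and/or stderr):
#   isis.sh -o defsong.wav 2>&1 | tee isis.log | summarize_isis_log
```

For the log excerpt above, this prints "created defsong.wav in 28.3 s".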

If you use another singing voice as default, the result may sound a little strange. The score is in fact written for soprano voices, and the result suffers from the required transpositions, especially if you use RT as singing voice. Depending on the default voice you have configured, you should therefore adapt the command line to transpose the default melody so that it better matches the singing voice.

For MS the database is recorded at 315 Hz (approx. MIDI note 63), which is slightly below the average note frequency of the default song. For RT the database is recorded at 150 Hz (approx. MIDI note 50). To adapt the average note pitch to the voice, you can use the command line flag --global_transp, which transposes the melody by the given number of half tone steps. For RT you get good results by lowering the melody by 13 half tones as follows

> isis.sh --global_transp -13 -o defsong_RT.wav

which results in defsong_RT.wav

Finally, for EL the database is recorded at about 440 Hz (MIDI note 69), and to get a good result you can transpose the melody upwards by 2 half tones

> isis.sh --global_transp 2 -o defsong_EL.wav

which results in defsong_EL.wav
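The MIDI note numbers quoted for the three databases follow from the standard equal-temperament relation midi = 69 + 12 * log2(f / 440), where 440 Hz corresponds to MIDI note 69. The sketch below only illustrates this conversion; hz_to_midi is a hypothetical helper, not part of ISiS, and it does not derive the transposition values recommended above, which also depend on the pitch range of the score.

```shell
# Convert a frequency in Hz to a (fractional) MIDI note number,
# using midi = 69 + 12 * log2(f / 440).
# hz_to_midi is a hypothetical helper, not part of ISiS.
hz_to_midi() {
    awk -v f="$1" 'BEGIN { printf "%.1f\n", 69 + 12 * log(f / 440) / log(2) }'
}

# Recording pitches of the RT, MS and EL databases as given in the text.
for hz in 150 315 440; do
    echo "$hz Hz -> MIDI $(hz_to_midi "$hz")"
done
```

The results (50.4, 63.2 and 69.0) round to the note numbers 50, 63 and 69 given in the text.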

If you have set the ISIS_CORPORA environment variable, you can select the singing voice simply by naming the corresponding subdirectory of the ISIS_CORPORA folder. To select the MS voice you would run the synthesis as follows:

> isis.sh -sv MS -o defsong_MS.wav
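To render the default song with all three voices, the voice selection can be paired with the transpositions suggested earlier in this section. The sketch below only prints the command lines so that they can be checked before running them. Combining -sv and --global_transp in one call is an assumption, since the text shows each flag only on its own, and isis_cmd_for_voice is a hypothetical helper, not part of ISiS.

```shell
# Print the isis.sh command line for one voice, using the transpositions
# suggested in the text for the default song (RT: -13, EL: +2, MS: none).
# isis_cmd_for_voice is a hypothetical helper, not part of ISiS.
# Assumption: -sv and --global_transp can be combined in one call.
isis_cmd_for_voice() {
    voice="$1"
    case "$voice" in
        RT) transp="--global_transp -13 " ;;
        EL) transp="--global_transp 2 " ;;
        MS) transp="" ;;
        *)  echo "unknown voice: $voice" >&2; return 1 ;;
    esac
    echo "isis.sh -sv ${voice} ${transp}-o defsong_${voice}.wav"
}

# Dry run: show the three command lines without executing them.
for v in RT MS EL; do
    isis_cmd_for_voice "$v"
done
```

Once the printed lines look right, they can be pasted into the terminal one at a time.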

Automagically open synthesized sounds

Before discussing the different options you can select on the ISiS command line, here is a final tip that simplifies inspecting the synthesis results.

If you add the -a flag to the command line

> isis.sh -sv MS -o defsong_MS.wav -a

the synthesised sound will be loaded into the AudioSculpt application, together with the target pitch contours and the phoneme locations in the synthesised sound, creating the following AudioSculpt window

MS default song displayed in AudioSculpt


If you don’t have the AudioSculpt application installed, you can use the -O flag for a similar purpose: it opens the synthesized sound file in your default application for playing sound files.

> isis.sh -sv EL -o defsong_EL.wav -O

A complex example

To demonstrate the sound quality that can be obtained with ISiS, we use here a fully synthetic extract from the opera I.D. produced by Arnaud Petit

Synthetic mockup from the opera

All command line arguments

For a discussion of all command line arguments please read

Score files

Having covered the basic options available on the command line, we now turn to the central control of a singing synthesis system: the score. The score gathers all the basic melodic and lyric parameters of a singing performance. This comprises the sequence of notes to be played, the tempo, the sequence of phonemes to be sung, and the note dynamics.

For an in depth discussion of the representation of singing scores in ISiS please read

Advanced configuration files

TODO

Manual adaptation of generated parameter contours

TODO

Credits

Particular thanks for contributions go to

  • Luc Ardaillon, who worked on the ISiS software during his PhD thesis and developed a large part of the software and the singing style models,
  • Marlene Schaff, Raphaël Treiner, and the other singers, for contributing their voices,
  • Acapela, for contributing the annotation of the singing corpora,
  • all participants of the ChaNTeR project, for valuable discussions throughout the project.