ACAMODIA project



ACAMODIA aims to design a computational dialogue model for Embodied Conversational Agents (ECAs) in Ambient Intelligence (AmI) environments, in order to improve child-agent interaction. ACAMODIA provides a generic software platform dedicated to dialogue analysis (transcription, annotation and semi-automatic extraction of dialogue patterns), and thus to the study of children's interactions with a software agent. The chosen application field is interactive narration.


Dialogue modelling; Knowledge extraction; Embodied Conversational Agent; Narration



Method and material



The method proposed to model the dialogue uses the following steps:

  1. collecting and digitizing a corpus of dialogues, either in an audio format or in a video format able to encode multimodal information. The corpus considered here consists of storytelling sessions by parents to their children;
  2. transcribing the corpus, which produces raw data with various levels of detail (speaking slots, utterances, onomatopoeia, pauses, ...) depending on the phenomena and characteristics that the model must exhibit;
  3. extracting knowledge and regularities from the utterances coded during the previous step. This knowledge consists of the regularities of the dialogues and constitutes the model;
  4. finally, exploiting the model for interactive storytelling.

Corpus of narrative dialogues

In this study, 30 dialogues between parents and their children (aged 3, 4 and 5) were recorded during emotional storytelling situations. These recordings have been transcribed and coded to capture information about the mental states (beliefs, volition, emotions, ...) conveyed by the various utterances.


Each utterance is characterised by a line number, a speaker (P: parent, C: child), a transcription and its encoding annotations, laid out in five columns. The five columns of the annotation grid include between 2 and 7 encoding features, following a fixed coding scheme. The first column gives the nature of the utterance: an (A)ffirmation, a (Q)uery, a request to pay attention to the story (D), or a demand for general attention (G). The second column gives the referent of the utterance: the character (P), the interlocutor (H) or the speaker (R). The third column is dedicated to mental states: the interlocutors can express an (E)motion, a (V)olition, an observable or a non-observable cognition (B or N), an epistemic statement (K), an assumption (Y) or a (S)urprise. The last two columns represent explanations with cause / (C)onsequence, (O)pposition or empathy (M), which can be applied either to explain the story (J) or to clarify a situation with a personal context (F).
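The coding scheme above can be pictured as a small record type. The following is only a sketch: the class name `Utterance`, its field names, and the choice to allow empty columns are assumptions made for illustration. How the codes C, O, M, J and F are distributed over the last two columns is not detailed here, so the sketch accepts any of them in either column.

```python
from dataclasses import dataclass
from typing import Optional

# Code letters taken from the coding scheme described in the text.
NATURE = {"A", "Q", "D", "G"}
REFERENCE = {"P", "H", "R"}
MENTAL_STATE = {"E", "V", "B", "N", "K", "Y", "S"}
# Assumption: either explanation column may hold any of these codes.
EXPLANATION = {"C", "O", "M", "J", "F"}


@dataclass(frozen=True)
class Utterance:
    """One annotated utterance (hypothetical layout, for illustration only)."""
    line: int
    speaker: str                          # "P" (parent) or "C" (child)
    text: str
    nature: Optional[str] = None          # column 1
    reference: Optional[str] = None       # column 2
    mental_state: Optional[str] = None    # column 3
    explanation_1: Optional[str] = None   # column 4
    explanation_2: Optional[str] = None   # column 5

    def __post_init__(self):
        if self.speaker not in {"P", "C"}:
            raise ValueError(f"unknown speaker: {self.speaker!r}")
        checks = [
            (self.nature, NATURE),
            (self.reference, REFERENCE),
            (self.mental_state, MENTAL_STATE),
            (self.explanation_1, EXPLANATION),
            (self.explanation_2, EXPLANATION),
        ]
        for code, allowed in checks:
            # Assumption: a column may be left empty (None).
            if code is not None and code not in allowed:
                raise ValueError(f"unknown annotation code: {code!r}")

    def codes(self):
        """Return the five annotation columns as a tuple."""
        return (self.nature, self.reference, self.mental_state,
                self.explanation_1, self.explanation_2)
```

For example, `Utterance(1, "P", "Is the wolf angry?", "Q", "P", "E")` would encode a parent's query about a character's emotion.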

Emotion detection

The classifier we chose is a commonly used unsupervised method, the Self-Organizing Map (SOM). This method is a particular type of neural network used for mapping high-dimensional spaces onto low-dimensional ones. Our technique is a multi-step process, each step producing the input for the next one.

Preprocessing Step

During the preprocessing step, we applied a collection of filters to each utterance in order to remove useless information: special characters and punctuation were stripped, camel-case words were split, and stop words were filtered out. We considered as stop words all prepositions, articles and other short words that do not carry any semantic value. We kept only the words considered to carry a strong semantic and emotional value, as suggested by WordNet Affect.
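A minimal sketch of such a filter chain is given below. The stop-word list and the `AFFECT_WORDS` set are tiny illustrative stand-ins, not the lists used in the project, and the function names are hypothetical. In particular, `AFFECT_WORDS` only mimics the role of an affective lexicon such as WordNet Affect.

```python
import re

# Illustrative stop-word list (assumption: the real list is much larger).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "on", "at",
              "is", "was", "and", "he", "she", "it"}

# Hypothetical stand-in for an affective lexicon.
AFFECT_WORDS = {"afraid", "happy", "sad", "angry", "scared", "surprised"}


def split_camel_case(text):
    """Insert a space at every lower-case/upper-case boundary."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)


def preprocess(utterance, affect_words=None):
    """Return the filtered tokens of one utterance.

    If affect_words is given, only tokens found in it are kept.
    """
    text = split_camel_case(utterance)
    # Replace special characters, digits and punctuation with spaces.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    if affect_words is not None:
        tokens = [t for t in tokens if t in affect_words]
    return tokens
```

With these toy lists, `preprocess("The bigBadWolf was angry!!")` keeps `big`, `bad`, `wolf` and `angry`, and passing `affect_words=AFFECT_WORDS` keeps only `angry`.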

Feature Extraction

For feature extraction, we chose Latent Semantic Analysis (LSA). In our experiments, the document collection is the dialogue corpus, where each utterance is a separate document, and the term set is built from the 10 000 most frequent English words, extracted from approximately 1 000 000 documents available in Project Gutenberg. These words are used as key terms. The features used are the document similarities obtained after applying the LSA algorithm.
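As a rough illustration of this step, the sketch below builds a term-document count matrix, truncates its singular value decomposition to rank `k`, and derives cosine similarities between documents. The use of raw counts, the cosine measure and all function names are assumptions; the weighting and similarity actually used in the project are not specified here.

```python
import numpy as np


def term_document_matrix(docs, vocabulary):
    """Count matrix with one row per term and one column per document."""
    index = {term: i for i, term in enumerate(vocabulary)}
    X = np.zeros((len(vocabulary), len(docs)))
    for j, tokens in enumerate(docs):
        for tok in tokens:
            if tok in index:          # out-of-vocabulary tokens are ignored
                X[index[tok], j] += 1.0
    return X


def lsa(X, k):
    """Rank-k truncated SVD: X is approximated by U_k diag(s_k) Vt_k."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def document_similarities(s_k, Vt_k):
    """Cosine similarities between documents in the latent space."""
    D = (s_k[:, None] * Vt_k).T       # one row per document
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    norms[norms == 0] = 1.0           # avoid division by zero
    D = D / norms
    return D @ D.T
```

On a toy corpus, two utterances that share the words `wolf` and `angry` end up much closer to each other in the latent space than to an utterance about a happy girl.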

Feature Selection

The selection can be done automatically by using a k-LSA instead of the classical version of the algorithm. For training, the feature vectors are the columns of the LSA matrix, which represent the document similarities. For testing, a feature projection is performed by projecting the new occurrence matrix into the document space.
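The projection of unseen utterances can be sketched with the standard LSA "fold-in" formula, which maps a new term-count column x to diag(s_k)^-1 U_k^T x. This assumes a plain rank-k truncated SVD; whether the k-LSA variant mentioned above uses exactly this projection is not stated here, and the function names are hypothetical.

```python
import numpy as np


def fit_lsa(X, k):
    """Rank-k truncated SVD of a term-document matrix X (terms x documents)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(X_new, U_k, s_k):
    """Project new term-count columns into the k-dimensional document space.

    Each column x of X_new is mapped to diag(s_k)^-1 U_k^T x, so that
    training documents are mapped back onto their own columns of Vt_k.
    Assumes the k retained singular values are non-zero.
    """
    return (U_k.T @ X_new) / s_k[:, None]
```

A useful sanity check of this design is that folding in the training matrix itself gives back the training feature vectors, so training and test utterances live in the same space.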


Our implementation of the SOM differs from the classic one in that the feature space and the classes are treated as two distinct concepts, and the classes are not used actively in the self-organizing algorithm. Data and label vectors are stored separately in the self-organized nodes, and learning is performed in the same way for both vectors, with the same parameters.

During our experiments, a 40 × 40 grid was used for the SOM configuration. The feature vectors were the document similarity vectors obtained from the feature extraction step. As for the labels, we used the emotion intensities available in the corpus as an independent vector space.
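A minimal sketch of a SOM whose nodes carry separate data and label vectors could look as follows. The winning node is selected from the data vectors only, and both vectors are then updated with the same learning rate and neighbourhood. The linear decay schedules, the Gaussian neighbourhood, the random initialisation and the function name `train_som` are assumptions, not the project's actual settings.

```python
import numpy as np


def train_som(data, labels, rows=40, cols=40, epochs=10, lr0=0.5, seed=0):
    """Train a SOM whose nodes hold a data vector and a label vector.

    data:   array of shape (n, d), the feature vectors
    labels: array of shape (n, m), e.g. emotion intensities
    Returns (W, L) with shapes (rows, cols, d) and (rows, cols, m).
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    W = rng.random((rows, cols, d))                 # data vectors of the nodes
    L = rng.random((rows, cols, labels.shape[1]))   # label vectors of the nodes
    sigma0 = max(rows, cols) / 2.0
    gy, gx = np.mgrid[0:rows, 0:cols]               # grid coordinates
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # assumed linear decay
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # assumed linear decay
            # The winner is chosen from the data vectors only (Euclidean).
            dist = np.linalg.norm(W - data[i], axis=2)
            r, c = np.unravel_index(np.argmin(dist), dist.shape)
            h = np.exp(-((gy - r) ** 2 + (gx - c) ** 2) / (2.0 * sigma ** 2))
            h = h[..., None]
            # Same update rule and parameters for data and label vectors.
            W += lr * h * (data[i] - W)
            L += lr * h * (labels[i] - L)
            step += 1
    return W, L
```

The default grid size matches the 40 × 40 configuration mentioned above; a smaller grid is enough to experiment with the update rule.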

Since a complex labelling technique is used, with 6 emotions represented by their probability of occurrence, a color was assigned to each emotion.



For evaluation, we used the same measure as during the training phase: it computes the distance from a given individual to every element of the SOM grid. The Best Matching Unit (BMU) is then selected, i.e. the element of the grid that is closest to this individual. In our experiments, the Euclidean distance was used both in the SOM algorithm and for evaluation.
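The Best Matching Unit search can be sketched as below, for a grid stored as an array `W` of shape (rows, cols, d). Reading the prediction from a separate array `L` of label vectors at the BMU position is an assumption about how the output is produced, and both function names are hypothetical.

```python
import numpy as np


def best_matching_unit(W, x):
    """Grid position (row, col) of the node closest to x (Euclidean distance)."""
    dist = np.linalg.norm(W - x, axis=2)
    r, c = np.unravel_index(np.argmin(dist), dist.shape)
    return int(r), int(c)


def predict_label(W, L, x):
    """Label vector stored at the BMU of x (assumed read-out rule)."""
    r, c = best_matching_unit(W, x)
    return L[r, c]
```

With six emotions, the returned label vector would hold one intensity per emotion for the node that best matches the utterance.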

Knowledge Extraction for Dialogue modelling

Extraction of dialogue patterns

A dialogue pattern is defined as a set of annotations whose arrangement occurs, exactly or approximately, in several dialogues. The method designed to extract significant dialogue patterns consists of a regularity extraction step, based on matrix alignment using dynamic programming, and a clustering step, which uses machine learning heuristics to group and select the recurrent dialogue patterns.
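To give an idea of alignment by dynamic programming, here is a simplified one-dimensional sketch: a local alignment (in the style of Smith-Waterman) between two sequences of annotation codes. It is not the two-dimensional matrix alignment used in the project, and the scoring values (`match`, `mismatch`, `gap`) are arbitrary assumptions.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment between sequences a and b.

    Returns (score, pairs) where pairs lists the aligned items;
    None marks a gap on one side.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # substitution or match
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    pairs = []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return best, pairs
```

For instance, aligning `["QPE", "APE", "AHV", "QRK"]` with `["DP", "APE", "AHV", "AHB"]` isolates the shared run `APE`, `AHV`, which plays the role of a (one-dimensional) recurring pattern.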


The method for extracting two-dimensional patterns is a generalisation of the local string edit distance, which can be likened to sequence alignment. The problem of two-dimensional pattern extraction thus corresponds to matrix alignment. Each pattern extracted by the matrix alignment algorithm is found in only two dialogues. In order to determine the importance of each pattern, we propose to group the patterns by means of various standard clustering heuristics. According to Dunn's index, the best methods for this problem appear to be affinity propagation and spectral clustering.
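Dunn's index rates a clustering by dividing the smallest distance between points of different clusters by the largest distance between points of the same cluster, so higher values indicate compact, well-separated groups. The sketch below uses this common definition with Euclidean distances between points; the variant and the pattern distance actually used in the project are not specified here, so both are assumptions.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn's index: minimum inter-cluster distance / maximum cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(p, q)
         for members in clusters.values()
         for p, q in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(p, q)
        for ca, cb in combinations(clusters.values(), 2)
        for p in ca
        for q in cb
    )
    if max_diameter == 0.0:
        return float("inf")
    return min_separation / max_diameter
```

Comparing the index obtained for several clustering methods on the same set of patterns is one way to rank those methods.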

Predicting the interaction of the child

As our goal is to build a dialogue model dedicated to narrative ECAs that stimulate child interaction, we have to model precisely how the child's interaction arises, focusing on event prediction. In other words, we look for sequences of dialogue events leading to the child's interaction. We propose to split the data by turn, that is, by each sequence made of the parent's assertions or questions followed by the child's interaction. The problem therefore consists in predicting the end of each turn.
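The splitting into turns can be sketched as follows, with each utterance reduced to a (speaker, code) pair. Treating consecutive child utterances as part of the same turn, and labelling each parent utterance by whether the child speaks next, are assumptions made for illustration; the function names are hypothetical.

```python
def split_turns(utterances):
    """Group (speaker, code) pairs into turns.

    A turn is a run of parent utterances followed by the child's
    utterances; a new turn starts when the parent speaks again.
    """
    turns, current = [], []
    child_spoke = False
    for speaker, code in utterances:
        if speaker == "P" and child_spoke:
            turns.append(current)
            current, child_spoke = [], False
        current.append((speaker, code))
        if speaker == "C":
            child_spoke = True
    if current:
        turns.append(current)
    return turns


def child_follows(utterances):
    """For each parent utterance, 1 if the next utterance is the child's."""
    targets = []
    for k, (speaker, _) in enumerate(utterances):
        if speaker != "P":
            continue
        nxt = utterances[k + 1][0] if k + 1 < len(utterances) else None
        targets.append(1 if nxt == "C" else 0)
    return targets
```

The binary targets returned by `child_follows` are one possible way to phrase "predicting the end of each turn" as a classification problem over parent utterances.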

Evaluation: avatar impact on child interaction

To assess the impact of embodiment, a second experiment was carried out to collect further storytelling dialogues: child-adult dialogues through a videoconference system and child-avatar dialogues in a Wizard of Oz (WOz) setting. The extracted dialogue patterns were used during this experiment.

The following picture presents the GUIs used during the experiment, which combined mediated interactive narrations and WOz.


Interaction with the avatar and interaction in videoconference mode are similar in terms of semantics, but the modalities used are different.


ICAART13 : Pauchet A., Rioult F., Chanoni E., Ales Z. and Serban O., Advances on Dialogue Modelling: Interactive Narration Requires Prominent Interaction and Emotion, International Conference on Agents and Artificial Intelligence, Barcelona, Spain, pp. 527-530, 2013. Poster.

IJIIDS12 : Lecroq T., Pauchet A., Chanoni E., Ayala Solano G., Pattern discovery in annotated dialogues using dynamic programming, International Journal of Intelligent Information and Database Systems, Vol. 6, No. 6, pp. 603-618, 2012.

WACAI12a : Pauchet A., Rioult F., Chanoni E., Ales Z., Serban O., Modélisation de dialogues narratifs pour la conception d'un ACA narrateur, Workshop Affects, Compagnons Artificiels et Interaction, 8p., Grenoble, 2012. Article.

WACAI12c : Serban O., Pauchet A., OAK: The Online Annotation Kit, Workshop Affects, Compagnons Artificiels et Interaction, 2p., Grenoble, 2012. Article.

HAIDM@AAMAS12 : Ales Z., Dubuisson Duplessis G., Serban O., Pauchet A., A Methodology to Design Human-Like Embodied Conversational Agents based on Dialogue Analysis, Workshop HAIDM@AAMAS, Valencia, Spain, pp.34-49, 2012. Article.

ICAART12b : Serban O., Pauchet A., Pop H.F., Recognizing emotions in short texts, ICAART, Vilamoura, Algarve, Portugal, pp.477-480, 2012.