Generating Social Interactions with Adolescents with Autism Spectrum Disorder, through a Gesture Imitation Game Led by a Humanoid Robot, in Collaboration with a Human Educator
1. Introduction
Autism is a global neurodevelopmental condition affecting social interactions, communication, and behavior. The autism spectrum is diverse, ranging from Kanner autism to Asperger syndrome. For some people with ASD, comorbidities exist, such as an intellectual deficit. Autism severity can be measured through several instruments, including the Childhood Autism Rating Scale (CARS), the Autism Diagnostic Interview-Revised (ADI-R) and the Autism Diagnostic Observation Schedule (ADOS).
Globally, as well as in some specific regions (e.g. North America) and countries, the number of reported cases of autism has been increasing for decades. In 2012, the global prevalence of autism was around 1 in 162 [1]. As stated by the author, the increase in the prevalence figure could be due to various factors, such as changes in classification criteria or an increase in the number of diagnoses. Today, autism professionals (e.g. at the Centre de Ressources Autisme Bretagne) estimate the prevalence to be closer to 1 in 100 people worldwide.
Reference [2] presents experiments showing that children with ASD are able to imitate. Given how important imitation is for developmental processes, this was a key milestone for autism professionals and caregivers of children with ASD.
Reference [3] describes a pilot study aiming at improving the nonverbal communication skills of children with ASD through reciprocal imitation sessions. Over one year, Scarpa and other researchers worked with eight children between 30 and 84 months old. Imitation skills improved, as measured by the PEP-R and Nadel's scale, and so did positive behaviors.
Within the framework of her thesis, defended in 2015, Sarah Bendiouis built on Nadel's work to show that practicing imitation with autistic children improved both their imitation capabilities and their ability to interact socially [4].
Furthermore, since the late 1990s, there have been many studies [5]-[8] showing the potential benefit of using information and communication technology (ICT) tools within the framework of therapy for ASD children. ICT tools used ranged from tablets to virtual agents in serious games, through avatars and different types of robots (animatronic, humanoid, android). Most humanoid robot-based experiments took place with Robota, Kaspar and Nao. Pepper, Milo and QTrobot were also used sometimes.
An example of a Nao-based experiment is Rob’Autisme [9]. Since 2016, the project has been gathering adolescents with ASD, autism professionals and artists into a process where the participants learn to program Nao robots to perform a play in front of an audience. The robots are used as an extension of the participants, borrowing their voice, and having the necessary social interactions with the audience, which the participants with ASD would have trouble initiating and maintaining themselves.
During any intervention for ASD individuals, it must be considered that ASD and attention deficit often occur together [10].
In 2018, Grossard's review [11] highlighted several facts, among which: few studies involved adolescents with ASD, and few experiments were TE-TU, i.e. aimed at both therapeutic effectiveness and technology usability.
The objective of the present pilot study is a prerequisite for any subsequent learning process: showing that social interaction with individuals with ASD and an intellectual deficit can be generated through robot-led, educator-supervised imitation practice. Our long-term goal is to contribute to a better integration of autistic adolescents into society, allowing them to navigate their personal and professional environments with more fluidity, and therefore allowing us to learn from them and benefit from their talents. Following the present introduction, the materials and methods are described. Afterwards, our system is presented. Then, the results of the experiment are reported, along with a discussion, before the work is concluded and perspectives listed.
2. Materials and Methods
2.1. Description of Participants
During the COVID and early post-COVID periods, it was difficult for researchers to organize experiments with many human participants. We therefore decided to start with a pilot study with two participants whose profiles are described in detail below, with the aim of collecting insightful information that would allow us to adjust the protocol subsequently.
The 17-year-old female who participated met our inclusion criteria: biological age between 13 and 19 years, neurodevelopmental age between 6 and 36 months (severe intellectual deficit), and a diagnosis of low-functioning autism spectrum disorder based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5). For the sake of anonymity, she will be called 'Participant F' in the remainder of the present document. She appears in Figure 1 below.
The control participant was a 12-year-old preadolescent with ASD, but no intellectual deficit (Asperger syndrome). He will be called Participant G and appears in Figure 2.
Both participants are from middle-income families living in France.
The robot used was the humanoid QTrobot from the LuxAI company. It is expressive thanks to the screen that serves as its face and to several programmed "emotions" (Figure 3). It has a powerful processor, a depth camera, stereo speakers, and high-performance digital microphones.
In the literature, we noted the presence of a human caregiver during most experiments involving a child with ASD. That caregiver would most often perform the role of a visual inciter and sometimes that of a physical inciter. Indeed, autism professionals (child psychiatrists, field educators) often stress the need for a physical inciter when working with low-functioning ASD individuals.
Figure 1. Participant F with robot (April).
Figure 2. Participant G with robot (June).
Figure 3. QTrobot in the experimental room.
Figure 4. Participant F in front of robot, with visual and physical inciters.
In our experiment, we tried different configurations, with two possible roles for the field educators (Figure 4):
- Educator 1 was the one staying with the participant in the waiting room or outside for the breaks. She was also in charge of entering the room with the participant and acting as a physical inciter when needed. In some session configurations, she would switch to the role of visual inciter.
- Educator 2 was the main visual inciter, standing next to the robot.
2.2. Scenario and Timeline
The scenario used for the experiment was that of the imitation game designed within the framework of our research, along with child/adolescent psychiatrists and field educators working with adolescents with special needs. That game comprises four phases: greetings, pairing, imitation and closing. The justification for these specific phases is included in [12].
At the beginning of the game, after greeting the participant and introducing itself, the robot puts on some music to create an engaging atmosphere. At the end of the song, it announces that it would like to play and that it will start by performing a gesture that the participant should observe. After the demonstration, the robot asks the participant to perform the same gesture. Through its depth camera, it detects the gesture, evaluates whether it was properly imitated, and provides feedback accordingly. The participant has the opportunity to watch the gesture three times in total and to try imitating it. After a successful attempt, he or she is congratulated, and the game continues with another gesture. After three unsuccessful attempts, or no attempt at all, the robot gently tells the participant that they can try again some other time.
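The flow described above can be sketched as a simple control loop. The sketch below is illustrative only: the method names (say_text, play_music, show_gesture, detect_imitation) are hypothetical placeholders, not the actual QTrobot API.

```python
# Illustrative sketch of the four-phase imitation game loop.
# All robot methods are hypothetical placeholders.
MAX_ATTEMPTS = 3  # the participant may watch and try each gesture three times

def run_imitation_game(robot, participant_name, gestures):
    # Phase 1: greetings
    robot.say_text(f"Hello {participant_name}, I am QT!")
    # Phase 2: pairing - music creates an engaging atmosphere
    robot.play_music("default_song")
    robot.say_text("I would like to play. Watch my gesture!")
    # Phase 3: imitation
    for gesture in gestures:
        success = False
        for attempt in range(MAX_ATTEMPTS):
            robot.show_gesture(gesture)          # demonstration
            robot.say_text("Now you try!")
            if robot.detect_imitation(gesture):  # depth-camera check
                robot.say_text("Well done!")     # positive feedback
                success = True
                break
        if not success:
            robot.say_text("We can try again some other time.")
            break
    # Phase 4: closing
    robot.say_text(f"Goodbye {participant_name}, see you soon!")
```

In this sketch, a gesture that is not imitated after three demonstrations ends the game gently, as in the scenario above.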
The experiment took place in two phases: one from April 21 to 22 and the other from June 21 to 23.
The first phase aimed at offering initial contact between Participant F and the robot, as well as observing the reaction of the former through the greetings phase. The second phase of the experiment allowed for the implementation of the full game algorithm with Participant F and Participant G separately.
2.2.1. First Phase of the Experiment
In this phase, six individual sessions, three per day, were carried out on April 21 and 22, in a technical room of the laboratory. The room could not be completely cleared out, due to the quantity of equipment and accessories. However, cupboards were closed to lessen the number of potential sources of distraction. Each session consisted of a test of the greetings phase of the imitation game scenario.
On April 21, the very first contact occurred between the participant and the robot, who were introduced to each other by the human caregiver present in the room, along with the technical operator. The participant would be accompanied into the room by the human caregiver. There, they would find the robot standing on a table and the technical operator sitting at a distance. Once the participant had taken a few steps into the room and then stopped, the operator would launch the program. The robot would then execute the greetings phase of the imitation game.
2.2.2. Second Phase of the Experiment
During the second phase, fifteen individual sessions in total were performed on June 21, 22 and 23, inside a meeting room completely transformed into an appropriate experimental room (Figure 5). Our study will analyze the sessions from this second phase in detail, although a few comparisons with the first phase will be made.
To turn the meeting room into our experimental room, it was emptied of all scientific posters, markers, accessories, and furniture but the essentials. Electrical wire was hidden inside the floor, and the camera was placed behind a wooden screen. Blinds were closed to avoid distractions from the external environment.
Next to the experimental (main) room, there is a small lab that was subdivided into two spaces using wooden screens: a waiting room and a small office for the technical operator, from where part of the main room could be observed through a one-way mirror. The technical operator's space remained dark throughout the experimental sessions, while the light was turned on inside the main room. Therefore, the participant could not see the operator through the glass, which acted as a mirror due to the light reflected from the experimental room. In addition, so that the mirror would not distract the participant, two paper panels were positioned on it, leaving just enough space for the operator to see the essential part of the main room, where the robot, participant and human caregiver(s) usually stood. These precautions were taken due to the high probability of attention deficit in individuals with ASD.
Figure 5. Experimental room setup.
2.2.3. Detailed Session Configurations
Sessions were prepared with several key principles in mind, drawn from best practices when working with children with ASD. The duration of an individual session should lie between 3 and 5 minutes. A break should be observed between two successive sessions. The number of sessions per day should not exceed seven. The natural rhythm of the participant should be respected (meal and nap times); in particular, sessions should not take place too early in the morning or immediately after a meal.
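These scheduling rules can be expressed as a simple validity check. The sketch below is ours; representing a session as a (start, end) pair of minutes within one day is an assumption made purely for illustration.

```python
# Illustrative check of the session-scheduling rules described above.
# Sessions are (start_minute, end_minute) tuples within one day;
# this representation is ours, chosen only for the sketch.
MIN_DURATION, MAX_DURATION = 3, 5   # minutes per individual session
MAX_SESSIONS_PER_DAY = 7

def schedule_is_valid(sessions):
    if len(sessions) > MAX_SESSIONS_PER_DAY:
        return False
    sessions = sorted(sessions)
    for i, (start, end) in enumerate(sessions):
        # each session must last between 3 and 5 minutes
        if not MIN_DURATION <= end - start <= MAX_DURATION:
            return False
        # a break must separate two successive sessions
        if i > 0 and start <= sessions[i - 1][1]:
            return False
    return True
```

Meal and nap times are not modeled here; in practice they were handled by the field educators.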
The baseline was that Participant F, when prompted by a human educator only, would not imitate gestures.
Different game session configurations were tested:
Session configuration 1 (base/SC 2106-1):
The participant enters the room with Educator number 1. The door is left open. Participant F is properly positioned in front of the robot, while Educator 2 (visual inciter) is standing on one side (left) of the robot. Educator 1 tries leaving the room. Based on Participant F’s reaction, Educator 1 actually leaves the room or stays and acts as a physical inciter.
Session configuration 2 (e.g. 2106-2, 2106-5):
The participant enters the room with Educator number 1. The door is closed. Participant F is properly positioned in front of the robot, while Educator 2 (visual inciter) is standing on one (left) side of the robot. Educator 1 tries leaving the room. Based on Participant F’s reaction, Educator 1 leaves or stays and acts as a physical inciter.
See justification for session configurations below.
Many people with ASD react in an extreme way to changes that would appear meaningless to typically developing people. Kanner describes an experiment where a child with ASD would keep closing an open door, while the experimenter would reopen it, until the child became very frustrated and reacted violently. This type of experiment could be questioned from an ethics point of view; at the time, it allowed the discovery of some principles governing the behavior of children with ASD. In our case, we tried two configurations, one with the door open and the other with the door closed, hoping that the latter would be accepted by the participant, because it leaves less room for potential sources of distraction.
Session configuration 3 (e.g. 2106-4):
Same configuration as Session 2 but Educator 2 (visual inciter) is standing on the right side of the robot instead.
Justification:
This elementary change was made only to determine if, upon entrance into the room, it would be more natural for the participant to look at a person located on the right rather than the left side of the robot.
Session configuration 4 (e.g. 2206-1):
Same configuration as Session 2.
The algorithm was modified to include more occurrences of the participant’s name in the sentences of the robot.
Justification:
Calling someone by their name draws their attention.
Session configuration 5 (e.g. 2206-2):
Same configuration as Session 2.
The default song is replaced with a special song that the participant loves.
Justification:
Using a song that we know the participant enjoys increases their engagement in the session.
Session configuration 6 (e.g. 2206-3):
The participant enters the room with Educator 1. The door is closed. Participant F is properly positioned in front of the robot, while Educator 2 (visual inciter) is standing on one side (left) of the robot. Educator 1 then positions herself on the other side of the robot from Educator 2.
Justification:
Having two visual inciters instead of one visual and one physical inciter is a transition towards the removal of the physical inciter. The impact on the participant's behavior, in terms of the evolution of her distance and orientation towards the robot, will be evaluated.
Session configuration 10 (e.g. 2306-4):
Educator 2 is not present in the room. The participant enters the room with Educator 1. Participant F is properly positioned in front of the robot. Educator 1 then leaves the room and closes the door. The imitation game is launched. After two minutes, based on the participant’s reaction, Educator 1 comes back as a physical inciter.
Justification:
The absolute need for a caregiver inside the experimental room will be evaluated.
Please note that between session configurations 6 and 10, three others were tested, but no significant impact was noted. We therefore decided not to include them here. However, we keep the initial numbering for better coherence with other documents we produced.
2.3. Description and Justification of Measured Indicators
The parameters that were monitored are listed below, along with the justifications for these choices:
Initial approach (does F naturally walk towards the robot?): this parameter is measurable and reflects part of the participant's feelings towards the robot. If F is afraid or not interested at all, she will not go towards the robot upon entering the room.
Distance to the robot: the variation of the distance between the participant and the robot appears as a key parameter in the evaluation of the level of the social interaction (Kim, 2014).
Head orientation: based on the literature [13] the eye contact is a key indicator of social interaction. However, since eye-tracking devices could disturb our participants, we approximated it by the head orientation.
Body orientation: the orientation of the participant’s body relative to the robot (facing, sideways) appeared as a key parameter in the evaluation of the level of social interaction.
Reaction to the robot's voice or movements: observing adjacency pairs is important in the evaluation of the level of social interaction. The items of a pair can be verbal (an instruction, a question) or nonverbal (a gesture, a sound, etc.).
Number of times the participant initiates physical contact with the robot: initially we listed the physical contact as an indicator of social interaction. After discussing with a psychiatrist for children with special needs, as well as a specialized educator, we realized that physical contact could have different meanings in the context of autism, not related to the existence of a social interaction per se. However, we decided to monitor it to confirm that this parameter was not directly linked with social interaction.
Signs of happiness: an interesting but subjective parameter to evaluate the appreciation of the session by the participant. Indeed, depending on the adolescent with ASD and intellectual deficit, the signs of happiness could range from a smile or a dance move, to a specific move performed with the fingers or arms, or a particular pose involving the whole body. [13] used the emotional expressions (facial, body, voice) as part of their formula to calculate the interaction level.
Imitation attempts: since the purpose of the whole experiment is to improve social interactions through imitation practice, as in [4], imitation attempts needed to be monitored.
We recommend that the initial approach, reactions to the robot's voice or moves, signs of happiness and imitation attempts be evaluated by the human observers. Indeed, the quality of the initial approach depends on various parameters such as velocity, the orientation of the head and body, emotional expressions, and potential stops made along the way or other signs of hesitation or fear. As of today, it remains easier for a human being than for a computing system to perform these types of global evaluation. Reactions to the robot's voice or moves, as well as signs of happiness, can vary greatly from one participant with ASD to another. Furthermore, wearing a facial mask prevents the identification of certain reactions and signs of happiness.
As for the imitation attempts, the nature of the selected participants entails being lenient regarding the quality of the imitation. In any case, the aim was not to reach perfect imitation of the demonstrated gestures, but to spark off imitation attempts. The technical system may therefore not be able to detect some of the most subtle imitation attempts. However, we consider our global system to be socio-technical: the human observer is part of it and can take care of some of the evaluations.
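Among the indicators that can be automated, body orientation lends itself to a simple computation from depth-camera skeleton data. The sketch below illustrates one possible classification into "facing" versus "sideways"; the shoulder-based heading and the 45-degree threshold are our assumptions, not part of the protocol.

```python
import math

# Sketch: classify body orientation towards the robot from skeleton data.
# Shoulders and robot position are (x, y) points in the floor plane.
# The 45-degree threshold and the shoulder-based heading are assumptions.
def body_orientation(left_shoulder, right_shoulder, robot_pos):
    lx, ly = left_shoulder
    rx, ry = right_shoulder
    # Facing direction: perpendicular to the shoulder line.
    fx, fy = ly - ry, rx - lx
    # Vector from the participant's midpoint to the robot.
    mx, my = (lx + rx) / 2, (ly + ry) / 2
    tx, ty = robot_pos[0] - mx, robot_pos[1] - my
    # Angle between facing direction and direction to the robot.
    angle = math.degrees(abs(math.atan2(fy, fx) - math.atan2(ty, tx))) % 360
    angle = min(angle, 360 - angle)
    return "facing" if angle <= 45 else "sideways"
```

Head orientation could be approximated the same way, replacing the shoulder line with the line between the ears or the eyes in the skeleton output.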
3. Results and Discussion
3.1. Analysis of the Results of the April Experimentation
During the two-day preliminary experiment in April, the goal was to assess Participant F's first reaction to the robot and to test the greetings phase of the imitation game. From day 1 to day 2, Participant F's spontaneity in approaching the robot increased. This was measured through the number of seconds between the moment she entered the room and the moment she reached a spot inside the 'close interval' of social distance to the robot (less than 2.1 meters), where she remained for at least three seconds. On day 2, the participant was more at ease with, and interested in, the robot. In particular, she would go directly towards the robot upon entering the room, whereas on day 1, she would remain at a distance for several tens of seconds and then need the human caregiver's invitation before approaching the robot. Also, on day 2, she would spend more time than on day 1 with her head and body oriented towards the robot. This corresponds to a measure of attention.
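The approach-latency measure described above can be computed directly from a sampled distance log. The sketch below is ours; the (timestamp, distance) log format is an assumption made for illustration.

```python
# Sketch of the approach-latency measure: seconds from room entry until
# the participant first stays within the 'close interval' (< 2.1 m of
# the robot) for at least three seconds.
# The sampled (timestamp, distance) log format is our assumption.
CLOSE_DISTANCE = 2.1   # meters
MIN_STAY = 3.0         # seconds

def approach_latency(samples):
    """samples: list of (t_seconds, distance_m), with t = 0 at room entry."""
    entered_at = None
    for t, d in samples:
        if d < CLOSE_DISTANCE:
            if entered_at is None:
                entered_at = t   # first moment inside the close interval
            if t - entered_at >= MIN_STAY:
                return entered_at   # latency until the stable approach
        else:
            entered_at = None       # left the close interval, reset
    return None  # never stayed close long enough
```

A smaller latency from one day to the next corresponds to a more spontaneous approach.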
3.2. Analysis of the Results of the June Experimentation
Ten different session configurations were tested with Participant F during that three-day experiment, as described in the Materials and Methods section. The presence of a visual inciter was always needed. A physical inciter was added to the setup when necessary to increase the participant’s engagement level.
Participant G asked to be left alone in the room with the robot, which corresponds to session configuration 10, and then allowed the presence of the experimenter only. The experimenter did not have to perform a visual or physical inciter role. She was simply present as a friendly interlocutor to talk about the robot, which naturally became a subject of joint attention, hence a factor of increased engagement.
Comparing Participants F and G, we noticed that the initial approach was very different and much easier for the latter. He was at ease from the very beginning and spent time analyzing the attributes of the robot. For instance, after walking around the robot for a minute, he exclaimed, "Il a une caméra et deux capteurs de mouvements!", meaning "It has a camera and two motion sensors!". He got close to the robot very quickly, faced it, and started following its instructions instantly, imitating the demonstrated gestures. Participant F, by contrast, needed much time and a human caregiver to get used to the robot, get closer to it, and face it before she could start engaging in the game.
During the sessions, however, we noticed that Participant F started getting closer and closer, and spending more time looking at the robot, even facing it for several tens of seconds. She also displayed a few imitation attempts.
It was noticed that Participant F would look towards the robot or stop the stereotyped behaviors when hearing her name called.
Also, the standard song initially used was replaced with a special song that the participant loved, to increase appreciation and engagement. The participant's engagement level indeed increased when she heard one of her favorite songs: she started raising her arms, moving, dancing, sometimes clapping heavily, and vocalizing. She would then follow the robot's instructions. The more participants enjoy the activity, the more engaged they are.
During the three-day experiment in June, Participant F's spontaneity in approaching the robot globally increased, and her attention towards the robot improved. It took until the end of the second day for Participant F to have a moment when she was close to and facing the robot. The main body orientation was sideways during most sessions; only on the third day did we observe a session where the main orientation was facing. Even then, however, Participant F's head remained bent most of the time.
Comparing day 2 of the first phase with day 1 of the second phase of the experiment, it was noticed that two months later, the participant seemed less comfortable than at the end of the previous phase. In [12], we had suggested weekly experiments for the teenagers. Specific constraints prevented us from observing that rhythm, and we had to wait for two months before conducting the second phase of the experiment. This had a negative impact on the participant's level of ease with the robot.
In addition to the previous remarks, we must note that all the experiments performed showed the importance of the presence of at least one human caregiver, to accompany Participant F into the room, to help her feel at ease in the environment, and to repeat the robot's instructions and follow them.
During the second phase of the experiment, on days 2 and 3, Participant F imitated when:
- the robot asked her to, and the educators helped her (induced imitation)
- the robot performed the “bye-bye” gesture (spontaneous imitation).
Regarding the imitation attempts, some may be hard to identify. For instance, the first one we noted from Participant F was quite different from the demonstrated gesture (waving with the hand), but we knew she had tried, based on when the gesture occurred and the direction of her eye gaze (Figure 6).
Tables 1-7 below, spread over the following pages, show the results for the different indicators for Participant F across the June sessions.
Figure 6. Participant F clapping in front of robot.
Table 1. Initial approach towards the robot.
Session name and configuration | Result: N | Result: Y
2106-1 (SC 1) | |
2106-5 (SC 2) | |
2106-4 (SC 3) | |
2206-1 (SC 4) | |
2206-2 (SC 5) | |
2206-3 (SC 6) | |
2306-4 (SC 10) | |

Table 2. Is the participant comfortable with the robot (staying close and facing)?
Session name and configuration | Result: N | Result: Y
2106-1 (SC 1) | |
2106-5 (SC 2) | |
2106-4 (SC 3) | |
2206-1 (SC 4) | |
2206-2 (SC 5) | |
2206-3 (SC 6) | |
2306-4 (SC 10) | |

Table 3. Evolution of body orientation.
Session name and configuration | Result: B | Result: S or F
2106-1 (SC 1) | |
2106-5 (SC 2) | |
2106-4 (SC 3) | |
2206-1 (SC 4) | |
2206-2 (SC 5) | |
2206-3 (SC 6) | |
2306-4 (SC 10) | |

Table 4. Positive reactions to the voice or movements of the robot.
Session name and configuration | Result: N | Result: Y
2106-1 (SC 1) | |
2106-5 (SC 2) | |
2106-4 (SC 3) | |
2206-1 (SC 4) | |
2206-2 (SC 5) | |
2206-3 (SC 6) | |
2306-4 (SC 10) | |

Table 5. Number of physical contacts with the robot.
Session name and configuration | Result: 1 | Result: 0
2106-1 (SC 1) | |
2106-5 (SC 2) | |
2106-4 (SC 3) | |
2206-1 (SC 4) | |
2206-2 (SC 5) | |
2206-3 (SC 6) | |
2306-4 (SC 10) | |

Table 6. Signs of happiness along the sessions.
Session name and configuration | Result: N | Result: Y
2106-1 (SC 1) | |
2106-5 (SC 2) | |
2106-4 (SC 3) | |
2206-1 (SC 4) | |
2206-2 (SC 5) | |
2206-3 (SC 6) | |
2306-4 (SC 10) | |

Table 7. Presence of imitation attempts.
Session name and configuration | Result: N | Result: Y
2106-1 (SC 1) | |
2106-5 (SC 2) | |
2106-4 (SC 3) | |
2206-1 (SC 4) | |
2206-2 (SC 5) | |
2206-3 (SC 6) | |
2306-4 (SC 10) | |
The main limitation of this study is the low number of participants, due to the specific context. The next step would be to involve a minimum of thirty participants to reach clinical validity. Those participants should be of different genders, ages, origins, and social backgrounds, but should have a similar clinical profile. Also, since individuals with ASD are often bothered by accessories positioned on their bodies or heads, a research perspective is to find more innovative ways to measure certain useful parameters.
4. Conclusions
Our aim was to study the evolution of some parameters along robot-led experimental sessions of an imitation game with an autistic adolescent with an intellectual deficit. Practicing imitation with adolescents with autism spectrum disorder and intellectual deficits is tedious for human caregivers, who can therefore unconsciously display signs of tiredness, boredom, or impatience, which could trouble the participants with ASD. However, it is important to remember that social robots are assistive tools meant to facilitate the work of parents and field educators, not to replace them [14]. Two experimental phases were conducted, one over two days in April, and the other over three days in June. During the first phase, a 17-year-old autistic girl with an intellectual deficit, called Participant F, discovered the robot and experimented with the greetings phase only, in order to establish first contact with it. During the second one, Participant F went through the four phases of the imitation game: greetings, pairing, imitation and closing. Additionally, Participant G, an adolescent with Asperger syndrome (autism spectrum disorder, no intellectual deficit), also experimented with the four phases of the game. Comparing Participants F and G, we noticed that the initial approach was very different and much easier for the latter. He was at ease from the very beginning, spent time analyzing the attributes of the robot, got close to it very quickly, faced it, and started following its instructions instantly, imitating the demonstrated gestures. Participant F, in contrast, needed much time and a human caregiver to get used to the robot, get closer to it, and face it before she could start engaging in the game. This great difference between the reactions of the two participants confirmed that it is relevant to specify "with intellectual deficit" in the description of our target population.
Experimental results are unlikely to be homogeneous across adolescents with ASD. During the sessions, we noticed that Participant F got closer and closer to the robot and spent more time looking at it, even facing it for several tens of seconds. She also displayed a few imitation attempts, which was not usual for her, even with a human interaction partner. These results are reassuring and suggest that a robot can be used as an assistant for long-term imitation practice with our target population, in order to improve social interactions with them.
Future work should include experimental sessions of the imitation game with at least thirty adolescents with ASD and intellectual deficit, once to three times a week, over a period of six months minimum. These figures are based on the literature as well as feedback from professionals and would allow for the results to be considered as clinically valid. Ideally, the participants should be of diverse social and cultural origins and live in different countries. This would allow for the generalization or the invalidation of some of our preliminary results.
Authors’ Contributions
All authors contributed to the study conception/design and/or the writing/revision of the paper. Olivier Asseu, as the PhD Director, shaped the study design. Material preparation and data collection were performed by Linda N. Vallée. Malik Koné's contribution was central to the data analysis phase. Fousseny Coulibaly helped as a research assistant in various tasks. The first draft of the manuscript was written by Linda N. Vallée, and all authors commented on subsequent versions. All comments were considered for this final version.
Compliance with Ethical Standards
The authors confirm that, to the best of their knowledge, their work conforms to ethical standards.