Imitation and mirror systems in robots through Deep Modality Blending Networks

Seker, M. Y.; Ahmetoglu, A.; Nagai, Y.; Asada, M.; Öztop, Erhan; Ugur, E.

dc.contributor.author	Seker, M. Y.
dc.contributor.author	Ahmetoglu, A.
dc.contributor.author	Nagai, Y.
dc.contributor.author	Asada, M.
dc.contributor.author	Öztop, Erhan
dc.contributor.author	Ugur, E.
dc.contributor.department	Computer Science
dc.contributor.ozuauthor	ÖZTOP, Erhan
dc.date.accessioned	2023-04-24T12:00:28Z
dc.date.available	2023-04-24T12:00:28Z
dc.date.issued	2022-02
dc.description.abstract	Learning to interact with the environment not only empowers the agent with manipulation capability but also generates information to facilitate building of action understanding and imitation capabilities. This seems to be a strategy adopted by biological systems, in particular primates, as evidenced by the existence of mirror neurons that seem to be involved in multi-modal action understanding. How to benefit from the interaction experience of the robots to enable understanding actions and goals of other agents is still a challenging question. In this study, we propose a novel method, deep modality blending networks (DMBN), that creates a common latent space from multi-modal experience of a robot by blending multi-modal signals with a stochastic weighting mechanism. We show for the first time that deep learning, when combined with a novel modality blending scheme, can facilitate action recognition and produce structures to sustain anatomical and effect-based imitation capabilities. Our proposed system, which is based on conditional neural processes, can be conditioned on any desired sensory/motor value at any time step, and can generate a complete multi-modal trajectory consistent with the desired conditioning in one-shot by querying the network for all the sampled time points in parallel avoiding the accumulation of prediction errors. Based on simulation experiments with an arm-gripper robot and an RGB camera, we showed that DMBN could make accurate predictions about any missing modality (camera or joint angles) given the available ones outperforming recent multimodal variational autoencoder models in terms of long-horizon high-dimensional trajectory predictions. We further showed that given desired images from different perspectives, i.e. images generated by the observation of other robots placed on different sides of the table, our system could generate image and joint angle sequences that correspond to either anatomical or effect-based imitation behavior. To achieve this mirror-like behavior, our system does not perform a pixel-based template matching but rather benefits from and relies on the common latent space constructed by using both joint and image modalities, as shown by additional experiments. Moreover, we showed that mirror learning (in our system) does not only depend on visual experience and cannot be achieved without proprioceptive experience. Our experiments showed that out of ten training scenarios with different initial configurations, the proposed DMBN model could achieve mirror learning in all of the cases where the model that only uses visual information failed in half of them. Overall, the proposed DMBN architecture not only serves as a computational model for sustaining mirror neuron-like capabilities, but also stands as a powerful machine learning architecture for high-dimensional multi-modal temporal data with robust retrieval capabilities operating with partial information in one or multiple modalities.	en_US
dc.description.sponsorship	European Union’s Horizon 2020 ; Japan Science and Technology Agency, Japan CREST "Cognitive Mirroring" ; International Joint Research Promotion Program of Osaka University, Japan under the project "Developmentally and biologically realistic modeling of perspective invariant action understanding" ; Turkish Directorate of Strategy and Budget under the TAM
dc.description.version	Publisher version	en_US
dc.identifier.doi	10.1016/j.neunet.2021.11.004	en_US
dc.identifier.endpage	35	en_US
dc.identifier.issn	0893-6080	en_US
dc.identifier.scopus	2-s2.0-85119897054
dc.identifier.startpage	22	en_US
dc.identifier.uri	http://hdl.handle.net/10679/8136
dc.identifier.uri	https://doi.org/10.1016/j.neunet.2021.11.004
dc.identifier.volume	146	en_US
dc.identifier.wos	000726608700003
dc.language.iso	eng	en_US
dc.peerreviewed	yes	en_US
dc.publicationstatus	Published	en_US
dc.publisher	Elsevier	en_US
dc.relation	info:eu-repo/grantAgreement/EC/H2020/731761
dc.relation.ispartof	Neural Networks
dc.relation.publicationcategory	International Refereed Journal
dc.rights	openAccess
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.keywords	Imitation learning	en_US
dc.subject.keywords	Multimodal learning	en_US
dc.subject.keywords	Representation learning	en_US
dc.subject.keywords	Robot learning	en_US
dc.title	Imitation and mirror systems in robots through Deep Modality Blending Networks	en_US
dc.type	article	en_US
dspace.entity.type	Publication
relation.isOrgUnitOfPublication	85662e71-2a61-492a-b407-df4d38ab90d7
relation.isOrgUnitOfPublication.latestForDiscovery	85662e71-2a61-492a-b407-df4d38ab90d7

Files

Original bundle

Name: Imitation and mirror systems in robots through Deep Modality Blending Networks.pdf
Size: 3.06 MB
Format: Adobe Portable Document Format