Computer Science
Permanent URI for this collection: https://hdl.handle.net/10679/43
Browsing by Author "Ahmetoglu, A."
Now showing 1 - 5 of 5
Article | Open Access
DeepSym: Deep symbol generation and rule learning for planning from unsupervised robot interaction (AI Access Foundation, 2022)
Ahmetoglu, A.; Seker, M. Y.; Piater, J.; Öztop, Erhan; Ugur, E.
Abstract: Symbolic planning and reasoning are powerful tools for robots tackling complex tasks. However, the need to manually design the symbols restricts their applicability, especially for robots that are expected to act in open-ended environments. Therefore, symbol formation and rule extraction should be considered part of robot learning, which, when done properly, will offer scalability, flexibility, and robustness. Towards this goal, we propose a novel general method that finds action-grounded, discrete object and effect categories and builds probabilistic rules over them for non-trivial action planning. Our robot interacts with objects using an initial action repertoire that is assumed to have been acquired earlier, and observes the effects it can create in the environment. To form action-grounded object, effect, and relational categories, we employ a binary bottleneck layer in a predictive, deep encoder-decoder network that takes the image of the scene and the applied action as input and generates the resulting effects in the scene in pixel coordinates. After learning, the binary latent vector represents action-driven object categories based on the interaction experience of the robot. To distill the knowledge represented by the neural network into rules useful for symbolic reasoning, a decision tree is trained to reproduce its decoder function. Probabilistic rules are extracted from the decision paths of the tree and represented in the Probabilistic Planning Domain Definition Language (PPDDL), allowing off-the-shelf planners to operate on the knowledge extracted from the sensorimotor experience of the robot. Deploying the proposed approach on a simulated robotic manipulator enabled the discovery of discrete representations of object properties such as 'rollable' and 'insertable'. In turn, using these representations as symbols allowed the generation of effective plans for achieving goals such as building towers of a desired height, demonstrating the effectiveness of the approach for multi-step object manipulation. Finally, we demonstrate that the system is not restricted to the robotics domain by assessing its applicability to the MNIST 8-puzzle domain, in which the learned symbols allow the generation of plans that move the empty tile into any given position.
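The core mechanism described above, a discrete (binary) bottleneck between a scene/action encoder and an effect decoder, can be illustrated with a small PyTorch sketch. Everything below is an assumption for illustration only: layer sizes, the straight-through binarization, and names such as BinaryBottleneckEffectNet are not taken from the paper, and the decision-tree distillation into PPDDL rules is omitted.

```python
# Minimal sketch of a binary-bottleneck effect predictor in the spirit of the abstract above.
# All layer sizes, names, and the straight-through binarization are illustrative assumptions;
# the authors' actual architecture and training details may differ.
import torch
import torch.nn as nn

class StraightThroughBinary(nn.Module):
    """Binarize activations in the forward pass, pass gradients through the sigmoid."""
    def forward(self, logits):
        probs = torch.sigmoid(logits)
        hard = (probs > 0.5).float()
        # straight-through estimator: hard values forward, sigmoid gradient backward
        return hard + probs - probs.detach()

class BinaryBottleneckEffectNet(nn.Module):
    def __init__(self, obs_dim=128, n_actions=4, n_bits=8, effect_dim=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_bits),          # logits for the binary object code
        )
        self.binarize = StraightThroughBinary()
        self.decoder = nn.Sequential(       # predicts the effect of (object code, action)
            nn.Linear(n_bits + n_actions, 64), nn.ReLU(),
            nn.Linear(64, effect_dim),
        )

    def forward(self, obs, action_onehot):
        code = self.binarize(self.encoder(obs))          # discrete object category
        return self.decoder(torch.cat([code, action_onehot], dim=-1)), code

# toy usage: random object features and actions, regress the observed effect
model = BinaryBottleneckEffectNet()
obs = torch.randn(32, 128)
act = nn.functional.one_hot(torch.randint(0, 4, (32,)), 4).float()
effect_target = torch.randn(32, 6)
pred, code = model(obs, act)
loss = nn.functional.mse_loss(pred, effect_target)
loss.backward()
```

After training such a predictor, the binary codes can be tabulated against predicted effects, which is the kind of input a decision-tree distillation step could then turn into symbolic rules.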
Conference Object | Metadata only
Developmental scaffolding with large language models (IEEE, 2023)
Çelik, B.; Ahmetoglu, A.; Ugur, E.; Öztop, Erhan
Abstract: Exploration and self-observation are key mechanisms of infant sensorimotor development. These processes are further guided by parental scaffolding to accelerate skill and knowledge acquisition. In developmental robotics, this approach has often been adopted by having a human act as the source of scaffolding. In this study, we investigate whether large language models (LLMs) can act as a scaffolding agent for a robotic system that aims to learn to predict the effects of its actions. To this end, an object manipulation setup is considered in which one object can be picked and placed on top of, or in the vicinity of, another object. The adopted LLM is asked to guide the action selection process through algorithmically generated state descriptions and action alternatives expressed in natural language. Simulation experiments with cubes in this setup show that LLM-guided (GPT-3.5-guided) learning yields significantly faster discovery of novel structures than random exploration. However, we observed that GPT-3.5 fails to effectively guide the robot in generating structures when the objects have different affordances, such as cubes and spheres. Overall, we conclude that even without fine-tuning, LLMs may serve as a moderate scaffolding agent for improving robot learning; however, they still lack affordance understanding, which limits the applicability of current LLMs to robotic scaffolding tasks.

Article | Metadata only
Discovering predictive relational object symbols with symbolic attentive layers (IEEE, 2024-02-01)
Ahmetoglu, A.; Celik, B.; Öztop, Erhan; Uğur, E.
Abstract: In this letter, we propose and realize a new deep learning architecture for discovering symbolic representations of objects and their relations, based on the self-supervised continuous interaction of a manipulator robot with multiple objects in a tabletop environment. The key feature of the model is that it can take a varying number of objects as input and map object-object relations into the symbolic domain explicitly. In the model, we employ a self-attention layer that computes discrete attention weights from object features; these weights are treated as relational symbols between objects. The relational symbols are then used to aggregate the learned object symbols and predict the effects of executed actions on each object. The result is a pipeline that forms object symbols and relational symbols from a dataset of object features, actions, and effects in an end-to-end manner. We compare the performance of the proposed architecture with state-of-the-art symbol discovery methods in a simulated tabletop environment where the robot needs to discover symbols related to the relative positions of objects in order to predict action outcomes. Our experiments show that the proposed architecture performs better than the baselines in effect prediction while forming not only object symbols but also relational symbols.
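The symbolic attentive layer described in the letter above can be sketched as follows: an attention-style score between object features is binarized to yield relational symbols, which aggregate per-object symbols before effect prediction. This is a minimal, hedged PyTorch sketch; dimensions, the straight-through binarization, and names such as SymbolicAttention are illustrative assumptions rather than the published architecture.

```python
# Minimal sketch of a "symbolic attentive" layer: discrete (binarized) attention weights
# act as relational symbols that aggregate per-object symbols before effect prediction.
# All dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

def straight_through_binary(logits):
    probs = torch.sigmoid(logits)
    hard = (probs > 0.5).float()
    return hard + probs - probs.detach()

class SymbolicAttention(nn.Module):
    def __init__(self, feat_dim=32, sym_dim=8, n_actions=4, effect_dim=3):
        super().__init__()
        self.obj_encoder = nn.Linear(feat_dim, sym_dim)     # per-object symbol logits
        self.query = nn.Linear(feat_dim, sym_dim)
        self.key = nn.Linear(feat_dim, sym_dim)
        self.effect_head = nn.Sequential(
            nn.Linear(2 * sym_dim + n_actions, 64), nn.ReLU(),
            nn.Linear(64, effect_dim),
        )

    def forward(self, objects, action_onehot):
        # objects: (batch, n_objects, feat_dim); action_onehot: (batch, n_actions)
        symbols = straight_through_binary(self.obj_encoder(objects))
        logits = self.query(objects) @ self.key(objects).transpose(1, 2)
        relations = straight_through_binary(logits)           # (batch, n_obj, n_obj)
        aggregated = relations @ symbols                       # relational aggregation
        n_obj = objects.shape[1]
        act = action_onehot.unsqueeze(1).expand(-1, n_obj, -1)
        return self.effect_head(torch.cat([symbols, aggregated, act], dim=-1)), relations

# toy usage: the layer accepts any number of objects per scene
model = SymbolicAttention()
objs = torch.randn(16, 5, 32)                                  # 5 objects in the scene
act = nn.functional.one_hot(torch.randint(0, 4, (16,)), 4).float()
effects, rel_symbols = model(objs, act)
print(effects.shape, rel_symbols.shape)                        # (16, 5, 3) (16, 5, 5)
```

Because the attention matrix is computed pairwise from object features, the same module handles scenes with different object counts, which mirrors the varying-input-size property emphasized in the abstract.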
Article | Metadata only
High-level features for resource economy and fast learning in skill transfer (Taylor & Francis, 2022)
Ahmetoglu, A.; Uğur, E.; Asada, M.; Öztop, Erhan
Abstract: Abstraction is an important aspect of intelligence that enables agents to construct robust representations for effective and efficient decision making. Although deep neural networks have proven to be effective learning systems thanks to their ability to form increasingly complex abstractions at successive layers, these abstractions are mostly distributed over many neurons, which makes the reuse of a learned skill costly and hides the insights that could be obtained from the emergent representations. To avoid designer bias and wasteful resource use, we propose to exploit neural response dynamics to form compact representations for skill transfer. To this end, we consider two competing methods, based on (1) the maximum information compression principle and (2) the notion that abstract events tend to generate slowly changing signals, and apply them to the neural signals generated during task execution. Concretely, in our simulation experiments we apply either principal component analysis (PCA) or slow feature analysis (SFA) to the signals collected from the last hidden layer of a deep neural network while it performs a source task, and use the resulting features for skill transfer to a new target task. We then compare the generalization and learning performance of these alternatives with two baselines: skill transfer using the full layer output, and no transfer at all. Our experimental results on a simulated tabletop robot-arm navigation task show that the units created with SFA are the most successful for skill transfer. Both SFA and PCA use fewer resources than the usual skill-transfer scheme in which full layer outputs are carried over to the new task, and many of the resulting units show a localized response reflecting end-effector, obstacle, and goal relations. Finally, the SFA units with the lowest eigenvalues resemble symbolic representations that correlate strongly with high-level features such as joint angles and end-effector position, and might be thought of as precursors of fully symbolic systems.
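The two feature-extraction routes named in the abstract, PCA (information compression) and SFA (slowness), can be applied to recorded hidden-layer activations with a few lines of NumPy. The sketch below is a generic linear PCA/SFA implementation under assumed shapes (T time steps by D hidden units); it is not the authors' code, and the downstream transfer-learning step is omitted.

```python
# Generic PCA and linear SFA applied to hidden-layer activations recorded while a
# source task is performed. Shapes and names are illustrative assumptions.
import numpy as np

def pca_features(acts, k):
    """Project activations (T x D) onto the top-k principal components."""
    centered = acts - acts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def sfa_features(acts, k, eps=1e-6):
    """Linear SFA: whiten, then keep the k slowest directions of the whitened signal."""
    centered = acts - acts.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    whitener = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T
    z = centered @ whitener
    dz = np.diff(z, axis=0)                        # temporal derivative of whitened signal
    devals, devecs = np.linalg.eigh(np.cov(dz, rowvar=False))  # ascending: smallest = slowest
    return z @ devecs[:, :k]

# toy usage: pretend these are last-hidden-layer activations over one trajectory
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 64))                  # T=500 steps, 64 hidden units
compact_pca = pca_features(acts, k=8)              # compact features reused in the target task
compact_sfa = sfa_features(acts, k=8)
print(compact_pca.shape, compact_sfa.shape)        # (500, 8) (500, 8)
```

In a transfer setting, the fitted projection (PCA loadings or SFA directions) would be frozen and applied to the same hidden layer while learning the target task, replacing the full 64-unit layer output with the 8 compact features.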
Article | Open Access
Imitation and mirror systems in robots through Deep Modality Blending Networks (Elsevier, 2022-02)
Seker, M. Y.; Ahmetoglu, A.; Nagai, Y.; Asada, M.; Öztop, Erhan; Ugur, E.
Abstract: Learning to interact with the environment not only empowers an agent with manipulation capability but also generates information that facilitates building action understanding and imitation capabilities. This appears to be a strategy adopted by biological systems, in particular primates, as evidenced by the existence of mirror neurons that seem to be involved in multi-modal action understanding. How to benefit from a robot's interaction experience to enable understanding the actions and goals of other agents is still a challenging question. In this study, we propose a novel method, Deep Modality Blending Networks (DMBN), that creates a common latent space from the multi-modal experience of a robot by blending multi-modal signals with a stochastic weighting mechanism. We show for the first time that deep learning, when combined with this novel modality blending scheme, can facilitate action recognition and produce structures that sustain anatomical and effect-based imitation capabilities. The proposed system, which is based on conditional neural processes, can be conditioned on any desired sensory or motor value at any time step, and can generate a complete multi-modal trajectory consistent with the desired conditioning in one shot by querying the network for all sampled time points in parallel, thereby avoiding the accumulation of prediction errors. In simulation experiments with an arm-gripper robot and an RGB camera, we show that DMBN can make accurate predictions about any missing modality (camera or joint angles) given the available ones, outperforming recent multimodal variational autoencoder models in long-horizon, high-dimensional trajectory prediction. We further show that, given desired images from different perspectives, i.e., images generated by observing other robots placed on different sides of the table, our system can generate image and joint-angle sequences that correspond to either anatomical or effect-based imitation behavior. To achieve this mirror-like behavior, our system does not perform pixel-based template matching but instead relies on the common latent space constructed from both joint and image modalities, as shown by additional experiments. Moreover, we show that mirror learning in our system does not depend on visual experience alone and cannot be achieved without proprioceptive experience: out of ten training scenarios with different initial configurations, the proposed DMBN model achieved mirror learning in all cases, whereas a model using only visual information failed in half of them. Overall, the proposed DMBN architecture not only serves as a computational model for sustaining mirror neuron-like capabilities, but also stands as a powerful machine learning architecture for high-dimensional multi-modal temporal data, with robust retrieval capabilities that operate on partial information in one or multiple modalities.
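The stochastic modality-blending idea in the abstract above can be sketched in PyTorch as a pair of modality encoders whose per-timestep latent codes are mixed with a random weight and averaged into a common latent, conditional-neural-process style, before a decoder reconstructs both modalities at arbitrary query times. All sizes, the blending rule, and the name ModalityBlendingNet are illustrative assumptions, not the published DMBN model.

```python
# Minimal sketch of stochastic modality blending over image and joint streams.
# Architecture sizes, the blending rule, and all names are illustrative assumptions.
import torch
import torch.nn as nn

class ModalityBlendingNet(nn.Module):
    def __init__(self, img_dim=64, joint_dim=7, latent_dim=32):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(img_dim + 1, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.joint_enc = nn.Sequential(nn.Linear(joint_dim + 1, 64), nn.ReLU(),
                                       nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim + 1, 64), nn.ReLU(),
                                     nn.Linear(64, img_dim + joint_dim))

    def forward(self, t_ctx, img_ctx, joint_ctx, t_query):
        # t_ctx: (B, N, 1) context times; img_ctx: (B, N, img_dim); joint_ctx: (B, N, joint_dim)
        r_img = self.img_enc(torch.cat([img_ctx, t_ctx], dim=-1))
        r_joint = self.joint_enc(torch.cat([joint_ctx, t_ctx], dim=-1))
        w = torch.rand(r_img.shape[0], 1, 1, device=r_img.device)  # stochastic blend weight
        r = (w * r_img + (1.0 - w) * r_joint).mean(dim=1)          # common latent per trajectory
        r_rep = r.unsqueeze(1).expand(-1, t_query.shape[1], -1)
        return self.decoder(torch.cat([r_rep, t_query], dim=-1))   # both modalities at query times

# toy usage: 8 trajectories, 20 context points, query 50 time steps in one parallel pass
model = ModalityBlendingNet()
t_ctx = torch.rand(8, 20, 1)
img_ctx, joint_ctx = torch.randn(8, 20, 64), torch.randn(8, 20, 7)
t_query = torch.linspace(0, 1, 50).view(1, 50, 1).expand(8, -1, -1)
out = model(t_ctx, img_ctx, joint_ctx, t_query)
print(out.shape)                                                   # (8, 50, 71)
```

Querying all time points in a single forward pass, as in the toy usage, is what allows one-shot trajectory generation without accumulating step-by-step prediction errors.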