Browsing by Author "Arditi, Emir"
Now showing 1 - 3 of 3
Conference paper (Metadata only)
Collective voice of experts in multilateral negotiation (Springer International Publishing, 2017)
Güneş, Taha Doğan; Arditi, Emir; Aydoğan, Reyhan (Computer Science)
Abstract: Inspired by ideas such as the "algorithm portfolio", "mixture of experts", and "genetic algorithm", this paper presents two novel negotiation strategies that combine multiple negotiation experts to decide what to bid and what to accept during a negotiation. In the first approach, the incremental portfolio, a bid is constructed by asking each agent in the portfolio for a suggestion and picking one of the suggestions stochastically, weighted by the agents' expertise levels. In the second approach, the crossover strategy, each expert agent makes a bid suggestion and majority voting over each issue value decides the content of the bid. The proposed approaches were evaluated empirically, and the experimental results show that the crossover strategy outperformed the top five finalists of the ANAC 2016 Negotiation Competition in terms of average individual utility.
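The two bidding mechanisms described in the abstract can be illustrated with a minimal Python sketch. The expert objects, their suggest_bid() method, and the dictionary-of-issue-values bid representation below are assumptions made for illustration; the paper's agents target the ANAC negotiation setting, so this is not the authors' implementation.

```python
from collections import Counter
import random

def incremental_portfolio_bid(experts, expertise_weights):
    """Pick one expert's suggested bid stochastically,
    weighted by each expert's expertise level (illustrative)."""
    suggestions = [expert.suggest_bid() for expert in experts]
    return random.choices(suggestions, weights=expertise_weights, k=1)[0]

def crossover_bid(experts, issues):
    """Build a bid by majority voting on each issue value
    over the experts' suggested bids (illustrative)."""
    # Each suggested bid is assumed to be a dict mapping issue -> chosen value.
    suggestions = [expert.suggest_bid() for expert in experts]
    bid = {}
    for issue in issues:
        votes = Counter(suggestion[issue] for suggestion in suggestions)
        bid[issue] = votes.most_common(1)[0][0]  # tie broken by first-seen value
    return bid
```

Ties in the per-issue vote fall back to the first value seen, which is an arbitrary choice; other tie-breaking rules (for example, deferring to the highest-expertise agent) would be equally consistent with the abstract's description.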
Master Thesis (Metadata only)
Explorations on inverse reinforcement learning for the analysis of motor control and cognitive decision making mechanisms of the brain
Arditi, Emir; Öztop, Erhan; Aydoğan, Reyhan; Uğur, E. (Department of Computer Science)
Abstract: Reinforcement Learning is a framework for generating optimal policies given a task and a reward/punishment structure. Inverse Reinforcement Learning, as the name suggests, recovers the reasoning behind an optimal policy from expert demonstrations. We set out to explore whether recent Reinforcement Learning and Inverse Reinforcement Learning methods can serve as computational tools for investigating the optimality principles of motor control and cognitive decision-making mechanisms of the brain. For this purpose, we targeted several tasks involving different parts of the brain's sensorimotor learning mechanism, aiming to recover the optimality principles the brain employs for various control and decision-making tasks. If this is achieved, demonstrated behavior can be analyzed, understood, mimicked, and improved with less bias, which we hope is a step toward understanding learning in both human and artificial systems. Within the scope of this thesis, we evaluated two tasks. The first was investigating the applicability of perceptual development to Reinforcement Learning: we proposed a perceptual-development-based learning regime for a Reinforcement Learning agent, and the results suggest that a suitable perceptual development regime may improve learning progress and yield better-performing agents. The second task was predicting the reward function parameters of a given trajectory in a standing-up-under-perturbation scenario, for which we proposed two different Inverse Reinforcement Learning approaches. Our results indicate that we were able to infer valid reward parameters on synthetic data.

Conference paper (Metadata only)
Inferring cost functions using reward parameter search and policy gradient reinforcement learning (IEEE, 2021)
Arditi, Emir; Kunavar, T.; Ugur, E.; Babic, J.; Öztop, Erhan (Computer Science)
Abstract: This study focuses on inferring the cost functions behind observed movement data using reward parameter search and policy gradient based Reinforcement Learning (RL). The behavioral data are obtained from a series of squat-to-stand movements performed by human participants under dynamic perturbations. The key parameter searched in the cost function is the weight of the total torque used in performing the squat-to-stand action. An approximate model learns the squat-to-stand movement via a policy gradient method, Proximal Policy Optimization (PPO), and a behavioral similarity metric based on the Center of Mass (COM) is used to find the most likely weight parameter. The stochasticity in PPO training is handled by averaging over multiple runs, yielding a reasonably stable Inverse Reinforcement Learning (IRL) algorithm in terms of performance. The results indicate that the reward function parameters of some participants were inferred successfully.
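The reward-parameter search described in this abstract can be sketched as a simple outer loop around PPO training. The following minimal Python sketch assumes hypothetical train_ppo_policy and rollout_com callables standing in for the paper's approximate model, PPO training, and COM trajectory extraction; the exact similarity metric and search procedure used in the paper may differ.

```python
import numpy as np

def com_similarity(com_trajectory, reference_com):
    """Negative mean squared deviation between simulated and observed
    Center-of-Mass trajectories (one plausible similarity metric)."""
    return -np.mean((np.asarray(com_trajectory) - np.asarray(reference_com)) ** 2)

def infer_torque_weight(candidate_weights, train_ppo_policy, rollout_com,
                        reference_com, n_runs=5):
    """Search over the torque-weight parameter of the cost function.
    For each candidate weight, train PPO several times (to average out
    training stochasticity), score the resulting behaviour against the
    observed COM trajectory, and return the best-scoring weight."""
    best_weight, best_score = None, -np.inf
    for w in candidate_weights:
        scores = []
        for _ in range(n_runs):
            policy = train_ppo_policy(torque_weight=w)  # hypothetical PPO training routine
            scores.append(com_similarity(rollout_com(policy), reference_com))
        mean_score = float(np.mean(scores))
        if mean_score > best_score:
            best_weight, best_score = w, mean_score
    return best_weight
```

Averaging the similarity score over several independent PPO trainings per candidate weight mirrors the abstract's point that PPO's stochasticity is dealt with through multiple runs.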