Show simple item record

dc.contributor.authorSiddiqi, U. F.
dc.contributor.authorSait, S. M.
dc.contributor.authorUysal, Murat
dc.date.accessioned2021-03-08T11:21:47Z
dc.date.available2021-03-08T11:21:47Z
dc.date.issued2020
dc.identifier.issn2169-3536en_US
dc.identifier.urihttp://hdl.handle.net/10679/7370
dc.identifier.urihttps://ieeexplore.ieee.org/document/9262953
dc.description.abstractNOMA is a radio access technique that multiplexes several users over the same frequency resource, providing high throughput and fairness among users. Maximization of the minimum data-rate, also known as max-min, is a popular approach to ensuring fairness among the users. In NOMA, max-min is performed by optimizing the users' transmission powers (power-coefficients). For more than two users, the problem is a constrained non-convex optimization. We propose to solve this problem using Double Deep Q-Learning (DDQL), a popular reinforcement-learning technique. DDQL employs a Deep Q-Network (DQN) that learns to choose optimal actions for tuning the users' power-coefficients. The model of the Markov Decision Process (MDP) is critical to the success of the DDQL method, as it helps the DQN learn to take better actions. We propose an MDP model in which the state consists of the power-coefficient values, the users' data-rates, and vectors indicating which power-coefficients can be increased or decreased. An action simultaneously increases one user's power-coefficient and decreases another user's by the same amount; the amount of change can be small or large, and the action-space contains all possible ways to alter the values of any two users at a time. The DQN consists of a convolutional layer followed by fully connected layers. We compared the proposed method with the sequential least squares programming and trust-region constrained algorithms and found that it produces competitive results.en_US
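The action described in the abstract (shift a fixed amount of power from one user's coefficient to another's, then evaluate the resulting minimum data-rate) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the channel gains, the SNR value, and the standard downlink power-domain NOMA rate model with perfect successive interference cancellation (SIC) are all assumptions, and the actual DDQL agent, DQN, and reward shaping from the paper are omitted.

```python
import math

def noma_rates(a, g, snr=100.0):
    """Per-user rates (bits/s/Hz) for downlink power-domain NOMA with
    perfect SIC (assumed model, not taken from the paper). Users are
    indexed by ascending channel gain g, so user i sees residual
    interference only from users j > i (stronger channels)."""
    rates = []
    for i in range(len(a)):
        interference = g[i] * sum(a[i + 1:])
        sinr = a[i] * g[i] * snr / (interference * snr + 1.0)
        rates.append(math.log2(1.0 + sinr))
    return rates

def apply_action(a, src, dst, delta):
    """One action of the kind the abstract describes: move `delta` of
    transmit power from user `src` to user `dst`, preserving the total
    power budget (the sum of the power-coefficients)."""
    a = list(a)
    delta = min(delta, a[src])   # keep coefficients non-negative
    a[src] -= delta
    a[dst] += delta
    return a

a = [0.6, 0.3, 0.1]   # power-coefficients (sum to the power budget)
g = [0.2, 1.0, 4.0]   # example channel gains, ascending
print("min-rate before:", min(noma_rates(a, g)))
print("min-rate after :", min(noma_rates(apply_action(a, 0, 2, 0.05), g)))
```

A DDQL agent would score every (src, dst, delta) triple with its Q-network and pick the action maximizing expected long-term reward; here the effect of a single action on the max-min objective is just computed directly.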
dc.description.sponsorshipDeanship of Scientific Research, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia
dc.language.isoengen_US
dc.publisherIEEEen_US
dc.relation.ispartofIEEE Access
dc.rightsopenAccess
dc.titleDeep reinforcement based power allocation for the max-min optimization in non-orthogonal multiple accessen_US
dc.typeArticleen_US
dc.description.versionPublisher version
dc.peerreviewedyesen_US
dc.publicationstatusPublisheden_US
dc.contributor.departmentÖzyeğin University
dc.contributor.authorID(ORCID 0000-0001-5945-0813 & YÖK ID 124615) Uysal, Murat
dc.contributor.ozuauthorUysal, Murat
dc.identifier.volume8en_US
dc.identifier.startpage211235en_US
dc.identifier.endpage211247en_US
dc.identifier.wosWOS:000596356100001
dc.identifier.doi10.1109/ACCESS.2020.3038923en_US
dc.subject.keywordsNOMAen_US
dc.subject.keywordsOptimizationen_US
dc.subject.keywordsSilicon carbideen_US
dc.subject.keywordsResource managementen_US
dc.subject.keywordsTask analysisen_US
dc.subject.keywordsRelaysen_US
dc.subject.keywordsReinforcement learningen_US
dc.subject.keywordsNon-orthogonal multiplexingen_US
dc.subject.keywordsDouble deep Q learningen_US
dc.subject.keywordsDeep reinforcement learningen_US
dc.subject.keywordsNon-convex optimizationen_US
dc.subject.keywordsPower-domain NOMAen_US
dc.identifier.scopusSCOPUS:2-s2.0-85097130085
dc.contributor.authorMale1
dc.relation.publicationcategoryArticle - International Refereed Journal - Institutional Academic Staff

