Publication:
Weight update skipping: Reducing training time for artificial neural networks

dc.contributor.author: Safayenikoo, P.
dc.contributor.author: Aktürk, İsmail
dc.contributor.department: Computer Science
dc.contributor.ozuauthor: AKTÜRK, Ismail
dc.date.accessioned: 2021-12-06T08:29:49Z
dc.date.available: 2021-12-06T08:29:49Z
dc.date.issued: 2021-12
dc.description.abstract: Artificial Neural Networks (ANNs) are known as state-of-the-art techniques in Machine Learning (ML) and have achieved outstanding results in data-intensive applications, such as recognition, classification, and segmentation. These networks mostly use deep layers of convolution and/or fully connected layers with many filters in each layer, demanding a large amount of data and tunable hyperparameters to achieve competitive accuracy. As a result, the storage, communication, and computational costs of training (in particular, the time spent on training) become limiting factors to scaling them up. In this paper, we propose a new training methodology for ANNs that exploits the observation that the improvement in accuracy shows temporal variations, which allows us to skip updating weights when the variation is minuscule. During such time windows, we keep updating the bias, which ensures the network still trains and helps avoid overfitting; however, we selectively skip updating weights (and their time-consuming computations). This training approach achieves virtually the same accuracy with considerably less computational cost and reduces the time spent on training. We developed two variations of the proposed training method for selectively updating weights, called i) Weight Update Skipping (WUS) and ii) Weight Update Skipping with a Learning Rate Scheduler (WUS+LR). We evaluate these two approaches on state-of-the-art models, including AlexNet, VGG-11, VGG-16, and ResNet-18, on the CIFAR datasets; we also use the ImageNet dataset for AlexNet, VGG-16, and ResNet-18. On average, compared to the baseline, WUS and WUS+LR reduced the training time by 54% and 50% on CPU and by 22% and 21% on GPU, respectively, for CIFAR-10; by 43% and 35% on CPU and by 22% and 21% on GPU, respectively, for CIFAR-100; and by 30% and 27%, respectively, for ImageNet. (en_US)
dc.identifier.doi: 10.1109/JETCAS.2021.3127907 (en_US)
dc.identifier.endpage: 574
dc.identifier.issue: 4
dc.identifier.scopus: 2-s2.0-85119441655
dc.identifier.startpage: 563
dc.identifier.uri: http://hdl.handle.net/10679/7656
dc.identifier.uri: https://doi.org/10.1109/JETCAS.2021.3127907
dc.identifier.volume: 11
dc.identifier.wos: 000730514000007
dc.language.iso: eng (en_US)
dc.peerreviewed: yes (en_US)
dc.publicationstatus: Published (en_US)
dc.publisher: IEEE (en_US)
dc.relation.publicationcategory: International
dc.rights: restrictedAccess
dc.subject.keywords: Artificial neural networks (en_US)
dc.subject.keywords: Training time (en_US)
dc.subject.keywords: Temporal variation (en_US)
dc.subject.keywords: Weight update (en_US)
dc.title: Weight update skipping: Reducing training time for artificial neural networks (en_US)
dc.type: article (en_US)
dspace.entity.type: Publication
relation.isOrgUnitOfPublication: 85662e71-2a61-492a-b407-df4d38ab90d7
relation.isOrgUnitOfPublication.latestForDiscovery: 85662e71-2a61-492a-b407-df4d38ab90d7
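
For readers skimming the record, the skipping idea described in the abstract above can be sketched in a few lines. The snippet below is only an illustrative sketch, not the authors' published implementation; the framework choice (PyTorch) and names such as improvement_window, skip_threshold, and the helper evaluate() are assumptions made for illustration. Note also that the paper's reported savings come from avoiding the weight-gradient computations themselves, whereas this sketch only shows the control logic (it still computes gradients and discards them).

# Illustrative sketch of Weight Update Skipping (WUS), not the authors' code.
# When recent validation accuracy barely varies, skip weight updates but keep
# updating biases so the network continues to train.
import torch
import torch.nn.functional as F

def evaluate(model, loader, device):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

def train_wus(model, train_loader, val_loader, epochs=30,
              improvement_window=3, skip_threshold=1e-3, device="cpu"):
    model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    acc_history, skip_weights = [], False

    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x), y)
            loss.backward()
            if skip_weights:
                # Drop gradients of weight tensors so only biases are updated.
                for name, p in model.named_parameters():
                    if not name.endswith("bias"):
                        p.grad = None
            optimizer.step()

        acc_history.append(evaluate(model, val_loader, device))
        if len(acc_history) >= improvement_window:
            recent = acc_history[-improvement_window:]
            # If accuracy variation over the window is minuscule, start skipping
            # weight updates; resume full updates once it grows again.
            skip_weights = (max(recent) - min(recent)) < skip_threshold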

Files

License bundle

Name: license.txt
Size: 1.45 KB
Format: Item-specific license agreed upon at submission
