DC Field | Value | Language
dc.contributor.author | Kayış, Enis
dc.date.accessioned | 2023-05-23T07:13:44Z
dc.date.available | 2023-05-23T07:13:44Z
dc.date.issued | 2021
dc.identifier.isbn | 978-303083013-7
dc.identifier.issn | 1865-0929 | en_US
dc.identifier.uri | http://hdl.handle.net/10679/8323
dc.identifier.uri | https://link.springer.com/chapter/10.1007/978-3-030-83014-4_8
dc.description.abstract | Multiple linear regression is a method for quantifying the effects of a set of independent variables on a dependent variable. In clusterwise linear regression problems, data points with similar regression estimates are grouped into the same cluster, either because of a business need or to increase the statistical significance of the resulting regression estimates. In this paper, we consider an extension of this problem in which data points belonging to the same category must belong to the same partition. For large datasets, finding the exact solution is not possible, and many heuristics require an amount of time that grows exponentially in the number of categories. We propose variants of a gradient-descent-based heuristic that provide high-quality solutions within a reasonable time. The performance of our heuristics is evaluated on 1014 simulated datasets. We find that the base gradient-descent-based heuristic performs quite well comparatively, with an average percentage gap of 0.17% when the number of categories is fewer than 60. However, starting with a fixed initial partition and restricting cluster-assignment changes to be one-directional speed up the heuristic dramatically with a moderate decrease in solution quality, especially for datasets with multiple predictors and for large datasets. For example, one could generate solutions with an average percentage gap of 2.81% in one-tenth of the time for datasets with 400 categories. | en_US
dc.language.iso | eng | en_US
dc.publisher | Springer | en_US
dc.relation.ispartof | International Conference on Data Management Technologies and Applications DATA 2020: Data Management Technologies and Applications, part of the Communications in Computer and Information Science book series (CCIS, volume 1446)
dc.rights | restrictedAccess
dc.title | Designing an efficient gradient descent based heuristic for clusterwise linear regression for large datasets | en_US
dc.type | Conference paper | en_US
dc.publicationstatus | Published | en_US
dc.contributor.department | Özyeğin University
dc.contributor.authorID | (ORCID 0000-0001-8282-5572 & YÖK ID 29747) Kayış, Enis
dc.contributor.ozuauthor | Kayış, Enis
dc.identifier.volume | 1446 | en_US
dc.identifier.startpage | 154 | en_US
dc.identifier.endpage | 171 | en_US
dc.identifier.doi | 10.1007/978-3-030-83014-4_8 | en_US
dc.subject.keywords | Clusterwise linear regression | en_US
dc.subject.keywords | Gradient descent | en_US
dc.subject.keywords | Heuristics | en_US
dc.identifier.scopus | SCOPUS:2-s2.0-85113289375
dc.relation.publicationcategory | Conference Paper - International - Institutional Academic Staff
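
The abstract states the problem but not the algorithm, so the following is only a rough illustration of the problem setting, not the paper's gradient-descent-based heuristic (whose details are not given in this record). It is a minimal Python sketch under stated assumptions: a single predictor with an intercept, each category's observations stored as a list of (x, y) pairs, and a category-to-cluster assignment that keeps every category's points together. It evaluates the total within-cluster residual sum of squares (SSE) and improves it with a plain local search that moves whole categories between k clusters. The names `sse_simple_regression`, `total_sse`, `local_search`, and the toy data are hypothetical; the fixed initial partition and one-directional restriction mentioned in the abstract are not modeled.

```python
from collections import defaultdict

# Hypothetical data layout (an assumption, not from the paper):
# data maps each category label to its list of (x, y) observations,
# and an assignment maps each category label to a cluster index 0..k-1.


def sse_simple_regression(points):
    """Residual sum of squares of an OLS fit y = a + b*x to (x, y) points."""
    n = len(points)
    if n == 0:
        return 0.0
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    if sxx == 0:
        # All x values are equal: the best fit is the constant mean of y.
        return syy
    # Clamp tiny negative values caused by floating-point rounding.
    return max(syy - sxy * sxy / sxx, 0.0)


def total_sse(data, assignment):
    """Total within-cluster SSE; a category's points always stay together."""
    clusters = defaultdict(list)
    for category, points in data.items():
        clusters[assignment[category]].extend(points)
    return sum(sse_simple_regression(pts) for pts in clusters.values())


def local_search(data, assignment, k, max_rounds=100):
    """Move whole categories between k clusters while total SSE decreases.

    A plain local search for illustration, NOT the paper's
    gradient-descent-based heuristic.
    """
    assignment = dict(assignment)  # do not mutate the caller's start
    best = total_sse(data, assignment)
    for _ in range(max_rounds):
        improved = False
        for category in data:
            current = assignment[category]
            for cluster in range(k):
                if cluster == current:
                    continue
                assignment[category] = cluster
                cost = total_sse(data, assignment)
                if cost < best - 1e-12:
                    # Keep the move and continue from the new cluster.
                    best, current, improved = cost, cluster, True
                else:
                    assignment[category] = current  # undo the trial move
        if not improved:
            break
    return assignment, best


if __name__ == "__main__":
    # Toy data: categories A, B follow y = 2x; C, D follow y = 10 - x.
    data = {
        "A": [(0, 0), (1, 2), (2, 4)],
        "B": [(3, 6), (4, 8), (5, 10)],
        "C": [(0, 10), (1, 9), (2, 8)],
        "D": [(3, 7), (4, 6), (5, 5)],
    }
    start = {"A": 0, "B": 1, "C": 0, "D": 1}  # deliberately mixed start
    print(total_sse(data, start))  # 97.5
    print(local_search(data, start, k=2))  # ({'A': 1, 'B': 1, 'C': 0, 'D': 0}, 0.0)
```

This sketch refits every cluster from scratch after each trial move, which keeps the code short but would be slow on the large datasets the paper targets; maintaining running sums per cluster so that a move updates SSE incrementally would be the obvious first optimization.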


Files in this item

There are no files associated with this item.
