dc.contributor.advisor |
Ranathunga S |
|
dc.contributor.author |
Sewwandi KAU |
|
dc.date.accessioned |
2022 |
|
dc.date.available |
2022 |
|
dc.date.issued |
2022 |
|
dc.identifier.citation |
Sewwandi, K.A.U. (2022). Duplicate bug report detection using pre-trained language models [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21592 |
|
dc.identifier.uri |
http://dl.lib.uom.lk/handle/123/21592 |
|
dc.description.abstract |
Software testing and defect reporting are essential parts of software development
and maintenance. Defects are identified and reported in a bug tracking system such as
JIRA or Bugzilla. The reported defects are then triaged by an expert who
understands the repository, the system, and the developers, and who assigns them to
developers to be fixed. During defect reporting, duplicate bugs may be
reported, and identifying them is a crucial task. Manual labeling of duplicate
defects is time-consuming, may wrongly identify defects as duplicate bug reports, and
increases the cost of software maintenance. Automated duplicate bug report
detection is therefore valuable. This research proposes a duplicate bug report
classification methodology that leverages the pre-trained language models BERT and
XLNet, with a Multi-Layer Perceptron as the deep learning classifier for duplicate bug
detection. We tested it on publicly available Eclipse, NetBeans, and
OpenOffice bug report datasets. The selected models were shown to outperform
previously proposed systems for the same task. Among them, the approach
using BERT embeddings showed the best results. Further experiments showed that
BERT is capable of domain adaptation, meaning that even when BERT was fine-tuned
with different bug report datasets, it is still capable of detecting duplicate bugs in
an unseen dataset. Finally, a multi-stage classification was done using a
Convolutional Neural Network model and a BERT model on the Eclipse and
NetBeans datasets and on a combined Eclipse and NetBeans dataset. The approach
using the combined dataset outperformed the baseline approach. |
en_US |
dc.language.iso |
en |
en_US |
dc.subject |
DUPLICATE BUG DETECTION |
en_US |
dc.subject |
BERT |
en_US |
dc.subject |
XLNET |
en_US |
dc.subject |
MLP |
en_US |
dc.subject |
CNN |
en_US |
dc.subject |
DOMAIN ADAPTATION |
en_US |
dc.subject |
MULTI-STAGE CLASSIFICATION |
en_US |
dc.subject |
COMPUTER SCIENCE & ENGINEERING -Dissertation |
en_US |
dc.subject |
COMPUTER SCIENCE -Dissertation |
en_US |
dc.subject |
INFORMATION TECHNOLOGY -Dissertation |
en_US |
dc.title |
Duplicate bug report detection using pre-trained language models |
en_US |
dc.type |
Thesis-Abstract |
en_US |
dc.identifier.faculty |
Engineering |
en_US |
dc.identifier.degree |
MSc In Computer Science and Engineering |
en_US |
dc.identifier.department |
Department of Computer Science and Engineering |
en_US |
dc.date.accept |
2022 |
|
dc.identifier.accno |
TH4977 |
en_US |