dc.contributor.advisor |
Perera L |
|
dc.contributor.author |
De Silva HWIU |
|
dc.date.accessioned |
2022 |
|
dc.date.available |
2022 |
|
dc.date.issued |
2022 |
|
dc.identifier.citation |
De Silva, H.W.I.U. (2022). Building explanatory models for road crash analysis using data science and machine learning technologies [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/19697 |
|
dc.identifier.uri |
http://dl.lib.uom.lk/handle/123/19697 |
|
dc.description.abstract |
Over three thousand people die annually on the roads of Sri Lanka due to traffic crashes. This
is a massive social and economic problem faced by the country. Road crashes globally cause
more than 1.3 million fatalities every year and are the eighth leading cause of death worldwide.
Traditionally, road traffic crash analysis and accident modeling have relied on regression models
and discrete choice models based on past data. Many countermeasures have been identified
and implemented addressing the issues highlighted through such models.
Since road traffic crashes occur across space and time, the conventional numerical approaches
have failed to provide alerts and insights in relation to geospatial regions. Also, having to
handcraft these models limits the explainability that can be leveraged with the help of advanced
tools and techniques available in modern data science and machine learning disciplines.
Further, the disjointed efforts in building analytical models or geospatial models on available
crash data (e.g., crash hotspot identification) limit road agencies’ ability to prioritize the
allocation of funds for more impactful improvements. Due to the difficulty in identifying patterns in
causal factors of accident risks using conventional or isolated methods, the authorities also find
it difficult to prioritize staff deployment in high-risk areas.
The combination of exploratory data analysis (EDA), machine learning models, and modern
geospatial visualization tools offers a unique opportunity to fill these gaps cost-effectively. This
study presents an application of the latest data science and machine learning technologies to
build explanatory models that help analyze road crashes. Popular packages written in Python
and JavaScript programming languages were used. Pandas and SweetViz libraries provided
simple, yet powerful EDA. The GeoPandas library provided the ability to process GPS locations
(latitude and longitude) while Matplotlib was used to generate static maps. The Folium library and
the underlying Leaflet.js library were applied to generate interactive maps to help visualize
crash hot spots. Two leading gradient boosting techniques, namely LightGBM and CatBoost,
were applied to build models that highlight causal factors via feature importance estimation
methods.
The study developed algorithms, methods, and charts to generate attribute correlation and
gradient boosted decision tree models to relate accident severity with recorded data sets and
interactions of certain aggregate features (e.g., weather and light conditions). The visualization
efforts produced road crash density maps by administrative region size and population.
Interactive maps that allow authorities to drill down (or zoom in) to hot spots were also
developed.
The programmatic approach developed in this study enables the repeatable application of the
explanatory analysis and visualizations to new and old datasets with minimal effort. The
findings from the study lay the foundation for a digital system that can be easily converted to
an online platform for road and enforcement agencies to obtain reports and alerts on road crash
risks and hot spots. The application was tested using crash data in Sri Lanka and the outcomes
are presented in this study.
Future work on the fusion of multiple data sources such as real-time weather data and traffic
congestion levels onto the same platform can extend these outcomes toward near real-time
crash prediction, further assisting proactive accident prevention measures. |
en_US |
dc.language.iso |
en |
en_US |
dc.subject |
ROAD SAFETY |
en_US |
dc.subject |
EXPLANATORY MODELS |
en_US |
dc.subject |
GEOSPATIAL CRASH VISUALIZATION |
en_US |
dc.subject |
MULTI-FACETED ANALYSIS |
en_US |
dc.subject |
ROAD CRASHES |
en_US |
dc.subject |
EXPLORATORY DATA ANALYSIS |
en_US |
dc.subject |
MACHINE LEARNING CRASH MODELS |
en_US |
dc.subject |
TRANSPORTATION - Dissertation |
en_US |
dc.subject |
CIVIL ENGINEERING - Dissertation |
en_US |
dc.title |
Building explanatory models for road crash analysis using data science and machine learning technologies |
en_US |
dc.type |
Thesis-Abstract |
en_US |
dc.identifier.faculty |
Engineering |
en_US |
dc.identifier.degree |
M.Sc. in Transportation |
en_US |
dc.identifier.department |
Department of Civil Engineering |
en_US |
dc.date.accept |
2022 |
|
dc.identifier.accno |
TH4919 |
en_US |