Abstract:
Many research experiments with large data processing requirements rely on massive,
distributed Computing Grids to meet their computational needs. A Computing Grid
is built by combining a large number of individual computing sites distributed globally.
These Grid sites are maintained by different institutions across the world and contribute
thousands of worker nodes possessing different capabilities and configurations.
Developing software for Grid operations that works on all nodes while harnessing the
maximum capabilities offered by any given Grid site is challenging without knowing in
advance what capabilities each site offers. This research focuses on developing an
architecture-independent Grid infrastructure monitoring design that tracks the infrastructure
capabilities and configurations of worker nodes at sites across a Computing
Grid without the need to contact local site administrators. The design presents a highly
flexible and extensible architecture that offers infrastructure metric collection without
local agent installations at Grid sites. The resulting design is used to implement a Grid
infrastructure monitoring framework called “Site Sonar v2.0” that is currently being
used to monitor the infrastructure of 7,000+ worker nodes across 60+ Grid sites in the
ALICE Computing Grid. The proposed design is then used to introduce an improved
job matching architecture for Computing Grids that allows job matching based on any
infrastructure property of the worker nodes. This dissertation presents the proposed
architecture for a highly flexible and extensible Grid infrastructure monitoring design
and an improved job matching design for Computing Grids. It then describes how those
designs were implemented to derive important findings about the infrastructure of the
ALICE Computing Grid while improving its job matching capabilities. This work provides
a significant contribution
to the development of distributed Computing Grids, particularly by
providing a more efficient and effective way to monitor infrastructure and match jobs
to worker nodes.
Citation:
Wijethunga, R.M.K.D. (2023). Flexible and extensible infrastructure monitoring architecture for computing grids with infrastructure aware job matching [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/22213