Training a CNN-based classifier (or a deep learning model in general) or running a Machine Learning Spark application is a time-consuming process that can run for days. Several parameters of this process can be tuned and influence both the application execution time and the accuracy of the obtained models; to select them correctly, information about the expected execution time must be available. This information is obtained through performance modelling.
ATMOSPHERE has developed several machine learning regression techniques that can be used to build performance models for cloud-based applications, and provides an open-source library that automatically compares several techniques and supports feature augmentation, feature selection, and hyper-parameter tuning (https://github.com/eubr-atmosphere/a-MLLibrary). In its final release, the library introduces parallelization at the top level, i.e., all the technique evaluations can run fully in parallel (even though the underlying Python scikit-learn toolkit code is single-threaded).
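As a minimal sketch of the kind of search the library automates (this is not the a-MLLibrary interface itself, only an illustration built directly on scikit-learn with synthetic data), several regression techniques can be compared with feature augmentation, feature selection, and parallel hyper-parameter tuning as follows:

# Illustrative sketch, not the a-MLLibrary API: compare several scikit-learn
# regression techniques with feature augmentation, feature selection, and
# hyper-parameter tuning evaluated in parallel.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))                                  # hypothetical profiling features (e.g. cores, data size)
y = 10.0 / X[:, 0] + 5.0 * X[:, 1] + rng.normal(0, 0.1, 200)    # synthetic execution times

candidates = {
    "ridge": (Ridge(), {"model__alpha": [0.1, 1.0, 10.0]}),
    "svr": (SVR(), {"model__C": [1.0, 10.0], "model__epsilon": [0.01, 0.1]}),
    "random_forest": (RandomForestRegressor(random_state=0),
                      {"model__n_estimators": [50, 200]}),
}

best = {}
for name, (model, grid) in candidates.items():
    pipe = Pipeline([
        ("augment", PolynomialFeatures(degree=2, include_bias=False)),  # feature augmentation
        ("select", SelectKBest(f_regression, k=8)),                     # feature selection
        ("scale", StandardScaler()),
        ("model", model),
    ])
    # n_jobs=-1 evaluates the hyper-parameter grid in parallel
    search = GridSearchCV(pipe, grid, cv=5,
                          scoring="neg_mean_absolute_percentage_error", n_jobs=-1)
    search.fit(X, y)
    best[name] = (search.best_score_, search.best_params_)

for name, (score, params) in sorted(best.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: MAPE={-score:.3f}, params={params}")

In practice, the library performs this comparison automatically from its own configuration, so the user only provides the profiling data and the list of techniques to evaluate.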
Performance models are also integrated within a capacity planning tool (https://github.com/eubr-atmosphere/opt_ic) that efficiently and effectively explores the space of alternative cloud configurations, seeking the minimum-cost deployment that satisfies an a priori deadline. The soundness of the proposed solutions has been thoroughly validated in an extensive experimental campaign encompassing different applications and cloud platforms.
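The underlying idea can be illustrated with a short sketch (not the opt_ic implementation; the configuration catalogue, prices, and performance-model curve below are invented for the example): each candidate configuration is evaluated with the performance model, configurations that miss the deadline are discarded, and the cheapest remaining one is selected.

# Illustrative sketch of the capacity-planning idea (not the opt_ic code).
def cheapest_feasible(configurations, predict_time, deadline_s):
    """Return the minimum-cost configuration whose predicted time meets the deadline."""
    feasible = []
    for cfg in configurations:
        t = predict_time(cfg)                        # performance-model prediction (seconds)
        cost = cfg["price_per_hour"] * (t / 3600.0)  # pay-per-use cost of the run
        if t <= deadline_s:
            feasible.append((cost, t, cfg))
    if not feasible:
        return None                                  # deadline cannot be met with these options
    return min(feasible, key=lambda item: item[0])

# Hypothetical catalogue of cloud configurations.
catalogue = [
    {"name": "small",  "cores": 8,  "price_per_hour": 0.40},
    {"name": "medium", "cores": 16, "price_per_hour": 0.85},
    {"name": "large",  "cores": 32, "price_per_hour": 1.80},
]

# Stand-in performance model: an Amdahl-like time/cores curve; in practice this
# would be one of the regression models trained with the library above.
predict = lambda cfg: 1800.0 / cfg["cores"] + 120.0

result = cheapest_feasible(catalogue, predict, deadline_s=300.0)
if result:
    cost, time_s, cfg = result
    print(f"Pick {cfg['name']}: predicted {time_s:.0f}s, estimated cost {cost:.3f} $")

With these invented numbers, the small instance misses the 300 s deadline, so the tool's search would settle on the medium instance as the cheapest feasible choice.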
Developers, Sys Admins, Application Managers, and open-source communities.
Apply best practices and mitigation actions, and integrate services derived from the project, to quantitatively improve the trustworthiness of the application, both a priori and at runtime.
7BULLS and E4Company, together with POLIMI, are planning to embed the performance models within an advanced scheduler for disaggregated hardware resources as part of the H2020 TETRAMAX proposal they are preparing for the 3rd open call on Value Chain Oriented and Interdisciplinary Technology Transfer Experiments.
POLIMI is self-funding a new one-year position to support the open-source version of the machine learning library developed within the ATMOSPHERE project (i.e., a-MLLibrary) and to start the development of an online scheduler for GPU-based disaggregated clusters.
Future exploitation and sustainability plans are based on new funding opportunities that POLIMI and the Federal University of Minas Gerais are pursuing.