Research Topic

Software Engineering Challenges for Machine Learning

About this Research Topic

The development, deployment and maintenance of Machine Learning (ML) enabled applications differ from those of traditional software. The main difference is that learning from data replaces the hard coding of rules. This results in randomness inherent to the parametrization of most ML algorithms, a strong dependence on the available data, and difficulty in providing specifications more refined than overall performance objectives: together, these effects introduce unusually high amounts of uncertainty into most steps of a typical ML process. Another difference is that the model learned from data is typically much less transparent than traditional software. These facts alone already have important consequences. For instance, the data from which the model is learned should, in principle, also be tested in some way. As for code debugging, traditional debugging becomes impossible to apply, since in many cases - such as Artificial Neural Networks - the logic of the learned model is scattered over a large number of parameters whose meaning is difficult to characterize. Even more challenging is the area of ML known as Deep Learning (DL), where not only can the number of parameters be of the order of millions, but the representation of the data is typically learned separately from the inferential models and can consist of different nested levels of abstraction.

In ML, and DL in particular, the Software Engineering (SE) process needs to be completely reconsidered. A high-level development process, for instance, should include the following activities: defining the problem, collecting and preprocessing the data, establishing the ground truth, selecting the algorithm, selecting (or learning) the features, creating the ML model, evaluating the model and, finally, monitoring the model in production. This development process already poses several challenges, such as: 1) the need for automatic maintenance to manage a predictable degradation in performance, 2) the necessity of an experiment-based approach right from the requirements phase, 3) the inherently high coupling of the components, with multiple feedback loops, and 4) the issues of performance reporting and reproducibility, which call for specific data and configuration management. Further challenges appear in relation to the production phase of real-world commercial applications and affect the organizations developing ML-enabled applications. Moreover, when the volume of data needed to train very large models makes centralized learning impractical, suitable distributed versions of learning and analytics should be adopted.
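As a rough illustration only (not a prescribed implementation), the activities listed above can be read as a pipeline of stages; in the minimal Python sketch below, all stage names and data structures are hypothetical placeholders, and real projects iterate over these stages through multiple feedback loops rather than running them once in sequence.

```python
# Illustrative sketch of the high-level ML development process described above.
# Every function body is a placeholder for the real activity it names.

def define_problem():
    return {"task": "binary_classification", "target_metric": "f1"}

def collect_and_preprocess():
    # e.g. load raw records and normalize them
    return [{"x": [0.1, 0.2], "y": 1}, {"x": [0.5, 0.4], "y": 0}]

def establish_ground_truth(records):
    # e.g. verify labels against an agreed annotation protocol
    return [r for r in records if r["y"] in (0, 1)]

def train_model(algorithm, data):
    return {"algorithm": algorithm, "params": [0.0] * len(data[0]["x"])}

def evaluate(model, data):
    return {"f1": 0.0}  # only global error estimates are available

def monitor_in_production(model):
    print("monitoring", model["algorithm"])

if __name__ == "__main__":
    problem = define_problem()
    data = establish_ground_truth(collect_and_preprocess())
    model = train_model("logistic_regression", data)
    print(evaluate(model, data))
    monitor_in_production(model)
```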

Since data are used to “program” the system, the performance of the model is unknown before test time, which makes performance planning very difficult. Since in ML and DL models transparency is traded for accuracy, the learned model becomes complex to understand and difficult to break down into smaller and simpler blocks: a semantic understanding of the model can only be attempted through approximation methods, and manual inspection of the code becomes infeasible. During the training phase, the only metrics available are global error estimates; furthermore, the non-deterministic nature of many learning algorithms makes model testing more demanding. The aforementioned need to test the data is normally addressed by using datasets of small size; however, it is very challenging to provide a small dataset that contains the right proportion of the edge cases that appear in the full dataset.
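One way to make the last point concrete is to check whether a small test dataset preserves the edge-case proportion of the full dataset. The sketch below is a minimal, hypothetical example of such a check; the record structure and the edge-case predicate are assumptions introduced purely for illustration.

```python
def edge_case_coverage(full, sample, is_edge_case, tolerance=0.02):
    """Check whether a small sample preserves the edge-case proportion
    of the full dataset within a given absolute tolerance."""
    full_rate = sum(map(is_edge_case, full)) / len(full)
    sample_rate = sum(map(is_edge_case, sample)) / len(sample)
    return abs(full_rate - sample_rate) <= tolerance, full_rate, sample_rate

# Hypothetical usage: records with an 'amount' field, where very large
# amounts are treated as edge cases.
full = [{"amount": a} for a in range(1000)]
sample = full[::50]  # a small, systematically drawn subset
ok, full_rate, sample_rate = edge_case_coverage(
    full, sample, lambda r: r["amount"] > 950
)
print(ok, full_rate, sample_rate)
```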

In production, the frequent retraining of ML systems means that the application changes its behavior autonomously as the external data change and is no longer under the complete control of human decisions: unit tests and integration tests are no longer able to guarantee the behavior and performance of the system. Whereas in traditional SE the hardware is considered a static entity, DL systems run on GPUs, new versions of which are released a few times per year: the speed-up in hardware performance is a strong incentive to continuously update hardware platforms and creates the need for hardware dependency management. Moreover, during development, a number of experiments are made that explore the experimental space: tracking all of them and making the results reproducible is a difficult task; it may involve keeping track not only of the version of the source code but also of the hardware, the platform (OS and installed packages), the configuration (e.g. preprocessing settings) and, most importantly, the training data used. The cost of the overall process for the construction of ML and DL models prompts model reuse in different forms, from training/testing data reuse up to model reuse by adaptation and transfer learning. Overall, in order to produce effective ML applications, the Software Engineering process has to undergo a complete change of paradigm.
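As a minimal sketch of the experiment-tracking idea (not a specific tool or the only way to do it), the snippet below records the kinds of metadata mentioned above: source-code version, platform, configuration and a fingerprint of the training data. It assumes the experiment runs inside a git checkout; all file names and configuration keys are hypothetical.

```python
import hashlib
import platform
import subprocess
import sys
from datetime import datetime, timezone

def git_commit():
    # Assumes the experiment is run from within a git repository.
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def data_fingerprint(path):
    # Hash the training data file so the exact dataset version is recorded.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def experiment_record(config, data_path):
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source_code": git_commit(),
        "platform": {"os": platform.platform(), "python": sys.version},
        "config": config,                      # e.g. preprocessing settings
        "training_data": data_fingerprint(data_path),
    }

# Hypothetical usage (paths and settings are placeholders):
# record = experiment_record({"normalize": True, "seed": 42}, "train.csv")
# print(record)
```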

The objective of this Research Topic is to explore recent advances and techniques in the Software Engineering process for ML-enabled applications in general and for DL-enabled applications in particular. Topics of interest include (but are not limited to):
· Data testing techniques
· Reproducibility and performance reporting issues
· Model debugging
· Distributed solutions
· Challenges from GPU memory limitations
· Hardware dependency management
· Software dependency management
· Libraries vs. glue code management
· Effort estimation challenges
· Privacy and Data safety
· Security of ML-based Systems
· Heterogeneity of data sources
· Best practices for building ML systems
· ML Workflow Automation
· Challenges from the representation learning phase
· Data reuse
· Model reuse
· Engineering Transfer learning
· Compliance with requirements for accountability and transparency
· Explainability in ML
· Evaluating Transparency and Interpretability of AI Systems
· From Garage to Production
· Automated Machine Learning and efficient hyperparameter optimisation
· Large-scale evaluation and benchmarking techniques


Keywords: Software Engineering Process for Machine Learning, Data Testing Techniques, Model and data reuse, Deep Learning Workflow Automation, Explainability in Machine Learning


Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.

About Frontiers Research Topics

With their unique mixes of varied contributions from Original Research to Review Articles, Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author.

Topic Editors


Submission Deadlines

Manuscript submission deadline: 31 May 2021

Participating Journals

Manuscripts can be submitted to this Research Topic via the following journals:

