# Department of Mathematics and Statistics receives $300K award from Department of Energy

Published: 10/05/2021

Yanzhao Cao, Don Logan endowed chair in mathematics and Graduate Program Officer in the Department of Mathematics and Statistics, received a \$300,000 grant from the Office of Advanced Scientific Computing Research (ASCR) in the Department of Energy’s (DOE) Office of Science for the project “Reliable and Efficient Machine Learning for Leadership Facility Scientific Data Analytics.”

The awarded project is a collaborative effort with Oak Ridge National Laboratory. It will develop standardized reference points for machine learning to help analyze and interpret scientific data consistently.

“Scientific data are very different from business data in feature and complexity and [require] more mathematically rigorous machine learning methods to analyze them,” said Cao. “This DOE award will allow us to research new methods to efficiently interpret information, extract features, and infer physical laws from scientific data that are relevant to DOE’s mission,” continued Cao.

The Office of Advanced Scientific Computing Research in the DOE’s Office of Science selected this research project for one of just five awards under ASCR’s Data-Intensive Scientific Machine Learning and Analysis program.

"We propose to establish mathematical foundations of the prioritized scientific machine learning (SciML) methods for extracting interpretable information from scientific data, inferring physical laws, and steering experiments toward scientific discovery. The US Department of Energy's (DOE) scientific user facilities generate a deluge of dynamic experimental data at a rapid velocity on a daily basis. However, our ability to extract interpretable information from the massive dynamic data is far behind our ability to generate the data. The advances in machine learning have had revolutionary effects on large-scale data analytics in the business world, but it is challenging to transfer a successful machine learning method for commercial use to an effective SciML method for scientific use. Thus, a new class of mathematically rigorous, computationally efficient, and reliable SciML methods is required for real-time scientific data analytics to expedite the pace of scientific discovery.

Scientific data analytics is deeply embedded in the scientific discovery process that involves (1) extracting interpretable features from raw experimental data, (2) inferring unknown physics (i.e., unmasking hidden dynamics), and (3) designing and steering a series of experiments to achieve a scientific goal. Despite many promising efforts in these directions, the progress of mathematical analysis is far behind SciML algorithm development. Moreover, scientific data analytics requires rigorous uncertainty quantification (UQ) to ensure that we are obtaining the right answer for the right reason and that the methods are sufficiently robust to be deployed at the user facilities. To address these challenges, this project will focus not only on developing novel SciML methods that can be practically used to analyze massive scientific data, but also on establishing mathematical analysis of these methods. Our research objectives include the following: (1) Develop reliable and efficient feature extraction methods for both high-frequency high-resolution dynamic data and high-dimensional functional data collected at DOE's user facilities; (2) Develop mathematical foundations for neural network-based dynamics discovery models and stochastic back-propagation algorithms for training the neural network models; (3) Develop goal-oriented data assimilation methods for dynamic experimental design, i.e., optimally designing and steering a series of experiments to achieve a desired scientific goal.

To motivate, illustrate, and evaluate our new methodologies, we will apply them to neutron scattering data generated at the Spallation Neutron Source (SNS) and High Flux Isotope Reactor (HFIR) facilities, and in situ scanning transmission electron microscopy (STEM) data generated at the Center for Nanophase Materials Sciences (CNMS). The purpose of choosing these datasets is not only to demonstrate how the proposed SciML algorithms and mathematical analysis can help address current, urgent needs for advanced data analytics at DOE's user facilities, but also to show the critical role of the proposed research in establishing self-driving user facilities in the near future."