NASA Earth Exchange (NEX): Big Data Challenges, High-Performance Computing, and Machine Learning Innovations
BIDS Data Science Lecture Series | September 25, 2015 | 1:00-2:30 p.m. | 190 Doe Library, UC Berkeley
Speaker: Sangram Ganguly, Senior Research Scientist, NASA
Sponsors: Berkeley Institute for Data Science, Data, Society and Inference Seminar
NASA Earth Exchange (NEX) provides a unique collaborative platform for scientists and researchers around the world to conduct research in a scientifically complex area. NEX provides customized open source tools, scientific workflows, access to petabytes of satellite and climate data, models, and computing power. Over the past three years, NEX has evolved to handle projects that involve data complexity, model integration, and high-performance computing. Another unique aspect of NEX is its collaboration with Amazon Web Services (AWS) to create the OpenNEX platform, which leverages the full stack of AWS’s cloud computing platform to demonstrate scientifically relevant projects for government agencies, commercial companies, and other stakeholders. OpenNEX provides access to a wide variety of data through AWS’s public datasets program, along with virtual machines that replicate a given workflow, capturing data access, search, analysis, computation, and visualization. OpenNEX collaborated with Berkeley’s Geospatial Innovation Facility (GIF) to create an open source dashboard for visualizing the downscaled climate projections dataset. A pressing need in both initiatives is to handle large image datasets and analyze them efficiently using high-performance and cloud computing infrastructures. With funding from several NASA program elements (e.g., AIST, ACCESS, CMS), NEX has showcased activities in which new machine learning algorithms can be deployed and scaled across these computing architectures to process very high-resolution imagery datasets for object classification, segmentation, and feature extraction. One example is the processing of a quarter million image scenes from the 1-m multispectral NAIP dataset to estimate tree cover for the continental United States, given the considerable complexity and heterogeneity of land cover types.
New computational techniques using open source tools and cloud architectures are essential to achieving performance efficiency in some heritage scientific research domains and analyses.