Building bridges between Data Science and Geoscience
I strongly believe in the importance of building a research
community that bridges the fields of data science and
geoscience, to ensure that all new developments in data
science make it into the geosciences as quickly as possible.
Given the rapidly increasing amount of observational and model
data in the geosciences, and the fact that many of our
society's biggest problems are related to climate change, it
is crucial that we employ all suitable data analysis tools to
help geoscientists answer as many of their science questions
as possible from the data.
While community-building work is not very glamorous, it is
important, and working with researchers from different
disciplines is rewarding and invigorating. I learn something new almost
To that end I have been heavily involved in the following
1) CLIMATE INFORMATICS COMMUNITY:
2013: Workshop co-organizer/Steering committee member of the
annual Climate Informatics workshop ( CI2017, CI2016,
2014: Created and maintain the website for the Climate
Informatics community: Climateinformatics.org
- A community-driven website for all things related to
2016: Steering committee member of the newly NSF-funded
Research Collaboration Network EarthCube RCN IS-GEO: Intelligent
Systems Research to Support Geosciences, see
2017: Co-lead of working group on data set benchmark
development for IS-GEO RCN (with David R. Thompson at JPL).
research on causal discovery, primarily applied to geoscience
applications. For more information, see below.
invited talks (many with video recordings):
- Feb 2015:
The Potential of Causal Discovery Methods in
Climate Science, NCAR CISL presentation,
National Center for Atmospheric Research, Boulder, CO.
recording of the talk.
at the Workshop
Selected research links:
Data sets for causal discovery: http://www.engr.colostate.edu/~iebert/DATA_SETS_CAUSAL_DISCOVERY/
This page contains several data sets for comparison of
different algorithms for causal discovery from spatio-temporal
- Anything related to climate informatics.
A page with extensions I developed for existing Bayesian
2014: First 3D results for graphs of information flow:
We developed a high-efficiency implementation of the PC and
PC-stable algorithms in C that can handle many more variables
than the standard implementations in Java, Matlab and R.
Using it, we can now derive graphs of information flow of our
planet's climate in 3D, i.e. including several
atmospheric height layers at a time. For first figures,
see this tech report: CSU Tech report,
Sept 3, 2014.
My current Research Interests
consist of the following, partially overlapping, topics:
1. Causal discovery: finding
causal relationships in a system from data
2. Climate Informatics: discovering new
knowledge from climate data
3. Bayesian networks and other graphical
models: any applications of Bayesian networks in
science and engineering (for causal discovery and other
1. What is Causal discovery?
Causal discovery seeks to recover causal relationships between
variables in a system based on observational data. Many
approaches use Bayesian Networks or Markov Networks (my
preferred methods). Other methods exist, e.g. based on
Granger causality. Currently I use these methods
primarily in the context of climate data, because there is
still so much about climate that we do not yet understand
while there is an increasing amount of climate data available
- making this an ideal application of causal discovery.
The process of causal discovery can never prove any
causal relationships with certainty, primarily because there
can always be unknown common causes. However, we can
come up with a set of most likely causal hypotheses
based on data that we can then turn back over to the domain
expert for consideration. Thus causal discovery is
always a process that involves both domain experts and AI
experts, working together.
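The role of common causes can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the implementation used in this research: it invents a hypothetical system in which a variable Z drives both X and Y, assumes linear relationships with Gaussian noise, and uses partial correlation as the conditional independence test (the kind of test that constraint-based algorithms such as PC rely on). All function names and coefficients are made up for the example.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate_common_cause(n=5000, seed=0):
    """Hypothetical system: Z drives both X and Y; X and Y never interact."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
    y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
    return x, y, z


x, y, z = simulate_common_cause()
r_marginal = pearson(x, y)         # strong, although X and Y are not causally linked
r_given_z = partial_corr(x, y, z)  # close to zero once Z is accounted for
print(f"corr(X, Y)     = {r_marginal:.3f}")
print(f"corr(X, Y | Z) = {r_given_z:.3f}")
```

If Z had not been measured, only the strong correlation between X and Y would be visible, and a direct link between them would look plausible. This is why the output of causal discovery is best treated as a set of hypotheses for the domain expert to examine.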
Causal Discovery in recent news:
- 2011: Judea Pearl is awarded the ACM Turing Award
The ACM A.M. Turing Award is an annual prize given by
the Association for Computing Machinery (ACM) to "an individual
selected for contributions of a technical nature made to the
computing community". It is stipulated that "The contributions
should be of lasting and major technical importance to the
computer field". The Turing Award is recognized as the "highest
distinction in Computer science" and "Nobel Prize of computing".
(cited from wikipedia)
Judea Pearl was awarded the 2011 ACM Turing Award for fundamental
contributions to artificial intelligence through the development
of a calculus for probabilistic and causal reasoning.
Press release: Pearl - Turing Award 2011
- 2011: Nobel Prize in Economic Sciences goes to
Sargent and Sims
The Nobel Prize in Economics in 2011 was awarded to
Thomas J. Sargent and Christopher A. Sims for their work on cause
and effect in the macroeconomy.
Press release: Sargent
and Sims - Nobel Economics 2011
2. What is Climate Informatics?
Climate informatics seeks to apply techniques from machine learning and
artificial intelligence (AI) to discover new knowledge from
climate data. There are many other names for this
discipline, such as knowledge discovery in climate.
The term Climate Informatics emphasizes the fact that we can view
the climate as a system that produces information and then analyze
that information to learn about the system. This viewpoint
allows us to use observational climate data and apply algorithms
from information science that have not yet been widely applied in
climate science. Of course, viewing climate as an information
system is not a completely new approach. Statistical methods -
and to a lesser extent some machine learning methods - have been
used to analyze climate for quite some time. What is new is
the explicit call for more members from the computer
science/mathematics/economics community to collaborate with climate
scientists and to apply a much greater variety of machine learning
algorithms to climate science.
Some helpful pages on this new initiative:
My personal interest within climate informatics is currently to
apply causal discovery to find new causal hypotheses from climate
data. For example, we can define climate networks (a
connected grid around the globe) based on graphical models to track
information flow around the globe. We can also try to find
causal relationships between different modes (such as WPO, EPO, PNA
and NAO). Most of my current work deals with temporal models,
i.e. we also try to identify the time frames in which variables may
"cause" each other. (See my publications.)
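To give a feel for what identifying a time frame means, here is a minimal sketch under strong assumptions. It is not the graphical model approach used in the publications: it only scans lagged correlations between two synthetic daily series, where the hypothetical series b is constructed to follow series a after 3 days. The series, the coefficients and the function names are invented for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_corr(driver, response, lag):
    """Correlate driver[t] with response[t + lag], for lag >= 0."""
    if lag == 0:
        return pearson(driver, response)
    return pearson(driver[:-lag], response[lag:])


def simulate(n=2000, true_lag=3, seed=1):
    """Synthetic daily series: b[t] depends on a[t - true_lag] plus noise."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = []
    for t in range(n):
        signal = 0.8 * a[t - true_lag] if t >= true_lag else 0.0
        b.append(signal + 0.3 * rng.gauss(0, 1))
    return a, b


a, b = simulate()
scores = {lag: lagged_corr(a, b, lag) for lag in range(8)}
best_lag = max(scores, key=lambda lag: abs(scores[lag]))
for lag, r in scores.items():
    print(f"lag {lag} days: r = {r:+.3f}")
print("strongest dependence at lag", best_lag)
```

Real climate indices are autocorrelated and influenced by many other variables, so a lag scan like this can only suggest candidate delays. Deciding which lagged links survive once other variables are conditioned on is the part that a temporal graphical model addresses.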
For example, the figure below shows likely causal hypotheses we
found for the interaction between modes WPO, EPO, PNA and NAO.
Arrows indicate the direction of information flow and the numbers
next to the arrows indicate time delays in days (Ebert-Uphoff
and Deng, 2012a).
The figure below shows plots of climate networks we generated using
geopotential height data around the globe, using a graphical model
approach to calculate the links (Ebert-Uphoff
and Deng, 2012b). The individual plots below show the
links that connect geographical locations over a time span of 0, 1, 2
and 3 days.
3. What are Bayesian Networks?
Bayesian Networks (BNs) are a tool for modeling systems containing
uncertainty. They have gained much popularity in recent
years. BNs use tools from probability theory (primarily
Bayes' theorem, which gave them their name) to solve various tasks
in the areas of data mining and artificial intelligence. They have been
used for applications ranging from the natural sciences (e.g.
meteorology, volcanic eruptions, water management), medicine (e.g.
cancer diagnosis), computer science (e.g. spam filtering, image processing,
text mining) and engineering (e.g. reliability analysis, printer
diagnosis, electric load forecasting) to computational biology (e.g.
gene/protein interactions, cellular biology).
For some very accessible introductions to Bayesian networks, see
Bayesian networks can be used for a number of purposes:
(1) to generate predictions,
(2) for diagnosis,
(3) as a policy and decision-making tool, and
(4) for causal discovery.
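The first two purposes can be shown with the smallest possible network. The sketch below assumes a hypothetical two-node network, Condition -> Observation, with made-up probabilities. Prediction sums the joint distribution over the unobserved node, and diagnosis applies Bayes' theorem to reason from the observation back to the condition. Real applications involve many more nodes and typically a dedicated inference library.

```python
# Hypothetical two-node Bayesian network: Condition -> Observation.
# All probabilities are made-up illustration values.
P_CONDITION = {True: 0.01, False: 0.99}            # prior P(condition)
P_OBS_GIVEN_CONDITION = {True: 0.90, False: 0.05}  # P(obs=True | condition)


def joint(condition, obs):
    """P(condition, obs), factorized as P(condition) * P(obs | condition)."""
    p_obs = P_OBS_GIVEN_CONDITION[condition]
    return P_CONDITION[condition] * (p_obs if obs else 1.0 - p_obs)


def predict_observation():
    """Prediction: P(obs=True), summing out the unobserved condition."""
    return sum(joint(c, True) for c in (True, False))


def diagnose(obs=True):
    """Diagnosis: P(condition=True | obs), via Bayes' theorem."""
    evidence = sum(joint(c, obs) for c in (True, False))
    return joint(True, obs) / evidence


p_obs = predict_observation()
p_cond = diagnose(True)
print(f"P(observation)             = {p_obs:.4f}")
print(f"P(condition | observation) = {p_cond:.4f}")
```

With these numbers the condition remains unlikely even after a positive observation, because its prior probability is low. A Bayesian network makes this kind of reasoning explicit and lets it scale to many interacting variables.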
My goal is to extend the use of Bayesian networks for applications
in science and engineering.
My past research interests
I used to work in the area of robotics
(from 1993 to 2006), primarily in the area of theoretical
Past topics include (see publications):
- Parallel manipulators
- Cable-driven robots
- Connections between cable-driven robots and grasping
- Discretely-actuated manipulators - design, workspace
generation and inverse kinematics
- Workspace generation methods (discrete workspace mapping
algorithm, convolution approach)
- A device for active acceleration compensation
- Digital Clay - a device for haptic shape display
- Assembly sequence generation
To take Causal Discovery where no Causal Discovery
algorithm has gone before.
Last updated: April 2017.