Angels and Dragons: On Handling Systematic Error
At the end of the Chandra Calibration Workshop on Nov 1, 2005, a
special session was held that focused on calibration
uncertainties and their effect on data analysis. It was clear
during the session that astronomers have a love/hate
relationship with systematic errors. As Steve Snowden put it,
"Here there may be dragons," and Dick Edgar quoth, "Systematic
Errors are like angels." All the presentations are available
on-line at http://cxc.harvard.edu/ccw/proceedings/05_proc/ - not only will you find more details there, including plots of the data and compelling expositions of instrumental effects, but also a rewrite of a Far Side cartoon that is destined to be a classic.
Context: What is Systematic Uncertainty?
Dragons
Astronomers and physicists are in a special position as far as
what we call "systematic uncertainties", or calibration errors,
go. To an astrophysicist, these are errors inherent in modeling
the instrument (telescope and subsystems). They are not the
same as "bad measurements" such as cosmic ray hits, dead pixels,
and other physical outliers. They affect accuracy. Left
uncharacterized, these errors can produce systematically biased results
as well as smearing. Hence Steve Snowden's description of unmapped
problems: Here There Might Be Dragons.
Angels
On the other hand, compared to other fields, we can
specify our sources of error to an extraordinary degree.
Elsewhere (e.g., social sciences, or economics [Ed: aren’t they the same?]), one models
or fits a non-specific extra variance along with the
scientific theory. In contrast, we measure them ahead of
time, each in specific detail. Once found, we investigate
their physical cause. Though each case is different, we have
a rough idea of the unique distribution of each, however
complex. Thus Dick Edgar's aphorism: "Systematic errors
are like angels. They are not a single breed; they are
individually created." However, we have not yet developed
techniques that incorporate that information in the
analysis.
Generalizing
Statisticians use the term
"Model Mismatch" where astronomers and physicists use "systematic
uncertainties". One sees that these broad terms
incorporate uncertainties ranging from calibrating our
instruments (e.g., HETG energy scale, HRMA effective area, ACIS
response) and backgrounds, to calibrating our underlying
physics (e.g., atomic lines database). All these topics are
represented here. Whenever there
is a model-mismatch that affects one’s astrophysical
inference, there is a bias in the inference. Once it is
mapped, it becomes known physics or well-understood
instrumental features.
Work
What effect does our model
mismatch have on our final results? How do we incorporate it in a
rigorous but sensible (i.e. reasonably fast and uniform)
way? In the past, researchers have often assumed these
uncertainties could be added in quadrature as if they were
uncorrelated Gauss-Normal errors. However, calibration
errors are often correlated, non-Gaussian, and asymmetric,
and many speakers (and statisticians) emphasized that
Gaussian assumptions are incorrect.
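For the simplest case, a small numpy sketch (with invented numbers, not figures from any of the talks) shows how much the answer can change when a shared calibration error is propagated with its correlations instead of simply being added in quadrature:

```python
import numpy as np

# Toy illustration: total uncertainty on the sum of three band fluxes.
# All numbers are invented for the example.
stat = np.array([0.04, 0.03, 0.05])     # statistical errors per band
syserr = np.array([0.10, 0.10, 0.10])   # calibration (systematic) errors per band

# Common shortcut: add everything in quadrature, i.e. assume all terms are
# independent Gauss-Normal errors.
quad = np.sqrt(np.sum(stat**2 + syserr**2))

# If the calibration term is fully correlated across bands (e.g. a common
# effective-area scale factor), propagate through a covariance matrix instead.
cov = np.diag(stat**2) + np.outer(syserr, syserr)   # rank-1 correlated systematic
w = np.ones_like(stat)                              # weights for a plain sum
corr = np.sqrt(w @ cov @ w)

print(f"quadrature (independent): {quad:.3f}")
print(f"fully correlated:         {corr:.3f}")
# The correlated case is substantially larger; quadrature understates it.
```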
Here Be Dragons
The Soft X-Ray Background
Brad Wargelin and Steve
Snowden emphasized the effects of the astrophysical X-ray
background. Brad’s main point: It varies! It varies in
time and space. It varies with solar cycle. There is a
geocoronal soft X-ray glow, which is usually weak, but can
have bright "Solar Gusts". It varies due to charge
exchange, from solar wind ions colliding with neutral gas
in the heliosphere and the Earth’s outer atmosphere.
The charge exchange energy spectrum is all below ~1.5 keV,
with most of the X-ray emission from He-like and H-like O.
Its spatial distribution is complex, with the highest
intensities observed when looking near the Sun and/or
toward the Galactic Center through the helium focusing
cone. Users should know these places: "Here there might
be dragons!" It particularly affects measurements of
extended sources: SXRB, clusters, etc.
High Angular Resolution Imaging
Although the Chandra Point Spread Function (PSF) is
encouragingly well-known on axis, and near the peak,
don't forget the "dragons" in the wings! The PSF
varies tremendously as a function of off-axis angle and
photon wavelength (energy). Margarita Karovska showed
many examples of the Chandra PSFs and described the variations
in them. There are subtle differences in the measured image
for sources at different distances from the image center;
for example, could it be a double source, or a single point
source with the PSF "splitting" it into an apparent
double? How much of the emission is due to extended
sources, and how much to the wings of the PSF?
A scary thought: we know our
knowledge of the PSF is incomplete, especially in the wings. So by
what procedure does the researcher decide what to
publish?
An
aside: the various deconvolution methods in common use
(Richardson-Lucy,
simple Maximum Entropy, the new multiscale EMC2) can all
be understood as 'forward fitting' with
flexible non-parametric or semi-parametric models
(i.e., based on the likelihood of these kinds of
models of the sky convolved with the instrument response). So
such methods have a wealth of statistical frameworks behind
them. Richardson-Lucy, essentially an EM algorithm, uses
a model for the sky consisting of independent pixels.
This can work nicely for clusters of point sources, but
not as well for smooth diffuse emission, as it effectively
models only one scale (pixel size). The EMC2 method
(which does not yet include PSF variations) incorporates a
Haar-wavelet-like multiscale model, which essentially can
add some smoothing. It is a better choice for modeling
diffuse emission.
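As a concrete (and purely illustrative) example of the 'forward fitting' view, here is a minimal 1-D Richardson-Lucy iteration in numpy; the PSF, source, and noise level are all made up, and real implementations add stopping criteria, backgrounds, and 2-D responses:

```python
import numpy as np

def richardson_lucy(data, psf, n_iter=50):
    """Minimal 1-D Richardson-Lucy deconvolution (EM with independent sky pixels)."""
    psf = psf / psf.sum()
    psf_flipped = psf[::-1]
    estimate = np.full_like(data, data.mean())
    for _ in range(n_iter):
        # Forward fold the current sky model through the instrument response.
        model = np.convolve(estimate, psf, mode="same")
        # Multiplicative EM update: back-project the data/model ratio.
        ratio = data / np.clip(model, 1e-12, None)
        estimate = estimate * np.convolve(ratio, psf_flipped, mode="same")
    return estimate

# Made-up test case: two point sources blurred by a Gaussian PSF, plus noise.
rng = np.random.default_rng(1)
truth = np.zeros(200)
truth[[70, 90]] = [500.0, 300.0]
x = np.arange(-15, 16)
psf = np.exp(-0.5 * (x / 3.0) ** 2)
blurred = np.convolve(truth, psf / psf.sum(), mode="same")
data = rng.poisson(blurred + 2.0).astype(float)
recovered = richardson_lucy(data, psf)
```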
Taming the Dragon
Parametric Estimation Analysis
Jeremy Drake looked at how to
get a handle on correlated uncertainties that arise from the
accumulated calibration errors from various components of
the subassembly instrument modules that contribute to the
instrument effective area (ARF). He obtained a large
sample of these ARFs (specifically, for the ACIS-I) and, by
carrying out model fits to the same dataset with each of
these ARFs, derived an estimate of the stability of the
model parameters against the calibration error. Variations in
the ARF can be seen in systematic shifts in inferred
values of astrophysically interesting model
parameters. Though often small, one sometimes sees
"surprising pathologies".
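Schematically, the brute-force procedure looks something like the toy sketch below; the spectral model, ARF shape, and perturbations are all invented here, whereas a real analysis would refit the data in Sherpa or XSPEC using the actual simulated ARFs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a power-law spectrum folded through an invented effective-area curve.
energies = np.linspace(0.5, 8.0, 300)                              # keV
true_arf = 400.0 * np.exp(-0.5 * ((energies - 1.5) / 2.0) ** 2)    # cm^2, made up
counts = rng.poisson(5.0 * true_arf * energies ** -1.8).astype(float)

def fit_gamma(counts, arf, grid=np.linspace(1.0, 3.0, 401)):
    """Toy one-parameter fit standing in for a full spectral fit: find the
    photon index minimizing the Cash statistic, with the normalization
    profiled out analytically."""
    best_gamma, best_stat = grid[0], np.inf
    for gamma in grid:
        shape = arf * energies ** -gamma
        model = np.clip(counts.sum() / shape.sum() * shape, 1e-12, None)
        stat = 2.0 * np.sum(model - counts * np.log(model))
        if stat < best_stat:
            best_gamma, best_stat = gamma, stat
    return best_gamma

# Stand-in for the sample of calibration-perturbed ARFs: random ~5% tilts.
arfs = [true_arf * (1.0 + 0.05 * rng.standard_normal() * (energies - 3.0) / 3.0)
        for _ in range(200)]
gammas = np.array([fit_gamma(counts, arf) for arf in arfs])
print(f"ARF-induced spread in photon index: {gammas.std():.3f}")
```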
Fitting the same source
spectrum hundreds or thousands of times with differing ARFs is,
however, impractical for the average user. Hence a
'compressed', more elegant and efficient
representation of these variations is called
for.
Dimensionality Reduction
This is the simple-sounding title that Rima Izem (a
statistician) gave to an elegantly simple method to
'encode' the correlated calibration uncertainties in
order to be able to incorporate them into practical fits.
Thousands of simulated curves, each with ~1000 energy
bins, are too much to handle directly. However, continuous
functions (such as the simulated ACIS ARFs used by Jeremy)
are nicely expressed in terms of basis functions, namely the
principal components of the functional variations.
Astrophysicists are probably more familiar with looking for
principal components (that is, directions of maximum
variation, or eigenvectors) as directions in a
point-cloud. There is an analogy for function space:
consider each energy bin of the ACIS ARF curves as a
separate dimension, and perform the PCA analysis
there. Mapping back to function-space gives curves
reflecting the maximum variations in the data. Their
associated eigenvalues describe the amount of variation
each curve is responsible for. The compression achieved
with this method is phenomenal. The ~1000 simulated
curves describing the complex, correlated variations of the
ACIS ARF could be described almost entirely with ~5
components.
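In numpy terms the idea looks roughly like the sketch below; the ensemble of simulated ARF curves is faked here for illustration, whereas the real input would be the calibration-perturbed ARFs described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake ensemble: 1000 simulated ARF curves on a common 1024-bin energy grid,
# whose curve-to-curve variation is (by construction) low-dimensional.
n_sim, n_energy = 1000, 1024
mean_shape = np.linspace(100.0, 600.0, n_energy)
arfs = mean_shape + rng.standard_normal((n_sim, 3)) @ rng.standard_normal((3, n_energy)) * 5.0

mean_arf = arfs.mean(axis=0)
residuals = arfs - mean_arf

# PCA via SVD: the rows of vt are the principal-component "curves" in energy
# space, and s**2 gives the variance each one accounts for.
u, s, vt = np.linalg.svd(residuals, full_matrices=False)
explained = s**2 / np.sum(s**2)
n_keep = int(np.searchsorted(np.cumsum(explained), 0.99)) + 1
print(f"{n_keep} components capture 99% of the curve-to-curve variation")

# Each simulated ARF is then just the mean plus a few coefficients times the
# leading component curves -- the 'compressed' representation.
coeffs = residuals @ vt[:n_keep].T
compressed = mean_arf + coeffs @ vt[:n_keep]
```

The mean curve, the handful of component curves, and the spread of their coefficients are all one needs to carry forward.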
With this elegant and compact representation, one can
reasonably include
calibration uncertainties in one’s analyses in a
practical way. It still requires a robust method such as
Markov-Chain Monte Carlo to search the parameter-space to
do the fit. It is also
generalizable to higher
dimensions, such as the instrument energy response
function (RMF) and/or PSF.
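One way to use such a representation in practice (a sketch of the general idea, not a prescription from the talks) is to treat the component coefficients as nuisance parameters, drawing a fresh ARF realization each time the likelihood is evaluated inside the MCMC:

```python
import numpy as np

def draw_arf(mean_arf, components, coeff_sigmas, rng):
    """Draw one effective-area realization from the compressed representation.

    mean_arf:     (n_energy,) mean ARF curve
    components:   (k, n_energy) leading principal-component curves
    coeff_sigmas: (k,) standard deviations of the component coefficients
    """
    coeffs = rng.standard_normal(len(coeff_sigmas)) * coeff_sigmas
    return mean_arf + coeffs @ components

# Illustration with invented inputs; in a real fit these would come from the
# PCA of the simulated ARF ensemble.
rng = np.random.default_rng(3)
mean_arf = np.linspace(100.0, 600.0, 1024)
components = rng.standard_normal((5, 1024)) * 0.02
coeff_sigmas = np.array([5.0, 2.0, 1.0, 0.5, 0.2])
perturbed_arf = draw_arf(mean_arf, components, coeff_sigmas, rng)
# ...fold the source model through perturbed_arf before evaluating the likelihood.
```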
Here Be Angels
Some dragons have been
slain. Or at least tamed. Or rather,
doused. Temporarily.
Pixel Randomization
The practice of pixel
randomization, where a uniform random deviate is added to the photon
detector positions in order to reduce residual
aspect-related errors and improve the astrometry, came in
for much criticism. Herman Marshall argued that while it
may be a good practice for diffuse or large sources, it is
a bad idea for point sources. It is easy to show
graphically that randomization ends up broadening the PSF
(not a surprise).
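The broadening is easy to quantify with a short simulation (invented numbers, not Marshall's actual figures): adding a uniform deviate of plus or minus half a pixel to already-blurred event positions inflates the measured width.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical point source: 1-D event positions with a 0.4-pixel Gaussian
# core (a made-up, roughly on-axis-like width, in detector pixels).
positions = rng.normal(loc=0.0, scale=0.4, size=100_000)

# Pixel randomization: a uniform deviate over one pixel added to every event.
randomized = positions + rng.uniform(-0.5, 0.5, size=positions.size)

print(f"core width before randomization: {positions.std():.3f} pixels")
print(f"core width after randomization:  {randomized.std():.3f} pixels")
# Expect sqrt(0.4**2 + 1/12) ~ 0.49 pixels, i.e. roughly a 20-25% broadening.
```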
The Advantages of User Collaboration With the Calibration Team
Randall Smith appealed to
the users to "please work with the calibration
team!" First check whether you can make suitable
corrections for your data, rather than wait for the
calibration team to notice the problem and come up with
a fix. After all, users would know what to look for,
because they are familiar with their data. Randall
pointed to some specific cases where user collaboration
has significantly improved the magnitude and scope of
calibration efforts: micro-roughness affecting the
off-axis PSF, pixel randomization in grating data, the
ACIS chip contamination problem, etc. "Users, please
work with the calibration team! It will get done faster
and better; and be a service to the
community."
Atomic Data Uncertainties
Strictly speaking, this falls
more in the realm of model uncertainties than calibration
uncertainties, but the effects are similar. Nancy
Brickhouse pointed out that only a few laboratory
measurements of astrophysically interesting lines exist,
and there are so many lines that theoretical estimates are
unavoidable. In addition, there are lines missing from
atomic database compilations, including missing satellite
lines, as well as missing processes that contribute to the
intensity of a line. So there are a large number of
unknown systematic uncertainties that are not random, and for which errors
should not be added in quadrature.
Systematic model-mismatch errors are likely to be more
important than statistical errors in these cases, and
sensitivity testing is essential. The ATOMDB database
(http://cxc.harvard.edu/atomdb/) used by APEC includes errors on individual lines, so there is now an opportunity to test out different approaches.
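As an example of the kind of sensitivity test that becomes possible (a toy Monte Carlo sketch; the emissivities and fractional errors below are invented, not values from ATOMDB):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy diagnostic: a temperature-sensitive ratio of two line emissivities, each
# carrying a fractional uncertainty of the kind a database might quote.
emis_a, frac_err_a = 3.2e-17, 0.10   # invented emissivity and 10% error
emis_b, frac_err_b = 1.1e-17, 0.30   # invented emissivity and 30% error

# Perturb each emissivity within its quoted error (lognormal keeps them
# positive) and propagate to the derived line ratio.
n = 10_000
a = emis_a * rng.lognormal(mean=0.0, sigma=frac_err_a, size=n)
b = emis_b * rng.lognormal(mean=0.0, sigma=frac_err_b, size=n)
ratio = a / b

print(f"line ratio: {ratio.mean():.2f} +/- {ratio.std():.2f}")
# If this spread rivals the statistical error on the measured ratio, the
# atomic-data uncertainty, not the counting noise, limits the inference.
```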
Propagation of Background Uncertainties into Cluster Temperature
A cluster is a large diffuse source, so typical background estimation methods (e.g., taking a small annular aperture around the source) are not useful. Maxim Markevitch deals with this problem by deriving a robust model for the background. Starting with the simplest case for understanding the background, i.e., acquiring the quiescent detector background when the detector is covered, he finds that the background is constant. Next, examining the quiescent blank-sky backgrounds at E > 2 keV, he finds that it is simple enough, in this case, to add the E > 2 keV background uncertainty in quadrature. (For details on how to use the ACIS "Blank-Sky" background files, see the CIAO thread on the subject, at http://cxc.harvard.edu/ciao/threads/acisbackground/.) This can be done similarly for XMM-Newton, though its backgrounds are less predictable, usually due to flares.
Missing Lines, Calibration Errors, or What?
Oops! Jürgen Schmitt found dramatic differences between the plus and minus orders in the MEG grating spectrum of the bright flare star AU Mic. This was easily explained (after considerable work) as simply statistical fluctuations at play. His advice: "Always go back to the raw data."
We Are All In This Together
As Herman Marshall emphasized, we must develop an "Experience Database," with both the calibration team and the users providing their input. One needs both physics-based models and semi-parametric phenomenological models; the former give a better understanding of the source of the errors but, unlike the latter, are usually more computationally complex. Andy Pollock reiterated that there is a need for some standards in how to report and handle these uncertainties. We should run many more simulations as a regular part of our data analysis, and include our understanding of calibration errors in the results we report. There is also a strong need to develop techniques to include calibration uncertainties in the data analysis when a calibration team provides us with such information.
In summary, a list of take-home points:
1. Interactions with users are very important in the identification
and accounting of systematic errors.
2. There is still a large gap between "theory" and implementation.
When is "sensitivity testing" (where we know we don't know)
the best we can do? How/when can we do better?
3. Each kind of calibration uncertainty or model mis-match
requires careful examination.
4. Even when we (mostly) understand the physics behind the
calibration uncertainties, it is still a significant effort
to implement this information in a way that is practical.
5. In particular, how does the community make standards for
communicating, working with, and updating calibration
uncertainties?
Alanna Connors, Vinay Kashyap, and Aneta Siemiginowska