Angels and Dragons: On Handling Systematic Error
At the end of the Chandra Calibration Workshop on Nov 1, 2005, a
special session was held that focused on calibration
uncertainties and their effect on data analysis. It was clear
during the session that astronomers have a love/hate
relationship with systematic errors. As Steve Snowden put it,
"Here there may be dragons," and Dick Edgar quoth, "Systematic
Errors are like angels." All the presentations are available
on-line at http://cxc.harvard.edu/ccw/proceedings/05_proc/ - not only will you find more details there, including plots of the data and compelling expositions of instrumental effects, but also a rewrite of a Far Side cartoon that is destined to be a classic.
Context: What is Systematic Uncertainty?
Dragons
Astronomers and physicists are in a special position as far as
what we call "systematic uncertainties", or calibration errors,
go. To an astrophysicist, these are errors inherent in modeling
the instrument (telescope and subsystems). They are not the
same as "bad measurements" such as cosmic ray hits, dead pixels,
and other physical outliers. They affect accuracy. Left
uncharacterized, these errors can produce systematically biased results
as well as smearing. Hence Steve Snowden's description of unmapped
problems: Here There Might Be Dragons.
Angels
On the other hand, compared to other fields, we can
specify our sources of error to an extraordinary degree.
Elsewhere (e.g., social sciences, or economics [Ed: aren’t they the same?]), one models
or fits a non-specific extra variance along with the
scientific theory. In contrast, we measure them ahead of
time, each in specific detail. Once found, we investigate
their physical cause. Though each case is different, we have
a rough idea of the unique distribution of each, however
complex. Thus Dick Edgar's aphorism: "Systematic errors
are like angels. They are not a single breed; they are
individually created." However, we have not yet developed
techniques that incorporate that information in the
analysis.
Generalizing
Statisticians use the term
"Model Mismatch" where astronomers and physicists use "systematic
uncertainties". One sees that these broad terms
incorporate uncertainties ranging from calibrating our
instruments (e.g., HETG energy scale, HRMA effective area, ACIS
response) and backgrounds, to calibrating our underlying
physics (e.g., atomic lines database). All these topics are
represented here. Whenever there
is a model-mismatch that affects one’s astrophysical
inference, there is a bias in the inference. Once it is
mapped, it becomes known physics or well-understood
instrumental features.
Work
What effect does our model
mismatch have on our final results? How do we incorporate it in a
rigorous but sensible (i.e. reasonably fast and uniform)
way? In the past, researchers have often assumed these
uncertainties could be added in quadrature as if they were
uncorrelated Gauss-Normal errors. However, calibration
errors are often correlated, non-Gaussian, and asymmetric,
and many speakers (and statisticians) emphasized that
Gaussian assumptions are incorrect.
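For the simplest case, a small numpy sketch (with invented numbers, not figures from any of the talks) shows how much the answer can change when a shared calibration error is propagated with its correlations instead of simply being added in quadrature:

```python
import numpy as np

# Toy illustration: total uncertainty on the sum of three band fluxes.
# All numbers are invented for the example.
stat = np.array([0.04, 0.03, 0.05])     # statistical errors per band
syserr = np.array([0.10, 0.10, 0.10])   # calibration (systematic) errors per band

# Common shortcut: add everything in quadrature, i.e. assume all terms are
# independent Gauss-Normal errors.
quad = np.sqrt(np.sum(stat**2 + syserr**2))

# If the calibration term is fully correlated across bands (e.g. a common
# effective-area scale factor), propagate through a covariance matrix instead.
cov = np.diag(stat**2) + np.outer(syserr, syserr)   # rank-1 correlated systematic
w = np.ones_like(stat)                              # weights for a plain sum
corr = np.sqrt(w @ cov @ w)

print(f"quadrature (independent): {quad:.3f}")
print(f"fully correlated:         {corr:.3f}")
# The correlated case is substantially larger; quadrature understates it.
```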
Here Be Dragons
The Soft X-Ray Background
Brad Wargelin and Steve
Snowden emphasized the effects of the astrophysical X-ray
background. Brad’s main point: It varies! It varies in
time and space. It varies with solar cycle. There is a
geocoronal soft X-ray glow, which is usually weak, but can
have bright "Solar Gusts". It varies due to charge
exchange, from solar wind ions colliding with neutral gas
in the heliosphere and the Earth’s outer atmosphere.
The charge exchange energy spectrum is all below ~1.5 keV,
with most of the X-ray emission from He-like and H-like O.
Its spatial distribution is complex, with the highest
intensities observed when looking near the Sun and/or
toward the Galactic Center through the helium focusing
cone. Users should know these places: "Here there might
be dragons!" It particularly affects measurements of
extended sources: SXRB, clusters, etc.
High Angular Resolution Imaging
Although the Chandra Point Spread Function (PSF) is
encouragingly well-known on axis, and near the peak,
don't forget the "dragons" in the wings! The PSF
varies tremendously as a function of off-axis angle and
photon wavelength (energy). Margarita Karovska showed
many examples of the Chandra PSFs and described the variations
in them. There are subtle differences in the measured image
for sources at different distances from the image center;
for example, could it be a double source, or a single point
source with the PSF "splitting" it into an apparent
double? How much of the emission is due to extended
sources, and how much to the wings of the PSF?
A scary thought: we know our
knowledge of the PSF is incomplete, especially in the wings. So by
what procedure does the researcher decide what to
publish?
An
aside: the various deconvolution methods in common use
(Richardson-Lucy,
simple Maximum Entropy, the new multiscale EMC2) can all
be understood as 'forward fitting' with
flexible non-parametric or semi-parametric models
(i.e., based on the likelihood of these kinds of
models of the sky convolved with the instrument response). So
such methods have a wealth of statistical frameworks behind
them. Richardson-Lucy, essentially an EM algorithm, uses
a model for the sky consisting of independent pixels.
This can work nicely for clusters of point sources, but
not as well for smooth diffuse emission, as it effectively
models only one scale (pixel size). The EMC2 method
(which does not yet include PSF variations) incorporates a
Haar-wavelet-like multiscale model, which essentially can
add some smoothing. It is a better choice for modeling
diffuse emission.
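As a concrete (and purely illustrative) example of the 'forward fitting' view, here is a minimal 1-D Richardson-Lucy iteration in numpy; the PSF, source, and noise level are all made up, and real implementations add stopping criteria, backgrounds, and 2-D responses:

```python
import numpy as np

def richardson_lucy(data, psf, n_iter=50):
    """Minimal 1-D Richardson-Lucy deconvolution (EM with independent sky pixels)."""
    psf = psf / psf.sum()
    psf_flipped = psf[::-1]
    estimate = np.full_like(data, data.mean())
    for _ in range(n_iter):
        # Forward fold the current sky model through the instrument response.
        model = np.convolve(estimate, psf, mode="same")
        # Multiplicative EM update: back-project the data/model ratio.
        ratio = data / np.clip(model, 1e-12, None)
        estimate = estimate * np.convolve(ratio, psf_flipped, mode="same")
    return estimate

# Made-up test case: two point sources blurred by a Gaussian PSF, plus noise.
rng = np.random.default_rng(1)
truth = np.zeros(200)
truth[[70, 90]] = [500.0, 300.0]
x = np.arange(-15, 16)
psf = np.exp(-0.5 * (x / 3.0) ** 2)
blurred = np.convolve(truth, psf / psf.sum(), mode="same")
data = rng.poisson(blurred + 2.0).astype(float)
recovered = richardson_lucy(data, psf)
```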
Taming the Dragon
Parametric Estimation Analysis
Jeremy Drake looked at how to
get a handle on correlated uncertainties that arise from the
accumulated calibration errors from various components of
the subassembly instrument modules that contribute to the
instrument effective area (ARF). He obtained a large
sample of these ARFs (specifically, for the ACIS-I) and, by
carrying out model fits to the same dataset with each of
these ARFs, derived an estimate of the stability of the
model parameters against the calibration error. Variations in
the ARF can be seen in systematic shifts in inferred
values of astrophysically interesting model
parameters. Though often small, one sometimes sees
"surprising pathologies".
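Schematically, the brute-force procedure looks something like the toy sketch below; the spectral model, ARF shape, and perturbations are all invented here, whereas a real analysis would refit the data in Sherpa or XSPEC using the actual simulated ARFs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a power-law spectrum folded through an invented effective-area curve.
energies = np.linspace(0.5, 8.0, 300)                              # keV
true_arf = 400.0 * np.exp(-0.5 * ((energies - 1.5) / 2.0) ** 2)    # cm^2, made up
counts = rng.poisson(5.0 * true_arf * energies ** -1.8).astype(float)

def fit_gamma(counts, arf, grid=np.linspace(1.0, 3.0, 401)):
    """Toy one-parameter fit standing in for a full spectral fit: find the
    photon index minimizing the Cash statistic, with the normalization
    profiled out analytically."""
    best_gamma, best_stat = grid[0], np.inf
    for gamma in grid:
        shape = arf * energies ** -gamma
        model = np.clip(counts.sum() / shape.sum() * shape, 1e-12, None)
        stat = 2.0 * np.sum(model - counts * np.log(model))
        if stat < best_stat:
            best_gamma, best_stat = gamma, stat
    return best_gamma

# Stand-in for the sample of calibration-perturbed ARFs: random ~5% tilts.
arfs = [true_arf * (1.0 + 0.05 * rng.standard_normal() * (energies - 3.0) / 3.0)
        for _ in range(200)]
gammas = np.array([fit_gamma(counts, arf) for arf in arfs])
print(f"ARF-induced spread in photon index: {gammas.std():.3f}")
```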
Fitting the same source
spectrum hundreds or thousands of times with differing ARFs is,
however, impractical for the average user. Hence a
'compressed', more elegant and efficient
representation of these variations is called
for.
Dimensionality Reduction
This is the simple-sounding title that Rima Izem (a
statistician) gave to an elegantly simple method to
'encode' the correlated calibration uncertainties in
order to be able to incorporate them into practical fits.
Thousands of simulated curves, each with ~1000 energy
bins, are too much to handle directly. However, continuous
functions (such as the simulated ACIS ARFs used by Jeremy)
are nicely expressed in terms of basis functions, namely the
principal components of the functional variations.
Astrophysicists are probably more familiar with looking for
principal components (that is, directions of maximum
variation, or eigenvectors) as directions in a
point-cloud. There is an analogy for function space:
consider each energy bin of the ACIS ARF curves as a
separate dimension, and perform the PCA analysis
there. Mapping back to function-space gives curves
reflecting the maximum variations in the data. Their
associated eigenvalues describe the amount of variation
each curve is responsible for. The compression achieved
with this method is phenomenal. The ~1000 simulated
curves describing the complex, correlated variations of the
ACIS ARF could be described almost entirely with ~5
components.
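In numpy terms the idea looks roughly like the sketch below; the ensemble of simulated ARF curves is faked here for illustration, whereas the real input would be the calibration-perturbed ARFs described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake ensemble: 1000 simulated ARF curves on a common 1024-bin energy grid,
# whose curve-to-curve variation is (by construction) low-dimensional.
n_sim, n_energy = 1000, 1024
mean_shape = np.linspace(100.0, 600.0, n_energy)
arfs = mean_shape + rng.standard_normal((n_sim, 3)) @ rng.standard_normal((3, n_energy)) * 5.0

mean_arf = arfs.mean(axis=0)
residuals = arfs - mean_arf

# PCA via SVD: the rows of vt are the principal-component "curves" in energy
# space, and s**2 gives the variance each one accounts for.
u, s, vt = np.linalg.svd(residuals, full_matrices=False)
explained = s**2 / np.sum(s**2)
n_keep = int(np.searchsorted(np.cumsum(explained), 0.99)) + 1
print(f"{n_keep} components capture 99% of the curve-to-curve variation")

# Each simulated ARF is then just the mean plus a few coefficients times the
# leading component curves -- the 'compressed' representation.
coeffs = residuals @ vt[:n_keep].T
compressed = mean_arf + coeffs @ vt[:n_keep]
```

The mean curve, the handful of component curves, and the spread of their coefficients are all one needs to carry forward.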
With this elegant and compact representation, one can
reasonably include
calibration uncertainties in one’s analyses in a
practical way. It still requires a robust method such as
Markov-Chain Monte Carlo to search the parameter-space to
do the fit. It is also
generalizable to higher
dimensions, such as the instrument energy response
function (RMF) and/or PSF.
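One way to use such a representation in practice (a sketch of the general idea, not a prescription from the talks) is to treat the component coefficients as nuisance parameters, drawing a fresh ARF realization each time the likelihood is evaluated inside the MCMC:

```python
import numpy as np

def draw_arf(mean_arf, components, coeff_sigmas, rng):
    """Draw one effective-area realization from the compressed representation.

    mean_arf:     (n_energy,) mean ARF curve
    components:   (k, n_energy) leading principal-component curves
    coeff_sigmas: (k,) standard deviations of the component coefficients
    """
    coeffs = rng.standard_normal(len(coeff_sigmas)) * coeff_sigmas
    return mean_arf + coeffs @ components

# Illustration with invented inputs; in a real fit these would come from the
# PCA of the simulated ARF ensemble.
rng = np.random.default_rng(3)
mean_arf = np.linspace(100.0, 600.0, 1024)
components = rng.standard_normal((5, 1024)) * 0.02
coeff_sigmas = np.array([5.0, 2.0, 1.0, 0.5, 0.2])
perturbed_arf = draw_arf(mean_arf, components, coeff_sigmas, rng)
# ...fold the source model through perturbed_arf before evaluating the likelihood.
```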
Here Be Angels
Some dragons have been
slain. Or at least tamed. Or rather,
doused. Temporarily.
Pixel Randomization
The practice of pixel
randomization, where a uniform random deviate is added to the photon
detector positions in order to reduce residual
aspect-related errors and improve the astrometry, came in
for much criticism. Herman Marshall argued that while it
may be a good practice for diffuse or large sources, it is
a bad idea for point sources. It is easy to show
graphically that randomization ends up broadening the PSF
(not a surprise).
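The broadening is easy to quantify with a short simulation (invented numbers, not Marshall's actual figures): adding a uniform deviate of plus or minus half a pixel to already-blurred event positions inflates the measured width.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical point source: 1-D event positions with a 0.4-pixel Gaussian
# core (a made-up, roughly on-axis-like width, in detector pixels).
positions = rng.normal(loc=0.0, scale=0.4, size=100_000)

# Pixel randomization: a uniform deviate over one pixel added to every event.
randomized = positions + rng.uniform(-0.5, 0.5, size=positions.size)

print(f"core width before randomization: {positions.std():.3f} pixels")
print(f"core width after randomization:  {randomized.std():.3f} pixels")
# Expect sqrt(0.4**2 + 1/12) ~ 0.49 pixels, i.e. roughly a 20-25% broadening.
```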
The Advantages of User Collaboration With the Calibration Team
Randall Smith appealed to
the users to "please work with the calibration
team!" First check whether you can make suitable
corrections for your data, rather than wait for the
calibration team to notice the problem and come up with
a fix. After all, users would know what to look for,
because they are familiar with their data. Randall
pointed to some specific cases where user collaboration
has significantly improved the magnitude and scope of
calibration efforts: micro-roughness affecting the
off-axis PSF, pixel randomization in grating data, the
ACIS chip contamination problem, etc. "Users, please
work with the calibration team! It will get done faster
and better; and be a service to the
community."
Atomic Data Uncertainties
Strictly speaking, this falls
more in the realm of model uncertainties than calibration
uncertainties, but the effects are similar. Nancy
Brickhouse pointed out that only a few laboratory
measurements of astrophysically interesting lines exist,
and there are so many lines that theoretical estimates are
unavoidable. In addition, there are lines missing from
atomic database compilations, including missing satellite
lines, as well as missing processes that contribute to the
intensity of a line. So there are a large number of
unknown systematic uncertainties that are not random, and for which errors
should not be added in quadrature.
Systematic model-mismatch errors are likely to be more
important than statistical errors in these cases, and
sensitivity testing is essential. The ATOMDB database
(http://cxc.harvard.edu/atomdb/) used by APEC includes errors on individual lines, so there is now an opportunity to test out different approaches.
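As an example of the kind of sensitivity test that becomes possible (a toy Monte Carlo sketch; the emissivities and fractional errors below are invented, not values from ATOMDB):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy diagnostic: a temperature-sensitive ratio of two line emissivities, each
# carrying a fractional uncertainty of the kind a database might quote.
emis_a, frac_err_a = 3.2e-17, 0.10   # invented emissivity and 10% error
emis_b, frac_err_b = 1.1e-17, 0.30   # invented emissivity and 30% error

# Perturb each emissivity within its quoted error (lognormal keeps them
# positive) and propagate to the derived line ratio.
n = 10_000
a = emis_a * rng.lognormal(mean=0.0, sigma=frac_err_a, size=n)
b = emis_b * rng.lognormal(mean=0.0, sigma=frac_err_b, size=n)
ratio = a / b

print(f"line ratio: {ratio.mean():.2f} +/- {ratio.std():.2f}")
# If this spread rivals the statistical error on the measured ratio, the
# atomic-data uncertainty, not the counting noise, limits the inference.
```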
Propagation of Background Uncertainties into Cluster Temperature
A cluster is a large diffuse source, so typical background estimation methods (e.g., taking a small annular aperture around the source) are not useful. Maxim Markevitch deals with this problem by deriving a robust model for the background. Starting with the simplest case for understanding the background, i.e., acquiring the quiescent detector background when the detector is covered, he finds that the background is constant. Next, examining the quiescent blank-sky backgrounds at E > 2 keV, he finds that it is simple enough, in this case, to add the E > 2 keV background uncertainty in quadrature. (For details on how to use the ACIS "Blank-Sky" background files, see the CIAO thread on the subject, at http://cxc.harvard.edu/ciao/threads/acisbackground/.) This can be done similarly for XMM-Newton, though its backgrounds are less predictable, usually due to flares.
Missing Lines, Calibration Errors, or What?
Oops! Jürgen Schmitt found dramatic differences between the plus and minus orders in the MEG grating spectrum of the bright flare star AU Mic. This was easily explained (after considerable work) as simply statistical fluctuations at play. His advice: "Always go back to the raw data."
We Are All In This Together
As Herman Marshall emphasized, we must develop an "Experience Database," with both the calibration team and the users providing their input. One needs both physics-based models and semi-parametric phenomenological models; the former give a better understanding of the source of the errors but, unlike the latter, are usually more computationally complex. Andy Pollock reiterated that there is a need for some standards in how to report and handle these uncertainties. We should run many more simulations as a regular part of our data analysis, and include our understanding of calibration errors in the results we report. There is also a strong need to develop techniques to include calibration uncertainties in the data analysis when a calibration team provides us with such information.
In summary, a list of take-home points:
1. Interactions with users are very important in the identification
and accounting of systematic errors.
2. There is still a large gap between "theory" and implementation.
When is "sensitivity testing" (where we know we don't know)
the best we can do? How/when can we do better?
3. Each kind of calibration uncertainty or model mis-match
requires careful examination.
4. Even when we (mostly) understand the physics behind the
calibration uncertainties, it is still a significant effort
to implement this information in a way that is practical.
5. In particular, how does the community make standards for
communicating, working with, and updating calibration
uncertainties?
Alanna Connors, Vinay Kashyap, and Aneta Siemiginowska