Last modified: 29 February 2024


Caveat: Extracting Spectra from Merged Datasets

This is a common analysis question:

Does specextract work for a merged event file (made by merge_obs)?

The short answer is no. You cannot use a merged event file, even one created by merge_obs, to extract spectra.

This is, of course, overstating things. You can extract the .pha file from either the merged event file or the individual event files, add the latter together, and you will get the same number of counts. Provided the source is located away from the chip edges, the associated metadata, such as BACKSCAL, will also be correct. But, in general, you cannot make the correct response files, the RMF and ARF, to go along with the spectrum.
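The additivity of the counts can be sketched with a toy example (the channel values below are hypothetical; real PHA files are FITS tables handled by dmextract/specextract):

```python
import numpy as np

# Hypothetical per-channel counts extracted from two separate observations
pha_obs1 = np.array([5, 12, 30, 8], dtype=int)
pha_obs2 = np.array([3, 10, 25, 6], dtype=int)

# The merged event file is just the union of the two event lists, so
# the per-channel totals add linearly; the counts are not the problem.
pha_merged = pha_obs1 + pha_obs2
print(pha_merged)  # -> [ 8 22 55 14]
```

The responses, not the counts, are where merging breaks down.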

If the data you are merging meet some specific criteria, you have a chance of getting reasonable results. It may depend on where your source is in the field and the accuracy you require of your calibrations (e.g. a 10% error in the ARF will not matter a whole lot for a 50 count source). These criteria include:

- similar ROLL_PNT values, so that the chip coordinates (and the WMAP) remain consistent;
- the same SIM_X, SIM_Y, and SIM_Z positions;
- observations taken close together in time, so the contamination and QE corrections apply equally;
- the same instrument configuration (CCDs on, subarray, and readout mode).

Many observations that have been split due to thermal constraints do in fact meet these criteria; however, data taken from different observing programs or over large time ranges will likely have issues.

Mechanically, specextract cannot be supplied a single event file together with stacks of auxiliary products: aspect solution (asol), mask (msk), bad pixel (bpix), and parameter block (pbk) files. You immediately get an error, since the stack matching fails.

You can supply specextract a stack of (reprojected) per-obi event files and matching per-obi auxiliary files and run with combine=yes, which will produce the correctly summed PHA file and weighted ARF and RMF. When simultaneous fitting with Sherpa is not an option, this is the recommended method.
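Conceptually, the combined ARF produced this way amounts to an exposure-weighted average of the per-observation ARFs (the numbers below are hypothetical; specextract and combine_spectra handle the real bookkeeping, including the RMF weighting):

```python
import numpy as np

# Hypothetical exposures (s) and ARF effective areas (cm^2 per energy
# bin) for two observations of the same source.
exposure = np.array([30e3, 10e3])
arfs = np.array([[420.0, 380.0, 310.0],   # obs 1
                 [400.0, 360.0, 290.0]])  # obs 2

# Exposure-weighted average: each observation contributes in
# proportion to how long the source was actually observed with it.
arf_combined = (exposure[:, None] * arfs).sum(axis=0) / exposure.sum()
print(arf_combined)  # -> [415. 375. 305.]
```

A merged event file cannot reproduce this, because the individual per-observation responses are never computed.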

However, starting from a merged event file is tricky at best; you must pick one of each of the auxiliary products, which is what leads to the restrictions above.

Which files affect which calibrations?

Event (evt) File

This file is the source of the information needed to map from sky (x,y) coordinates to mirror (or detector) coordinates (det: theta, phi) and then to chip coordinates. In a merged file with arbitrary ROLL_PNT values, we have lost the information about PHI (the azimuthal angle around the optical axis) and thus cannot compute correct chip coordinates. The mirror effective area is largely radially symmetric, so that is not a major concern. The bigger problem is that by picking one value of ROLL_PNT, we can end up computing incorrect chip coordinates: you could get chip coordinates closer to or farther from the readout (and thus with higher or lower detector QE due to CTI effects), coordinates on the wrong chip, or, even worse, coordinates seemingly off any chip (which would artificially decrease the ARF).
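The roll dependence can be illustrated with a deliberately simplified sketch (this is not the real pixlib transform, just a rotation showing why one fixed roll angle cannot serve two observations):

```python
import numpy as np

def sky_to_det(dx_sky, dy_sky, roll_deg):
    """Toy sky -> detector mapping: rotate the focal-plane offset
    from the optical axis by the roll angle."""
    r = np.deg2rad(roll_deg)
    return (dx_sky * np.cos(r) + dy_sky * np.sin(r),
            -dx_sky * np.sin(r) + dy_sky * np.cos(r))

# Hypothetical off-axis source, 200 sky pixels from the optical axis,
# observed at two different roll angles.
det_a = sky_to_det(200.0, 0.0, roll_deg=30.0)
det_b = sky_to_det(200.0, 0.0, roll_deg=120.0)

# Same sky position, very different detector (and hence chip) locations.
print(det_a)
print(det_b)
```

Picking a single ROLL_PNT for the merged file forces all events through one of these mappings, whether or not it applies to them.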

Different ROLL_PNT values also cause problems when creating weight maps (WMAPs). The Chandra PSF is not symmetric and is highly variable across the field. When combining observations with different ROLL_PNT values, the spatial distribution of events differs from observation to observation, introducing a time-dependent observed source morphology. But since a WMAP is constructed by summing over all time, this time variation is lost, and the wrong pixels are included in the weighting for some fraction of the time. This is less of a problem for on-axis sources, where the PSF is almost circular, but it quickly becomes a problem beyond 1 arcmin off-axis, where the PSF becomes more elliptical.
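A toy example of the time-averaging problem (the arrays are caricatures of an elongated off-axis PSF at two roll angles, not real WMAPs):

```python
import numpy as np

# Obs 1: events smeared along one axis; obs 2, at a different roll,
# smeared along the perpendicular axis (hypothetical counts).
wmap1 = np.zeros((5, 5)); wmap1[2, :] = 10
wmap2 = np.zeros((5, 5)); wmap2[:, 2] = 10

# The merged WMAP weights the full cross, although no single
# observation ever illuminated all of those pixels at once.
wmap_merged = wmap1 + wmap2
print(np.count_nonzero(wmap1), np.count_nonzero(wmap_merged))  # 5 9
```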

SIM_{X|Y|Z}: Similar to ROLL_PNT, the mapping from detector coordinates to chip coordinates requires knowing where the SIM is. Neither choosing a single value nor some weighted combination of values gives the correct result.

DATE-OBS: this is used to query the CALDB and to compute the correction for the time-varying contamination build-up. In a merged file this will be the date of the earliest dataset. If all the data are nearby in time, the temporal variation is small and picking the earliest is fine. If the data are spread out in time, the contamination correction may be incorrect: if the early observation is very long, that might be okay, but if it is very short compared to the later ones, the contamination will likely be underestimated. In the upcoming change to time-dependent QE files, the earlier epoch will likewise be used.
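How far off the "earliest dataset" choice can be is easy to see with hypothetical dates and exposures: compare the DATE-OBS the merged header would report against an exposure-weighted epoch.

```python
from datetime import date

# Hypothetical (DATE-OBS, exposure in seconds) pairs: a short early
# observation followed by a much longer one years later.
obs = [(date(2019, 1, 10), 2e3),
       (date(2023, 6, 1), 48e3)]

earliest = min(d for d, _ in obs)          # what the merged header keeps
total = sum(t for _, t in obs)
mean_ordinal = sum(d.toordinal() * t for d, t in obs) / total
weighted_epoch = date.fromordinal(round(mean_ordinal))

print(earliest)        # 2019-01-10: epoch used for the contamination model
print(weighted_epoch)  # well into 2023: where the exposure actually accrued
```

With contamination increasing over the mission, evaluating the model at the 2019 epoch here would underestimate the absorption for almost all of the collected counts.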

Dead time: DTCOR is an average efficiency factor applied to the total observation time, ONTIME. For ACIS, the DTCOR value depends on the number of CCDs in use and how many rows are read out. It will also vary significantly if the observations were performed in alternating exposure mode. In a merged dataset, the DTCOR value is taken from the first file in the stack, which may be a poor choice if the values or the observation durations vary significantly.
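A small sketch with hypothetical numbers shows how taking DTCOR from the first file differs from the exposure-weighted value (LIVETIME = ONTIME * DTCOR):

```python
# Hypothetical (ONTIME in s, DTCOR) pairs: a short alternating-exposure
# observation first in the stack, then a long standard timed exposure.
obs = [(10e3, 0.5),
       (40e3, 0.987)]

dtcor_merged = obs[0][1]                         # merged file keeps this
livetime = sum(t * d for t, d in obs)
dtcor_weighted = livetime / sum(t for t, _ in obs)

print(dtcor_merged)              # 0.5
print(round(dtcor_weighted, 3))  # ~0.89: nearly a factor of 2 different
```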

Mask (msk) and Parameter block (pbk) Files

Beyond the spatial information (e.g. knowing which parts of which chips are actually on), the information in these files is used to compute the dead-area calibration. The number, clocking, and sequence of CCDs all go into the dead-area calculation. This is in general a small factor, but it does vary quite a bit between subarray and full-frame datasets.

Bad Pixel File (bpix)

All observations have unique bad pixel files; most of the uniqueness, though, comes from afterglows. By default, afterglows are not considered when making responses, since they are only bad for relatively short times. Thus most of the useful bad pixel information is often very similar between observations. The three biggest exceptions are new hot or bad columns/pixels identified in the biases, bias parity errors, and other anomalies (FEP0 or T-plane latchup). Of these, the first is probably the most likely. If it is a single bad pixel and your source is not located near it, then there is no problem; the same is true for bias parity errors. If it is a bad or hot column, then a larger area is affected, though again, if the source is not located nearby, it is not a concern. So it really depends on the source location.

Aspect Solution File (asol)

Several CIAO tasks make direct use of the aspect solution files. These contain not only the pointing history of the optical axis but also the offsets of the SIM location due to thermally induced flexing of the spacecraft. While the tools are set up to handle multiple files via stacks, they often require that the SIM_{X|Y|Z} values are the same, such that the DY, DZ, and DTHETA columns (the SIM offset values) are relative to the same fiducial. In a merged dataset this is potentially no longer the case.
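The fiducial problem can be sketched in a few lines (all values below are hypothetical, in mm; the point is only that an offset is meaningless without its own nominal SIM position):

```python
# Hypothetical nominal SIM_Z positions for two observations that used
# different instrument setups, plus their asol DZ offsets.
sim_z_nominal = {"obs1": -233.59, "obs2": -190.14}
dz_offset = {"obs1": 0.02, "obs2": 0.03}

# The physically meaningful quantity is nominal + offset. Comparing or
# mixing the offsets alone (as a merged stack would implicitly do)
# compares numbers measured against different fiducials.
absolute = {k: sim_z_nominal[k] + dz_offset[k] for k in sim_z_nominal}
print(absolute)
```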


Some of the effects may be small (very small), some may depend on the source location, some are energy dependent, and so on. The effects can alter the normalization of the responses (it is easy to get a 50% error by computing wrong chip coordinates that hit or miss a bad pixel or column) as well as their shape (contamination, QE, etc.).

What about the RMF?

The RMF suffers the same basic problems as the ARF. However, the RMF does not vary (within a chip) as significantly as the ARF. Moreover, you may not be sensitive to RMF effects unless you have a large number of counts and/or strong emission or absorption lines in the spectrum.