Merging Central
Combining Datasets for Imaging & Spatial Analysis
This provides an overview of the considerations involved with combining observations for spatial/imaging analysis. For the most part, the most common tasks is automated using through the merge_obs/reproject_obs/flux_obs family of CIAO scripts. A broad discussion that began pre-dating these scripts and expanded upon their introduction is found in the Why combine data? 'why' page.
Astrometric Corrections (optional)
While updating the astrometry is generally unnecessary, if your science requires the best possible source positions, then you may want to apply astrometric offsets to individual observations to get the correct registration. Chandra's absolute pointing accuracy is generally better than 0.4 arcsec but can be improved in an absolute sense by cross-matching Chandra sources with other higher precision catalogs or in a relative sense by cross matching one Chandra observation with another observation. The CIAO tools reproject_aspect, wcs_match, and wcs_update can be used to compute the fine astrometic shifts between two source lists and apply the offsets to various chandra; files.
- 
	  The basics of astrometric corrections with Chandra data is covered in the Correcting Absolute Astrometry thread. 
- 
	  For a discussion on caveats and nuances involved with the updating astrometry with CIAO, see the Revising Astrometry for Imaging & Spatial Analysis page. 
Many of the steps needed to apply an astrometric correction to Chandra data sets is automated by the fine_astro script, which was introduced in CIAO 4.17.
Reprojection
observations within 10 deg of the south celestial pole,
| ObsID | RA | Declination | 
|---|---|---|
| 15156 | 00:52:44.90 | -80:15:57.60 | 
| 22326 | 03:17:15.85 | -85:32:25.56 | 
| 8272 | 03:51:28.00 | -82:14:11.00 | 
| 16283 | 03:51:31.70 | -82:13:20.00 | 
| 22293 | 05:37:09.89 | -80:28:08.83 | 
| 21421 | 06:05:41.60 | -86:37:55.00 | 
| 14925 | 06:38:33.60 | -80:14:52.00 | 
| 18296 | 09:19:02.50 | -81:03:25.20 | 
| 10365 | 13:08:38.00 | -82:59:34.80 | 
| 19534 | 15:12:54.00 | -81:24:06.00 | 
| 20959 | 15:12:54.00 | -81:24:06.00 | 
| 8266 | 15:39:25.20 | -83:35:34.00 | 
| 3477 | 16:11:02.40 | -83:42:00.00 | 
| 10143 | 20:09:13.00 | -85:38:46.80 | 
| 15124 | 23:39:44.00 | -85:10:33.00 | 
![[NOTE]](../imgs/note.png)
TL;DR it is a fair assumption that the sky coordinate system and its respective WCS transform is unique to each observation.
Co-adding images within CIAO is typically performed in terms of the real-valued "sky" (also referred as "physical") coordinates by the tools.
Recall that the "sky" coordinate system records the event position on a fictitious tangent plane to a nominal, fixed celestial pointing direction. That is, the "sky" plane is the projection of the celestial sphere onto a 2D plane where the plane intersects orthogonally with the 3D unit sphere in the pointing direction.
For a typical observation the tangent point direction is defined by its nominal right ascension and declination which is generally the same as the telescope's aimpoint location—the time-averaged right ascension and declination of the telescope's optical-axis due to spacecraft dithering.
The nominal RA and Dec for the tangent point for an observation is then set to have the sky coordinate position to be (4096.5, 4096.5) for ACIS observations, (16384.5, 16384.5) for HRC-I observations, and (32768.5, 32768.5) for HRC-S observations. With the exception of multi-ObI observations, the aimpoint will have the exact same sky coordinates as the nominal tangent point too, albeit the nominal tangent point and aimpoint can differ on the detector if the Science Instrument Module is offset for an observation.
A practical consequence of the sky coordinate system being specific to a given observation is that the WCS transform between sky and celestial coordinates will also be specific to that observation. This means that to correctly stack observations for imaging analysis, the observations must all share a common sky coordinate system and WCS transform, which is achieved through reprojecting the data sets to a common spherical projection onto a plane.
In this example, we use 49 ACIS observations of the C-COSMOS field which encompasses a 0.5 square degree region centered at \(\alpha = 10^{\mathrm{h}},\ \delta = {+2}^{\circ}\). The most simplistic approach to combining files is to use dmmerge on a set of FITS tables, e.g. event files with a common set of columns.
An example of directly using the reprocessed data sets and merging event files that have not been projected to a common sky plane.
unix% find_chandra_obsid "10:00:00" "02:00:00" radius=30 download=none | grep "C-COSMOS" 8027 20.8 ACIS-I NONE 49.4 2007-04-14 Elvis C-COSMOS 8021 19.1 ACIS-I NONE 48.8 2007-04-09 Elvis C-COSMOS 8015 20.5 ACIS-I NONE 45.5 2007-01-07 Elvis C-COSMOS 8026 13.8 ACIS-I NONE 47.1 2007-04-13 Elvis C-COSMOS 8009 24.7 ACIS-I NONE 45.9 2007-01-02 Elvis C-COSMOS 8020 11.1 ACIS-I NONE 49.1 2007-04-09 Elvis C-COSMOS 8483 30.4 ACIS-I NONE 21.8 2006-12-04 Elvis C-COSMOS 8004 30.4 ACIS-I NONE 15.7 2006-11-27 Elvis C-COSMOS 8482 30.4 ACIS-I NONE 10.4 2006-12-02 Elvis C-COSMOS 8014 13.5 ACIS-I NONE 47.9 2007-01-05 Elvis C-COSMOS 7999 37.0 ACIS-I NONE 29.6 2006-11-25 Elvis C-COSMOS 8478 37.0 ACIS-I NONE 18.0 2006-11-24 Elvis C-COSMOS 8025 8.9 ACIS-I NONE 49.1 2007-04-12 Elvis C-COSMOS 8008 19.2 ACIS-I NONE 46.4 2007-01-02 Elvis C-COSMOS 8019 3.1 ACIS-I NONE 49.5 2007-04-06 Elvis C-COSMOS 8003 26.1 ACIS-I NONE 47.0 2007-04-02 Elvis C-COSMOS 8013 8.3 ACIS-I NONE 48.1 2007-01-04 Elvis C-COSMOS 8493 33.5 ACIS-I NONE 19.8 2006-12-12 Elvis C-COSMOS 7998 33.5 ACIS-I NONE 27.6 2007-01-10 Elvis C-COSMOS 8024 9.7 ACIS-I NONE 49.4 2007-04-11 Elvis C-COSMOS 8007 16.0 ACIS-I NONE 21.8 2006-12-19 Elvis C-COSMOS 8497 16.0 ACIS-I NONE 27.8 2006-12-25 Elvis C-COSMOS 8018 4.9 ACIS-I NONE 46.8 2007-04-05 Elvis C-COSMOS 8002 23.9 ACIS-I NONE 29.6 2006-12-19 Elvis C-COSMOS 8496 23.9 ACIS-I NONE 18.4 2006-12-23 Elvis C-COSMOS 8012 9.1 ACIS-I NONE 49.5 2007-01-04 Elvis C-COSMOS 8494 31.8 ACIS-I NONE 20.8 2006-12-16 Elvis C-COSMOS 8122 31.8 ACIS-I NONE 28.8 2007-01-20 Elvis C-COSMOS 8023 15.4 ACIS-I NONE 49.4 2007-04-10 Elvis C-COSMOS 8503 16.4 ACIS-I NONE 20.4 2006-12-31 Elvis C-COSMOS 8006 16.4 ACIS-I NONE 26.7 2007-01-01 Elvis C-COSMOS 8017 12.9 ACIS-I NONE 46.8 2007-04-04 Elvis C-COSMOS 8123 24.2 ACIS-I NONE 49.1 2007-04-07 Elvis C-COSMOS 8011 15.0 ACIS-I NONE 47.1 2007-04-04 Elvis C-COSMOS 7997 32.1 ACIS-I NONE 45.4 2006-12-30 Elvis C-COSMOS 8022 22.5 ACIS-I NONE 31.8 2007-05-10 Elvis C-COSMOS 8555 22.5 ACIS-I NONE 16.6 2007-05-12 Elvis C-COSMOS 8549 20.3 ACIS-I NONE 17.8 2007-05-05 Elvis C-COSMOS 8124 20.3 ACIS-I NONE 31.7 2007-04-08 Elvis C-COSMOS 8550 20.9 ACIS-I NONE 23.2 2007-04-18 Elvis C-COSMOS 8016 20.9 ACIS-I NONE 23.9 2007-04-19 Elvis C-COSMOS 8001 27.0 ACIS-I NONE 48.7 2007-04-02 Elvis C-COSMOS 8010 22.3 ACIS-I NONE 33.6 2007-04-27 Elvis C-COSMOS 8553 22.3 ACIS-I NONE 14.8 2007-04-29 Elvis C-COSMOS 7996 34.2 ACIS-I NONE 46.3 2006-12-28 Elvis C-COSMOS 8552 26.1 ACIS-I NONE 14.8 2007-04-26 Elvis C-COSMOS 8005 26.1 ACIS-I NONE 31.6 2007-04-25 Elvis C-COSMOS 8000 31.6 ACIS-I NONE 46.7 2007-05-26 Elvis C-COSMOS 7995 38.0 ACIS-I NONE 46.1 2007-06-01 Elvis C-COSMOS # re-run 'find_chandra_obsid' and use 'awk' to grab the first column # listing the ObsIDs and redirect into an ASCII stack file unix% find_chandra_obsid "10:00:00" "02:00:00" radius=30 download=none | grep "C-COSMOS" \ ? | awk '{print $1}' > obsids.lis # download the data unix% download_chandra_obsid @obsids.lis... lots of screen output ...unix% chandra_repro "*" outdir=""... lots of screen output ...
Table of the 49 ACIS ObsIDs composing the C-COSMOS field.
| ObsID | RA | Declination | 
|---|---|---|
| 8027 | 09:58:36.96 | +01:58:41.53 | 
| 8021 | 09:58:47.90 | +02:06:12.48 | 
| 8015 | 09:58:58.85 | +02:13:43.42 | 
| 8026 | 09:59:07.03 | +01:55:57.49 | 
| 8009 | 09:59:09.81 | +02:21:14.70 | 
| 8020 | 09:59:17.97 | +02:03:28.44 | 
| 8004 | 09:59:20.76 | +02:28:45.64 | 
| 8482 | 09:59:20.76 | +02:28:45.64 | 
| 8483 | 09:59:20.76 | +02:28:45.64 | 
| 8014 | 09:59:28.92 | +02:10:59.38 | 
| 7999 | 09:59:31.72 | +02:36:16.58 | 
| 8478 | 09:59:31.72 | +02:36:16.58 | 
| 8025 | 09:59:37.10 | +01:53:13.47 | 
| 8008 | 09:59:39.88 | +02:18:30.67 | 
| 8019 | 09:59:48.04 | +02:00:44.41 | 
| 8003 | 09:59:50.83 | +02:26:01.61 | 
| 8013 | 09:59:58.99 | +02:08:15.36 | 
| 7998 | 10:00:01.79 | +02:33:32.55 | 
| 8493 | 10:00:01.79 | +02:33:32.55 | 
| 8024 | 10:00:07.17 | +01:50:29.44 | 
| 8007 | 10:00:09.95 | +02:15:46.64 | 
| 8497 | 10:00:09.95 | +02:15:46.64 | 
| 8018 | 10:00:18.11 | +01:58:00.38 | 
| 8002 | 10:00:20.90 | +02:23:17.58 | 
| 8496 | 10:00:20.90 | +02:23:17.58 | 
| 8012 | 10:00:29.06 | +02:05:31.33 | 
| 8122 | 10:00:31.85 | +02:30:48.52 | 
| 8494 | 10:00:31.85 | +02:30:48.52 | 
| 8023 | 10:00:37.24 | +01:47:45.41 | 
| 8006 | 10:00:40.02 | +02:13:02.61 | 
| 8503 | 10:00:40.02 | +02:13:02.61 | 
| 8017 | 10:00:48.18 | +01:55:16.35 | 
| 8123 | 10:00:50.97 | +02:20:33.55 | 
| 8011 | 10:00:59.13 | +02:02:47.30 | 
| 7997 | 10:01:01.92 | +02:28:04.50 | 
| 8022 | 10:01:07.30 | +01:45:01.39 | 
| 8555 | 10:01:07.30 | +01:45:01.39 | 
| 8124 | 10:01:10.08 | +02:10:18.59 | 
| 8549 | 10:01:10.08 | +02:10:18.59 | 
| 8016 | 10:01:18.25 | +01:52:32.34 | 
| 8550 | 10:01:18.25 | +01:52:32.34 | 
| 8001 | 10:01:21.03 | +02:17:49.54 | 
| 8010 | 10:01:29.19 | +02:00:03.29 | 
| 8553 | 10:01:29.19 | +02:00:03.29 | 
| 7996 | 10:01:31.99 | +02:25:20.48 | 
| 8005 | 10:01:40.15 | +02:07:34.57 | 
| 8552 | 10:01:40.15 | +02:07:34.57 | 
| 8000 | 10:01:51.10 | +02:15:05.52 | 
| 7995 | 10:02:02.05 | +02:22:36.46 | 
You will need to redefine the sky coordinates of all these observations by reprojecting the events to a common tangent plan, which will then be displayed as the set of obervations covering the central portion of the COSMOS field.
Stacking the observations to a common sky tangent point returns the expected wide mosaic rather than being erroneously clustered in a narrow portion of the sky plane.
unix% merge_obs infiles="*/repro/*evt*.fits" \ ? outroot="merge_reproj_Dec-90/" bands=broad binsize="16" \ ? refcoord="0:0:0 -89:59:59.999999" clobber+
Figure 1: Stacked, Reprojected Observations
![[Thumbnail image: combined observations near the pole, individual observations reprojected to a common tangent plane]](imgs/reproj_good.thmb.png)
[Version: full-size]
![[Print media version: combined observations near the pole, individual observations reprojected to a common tangent plane]](imgs/reproj_good.png)
Figure 1: Stacked, Reprojected Observations
The mosaicked image stacking the images reprojected to a common tangent plane. In this example, the exposure-corrected images are shown.
dmmerge-ing these event files will result in using the sky to WCS transform defined in the first input file to the tool.
An example of directly using the reprocessed data sets and merging event files that have not been projected to a common sky plane.
unix% ls */repro/*evt* > evt.lis unix% dmmerge infile="@evt.lis[cols time,expno,ccd_id,node_id,chip,\ ? tdet,det,sky,pha,energy,fltgrade,grade,status]" \ ? outfile="merge_no-reproj.evt" clobber+
Figure 2: Stacked Observations without Reprojecting
![[Thumbnail image: combined observations of the C-COSMOS field which without reprojecting each observation]](imgs/reproj_bad.thmb.png)
[Version: full-size]
![[Print media version: combined observations of the C-COSMOS field which without reprojecting each observation]](imgs/reproj_bad.png)
Figure 2: Stacked Observations without Reprojecting
dmmerge 49 ObsIDs without first reprojecting the observations to a common tangent plane. The sky coordinates of the first observation listed in the input stack is used as the reference coordinate system.
Figure 3 visualizes the difference by overlaying the field-of-view of the set of reprojected observations on to the non-reprojected merged data set.
Figure 3: Compare Reprojected FOV with Non-Reprojected Data
![[Thumbnail image: FOV of reprojected observations overlaid on the non-reprojected data set]](imgs/C-COSMOS_overlayFOV.thmb.png)
[Version: full-size]
![[Print media version: FOV of reprojected observations overlaid on the non-reprojected data set]](imgs/C-COSMOS_overlayFOV.png)
Figure 3: Compare Reprojected FOV with Non-Reprojected Data
The FOV of the stacked, reprojected images of the C-COSMOS field overlaid on the non-reprojected data for comparison.
Source Detection with Merged PSF Maps
A primary reason users typically [spatially] stack observations is to detect faint sources. The CIAO celldetect and wavdetect tools can make use of information about the PSF size at a given point to either determine the size of a detect cell at that point or determine the optimum wavelet size that matches the size of the sources being detected (for point sources, this scale is comparable to the size of the PSF), for the respective tool using their psffile parameter with PSF maps.
A discussion may be found in the Running wavdetect or celldetect on Merged Data: Choosing psffile thread and a brief overview of the strengths and weaknesses between the two available detection methods (vtpdetect, a Voronoi tessellation and percolation source detection technique does not make use of PSF maps) in CIAO are summarized below.
celldetect
Pros
- Fast and Robust
- Works well for point sources
- Detailed PSF shape not needed, only approximate size
Cons
- Handling extended sources can be difficult without a careful selection of cell sizes
- Can get confused in crowded fields
- Exposure maps needed to eliminate edge sources
- Not very sensitive unless background maps are used, but these maps can be challenging to generate
wavdetect
Pros
- Works well in crowded fields
- Works well for point sources superimposed on extended emission
- Only approximate PSF shape is needed
- Edge-of-field effects not a problem
Cons
- Slow, especially if many wavelet scales used
- Memory-intensive and very large image sizes can pose a problem
From this point forward the discussion will be on wavdetect since it is the most widely used CIAO source detection tool. In wavdetect, providing an exposure map will help with eliminating the vast majority of false detections along the chip edges. The most notable feature of the thread referenced above is the discussion regarding the various methods to combine PSF maps.
![[TIP]](../imgs/tip.png)
The pixel values of a PSF map represents the radius of a circular PSF at that pixel for the given energy and ecf parameter values passed to mkpsfmap in the units that are also specified in the tool—with the assumption that a flat detector is used by the tool. Also realize that the circular assumption used will completely fall apart at large off-axis angles, just based on the changes to the PSF morphology and size.
Scattered throughout the existing documentation related to generating PSF maps (with mkpsfmap and dependent scripts such as fluximage/merge_obs/flux_obs) there are instances where a PSF 'encircled counts fraction' of ecf=0.393 is used to generate PSF maps. An ECF of 39.3% corresponds to the 1\(\sigma\) integrated volume of a 2D Gaussian and in the examples it is chosen so that point sources with only a few counts—which will be concentrated in the core of the PSF—to be better detected.
While wavdetect cannot compute PSF sizes if the tool is not provided with a PSF map, it will still run to completion. The subsequent source characteristics will not be reliable, and if the estimated size is wrong then the wrong wavelet scales may be used to determine the source parameters, thus the source properties will be erroneously estimated. However, the source detection process itself is unaffected. That is, when a PSF map is provided, the PSF sizes are used by the tool to determine the detected source significance and whether or not it is included in the source list, thus limiting the number of spurious detections.
![[NOTE]](../imgs/note.png)
wavdetect is basically a wrapper script around the wtransform and wrecon tools. Source detection is performed with wtransform, which provides a list of source candidates and wrecon tests these candidates for significance, providing wavdetect with a source list file of significant detections.
In the detection step, wtransform does not use the PSF information but uses a Ricker (aka "Mexican Hat") wavelet function at different size-scales to correlate the input image with the wavelets. Pixels with a high-correlation are considered sources. The tool then uses an estimated background and the selected detection threshold (sigthresh) to determine whether or not the sources from the correlated pixels can then be considered candidate source detections.
In the next step the PSF map is important. Since wrecon will take the candidate sources and smooth them at various wavelet scales, the source is only statistically significant—and kept as a source—when smoothed with a wavelet scale greater than the PSF size and is then reconstructed using the wavelet scale closest to the PSF size. The estimated source size and position are also improved during this process. Without a PSF map the wavelets are scaled with the image pixels, which is only reliable in the case where the PSF size does not vary in the region being examined.
As an aside—and frequently asked point for clarification—pertaining to the ellsigma parameter utilized by both wavdetect and celldetect is that it has no relationship to mkpsfmap's ecf value used to generate PSF maps and it does not affect the detections themselves. ellsigma is an optional parameter with a default value of 3\(\sigma\) of the distribution of counts within the source cell (the observed counts) assuming a Gaussian distribution. For example, if ellsigma=1, then the elliptical source region will contain 39.3% of the total observed source counts but is only used as a scaling factor on the major- and minor-axes of the ellipses for each detection for visualization purposes. The scaling enhances the visualization of the source regions when the source list is overlaid onto the data and does not affect the results from the source detection process or other determined source properties.
Ultimately, providing a PSF will affect the wavdetect/celldetect results, giving a shorter source list with better estimated source properties. While the PSF size information does not affect the actual detection process, since it is not used, but the PSF map definitely affects the final source list, as it is used for the significance test to reject false detections.
Merging MARX-projected Simulated PSFs
Whether the HRMA PSF raytracing is performed through SAOTrace/ChaRT or MARX, the most widely used technique to project the simulated photons on to the Chandra detector plane—and through an instrument model—is with MARX. There are a couple ways to go about this, but if the intention is to stack simulated PSFs for a set of observations, a separate PSF will need to be generated for each individual ObsID in all cases.
Once the raytraces are performed each realization must be projected on to the detector-plane. Most users will do this through the simulate_psf CIAO script—which facilitates the use of MARX through a simplified interface—rather than directly interacting with the software suite. A straight forward workflow would be to begin with the observed set of data and reproject them, as one would to generate a merged data set in the standard manner, with reproject_obs or merge_obs. Since these scripts will also output the reprojected event files on a common tangent plane (the only observation without a corresponding reprojected file would be the first observation listed input, since it is used as the reference tangent plane unless the tangent point is explicitly set with the refcoord parameter) as part of their standard output result set, the individual reprojected events files can then be used as the reference files to project each individual PSF raytrace files on to the detector with MARX or simulate_psf (using the infile parameter for the latter).
Since all the PSFs have been projected on to a common 'sky'-plane—with a common WCS transform defining their celestial coordinates—the dmmerge tool may then be used to directly combine the projected PSF event files.
![[TIP]](../imgs/tip.png)
Something to also keep in mind about which may make it difficult to use the PSF events files projected by MARX is that they may not always have the expected columns, keywords, or data-type formats that CIAO would otherwise expect. Typically these issues may be worked around by excluding columns or modifying the file headers.
CAVEATS
TBD