Catalog Statistical Properties

1. Introduction

The Chandra Source Catalog (CSC) is the product of a series of complex data processing pipelines. Statistical characterization of catalog source properties is accomplished primarily through the use of simulated datasets, including both empty fields (blank-sky) and simulated sources. These simulated datasets are processed by the catalog pipelines in the same fashion as real datasets. We present here a summary of these results, primarily for the B band. Full simulation results may be found at http://space.mit.edu/home/houck/tmp.

2. Overall Properties

2.1 Distribution on the Sky

The CSC contains ~136000 individual source entries from ~3900 separate ACIS observations available in the Chandra Public Archive as of Dec. 31, 2008. Because many Chandra targets were observed more than once, these individual source entries correspond to ~94700 unique "master sources". These include both target and serendipitous sources. The distribution on the sky, in galactic coordinates, is shown in Figure 1.

Figure 1: Distribution of CSC sources on the sky, in galactic coordinates.

2.2 Flux Distribution

Although CSC fluxes range from below ~10^-18 erg-cm^-2-sec^-1 (for the deepest exposures) to ~10^-10 erg-cm^-2-sec^-1, most CSC sources have B band (0.5-7.0 keV) fluxes of ~10^-15 - 10^-13 erg-cm^-2-sec^-1, as shown in Figure 2.

Figure 2: Distribution of CSC fluxes in the broad (black), hard (blue), medium (green), soft (red), and ultrasoft (orange) bands, obtained from the catalog master source table flux_aper columns.

2.3 Field Background

Because event screening performed in the CSC pipeline processing is more aggressive than that done in standard data processing, the non-X-ray background is typically reduced. This is illustrated in Figure 3, in which we display the distribution of chip 0 and chip 7 B Band backgrounds in CSC 10 kilosecond observations, together with the values cited in version 11 of the Proposers' Observatory Guide. The chip backgrounds, as a function of livetime, for chips 0-3 and 5-8, which comprise the overwhelming majority of chips used in the CSC, are shown in Figure 4.

Figure 3: Histogram of B Band Backgrounds for chip 0 (black) and chip 7 (red) for CSC 10 ksec. observations. Backgrounds are estimated by computing total B Band counts per chip, minus the sum of B Band source counts (src_cnts_aper_b) from all sources on the chip. Livetimes between 9500 and 10500 seconds are included.

Figure 4: Field background per chip vs. livetime for commonly used ACIS imaging chips, estimated from CSC event lists with source contributions removed. Median background counts per bin are indicated by horizontal lines. Boxes include 95% of the measurements in each bin, and vertical lines indicate extreme values.
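The background estimate described in the Figure 3 caption can be sketched as follows. This is only an illustration: the record layout, the obsid labels, and all count values are hypothetical, and only the column name src_cnts_aper_b and the 9500-10500 second livetime window come from the text above.

```python
def chip_background_counts(total_counts, source_counts):
    """Background counts on a chip: total B band counts minus the summed
    B band source counts of all sources on that chip."""
    return total_counts - sum(source_counts)


def select_10ks(observations, lo=9500.0, hi=10500.0):
    """Keep observations whose livetime (seconds) falls in the ~10 ksec window."""
    return [obs for obs in observations if lo <= obs["livetime"] <= hi]


# Hypothetical per-chip records (made-up numbers, not catalog data).
observations = [
    {"obsid": "A", "livetime": 9800.0, "total_counts": 1200,
     "src_cnts_aper_b": [150.0, 40.0]},
    {"obsid": "B", "livetime": 30000.0, "total_counts": 4000,
     "src_cnts_aper_b": [300.0]},
    {"obsid": "C", "livetime": 10400.0, "total_counts": 1350,
     "src_cnts_aper_b": []},
]

backgrounds = {
    obs["obsid"]: chip_background_counts(obs["total_counts"],
                                         obs["src_cnts_aper_b"])
    for obs in select_10ks(observations)
}
```

Only records A and C fall inside the livetime window, so only those two contribute to the histogram in this toy example.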

3. Limiting Sensitivity and Sky Coverage

A limiting sensitivity map is computed for each obsid that contributes to the CSC, in each of the 5 science energy bands. The maps are derived from the CSC model background maps for the obsid, randomized to the appropriate statistics for the observation. Each sensitivity map pixel represents the minimum point source photon flux needed to yield a flux significance greater than or equal to the catalog inclusion limit (3σ) at that location, when background is obtained from a region in the randomized background map appropriate to background apertures at that pixel location. An example sensitivity map is shown in Figure 5.

The sky coverage represents the total area in the CSC sensitive to point sources greater than a given flux, as a function of flux. We estimate sky coverage by assigning all non-zero limiting sensitivity map values to all-sky pixels, using the HEALPix projection (Gorski et al. 2005, ApJ, 622, 759), keeping only the most sensitive (i.e., lowest) value in each all-sky pixel. To reduce computational load and size of the projections (i.e., the number of HEALPix pixels), we rebinned the sensitivity maps to block 64 (~31.5'' x ~31.5''), used ~25.8'' HEALPix pixels, and assigned rebinned sensitivity map pixels to the nearest HEALPix pixel, ignoring spillover. The resulting sky coverage function for the B band is shown in Figure 6. Total B band sky coverage is ~320 deg².
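A minimal sketch of the accumulation step, assuming the assignment of each rebinned sensitivity-map pixel to an all-sky pixel index has already been done elsewhere (for example, by a HEALPix library). The function names, the sample values, and the unit pixel area are hypothetical; the sketch shows only the "keep the lowest non-zero value per pixel" rule and the resulting area-versus-flux function.

```python
def merge_min_sensitivity(samples):
    """samples: iterable of (allsky_pixel_index, limiting_flux) pairs.
    Zero (or negative) values are skipped, and only the most sensitive
    (lowest) limiting flux is kept for each all-sky pixel."""
    best = {}
    for pix, flux in samples:
        if flux <= 0:
            continue
        if pix not in best or flux < best[pix]:
            best[pix] = flux
    return best


def sky_coverage(best, pixel_area_deg2, flux):
    """Total area (deg^2) whose limiting flux is at or below `flux`,
    assuming equal-area all-sky pixels."""
    n_pixels = sum(1 for f in best.values() if f <= flux)
    return pixel_area_deg2 * n_pixels


# Hypothetical samples from two overlapping observations.
samples = [(0, 3e-6), (0, 1e-6), (1, 0.0), (1, 5e-6), (2, 2e-7)]
best = merge_min_sensitivity(samples)

# A unit pixel area is used so the result is simply a pixel count.
coverage_at_1e6 = sky_coverage(best, 1.0, 1e-6)
```

Evaluating sky_coverage over a grid of fluxes gives a monotonically non-decreasing curve of the kind described for Figure 6.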

Figure 5: B Band limiting sensitivity map for obsid 635. Each pixel value represents the minimum point source photon flux needed to yield a flux significance at the catalog inclusion limit, at that pixel location.

Figure 6: B Band Sky Coverage. The value at each flux represents the total CSC area sensitive to point sources with fluxes at least that large.

4. Source Detection

4.1 False Source Rate

To estimate false source rates, we conducted a series of blank-sky simulations at exposures of ~10, ~30, ~60, and ~120 kiloseconds, for typical ACIS-I and ACIS-S chip configurations, using actual CSC event lists to define the metadata for the simulation. For each simulation, a template background event list for each active chip was used to define the overall spatial variations, and the total number of background events was determined from the nominal field background rates and actual exposure. For all chips except chip 8 the template background event lists in the Chandra Calibration Database were used; for chip 8, no adequate template was available, so one was constructed from a number of CSC event lists with no bright sources in chip 8. An example simulated observation is shown in Figure 7.

Figure 7: An example simulated event list using the metadata for obsid 4613. A total of 25 simulation runs were performed for this obsid, yielding 30 source detections that passed CSC inclusion criteria. These detections are shown in red.

Each simulated event list was then processed using the standard CSC source detection and properties software, and the resulting source detections that would have been included in the catalog were tabulated. The results are shown in Table 1.

Table 1: False Source Rates derived from blank-sky simulations.
¹ For this set of simulations, background data for chip 8 were unavailable; the false source rate was renormalized to account for the missing chip.

OBSID	ACIS Configuration	Exposure (ksec)	#Sources (#Runs)	False Source Rate
379	ACIS-I	9	0 (50)	0.0
1934	ACIS-I	29	0 (50)	0.0
4497	ACIS-I	68	11 (50)	0.22
927	ACIS-I	125	64 (50)	1.28
5337	ACIS-S	10	1 (50)	0.02
4404	ACIS-S	30	5 (50)	0.12¹
7078	ACIS-S	51	5 (24)	0.21
4613	ACIS-S	118	30 (25)	1.2

Column 1: obsid from which observation metadata were chosen;
Column 2: detector configuration; active chips for ACIS-I were 0, 1, 2, 3, 5, 6; those for ACIS-S were 2, 3, 5, 6, 7, 8;
Column 3: observation livetime;
Column 4: numbers of source detections and runs;
Column 5: mean false source rate (sources per field per run).
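The rates in Column 5 follow from Column 4 by dividing the number of detections by the number of runs. The sketch below reproduces that arithmetic for a few rows. The renormalization for the footnoted obsid 4404 entry is an assumption: scaling by the ratio of active chips to simulated chips (6/5) is one simple reading that happens to reproduce the tabulated 0.12, but the text does not state the method used.

```python
def false_source_rate(n_sources, n_runs, chips_active=None, chips_simulated=None):
    """Mean false sources per field per run. If chip counts are given, scale
    by chips_active / chips_simulated (an assumed renormalization, see text)."""
    rate = n_sources / n_runs
    if chips_active is not None and chips_simulated is not None:
        rate *= chips_active / chips_simulated
    return rate


# (obsid, #sources, #runs) taken from Table 1.
rows = [(4497, 11, 50), (927, 64, 50), (5337, 1, 50), (7078, 5, 24), (4613, 30, 25)]
rates = {obsid: round(false_source_rate(n, runs), 2) for obsid, n, runs in rows}

# obsid 4404: 5 detections in 50 runs, with 5 of 6 ACIS-S chips simulated (assumed).
rate_4404 = round(false_source_rate(5, 50, chips_active=6, chips_simulated=5), 2)
```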

As can be seen in Table 1, the false source rate is appreciable only for exposures longer than ~50 kiloseconds. There is also some evidence for a clustering of false source detections near chip edges and between the back- and front-illuminated chips. To investigate these effects further, we considered the longest ACIS-I and ACIS-S simulation sets, and examined the false source rate separately near chip edges and interfaces. The results for obsid 927 are shown in Figure 8 and for obsid 4613 in Figure 9, and demonstrate that false source rates are enhanced in these regions.

Figure 8: False source rates as a function of flux significance for obsid 927. The maximum flux significance of all science bands is used. Left: Single-chip sources are those whose source regions cover only a single chip, as indicated by the multi-chip code. Chip 6-7 sources are those whose source regions dither across chips 6 and 7. Right: Sources near edges are those whose source regions dither off a chip edge during the observation.

Figure 9: False source rates as a function of flux significance for obsid 4613. The definitions for different subsets are the same as in Figure 8.

4.2 Detection Efficiency

We estimate detection efficiency through the use of point source simulations. Starting with the aspect solutions and metadata for the blank-sky simulations described in Section 4.1, we used MARX to generate X-ray photons incident from a spatially random distribution of point sources. Separate simulations were generated for powerlaw (index = 1.7) and blackbody (kT = 1.0 keV) spectra, all with an absorbing column of N_H = 3 x 10²⁰ cm^-2. Point sources were generated for all active chips in both the ACIS-I and ACIS-S blank-sky simulations. Source fluxes were drawn from a powerlaw N > S distribution with index 1.5. The overall N > S normalization was adjusted to yield a few hundred detectable sources per simulation, a compromise aimed at reducing source confusion while limiting the total number of simulations required to obtain good statistics for the characterization. The effects of photon pileup and observation-specific bad pixels were included by post-processing each simulation with marxpileup and acis_process_events, respectively. The source events from the MARX simulations were then merged with the appropriate simulated blank-sky event lists, keeping only MARX-simulated source events that fell on active chips in the observation. As with the blank-sky simulations, the simulated event lists were then processed using the standard CSC source detection and properties software, and the resulting source detections that would have been included in the catalog were tabulated. Finally, these sources were cross-referenced with the input source lists to allow a source-by-source comparison of input and derived properties.

To estimate detection efficiency, we compared the measured and input N > S distributions. The ratio of these two distributions represents the fraction of input sources of a given incident flux that are actually detected. Results for the B band detections for the shortest and longest ACIS-I and ACIS-S simulation sets are shown in Figures 10 - 13. The full set of efficiency curves for all simulation sets in all bands may be found at http://space.mit.edu/home/houck/tmp/det_eff for both powerlaw and blackbody spectra.
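The ratio of N > S distributions can be sketched as below. This is an illustration under stated assumptions, not the characterization code: the function names and flux values are hypothetical, and the sketch simply counts sources above each flux threshold in the input and measured lists.

```python
def n_greater_than(fluxes, s):
    """N(>S): number of sources with flux above the threshold s."""
    return sum(1 for f in fluxes if f > s)


def efficiency_curve(input_fluxes, measured_fluxes, thresholds):
    """Ratio of measured to input N(>S) at each threshold.
    Returns a list of (threshold, ratio); ratio is NaN where N_input is zero."""
    curve = []
    for s in thresholds:
        n_in = n_greater_than(input_fluxes, s)
        n_det = n_greater_than(measured_fluxes, s)
        curve.append((s, n_det / n_in if n_in else float("nan")))
    return curve


# Hypothetical photon fluxes (ph-cm^-2-s^-1): eight input sources, five detected.
input_fluxes = [1e-6, 2e-6, 3e-6, 4e-6, 5e-6, 6e-6, 7e-6, 8e-6]
measured_fluxes = [3.1e-6, 5.2e-6, 5.9e-6, 7.1e-6, 8.3e-6]

curve = efficiency_curve(input_fluxes, measured_fluxes, [5e-7, 4.5e-6])
```

In this toy case the ratio rises from 0.625 at the faint threshold to 1.0 at the bright one, the qualitative behavior expected of an efficiency curve.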

Figure 10: Detection Efficiency for ~9 kilosecond ACIS-I simulations of sources with powerlaw spectra.

Figure 11: Detection Efficiency for ~125 kilosecond ACIS-I simulations of sources with powerlaw spectra.

Figure 12: Detection Efficiency for ~10 kilosecond ACIS-S simulations of sources with powerlaw spectra.

Figure 13: Detection Efficiency for ~118 kilosecond ACIS-S simulations of sources with powerlaw spectra.

5. Astrometry

5.1 Relative Astrometry

To estimate the relative astrometric precision of the CSC, we use the point source simulations described in Section 4.2, and compare input and detected source positions. Results for B band source detections are shown in Figure 14. Within 10' of the aimpoint, 95% of all detected sources have relative positional uncertainties < 2'', regardless of flux, and the median uncertainty is < 0.4''. For regions beyond 10' the corresponding values are < 20'' and < 3'' respectively, except for the faintest sources (photon flux < ~10^-6 ph-cm^-2-s^-1).

Figure 14: Angular separations between input and measured source positions in the B band, as a function of input flux. Median separations are indicated by horizontal lines. Boxes include 95% of the measurements in each bin, and vertical lines indicate extreme values. Bins in red contain fewer than 100 measurements; bins in blue contain 100-400 measurements; bins in black contain more than 400 measurements.

5.2 Absolute Astrometry

To estimate the absolute astrometric precision of the CSC, we have cross-referenced the CSC with Release 5 of the Sloan Digital Sky Survey QSO Catalog (Schneider et al. 2007, AJ, 134, 102). Like the CSC, the SDSS QSO Catalog is registered to the International Celestial Reference System; it has typical positional uncertainties of ~0.1'' per coordinate. Although the degree of overlap between the two catalogs is small, QSOs commonly emit X-rays, enhancing the probability of finding CSC counterparts to the SDSS sources. To match a CSC source with an SDSS QSO, we require an angular separation of < 5 x (σCSC + σSDSS). We find a total of 1488 matches. The distribution of position offsets, Δ, is shown in Figure 15a. The mean offset is ~0.22'' for CSC sources within 3' of the aimpoint, ~0.62'' for sources within 10', and ~1.2'' overall.
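The matching criterion can be sketched as follows. The haversine formula for the great-circle separation is a choice made here for illustration (the text does not say how separations were computed), and the function names and test coordinates are hypothetical.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula.
    Inputs in degrees; result in arcseconds."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0


def is_match(separation, sigma_csc, sigma_sdss, k=5.0):
    """Match criterion from the text: separation < k x (sigma_CSC + sigma_SDSS),
    with k = 5 and all quantities in the same angular units."""
    return separation < k * (sigma_csc + sigma_sdss)


# Two hypothetical positions offset by 1 arcsecond in declination.
sep = angular_separation_arcsec(10.0, 0.0, 10.0, 1.0 / 3600.0)
matched = is_match(sep, sigma_csc=0.3, sigma_sdss=0.1)
```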

Figure 15: Distribution of Position Offsets for CSC sources which are identified with QSOs from the SDSS QSO Survey.

To investigate possible systematic errors in the CSC position uncertainties, we have also computed the normalized separations, z, between the CSC sources and their QSO counterparts, defined as z = Δ/√(σ²CSC + σ²SDSS). In the absence of systematic errors, these normalized offsets should follow a Rayleigh Distribution, R(z) ~ z exp(-z²/2). The distribution of normalized separations, together with the Rayleigh Distribution, normalized to the same number of sources, is shown in Figure 15b. There is an excess of sources at small z, indicating that the positional uncertainties of the CSC sources may be slightly overestimated. Additional work is required to quantify this overestimate.
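A short sketch of the comparison, assuming a unit-scale Rayleigh distribution whose cumulative form is 1 - exp(-z²/2). The bin edges and function names are hypothetical; only the definition of z and the total of 1488 matches come from the text.

```python
import math


def normalized_separation(delta, sigma_csc, sigma_sdss):
    """z = delta / sqrt(sigma_CSC^2 + sigma_SDSS^2), all in the same units."""
    return delta / math.sqrt(sigma_csc ** 2 + sigma_sdss ** 2)


def rayleigh_cdf(z):
    """Cumulative distribution for the unit-scale Rayleigh density z*exp(-z^2/2)."""
    return 1.0 - math.exp(-0.5 * z * z)


def expected_counts(n_sources, edges):
    """Expected number of sources per z bin if z follows the Rayleigh
    distribution, normalized to n_sources over the full range."""
    return [n_sources * (rayleigh_cdf(b) - rayleigh_cdf(a))
            for a, b in zip(edges[:-1], edges[1:])]


edges = [0.5 * i for i in range(11)]   # hypothetical bins covering z = 0 to 5
model = expected_counts(1488, edges)   # 1488 matches, as quoted in the text
```

Comparing a histogram of measured z values against `model` bin by bin would show the kind of small-z excess described above if the quoted uncertainties are too large.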

6. Fluxes

To assess the accuracy of CSC source fluxes, we compare the input and measured fluxes of the simulated sources. Results for the powerlaw and blackbody simulation sets in the B band are shown in Figures 16 and 17, and indicate good agreement for sources within 10' of the aimpoint. For sources beyond 10', there appears to be a systematic overestimate of a factor of ~2 for sources fainter than ~3 x 10^-6 ph-cm^-2-s^-1.

Figure 16: Comparison of input and measured B band fluxes for sources with powerlaw spectra. Bins in red contain fewer than 100 measurements; bins in blue contain 100-400 measurements; bins in black contain more than 400 measurements.

Figure 17: Comparison of input and measured B band fluxes for sources with blackbody spectra. Bins in red contain fewer than 100 measurements; bins in blue contain 100-400 measurements; bins in black contain more than 400 measurements.

To investigate this systematic error in more detail, we compute the fractional difference between input and measured fluxes, normalized by the fractional errors in measured fluxes. Representative plots of this quantity are shown in Figures 18 - 19.

Figure 18: Fractional difference between input and measured fluxes, normalized by measured fractional error, for sources with powerlaw spectra, in the B band. The smooth curves show the predicted systematic error for exposure times of 9 ksec (blue) and 125 ksec (red).

Figure 19: Fractional difference between input and measured fluxes, normalized by measured fractional error, for sources with powerlaw spectra, in the S band. The smooth curves show the predicted systematic error for exposure times of 9 ksec (blue) and 125 ksec (red).

Figure 18 demonstrates this systematic error in the faint flux bins. The effect is more prominent in the S band fluxes (Figure 19) in which the measured fluxes appear underestimated at all fluxes. Further investigation of this effect is in progress. Preliminary analysis indicates the effect is due to the assumption of a monochromatic exposure map in computing source fluxes. Models of the systematic errors, based on this assumption, are shown in Figures 18 - 19, and reproduce the general features of the effect. Model calculations indicate the effect is, depending on spectrum, ~10% in the B, M, and H bands, ~20 - 30% in the S band, and ~30% in the U band.

7. Source Size

We again use the point source simulations described in Section 4.2 to investigate the accuracy of CSC source extent measurements. Since all simulated sources were point sources, the fraction of these sources that were determined to have non-zero extent by the CSC source properties software indicates the fraction of false positives in the extent measurement. Results are shown in Figure 20, for sources with both powerlaw and blackbody spectra.

Figure 20: Fraction of simulated sources that are determined to be extended in the B band. The red histogram includes only sources not flagged as confused. The black histogram includes all sources.

8. Variability

The Chandra Source Catalog utilizes three variability tests: Kolmogorov-Smirnov, Kuiper, and Gregory-Loredo. Results from these tests are stored as a probability, p, that the lightcurve in the given band for the indicated variability test is not consistent with being constant (i.e., pure counting noise, modulo source visibility/good time intervals).

For the purpose of characterization, a more useful probability is P = 1 - p, which can be taken as the probability that a constant lightcurve would have falsely indicated the detected level of variability. It is further convenient to take the negative log₁₀ of this quantity, or equivalently, log₁₀(P^-1). For much of the characterization that follows, results are presented in terms of this quantity.

Note that for pure counting noise (i.e., constant) lightcurves, we expect for a "good" test that a fraction, f_P, will yield probabilities P <= f_P, or equivalently, log₁₀(P^-1) >= log₁₀(f_P^-1).
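The transformation and the expectation for constant lightcurves can be illustrated numerically. The sketch assumes an idealized, perfectly calibrated test, for which P is uniformly distributed between 0 and 1 under pure counting noise; the function name, the seed, and the sample size are arbitrary choices for this example.

```python
import math
import random


def log_inverse_P(p):
    """Convert the stored variability probability p into log10(1/P),
    where P = 1 - p."""
    return -math.log10(1.0 - p)


# For an idealized test, P is uniform on (0, 1) for constant lightcurves,
# so the fraction of lightcurves with P <= f_P should be close to f_P.
rng = random.Random(0)
P_values = [rng.random() for _ in range(100_000)]

f_P = 0.01
fraction = sum(1 for P in P_values if P <= f_P) / len(P_values)
```

For example, a stored p = 0.99 corresponds to log₁₀(P^-1) = 2, and roughly 1% of constant lightcurves are expected to reach or exceed that value.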

To assess the sensitivity of the variability tests, we have (outside of the source catalog pipeline) created a series of simulated lightcurves with differing durations (from 1 ksec to 160 ksec, utilizing 3.214 sec time bins), mean count rates (ranging from 0.00056 to 0.032 counts per second), and variability properties. Additionally, we have incorporated a simple model of pileup in the simulations, such that if two or more events occur in the same time bin there is a probability that the event will be discarded, or read as a single event.
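A constant-rate version of such a simulation might look like the sketch below. The pileup probabilities p_discard and p_single, the function names, and the seed are hypothetical, since the text does not give the values or the exact form of the pileup model; only the 3.214 sec bin width and the range of durations and rates come from the text. Poisson counts are drawn with Knuth's multiplication method, which is adequate for the small per-bin means involved.

```python
import math
import random


def poisson(rng, mu):
    """Draw a Poisson deviate with mean mu (Knuth's method, small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def apply_pileup(rng, n, p_discard, p_single):
    """Assumed pileup model: a bin holding two or more events is discarded
    with probability p_discard, read as one event with probability p_single,
    and otherwise left unchanged."""
    if n < 2:
        return n
    u = rng.random()
    if u < p_discard:
        return 0
    if u < p_discard + p_single:
        return 1
    return n


def simulate_lightcurve(duration, rate, bin_size=3.214,
                        p_discard=0.0, p_single=0.0, seed=0):
    """Binned counts for a constant source of the given rate (counts/sec)."""
    rng = random.Random(seed)
    n_bins = int(duration / bin_size)
    mu = rate * bin_size
    return [apply_pileup(rng, poisson(rng, mu), p_discard, p_single)
            for _ in range(n_bins)]


# 1 ksec lightcurve at the highest quoted mean rate, without pileup.
lc = simulate_lightcurve(1000.0, 0.032)
```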

First, we investigated the sensitivity to "red noise", i.e., variability whose power spectrum is proportional to f^-1, where f is the Fourier frequency. (The frequency range is assumed to extend from the inverse of the lightcurve length to the Nyquist frequency.) The lightcurves were presumed to be statistically stationary, and a variety of fractional root mean square (RMS) variabilities were considered, ranging from 1% to 30%.

In Figures 21 - 23, we show "detection contours" vs. mean rate and fractional RMS variability for the Kolmogorov-Smirnov, Kuiper, and Gregory-Loredo tests, for three different lightcurve lengths of 160, 50, and 20 kiloseconds. The contours show the fraction of simulations whose lightcurves yielded P < 0.01, or equivalently log₁₀(P^-1) > 2, for the given test. These curves give an indication of the sensitivity of the test to aperiodic, red noise variability.

Finally, we show histograms of cumulative fraction of lightcurves detected with a significant variability probability (above some value of log₁₀(P^-1)). Again, for pure Poisson counting noise, we expect that this cumulative fraction will follow P. In Figure 24 we show the expected histogram for no variability as an orange line. Cumulative fraction histograms are shown for a variety of lightcurve lengths, mean count rates, and fractional root mean square variability.

Figure 21: Fraction of simulated sources detected as variable at 99% significance, using the Kolmogorov-Smirnov test.

Figure 22: Fraction of simulated sources detected as variable at 99% significance, using the Kuiper test.

Figure 23: Fraction of simulated sources detected as variable at 99% significance, using the Gregory-Loredo test.

Figure 24: Cumulative fraction of simulated lightcurves detected with a significant probability of variability. The orange histogram is that expected for no variability, assuming Poisson noise. Three sets of rms variability are shown: 30% (solid histogram), 15% (longdash), and 5% (dot-dash).
