Status of the Chandra Source Catalog Project

The Chandra Source Catalog (CSC) project recently reached a significant milestone by completing a science review conducted by an international committee of experts in X-ray astronomy and astronomical catalog construction. The committee described the development of the Chandra Source Catalog as "an important, useful, and exciting project" that is also "blazing a path for other facilities."

The CSC will be the definitive catalog of all X-ray sources detected by the Chandra X-ray Observatory. It is intended to provide simple access to Chandra data for individual sources (or for sets of sources matching user-specified search criteria) and to satisfy the needs of a broad community of scientists, including those who may be less familiar with astronomical data analysis in the X-ray regime. For each X-ray source, the catalog will list the source position and a detailed set of source properties, including commonly used quantities such as source dimensions, multi-band fluxes, and hardness ratios. Beyond these traditional elements, the catalog will include source data that users can manipulate interactively, including images, event lists, light curves, and spectra, extracted for each source individually from every observation in which the source is detected.
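
Hardness ratios summarize a source's spectral shape by comparing counts in two energy bands. The article does not define the catalog's exact estimator, so the following minimal Python sketch assumes one common convention, HR = (H - S) / (H + S), where S and H are the counts in a soft and a hard band. The function name `hardness_ratio` is hypothetical.

```python
def hardness_ratio(soft: float, hard: float) -> float:
    """Return (H - S) / (H + S), one common hardness-ratio convention.

    Assumption: this simple ratio ignores background subtraction and the
    low-count statistics that a real catalog would need to handle.
    """
    total = soft + hard
    if total <= 0:
        raise ValueError("soft + hard counts must be positive")
    return (hard - soft) / total


# A source with 30 soft and 10 hard counts is spectrally soft (HR < 0).
print(hardness_ratio(soft=30, hard=10))  # -0.5
```

With this convention the ratio runs from -1 (all counts in the soft band) to +1 (all counts in the hard band).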

Design Goals

The primary design goals for the CSC are to
(1) allow simple and quick access to the best estimates of the X-ray source properties and to the Chandra data for individual sources, with good scientific fidelity, and directly support medium-sophistication scientific analysis of the individual source data;
(2) facilitate easy searches and analysis of a wide range of statistical properties for classes of X-ray sources;
(3) provide a user interface that supports searching and manipulating the actual observational data for each X-ray source in addition to the tabular properties that are recorded in the catalog; and
(4) include all real X-ray sources detected down to a predefined threshold level in all of the public Chandra datasets used to populate the catalog, while maintaining the number of spurious sources at an acceptable level.

Catalog Releases, Characterization, and User Interfaces

The CSC will be released to the user community in a series of increments of increasing capability. The first release of the catalog is expected to include a subset of the elements described here. Each release, including the first, will be accompanied by a detailed characterization of the catalog's statistical properties to a well-defined, high level of reliability. Key properties to be characterized include limiting sensitivity, completeness, false source rates, astrometric and photometric accuracy, and variability information.

Both the tabulated source properties and the per-source data from individual pointed observations (source images, event lists, etc.) that make up the CSC will be stored in the Chandra Data Archive. The former will be recorded in SQL databases, and the latter will be stored as FITS files. This approach leverages existing archive software and provides a file-based interface that is compatible with existing CIAO tools. To ensure traceability, a history of updates will be maintained so that the state of the database at any point in time can be recovered.
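
The article does not describe how the update history is stored. One minimal way to make every past state recoverable, sketched here in Python under that assumption, is to keep records append-only with a validity timestamp and to rebuild a snapshot by taking, for each record, the latest version at or before the requested time. `VersionedTable` and its methods are hypothetical; in an SQL database the same idea is usually expressed with validity-time columns.

```python
import bisect
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VersionedTable:
    """Hypothetical append-only store: updates never overwrite earlier
    versions, so the state at any past time can be reconstructed."""
    _history: dict[str, list[tuple[float, dict[str, Any]]]] = field(default_factory=dict)

    def update(self, key: str, timestamp: float, record: dict[str, Any]) -> None:
        """Append a new version of `key`, valid from `timestamp` onward."""
        versions = self._history.setdefault(key, [])
        if versions and timestamp < versions[-1][0]:
            raise ValueError("updates must be appended in time order")
        versions.append((timestamp, dict(record)))

    def as_of(self, timestamp: float) -> dict[str, dict[str, Any]]:
        """Return a copy of every record as it stood at `timestamp`."""
        state = {}
        for key, versions in self._history.items():
            times = [t for t, _ in versions]
            i = bisect.bisect_right(times, timestamp)  # count of versions with t <= timestamp
            if i > 0:
                state[key] = dict(versions[i - 1][1])
        return state


table = VersionedTable()
table.update("src-1", 1.0, {"flux": 2.0e-14})
table.update("src-1", 5.0, {"flux": 2.5e-14})
print(table.as_of(3.0))  # {'src-1': {'flux': 2e-14}}
```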

The contents of each catalog release will be carefully controlled. They will include the subset of sources extracted from the database that pass quality assurance checks and conform to the characterization requirements established for the release.

Access to the catalog will be provided in two ways. First, a graphical user interface will be provided that will allow users to peruse the catalog, perform queries on real or virtual columns, display results, and download data files. The user interface will comply with published Virtual Observatory standards such as SIAP, VOTable, and ADQL. Second, an application programming interface will provide direct access to the catalog and data from users' data analysis scripts. Such scripts will be able to query the catalog, directly download data, and manipulate the data to perform further searches. The first release of the catalog is expected to include a simplified user interface that does not include all of the capabilities described here.
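
Because the interface is to follow ADQL, a script could express a positional query as ADQL text. The Python sketch below builds a standard ADQL cone search using the `CONTAINS`, `POINT`, and `CIRCLE` geometry functions defined in ADQL 2.0. The table name `csc.master_source`, the column names `name`, `ra`, and `dec`, and the function `cone_search_adql` are hypothetical; the article does not specify the catalog schema or how queries are submitted.

```python
def cone_search_adql(table: str, ra_deg: float, dec_deg: float, radius_deg: float,
                     columns: tuple[str, ...] = ("name", "ra", "dec")) -> str:
    """Build an ADQL query for all rows whose (ra, dec) lie within
    radius_deg of the given ICRS position.

    Assumption: the table stores positions in columns named ra and dec,
    in degrees; all names here are placeholders.
    """
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("ra_deg must be in [0, 360)")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("dec_deg must be in [-90, 90]")
    if radius_deg <= 0.0:
        raise ValueError("radius_deg must be positive")
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


print(cone_search_adql("csc.master_source", 10.68, 41.27, 0.05))
```

The resulting string could then be submitted to any ADQL-capable query service.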

Catalog Construction

Each pointed observation may include data for tens to hundreds of X-ray sources within the field of view. In addition to recording tabular catalog information about each source individually, the design goals mandate that the actual observational data for each source individually be extracted from each pointed observation that includes that source, and be accessible through the catalog.

The bulk of the CSC construction will be performed by a set of CXC Data System processing pipelines that run using the same automated processing (AP) infrastructure that is used to run standard data processing. For performance reasons, the compute-intensive pipeline processing will be done on a multi-node Beowulf cluster running Linux, and the AP infrastructure has recently been modified to support this capability. Most of the catalog processing steps are performed using existing CIAO data analysis software, although several new tools have been or are being developed for certain tasks that cannot be done using the current CIAO tool set. These new tools will be included in future CIAO releases.

Catalog construction is conceptually a two-part process. First, the observational data for each non-proprietary pointed observation is processed to identify the X-ray sources and extract the per-source data and source properties. Second, the properties of each source that is detected in more than one pointed observation must be reconciled.

Pointed Observation Pipeline Processing

Processing the data from each pointed observation is performed in two steps. The first step handles the data for the entire field of view, detects sources and extracts data for each source for further processing. The second step processes the data for each detected source and extracts the source properties that will ultimately be merged into the master catalog.

The Detect Sources Pipeline reprocesses each pointed observation using a defined set of calibrations and processing algorithms. Full-field exposure maps and background maps are generated, and sources are detected in several energy bands. Source and background regions for each detected source are determined for use in the subsequent Per-Source Pipeline.
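
Detecting sources in several energy bands starts from selecting the events that fall in each band. The Python sketch below shows only that selection step; the band names and edges in `BANDS_KEV` are illustrative assumptions rather than the catalog's definitions, and `split_by_band` is a hypothetical helper.

```python
from collections.abc import Iterable

# Illustrative band edges in keV (hypothetical values for this sketch only).
# Bands may overlap: a "broad" band here spans the three narrower ones.
BANDS_KEV = {
    "soft": (0.5, 1.2),
    "medium": (1.2, 2.0),
    "hard": (2.0, 7.0),
    "broad": (0.5, 7.0),
}


def split_by_band(energies_kev: Iterable[float],
                  bands: dict[str, tuple[float, float]] = BANDS_KEV) -> dict[str, list[float]]:
    """Assign each event energy to every band whose half-open range [lo, hi) contains it."""
    energies = list(energies_kev)
    return {name: [e for e in energies if lo <= e < hi]
            for name, (lo, hi) in bands.items()}


events = [0.3, 0.8, 1.5, 2.5, 6.9, 8.0]  # event energies in keV
counts = {band: len(ev) for band, ev in split_by_band(events).items()}
print(counts)  # {'soft': 1, 'medium': 1, 'hard': 2, 'broad': 4}
```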

The Per-Source Pipeline executes for each detected source. The pipeline extracts the photon events in the rectangle bounding the source's background region and constructs a "postage stamp" image of the region, together with a full-resolution exposure map and an image of the point spread function (PSF) at the source position. Various spatial, spectral, and temporal source properties are then extracted, both directly from the data and, where appropriate, by model fitting (for example, fitting the PSF to the postage stamp image).
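
The extraction step can be sketched as selecting events inside the bounding rectangle and binning them into a counts image. The article does not give the region shapes, so this Python sketch assumes that the background region is a circle or annulus of outer radius `r_out` centred on the source and that event positions are sky-pixel coordinates; the exposure map and PSF image are left out. `bounding_box` and `postage_stamp` are hypothetical names.

```python
import math


def bounding_box(x0: float, y0: float, r_out: float) -> tuple[float, float, float, float]:
    """Axis-aligned rectangle (xmin, xmax, ymin, ymax) enclosing a circle of radius r_out."""
    return x0 - r_out, x0 + r_out, y0 - r_out, y0 + r_out


def postage_stamp(events: list[tuple[float, float]], x0: float, y0: float,
                  r_out: float, pixel: float = 1.0) -> list[list[int]]:
    """Bin the events inside the bounding rectangle into a 2-D counts image.

    Assumption: the background region has outer radius r_out and is
    centred on the source at (x0, y0).
    """
    xmin, xmax, ymin, ymax = bounding_box(x0, y0, r_out)
    nx = math.ceil((xmax - xmin) / pixel)
    ny = math.ceil((ymax - ymin) / pixel)
    image = [[0] * nx for _ in range(ny)]  # indexed as image[row][column], i.e. [y][x]
    for x, y in events:
        if xmin <= x < xmax and ymin <= y < ymax:
            # The clamp guards against floating-point round-off at the upper edge.
            col = min(int((x - xmin) // pixel), nx - 1)
            row = min(int((y - ymin) // pixel), ny - 1)
            image[row][col] += 1
    return image


events = [(10.0, 10.0), (10.5, 9.5), (20.0, 20.0)]  # the last event lies outside the box
stamp = postage_stamp(events, x0=10.0, y0=10.0, r_out=2.0)
print(sum(map(sum, stamp)))  # 2
```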

Merge Pipeline Processing

The main function of the Merge Pipeline is to take the source properties extracted from each pointed observation in which a source is detected and merge them for inclusion in the catalog. The pipeline first performs a cross-match to determine whether the source was detected in any other pointed observation. If no matches are found, the properties of the new source are promoted directly to the catalog. If one or more candidate matching sources are found, however, the Merge Pipeline must identify which of those candidates actually match the current source. Detections in different observations do not necessarily correspond one-to-one, primarily because the Chandra PSF varies significantly across the field of view: nearby sources that are resolved close to the aim point of one observation may be blended into a single detection when they fall far off-axis in another. Once the matching sources are identified, the source information is updated by merging the source properties from the contributing pointed observations according to a set of merging rules.
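
The article does not give the matching criteria or the merging rules. The Python sketch below assumes two simple, hypothetical ones: a catalog source is a candidate if its separation from the new detection is within `k` times their combined 1-sigma positional errors, and matched positions are merged by inverse-variance weighting. `Detection`, `candidate_matches`, and `merge_positions` are illustrative names. The example shows how a large off-axis error circle can capture two sources that are resolved on-axis, which is why candidates must be examined rather than accepted automatically.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    ra: float     # right ascension, degrees
    dec: float    # declination, degrees
    sigma: float  # 1-sigma positional uncertainty, arcsec (hypothetical field)


def separation_arcsec(a: Detection, b: Detection) -> float:
    """Great-circle separation computed with the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a.ra, a.dec, b.ra, b.dec))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def candidate_matches(new: Detection, catalog: list[Detection], k: float = 3.0) -> list[Detection]:
    """Catalog sources within k times the combined positional uncertainty (assumed rule)."""
    return [src for src in catalog
            if separation_arcsec(new, src) <= k * math.hypot(new.sigma, src.sigma)]


def merge_positions(dets: list[Detection]) -> Detection:
    """Inverse-variance weighted mean position (assumed rule).

    Small-field approximation: ignores RA wrap-around at 0/360 and
    is not valid near the celestial poles.
    """
    weights = [1.0 / d.sigma ** 2 for d in dets]
    total = sum(weights)
    ra = sum(w * d.ra for w, d in zip(weights, dets)) / total
    dec = sum(w * d.dec for w, d in zip(weights, dets)) / total
    return Detection(ra, dec, 1.0 / math.sqrt(total))


# Two sources 3 arcsec apart, well resolved on-axis...
a = Detection(10.0, 20.0, 0.3)
b = Detection(10.0, 20.0 + 3 / 3600, 0.3)
# ...and one off-axis detection with a large uncertainty lying between them.
blended = Detection(10.0, 20.0 + 1.5 / 3600, 5.0)
print(len(candidate_matches(blended, [a, b])))  # 2: not a one-to-one match
```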

Ian Evans for the Chandra Source Catalog project team