The Chandra Source Catalog (CSC) project recently reached a
      significant milestone, completing a science review conducted by
      an international committee of experts in X-ray astronomy and
      astronomical catalog construction.  The committee described the
      development of the Chandra Source Catalog as "an important,
      useful, and exciting project" that is in addition "blazing a
      path for other facilities."
The CSC will be the definitive catalog of all X-ray sources detected by the Chandra X-ray Observatory. It is intended to provide simple access to Chandra data for individual sources (or sets of sources matching user-specified search criteria), and to satisfy the needs of a broad-based group of scientists, including those who may be less familiar with astronomical data analysis in the X-ray regime. For each X-ray source, the catalog will list the source position and a detailed set of source properties, including commonly used quantities such as source dimensions, multi-band fluxes, and hardness ratios. In addition to these traditional elements, the catalog will include source data that the user can manipulate interactively, including images, event lists, light curves, and spectra for each source from each observation in which the source is detected.
Design Goals
     The primary design goals for the CSC are to
      
 (1) allow simple and quick access to the best estimates of
      the X-ray source properties and Chandra data for individual
      sources with good scientific fidelity, and directly support
      moderately sophisticated scientific analysis of the individual
      source data;
  (2) facilitate easy searches and analysis of a
      wide range of statistical properties for classes of X-ray
      sources; 
 (3) provide a user interface that supports
      searching and manipulating the actual observational data for
      each X-ray source in addition to the tabular properties that are
      recorded in the catalog; and 
 (4) include all real X-ray
      sources detected down to a predefined threshold level in all of
      the public Chandra datasets used to populate the catalog, while
      maintaining the number of spurious sources at an acceptable
      level.
Catalog Releases, Characterization, and User Interfaces
     The CSC will be released to the user community in a series of
      increments with increasing capability.  The first release of the
      catalog is expected to include a subset of the elements
      described here.  Each release of the catalog, including the
      first, will be accompanied by a detailed characterization of the
      statistical properties of the catalog to a well-defined, high
      level of reliability.  Key properties that will be characterized
      include limiting sensitivity, completeness, false source rates,
      astrometric and photometric accuracy, and variability
      information. 
     Both the tabulated source properties and the individual pointed
      observation source data (source images, event lists, etc.) that
      comprise the CSC will be stored in the Chandra Data Archive.
      The former will be recorded in SQL databases, and the latter
      will be stored as FITS files.  Using this approach leverages
      existing archive software and provides a file-based interface
      that is compatible with existing CIAO tools.  To ensure
      traceability, a history of updates will be maintained so that
      the state of the database at any point in time is
      recoverable.
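
One way to make the state of a database recoverable at any point in time is to treat the table as append-only: every update inserts a new timestamped row, and a query selects the most recent row at or before the requested time. The sketch below illustrates that idea with Python's built-in sqlite3 module. The table name, column names, and integer timestamps are hypothetical and are not taken from the Chandra Data Archive schema.

```python
import sqlite3

def make_db():
    """Create an in-memory database with an append-only history table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE source_history ("      # hypothetical table name
        " source_id TEXT NOT NULL,"
        " valid_from INTEGER NOT NULL,"      # time at which this row became current
        " flux REAL NOT NULL,"
        " PRIMARY KEY (source_id, valid_from))"
    )
    return db

def record_update(db, source_id, valid_from, flux):
    """Append a new version of a source's flux; older rows are never modified."""
    db.execute(
        "INSERT INTO source_history VALUES (?, ?, ?)",
        (source_id, valid_from, flux),
    )

def flux_as_of(db, source_id, when):
    """Return the flux that was current at time `when`, or None if none existed."""
    row = db.execute(
        "SELECT flux FROM source_history"
        " WHERE source_id = ? AND valid_from <= ?"
        " ORDER BY valid_from DESC LIMIT 1",
        (source_id, when),
    ).fetchone()
    return None if row is None else row[0]
```

Because rows are only ever added, any earlier state can be reconstructed by choosing an earlier value of `when`.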
     The contents of each catalog release will be carefully
      controlled. They will include the subset of sources extracted
      from the database that pass quality assurance checks and conform
      to the characterization requirements established for the
      release.
     Access to the catalog will be provided in two
      ways.  First, a graphical user interface will be provided that
      will allow users to peruse the catalog, perform queries on real
      or virtual columns, display results, and download data files.
      The user interface will comply with published Virtual
      Observatory standards such as SIAP, VOTable, and ADQL.  Second,
      an application programming interface will provide direct access
      to the catalog and data from users' data analysis scripts.  Such
      scripts will be able to query the catalog, directly download
      data, and manipulate the data to perform further searches.  The
      first release of the catalog is expected to include a simplified
      user interface that does not include all of the capabilities
      described here.
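
As an illustration of what a scripted query might look like, the sketch below builds an ADQL cone-search string using the standard ADQL geometry functions CONTAINS, POINT, and CIRCLE. It only constructs the query text; it does not contact any service. The table name and the column names `name`, `ra`, and `dec` are placeholders chosen for the example, since the catalog's actual schema and programming interface are not described here.

```python
def cone_search_adql(table, ra_deg, dec_deg, radius_deg,
                     columns=("name", "ra", "dec")):
    """Build an ADQL query selecting rows within radius_deg of a sky position.

    The column names `ra` and `dec` used inside POINT(...) are hypothetical.
    """
    if not 0.0 <= ra_deg < 360.0 or not -90.0 <= dec_deg <= 90.0:
        raise ValueError("position out of range")
    if radius_deg <= 0.0:
        raise ValueError("radius must be positive")
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM {table} "
        f"WHERE 1 = CONTAINS(POINT('ICRS', ra, dec), "
        f"CIRCLE('ICRS', {ra_deg!r}, {dec_deg!r}, {radius_deg!r}))"
    )
```

A script could pass the resulting string to whatever query endpoint the catalog eventually exposes and then refine the search using the returned rows.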
Catalog Construction
     Each pointed observation may include
      data for tens to hundreds of X-ray sources within the field of
      view.  In addition to recording tabular catalog information
      about each source, the design goals mandate that the
      observational data for each source be extracted from each
      pointed observation that includes that source, and be
      accessible through the catalog.
     The bulk
      of the CSC construction will be performed by a set of CXC Data
      System processing pipelines that run using the same automated
      processing (AP) infrastructure that is used to run standard data
      processing.  For performance reasons, the compute-intensive
      pipeline processing will be done on a multi-node Beowulf cluster
      running Linux, and the AP infrastructure has recently been
      modified to support this capability.  Most of the catalog
      processing steps are performed using existing CIAO data analysis
      software, although several new tools have been or are being
      developed for certain tasks that cannot be done using the
      current CIAO tool set.  These new tools will be included in
      future CIAO releases.
Catalog construction is conceptually a two-part process. First, the observational data for each non-proprietary pointed observation are processed to identify the X-ray sources and extract the per-source data and source properties. Second, the source properties for each source that is detected in more than one pointed observation must be reconciled.
 
Pointed Observation Pipeline Processing
     The data from each pointed observation
      are processed in two steps.
      The first step handles the data for the entire field of view,
      detects sources and extracts data for each source for further
      processing.  The second step processes the data for each
      detected source and extracts the source properties that will
      ultimately be merged into the master catalog.
     The Detect
      Sources Pipeline reprocesses each pointed observation using a
      defined set of calibrations and processing algorithms.  Full
      field exposure maps and background maps are generated, and
      sources are detected in several energy bands.  Source and
      background regions for each detected source are determined for
      use in the subsequent Per-Source Pipeline.
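
Detecting sources in several energy bands requires first separating the events by energy. The sketch below shows that step in isolation, assuming events are available as (x, y, energy) tuples with energy in keV. The band names and edges are illustrative values chosen for the example, not the bands defined for the catalog, and the function is not part of CIAO.

```python
BANDS_KEV = {              # illustrative band edges, not taken from the article
    "soft": (0.5, 1.2),
    "medium": (1.2, 2.0),
    "hard": (2.0, 7.0),
}

def split_by_band(events, bands=BANDS_KEV):
    """Group (x, y, energy_keV) events by band; each band covers [lo, hi)."""
    out = {name: [] for name in bands}
    for x, y, energy in events:
        for name, (lo, hi) in bands.items():
            if lo <= energy < hi:
                out[name].append((x, y, energy))
    return out
```

Each per-band event list could then be binned into an image and passed to a detection step.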
     The
      Per-Source Pipeline executes for each detected source.  The
      pipeline extracts the photon events in the rectangle bounding
      the background region for the source, and constructs a "postage
      stamp" image of the region together with a full resolution
      exposure map and an image of the point spread function (PSF) at
      the source position. Various spatial, spectral, and temporal
      source properties are extracted at this point, both directly
      from the data, and by performing model fitting (for example,
      fitting the PSF to the postage stamp image) where
      appropriate.
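
The event extraction and postage stamp construction can be pictured as a rectangular filter followed by two-dimensional binning. The following is a minimal sketch under simple assumptions: events are (x, y) pairs in pixel coordinates, and the rectangle spans a whole number of bins. The function name and arguments are hypothetical; the actual pipeline uses CIAO tools, which this does not attempt to reproduce.

```python
def postage_stamp(events, x_min, x_max, y_min, y_max, bin_size=1.0):
    """Select events inside a rectangle and bin them into a counts image.

    The rectangle extent is assumed to be a whole number of bins.
    Returns (selected_events, image), where image[row][col] holds counts,
    rows index y, and columns index x.
    """
    n_cols = int((x_max - x_min) / bin_size)
    n_rows = int((y_max - y_min) / bin_size)
    image = [[0] * n_cols for _ in range(n_rows)]
    selected = []
    for x, y in events:
        col = int((x - x_min) // bin_size)
        row = int((y - y_min) // bin_size)
        # Events outside the rectangle map to out-of-range indices and are skipped.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            selected.append((x, y))
            image[row][col] += 1
    return selected, image
```

The same selected events would also feed the spectral and temporal property extraction, while the image is the natural input for fitting a PSF model.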
 
Merge Pipeline Processing
     The main functionality of the
      Merge Pipeline is to take the source properties extracted from
      each pointed observation in which the source is detected and
      merge them together for inclusion in the catalog.  The pipeline
      first performs a cross-match to determine if the source was
      detected in any other pointed observations.  If no matches are
      found, then the new source properties are promoted.  However, if
      one or more candidate matching sources are found, then the Merge
      Pipeline must identify which of those candidates match the
      current source.  The relationship between detections in
      different observations is not necessarily one-to-one, primarily
      because the Chandra PSF varies significantly across the field
      of view.  Once matching sources are identified, the
      source information is updated by merging the source properties
      from the contributing pointed observations based on a set of
      merging rules.
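
The cross-match and merge steps can be sketched as follows. This is a deliberately simplified illustration, not the Merge Pipeline's algorithm: it uses a single fixed match radius, whereas the variation of the PSF across the field of view means a realistic matcher would need a position-dependent criterion. The merging rule shown (an inverse-variance weighted mean position) is an assumption made for the example, as are the dictionary keys `ra`, `dec`, and `err`.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, computed with the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def candidates(new, catalog, radius_deg):
    """Return catalog entries within radius_deg of the new detection."""
    return [s for s in catalog
            if angular_sep_deg(new["ra"], new["dec"], s["ra"], s["dec"]) <= radius_deg]

def merge_positions(detections):
    """Inverse-variance weighted mean position of matched detections.

    Averages ra and dec directly, so it assumes the detections are close
    together and do not straddle the ra = 0/360 degree boundary.
    """
    weights = [1.0 / d["err"] ** 2 for d in detections]
    total = sum(weights)
    ra = sum(w * d["ra"] for w, d in zip(weights, detections)) / total
    dec = sum(w * d["dec"] for w, d in zip(weights, detections)) / total
    return {"ra": ra, "dec": dec, "err": total ** -0.5}
```

If `candidates` returns an empty list, the new detection would simply be promoted as a new source; otherwise the matched detections are passed to the merging step.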