7: Maximum Likelihood Estimate (MLE) Algorithm
The Maximum Likelihood Estimate (MLE) algorithm uses a Bayesian approach to estimate the position and likelihood of each detection, by maximizing the likelihood that the observed distribution of counts is generated by a particular 2-D Gaussian source model. It is based on the XMM emldetect (Watson et al. 2009) algorithm. The maximization of the posterior probability distribution is performed using Sherpa (Freeman et al. 2001), and the uncertainties are determined by sampling this posterior using Markov Chain Monte Carlo (MCMC). The likelihood \(\mathcal{L}\) returned by the algorithm refers to the likelihood ratio between a null hypothesis that assumes only the background and an alternative hypothesis that includes the elliptical source, and is therefore a measure of the significance of a source detection. The detailed description of the algorithm can be found in the MLE Specification Document.
Run for each source bundle in an observation stack
In CSC 2.1 processing, the position and likelihood estimates are performed for groups of sources, or bundles, whose detect source regions are contiguously overlapping. It is convenient to process sources in a bundle at the same time, by fitting a single model that accounts for the multiplicity of sources. This can be done in two ways:
- by fitting the sources in the bundle sequentially, starting with the brightest one and freezing its parameters before proceeding to the next source;
- or by simultaneously fitting all source detections in the bundle using a model that accounts for the multiplicity, and that uses the same background region for all detections in the bundle.
The default approach in CSC 2.1 is the first approach, which can result in convergence issues when the overlap between sources is significant, and/or when the number of counts is low. Simultaneous fitting reduces the number of cases with MCMC convergence issues, and therefore the number of manual quality assurances required. In CSC 2.1 we use simultaneous fitting for the first time for sources in bundles of up to 5 sources, whenever the first approach fails to converge. For larger bundles (>5 source detections), the default sequential fitting is always used.
Fitting stack detections across observations
For a given stack detection or bundle of stack detections, MLE performs two fits: a model that assumes point-like sources and a model that assumes extended sources) for each of the observation detections of the source independently, and also for the joint set of all detections in the stack, for each of the energy bands (See Figure 1). From those fits, the one that maximizes the likelihood is selected to populate the fitted parameters in the SRCFIT extension of the mrgsrc3 file, which is available to the users as a data product. The best fit parameters for the stack are then applied to the associated per-obsid detections. The ra and dec columns in the stacked observation detections table represent these best fitted positions. In addition, the columns named likelihood_<band> in the per-observation detections table and the stacked observation detections table contain respectively the likelihoods of each independent per-observation fit, and the joint fit that considers all the observations contributing to the stack. MCMC draws are provided for each of the fits as an end-user data product for the stack detection and observation level fits.
For each stack detection, both fits (point-like source or extended source) are reported in the mrgsrc3 file. The extended model fit accounts for compact sources with model ellipses that are larger than the local PSF. See MLE Fit below for details on the point-source and extended model fits.
MLE Input
The MLE algorithm operates on a set of inputs including the event files, background images, PSF images, etc., all spatially filtered to match a cutout region centered on the source being fitted. If the source is in a bundle, the inputs are centered at the location of the brightest detection. Here is a description of the inputs:
Source Images
Source counts images are created with the binning chosen to produce an image with 6-12 image pixels across the Full Width Half Maximum (FWHM) of the estimated Point Spread Function (PSF) for the off-axis angle at which the source was found. Image pixels are positioned such that the detected source position is at the center of a pixel. For each source, the image is sized to include the bounding box of the respective background region, and then bumped up, if necessary, to ensure even axis lengths. These per-source postage stamp source images are created for each observation in the stack, and each energy band in the observation.
Background Images
Background images are created using mkvtbkg in the background regions defined in the source neighborhood using adaptively smoothed backgrounds, with binning and image size chosen to match the source image. These per-source background images are also created for each observation in the stack and each energy band in the observation.
PSF Images
Ray trace PSF images are created in each energy band for each candidate source in the bundle, for each observation, also at a binning and image size chosen to match the source image. Since PSF images are generated at the source detection location, PSFs are centered on an image pixel. This has the effect of preventing any artificial widening of the PSF due to image pixel quantization. The Sherpa requirement that PSF images have even axes is met through the initial enforcement of even sized image axes in the source image.
Exposure Maps
Exposure maps are generated for each source candidate in each energy band for each observation, also at a binning and image size chosen to match the source image.
Region Files (containing source and background regions)
The region files defining the source and background regions. The source regions are the result of the source detection process using either wavdetect or mkvtbkg. They are single ellipses with no overlapping source or background region excluded. Background regions are an elliptical annulus with outer radii 5 times greater than the source radii, and inner radii that are 10% larger than the source radii. Neighboring sources are excluded from the background region by excluding ellipses based on those sources, but with 10% larger radii. Invalid areas in the pixmask are also excluded from both source and background regions when they overlap those regions.
Pixmask Files
Pixmask files are generated for each observation in the stack and identify valid source detection areas. Invalid detection areas include off-chip areas, inter-chip boundary areas, and regions identifying bright source streaks.
Define MLE Models
For a given source bundle, the 2-D models are defined and fit to the data both individually to each observation and jointly to all observations in the stack to derive the best fit source positions and corresponding model likelihood. When the default sequential fitting approach to the bundle fails to converge, the simultaneous fitting approach defined in the specification document is used, where the fit is performed in a region constructed as the union of the individual source regions. The position, position errors, and likelihoods reported in CSC2 are chosen to be those corresponding to the fit that yields the highest likelihood, regardless of whether this fit is for a single observation or for the full stack. Note that only the sources with a number of counts higher than the threshold of 3 counts are included in the fit of a single observation (obsid).
Background Model
The background is estimated for each detection by fitting the counts in the generated background image for that source with a model defined as the product of the background image and the exposure map. The parameter to adjust is the normalization factor of the background that best reproduces the observed data. If a simultaneous fit is being performed, the common background region is constructed as the union of the individual detection’s background regions.
Source Model
The assumed source model for a point source is a 2-D Gaussian model defined in Sherpa as sigmagauss2d, which has the following model parameters: location of the center (x,y), amplitude (ampl), the variance for the semi-major and semi-minor axes (sigma_a and sigma_b), and the position angle of the major axis (theta).
This model is convolved with the Chandra PSF and fit to the counts image in the relevant source region. The exposure map images and background images are accounted for in the full model expression:
\[ model = \left(background \times exposure\ map \right) + \left[ \left(\mathrm{PSF} \ast \mathtt{sigmagauss2d} \right) \times exposure\ map \right] \]MLE Fit
The source candidate positions (parameters x, y) are fit within the minimal bounding box enclosing the source region. Fitting is performed in physical coordinates using the C-statistic and the Monte Carlo (moncar) algorithm in Sherpa. For each source, the fitting is performed with both the "point source" model and the "extended source" model.
When fitting with the point source model, only the Gaussian amplitude and two position parameters are thawed (the two sigma parameters are frozen at one image pixel), whereas for the extended source model all Gaussian parameters are thawed. In the case when there are several sources in a bundle, the fit proceeds in a source-by-source basis, starting from brightest source, then freezing its best fit parameters before continuing with the second brightest, and so forth, until the faintest source is fitted. The WCS coordinates corresponding to the MLE fit for (x,y), as well as their MCMC errors, are recorded in the catalog tables as ra and dec.
Calculate Likelihood
The degree of confidence with which a source is detected in the Chandra Source Catalog is given by the detection likelihood, \(\mathcal{L}\).
In order to define the likelihood, we start by modeling the observed distribution of counts in the source region under two different hypotheses: the null hypothesis (i.e. zero source flux, only background is considered), and an alternative hypothesis that accounts for both background, \(B\), and non-zero source emission, \(S\). The model value \(M_{i}\), which represents the counts for a given energy band in a given pixel is thus:
\[ M_{i} = B_{i} + \alpha S_{i} \]where \(\alpha\) is zero for the null hypothesis, and \(\alpha > 0\) for the alternative hypothesis (Watson et al. 2009, A&A 493 339).
Since we are recording individual events at each pixel, we fit these models by minimizing the C-statistic (the cstat statistic in Sherpa), which is dervied from the Poisson distribution for the observed counts, \(D_{i}\):
\[ P_{\mathrm{Poisson}} = \prod_{i} \frac{M_{i}^{D_{i}} e^{-M_{i}}}{D_{i}!} \]The C-statistic (Cash 1979, ApJ 228 939), \(C\), is obtained by computing the negative of the logarithm of this probability distribution, expanding the logarithm of the factorial term using Stirling's approximation, and then multiplying by two:
\[ C = 2 \sum_{i}\left[ M_{i} - D_{i} + D_{i} \left( \ln{D_{i}} - \ln{M_{i}} \right) \right] \]The factor of 2 is added to guarantee that the resulting distribution asymptotically approaches \(\chi^{2}\) under the null hypothesis. This results from Wilks' theorem.
This is akin to the \(\chi^{2}\) statistic minimization in the continuous case of many counts, when we assume Gaussian errors. That is, we are maximizing the likelihood that the observed counts were drawn from the assumed model.
We compute the statistic for the null hypothesis, \(C_{\mathrm{null}}\), and for the alternative hypothesis, \(C_{\mathrm{best}}\). The difference between these two statistics:
\[ \Delta C = C_{\mathrm{null}} - C_{\mathrm{best}} \]is equivalent to a likelihood ratio between the two models, which (again) distributes approximately as \(\chi^{2}\) with \(\nu\) degrees of freedom, where \(\nu\) is the difference between the number of pixels fitted and the number of free parameters in the model.
The probability \(P\) of obtaining the calculated value of \(\Delta C\) or greater by chance fluctuations of the background (that is, under the null hypothesis) is proportional to the complement of the cumulative \(\chi^{2}\) distribution up to \(\Delta C\), and can be computed in terms of an upper incomplete \(\Gamma\) function with \(\nu\) degrees of freedom:
\[ P = \Gamma \left( \frac{\nu}{2}, \frac{\Delta C}{2} \right) \]The incomplete \(\Gamma\) function appropriately describes the cumulative distribution for large values of \(\Delta C\).
In the Chandra Source Catalog, the detect likelihood is defined as the negative logarithm of this p-value:
\[ \mathcal{L} = -\ln{P} \]Note that the null hypothesis is rejected (or the alternative hypothesis of a non-zero flux source is accepted) for small values of \(P\), or large values of \(\mathcal{L}\). Large values of \(\mathcal{L}\) therefore imply that the distribution of counts observed within the detection region are extremely unlikely to have been generated by a random fluctuation of the background in absence of a source.
The likelihood of detection is reported for each source and each energy band, under the likelihood columns.
Classify Detection as True, Marginal, or False Based on Likelihood Thresholds from Simulations
Detections are determined to be True, Marginal, or False, depending on the value of the likelihood estimated as described above. The specific likelihood thresholds depend on several aspects, including the exposure map value in that location, and the corresponding sensitivity limit.
For True or Marginal Detections, Compute MCMC Confidence Intervals for Parameters
After the Sherpa model fit is performed, the confidence intervals for the parameters are calculated. The Sherpa implementation of the pyblocxs algorithm (get_draws function) (Van Dyk et al. 2001, Siemiginowska et al. 2011) is used to obtain an MCMC sample from the posterior probability density for all the fitted parameters. Here, the starting point for the MCMC run is at the best fit (Maximum Likelihood) for the model parameter values. The assumed priors for the position and sigma parameters are flat within the boundary of the fitted regions. The prior for the normalization is set to be uniform and positive.
The get_draws function requires a covariance matrix calculated at the best fit model. The covariance matrix is calculated using Sherpa, and also an estimate of the diagonal covariance matrix is calculated with int_unc function if the covariance does not return a valid entry (e.g. in the cases of low S/N data, or assumed model is not appropriate for the data).
get_draws returns a sample of 5000 draws from the posterior probability distribution for a point source, and 15,000 draws for the extended source. The convergence and correlations within the draws for each parameter are calculated using rhat (Gelman et al. 2011) diagnostic parameter. In calculations of rhat the full MCMC chain is split into 10 sub-sections and the variance within each section and between the sections is used as an input to rhat calculation. We verify that the rhat value for each parameter is close to 1.0 for True sources. In marginal sources this value can be larger, and this can be due to an incorrect model being used, potential multi-source regions fit with single source, additional emission components not taken into account in the model etc.
The 90% confidence bounds are calculated for each parameter using the MCMC posterior probability draws.
Generate Per-Detection Data Products
The MCMC draws files are available as data products via CSCview (Valid Per-Obsid MLE source fit draws). While the direct MLE fit output files are not available, the highest likelihood fit information is available in the mrgsrc3 file SRCLIST extension.
References
Cash 1979, ApJ 228, 939
Freeman, P., Doe, S., & Siemiginowska, A. 2001, SPIE Proceedings, 4477, 76
Gelman et al. 2013, Bayesian Data Analysis, Textbook in Statistics, Chapman & Hall/CRC
Siemiginowska, A., Kashyap, V., Refsdal, B., et al. 2011, Astronomical Data Analysis Software and Systems XX, 442, 439
Stewart, I.M. 2009, Astronomy and Astrophysics, 495, 989
van Dyk, D.A., Connors, A., Kashyap, V.L., & Siemiginowska, A. 2001, ApJ 548, 224, Analysis of Energy Spectra with Low Photon Counts via Bayesian Posterior Simulation
Watson, M.G., Schröder, A.C., Fyfe, D., et al. 2009, Astronomy and Astrophysics, 493, 339