Last modified: 2 May 2023

URL: https://cxc.cfa.harvard.edu/csc/proc/mle.html

Maximum Likelihood Estimate (MLE) Algorithm


The MLE algorithm uses a Bayesian approach to estimate the position and likelihood of each detection by maximizing the likelihood that the observed distribution of counts is generated by a particular elliptical source model. The maximization of the posterior probability distribution is performed using Sherpa, and the uncertainties are determined by sampling this posterior using Markov Chain Monte Carlo (MCMC).

Run for each source bundle in an observation stack

In CSC2 processing, we define a bundle as a group of sources whose detected source regions overlap contiguously. It is often convenient to process the sources in a bundle at the same time. The MLE pipeline is run for each compact source bundle in an observation stack to refine the source positions and calculate the source detection likelihoods. The algorithm is based on the XMM-Newton emldetect algorithm (Watson et al. 2009). It uses Sherpa (Freeman et al. 2001) to fit 2-D Gaussian models to the counts images of the source candidates and to calculate the position and likelihood of each source using both a point-like source model and an extended source model (the latter allows for an ellipse larger than the PSF at the location of the detection).

MLE Input

Source Images

Source counts images are created with the binning chosen to produce an image with 6-12 image pixels across the Full Width at Half Maximum (FWHM) of the estimated Point Spread Function (PSF) at the off-axis angle at which the source was found. Image pixels are positioned such that the detected source position falls at the center of a pixel. For each source, the image is sized to include the bounding box of the corresponding background region, and then enlarged, if necessary, to ensure even axis lengths. These per-source postage-stamp source images are created for each observation in the stack, and for each energy band in the observation.
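
As an illustration, a minimal sketch of the binning choice described above (the helper names, the target of 8 pixels per FWHM within the stated 6-12 range, and the assumption of the 0.492 arcsec ACIS sky pixel are all ours, not the pipeline's):

    def choose_binning(psf_fwhm_arcsec, pix_per_fwhm=8.0, sky_pixel_arcsec=0.492):
        """Binning factor, in sky pixels, that yields roughly pix_per_fwhm
        image pixels across the estimated PSF FWHM."""
        image_pixel_arcsec = psf_fwhm_arcsec / pix_per_fwhm
        return image_pixel_arcsec / sky_pixel_arcsec

    def pad_to_even(naxis):
        """Bump an odd axis length up by one pixel to ensure even axes."""
        return naxis + (naxis % 2)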


Background Images

Background images are created with mkvtbkg, which adaptively smooths the counts within the background regions defined in the source neighborhood; the binning and image size are chosen to match the source image. These per-source background images are also created for each observation in the stack and for each energy band in the observation.


PSF Images

Ray-trace PSF images are created in each energy band for each candidate source in the bundle, for each observation, again at a binning and image size chosen to match the source image. Since PSF images are generated at the source detection location, each PSF is centered on an image pixel; this prevents any artificial widening of the PSF due to image-pixel quantization. The Sherpa requirement that PSF images have even axis lengths is met by the enforcement of even-sized axes in the source image.


Exposure Maps

Exposure maps are generated for each source candidate in each energy band for each observation, also at a binning and image size chosen to match the source image.


Region Files (containing source and background regions)

The region files define the source and background regions. The source regions are the result of the source detection process, using either wavdetect or mkvtbkg; they are single ellipses, with no overlapping source or background regions excluded. Background regions are elliptical annuli with outer radii 5 times the source radii and inner radii 10% larger than the source radii. Neighboring sources are excluded from the background region by excluding ellipses based on those sources, but with radii 10% larger. Invalid areas in the pixmask are also excluded from both source and background regions where they overlap those regions.
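
The geometry can be sketched with CIAO-style region strings (the helpers below are purely illustrative; the function names and call signatures are assumptions):

    def ellipse(x, y, r1, r2, theta):
        # CIAO-style elliptical region string
        return f"ellipse({x},{y},{r1},{r2},{theta})"

    def background_region(x, y, r1, r2, theta, neighbors=()):
        """Elliptical annulus: outer radii 5x and inner radii 1.1x the
        source radii, with enlarged neighbor ellipses excluded."""
        reg = ellipse(x, y, 5.0 * r1, 5.0 * r2, theta)
        reg += "-" + ellipse(x, y, 1.1 * r1, 1.1 * r2, theta)
        for nx, ny, nr1, nr2, nth in neighbors:
            reg += "-" + ellipse(nx, ny, 1.1 * nr1, 1.1 * nr2, nth)
        return reg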


Pixmask Files

Pixmask files are generated for each observation in the stack and identify valid source detection areas. Invalid detection areas include off-chip areas, inter-chip boundary areas, and regions identifying bright source streaks.


Define MLE Models

For a given source bundle, the 2-D models are defined and fit to the data both individually for each observation and simultaneously for all observations in the stack, to derive the best-fit source positions and corresponding model likelihoods. The positions, position errors, and likelihoods reported in CSC2 are those corresponding to the fit that yields the highest likelihood, regardless of whether that fit is for a single observation or for the full stack. Note that only sources with more than 3 counts are included in the fit to a single observation (ObsID).

[NOTE]

Although the extent of a source can also be estimated using MLE, for CSC2 we have decided to keep the CSC1.1 method (Mexican hat optimization used by wavdetect) to estimate source extent.

Background Model

The background is estimated for each detection by fitting the counts in the background image generated for that source with a model defined as the product of the background image and the exposure map. The only free parameter is the normalization factor that best reproduces the observed data.
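
In Sherpa terms, the background-only fit can be sketched as follows (a minimal sketch; the file names are hypothetical, and the only thawed parameter is the table-model normalization):

    from sherpa.astro.ui import *

    load_image("bkg_region.img")             # counts in the background region
    load_table_model("bkgmap", "bkg.img")    # adaptively smoothed background image
    load_table_model("emap", "expmap.img")   # exposure map
    bkgmap = get_model_component("bkgmap")
    emap = get_model_component("emap")
    emap.ampl = 1
    emap.ampl.freeze()                       # exposure map is fixed
    set_full_model(bkgmap * emap)            # bkgmap.ampl is the free normalization
    set_stat("cstat")
    set_method("moncar")
    fit()
    print(bkgmap.ampl.val)                   # best-fit background normalization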


Source Model

The assumed source model for a point source is a 2-D Gaussian model, defined in Sherpa as sigmagauss2d, which has the following model parameters: the location of the center (x,y), the amplitude (ampl), the Gaussian widths (sigmas) along the semi-major and semi-minor axes (sigma_a and sigma_b), and the position angle of the major axis (theta).
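
For reference, the functional form is (up to Sherpa's sign convention for \(\theta\)):

\[ f(x, y) = \mathtt{ampl} \cdot \exp{\left[ -\frac{x'^{2}}{2\sigma_{a}^{2}} - \frac{y'^{2}}{2\sigma_{b}^{2}} \right]} \]

where \((x', y')\) are the offsets from the center, rotated by the position angle \(\theta\).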

This model is convolved with the Chandra PSF and fit to the counts image in the relevant source region. The exposure map images and background images are accounted for in the full model expression:

\[ model = \left(background \times exposure\ map \right) + \left[ \left(\mathrm{PSF} \ast \mathtt{sigmagauss2d} \right) \times exposure\ map \right] \]

Note that the background level is not fit, but is frozen at the normalization determined by the initial background-only fit in the relevant background region. This assumes that the background normalization derived in the background region also applies across the source region.
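
Continuing the Sherpa sketch above, the full model expression can be assembled as follows (again with hypothetical file names; the background normalization value shown is an invented stand-in for the background-only best fit):

    from sherpa.astro.ui import *

    load_image("src_region.img")             # per-source counts image
    load_table_model("bkgmap", "bkg.img")    # background image
    load_table_model("emap", "expmap.img")   # exposure map
    load_psf("psf0", "psf.img")              # ray-trace PSF image
    bkgmap = get_model_component("bkgmap")
    emap = get_model_component("emap")
    psf0 = get_model_component("psf0")
    src = create_model_component("sigmagauss2d", "src")

    bkgmap.ampl = 0.87                       # hypothetical background-only best fit
    freeze(bkgmap.ampl)                      # background level is not refit
    emap.ampl = 1
    freeze(emap.ampl)

    # model = (background x exposure map) + (PSF * sigmagauss2d) x exposure map
    set_full_model(bkgmap * emap + psf0(src) * emap)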


MLE Fit

The source candidate positions (parameters x, y) are fit within the minimal bounding box enclosing the source region. Fitting is performed in physical coordinates using the C-statistic and the Monte Carlo (moncar) algorithm in Sherpa. For each source, the fitting is performed with both the "point source" model and the "extended source" model.

When fitting with the point source model, only the Gaussian amplitude and the two position parameters are thawed (the two sigma parameters are frozen at one image pixel), whereas for the extended source model all Gaussian parameters are thawed. When there are several sources in a bundle, the fit proceeds on a source-by-source basis, starting with the brightest source, then freezing its best-fit parameters before continuing with the second brightest, and so on, until the faintest source is fitted. The WCS coordinates corresponding to the MLE fit for (x,y), as well as their MCMC errors, are recorded in the catalog tables as ra and dec.
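
A sketch of the two fits, continuing the model set-up above (the value 1.0 stands in for one image pixel, and is an assumption of this sketch):

    # Point-source fit: only position and amplitude are thawed
    src.sigma_a = 1.0
    src.sigma_b = 1.0
    freeze(src.sigma_a, src.sigma_b, src.theta)
    thaw(src.xpos, src.ypos, src.ampl)
    set_stat("cstat")
    set_method("moncar")
    fit()
    c_point = calc_stat()                    # C-statistic for the point-source model

    # Extended-source fit: all Gaussian parameters are thawed
    thaw(src.sigma_a, src.sigma_b, src.theta)
    fit()
    c_ext = calc_stat()                      # C-statistic for the extended model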


Calculate Likelihood

The degree of confidence with which a source is detected in the Chandra Source Catalog is given by the detection likelihood, \(\mathcal{L}\).

In order to define the likelihood, we start by modeling the observed distribution of counts in the source region under two different hypotheses: the null hypothesis (i.e., zero source flux, only background is considered), and an alternative hypothesis that accounts for both background, \(B\), and non-zero source emission, \(S\). The model value \(M_{i}\), which represents the predicted counts in a given pixel for a given energy band, is thus:

\[ M_{i} = B_{i} + \alpha S_{i} \]

where \(\alpha\) is zero for the null hypothesis, and \(\alpha > 0\) for the alternative hypothesis (Watson et al. 2009, A&A 493 339).

Since we are recording individual events at each pixel, we fit these models by minimizing the C-statistic (the cstat statistic in Sherpa), which is derived from the Poisson distribution for the observed counts, \(D_{i}\):

\[ P_{\mathrm{Poisson}} = \prod_{i} \frac{M_{i}^{D_{i}} e^{-M_{i}}}{D_{i}!} \]

The C-statistic (Cash 1979, ApJ 228 939), \(C\), is obtained by computing the negative of the logarithm of this probability distribution, expanding the logarithm of the factorial term using Stirling's approximation, and then multiplying by two:

\[ C = 2 \sum_{i}\left[ M_{i} - D_{i} + D_{i} \left( \ln{D_{i}} - \ln{M_{i}} \right) \right] \]
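
Explicitly, taking the negative logarithm of \(P_{\mathrm{Poisson}}\) and multiplying by two gives

\[ -2 \ln{P_{\mathrm{Poisson}}} = 2 \sum_{i} \left[ M_{i} - D_{i} \ln{M_{i}} + \ln{D_{i}!} \right] \]

which reduces to the expression above once Stirling's approximation, \(\ln{D_{i}!} \approx D_{i}\ln{D_{i}} - D_{i}\), is applied; note that \(C\) then vanishes when \(M_{i} = D_{i}\).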
[NOTE]

The factor of 2 is added to guarantee that the resulting distribution asymptotically approaches \(\chi^{2}\) under the null hypothesis. This results from Wilks' theorem.

This is akin to the \(\chi^{2}\) statistic minimization in the continuous case of many counts, when we assume Gaussian errors. That is, we are maximizing the likelihood that the observed counts were drawn from the assumed model.

We compute the statistic for the null hypothesis, \(C_{\mathrm{null}}\), and for the alternative hypothesis, \(C_{\mathrm{best}}\). The difference between these two statistics:

\[ \Delta C = C_{\mathrm{null}} - C_{\mathrm{best}} \]

is equivalent to a likelihood ratio between the two models, which (again) distributes approximately as \(\chi^{2}\) with \(\nu\) degrees of freedom, where \(\nu\) is the difference between the numbers of free parameters in the alternative and null models.

The probability \(P\) of obtaining the calculated value of \(\Delta C\) or greater by chance fluctuations of the background (that is, under the null hypothesis) is the complement of the cumulative \(\chi^{2}\) distribution up to \(\Delta C\), and can be computed as a regularized upper incomplete \(\Gamma\) function:

\[ P = \frac{\Gamma \left( \frac{\nu}{2}, \frac{\Delta C}{2} \right)}{\Gamma \left( \frac{\nu}{2} \right)} \]
[NOTE]

The incomplete \(\Gamma\) function appropriately describes the cumulative distribution for large values of \(\Delta C\).

In the Chandra Source Catalog, the detection likelihood is defined as the negative logarithm of this p-value:

\[ \mathcal{L} = -\ln{P} \]
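
For illustration, a minimal sketch of this computation (the statistic values and \(\nu\) are hypothetical), using SciPy's regularized upper incomplete \(\Gamma\) function, which equals the \(\chi^{2}\) survival function:

    import numpy as np
    from scipy.special import gammaincc

    c_null, c_best = 812.4, 760.9            # hypothetical C-statistic values
    nu = 4                                   # hypothetical degrees of freedom

    delta_c = c_null - c_best
    P = gammaincc(nu / 2.0, delta_c / 2.0)   # p-value under the null hypothesis
    L = -np.log(P)                           # detection likelihood
    print(f"dC = {delta_c:.1f}, P = {P:.3g}, L = {L:.1f}")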

Note that the null hypothesis is rejected (or the alternative hypothesis of a non-zero-flux source is accepted) for small values of \(P\), or equivalently large values of \(\mathcal{L}\). Large values of \(\mathcal{L}\) therefore imply that the distribution of counts observed within the detection region is extremely unlikely to have been generated by a random fluctuation of the background in the absence of a source.

The likelihood of detection is reported for each source and each energy band, under the likelihood columns.


Classify Detection as True, Marginal, or False Based on Likelihood Thresholds from Simulations

Detections are classified as True, Marginal, or False, depending on the value of the likelihood estimated as described above. The specific likelihood thresholds depend on several factors, including the exposure map value at that location and the corresponding sensitivity limit.


For True or Marginal Detections, Compute MCMC Confidence Intervals for Parameters

After the Sherpa model fit is performed, the confidence intervals for the parameters are calculated. The Sherpa implementation of the pyblocxs algorithm (the get_draws function; van Dyk et al. 2001, Siemiginowska et al. 2011) is used to obtain an MCMC sample from the posterior probability density for all the fitted parameters. The starting point for the MCMC run is the best-fit (maximum likelihood) set of model parameter values. The assumed priors for the position and sigma parameters are flat within the boundary of the fitted regions. The prior for the normalization is uniform and positive.

The get_draws function requires a covariance matrix calculated at the best-fit model. The covariance matrix is calculated using Sherpa; if the covariance calculation does not return a valid result (e.g., for low S/N data, or when the assumed model is not appropriate for the data), an estimate of the diagonal covariance matrix is calculated with the int_unc function instead.
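
A minimal sketch of this step, assuming a best fit has already been obtained in the session above (niter=5000 matches the point-source case described below):

    import numpy as np
    from sherpa.astro.ui import covar, get_covar_results, get_draws

    covar()                                   # covariance at the best fit
    cmat = get_covar_results().extra_output   # covariance matrix
    stats, accept, params = get_draws(niter=5000, covar_matrix=cmat)

    # params has one row per thawed parameter; 90% bounds from percentiles
    lo, hi = np.percentile(params, [5.0, 95.0], axis=1)
    print("acceptance fraction:", accept.mean())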

get_draws returns a sample of 5000 draws from the posterior probability distribution for a point source, and 15,000 draws for an extended source. The convergence of, and correlations within, the draws for each parameter are assessed with the rhat diagnostic (Gelman et al. 2013). To calculate rhat, the full MCMC chain is split into 10 sub-sections, and the variances within each section and between sections are combined. We verify that the rhat value for each parameter is close to 1.0 for True sources. For Marginal sources this value can be larger, which may indicate an incorrect model, a multi-source region fit with a single source, additional emission components not accounted for in the model, etc.
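
A NumPy sketch of the split-chain rhat just described (the function name and default are ours):

    import numpy as np

    def rhat(chain, nsplit=10):
        """Gelman-Rubin rhat from one chain split into nsplit segments."""
        n = len(chain) // nsplit
        segs = np.reshape(chain[:n * nsplit], (nsplit, n))
        B = n * segs.mean(axis=1).var(ddof=1)   # between-segment variance
        W = segs.var(axis=1, ddof=1).mean()     # mean within-segment variance
        var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
        return np.sqrt(var_hat / W)             # ~1.0 for a converged chain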

The 90% confidence bounds are calculated for each parameter using the MCMC posterior probability draws.


Generate Per-Detection Data Products

The MCMC draws files are available as data products via CSCview (Valid Per-Obsid MLE source fit draws). While the direct MLE fit output files are not available, the highest likelihood fit information is available in the mrgsrc3 file SRCLIST extension.


References

Cash, W. 1979, ApJ, 228, 939

Freeman, P., Doe, S., & Siemiginowska, A. 2001, Proc. SPIE, 4477, 76

Gelman, A., et al. 2013, Bayesian Data Analysis (3rd ed.), Texts in Statistical Science, Chapman & Hall/CRC

Siemiginowska, A., Kashyap, V., Refsdal, B., et al. 2011, Astronomical Data Analysis Software and Systems XX, ASP Conf. Ser. 442, 439

Stewart, I.M. 2009, A&A, 495, 989

van Dyk, D.A., Connors, A., Kashyap, V.L., & Siemiginowska, A. 2001, ApJ, 548, 224

Watson, M.G., Schröder, A.C., Fyfe, D., et al. 2009, A&A, 493, 339