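Statistic | Description |
---|---|
BAYES | A Bayesian maximum likelihood function. |
CASH | A maximum likelihood function. |
{CHI CVAR \| CHI PARENT} | Chi-square statistic with constant variance computed from the counts data. |
CHI DVAR | Chi-square statistic with variance computed from the data. |
CHI GEHRELS | Chi-square statistic with the Gehrels variance function (the Sherpa default). |
CHI MVAR | Chi-square statistic with variance computed from model amplitudes. |
{CHI PARENT \| CHI CVAR} | Chi-square statistic with constant variance computed from the counts data. |
CHI PRIMINI | Chi-square statistic with Primini variance function. |
CSTAT | A maximum likelihood function: XSPEC implementation of CASH. |
USERSTAT | User implemented statistic. |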
In this chapter we describe each of the Sherpa fit statistics, in
alphabetical order.
Similar information about each statistic is available within Sherpa with
the command AHELP -W statisticname, where statisticname is the name of a
statistic.
These descriptions are meant to serve as a brief reference.
BAYES: A Bayesian maximum likelihood function.
In conventional fits involving background data, it is the best-fit
background model amplitude in bin $i$, $B_i$, that is used when one attempts
to ascertain the source model amplitude $S_i$. While this is the most
probable value of $B_i$, it is not the only possible value, and $S_i$ may
change if all possible values of $B_i$, weighted by their likelihood, are
taken into account. This can be done using
Bayes' Theorem. Below, we assume familiarity with the basics of
Bayesian statistical methodology; a reader who is not familiar with
these basics should consult, e.g., the review text by Loredo (1992,
Statistical Challenges in Modern Astronomy, ed. Feigelson & Babu,
275).
For simplicity, assume that there are two fitting `regions' (in time,
space, or both), denoted `off-source' and `on-source,' with $N_{i,B}$ and
$N_{i,S}$ counts respectively. Counts are sampled from the Poisson
distribution, and the best way to assess the quality of model fits is
to use the Poisson likelihood function. Within the on-source region,
the Poisson likelihood of the sum $S_i + B_i$ is
$$p(N_{i,S} \mid S_i, B_i) = \frac{(S_i + B_i)^{N_{i,S}}}{N_{i,S}!} \exp[-(S_i + B_i)] \tag{6.1}$$
We can relate this likelihood to the Bayesian posterior density for
$S_i$ and $B_i$ using Bayes' Theorem:
$$p(S_i, B_i \mid N_{i,S}) = \frac{p(S_i \mid B_i)\, p(B_i)\, p(N_{i,S} \mid S_i, B_i)}{p(D)} \tag{6.2}$$
The factor $p(S_i \mid B_i)$ is the Bayesian prior probability for the
source model amplitude, which is assumed to be constant, and $p(D)$ is an
ignorable normalization constant. The prior probability $p(B_i)$ is
treated differently; we can specify it using the posterior probability
for $B_i$ off-source:
$$p(B_i) = \frac{A\,(A B_i)^{N_{i,B}}}{N_{i,B}!} \exp[-A B_i] \tag{6.3}$$
where $A$ is an "area" factor that rescales the number of predicted
background counts $B_i$ to the off-source region.
IMPORTANT: this formula is derived assuming that the background is constant as a function of spatial area, time, etc. If the background is not constant, the Bayes function should not be used.
To take into account all possible values of $B_i$, we integrate, or marginalize, the posterior
density $p(S_i, B_i \mid N_{i,S})$ over all allowed values of $B_i$:
$$p(S_i \mid N_{i,S}) = \int_0^\infty p(S_i, B_i \mid N_{i,S})\, dB_i \tag{6.4}$$
For the constant background case, this integral may be done
analytically. We do not show the final result here; see Loredo. The
function $-\log p(S_i \mid N_{i,S})$ is minimized to find the best-fit
value of $S_i$. The magnitude of this function depends upon
the number of bins included in the fit and the values of the data
themselves. Hence one cannot analytically assign a `goodness-of-fit'
measure to a given value of this function. Such a measure can, in
principle, be computed by performing Monte Carlo simulations. One
would repeatedly sample new datasets from the best-fit model, and fit
them, and note where the observed function minimum lies within the
derived distribution of minima. (The ability to perform Monte Carlo
simulations is a feature that will be included in a future version of
Sherpa.)
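The marginalization over the background amplitude can also be carried out numerically. The following is a minimal sketch for a single bin, not Sherpa's implementation. It assumes a Poisson likelihood for the on-source counts with mean equal to source plus background amplitude, a flat prior on the source amplitude, and a background prior of the form "area factor times Poisson term in the off-source counts". The function names, the integration grid, and the example counts are all hypothetical.

```python
import math

def log_joint(s, b, n_on, n_off, area):
    """Unnormalized log posterior for source amplitude s and background b.

    Assumes on-source counts are Poisson with mean s + b, and that the
    background prior is the off-source posterior with mean area * b.
    """
    mu = s + b
    log_like = n_on * math.log(mu) - mu - math.lgamma(n_on + 1)
    log_prior = (math.log(area) + n_off * math.log(area * b)
                 - area * b - math.lgamma(n_off + 1))
    return log_like + log_prior

def marginal_posterior(s, n_on, n_off, area, b_max=None, steps=4000):
    """Integrate the joint density over b with the midpoint rule."""
    if b_max is None:
        # Hypothetical upper limit: well beyond the bulk of the background prior.
        b_max = (n_off + 10.0 * math.sqrt(n_off + 1.0) + 10.0) / area
    db = b_max / steps
    total = 0.0
    for k in range(steps):
        b = (k + 0.5) * db  # midpoints avoid log(0) at b = 0
        total += math.exp(log_joint(s, b, n_on, n_off, area))
    return total * db

# Hypothetical bin: 12 on-source counts, 20 off-source counts, area factor 4.
densities = {s: marginal_posterior(s, 12, 20, 4.0) for s in (1.0, 7.0, 20.0)}
```

With these made-up numbers the expected background in the on-source region is about 5 counts, so the marginal density should be larger near a source amplitude of 7 than at 1 or 20.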
    sherpa> DATA 1 source.data
    sherpa> BACK 1 background.data
    sherpa> SOURCE 1 = [sourcemodel]
    sherpa> STATISTIC BAYES
    sherpa> FIT
    ...
To compare the best-fit source model with the data in a plotting environment, one would have to subtract the background from the raw counts data, as in this example:
    ...
    sherpa> FIT
    ...
    sherpa> SUBTRACT
    sherpa> PLOT FIT
Otherwise, the plotted best-fit model and data will differ by approximately the background amplitude.
If the number of counts in each bin is high ($> 5$), the likelihood is
approximately proportional to $\exp(-\chi^2/2)$. Hence UNDERFLOW errors
can occur if the initial parameter estimate is too far from the location
of a local likelihood maximum, or if parameter step sizes are set too
large. (This is because during intermediate steps of the computation of
$p(S_i \mid N_{i,S})$, absolute likelihoods must be computed, unlike for
the case of the CASH statistic, where the log-likelihood terms never need
to be exponentiated.) To avoid these errors, it may be necessary to use a
$\chi^2$ statistic to find the approximate solution before using the
Bayesian maximum likelihood function.
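The underflow problem can be illustrated in a few lines. This sketch assumes the high-count approximation in which the likelihood scales as the exponential of minus one half of chi-square; the chi-square value is a made-up example of a poor starting point, and the function name is hypothetical.

```python
import math

def gaussian_log_likelihood(chi2):
    """Log of exp(-chi2 / 2), kept in log space so it never underflows."""
    return -0.5 * chi2

chi2_far = 2000.0                      # hypothetical poor starting point
absolute = math.exp(-0.5 * chi2_far)   # too small for a double: becomes 0.0
log_value = gaussian_log_likelihood(chi2_far)
```

The absolute likelihood is indistinguishable from zero in double precision, while its logarithm remains a perfectly ordinary number. A statistic that must exponentiate intermediate terms is therefore sensitive to the starting point in a way that a pure log-likelihood is not.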
Examples:
Specify the fitting statistic and then confirm it has been set. The method is then changed from "Levenberg-Marquardt" (the default), since this statistic does not work with that algorithm.
    sherpa> STATISTIC BAYES
    sherpa> SHOW STATISTIC
    Statistic: Bayes
    sherpa> METHOD POWELL
CASH: A maximum likelihood function.
Counts are sampled from the Poisson distribution, and so the best
way to assess the quality of model fits is to use the product of
individual Poisson probabilities computed in each bin $i$, or the
likelihood $L$:
$$L = \prod_i \frac{M_i^{D_i}}{D_i!} \exp(-M_i) \tag{6.5}$$
where $M_i = S_i + B_i$ is the sum of source and background model
amplitudes, and $D_i$ is the number of observed counts, in bin $i$.
The CASH statistic (Cash 1979, ApJ 228, 939) is derived by (1) taking the logarithm of the likelihood function, (2) changing its sign, (3) dropping the factorial term (which remains constant during fits to the same dataset), and (4) multiplying by two:
$$C = 2 \sum_i \left[ M_i - D_i \log M_i \right] \tag{6.6}$$
The factor of two exists so that the change in CASH statistic from one
model fit to the next, $\Delta C$, is distributed approximately as
$\Delta\chi^2$ when the number of counts in each bin is high ($> 5$).
One can then in principle use $\Delta C$ instead of $\Delta\chi^2$ in
certain model comparison tests. However, unlike $\chi^2$, the CASH
statistic may be used regardless of the number of counts in each bin.
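As a minimal sketch of the four steps above (log, sign change, dropped factorial, factor of two), the statistic can be written as twice the sum over bins of the model amplitude minus the observed counts times the log of the model amplitude. The function name, the counts, and the two constant models are hypothetical, and the code is not Sherpa's implementation.

```python
import math

def cash(model, data):
    """C = 2 * sum(M_i - D_i * log(M_i)); every model amplitude must be > 0."""
    return 2.0 * sum(m - d * math.log(m) for m, d in zip(model, data))

data = [3, 0, 2, 5, 1]                 # hypothetical counts per bin
model_a = [2.2] * 5                    # constant model at the data mean
model_b = [1.0] * 5                    # a poorer constant model
delta_c = cash(model_b, data) - cash(model_a, data)
```

Only the difference `delta_c` between two fits to the same data is meaningful for model comparison; the absolute value of the statistic depends on the dropped factorial term and on the data themselves.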
The magnitude of the CASH statistic depends upon the number of bins included in the fit and the values of the data themselves. Hence one cannot analytically assign a `goodness-of-fit' measure to a given value of the CASH statistic. Such a measure can, in principle, be computed by performing Monte Carlo simulations. One would repeatedly sample new datasets from the best-fit model, and fit them, and note where the observed CASH statistic lies within the derived distribution of CASH statistics. (The ability to perform Monte Carlo simulations is a feature that will be included in a future version of Sherpa.)
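The Monte Carlo procedure described above can be sketched outside Sherpa. The example below assumes the simplest possible model, a constant amplitude, for which the maximum likelihood fit is just the mean of the counts; a real application would refit the actual source model to each simulated dataset. All function names, the Poisson sampler, and the data are hypothetical.

```python
import math
import random

def cash(model, data):
    """C = 2 * sum(M_i - D_i * log(M_i)); model amplitudes must be > 0."""
    return 2.0 * sum(m - d * math.log(m) for m, d in zip(model, data))

def poisson_sample(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fit_constant(data):
    """Maximum likelihood amplitude of a constant model is the data mean."""
    return max(sum(data) / len(data), 1e-9)  # floor keeps log() defined

def mc_fraction_below(data, n_sims=500, seed=1):
    """Fraction of simulated CASH minima that fall below the observed one."""
    rng = random.Random(seed)
    best = fit_constant(data)
    observed = cash([best] * len(data), data)
    below = 0
    for _ in range(n_sims):
        fake = [poisson_sample(best, rng) for _ in data]
        refit = fit_constant(fake)
        if cash([refit] * len(fake), fake) < observed:
            below += 1
    return below / n_sims

fraction = mc_fraction_below([3, 0, 2, 5, 1])  # hypothetical counts
```

A fraction close to 1 would indicate that the observed minimum is unusually large compared with datasets drawn from the best-fit model, which is the sense in which the simulation supplies a goodness-of-fit measure.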
    sherpa> DATA source.data
    sherpa> BACK background.data
    sherpa> SOURCE = [source model]
    sherpa> BG = [background model]
    sherpa> STATISTIC CASH
    sherpa> FIT
Examples:
Specify the fitting statistic and then confirm it has been set.
    sherpa> STATISTIC CASH
    sherpa> SHOW STATISTIC
    Statistic: Cash
CHISQUARE: Chi-square statistic.
The chi-square statistic is
$$\chi^2 \equiv \sum_i \frac{[N_{i,S} - B_i - S_i]^2}{\sigma_i^2} \tag{6.7}$$
where $N_{i,S}$ is the total number of observed counts in bin $i$ of the
on-source region, $B_i$ and $S_i$ are the predicted background and source
model counts in that bin, and $\sigma_i$ is the error in bin $i$.
The options for assigning $\sigma_i$ are described in the documentation for
CHI DVAR (CHIDVAR),
CHI GEHRELS (CHIGEHRELS),
CHI MVAR (CHIMVAR), and
CHI PRIMINI (CHIPRIMINI).
In each of these files, $N_{i,B}$ is the total number of observed counts
in bin $i$ of the off-source region; $A_B$ is the off-source `area,' which
could be the size of the region from which the background is extracted,
or the length of a background time segment, or a product of the two,
etc.; and $A_S$ is the on-source `area.'
In the analysis of PHA data, $A_B$ is the product of the
BACKSCAL and EXPTIME FITS header keyword values,
provided in the file containing the background data. $A_S$ is computed
similarly, from keyword values in the source data file.
Note that in the current version of Sherpa, it is assumed that there is a one-to-one mapping between a given background region bin and a given source region bin. For instance, in the analysis of PHA data, it is assumed that the input background counts spectrum is binned in exactly the same way as the input source counts spectrum, and any filter applied to the source spectrum is automatically applied to the background spectrum. This means that currently, the user cannot, e.g., specify arbitrary background and source regions in two dimensions and get correct results. This will be changed in a future version of Sherpa.
(However, this limitation only applies when analyzing background data that have been entered with the BACK command. One can always enter the background as a separate dataset and jointly fit the source and background regions.)
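To make the role of the two `areas' concrete, the sketch below computes a chi-square value for background-subtracted data. It assumes that each area is the product of BACKSCAL and EXPTIME as described above, and that subtraction rescales the off-source counts by the ratio of on-source to off-source area; that subtraction rule, the function names, and all keyword values and counts are assumptions made for illustration, not Sherpa's code.

```python
def area_factor(backscal, exptime):
    """'Area' of a region: product of BACKSCAL and EXPTIME keyword values."""
    return backscal * exptime

def chi_square(on_counts, off_counts, model, sigma2, a_s, a_b):
    """Chi-square of background-subtracted counts against a source model.

    Assumes one-to-one bins and subtraction of (a_s / a_b) * off-source counts.
    """
    scale = a_s / a_b
    return sum((n_s - scale * n_b - m) ** 2 / s2
               for n_s, n_b, m, s2 in zip(on_counts, off_counts, model, sigma2))

a_s = area_factor(1.0, 1000.0)   # hypothetical source-file keyword values
a_b = area_factor(4.0, 1000.0)   # hypothetical background-file keyword values
value = chi_square([10, 12], [8, 4], [7.0, 11.0], [10.5, 12.25], a_s, a_b)
```

Here the background region is four times larger than the source region, so each off-source count contributes a quarter of a count to the on-source prediction.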
CHI CVAR: Chi-square statistic with constant variance computed from the counts data.
In some applications, analysts have seen fit to assume that the variance is constant for each bin. For this choice of statistic, the variance is assumed to be the mean number of counts, or
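$$\sigma_i^2 = \frac{1}{N} \sum_{j=1}^{N} \left[ N_{j,S} + \left(\frac{A_S}{A_B}\right)^2 N_{j,B} \right] \tag{6.8}$$
where $N$ is the number of on-source (and off-source) bins included in
the fit. The background term appears only if a background region is
specified and background subtraction is done.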
See CHISQUARE for more information, including definitions of the quantities shown above.
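A minimal sketch of the constant variance follows. It assumes the variance is the mean over bins of the on-source counts plus the squared area ratio times the off-source counts, with the background term omitted when no off-source counts are supplied. The function name and the numbers are hypothetical.

```python
def cvar_variance(on_counts, off_counts=None, a_s=1.0, a_b=1.0):
    """Constant variance: mean over bins of N_S + (A_S / A_B)^2 * N_B."""
    if off_counts is None:
        off_counts = [0] * len(on_counts)   # no background subtraction
    scale2 = (a_s / a_b) ** 2
    total = sum(n_s + scale2 * n_b for n_s, n_b in zip(on_counts, off_counts))
    return total / len(on_counts)

no_background = cvar_variance([4, 6, 8])
with_background = cvar_variance([4, 6, 8], [16, 16, 16], a_s=1.0, a_b=4.0)
```

Every bin receives the same variance, so the fit weights all bins equally regardless of how many counts each one holds.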
Examples:
Specify the fitting statistic and then confirm it has been set.
    sherpa> STATISTIC CHI CVAR
    sherpa> SHOW STATISTIC
    Statistic: Chi-Squared Constant Variance
CHI DVAR: Chi-square statistic with variance computed from the data.
If the number of counts in each bin is large ($> 5$), then the shape
of the Poisson distribution from which the counts are sampled tends
asymptotically towards that of a Gaussian distribution, with variance
$$\sigma_i^2 = N_{i,S} + \left(\frac{A_S}{A_B}\right)^2 N_{i,B} \tag{6.9}$$
The background term appears only if a background region is specified and background subtraction is done.
See CHISQUARE for more information, including definitions of the quantities shown above.
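The Gaussian limit that motivates this choice can be checked numerically. The sketch below compares the Poisson probability at its mean with a Gaussian density of equal mean and variance, and also evaluates the data variance for one bin, assuming it is the on-source counts plus the squared area ratio times the off-source counts. Function names and numbers are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k counts, evaluated through logarithms."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))

def gaussian_pdf(x, mean, variance):
    """Gaussian density with the given mean and variance."""
    norm = math.sqrt(2.0 * math.pi * variance)
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / norm

def peak_mismatch(mean):
    """Relative difference between the two densities at the (integer) mean."""
    p = poisson_pmf(mean, mean)
    g = gaussian_pdf(mean, mean, mean)
    return abs(p - g) / p

def dvar_variance(n_on, n_off=0, a_s=1.0, a_b=1.0):
    """Data variance for one bin: N_S + (A_S / A_B)^2 * N_B."""
    return n_on + (a_s / a_b) ** 2 * n_off

low, high = peak_mismatch(5), peak_mismatch(100)
```

The mismatch at the peak shrinks as the mean grows, which is why the data variance is a reasonable error estimate only in the high-count regime.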
Examples:
Specify the fitting statistic and then confirm it has been set.
    sherpa> STATISTIC CHI DVAR
    sherpa> SHOW STATISTIC
    Statistic: Chi-Squared Data Variance
CHI GEHRELS: Chi-square statistic with the Gehrels variance function.
This is the Sherpa default statistic.
If the number of counts in each bin is small ($< 5$), then we cannot
assume that the Poisson distribution from which the counts are sampled
has a nearly Gaussian shape. The standard deviation (i.e., the
square-root of the variance) for this low-count case has been derived
by Gehrels (1986):
$$\sigma_{i,S} = 1 + \sqrt{N_{i,S} + 0.75} \tag{6.10}$$
Higher-order terms have been dropped from the expression; it is
accurate to approximately one percent. If one does not perform
background subtraction, then $\sigma_i = \sigma_{i,S}$; otherwise, one
may use standard error propagation to estimate that
$$\sigma_i^2 = \sigma_{i,S}^2 + \left(\frac{A_S}{A_B}\right)^2 \sigma_{i,B}^2 \tag{6.11}$$
The background term appears only if a background region is specified and background subtraction is done. See CHISQUARE for more information, including definitions of the quantities shown above.
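A short sketch of the Gehrels error estimate follows, assuming the commonly quoted form of the approximation (one plus the square root of the counts plus 0.75) and standard error propagation for the background term with the squared area ratio. Function names and inputs are hypothetical.

```python
import math

def gehrels_sigma(counts):
    """Gehrels (1986) approximation to the error: 1 + sqrt(N + 0.75)."""
    return 1.0 + math.sqrt(counts + 0.75)

def gehrels_variance(n_on, n_off=None, a_s=1.0, a_b=1.0):
    """Variance for one bin, with the background term added in quadrature."""
    variance = gehrels_sigma(n_on) ** 2
    if n_off is not None:               # only when background is subtracted
        variance += (a_s / a_b) ** 2 * gehrels_sigma(n_off) ** 2
    return variance

empty_bin_sigma = gehrels_sigma(0)
```

Unlike the square root of the counts, this estimate stays well above zero for an empty bin, so low-count bins do not receive infinite weight in the fit.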
Examples:
Specify the fitting statistic and then confirm it has been set.
    sherpa> STATISTIC CHI GEHRELS
    sherpa> SHOW STATISTIC
    Statistic: Chi-Squared Gehrels
CHI MVAR: Chi-square statistic with variance computed from model amplitudes.
This statistic is equivalent to CHI DVAR, except that the variance is estimated using the background and source model amplitudes rather than the observed counts data:
$$\sigma_i^2 = S_i + \left(\frac{A_S}{A_B}\right)^2 B_{i,\mathrm{off}} \tag{6.12}$$
where $B_{i,\mathrm{off}}$ is the background model amplitude in bin $i$
of the off-source region. See CHISQUARE for more information,
including definitions of the quantities shown above.
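The model variance can be sketched in the same way as the data variance, with model amplitudes in place of observed counts. This assumes the variance in each bin is the source model amplitude plus the squared area ratio times the off-source background model amplitude; the function name and amplitudes are hypothetical.

```python
def mvar_variance(source_model, background_off_model, a_s=1.0, a_b=1.0):
    """Per-bin model variance: S_i + (A_S / A_B)^2 * B_off_i."""
    scale2 = (a_s / a_b) ** 2
    return [s + scale2 * b
            for s, b in zip(source_model, background_off_model)]

variances = mvar_variance([4.0, 0.5], [16.0, 8.0], a_s=1.0, a_b=4.0)
```

Because the variance comes from the model, it stays positive even in bins where no counts were observed, but it also changes every time the model parameters change during the fit.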
Examples:
Specify the fitting statistic and then confirm it has been set.
    sherpa> STATISTIC CHI MVAR
    sherpa> SHOW STATISTIC
    Statistic: Chi-Squared Model Variance
CHI PARENT: The statistic CHI CVAR is equivalent. See Section 6.4.1 for further information.
CHI PRIMINI: Chi-square statistic with Primini variance function.
The $\chi^2$ statistic is a biased
estimator of model parameters (unlike the likelihood functions). In
an attempt to remove this bias, Kearns, Primini, & Alexander
(1995, ADASS IV, 331) use a scheme dubbed `Iterative Weighting' (IW;
see Wheaton et al. 1995, ApJ 438, 322), in which
$$\sigma_i^2(j) = S_i\!\left(p_S^{(j-1)}\right) + \left(\frac{A_S}{A_B}\right)^2 B_{i,\mathrm{off}}\!\left(p_B^{(j-1)}\right) \tag{6.13}$$
where $j$ is the number of iterations that have
been carried out in the fitting process,
$B_{i,\mathrm{off}}$ is the background model amplitude in bin $i$
of the off-source region, and $p_S^{(j-1)}$ and $p_B^{(j-1)}$
are the set of source and background
model parameter values derived during the iteration previous to the
current one.
In addition to reducing parameter estimate bias, it can be used even
when the number of counts in each bin is small ($< 5$), although the
user should proceed with caution.
See CHISQUARE for more information, including definitions of the quantities shown above.
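The effect of iterative weighting on parameter bias can be seen in a toy problem. The sketch below fits a constant amplitude with no background, first with variances taken from the data and then with variances taken from the previous iteration's model. It is an illustration under those simplifying assumptions, not the algorithm as implemented in Sherpa; the function names and counts are hypothetical.

```python
def fit_constant_weighted(data, variances):
    """Weighted least-squares amplitude of a constant model."""
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, data)) / sum(weights)

def iterative_weighting(data, start, n_iter=5):
    """Refit repeatedly, taking variances from the previous iteration's model."""
    amplitude = start
    for _ in range(n_iter):
        variances = [amplitude] * len(data)   # model variance, previous fit
        amplitude = fit_constant_weighted(data, variances)
    return amplitude

data = [2, 4, 4, 8, 2]                        # hypothetical counts, all > 0
dvar_fit = fit_constant_weighted(data, data)  # variances from the data
iw_fit = iterative_weighting(data, start=dvar_fit)
```

With data variances the low-count bins receive the largest weights, so `dvar_fit` falls below the mean of the counts; with model variances the weights are equal for a constant model and `iw_fit` recovers the mean. For a constant model the iteration converges after one step, whereas a model with several parameters would generally need more.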
Examples:
Specify the fitting statistic and then confirm it has been set.
    sherpa> STATISTIC CHI PRIMINI
    WARNING: with CHI PRIMINI, displayed error bars will only be correct after a fit is performed with the current filter.
    sherpa> SHOW STATISTIC
    Statistic: Chi-Squared Primini
CSTAT: A maximum likelihood function.
The CSTAT statistic is equivalent to the XSPEC implementation of the Cash statistic.
Counts are sampled from the Poisson distribution, and so the best way
to assess the quality of model fits is to use the product of
individual Poisson probabilities computed in each bin $i$, or the
likelihood $L$:
$$L = \prod_i \frac{M_i^{D_i}}{D_i!} \exp(-M_i) \tag{6.14}$$
where $M_i = S_i + B_i$ is the sum of source and background model
amplitudes, and $D_i$ is the number of observed counts, in bin $i$.
The CSTAT statistic (Cash 1979, ApJ 228, 939) is derived by (1) taking the logarithm of the likelihood function, (2) changing its sign, (3) dropping the factorial term (which remains constant during fits to the same dataset), (4) adding an extra data-dependent term, and (5) multiplying by two:
$$C = 2 \sum_i \left[ M_i - D_i + D_i (\log D_i - \log M_i) \right] \tag{6.15}$$
The factor of two exists so that the change in
CSTAT statistic from one model fit to the next, $\Delta C$, is
distributed approximately as $\Delta\chi^2$ when the number of counts
in each bin is high ($> 5$). One can then in principle use $\Delta C$
instead of $\Delta\chi^2$ in certain model comparison tests. However,
unlike $\chi^2$, the CSTAT statistic may be used regardless of the
number of counts in each bin.
The advantage of CSTAT over Sherpa's implementation of CASH is that one can assign an approximate `goodness-of-fit' measure to a given value of the CSTAT statistic, i.e., the observed statistic, divided by the number of degrees of freedom, should be of order 1 for good fits.
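The following sketch evaluates the statistic and the reduced value used as an approximate goodness-of-fit measure. It assumes the form with the extra data-dependent term, twice the sum of model minus data plus data times the difference of their logarithms, and it drops the logarithmic term in empty bins (its limiting value is zero). The function names, counts, model, and number of free parameters are hypothetical.

```python
import math

def cstat(model, data):
    """C = 2 * sum(M_i - D_i + D_i * (log D_i - log M_i)).

    The logarithmic term is skipped for D_i = 0, where its limit is zero.
    """
    total = 0.0
    for m, d in zip(model, data):
        total += m - d
        if d > 0:
            total += d * (math.log(d) - math.log(m))
    return 2.0 * total

def reduced_cstat(model, data, n_free_params):
    """Statistic divided by degrees of freedom (bins minus free parameters)."""
    return cstat(model, data) / (len(data) - n_free_params)

data = [3, 0, 2, 5, 1]             # hypothetical counts per bin
model = [2.2] * 5                  # hypothetical one-parameter constant model
reduced = reduced_cstat(model, data, n_free_params=1)
```

The added data-dependent term makes the statistic vanish when the model reproduces the data exactly, which is what allows the reduced value to be compared with 1.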
    sherpa> DATA source.data
    sherpa> BACK background.data
    sherpa> SOURCE = [source model]
    sherpa> BG = [background model]
    sherpa> STATISTIC CSTAT
    sherpa> FIT
Examples:
Specify the fitting statistic and then confirm it has been set.
    sherpa> STATISTIC CSTAT
    sherpa> SHOW STATISTIC
    Statistic: Cstat
USERSTAT: User implemented statistic.
It is possible for the user to create and implement his or her own model, own optimization method, and own statistic function within Sherpa. The User Models, Statistics, and Methods Within Sherpa chapter of the Sherpa Reference Manual has more information on this topic.
The tar file sherpa_user.tar.gz contains the files needed to define the userstat (e.g., Makefiles and implementation files) plus example files. It is available from the Sherpa threads page: Data for Sherpa Threads