The CSTAT statistic is equivalent to the
XSPEC implementation of the Cash statistic.
Counts are sampled from the Poisson distribution, and so the best way
to assess the quality of model fits is to use the product of
individual Poisson probabilities computed in each bin i, or the likelihood L:
L = (product)_i [ M(i)^D(i)/D(i)! ] * exp[-M(i)]
where M(i) = S(i) + B(i) is the sum
of source and background model amplitudes, and D(i) is the number of observed counts, in bin i.
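
As an illustration only (not part of Sherpa), the logarithm of this likelihood can be sketched in Python. The function name poisson_log_likelihood and its list arguments are hypothetical; model amplitudes are assumed to be strictly positive.

```python
import math

def poisson_log_likelihood(model, data):
    """Return ln L = sum_i [ D(i)*ln M(i) - M(i) - ln D(i)! ].

    model: predicted amplitudes M(i) = S(i) + B(i), assumed > 0
    data:  observed counts D(i), non-negative integers
    """
    total = 0.0
    for m, d in zip(model, data):
        # lgamma(d + 1) equals ln(d!), which avoids overflow for large counts
        total += d * math.log(m) - m - math.lgamma(d + 1)
    return total
```

Working with the logarithm turns the product over bins into a sum, which is the starting point for the statistic described next.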
The CSTAT statistic (Cash 1979, ApJ 228, 939) is
derived by (1) taking the logarithm of the likelihood function, (2)
changing its sign, (3) dropping the factorial term (which remains
constant during fits to the same dataset), (4) adding an extra
data-dependent term, and (5) multiplying by two:
C = 2 * (sum)_i [ M(i) - D(i) + D(i)*[log D(i) - log M(i)] ]
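
A minimal Python sketch of this formula follows. It is not Sherpa's or XSPEC's code: the function name cstat is hypothetical, model amplitudes are assumed to be strictly positive, and the D(i)*log D(i) term is taken to be zero when D(i) = 0 (its limiting value).

```python
import math

def cstat(model, data):
    """Return C = 2 * sum_i [ M(i) - D(i) + D(i)*(ln D(i) - ln M(i)) ].

    model: predicted amplitudes M(i), assumed > 0
    data:  observed counts D(i), non-negative
    """
    total = 0.0
    for m, d in zip(model, data):
        term = m - d
        if d > 0:
            # for d == 0 the logarithmic term vanishes in the limit
            term += d * (math.log(d) - math.log(m))
        total += term
    return 2.0 * total
```

With this definition the statistic is zero when the model reproduces the data exactly in every bin, and positive otherwise.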
The factor of two exists so that the change in the
CSTAT statistic from one model fit to the next,
(Delta)C, is distributed approximately
as (Delta)chi-square when the
number of counts in each bin is high (> 5). One can then in
principle use (Delta)C instead of
(Delta)chi-square in certain model
comparison tests. However, unlike chi-square, the CSTAT
statistic may be used regardless of the number of counts in each bin.
The advantage of CSTAT over Sherpa's
implementation of CASH is that one can assign an
approximate `goodness-of-fit' measure to a given value of the
CSTAT statistic, i.e., the observed statistic,
divided by the number of degrees of freedom, should be of order 1 for
good fits.
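
The goodness-of-fit check can be sketched as below. This is an illustration, not a Sherpa function: the name reduced_statistic is hypothetical, and the number of degrees of freedom is assumed to be the number of bins minus the number of free (thawed) model parameters.

```python
def reduced_statistic(stat_value, n_bins, n_free_params):
    """Return the statistic divided by the degrees of freedom.

    Assumes degrees of freedom = n_bins - n_free_params.
    """
    dof = n_bins - n_free_params
    if dof <= 0:
        raise ValueError("need more bins than free parameters")
    return stat_value / dof
```

For example, a CSTAT value of 98.0 from a fit to 100 bins with 4 free parameters gives a reduced statistic of about 1.02, which is of order 1.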
The background should not be subtracted from the data when this
statistic is used. It should be modeled simultaneously with
the source, as in this example:
sherpa> DATA source.data
sherpa> BACK background.data
sherpa> SOURCE = [source model]
sherpa> BG = [background model]
sherpa> STATISTIC CSTAT
sherpa> FIT