The cstat statistic is equivalent to the XSPEC implementation
of the Cash statistic.
Counts are sampled from the Poisson distribution, and so the
best way to assess the quality of model fits is to use the
product of individual Poisson probabilities computed in each
bin i, that is, the likelihood L:
L = (product)_i [ M(i)^D(i) * exp[-M(i)] / D(i)! ]
where M(i) = S(i) + B(i) is the sum of the source and background
model amplitudes, and D(i) is the number of observed counts in bin i.
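In practice the product is evaluated as a sum of logarithms, which avoids overflow and underflow when many small probabilities are multiplied. The sketch below assumes plain Python lists of counts D(i) and strictly positive model values M(i); the function name poisson_log_likelihood is illustrative, not a Sherpa API.

```python
import math


def poisson_log_likelihood(data, model):
    """Return log L = sum_i [D(i)*log M(i) - M(i) - log D(i)!].

    data  -- observed counts D(i), non-negative integers (not background-subtracted)
    model -- model amplitudes M(i) = S(i) + B(i), assumed strictly positive
    """
    total = 0.0
    for d, m in zip(data, model):
        # Log of the Poisson probability M^D * exp(-M) / D!; lgamma(d + 1) == log(d!).
        total += d * math.log(m) - m - math.lgamma(d + 1)
    return total


# Hypothetical counts and model values, for illustration only.
counts = [1, 3, 0, 2]
model = [1.5, 2.5, 0.8, 2.0]
log_l = poisson_log_likelihood(counts, model)
```

Exponentiating log_l recovers the product of the per-bin Poisson probabilities.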
The cstat statistic (Cash 1979, ApJ 228, 939) is derived by
(1) taking the logarithm of the likelihood function, (2)
changing its sign, (3) dropping the factorial term (which
remains constant during fits to the same dataset), (4) adding
an extra data-dependent term, and (5) multiplying by two:
C = 2 * (sum)_i [ M(i) - D(i) + D(i)*[log D(i) - log M(i)] ]
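A minimal sketch of the formula, assuming the usual convention that D(i)*log D(i) is taken as 0 when D(i) = 0. The data-dependent term added in step (4) amounts to the log-likelihood of a model that reproduces the data exactly (M(i) = D(i)), so C = 2 * [log L(M = D) - log L(M)]. Each bin therefore contributes a non-negative amount, and C = 0 only for a perfect match. The function name cstat here is illustrative and is not Sherpa's own implementation.

```python
import math


def cstat(data, model):
    """C = 2 * sum_i [M(i) - D(i) + D(i)*(log D(i) - log M(i))].

    data  -- observed counts D(i) >= 0 (not background-subtracted)
    model -- total model amplitudes M(i) = S(i) + B(i) > 0
    """
    total = 0.0
    for d, m in zip(data, model):
        term = m - d
        if d > 0:
            # Bins with D(i) = 0 contribute only M(i), since D*log(D) -> 0 as D -> 0.
            term += d * (math.log(d) - math.log(m))
        total += term
    return 2.0 * total
```

For example, cstat([1, 3, 4], [1, 3, 4]) is 0.0, while cstat([0], [2.0]) is 4.0 because an empty bin contributes 2*M(i).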
The factor of two exists so that the change in the cstat statistic
from one model fit to the next, (Delta)C, is distributed
approximately as (Delta)chi-square when the number of counts in
each bin is high (> 5). One can then, in principle, use (Delta)C
instead of (Delta)chi-square in certain model comparison tests.
However, unlike chi-square, the cstat statistic may be used
regardless of the number of counts in each bin.
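As a rough illustration of such a comparison, the sketch below treats (Delta)C between a simpler model and a nested model with one extra free parameter as chi-square with one degree of freedom, for which the tail probability is exactly erfc(sqrt(x/2)). This assumes nested models, high counts per bin, and no parameter at the edge of its allowed range; the counts and the two "best-fit" model predictions are hypothetical numbers, not results of real fits.

```python
import math


def cstat(data, model):
    # C = 2 * sum_i [M - D + D*(log D - log M)], with D*log(D) -> 0 when D = 0.
    return 2.0 * sum(m - d + (d * (math.log(d) - math.log(m)) if d > 0 else 0.0)
                     for d, m in zip(data, model))


def delta_c_pvalue_1dof(c_simple, c_complex):
    """Approximate p-value of Delta C = C_simple - C_complex for one extra parameter."""
    delta_c = c_simple - c_complex
    # For chi-square with 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return delta_c, math.erfc(math.sqrt(max(delta_c, 0.0) / 2.0))


# Hypothetical counts and hypothetical model predictions (not real fits).
counts = [30, 42, 38, 51, 55]
flat = [43.2] * 5                        # constant model: 1 free parameter
linear = [33.2, 38.2, 43.2, 48.2, 53.2]  # linear model: 2 free parameters
dc, p = delta_c_pvalue_1dof(cstat(counts, flat), cstat(counts, linear))
```

A small p suggests the extra parameter is warranted; for low-count data the chi-square approximation may be poor, and simulation-based calibration is the safer route.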
The advantage of cstat over Sherpa's implementation of cash is
that one can assign an approximate goodness-of-fit measure to a
given value of the cstat statistic: the observed statistic
divided by the number of degrees of freedom should be of order 1
for good fits.
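A minimal sketch of that check, assuming the number of degrees of freedom is the number of bins used in the fit minus the number of free model parameters. The helper reduced_cstat is hypothetical and only illustrates the arithmetic.

```python
import math


def cstat(data, model):
    # C = 2 * sum_i [M - D + D*(log D - log M)], with D*log(D) -> 0 when D = 0.
    return 2.0 * sum(m - d + (d * (math.log(d) - math.log(m)) if d > 0 else 0.0)
                     for d, m in zip(data, model))


def reduced_cstat(data, model, n_free_params):
    """cstat divided by degrees of freedom (bins minus free parameters)."""
    dof = len(data) - n_free_params
    if dof <= 0:
        raise ValueError("need more bins than free parameters")
    return cstat(data, model) / dof
```

For example, three empty bins fit by a one-parameter model predicting 1.0 count each give C = 6 over 2 degrees of freedom, a ratio of 3.0, well above the value of order 1 expected for a good fit.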
The background should not be subtracted from the data when
this statistic is used. It should be modeled simultaneously
with the source.
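The sketch below contrasts the two approaches. D(i) stays the raw, unsubtracted counts, and the background enters only through the model, M(i) = S(i) + B(i). It assumes any exposure or area scaling of the background model has already been folded into B(i); the helper total_model and all numbers are hypothetical. Subtracting a background estimate instead can yield non-integer or negative values, which are not Poisson counts; this sketch rejects only the negative case explicitly.

```python
import math


def cstat(data, model):
    """cstat for raw counts D(i) and total model M(i) = S(i) + B(i)."""
    total = 0.0
    for d, m in zip(data, model):
        if d < 0:
            # Negative "counts" usually mean the background was subtracted.
            raise ValueError("cstat needs non-negative counts; model the background instead")
        term = m - d
        if d > 0:
            term += d * (math.log(d) - math.log(m))
        total += term
    return 2.0 * total


def total_model(source, background):
    """M(i) = S(i) + B(i), summed bin by bin."""
    return [s + b for s, b in zip(source, background)]


# Hypothetical numbers for illustration only.
raw_counts = [2, 0, 5]               # D(i): source + background counts
source_model = [1.0, 0.5, 3.0]       # S(i)
background_model = [1.5, 1.5, 1.5]   # B(i)
c = cstat(raw_counts, total_model(source_model, background_model))

# Subtracting instead gives [0.5, -1.5, 3.5], which cstat rejects.
subtracted = [d - b for d, b in zip(raw_counts, background_model)]
```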