Calculating Uncertainties by Simulating Flux Distributions
![[CXC Logo]](/ciao/imgs/cxc-logo.gif)
Sherpa Threads (CIAO 4.1)
[Python Syntax]
OverviewLast Update: 04 Aug 2009 - Updated for S-Lang users to take advantage of the new "ciao_utils" scripts module. Synopsis: This thread demonstrates a method of determining flux uncertainties by sampling the parameter values using an uncorrelated, normal distribution. The necessary module is provided as part of the CIAO contributed scripts. Read this thread if:
Use this thread if you are interested in estimating the
uncertainties in your modelled fluxes based on unthawed
parameters.
|
Contents
- Introduction to the Flux Error Script
- Getting Started
- Fitting a Model to the Spectrum
- Estimating Uncertainties by Simulating the Flux Distribution
- Creating a Histogram of a Flux Probability Distribution
- Scripting It
- History
- Images
Introduction to the Flux Error Script
The sherpa_contrib.flux_dist module provides routines for creating and plotting flux probability distributions of Sherpa models—that is, the probability distribution of flux values that account for uncertainties in the model parameter values.
All thawed model parameters are sampled from the Gaussian distribution where the mean is set as the best-fit parameter value and the variance determined by the diagonal elements of the covariance matrix. The univariate Gaussian is assumed by default; treating each parameter independently. If there are correlations between parameters, then the multivariate Gaussian is available, using the off-diagonal elements of the covariance matrix. The flux is calculated for each sampled set of the parameters and recorded. The histogram of the simulated flux values give the flux probability distribution for the assumed model.
The flux_dist module includes the functions:
- sample_energy_flux — sample from the energy flux distribution
- sample_photon_flux — sample from the photon flux distribution
- get_energy_flux_hist — creates histograms of the simulated energy flux values
- get_photon_flux_hist — creates histograms of the simulated photon flux values
- plot_energy_flux — plot the energy flux probability distribution histogram
- plot_photon_flux — plot the photon flux probability distribution histogram
- write_arrays — write out arrays to an ASCII file
A VERSION.CIAO_scripts file is included in the scripts pacakge. This allows you to check if you are working with the newest set of scripts:
unix% cat $ASCDS_CONTRIB/VERSION.CIAO_scripts 04 Aug 2009
The VERSION.CIAO_scripts file is updated when a newer scripts package is installed. Please check that you have at least this version of the scripts package installed before continuing. If you do not have the scripts installed or need to update to a newer version, refer to the Scripts page.
Getting Started
Please follow the Obtaining Data Used in Sherpa thread. In this example, we use the spectral data products for 3C273, obtained within the pha_intro sub-directory.
The flux_dist module is part of the Sherpa contributed scripts module, distributed with the CIAO Contributed Scripts and Modules.
In order to simulate and plot the flux distribution, we must first load the module into Sherpa,which is only necessary to do once per session.
sherpa> from sherpa_contrib.flux_dist import *
Above, we have imported just the flux_dist script; however, if the user would like to do less book keeping, it is possible to load the entire module instead with:
sherpa> from sherpa_contrib import *
Which makes available every Sherpa contributed script to the user.
Fitting a Model to the Spectrum
We start with loading the spectrum, background, ARF and RMF with load_pha; filtering the data to model the 0.5-7.0 keV energy range; and set Sherpa to account for background subtraction in the standard manner.
Choosing χ2 statistics, with variance calculated from the data, we fit an absorbed power-law model to the data using a constant neutral hydrogen column density of 0.07×1022 cm-2.
sherpa> set_stat("chi2datavar") sherpa> set_model(xsphabs.abs1*xspowerlaw.p1) sherpa> abs1.nh = 0.07 sherpa> freeze(abs1.nh) sherpa> fit() Solar Abundance Vector set to angr: Anders E. & Grevesse N. Geochimica et Cosmochimica Acta 53, 197 (1989) Cross Section Table set to bcmc: Balucinska-Church and McCammon, 1998 Dataset = 1 Method = levmar Statistic = chi2datavar Initial fit statistic = 1.2732e+11 Final fit statistic = 36.284 at function evaluation 22 Data points = 42 Degrees of freedom = 40 Probability [Q-value] = 0.638277 Reduced statistic = 0.9071 Change in statistic = 1.2732e+11 p1.phoindex 2.10699 p1.norm 0.000216305
Note that it is possible to do freeze(abs1) instead of freeze(abs1.nh) since the chosen absorption model only contains a single parameter. Generally, freezing and thawing a model component will affect all the parameters associated with the component.
Estimating Uncertainties by Simulating the Flux Distribution
We now simulate the energy flux, between 0.5-7.0 keV, a hundred times using sample_energy_flux, which returns one or more samples of the model flux, accounting for errors in the model parameters. The simulations choose parameters assuming their Gaussian distributions, with the mean set to the best-fit values and the variance given by the covariance matrix for all thawed parameters. The flux is then calculated for these parameters. The results will contain one-hundred realizations of the model and one-hundred values of the energy flux.
sherpa> flux100 = sample_energy_flux(0.5,7.,num=100)
Similarly, we may do this with the photon flux distribution using sample_photon_flux. Examining the structure of the array produced by sample_energy_flux:
sherpa> print flux100
[[ 7.56630763e-13 2.09441912e+00 2.14432623e-04]
[ 7.46841329e-13 2.11006289e+00 2.13906321e-04]
[ 7.13526662e-13 2.14930092e+00 2.09731107e-04]
⋮ ⋮ ⋮
[ 7.65066253e-13 2.16657749e+00 2.27402006e-04]
[ 7.96403279e-13 2.09364438e+00 2.25585502e-04]
[ 7.17811190e-13 2.15738677e+00 2.12098697e-04]]
Determining Statistical Values from the Sample of Flux Values
The attributes of the array are the values of flux, photon spectral index, and normalization factor, respectively. We can operate on these arrays to determine further statistical values using the built-in numpy functions, numpy.mean, numpy.median, and numpy.std from the distribution determined by the flux simulation.
Python arrays are called on as [row,column], so we can examine only the flux elements of the flux100 array by doing:
sherpa> print flux100[:,0] [ 7.56630763e-13 7.46841329e-13 7.13526662e-13 7.97198549e-13 6.37093485e-13 8.33877712e-13 7.36173201e-13 8.13152936e-13 8.29238626e-13 7.30404943e-13 7.03043283e-13 7.88228662e-13 … 7.28575486e-13 7.48274102e-13 7.44486042e-13 7.65517376e-13 6.76801731e-13 8.84112916e-13 7.99789170e-13 7.86414841e-13 6.70051477e-13 7.65066253e-13 7.96403279e-13 7.17811190e-13]
and using the numpy functions to determine the mean, median, and standard deviation of the sample of fluxes, in units of ergs/cm2/sec, we find:
sherpa> fluxes = flux100[:,0]
sherpa> numpy.mean(fluxes)
7.5812781527775597e-13
sherpa> numpy.median(fluxes)
7.5641673475598872e-13
sherpa> numpy.std(fluxes)
5.3887732759596219e-14
Determining Quantile Values of the Flux Sample
By estimating the number of points in the array, which is a known from the number of simulations performed, using the information from the len and numpy sort functions, the quantiles for this sample may be obtained.
First, we sort the fluxes into an 1D array in ascending order:
sherpa> sf = numpy.sort(fluxes) sherpa> print sf [ 6.37093485e-13 6.58812570e-13 6.64288605e-13 6.70051477e-13 6.72560081e-13 6.76801731e-13 6.77313907e-13 6.78709480e-13 6.81933064e-13 6.84235454e-13 6.89644130e-13 6.90935228e-13 … 8.31866393e-13 8.32180044e-13 8.33877712e-13 8.34868379e-13 8.42388706e-13 8.44549464e-13 8.53126347e-13 8.53375204e-13 8.62336808e-13 8.64774894e-13 8.67231430e-13 8.84112916e-13]
and then get the value of the flux for the 95% quantile and 50% quantile, which corresponds to the median. Since we ran a hundred simulations, we know that we're looking for the 95th and 50th elements of the array. But in more generality, the quantiles can be determined using the length function, len, to determine the number of elements in the column.
sherpa> sf[0.95*len(fluxes)-1]
8.5312634724617098e-13
sherpa> sf[0.5*len(fluxes)-1]
7.5620270666523012e-13
Remembering that the array indicies begin at zero, we subtract one from the quantile value.
Creating a Histogram of a Flux Probability Distribution
We can visualize the flux distribution with a histogram created by the get_energy_flux_hist function and check whether the distribution is Gaussian or another shape. We simulate the flux distribution between 0.5-7.0 keV ten-thousand times, to produce good statistics. We use the function, get_energy_flux_hist which produces a data object that contains all the information about the simulated sample of parameters and a histogram, normalized to unity, representing the flux probability distribution. By default, get_energy_flux_hist creates a histogram with 75 bins; however, the optional parameter, bins, may be included to change the binning.
sherpa> hist10000 = get_energy_flux_hist(0.5,7.,num=10000)
Similarly, this may be done in photon units with get_photon_flux_hist. To learn more about the data product produced by get_photon_flux_hist, enter: help get_photon_flux_hist into the Sherpa command-line.
We may display the flux probability distribution histogram determined by and get_energy_flux_hist using plot_energy_flux, which plots the calculated energy flux probability distribution, which is the flux distribution for the model component accounting for the errors on the model parameters (Figure 1).
sherpa> plot_energy_flux(0.5,7.,recalc=False)
[Version: full-size]
![[Print media version: histogram of flux probability distribution produced by plot_energy_flux]](1.png)
Figure 1: Histogram of the Flux Distribution
This is a histogram of the energy flux probability distribution, plotted using plot_energy_flux, which is determined by get_energy_flux_hist.
Determining the Uncertainties from the Flux Distributions
Taking the histogram data array, we replot the histogram as a scatter plot to check the probability distribution with a Gaussian function, fitting the Gaussian to the histogram with least-squares statistics.
sherpa> load_arrays(2,hist10000.xlo,hist10000.xhi,hist10000.y,Data1DInt) sherpa> plot_data(2) sherpa> set_model(2,gauss1d.g0) sherpa> g0.integrate = False sherpa> guess(2,g0) sherpa> g0.ampl.min = 0 sherpa> g0.ampl.max = 10 sherpa> g0.ampl = 1 sherpa> set_stat("leastsq") sherpa> fit(2) Dataset = 2 Method = levmar Statistic = leastsq Initial fit statistic = 1.67883 Final fit statistic = 0.0698581 at function evaluation 21 Data points = 76 Degrees of freedom = 73 Change in statistic = 1.60897 g0.fwhm 1.23978e-13 g0.pos 7.54426e-13 g0.ampl 0.985099
We use load_arrays to load the histogram flux probability distribution into dataset ID=2 where hist10000.xlo and hist10000.xhi are the low and high bins in the histogram and hist10000.y is the probability of getting the flux within the bin.
Plotting this fit, we can see the good agreement between the Gaussian distribution and flux probability distribution (Figure 2).
sherpa> plot_fit(2)
sherpa> set_plot_ylabel("Frequency")
sherpa> set_plot_xlabel("Energy Flux [ergs cm^{-2} s^{-1}]")
[Version: full-size]
![[Print media version: flux probability distribution fitted with a gaussian]](2.png)
Figure 2: Gaussian Fitted Flux Distribution
The energy flux probability distribution replotted as a scatter-plot with a Gaussian function fitted to the data (red).
We can now determine the uncertainty in the modelled flux, by calculating the standard deviation instead of the FWHM of the Gaussian using the relationship: FWHM = σ(8 ln 2)1/2 ≅ 2.35482σ.
sherpa> fac0 = numpy.sqrt(8*numpy.log(2)) sherpa> sig = g0.fwhm.val/fac0 sherpa> print sig 5.26486994283e-14
Since we have generated two distributions of fluxes, we can examine the results of the mean of the flux100 sample and compare it with the mean position of the Gaussian fit to the histogram, g0.pos. We see that the two mean values, <flux100>≅7.581×10-13 ergs cm-2 s-1 and <hist10000>≅7.544×10-13 ergs cm-2 s-1, are approximately the same. The standard deviations similarly, are equivalent to each other, σflux100≅5.389×10-14 ergs cm-2 s-1 and σhist10000≅5.265×10-14 ergs cm-2 s-1. This is expected for normal distributions, as was simulated; however, if the flux distribution were different (e.g. skewed-, bi-modal-, or κ-distributions), then the calculated mean and σ using numpy may be very different. Visualization of the distributions using the histogram is a useful sanity check.
We can write the array to an ASCII file using the write_arrays function, including column headers:
sherpa> write_arrays("hist10000.dat",[hist10000.xlo,hist10000.xhi,hist10000.y],fields=["xlo","xhi","y"])
and reload the simulation results into another session with the load_ascii function:
sherpa> load_ascii("hist10000.dat",ncols=3,dstype=Data1DInt)
for future analysis.
Scripting It
The file, flux_script.py , performs the primary commands used above. It can be execupted by typing execfile("flux_script.py") on the Sherpa command line. The Sherpa script command may be used to save everything typed on the command line in a Sherpa session:
sherpa> script(filename="sherpa.log", clobber=False)
The CXC is committed to helping Sherpa users transition to new syntax as smoothly as possible. If you have existing Sherpa scripts or save files, submit them to us via the CXC Helpdesk and we will provide the CIAO/Sherpa 4.1 syntax to you.
History
| 30 Apr 2009 | New for CIAO 4.1 |
| 04 Aug 2009 | Updated for S-Lang users to take advantage of the new "ciao_utils" scripts module. |
