# Fitting and Estimating Parameter Confidence Limits with Sherpa

### David Van Dyk, Alanna Connors, Taeyoung Park

#### California-Harvard Astro-statistics Collaboration (CHASC)

Fit parameterized models

Compute confidence regions

## Overview

• Introduction to modeling and fitting with Sherpa
• Root Finding: Sherpa’s confidence method
• A Bayesian Approach: MCMC methods in pyBLoCXS
• Non-linear Regression Example
• Using conf
• Naïve sampling of parameter space
• Exploring the posterior distribution with pyBLoCXS

## What can Sherpa do?

• Supports 1-D and 2-D model fitting out of the box. Can be extended to N-D.
• Reads data from a variety of sources; FITS, text columns, even NumPy arrays.
• Includes support for user-defined model functions. Sherpa’s model syntax uses the Python parser coupled with NumPy ufuncs to build up arbitrarily complex model expressions.
• Robust optimization methods; Levenberg-Marquardt, Nelder-Mead simplex, and Monte Carlo/Differential Evolution.
• Confidence methods; conf and covariance compute confidence limits, while interval_projection and region_projection compute 1-D and 2-D confidence regions.
• Hooks for user-defined fit statistics and optimization methods.

## Definition of Confidence Limits

In the case where the fit statistic is normalized correctly, a user-specified uncertainty of $\sigma$ corresponds to $\sigma$ standard-deviation confidence limits for each parameter, taken one by one. Relative to the value at the minimum, $s(\hat{\theta})$, the limit is defined by the change in the fit statistic value,

(1) $\Delta s = s(\theta) - s(\hat{\theta}) = \sigma^2$

For example, 1.6 standard-deviation confidence intervals each map to a change in fit statistic of $1.6^2 = 2.56$.
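A minimal check of this mapping: the change in fit statistic for a $\sigma$-sigma limit is simply $\sigma^2$.

```python
# Map a sigma level to the corresponding change in fit statistic,
# delta_stat = sigma**2, for a correctly normalized statistic.
def delta_stat(sigma):
    return sigma ** 2

for sigma in (1.0, 1.6, 2.0, 3.0):
    print("%.1f sigma -> delta stat = %.2f" % (sigma, delta_stat(sigma)))
```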

Common Sherpa fit statistics include

(2) $\chi^2(\theta) = \sum_i \frac{\left[y_i - M(x_i;\theta)\right]^2}{\sigma_i^2}$

(3) $C(\theta) = 2 \sum_i \left[M(x_i;\theta) - y_i \ln M(x_i;\theta)\right]$ (Cash)

where $y_i$ represents the observed data points; $M(x_i;\theta)$, the calculated model values; $\theta$, the model parameters; and $\sigma_i$, the uncertainties on the observed data points.

• Users can also hook in their own fit statistic function.
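On made-up data, statistics (2) and (3) can be evaluated directly with NumPy (an illustrative sketch, not Sherpa’s implementation; the counts and model values below are invented):

```python
import numpy as np

y = np.array([12., 15., 9., 21.])           # observed counts (made-up)
model = np.array([11., 16., 10., 20.])      # calculated model values
err = np.sqrt(y)                            # sqrt(counts) uncertainties

# Chi-squared: sum of squared, error-weighted residuals
chi2 = np.sum(((y - model) / err) ** 2)

# Cash: Poisson maximum-likelihood statistic, 2*sum(M - y*ln M)
cash = 2.0 * np.sum(model - y * np.log(model))

print(chi2, cash)
```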

## Estimating Confidence

• Find the best-fit parameter values.
• Calculate confidence intervals centered on the best-fit values.
• Near the minimum, the multi-dimensional parameter space can take the shape of an asymmetric paraboloid in each parameter dimension.

## Sherpa Confidence method: A root problem

The confidence intervals can be reduced to a root-solving problem by translating the fit statistic curve down by an amount equal to $\Delta s = \sigma^2$ and finding the points where the translated curve crosses zero.

Parameter space in non-linear problems is not always well behaved...

The current set of parameters maps to a local minimum.
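A toy version of this root problem (pure Python; bisection stands in for Sherpa’s Müller iteration, and the parabolic statistic below is made up):

```python
# Toy 1-D fit statistic: minimum of 10.0 at theta = 3.0, "error" of 0.5
def stat(theta):
    return 10.0 + ((theta - 3.0) / 0.5) ** 2

sigma = 1.0
target = stat(3.0) + sigma ** 2      # translated level: s_min + sigma**2

def bisect(f, lo, hi, tol=1e-10):
    # assumes f(lo) and f(hi) bracket a root (opposite signs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Roots of the translated statistic give the lower/upper limits
g = lambda t: stat(t) - target
lower = bisect(g, 0.0, 3.0)
upper = bisect(g, 3.0, 6.0)
print(lower, upper)                  # 1-sigma limits at 3.0 -/+ 0.5
```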

## A Bayesian Approach

• χ² has limitations – calculating the error using the data variance can overestimate the width of the distribution, while using the model variance can underestimate it.
• Using the Poisson likelihood (e.g. Cash) eliminates the bias typically found with a modified or weighted χ².
• Statistical model for Poisson data (low counts)

(4) $p(\theta \mid D) = \dfrac{p(D \mid \theta)\, p(\theta)}{p(D)}$

where $p(\theta \mid D)$ represents the posterior distribution; $p(D \mid \theta)$, the likelihood; $p(\theta)$, the prior; and $p(D)$ is considered constant.

(5) $p(\theta \mid D, I) = \dfrac{p(D \mid \theta, I)\, p(\theta \mid I)}{p(D \mid I)}$

where $\theta$ represents the model parameters; $D$, the observed data; and $I$, the initial information.
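The posterior can be sketched on a grid for a toy Poisson problem (pure NumPy, flat prior; the counts and grid below are invented, and this is an illustration only — pyBLoCXS samples the posterior via MCMC instead):

```python
import numpy as np

# Toy posterior for a single Poisson rate theta given observed counts
counts = np.array([3, 5, 4])
theta = np.linspace(0.1, 15.0, 500)            # candidate rates

# log-likelihood: sum_i [y_i*ln(theta) - theta] (dropping ln y_i! terms)
loglike = counts.sum() * np.log(theta) - len(counts) * theta
logprior = np.zeros_like(theta)                # flat prior

logpost = loglike + logprior
post = np.exp(logpost - logpost.max())         # unnormalized posterior
post /= post.sum() * (theta[1] - theta[0])     # normalize on the grid

print(theta[np.argmax(post)])                  # mode ~ mean(counts) = 4
```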

## pyBLoCXS

• A Python implementation of a Markov chain Monte Carlo (MCMC) algorithm designed to handle Bayesian Low-Count X-ray Spectral (BLoCXS) analysis with Sherpa
• Explores the entire parameter space around the suspected minimum
• Does not get stuck in a local minimum
• Includes a flexible definition of priors
• Draws parameter proposals using a multivariate t-distribution
• Currently limited to simple models

“Analysis of Energy Spectra with Low Photon Counts via Bayesian Posterior Simulations” - Van Dyk, Connors, Kashyap & Siemiginowska 2001, ApJ. 548, 224

Draw parameters, calculate the likelihood and posterior probability of the “proposed” parameter values given the observed data, and use a Metropolis-Hastings criterion to accept or reject the proposal.

Main MCMC loop

Sampling the posterior probability distribution

• Metropolis-Hastings, MH
• Centered on the best-fit parameters
• Metropolis-Hastings mixed with Metropolis jumping rule, MetropolisMH
• Centered on the current draw of parameters
• Specify a scalar multiple of the covariance matrix
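The acceptance step can be sketched in pure NumPy on a toy 1-D target (a standard normal stands in for the Poisson posterior, and a simple normal jump for pyBLoCXS’s multivariate t proposal — a sketch, not the pyBLoCXS implementation):

```python
import numpy as np

rng = np.random.RandomState(42)

def logpost(theta):
    # Toy log-posterior: standard normal target
    return -0.5 * theta ** 2

niter = 5000
draws = np.empty(niter)
current = 0.0
naccept = 0
for i in range(niter):
    proposal = current + rng.normal(scale=1.0)   # jump from current draw
    # Metropolis criterion: accept with prob min(1, post_new/post_old)
    if np.log(rng.uniform()) < logpost(proposal) - logpost(current):
        current = proposal
        naccept += 1
    draws[i] = current                           # rejected -> repeat current

print("acceptance rate:", naccept / float(niter))
print("mean, std:", draws.mean(), draws.std())
```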

Priors

• Supports prior knowledge, such as the allowed range of parameters
• Priors and associated hyper-parameters are defined by the user
• A non-informative (flat) prior is the default for each parameter
• Parameters can optionally be scaled according to an inverse or log scale

### Example

The Thurber problem is an example of non-linear least squares regression from the Statistical Reference Datasets (StRD) at the National Institute of Standards and Technology (NIST). The observed data result from a NIST study of semiconductor electron mobility. The data include 37 observations, with the dependent variable (y) representing electron mobility and the independent variable (x) the log of the density.

(6) $y = f(x; \beta) + \varepsilon$

(7) $f(x; \beta) = \dfrac{\beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3}{1 + \beta_5 x + \beta_6 x^2 + \beta_7 x^3}$

| Parameter | Certified Value | Sherpa Value | Agreement (%) |
|---|---|---|---|
| b1 | 1.2881396800E+03 | 1.28813971e+03 | 99.999 |
| b2 | 1.4910792535E+03 | 1.49106665e+03 | 99.999 |
| b3 | 5.8323836877E+02 | 5.83229092e+02 | 99.998 |
| b4 | 7.5416644291E+01 | 7.54148565e+01 | 99.998 |
| b5 | 9.6629502864E-01 | 9.66284739e-01 | 99.999 |
| b6 | 3.9797285797E-01 | 3.97967752e-01 | 99.999 |
| b7 | 4.9727297349E-02 | 4.97257372e-02 | 99.997 |

## Fitting using Sherpa Optimization Methods

Read the observed data from the text file using asciitable. Define the model according to the specification and the initial parameter values.

import asciitable

# Read the NIST Thurber data file; the observations begin at line 37.
# (The file name here is assumed; the original argument was lost.)
tbl = asciitable.read('Thurber.dat',
                      data_start=36,
                      delimiter="\s")

# Columns as NumPy arrays
x = tbl['col2']
y = tbl['col1']

p0 = [1000, 1000, 400, 40, 0.7, 0.3, 0.03]

def calc(p, x):
    xx = x**2
    xxx = x**3
    return ((p[0] + p[1]*x + p[2]*xx + p[3]*xxx) /
            (1. + p[4]*x + p[5]*xx + p[6]*xxx))

# define a tolerance
tol = 1.e-9


## Sherpa high level UI

Load the Thurber arrays into a Sherpa data set, indicate least-squares with Levenberg-Marquardt, set the user-defined function as the Sherpa model, and fit.

import sherpa.ui as ui

# Load the Thurber arrays into a Sherpa data set
ui.load_arrays(1, x, y)

names = ['b%i' % (ii+1) for ii in range(len(p0))]

ui.set_stat('leastsq')

ui.set_method('levmar')
ui.set_method_opt('gtol', tol)
ui.set_method_opt('xtol', tol)
ui.set_method_opt('ftol', tol)
ui.set_method_opt('epsfcn', tol)

# Register the user-defined function as a Sherpa model
ui.load_user_model(calc, 'mdl')
ui.add_user_pars('mdl', names, p0)
ui.set_model('mdl')

ui.fit()
popt = ui.get_fit_results().parvals


Sherpa example: API level UI - fitting with user-model

Find the best-fit parameter values using least-squares with a Python function and NumPy array-based API.


import sherpa.optmethods
import numpy

# Fit statistic function returns tuple of statistic value and residuals
def calc_resid(p):
    resid = (y - calc(p, x))
    return (sum(resid**2), resid)

pmax = numpy.finfo(numpy.float32).max
pmin = -pmax

lm = sherpa.optmethods.LevMar()
tol = 1.e-9
lm.config['gtol']=tol
lm.config['xtol']=tol
lm.config['ftol']=tol
lm.config['epsfcn']=tol

# Sherpa's extension of LMDIF includes strict parameter boundaries
def fit(fcn, pars, parmins=None, parmaxes=None):
    pars = numpy.array(pars)
    result = lm.fit(fcn, pars, parmins, parmaxes)
    return result[1]

popt = fit(calc_resid, p0, [pmin]*len(p0), [pmax]*len(p0))


Compare to SciPy leastsq

import scipy.optimize

def calc_resid(p):
    return (y - calc(p, x))

popt = scipy.optimize.leastsq(calc_resid, p0,
                              ftol=tol, xtol=tol,
                              gtol=tol, epsfcn=tol)[0]


## Sherpa Confidence Method

Sherpa’s confidence method employs Müller’s root finding algorithm using a translated 1-D parameter space near the minimum.

• Sherpa procedural UI example
ui.set_stat('cstat')

ui.fit()

ui.conf()

# lower error bars
pmins  = ui.get_conf_results().parmins

# upper error bars
pmaxes = ui.get_conf_results().parmaxes

| Parameter | Best Fit | Lower Bound | Upper Bound |
|---|---|---|---|
| b1 | 1288.12 | -12.1594 | 12.1594 |
| b2 | 1452.67 | -73.3571 | 17.8398 |
| b3 | 557.281 | -7.09913 | 34.3927 |
| b4 | 70.2984 | -10.1567 | 2.42915 |
| b5 | 0.943534 | -0.0575953 | 0.0433009 |
| b6 | 0.387899 | -0.02639 | 0.0199346 |
| b7 | 0.0403176 | -0.0134162 | 0.00914532 |

## Sherpa Covariance Method

To compute the covariance matrix, Sherpa first estimates the information matrix by finite differences, reducing the multi-dimensional problem to a series of 1-D problems. Sherpa then iteratively applies second central differencing with extrapolation (Kass 1987). The covariance matrix follows by inverting the information matrix.

• Sherpa procedural UI example
ui.covar()

# lower error bars
pmins  = ui.get_covar_results().parmins

# upper error bars
pmaxes = ui.get_covar_results().parmaxes

# where pmins == -pmaxes

# Access the covariance matrix
cov = ui.get_covar_results().extra_output

| Parameter | Best Fit | Lower Bound | Upper Bound |
|---|---|---|---|
| b1 | 1288.12 | -12.1594 | 12.1594 |
| b2 | 1452.67 | -55.506 | 55.506 |
| b3 | 557.281 | -39.7166 | 39.7166 |
| b4 | 70.2984 | -7.58595 | 7.58595 |
| b5 | 0.943534 | -0.0471354 | 0.0471354 |
| b6 | 0.387899 | -0.0217024 | 0.0217024 |
| b7 | 0.0403176 | -0.0107599 | 0.0107599 |
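The finite-differencing idea can be sketched in pure NumPy on a toy quadratic statistic (an illustration only, not Sherpa’s Kass-style extrapolation; the quadratic form A and step size h below are made up):

```python
import numpy as np

# Toy chi-square-like statistic with a known quadratic form:
# s(p) = (p - mu)^T A (p - mu); its covariance matrix is inv(A).
A = np.array([[2.0, 0.6],
              [0.6, 1.0]])
mu = np.array([1.0, -0.5])

def stat(p):
    d = p - mu
    return d.dot(A).dot(d)

# Estimate the Hessian at the minimum by second central differences,
# one pair of parameters at a time (cf. reducing N-D to 1-D problems).
h = 1e-4
n = len(mu)
hess = np.empty((n, n))
for i in range(n):
    for j in range(n):
        ei = np.zeros(n)
        ei[i] = h
        ej = np.zeros(n)
        ej[j] = h
        hess[i, j] = (stat(mu + ei + ej) - stat(mu + ei - ej)
                      - stat(mu - ei + ej) + stat(mu - ei - ej)) / (4 * h * h)

# For a chi-square-like statistic the covariance is 2 * inverse(Hessian)
cov = 2.0 * np.linalg.inv(hess)
print(cov)          # should approximate inv(A)
```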

Sherpa example: API level UI - confidence and covariance methods

Utility wrappers to Sherpa API functions provide a convenient Python function and NumPy array-based API.


API interface to confidence

import conf

def calc_chi2(p):
    err = numpy.sqrt(y)
    resid = (y - calc(p, x))/err
    return (sum(resid**2), resid)

pmins, pmaxes = conf.conf(calc_chi2, fit, range(len(p0)), popt)


API interface to covariance

import conf

pmins, pmaxes, cov = conf.covar(calc_chi2, range(len(p0)), popt)

# file conf.py

import sherpa.estmethods
import numpy

import logging
logger = logging.getLogger('sherpa')
logger.setLevel(logging.ERROR)

_min = -numpy.finfo(numpy.float32).max
_max = numpy.finfo(numpy.float32).max
_tol = numpy.finfo(numpy.float32).eps

def conf(stat_cb, fit_cb, parnums, pars,
         parmins=None, parmaxes=None, parhardmins=None, parhardmaxes=None,
         sigma=1, eps=0.01, tol=0.2, maxiters=200, remin=0.01,
         verbose=False, do_parallel=False, numcores=1, openinterval=False):

    xorig = numpy.array(pars)
    mins  = numpy.array([_min]*len(pars))
    maxes = numpy.array([_max]*len(pars))

    if parmins is None: parmins = mins
    if parmaxes is None: parmaxes = maxes
    if parhardmins is None: parhardmins = mins
    if parhardmaxes is None: parhardmaxes = maxes

    def stat(pvals):
        stat, fvec = stat_cb(pvals)
        return stat

    def fit(pvals, pmins, pmaxes, i):
        mu = numpy.array(xorig)
        mu[i] = pvals[i]

        # indices of every parameter except the one being probed
        # (Python 2: range returns a list)
        keepers = range(len(pvals))
        keepers.remove(i)

        current_pars = pvals.take(keepers)
        current_mins = pmins.take(keepers)
        current_maxes = pmaxes.take(keepers)

        def cb(modified_x):
            mu.put(keepers, modified_x)
            return stat_cb(mu)

        # re-fit the remaining parameters with parameter i held fixed
        xopt = fit_cb(cb, current_pars, current_mins, current_maxes)

        mu.put(keepers, xopt)

        return stat(mu)

    get_par_name = lambda i: 'par' + str(i)
    report_progress = lambda i, lower, upper: None

    (mins, maxes, flags,
     nfits, junk) = sherpa.estmethods.confidence(
        pars, parmins, parmaxes, parhardmins, parhardmaxes,
        sigma, eps, tol, maxiters, remin, verbose, parnums,
        stat, fit, report_progress, get_par_name,
        do_parallel, numcores, openinterval)

    return mins, maxes

def covar(stat_cb, parnums, pars,
          parmins=None, parmaxes=None, parhardmins=None, parhardmaxes=None,
          sigma=1, eps=0.01, tol=0.01, maxiters=200, remin=0.01):

    def stat(pvals):
        stat, fvec = stat_cb(pvals)
        return stat

    # covariance does not re-fit, so the fit callback is a no-op
    fit_cb = lambda: None
    report_progress = lambda i, lower, upper: None

    xorig = numpy.array(pars)
    mins  = numpy.array([_min]*len(pars))
    maxes = numpy.array([_max]*len(pars))

    if parmins is None: parmins = mins
    if parmaxes is None: parmaxes = maxes
    if parhardmins is None: parhardmins = mins
    if parhardmaxes is None: parhardmaxes = maxes

    (mins, maxes, flags,
     junk, cov) = sherpa.estmethods.covariance(
        pars, parmins, parmaxes, parhardmins, parhardmaxes,
        sigma, eps, tol, maxiters, remin, parnums,
        stat, fit_cb, report_progress)

    return mins, maxes, cov


## Naïve Approach to Sampling Parameter Space

Draw selected parameters from a uniform distribution while the others remain at their best-fit values. The magnitudes of the colorbar are on a log scale (log-log-likelihood).

import numpy

popt = numpy.array(ui.get_fit_results().parvals)
scales = numpy.sqrt(cov.diagonal())

num = 10000

# Run parameters 1 and 2.
params = [0, 1]

samples = [numpy.random.uniform(val-abs(scale), val+abs(scale), num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T


Log-likelihood density (log scale) by sampling from a uniform distribution

Plotting uniform example

Calculate the log-likelihood on the uniform sample of selected parameters. Remaining parameters are at their best-fit values.


import sherpa.stats
import sherpa.optmethods

_min = -numpy.finfo(numpy.float32).max
_max = numpy.finfo(numpy.float32).max

cstat = sherpa.stats.CStat()
def calc_stat(p):
    stat = cstat.calc_stat(y, calc(p, x), numpy.ones_like(y))[0]
    return (stat, None)

# re-fit with a simplex (Nelder-Mead) optimizer before sampling
nm = sherpa.optmethods.NelderMead()
popt = nm.fit(calc_stat, popt, [_min]*len(p0), [_max]*len(p0))[1]

scales = numpy.sqrt(cov.diagonal())

num = 10000

# Run parameters 1 and 2.
params = [0, 1]

samples = [numpy.random.uniform(val-abs(scale), val+abs(scale), num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T

def calc_likelihood(p):
    return -0.5 * calc_stat(p)[0]

others = list(set(range(len(popt))) - set(params))
bestfits = numpy.repeat(popt.take(others)[:, numpy.newaxis], num, axis=1).T
samples = numpy.concatenate([samples, bestfits], axis=1)
likelihood = numpy.asarray(map(calc_likelihood, samples))

import pylab
pylab.scatter(samples[:, params[0]], samples[:, params[1]],
              c=-1.*numpy.log(-1.*likelihood), cmap=pylab.cm.jet)


Draw selected parameters from a normal distribution assuming no correlations. The other parameter values are kept at their best-fit values. The magnitudes of the colorbar are on a log scale (log-log-likelihood).

samples = [numpy.random.normal(val, scale, num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T


Log-likelihood density (log scale) using a normal distribution (uncorrelated)

Plotting normal example

Calculate the log-likelihood on the normal sample of selected parameters. Remaining parameters are at their best-fit values.


samples = [numpy.random.normal(val, scale, num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T

samples = numpy.concatenate([samples, bestfits], axis=1)
likelihood = numpy.asarray(map(calc_likelihood, samples))

import pylab
pylab.scatter(samples[:, params[0]], samples[:, params[1]],
              c=-1.*numpy.log(-1.*likelihood), cmap=pylab.cm.jet)


Draw selected parameters from a multivariate normal distribution using the covariance matrix. The other parameter values are kept at their best-fit values. The magnitudes of the colorbar are on a log scale (log-log-likelihood).

samples = (popt.take(params) +
           numpy.random.multivariate_normal(numpy.zeros_like(popt),
                                            cov, num).take(params, axis=1))


Log-likelihood density (log scale) using multivariate normal distribution (correlated)

Plotting a multivariate normal example

Calculate the log-likelihood on the multivariate normal sample of selected parameters using the covariance matrix. Remaining parameters are at their best-fit values.


samples = (popt.take(params) +
           numpy.random.multivariate_normal(numpy.zeros_like(popt),
                                            cov, num).take(params, axis=1))
samples = numpy.concatenate([samples, bestfits], axis=1)
likelihood = numpy.asarray(map(calc_likelihood, samples))

import pylab
pylab.scatter(samples[:, params[0]], samples[:, params[1]],
              c=-1.*numpy.log(-1.*likelihood), cmap=pylab.cm.jet)


## pyBLoCXS Example

Calculate draws by probing the posterior probability using Metropolis-Hastings. Assumes a flat prior for each parameter.

import pyblocxs

pyblocxs.ui = ui

ui.set_stat('cash')
ui.set_method('simplex')
ui.fit()
ui.covar()

pyblocxs.set_sampler('MH')
stats, accept, params = pyblocxs.get_draws(niter=1e4)


Plot a trace of draws versus iteration. Rejected draws are shown as gaps in the curve.

pyblocxs.plot_trace(params[0], 'b1')


A trace plot shows the draws of b1 per iteration

## Metropolis-Hastings

Plot the log-likelihood density using the Sherpa Cash statistic and pyBLoCXS with Metropolis-Hastings.

import pylab
pylab.figure()
pylab.scatter(params[0], params[1], c=stats, cmap=pylab.cm.jet)


Log-likelihood density using Metropolis-Hastings in pyBLoCXS

## Metropolis-Hastings Mixed with Metropolis

Plot the log-likelihood density using the Sherpa Cash statistic and pyBLoCXS with Metropolis-Hastings mixed with Metropolis. Jumps occur from the current parameter draw. pyBLoCXS assumes a flat prior for each parameter; user-defined priors can be set up using pyblocxs.set_prior().

pyblocxs.set_sampler('MetropolisMH')
stats, accept, params = pyblocxs.get_draws(niter=1e4)

pylab.figure()
pylab.scatter(params[0], params[1], c=stats, cmap=pylab.cm.jet)


Log-likelihood density using Metropolis and Metropolis-Hastings in pyBLoCXS

## pyBLoCXS Confidence

Use the parameter draws to calculate summary statistics, quantiles, and plot the CDF.

print("Mean: {0}".format(numpy.mean(params[0])))
print("Median: {0}".format(numpy.median(params[0])))
print("Std Dev: {0}".format(numpy.std(params[0])))

sp = numpy.sort(params[0])

print("95% quantile: {0}".format(sp[int(0.95*(len(sp)-1))]))
print("50% quantile: {0}".format(sp[int(0.50*(len(sp)-1))]))

pylab.figure()
pyblocxs.plot_cdf(params[0], 'b1')


Cumulative distribution, quantiles, and median for b1
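Equivalently, quantiles can be taken with numpy.percentile (a NumPy-only sketch in which synthetic normal draws stand in for the b1 chain):

```python
import numpy as np

rng = np.random.RandomState(0)
# Synthetic stand-in for the b1 parameter draws
draws = rng.normal(loc=1288.0, scale=12.0, size=10000)

print(np.percentile(draws, 50))    # median
print(np.percentile(draws, 95))    # 95% quantile

# A central 68% interval (1-sigma-like) straight from the draws
lo, hi = np.percentile(draws, [16, 84])
print(lo, hi)
```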

Bin up the parameter draws and plot the PDF.

pylab.figure()
pyblocxs.plot_pdf(params[0], 'b1')


Probability density for b1

## Multiple Modes

Multi-modal log-likelihood density for a spectral fit.

## Conclusions

• The multi-dimensional parameter space is not always well behaved, so Sherpa provides options to explore its topology.
• Sherpa’s root-finding method conf can estimate fit parameter confidence limits.
• pyBLoCXS includes methods to efficiently probe the posterior probability distribution. Confidence limits and summary statistics follow once the parameter distribution is known.

## Future Work

• Sherpa
• Improve distribution methods; building from source and package management (Ubuntu debs, Red Hat RPMs, Mac disk images).
• Provide additional access for Python community users; repository on github
• pyBLoCXS
• Include support for instrumental uncertainties (non-linear) within the MCMC loop.
• Continue development and testing - include additional distributions beyond Student’s t.

Copyright Smithsonian Astrophysical Observatory (SAO)