SciPy 2011

July 13

Fitting and Estimating Parameter Confidence Limits with Sherpa

Brian Refsdal, Stephen Doe, Dan Nguyen, Aneta Siemiginowska, Vinay Kashyap

Harvard-Smithsonian Center for Astrophysics

Chandra X-ray Center (CXC)

David Van Dyk, Alanna Connors, Taeyoung Park

California-Harvard Astro-statistics Collaboration (CHASC)





[Figure: Fit parameterized models]

[Figure: Compute confidence regions]





Overview

  • Introduction to modeling and fitting with Sherpa
  • Root Finding: Sherpa’s confidence method
  • A Bayesian Approach: MCMC methods in pyBLoCXS
  • Non-linear Regression Example
    • Using conf
    • Naïve sampling of parameter space
    • Exploring the posterior distribution with pyBLoCXS






[Figure: Sherpa logo]


What can Sherpa do?


  • Supports 1-D and 2-D model fitting out of the box, and can be extended to N-D.
  • Reads data from a variety of sources: FITS files, text columns, even NumPy arrays.
  • Includes support for user-defined model functions. Sherpa’s model syntax uses the Python parser coupled with NumPy ufuncs to build up arbitrarily complex model expressions (see the sketch after this list).
  • Robust optimization methods: Levenberg-Marquardt, Nelder-Mead simplex, and Monte Carlo/Differential Evolution.
  • Confidence methods: conf and covariance compute confidence limits, while interval_projection and region_projection compute 1-D and 2-D confidence regions.
  • Hooks for user-defined fit statistics and optimization methods.
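For instance, the model syntax lets expressions be composed from Sherpa's built-in models; a minimal sketch using the high-level UI:

import sherpa.ui as ui

# Sherpa parses the expression with the Python parser and evaluates it
# element-wise with NumPy ufuncs; "line" and "continuum" become named
# model components
ui.set_model("gauss1d.line + polynom1d.continuum")
print(ui.get_model_component("line"))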





Definition of Confidence Limits

When the \chi^2 fit statistic is correctly normalized, a user-specified uncertainty of \sigma = 1 corresponds to one standard-deviation confidence limits on each parameter taken one at a time. The quantity \sigma^2 is defined as the change in the fit statistic value relative to its value at the minimum.

(1)\sigma^2 = \text{Stat} - \text{Stat}_\text{minimum}

For example, a 1.6 standard-deviation confidence interval maps to a change in the fit statistic of 1.6^2 = 2.56.

Common Sherpa fit statistics include

(2)F = \chi^2 = \sum_i \frac{(x_i - y_i(\vec{a}))^2}{e_i^2}

(3)F = \text{C-Statistic} = 2 \sum_i \left[ y_i(\vec{a}) - x_i + x_i \log\left(\frac{x_i}{y_i(\vec{a})}\right) \right]

Where x_i represents the observed data points; y_i, the calculated model values; \vec{a}, the model parameters; and e_i^2, the variances (squared uncertainties) of the observed data points.

  • Users can also hook in their own fit statistic function.
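As a plain-NumPy sketch (not Sherpa's internal implementation), the two statistics above can be written directly from equations (2) and (3), with x the observed data, y the model values, and e the uncertainties:

import numpy

def chi2(x, y, e):
    # Equation (2): squared residuals weighted by the variances
    return numpy.sum(((x - y) / e)**2)

def cstat(x, y):
    # Equation (3): Poisson likelihood ratio statistic for counts data
    return 2.0 * numpy.sum(y - x + x * numpy.log(x / y))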





Estimating Confidence

  • Find the best-fit parameter values.
  • Calculate confidence intervals centered on the best-fit values.
  • Near the minimum, the multi-dimensional fit statistic surface is approximately paraboloidal, but may be asymmetric along each parameter dimension.





Sherpa Confidence method: A root problem

The confidence intervals can be reduced to a root solving problem by translating the y-axis by an amount equal to \sigma^2 and selecting points along the fit statistic curve.
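To make the translation concrete, here is a minimal sketch on a toy 1-D statistic curve, using SciPy's brentq in place of Sherpa's Müller solver; the stat function and its minimum are illustrative only:

import numpy
from scipy.optimize import brentq

# Toy 1-D fit statistic with its minimum at p = 2.0
def stat(p):
    return 10.0 + ((p - 2.0) / 0.5)**2

stat_min = stat(2.0)
sigma = 1.0

# Shift the curve down by stat_min + sigma**2; the confidence limits
# are the roots of the shifted curve
f = lambda p: stat(p) - stat_min - sigma**2
lower = brentq(f, -10.0, 2.0)   # 1.5
upper = brentq(f, 2.0, 10.0)    # 2.5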

[Figure: Parameter space in non-linear problems is not always well behaved...]

[Figure: The current set of parameters maps to a local minimum.]






A Bayesian Approach

  • \chi^2 has limitations: weighting by the data variance can overestimate the width of the parameter distribution, while weighting by the model variance can underestimate it.
  • Using the Poisson likelihood (e.g. Cash) eliminates the bias typically found with a modified or weighted \chi^2.
  • Statistical model for Poisson data (low counts)

(4)p(\theta|d, I) = \frac{p(d|\theta, I) p(\theta|I)}{p(d|I)}

Where p(\theta|d, I) represents the posterior distribution; p(d|\theta, I), the likelihood; p(\theta|I), the prior; and p(d|I), the normalization, which is constant for a given data set.





(5)p(\theta|d, I) \propto p(d|\theta, I) p(\theta|I)

Where \theta represents the model parameters; d, the observed data; and I, the initial information.
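A minimal sketch of the unnormalized log posterior for Poisson data under a flat prior; model_func and theta stand in for a user model and its parameters:

import numpy

def log_likelihood(d, model):
    # Poisson log-likelihood, dropping the data-only log(d!) term
    return numpy.sum(d * numpy.log(model) - model)

def log_posterior(theta, d, x, model_func):
    # With a flat prior the log posterior equals the log likelihood
    # up to an additive constant
    return log_likelihood(d, model_func(theta, x))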






pyBLoCXS

  • A Python implementation of a Markov chain Monte Carlo (MCMC) algorithm designed to handle Bayesian Low-Count X-ray Spectral (BLoCXS) analysis with Sherpa
  • Explores entire parameter space at the suspected minimum
    • Does not get stuck in a local minimum
  • Includes a flexible definition of priors
  • Draws parameter proposals using a multivariate t-distribution
  • Currently limited to simple models

“Analysis of Energy Spectra with Low Photon Counts via Bayesian Posterior Simulations” - Van Dyk, Connors, Kashyap & Siemiginowska 2001, ApJ, 548, 224

Draw parameters, calculate the likelihood and posterior probability of the “proposed” parameter values given the observed data, and use a Metropolis-Hastings criterion to accept or reject the proposal.
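The loop can be sketched in a few lines of generic random-walk Metropolis; note that pyBLoCXS draws proposals from a multivariate t-distribution rather than the Gaussian used here:

import numpy

def metropolis(log_post, theta0, cov, niter=10000):
    theta = numpy.asarray(theta0, dtype=float)
    lp = log_post(theta)
    draws = [theta.copy()]
    naccept = 0
    for _ in range(niter):
        # Propose new parameters centered on the current draw
        proposal = numpy.random.multivariate_normal(theta, cov)
        lp_prop = log_post(proposal)
        # Metropolis acceptance criterion (symmetric proposal)
        if numpy.log(numpy.random.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            naccept += 1
        draws.append(theta.copy())
    return numpy.asarray(draws), naccept / float(niter)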

[Figure: Main MCMC loop]






Sampling the posterior probability distribution

  • Metropolis-Hastings, MH
    • Centered on the best-fit parameters
  • Metropolis-Hastings mixed with Metropolis jumping rule, MetropolisMH
    • Centered on the current draw of parameters
    • Specify a scalar multiple of the covariance matrix

Priors

  • Supports prior knowledge, such as the allowed range of a parameter
  • Priors and associated hyper-parameters are defined by the user (see the sketch after this list)
  • A non-informative (flat) prior is the default for each parameter
  • Parameters can be optionally scaled according to an inverse or log scale
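A user-defined prior can be attached to a model parameter with pyblocxs.set_prior. A minimal sketch, assuming set_prior accepts a Sherpa parameter object and a callable, and using the user model mdl (with parameter b1) from the example below; the Gaussian hyper-parameters are purely illustrative:

import numpy
import pyblocxs

# Hypothetical Gaussian prior on the first Thurber parameter
def b1_prior(value):
    return numpy.exp(-0.5 * ((value - 1288.0) / 50.0)**2)

# Attach the prior to the parameter of the user model defined below
pyblocxs.set_prior(mdl.b1, b1_prior)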





Example

The Thurber problem is an example of non-linear least-squares regression from the Statistical Reference Datasets (StRD) at the National Institute of Standards and Technology (NIST). The observed data come from a NIST study of semiconductor electron mobility. The data include 37 observations, with electron mobility as the dependent variable (y) and the log of the density as the independent variable (x).

(6)y = f(x; \beta) + \epsilon
  = \frac{\beta_1 + \beta_2x + \beta_3x^2 + \beta_4x^3}{1 + \beta_5x + \beta_6x^2 + \beta_7x^3} + \epsilon

(7)\vec{p} = \{ \beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6, \beta_7 \}

The best-fit parameters for the Thurber problem.

Parameter   Certified Value     Sherpa Value      Agreement (%)
\beta_1     1.2881396800E+03    1.28813971e+03    99.999
\beta_2     1.4910792535E+03    1.49106665e+03    99.999
\beta_3     5.8323836877E+02    5.83229092e+02    99.998
\beta_4     7.5416644291E+01    7.54148565e+01    99.998
\beta_5     9.6629502864E-01    9.66284739e-01    99.999
\beta_6     3.9797285797E-01    3.97967752e-01    99.999
\beta_7     4.9727297349E-02    4.97257372e-02    99.997





Python Packages

Fitting using Sherpa Optimization Methods

Read the observed data from the text file using asciitable. Define the model according to the specification and the initial parameter values.

import asciitable

tbl = asciitable.read('Thurber.dat',
                      Reader=asciitable.NoHeader,
                      data_start=36,
                      delimiter="\s")

# Columns as NumPy arrays
x = tbl['col2']
y = tbl['col1']

p0 = [1000, 1000, 400, 40, 0.7, 0.3, 0.03]

def calc(p, x):
    xx = x**2
    xxx = x**3
    return ( (p[0] + p[1]*x + p[2]*xx + p[3]*xxx) /
             (1. + p[4]*x + p[5]*xx + p[6]*xxx) )

# define a tolerance
tol = 1.e-9

Sherpa high-level UI

Load the Thurber arrays into a Sherpa data set, indicate least-squares with Levenberg-Marquardt, set the user-defined function as the Sherpa model, and fit.

import sherpa.ui as ui

names = ['b%i' % (ii+1) for ii in range(len(p0))]

ui.load_arrays(1, x, y, ui.Data1D)
ui.set_stat('leastsq')

ui.set_method('levmar')
ui.set_method_opt('gtol', tol)
ui.set_method_opt('xtol', tol)
ui.set_method_opt('ftol', tol)
ui.set_method_opt('epsfcn', tol)

ui.load_user_model(calc, 'mdl')
ui.add_user_pars('mdl', names, p0)
ui.set_model('mdl')

ui.fit()
popt = ui.get_fit_results().parvals

Sherpa example: API level UI - fitting with user-model

Find the best-fit parameter values using least-squares with a Python function and NumPy array-based API.


import sherpa.optmethods
import numpy

# Fit statistic function returns tuple of statistic value and residuals
def calc_resid(p):
    resid = (y - calc(p, x))
    return (sum(resid**2), resid)

pmax = numpy.finfo(numpy.float32).max
pmin = -pmax

lm = sherpa.optmethods.LevMar()
tol = 1.e-9
lm.config['gtol']=tol
lm.config['xtol']=tol
lm.config['ftol']=tol
lm.config['epsfcn']=tol

# Sherpa's extension of LMDIF includes strict parameter boundaries
def fit(fcn, pars, parmins=None, parmaxes=None):
    pars = numpy.array(pars)
    result = lm.fit(fcn, pars, parmins, parmaxes)
    return result[1]

popt = fit(calc_resid, p0, [pmin]*len(p0), [pmax]*len(p0))

Compare to SciPy leastsq

import scipy.optimize

def calc_resid(p):
    return (y - calc(p, x))

popt = scipy.optimize.leastsq(calc_resid, p0,
                              ftol = tol, xtol = tol,
                              gtol = tol, epsfcn = tol)[0]
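Whichever route is taken, the fitted parameters can be checked against the NIST certified values listed in the table above:

import numpy

certified = numpy.array([1.2881396800e+03, 1.4910792535e+03,
                         5.8323836877e+02, 7.5416644291e+01,
                         9.6629502864e-01, 3.9797285797e-01,
                         4.9727297349e-02])

# Relative deviation of each fitted parameter from the certified value
print(numpy.abs(popt - certified) / numpy.abs(certified))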

Sherpa Confidence Method

Sherpa’s confidence method applies Müller’s root-finding algorithm to the translated 1-D fit statistic curve near the minimum.

  • Sherpa procedural UI example
ui.set_stat('cstat')
ui.set_method('neldermead')

ui.fit()

ui.conf()

# lower error bars
pmins  = ui.get_conf_results().parmins

# upper error bars
pmaxes = ui.get_conf_results().parmaxes

The 1 \sigma confidence limits for the Thurber problem.

Parameter   Best Fit     Lower Bound    Upper Bound
\beta_1     1288.12      -12.1594       12.1594
\beta_2     1452.67      -73.3571       17.8398
\beta_3     557.281      -7.09913       34.3927
\beta_4     70.2984      -10.1567       2.42915
\beta_5     0.943534     -0.0575953     0.0433009
\beta_6     0.387899     -0.02639       0.0199346
\beta_7     0.0403176    -0.0134162     0.00914532

Sherpa Covariance Method

To compute the covariance matrix, Sherpa first estimates the information matrix by finite differences, reducing the multi-dimensional problem to a series of 1-D problems and iteratively applying second central differences with extrapolation (Kass 1987). The covariance matrix then follows by inverting the information matrix.
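A bare-bones sketch of the same idea, using plain (non-iterated) second central differences; Sherpa's implementation additionally extrapolates over step sizes:

import numpy

def covar_estimate(stat, popt, h=1.0e-4):
    # Hessian of the fit statistic by second central differences
    n = len(popt)
    hess = numpy.zeros((n, n))
    steps = h * numpy.maximum(numpy.abs(popt), 1.0)
    for i in range(n):
        for j in range(n):
            ei = numpy.zeros(n)
            ej = numpy.zeros(n)
            ei[i] = steps[i]
            ej[j] = steps[j]
            hess[i, j] = (stat(popt + ei + ej) - stat(popt + ei - ej) -
                          stat(popt - ei + ej) + stat(popt - ei - ej)) / \
                         (4.0 * steps[i] * steps[j])
    # For chi^2-like statistics the information matrix is half the
    # Hessian; the covariance matrix is its inverse
    return numpy.linalg.inv(0.5 * hess)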

  • Sherpa procedural UI example
ui.covar()

# lower error bars
pmins  = ui.get_covar_results().parmins

# upper error bars
pmaxes = ui.get_covar_results().parmaxes

# where pmins == -pmaxes

# Access the covariance matrix
cov = ui.get_covar_results().extra_output

The 1 \sigma covariance results for the Thurber problem.

Parameter   Best Fit     Lower Bound    Upper Bound
\beta_1     1288.12      -12.1594       12.1594
\beta_2     1452.67      -55.506        55.506
\beta_3     557.281      -39.7166       39.7166
\beta_4     70.2984      -7.58595       7.58595
\beta_5     0.943534     -0.0471354     0.0471354
\beta_6     0.387899     -0.0217024     0.0217024
\beta_7     0.0403176    -0.0107599     0.0107599

Sherpa example: API level UI - confidence and covariance methods

Utility wrappers to Sherpa API functions provide a convenient Python function and NumPy array-based API.


API interface to confidence

import conf

def calc_chi2(p):
    err = numpy.sqrt(y)
    resid = (y - calc(p, x))/err
    return (sum(resid**2), resid)

pmins, pmaxes = conf.conf(calc_chi2, fit, range(len(p0)), popt)

API interface to covariance

import conf

pmins, pmaxes, cov = conf.covar(calc_chi2, range(len(p0)), popt)
# file conf.py

import sherpa.estmethods
import numpy

import logging
logger = logging.getLogger('sherpa')
logger.setLevel(logging.ERROR)

_min = -numpy.finfo(numpy.float32).max
_max = numpy.finfo(numpy.float32).max
_tol = numpy.finfo(numpy.float32).eps

def conf(stat_cb, fit_cb, parnums, pars,
         parmins=None, parmaxes=None, parhardmins=None, parhardmaxes=None,
         sigma=1, eps=0.01, tol=0.2, maxiters=200, remin=0.01,
         verbose=False, do_parallel=False, numcores=1, openinterval=False):

    xorig = numpy.array(pars)
    mins  = numpy.array([_min]*len(pars))
    maxes = numpy.array([_max]*len(pars))

    if parmins is None: parmins = mins
    if parmaxes is None:  parmaxes = maxes
    if parhardmins is None: parhardmins = mins
    if parhardmaxes is None: parhardmaxes = maxes

    def stat(pvals):
        stat, fvec = stat_cb(pvals)
        return stat

    def fit(pvals, pmins, pmaxes, i):
        mu = numpy.array(xorig)
        mu[i] = pvals[i]

        keepers = list(range(len(pvals)))
        keepers.remove(i)

        current_pars = pvals.take(keepers)
        current_mins = pmins.take(keepers)
        current_maxes = pmaxes.take(keepers)

        def cb(modified_x):
            mu.put(keepers, modified_x)
            return stat_cb(mu)

        xopt = fit_cb(cb, current_pars, current_mins, current_maxes)

        mu.put(keepers, xopt)

        return stat(mu)

    get_par_name = lambda i : 'par' + str(i)
    report_progress = lambda i, lower, upper : None
    (mins, maxes, flags,
     nfits, junk) = sherpa.estmethods.confidence(
        pars, parmins, parmaxes, parhardmins, parhardmaxes,
        sigma, eps, tol, maxiters, remin, verbose, parnums,
        stat, fit, report_progress, get_par_name,
        do_parallel, numcores, openinterval)

    return mins, maxes

def covar(stat_cb, parnums, pars,
          parmins=None, parmaxes=None, parhardmins=None, parhardmaxes=None,
          sigma=1, eps=0.01, tol=0.01, maxiters=200, remin=0.01):

    def stat(pvals):
        stat, fvec = stat_cb(pvals)
        return stat

    fit_cb = lambda : None
    report_progress = lambda i, lower, upper : None

    xorig = numpy.array(pars)
    mins  = numpy.array([_min]*len(pars))
    maxes = numpy.array([_max]*len(pars))

    if parmins is None: parmins = mins
    if parmaxes is None: parmaxes = maxes
    if parhardmins is None: parhardmins = mins
    if parhardmaxes is None: parhardmaxes = maxes

    (mins, maxes, flags,
     junk, cov) = sherpa.estmethods.covariance(
        pars, parmins, parmaxes, parhardmins, parhardmaxes,
        sigma, eps, tol, maxiters, remin, parnums,
        stat, fit_cb, report_progress)

    return mins, maxes, cov

Naïve Approach to Sampling Parameter Space

Draw selected parameters from a uniform distribution while the others remain at their best-fit values. The magnitudes of the colorbar are on a log scale (log-log-likelihood).

import numpy

popt = numpy.array(ui.get_fit_results().parvals)
scales = numpy.sqrt(cov.diagonal())

num = 10000

# Run parameters 1 and 2.
params = [0, 1]

samples = [numpy.random.uniform(val-abs(scale), val+abs(scale), num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T

[Figure: Log-likelihood density (log scale) by sampling from a uniform distribution]

Plotting uniform example

Calculate the log-likelihood on the uniform sample of selected parameters. Remaining parameters are at their best-fit values.


import sherpa.stats

_min = -numpy.finfo(numpy.float32).max
_max = numpy.finfo(numpy.float32).max

cstat = sherpa.stats.CStat()
def calc_stat(p):
    stat = cstat.calc_stat(y, calc(p, x), numpy.ones_like(y))[0]
    return (stat, None)

nm = sherpa.optmethods.NelderMead()
popt = nm.fit(calc_stat, popt, [_min]*len(p0), [_max]*len(p0))[1]

scales = numpy.sqrt(cov.diagonal())

num = 10000

# Run parameters 1 and 2.
params = [0, 1]

samples = [numpy.random.uniform(val-abs(scale), val+abs(scale), num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T

def calc_likelihood(p):
    return -0.5 * calc_stat(p)[0]

others = list(set(range(len(popt))) - set(params))
bestfits = numpy.repeat(popt.take(others)[:,numpy.newaxis], num, axis=1).T
samples = numpy.concatenate([samples, bestfits], axis=1)
likelihood = numpy.asarray(list(map(calc_likelihood, samples)))

import pylab
pylab.scatter(samples[:,params[0]], samples[:,params[1]],
              c=-1.*numpy.log(-1.*likelihood), cmap=pylab.cm.jet)




Draw selected parameters from a normal distribution assuming no correlations. The other parameter values are kept at their best-fit values. The magnitudes of the colorbar are on a log scale (log-log-likelihood).

samples = [numpy.random.normal(val, scale, num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T

[Figure: Log-likelihood density (log scale) using a normal distribution (uncorrelated)]

Plotting normal example

Calculate the log-likelihood on the normal sample of selected parameters. Remaining parameters are at their best-fit values.


samples = [numpy.random.normal(val, scale, num)
           for val, scale in zip(popt.take(params), scales.take(params))]
samples = numpy.asarray(samples).T

samples = numpy.concatenate([samples, bestfits], axis=1)
likelihood = numpy.asarray(list(map(calc_likelihood, samples)))

import pylab
pylab.scatter(samples[:,params[0]], samples[:,params[1]],
              c=-1.*numpy.log(-1.*likelihood), cmap=pylab.cm.jet)




Draw selected parameters from a multivariate normal distribution using the covariance matrix. The other parameter values are kept at their best-fit values. The magnitudes of the colorbar are on a log scale (log-log-likelihood).

samples = (popt.take(params) +
           numpy.random.multivariate_normal(numpy.zeros_like(popt),
                                            cov, num).take(params, axis=1))

[Figure: Log-likelihood density (log scale) using a multivariate normal distribution (correlated)]

Plotting a multivariate normal example

Calculate the log-likelihood on the multivariate normal sample of selected parameters using the covariance matrix. Remaining parameters are at their best-fit values.


samples = (popt.take(params) +
           numpy.random.multivariate_normal(numpy.zeros_like(popt),
                                            cov, num).take(params, axis=1))
samples = numpy.concatenate([samples, bestfits], axis=1)
likelihood = numpy.asarray(list(map(calc_likelihood, samples)))

import pylab
pylab.scatter(samples[:,params[0]], samples[:,params[1]],
              c=-1.*numpy.log(-1.*likelihood), cmap=pylab.cm.jet)

pyBLoCXS Example

Calculate draws by probing the posterior probability using Metropolis-Hastings. Assumes a flat prior for each parameter.

import pyblocxs

pyblocxs.ui = ui

ui.set_stat('cash')
ui.set_method('simplex')
ui.fit()
ui.covar()

pyblocxs.set_sampler('MH')
stats, accept, params = pyblocxs.get_draws(niter=10000)
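The accept array can be used to monitor the sampler; a quick check, assuming one acceptance flag per iteration:

# Very low or very high rates suggest rescaling the proposal distribution
print("Acceptance rate: {0}".format(numpy.mean(accept)))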

Plot a trace of \beta_1 draws versus iteration. Rejected draws are shown as gaps in the curve.

pyblocxs.plot_trace(params[0], 'b1')

[Figure: A trace plot showing the draws of \beta_1 per iteration]





Metropolis-Hastings

Plot the log-likelihood density using the Sherpa Cash statistic and pyBLoCXS with Metropolis-Hastings.

import pylab
pylab.figure()
pylab.scatter(params[0], params[1], c=stats, cmap=pylab.cm.jet)

[Figure: Log-likelihood density using Metropolis-Hastings in pyBLoCXS]





Metropolis-Hastings Mixed with Metropolis

Plot the log-likelihood density using the Sherpa Cash statistic and pyBLoCXS with Metropolis-Hastings mixed with Metropolis. Jumps occur from the current parameter draw. pyBLoCXS assumes a flat prior for each parameter; user-defined priors can be set up using pyblocxs.set_prior().

pyblocxs.set_sampler('MetropolisMH')
stats, accept, params = pyblocxs.get_draws(niter=10000)

pylab.figure()
pylab.scatter(params[0], params[1], c=stats, cmap=pylab.cm.jet)

[Figure: Log-likelihood density using Metropolis and Metropolis-Hastings in pyBLoCXS]

pyBLoCXS Confidence

Use the parameter draws to calculate summary statistics, quantiles, and plot the CDF.

print("Mean: {0}".format(numpy.mean(params[0])))
print("Median: {0}".format(numpy.median(params[0])))
print("Std Dev: {0}".format(numpy.std(params[0])))

sp = numpy.sort(params[0])

print("95% quantile: {0}".format(sp[0.95*(len(params[0])-1)]))
print("50% quantile: {0}".format(sp[0.50*(len(params[0])-1)]))

pylab.figure()
pyblocxs.plot_cdf(params[0], 'b1')

[Figure: Cumulative density, quantiles, and median for \beta_1]



Bin up the parameter draws and plot the PDF.

pylab.figure()
pyblocxs.plot_pdf(params[0], 'b1')

[Figure: Probability density for \beta_1]

Multiple Modes

[Figure: Multi-modal log-likelihood density for a spectral fit]

Conclusions

  • Multi-dimensional parameter space is not always well behaved, so Sherpa provides options to explore its topology.
  • Sherpa’s root-finding method conf can estimate fit parameter confidence limits.
  • pyBLoCXS includes methods to efficiently probe the posterior probability distribution. Confidence limits and summary statistics follow once the parameter distribution is known.

Future Work

  • Sherpa
    • Improve distribution: building from source and package management (Ubuntu debs, Red Hat RPMs, Mac disk images).
    • Provide additional access for the Python community: a repository on GitHub.
  • pyBLoCXS
    • Include support for instrumental uncertainties (non-linear) within the MCMC loop.
    • Continue development and testing; include additional distributions beyond Student’s t.

Platforms

  • Copyright Smithsonian Astrophysical Observatory (SAO)

  • GPL v2 License

  • Supported Platforms

    • Linux - CentOS, Fedora, Ubuntu

    • Mac OS X 10.5, 10.6