Ska Analysis Tutorial

Overview

This document gives a tutorial introduction to analysis using the Ska analysis environment, including basic configuration and a number of examples.

The Ska analysis environment consists of a full-featured Python installation with interactive analysis, plotting and numeric capability along with the Ska engineering telemetry archive.

The telemetry archive consists of:

  • Tools to ingest and compress telemetry from the CXC Chandra archive products.
  • Compressed telemetry files in HDF5 format. Each MSID has three associated products:
    • Full time-resolution data: time, value, quality
    • 5-minute statistics: min, max, mean, sampled value, number of samples
    • Daily statistics: min, max, mean, sampled value, standard deviation, percentiles (1, 5, 16, 50, 84, 95, 99), number of samples.

Configure

The first requirement is to make sure that your local machine is set up to run X11. On a Microsoft windows PC that probably means having Cygwin/X installed and running. On a Mac you will need X11 installed. If you don’t have this already the best option is to install X11.pkg from http://xquartz.macosforge.org/trac/wiki/Releases. From Linux you are already set with X-windows.

To set up and to use the archive it is best to start with a “clean” environment by opening a new X-terminal window. It is assumed you are using csh or tcsh.

The first step is to log in to a linux machine on either the HEAD LAN (e.g. ccosmos) or on the GRETA LAN (chimchim). On the GRETA network the Ska analysis environment is available from any GRETA machine, but the 64-bit machine chimchim will run analysis and telemetry access tasks much faster.

To configure the Ska runtime environment and make a new xterm window with a black background do the following:

ssh -Y <username>@<hostname>  # where <username> and <hostname> are filled in
 <Enter password>

Initial file setup

First make a new xterm window, for instance with the following command:

xterm -bg black -fg green &

Now focus on the new xterm window and set up your path and various environment variables so all the tools are accessible and use the correct libraries.:

source /proj/sot/ska/bin/ska_envs.csh

If you have not used Python and matplotlib before you should do the following setup:

mkdir -p ~/.matplotlib
cp $SKA/include/cfg/matplotlibrc ~/.matplotlib/

Next setup the interactive Python tool. If you already have a ~/.ipython directory then rename it for safekeeping:

mv ~/.ipython ~/.ipython.bak

Now make a new default IPython profile and copy the Ska customizations:

mkdir -p ~/.ipython
ipython profile create --ipython-dir=~/.ipython
cp $SKA/include/cfg/ipython_config.py ~/.ipython/profile_default/
cp $SKA/include/cfg/10_define_impska.py ~/.ipython/profile_default/startup/

Finally it is quite useful to define aliases to get into one of the Ska environments and adjust your prompt to indicate that you are using Ska. The command and file to modify depends on the shell you are using and the network. First if you don’t know what shell you are using then do:

echo $0

This should say either csh or tcsh or bash.

Shell Network File
csh or tcsh HEAD ~/.cshrc.user
csh or tcsh GRETA ~/.cshrc
bash HEAD or GRETA ~/.bashrc

Now put the appropriate lines at the end of the indicated file:

# Csh or tcsh
alias ska  'unsetenv PERL5LIB; source /proj/sot/ska/bin/ska_envs.csh; set prompt="ska-$prompt:q"'
alias skadev  'unsetenv PERL5LIB; source /proj/sot/ska/dev/bin/ska_envs.csh; set prompt="ska-dev-$prompt:q"'
alias skatest  'unsetenv PERL5LIB; source /proj/sot/ska/test/bin/ska_envs.csh; set prompt="ska-test-$prompt:q"'
alias pylab "ipython --pylab"

# Bash
alias ska='unset PERL5LIB; . /proj/sot/ska/bin/ska_envs.sh; export PS1="ska-$PS1"'
alias skadev='unset PERL5LIB; . /proj/sot/ska/dev/bin/ska_envs.sh; export PS1="ska-dev-$PS1"'
alias skatest='unset PERL5LIB; . /proj/sot/ska/test/bin/ska_envs.sh; export PS1="ska-test-$PS1"'
alias pylab='ipython --pylab'

The ska, skadev and skatest aliases are a one-way ticket. Once you get into a Ska environment the recommended way to get out or change to a different version is to start a new window.

The pylab alias is just a quicker way to get into ipython --pylab.

Ska, Ska-dev, and Ska-test

On the GRETA network the Ska-test environment will typically correspond to the Ska environment on the HEAD network. The GRETA Ska environment is tied to the FOT Matlab tools and may lag behind the latest updates to the HEAD Ska environment.

The Ska-dev environment is potentially unstable and should not normally be used unless you are developing the Ska environment.

Basic Functionality Test

Once you have done the configuration setup that was just described, open a new xterm window and get into the Ska environment by use the ska or skatest alias:

% ska

To test the basic functionality of your setup, try the following at the linux shell prompt. In this tutorial all shell commands are shown with a “% ” to indicate the shell prompt. Your prompt may be different and you should not include the “% ” when copying the examples.:

% ipython --pylab

You should see something that looks like:

Python 2.7.1 (r271:86832, Feb  7 2011, 11:30:54)
Type "copyright", "credits" or "license" for more information.

IPython 0.12 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

Welcome to pylab, a matplotlib-based Python environment [backend: Qt4Agg].
For more information, type 'help(pylab)'.

In [1]:

Now read some data and make a simple plot by copying the following lines in ipython:

import Ska.engarchive.fetch as fetch
tephin = fetch.MSID('tephin', '2009:001', '2009:002')
plot(tephin.times, tephin.vals)

You should get a figure like:

_images/basic_func.png

Tools overview

There are four key elements that are the basis for doing plotting and analysis with the engineering archive.

  • Ska.engarchive.fetch: module to read and manipulate telemetry data
  • IPython: interactive python interpreter
  • matplotlib: python plotting package with an interface similar to Matlab
  • NumPy: python numerical package for fast vector and array math

Ska.engarchive.fetch

The tools to access and manipulate telemetry with the Ska engineering archive are described in the Fetch Tutorial.

IPython

IPython is a command-line tool that provides a python prompt that is the basis for interactive analysis. At the core it provides an interpreter for python language commands but with the addition of external packages like matplotlib and numpy it becomes a full-featured data analysis environment. IPython is similar in many ways to the command-line interface in Matlab or IDL.

Links

Pylab

For interactive data analysis IPython has a special --pylab command line option which automatically imports elements of the Numpy and the Matplotlib environments. This provides a Matlab-like environment allowing very simple and direct commands like:

% ipython --pylab

x = arange(0, 10, 0.2)
y = sin(x)
print x
plot(x, y)

Keyboard navigation and history

One of the most useful features of IPython is the ability to edit and navigate you command line history. This lets you quickly re-do commands, perhaps with a slight variation based on seeing the last result. Try cut-n-pasting the above lines in an IPython session. This should bring up a plot of a sine wave.

Now hit up-arrow once and get back the plot(x, y) line. Hit the left-arrow key (not backspace) once and type **2 so that the line reads plot(x, y**2). Now you can hit Return to see the new curve overlayed within the same plot window. It is not necessary to forward-space to the end of the line, you can hit Return with the cursor anywhere in the line.

Now say you want to change the x values slightly. One option is to just hit the up-arrow 5 times, but a much faster way is to remember that the line started with x, so type x and then start hitting up-arrow. Only lines that start with x will be displayed and you are immediately at the x = arange(0, 10, 0.2) line. Now use the right-arrow and backspace to change 10 to 15 and hit Return. Of course y needs to be recalculated, so hit y then up-arrow, then Return. Finally pl up-arrow and Return. Nice and fast!

Bonus points: to speed up by another factor of several, use Ctrl-p (prev) instead of up-arrow, Ctrl-n (next) instead of down-arrow, Ctrl-f (forward) instead of right-arrow and Ctrl-b (back) instead of left-arrow. That way your fingers never leave the keyboard home keys. Ctrl-a gets you to the beginning of the line and Ctrl-e gets you to the end of the line. Triple bonus: on a Mac or Windows machine re-map the Caps-lock key to be Control so it’s right next to your left pinky. How often do you need Caps-lock?

Your command history is saved between sessions (assuming that you exit IPython gracefully) so that when you start a new IPython you can use up-arrow to re-do old commands. You can view your history within the current session by entering history.

Linux and shell commands

A select set of useful linux commands are available from the IPython prompt. These include ls (list directory), pwd (print working directory), cd (change directory), and rm (remove file). Any shell command can be executed by preceding it with an exclamation point ”!”.

Tab completion

IPython has a very useful tab completion feature that can be used both to complete file names and to inspect python objects. As a first example do:

ls /proj/sot/ska/<TAB>

This will list everything that matches /proj/sot/ska. You can continue this way searching through files or hit Return to complete the command.

Tab completion also works to inspect object attributes. Every variable or constant in python is actually a object with a type and associated attributes and methods. For instance try to create a list of 3 numbers:

a = [3, 1, 2, 4]
print a
a.<TAB>

This will show the available methods for a:

In [17]: a.<TAB>
a.append   a.extend   a.insert   a.remove   a.sort
a.count    a.index    a.pop      a.reverse

Here you see useful looking functions like append or sort which you can get help for and use:

help a.sort
a.sort()
print a

For a more concrete example, say you want to fetch some daily telemetry values but forgot exactly how to do the query and what are the available columns. Use help and TAB completion to remind yourself:

import Ska.engarchive.fetch as fetch
help fetch
tephin = fetch.MSID('tephin', '2009:001', '2009:002', stat='daily')
tephin.<TAB>
 tephin.bads               tephin.midvals            tephin.samples
 tephin.colnames           tephin.mins               tephin.stat
 tephin.content            tephin.msid               tephin.state_codes
 tephin.datestart          tephin.MSID               tephin.state_intervals
 tephin.datestop           tephin.p01s               tephin.stds
 tephin.dt                 tephin.p05s               tephin.tdb
 tephin.filter_bad         tephin.p16s               tephin.times
 tephin.filter_bad_times   tephin.p50s               tephin.tstart
 tephin.indexes            tephin.p84s               tephin.tstop
 tephin.iplot              tephin.p95s               tephin.unit
 tephin.logical_intervals  tephin.p99s               tephin.vals
 tephin.maxes              tephin.plot               tephin.write_zip
 tephin.means              tephin.raw_vals

OK, now you remember you wanted times and maxes. But look, there are other tidbits there for free that look interesting. So go ahead and print a few:

print tephin.colnames
print tephin.dt
print tephin.MSID

To make it even easier you don’t usually have to use print. Try the following:

tephin.colnames
tephin.dt
tephin.MSID

Don’t be scared to try printing an array value (e.g. tephin.vals) even if it is a billion elements long. Numpy will only print an abbreviated version if it is too long. But beware that this applies to Numpy arrays which as we’ll see are a special version of generic python lists. If you print a billion-element python list you’ll be waiting for a while.

NumPy

Even though it was not explicit we have already been using NumPy arrays in the examples so far. NumPy is a Python library for working with multidimensional arrays. The main data type is an array. An array is a set of elements, all of the same type, indexed by a vector of nonnegative integers.

NumPy has an excellent basic tutorial available. Here I just copy the Quick Tour from that tutorial but you should read the rest as well. In these examples the python prompt is shown as “>>>” in order to distinguish the input from the outputs.

Arrays can be created in different ways:

>>> a = array( [ 10, 20, 30, 40 ] )   # create an array out of a list
>>> a
array([10, 20, 30, 40])
>>> b = arange( 4 )                   # create an array of 4 integers, from 0 to 3
>>> b
array([0, 1, 2, 3])
>>> c = linspace(-pi,pi,3)            # create an array of 3 evenly spaced samples from -pi to pi
>>> c
array([-3.14159265,  0.        ,  3.14159265])

New arrays can be obtained by operating with existing arrays:

>>> d = a+b**2                        # elementwise operations
>>> d
array([10, 21, 34, 49])

Arrays may have more than one dimension:

>>> x = ones( (3,4) )
>>> x
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
>>> x.shape                            # a tuple with the dimensions
(3, 4)

and you can change the dimensions of existing arrays:

>>> y = arange(12)
>>> y
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> y.shape = 3,4              # does not modify the total number of elements
>>> y
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
     [ 8,  9, 10, 11]])

It is possible to operate with arrays of different dimensions as long as they fit well (broadcasting):

>>> 3*a                                # multiply each element of a by 3
array([ 30,  60,  90, 120])
>>> a+y                                # sum a to each row of y
array([[10, 21, 32, 43],
       [14, 25, 36, 47],
       [18, 29, 40, 51]])

Similar to Python lists, arrays can be indexed, sliced and iterated over:

>>> a[2:4] = -7,-3                     # modify last two elements of a
>>> for i in a:                        # iterate over a
...     print i
...
10
20
-7
-3

When indexing more than one dimension, indices are separated by commas:

>>> x[1,2] = 20
>>> x[1,:]                             # x's second row
array([ 1,  1, 20,  1])
>>> x[0] = a                           # change first row of x
>>> x
array([[10, 20, -7, -3],
       [ 1,  1, 20,  1],
       [ 1,  1,  1,  1]])

Matplotlib

The pylab mode of matplotlib is a collection of command style functions that make matplotlib work like matlab. Each pylab function makes some change to a figure: eg, create a figure, create a plotting area in a figure, plot some lines in a plotting area, decorate the plot with labels, etc.... Pylab is stateful, in that it keeps track of the current figure and plotting area, and the plotting functions are directed to the current axes. On the matplotlib FAQ page there is a very good discussion on Matplotlib, pylab, and pyplot: how are they related?.

This tutorial has been copied and adapted from the matplotlib pyplot tutorial.

To see matplotlib in action and make a simple plot do:

clf()
plot([1,2,3])
ylabel('some numbers')
_images/pyplot_simple.png

You may be wondering why the x-axis ranges from 0-2 and the y-axis from 1-3. If you provide a single list or array to the plot() command, matplotlib assumes it is a sequence of y values, and automatically generates the x values for you. Since python ranges start with 0, the default x vector has the same length as y but starts with 0. Hence the x data are [0,1,2].

plot() is a versatile command, and will take an arbitrary number of arguments. For example, to plot x versus y, you can issue the command:

clf()
plot([1,2,3,4], [1,4,9,16])

For every x, y pair of arguments, there is a optional third argument which is the format string that indicates the color and line type of the plot. The letters and symbols of the format string are from matlab, and you concatenate a color string with a line style string. The default format string is ‘b-‘, which is a solid blue line. For example, to plot the above with red circles, you would issue:

clf()
plot([1,2,3,4], [1,4,9,16], 'ro')
axis([0, 6, 0, 20])
_images/pyplot_formatstr.png

See the plot() documentation for a complete list of line styles and format strings. The axis() command in the example above takes a list of [xmin, xmax, ymin, ymax] and specifies the viewport of the axes.

If matplotlib were limited to working with lists, it would be fairly useless for numeric processing. Generally, you will use numpy arrays. In fact, all sequences are converted to numpy arrays internally. The example below illustrates a plotting several lines with different format styles in one command using arrays.

# evenly sampled time at 200ms intervals
t = arange(0., 5., 0.2)

# red dashes, blue squares and green triangles
clf()
plot(t, t, 'r--', t, t**2, 'bs', t, t**3, 'g^')
_images/pyplot_three.png

Working with multiple figures and axes

Both Matlab and Pylab have the concept of the current figure and the current axes. All plotting commands apply to the current axes. The function gca() returns the current axes and gcf() returns the current figure. Normally, you don’t have to worry about this, because it is all taken care of behind the scenes. Below is a script to create two subplots.

def f(t):
    return exp(-t) * cos(2*pi*t)

t1 = arange(0.0, 5.0, 0.1)
t2 = arange(0.0, 5.0, 0.02)

figure(1)
clf()
subplot(2, 1, 1)  # (nrows, ncols, fignum)
plot(t1, f(t1), 'bo', t2, f(t2), 'k')

subplot(2, 1, 2)
plot(t2, cos(2*pi*t2), 'r--')
_images/pyplot_two_subplots.png

The figure() command here is optional because figure(1) will be created by default, just as a subplot(1, 1, 1) will be created by default if you don’t manually specify an axes. The subplot() command specifies numrows, numcols, fignum where fignum ranges from 1 to numrows*numcols. The commas in the subplot command are optional if numrows*numcols<10. You can create an arbitrary number of subplots and axes. If you want to place an axes manually, ie, not on a rectangular grid, use the axes() command, which allows you to specify the location as axes([left, bottom, width, height]) where all values are in fractional (0 to 1) coordinates.

You can create multiple figures by using multiple figure() calls with an increasing figure number. Of course, each figure can contain as many axes and subplots as your heart desires:

figure(1)                # the first figure
clf()
subplot(2, 1, 1)         # the first subplot in the first figure
plot([1,2,3])
subplot(2, 1, 2)         # the second subplot in the first figure
plot([4,5,6])


figure(2)                # a second figure
clf()
plot([4,5,6])            # creates a subplot(111) by default

figure(1)                # figure 1 current; subplot(2, 1, 2) still current
subplot(2, 1, 1)         # make subplot(2, 1, 1) in figure1 current
title('Easy as 1,2,3')   # subplot 2, 1, 1 title

You can clear the current figure with clf() and the current axes with cla(). If you find this statefulness, annoying, don’t despair, this is just a thin stateful wrapper around an object oriented API, which you can use instead (see artist-tutorial)

Working with text

The text() command can be used to add text in an arbitrary location, and the xlabel(), ylabel() and title() are used to add text in the indicated locations (see text-intro for a more detailed example):

mu, sigma = 100, 15
x = mu + sigma * normal(size=10000)

# the histogram of the data
clf()
n, bins, patches = hist(x, 50, normed=1, facecolor='g', alpha=0.75)

xlabel('Smarts')
ylabel('Probability')
title('Histogram of IQ')
text(60, .025, r'$\mu=100,\ \sigma=15$')
axis([40, 160, 0, 0.03])
grid(True)
_images/pyplot_text.png

All of the text() commands return an matplotlib.text.Text instance. Just as with with lines above, you can customize the properties by passing keyword arguments into the text functions or using setp():

t = xlabel('my data', fontsize=14, color='red')

These properties are covered in more detail in text-properties <http://matplotlib.sourceforge.net/users/text_props.html#text-properties>.

Using mathematical expressions in text

matplotlib accepts TeX equation expressions in any text expression. For example to write the expression \sigma_i=15 in the title, you can write a TeX expression surrounded by dollar signs:

title(r'$\sigma_i=15$')

The r preceeding the title string is important – it signifies that the string is a raw string and not to treate backslashes and python escapes. matplotlib has a built-in TeX expression parser and layout engine, and ships its own math fonts – for details see mathtext-tutorial. Thus you can use mathematical text across platforms without requiring a TeX installation. For those who have LaTeX and dvipng installed, you can also use LaTeX to format your text and incorporate the output directly into your display figures or saved postscript – see usetex-tutorial.

Annotating text

The uses of the basic text() command above place text at an arbitrary position on the Axes. A common use case of text is to annotate some feature of the plot, and the annotate() method provides helper functionality to make annotations easy. In an annotation, there are two points to consider: the location being annotated represented by the argument xy and the location of the text xytext. Both of these arguments are (x,y) tuples.

_images/pyplot_annotate.png

In this basic example, both the xy (arrow tip) and xytext locations (text location) are in data coordinates.