spatter README
==============

Author: Alan Grosskurth <alan.grosskurth@utoronto.ca>

If you have questions, corrections, or suggestions, please email the author.

INTRODUCTION

spatter is a set of software tools for microarray image analysis. It
operates on two-color cDNA images stored as greyscale TIFF files.
Currently, given the array geometry, spatter can automatically locate
each spot and separate its pixels into foreground and background. Data
extraction and analysis are not implemented yet.

The two main design goals of spatter are extensibility and automation.
In order to achieve extensibility, spatter is built using a "pipe and
filter" architecture[GS93]. Each filter in the pipeline is an
independent C++ program, with the exception of 'spatter-imagecomposer',
which is a Python program. In order to achieve automation, some of the
algorithms spatter uses are based on heuristics. While these heuristics
work well for images the author has tested with, they may not be
suitable for all images. Please email the author if you have images
for which spatter does not produce reasonable results.

LICENSE

spatter is free software distributed under the terms of the GNU
General Public License (GPL).

DEPENDENCIES

The following library is needed to build spatter:

   LTI-lib
   http://ltilib.sourceforge.net

This library is not needed at run time once spatter is built, since
the binaries are linked statically.

Additionally, to run the 'spatter-imagecomposer' program you need:

   Python
   http://www.python.org

   Python Imaging Library (PIL)
   http://www.pythonware.com/products/pil

   numarray
   http://www.stsci.edu/resources/software_hardware/numarray

BUILDING AND INSTALLING

Run the following commands to build and install spatter:

   ./configure
   make
   make install

If no 'configure' script is present when spatter is unpacked, run the
following command to generate it:

   ./bootstrap

By default, spatter will be installed to the prefix '/usr/local'. This
means the eight programs will be installed to '/usr/local/bin'. An
alternative prefix can be specified with:

   ./configure --prefix=/path/to/install

PROGRAMS

spatter is currently made up of eight independent programs:

1) spatter-imagecomposer

   Description:  Composes Cy3 and Cy5 TIFF images into a single JPEG image
         Input:  Cy3 and Cy5 TIFF images
        Output:  JPEG image

   Note: Cy3 is also known as '532nm' or 'green'
         Cy5 is also known as '635nm' or 'red'
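
   This README does not describe how the two channels are combined. The
   function below is a hypothetical sketch that follows the colour names
   in the note above. Cy5 goes to the red channel, Cy3 goes to the green
   channel, and blue stays at zero. It assumes 16-bit greyscale input
   held in plain nested lists, and it reduces each value to 8 bits by
   dropping the low-order bits. The name 'compose_rgb' and this scaling
   are illustrative and are not taken from spatter's code.

```python
from typing import List, Tuple

# An (R, G, B) pixel with 8-bit channels.
Pixel = Tuple[int, int, int]


def compose_rgb(cy3: List[List[int]], cy5: List[List[int]],
                bits: int = 16) -> List[List[Pixel]]:
    """Combine Cy3 and Cy5 greyscale images into RGB pixels.

    HYPOTHETICAL mapping: red = Cy5, green = Cy3, blue = 0.
    ASSUMPTION: inputs hold `bits`-bit values and are scaled to 8 bits
    by discarding the low-order bits.
    """
    if bits < 8:
        raise ValueError("expected at least 8 bits per input pixel")
    if len(cy3) != len(cy5) or any(len(a) != len(b) for a, b in zip(cy3, cy5)):
        raise ValueError("Cy3 and Cy5 images must have the same dimensions")
    shift = bits - 8
    return [
        [(red >> shift, green >> shift, 0) for green, red in zip(row3, row5)]
        for row3, row5 in zip(cy3, cy5)
    ]


if __name__ == "__main__":
    # One row, two pixels: a pure-Cy5 pixel and a mostly-Cy3 pixel.
    print(compose_rgb([[0, 65535]], [[65535, 256]]))
```

   An imaging library such as PIL would still handle reading the TIFF
   files and writing the JPEG. This sketch covers only the per-pixel
   channel mapping.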

2) spatter-gridfinder

   Description:  Finds absolute pixel origins of subgrids
         Input:  Array geometry, JPEG image
        Output:  Grid

   'spatter-gridfinder' is also split into its three constituent parts,
   which, when run in succession, produce the same output as running
   'spatter-gridfinder' itself.

   2a) spatter-grid-adjustcoarse

       Description:  Finds rough absolute pixel origins of subgrids
             Input:  Array geometry, JPEG image
            Output:  Grid

   2b) spatter-grid-adjustfine

       Description:  Refines absolute pixel origins of subgrids
             Input:  Grid, JPEG image
            Output:  Grid

   2c) spatter-grid-adjustoneoff

       Description:  Examines subgrid origins to see if they are off
                     by exactly one row or column in either direction and
                     adjusts them accordingly
             Input:  Grid, JPEG image
            Output:  Grid

3) spatter-gridviewer

   Description:  Draws an overlay of the grid onto a JPEG image
         Input:  Grid, JPEG image
        Output:  JPEG image with grid overlaid

4) spatter-spotfinder

   Description:  Locates origins of spots and segments spot regions
                 into foreground and background pixels using seeded
                 region-growing (SRG)
         Input:  Grid, JPEG image
        Output:  Spotlist

5) spatter-spotviewer

   Description:  Draws an overlay of spot outlines onto a JPEG image
         Input:  Spotlist, JPEG image
        Output:  JPEG image with spot outlines overlaid
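
   This README does not say how 'spatter-spotviewer' derives an outline
   from a spot's foreground mask. The function below is a hypothetical
   sketch of one common approach. A foreground pixel counts as part of
   the outline if any of its four neighbours is background or lies
   outside the mask. The mask is assumed to be a list of rows of 0/1
   values, and 'mask_outline' is an illustrative name.

```python
from typing import List, Set, Tuple


def mask_outline(mask: List[List[int]]) -> Set[Tuple[int, int]]:
    """Return (row, col) positions of foreground pixels on the region boundary.

    A foreground pixel is a boundary pixel if any 4-neighbour is background
    or falls outside the mask.
    """
    h = len(mask)
    w = len(mask[0]) if h else 0
    outline: Set[Tuple[int, int]] = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # background pixels are never part of the outline
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    outline.add((y, x))
                    break
    return outline
```

   The positions are relative to the spot origin. To draw them on the
   full image, a viewer would add the spot's 'sy' and 'sx' to each
   position.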

DATA FILE FORMATS

1) Geometry

   The file consists of one line of the form

      r:c:dr:dc:gr:gc:dgr:dgc

   where

      'r' is the number of rows (sometimes called 'metarows')
      'c' is the number of columns (sometimes called 'metacolumns')
      'dr' is the row distance in pixels
      'dc' is the column distance in pixels
      'gr' is the number of rows in each subgrid
      'gc' is the number of columns in each subgrid
      'dgr' is the subgrid row distance in pixels
      'dgc' is the subgrid column distance in pixels

   Note: 'dr' and 'dc' are usually the same, as are 'dgr' and 'dgc'.

   Example:  4:4:450:450:22:22:19:19
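
   The following is a minimal Python sketch for reading and writing a
   geometry line. It assumes the line holds exactly the eight
   colon-separated integers described above, in that order. The names
   'Geometry', 'parse_geometry' and 'format_geometry' are hypothetical
   and are not part of spatter.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    """Fields of a spatter geometry line, in file order."""
    r: int    # number of rows ("metarows")
    c: int    # number of columns ("metacolumns")
    dr: int   # row distance in pixels
    dc: int   # column distance in pixels
    gr: int   # number of rows in each subgrid
    gc: int   # number of columns in each subgrid
    dgr: int  # subgrid row distance in pixels
    dgc: int  # subgrid column distance in pixels


def parse_geometry(line: str) -> Geometry:
    """Parse a colon-separated line such as '4:4:450:450:22:22:19:19'."""
    fields = line.strip().split(":")
    if len(fields) != len(Geometry._fields):
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    return Geometry(*(int(f) for f in fields))


def format_geometry(g: Geometry) -> str:
    """Serialize a Geometry back to the one-line file format."""
    return ":".join(str(v) for v in g)
```

   A NamedTuple keeps the fields in file order, so serializing is a
   plain join. The names still document each position.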

2) Grid

   The first line in the file is identical to the geometry line. It is
   followed by one line for each of the (r x c) subgrids, of the form

      gy:gx

   where

      'gy' is the pixel row coordinate of the subgrid origin
      'gx' is the pixel column coordinate of the subgrid origin

   Example:  4:4:450:450:22:22:19:19
             23:24
             34:489
             22:924
             21:1379
             475:21
             473:480
             472:933
             469:1380
             924:27
             924:472
             921:923
             921:1377
             1372:33
             1373:474
             1373:933
             1369:1382
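
   The sketch below reads a grid file and checks that it contains one
   origin per subgrid. In the example above, 'gy' stays roughly constant
   within each run of four lines, so the subgrids appear to be listed in
   row-major order. 'subgrid_origin' assumes that order.
   'nominal_spot_origin' is a further hypothetical helper. It assumes
   that spots sit exactly 'dgr' and 'dgc' pixels apart, starting at the
   subgrid origin, and it ignores any per-spot adjustment that
   'spatter-spotfinder' makes when it locates spot origins. All function
   names here are illustrative.

```python
from typing import List, Tuple

Origin = Tuple[int, int]  # (gy, gx): pixel row and column of a subgrid origin


def parse_grid(text: str) -> Tuple[List[int], List[Origin]]:
    """Parse a grid file: one geometry line, then one 'gy:gx' line per subgrid.

    Returns the eight geometry values (r, c, dr, dc, gr, gc, dgr, dgc) and
    the subgrid origins in file order.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty grid file")
    geometry = [int(f) for f in lines[0].split(":")]
    if len(geometry) != 8:
        raise ValueError("first line must be an 8-field geometry line")
    origins: List[Origin] = []
    for ln in lines[1:]:
        gy, gx = (int(f) for f in ln.split(":"))
        origins.append((gy, gx))
    r, c = geometry[0], geometry[1]
    if len(origins) != r * c:
        raise ValueError(f"expected {r * c} subgrid origins, got {len(origins)}")
    return geometry, origins


def subgrid_origin(geometry: List[int], origins: List[Origin],
                   row: int, col: int) -> Origin:
    """Origin of subgrid (row, col), ASSUMING row-major order in the file."""
    c = geometry[1]
    return origins[row * c + col]


def nominal_spot_origin(geometry: List[int], origin: Origin,
                        i: int, j: int) -> Origin:
    """HYPOTHETICAL: top-left of spot (i, j), assuming exact dgr/dgc spacing."""
    dgr, dgc = geometry[6], geometry[7]
    gy, gx = origin
    return gy + i * dgr, gx + j * dgc
```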

3) Spotlist

   For each of the (r x c x gr x gc) spots, there is one line of the form

      sy:sx:sw:sh:sm

   where

      'sy' is the pixel row coordinate of the spot origin
      'sx' is the pixel column coordinate of the spot origin
      'sw' is the spot width in pixels
      'sh' is the spot height in pixels
      'sm' is the binary mask of foreground pixels, represented in hex

   Example:  23:24:19:19:000000000000000000001f000ff007ff00fff...
             23:62:19:19:1ffffffffffffffffffffffffffffffffffff...
             23:81:19:19:00000000000000000000010001f800ffc01ff...
             ...
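
   The following is a minimal sketch for parsing one spotlist line. It
   assumes all five fields are present. The digit-count check follows
   the encoding of 'sm' explained below, where each hex digit carries
   four mask pixels, so a spot of 'sw' x 'sh' pixels needs
   ceil(sw * sh / 4) digits. The names 'Spot' and 'parse_spot' are
   hypothetical.

```python
from typing import NamedTuple

HEX_DIGITS = set("0123456789abcdef")


class Spot(NamedTuple):
    sy: int  # pixel row coordinate of the spot origin
    sx: int  # pixel column coordinate of the spot origin
    sw: int  # spot width in pixels
    sh: int  # spot height in pixels
    sm: str  # foreground mask as a lowercase hex string


def parse_spot(line: str) -> Spot:
    """Parse one 'sy:sx:sw:sh:sm' spotlist line and sanity-check the mask."""
    sy, sx, sw, sh, sm = line.strip().split(":")
    spot = Spot(int(sy), int(sx), int(sw), int(sh), sm.lower())
    # ASSUMPTION: four mask pixels per hex digit, so ceil(sw * sh / 4) digits.
    expected = (spot.sw * spot.sh + 3) // 4
    if len(spot.sm) != expected:
        raise ValueError(f"mask has {len(spot.sm)} hex digits, expected {expected}")
    if not set(spot.sm) <= HEX_DIGITS:
        raise ValueError(f"mask is not a hex string: {spot.sm!r}")
    return spot
```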

   Further explanation of 'sm':

   Consider the case of a spot which is 6 pixels tall and 9 pixels
   wide. Assume it has the following mask, with '1's representing
   foreground pixels and '0's representing background pixels:

      000000000
      000110000
      001111100
      000111100
      000111000
      000000000

   When the mask is serialized row by row, from top to bottom, it is
   represented as the following string:

      000000000000110000001111100000111100000111000000000000

   Next, the digits are split into groups of four, starting from the
   right, so the leftmost group may have fewer than four digits:

      00 0000 0000 0011 0000 0011 1110 0000 1111 0000 0111 0000 0000 0000

   Each group is then converted to a hex digit:

      0     0    0    3    0    3    e    0    f    0    7    0    0    0

   Hence, the value of 'sm' is:

      000303e0f07000

   Since each hex digit encodes four pixels, this hex string is about
   one quarter the length of the corresponding binary string.
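
   The encoding above can be reproduced with Python's built-in integer
   conversions, which group bits from the right automatically. This
   sketch assumes a mask is stored as a list of rows of 0/1 values, and
   'encode_mask' and 'decode_mask' are illustrative names. Decoding needs
   'sw' and 'sh' from the same spotlist line, because the hex string
   alone does not reveal the mask dimensions.

```python
from typing import List


def encode_mask(mask: List[List[int]]) -> str:
    """Serialize a 0/1 mask row by row (top to bottom) and convert it to hex.

    Converting the whole bit string as one integer groups the bits in fours
    from the right; zero-padding to ceil(bits / 4) digits keeps leading zeros.
    """
    bits = "".join(str(b) for row in mask for b in row)
    ndigits = (len(bits) + 3) // 4
    return format(int(bits, 2), f"0{ndigits}x")


def decode_mask(sm: str, sw: int, sh: int) -> List[List[int]]:
    """Recover an sh-row by sw-column 0/1 mask from its hex encoding."""
    n = sw * sh
    bits = format(int(sm, 16), f"0{n}b")  # left-pad to exactly n bits
    if len(bits) != n:
        raise ValueError("mask value does not fit in sw * sh bits")
    return [[int(ch) for ch in bits[y * sw:(y + 1) * sw]] for y in range(sh)]
```

   When applied to the 6 x 9 example mask above, 'encode_mask' returns
   '000303e0f07000'. Passing that string to 'decode_mask' with sw = 9
   and sh = 6 gives back the original rows.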

ALGORITHMS

Descriptions of algorithms used in spatter are coming soon.

REFERENCES

[GS93]  David Garlan and Mary Shaw. An Introduction to Software
        Architecture. In Advances in Software Engineering and Knowledge
        Engineering, I (Ambriola V, Tortora G, Eds.) World Scientific
        Publishing Company, 1993.
        http://www.cs.cmu.edu/afs/cs/project/able/www/paper_abstracts/intro_softarch.html
