.. C-Munipack - User's manual

   Copyright 2012 David Motl
   
   Permission is granted to copy, distribute and/or modify this document under the 
   terms of the GNU Free Documentation License, Version 1.2 or any later version published 
   by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and 
   no Back-Cover Texts.

   $Id: matching.rst,v 1.2 2015/07/12 07:44:57 dmotl Exp $

.. _frame-matching: 
   
.. index:: frame; matching

Frame matching
==============

.. note::

   I'd like to point out that I did not invent the algorithm presented below. I 
   reimplemented the algorithm that was part of the original Munipack software. According
   to the original source code, Filip Hroch is the author, but I haven't found any paper
   regarding this method. In case you know of one, please let me know.

Up to this point, the source frames are processed independently. 
To make a light curve of a particular object, we need to trace that object throughout 
the set of frames. This step is called *matching* in the C-Munipack software. It takes 
two or more lists of stellar objects (the result of photometry) and determines which 
objects in one list correspond to the same physical objects in the other lists.

The matching algorithm implemented in the C-Munipack software is an extension of the
algorithm published by Valdes [valdes95]_. The main difference is that it builds 
polygons of :math:`n \geq 3` vertices instead of triangles. The text follows the terminology 
of Valdes wherever possible. Matching algorithms based on the similarity of triangles
were also described and implemented by Stetson [stetson89]_ and Groth [groth86]_.

The algorithm works on two catalogs of objects; each object is described by its 
coordinates in some local coordinate system and by its brightness. We assume that the 
coordinate systems of these catalogs may be shifted, scaled, rotated or even inverted
with respect to each other. To make things even more difficult, the catalogs may overlap 
only partially; objects from one catalog need not be present in the second catalog and 
vice versa. Problems with the separation of close objects may cause objects present in 
one catalog to be missing in the other. In addition, due to random noise and field 
distortions, the positions of objects are known only imprecisely.

The matching algorithm works in three subsequent phases. In the first phase, possible
transformations between the coordinate systems of the two catalogs are found. Then the
best transformation is chosen, and finally a relation between the objects of the two
catalogs is established. 

Although the method in its basic form works on a pair of catalogs only, its extension
to an arbitrary number of frames is obvious. One frame from the set of source frames 
is chosen as a reference catalog and all other source frames are matched against it.
Alternatively, any other catalog of the same field can be used, and the source frames
are matched against that catalog.

.. rubric:: Similar triangles

A catalog of objects can be seen as a large number of triangles - a triangle can
be formed from every combination of three objects. If the coordinate systems of 
two catalogs are shifted, scaled, rotated or inverted (flipped) with respect to
each other and the two catalogs show, at least partially, the same set of physical 
objects, there must be many triangles that have a similar shape in both catalogs,
because the said operations do not change their shape. Not all triangles 
from one catalog have their counterparts in the other one. Also, due to the noise 
component in position measurements and field distortion, the triangles are never 
exactly the same.

Following Stetson [stetson89]_, the shape of a triangle is described by two shape 
indices :math:`(u, v)` which form a point in the two-dimensional *triangle space*,
defined as:

.. math::
   :label: u
   
   u = \frac{\text{length of second longest side}}{\text{length of longest side}}

and

.. math::
   :label: v

   v = \frac{\text{length of shortest side}}{\text{length of longest side}}

It can be shown, using the law of sines, that if a triangle is made from an original 
triangle by shifting, rotation, scaling, inversion or their combination, it projects 
to the same point in the triangle space. The Euclidean distance in the triangle space 
is used to measure the similarity of two triangles. If it is less than some value 
:math:`\epsilon`, the triangles match.
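
As an illustration, the two shape indices and the matching test can be sketched in
Python (the function names are ours, not part of the C-Munipack API):

.. code-block:: python

   import math

   def shape_indices(p1, p2, p3):
       """Project a triangle onto the (u, v) triangle space:
       u = second longest / longest side, v = shortest / longest side."""
       sides = sorted(math.dist(a, b) for a, b in ((p1, p2), (p2, p3), (p1, p3)))
       shortest, middle, longest = sides
       return middle / longest, shortest / longest

   def triangles_match(t1, t2, eps=0.005):
       """Two triangles match if their points in the triangle space
       are closer than eps (Euclidean distance)."""
       u1, v1 = shape_indices(*t1)
       u2, v2 = shape_indices(*t2)
       return math.hypot(u1 - u2, v1 - v2) < eps

A triangle and its shifted, scaled, rotated or mirrored copy project to the same
(u, v) point, so ``triangles_match`` accepts them regardless of the transformation.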


.. rubric:: Building polygons

The algorithm implemented in the C-Munipack software follows the method described
by Valdes. It sorts the objects by brightness in decreasing order and takes up to 
:math:`N_{obj}` objects from both catalogs. But it does not stop when it finds a pair of 
matching triangles. It continues to build up a pair of polygons of *N* vertices
from them, taking one of the sides and finding two objects that would form yet
another pair of similar triangles. On success, it adds the objects to the list to make a
quadrilateral, and so on, up to a polygon with *N* vertices. The value *N* is a configurable 
parameter. A constant value :math:`\epsilon = 0.005` is used to check whether two triangles match.
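
One growth step of this process can be sketched as follows. Given a matched side in
each catalog, we look for a pair of objects that extends both polygons by another
similar triangle (a simplified, brute-force sketch; names are ours):

.. code-block:: python

   import math
   from itertools import product

   def shape_indices(p1, p2, p3):
       """(u, v) shape indices of a triangle, as defined above."""
       sides = sorted(math.dist(a, b) for a, b in ((p1, p2), (p2, p3), (p1, p3)))
       return sides[1] / sides[2], sides[0] / sides[2]

   def extend_polygon(side1, side2, cand1, cand2, eps=0.005):
       """Find one pair (c, c') such that the triangles (side1 + c) and
       (side2 + c') are similar; return the pair or None."""
       for c, cp in product(cand1, cand2):
           u1, v1 = shape_indices(side1[0], side1[1], c)
           u2, v2 = shape_indices(side2[0], side2[1], cp)
           if math.hypot(u1 - u2, v1 - v2) < eps:
               return c, cp
       return None

The real implementation restricts the candidate search (see the next section) instead
of trying every combination.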

.. rubric:: Reducing the number of false matches

The following optimization reduces the number of false matches. When a pair of 
objects *(A, B)* and *(A', B')* is taken, their distances :math:`d = \left|AB\right|` and 
:math:`d' = \left|A'B'\right|` are computed. The ratio *d:d'* is used as a scale between 
the two frames. Also, the midpoints *P* and *P'* of these pairs are determined. 
For any point *C* from the first catalog, its distance to the point *P* is divided 
by *d:d'* to get the expected distance of a desired object *C'* to the point *P'*. Then 
it is possible to restrict the search to objects whose distance to *P'* is not very far 
from the expected value. The quicksort algorithm is used to sort objects by distance 
from the point *P'*, which allows the search to stop when the distance exceeds the upper 
limit.
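
The restricted search can be sketched in Python; here a binary search over the
pre-sorted distances replaces the early-exit scan (the function name is ours):

.. code-block:: python

   import math
   from bisect import bisect_left, bisect_right

   def restricted_search(A, B, Ap, Bp, C, catalog2, tol):
       """Candidates C' for the object C, restricted to distances from P'
       close to the expected value."""
       d, dp = math.dist(A, B), math.dist(Ap, Bp)
       P = ((A[0] + B[0]) / 2, (A[1] + B[1]) / 2)       # midpoint of AB
       Pp = ((Ap[0] + Bp[0]) / 2, (Ap[1] + Bp[1]) / 2)  # midpoint of A'B'
       expected = math.dist(C, P) * dp / d              # |CP| divided by d:d'
       by_dist = sorted((math.dist(obj, Pp), obj) for obj in catalog2)
       dists = [t[0] for t in by_dist]
       lo = bisect_left(dists, expected - tol)
       hi = bisect_right(dists, expected + tol)
       return [obj for _, obj in by_dist[lo:hi]]

Only objects whose distance to *P'* falls inside the tolerance window are returned,
so the quadratic candidate search shrinks to a narrow band.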

.. rubric:: From polygon to transformation

By means of the algorithm described in the previous section, two similar polygons 
*P* and *P'* were found, each of which has *N* vertices. The next step is
to find the best-fitting transformation between the coordinates of the vertices of *P* 
and *P'*. The scale factor is estimated first. Next, the inversion is determined. 
Finally, the linear least squares (LLSQ) method is applied to find the values of 
the remaining parameters.

The *scale* factor *s* between the two polygons *P* and *P'*
is estimated as the robust mean of the ratios between corresponding vertex distances
of the two polygons. The mean value is determined using the robust mean algorithm 
(see :ref:`robust-mean`).

The *inversion* is represented by the value *m = 1* if mirroring does not 
occur (the local coordinate systems are of the same handedness) or *m = -1* if it does.
To determine the value of *m*, the first two vertices of the polygons *P* and *P'* are used
to make oriented lines, and we check whether the third vertices in *P* and *P'* lie
on the same side of these oriented lines. The test can be performed in 3-D space by 
comparing the sign of the z-coordinate of the cross product of the vectors :math:`\vec{AB}` and 
:math:`\vec{AC}` to the sign of the z-coordinate of the cross product of the vectors :math:`\vec{A'B'}` 
and :math:`\vec{A'C'}`. The value *m = 1* if the signs are equal, *m = -1* otherwise.
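
For two dimensional points, the z-coordinate of the cross product reduces to a single
determinant, so the handedness test can be sketched as:

.. code-block:: python

   def cross_z(a, b, c):
       """z-component of the cross product (b - a) x (c - a)."""
       return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

   def mirror_value(tri, tri_p):
       """m = +1 when both triangles have the same orientation
       (same handedness), m = -1 when one of them is mirrored."""
       return 1 if (cross_z(*tri) > 0) == (cross_z(*tri_p) > 0) else -1

The function names are ours; the sign convention (counter-clockwise positive) follows
the usual right-handed coordinate system.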

The *rotation* and *offset* between the two frames are estimated by 
determining coefficients *A*, *B*, *X_0* and *Y_0* by means of the LLSQ method. 

The first estimation of the transformation matrix *M_0* is created by combining the
scale factor, the *m* value and the results of the LLSQ fitting. The variance :math:`\sigma_0^2` 
of the solution is determined using the minimized value of the sum of squares *S*:

.. math::
   :label: sigma_2_2

   \sigma_0^2 = \frac{S}{2N - 4} = \frac{1}{2}\frac{S}{N-2}

The denominator, *2N - 4*, is the number of statistical degrees of freedom; *2N* is the 
number of equations (each vertex gives two equations) and there are four free parameters.
The variance will be used to determine the allowed displacement between the expected and 
observed position of a star.
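
The text does not spell out the exact parametrization of the fit. A common choice for a
four-parameter fit, and the assumption made in this sketch, is
:math:`x' = A x + B y + X_0`, :math:`y' = -B x + A y + Y_0` (rotation plus scale, no
mirroring). The normal equations are then solved by a small Gaussian elimination:

.. code-block:: python

   def fit_transform(src, dst):
       """LLSQ fit of (A, B, X0, Y0) for x' = A x + B y + X0,
       y' = -B x + A y + Y0; returns the parameters and sigma^2."""
       rows, rhs = [], []
       for (x, y), (xp, yp) in zip(src, dst):
           rows.append([x, y, 1.0, 0.0]); rhs.append(xp)
           rows.append([y, -x, 0.0, 1.0]); rhs.append(yp)
       n = 4
       # normal equations: (M^T M) p = M^T t
       ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
       atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
       # Gaussian elimination with partial pivoting
       for col in range(n):
           piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
           ata[col], ata[piv] = ata[piv], ata[col]
           atb[col], atb[piv] = atb[piv], atb[col]
           for r in range(col + 1, n):
               f = ata[r][col] / ata[col][col]
               for c in range(col, n):
                   ata[r][c] -= f * ata[col][c]
               atb[r] -= f * atb[col]
       p = [0.0] * n
       for i in range(n - 1, -1, -1):
           p[i] = (atb[i] - sum(ata[i][j] * p[j] for j in range(i + 1, n))) / ata[i][i]
       # residual sum of squares and variance, sigma^2 = S / (2N - 4)
       S = sum((sum(r[i] * p[i] for i in range(n)) - b) ** 2
               for r, b in zip(rows, rhs))
       return p, S / (2 * len(src) - n)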

.. rubric:: Refining the transformation

The transformation found using polygon vertices only is further refined in a second stage
by including stars from *S* and *S'*. As stated above, not all stars from *S* and
*S'* necessarily have corresponding objects on the other frame, thus it is necessary
to use a robust approach and rule such stars out.

For each star from the set *S*, its coordinates are transformed into the second frame using
the matrix *M_0* (first estimation). Then a star from the set *S'* with the minimum distance 
to the computed location is found, and if its distance is less than the maximum displacement 
*t_0*, the two stars are included in the second estimation. The value *t_0* is determined using
the variance :math:`\sigma_0^2`:

.. math::
   :label: t0

   t_0 = \sigma_0 \cdot \text{clip} \cdot \sqrt{6}
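
The identification step can be sketched as a nearest-neighbour search with a cutoff
(a brute-force sketch; the real implementation may use a faster search, and the
function name is ours):

.. code-block:: python

   import math

   def refine_matches(stars, stars_p, transform, t0):
       """Pair each star from S with the nearest star of S' (after
       transforming with M_0) if it lies within the displacement t0."""
       pairs = []
       for s in stars:
           x, y = transform(s)
           nearest = min(stars_p, key=lambda sp: math.dist((x, y), sp))
           if math.dist((x, y), nearest) <= t0:
               pairs.append((s, nearest))
       return pairs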

The stars from the set *S* that were successfully identified in *S'* are used in the
LLSQ method (the same as in the first iteration) in order to determine a second estimation
of the transformation matrix *M_1*. The scale and mirror values are kept unchanged. Also,
a new value of the sample variance :math:`\sigma_1^2` is determined.

The number of identified stars and the sum of squares from the second iteration are
used to select the best-fitting solution. 

.. rubric:: Vote array

The vote array is organized as a map, where the keys are unordered sets of polygon vertices;
each record stores the number of matches, the stars that were identified in the second stage, 
the sum of squares and the transformation matrix. When a new candidate is identified, the 
program compares its set of polygon vertices with the existing keys in the vote array. When 
there is a matching item, its match counter is incremented by one. Otherwise, a new record 
is added to the vote array and its match counter is initialized to one.
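
A minimal sketch of such a vote array in Python (the class and field names are ours;
``frozenset`` makes the vertex set order-independent, and insertion order resolves ties,
so the first record with the maximum count wins):

.. code-block:: python

   class VoteArray:
       """Vote array keyed by the (unordered) set of polygon vertices."""

       def __init__(self):
           self.records = {}   # dicts preserve insertion order

       def vote(self, vertices, matched_stars, sum_sq, matrix):
           key = frozenset(vertices)
           if key in self.records:
               self.records[key]["matches"] += 1
           else:
               self.records[key] = {"matches": 1, "stars": matched_stars,
                                    "sum_sq": sum_sq, "matrix": matrix}

       def best(self):
           """Record with the highest match count; ties go to the
           first-inserted record."""
           return max(self.records.values(), key=lambda r: r["matches"])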

The selection of the best-fitting solution is based on the number of matches. If the vote array
contains multiple records with the maximum match count, the first one wins.

.. rubric:: Cross-reference table

The result of the matching algorithm applied to a set of source frames is a cross-reference 
table which represents the relations between objects detected on the source frames. The table
should make it possible to efficiently find the measurements for a particular physical object. 
In the C-Munipack software, each object detected on a single source frame is assigned a unique 
integer number at the final stage of the photometry step; these identification numbers are 
frame-wise.

The cross-reference table is a map whose keys are a frame identification together with a 
frame-wise object identifier, and whose values are global object identifiers. If the matching 
is done against one of the source frames, the object identifiers on that source frame are used 
as global identifiers. If a catalog file is used, the object identifiers in the catalog are 
used as global identifiers.
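
Such a map can be sketched in Python (the class and method names are ours, not the
C-Munipack API):

.. code-block:: python

   class CrossReference:
       """Map (frame id, frame-wise object id) -> global object id."""

       def __init__(self):
           self.table = {}

       def add(self, frame_id, obj_id, global_id):
           self.table[(frame_id, obj_id)] = global_id

       def global_id(self, frame_id, obj_id):
           """Global identifier, or None for an unmatched object."""
           return self.table.get((frame_id, obj_id))

       def measurements_of(self, global_id):
           """All (frame_id, obj_id) pairs of one physical object."""
           return [k for k, g in self.table.items() if g == global_id]

Collecting a light curve then amounts to calling ``measurements_of`` with the global
identifier of the object of interest.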

The table is stored in a distributed form: each object on each frame has an attribute
that records whether the object was found on the reference frame or in the catalog file 
and, if so, what its global identifier is.
