Supervised classification is a technique for extracting
information from image data. The goal is to classify pixels in an image
into different classes based on features of the pixels. There are two
stages: a training stage and a classification stage. During the training
stage, a set of vectors called training samples, each associated with one
pixel, is used to train a classifier. Each training sample vector is made
up of the class the pixel belongs to and the feature values of the pixel.
In the classification stage, the trained classifier is used to classify
pixels whose feature values are known but whose class is not.
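The two stages can be illustrated with a minimal Python sketch. It uses a nearest-centroid rule purely as a stand-in; this is an assumption made for illustration and not necessarily the algorithm the operator uses. The function names train and classify and the sample values are hypothetical.

```python
from collections import defaultdict
from math import dist


def train(samples):
    """Training stage: samples are (class_label, feature_vector) pairs."""
    sums = {}
    counts = defaultdict(int)
    for label, features in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    # The "classifier" in this sketch is one mean feature vector per class.
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Classification stage: return the class whose centroid is nearest."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training samples: known class plus feature values.
training_samples = [
    ("water", [0.1, 0.2]),
    ("water", [0.2, 0.1]),
    ("land", [0.8, 0.9]),
    ("land", [0.9, 0.7]),
]
classifier = train(training_samples)

# Pixels with known feature values but unknown class.
print(classify(classifier, [0.15, 0.25]))
print(classify(classifier, [0.85, 0.75]))
```

Any classifier that learns from labelled feature vectors and then predicts a label from features alone follows the same pattern.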
The user has a choice of
- training (and saving) a new classifier; or
- loading a previously saved classifier
to perform classification.
To do the former, type in a new filename.
To do the latter, select a filename from the list.
Train and save a classifier
The user must provide the number of training samples to use.
Note that, in addition to training, the classifier is also evaluated, so twice as many samples are extracted.
E.g., if the number of training samples to use is 5000, then 5000 x 2 =
10,000 samples will be extracted. The first 5000 will be used as
training samples and the remaining 5000 will be used as test samples
for evaluation.
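The split described above can be sketched in a few lines of Python. The function name split_samples is hypothetical, and the list of integers merely stands in for extracted sample vectors.

```python
def split_samples(extracted, num_training):
    """Split 2 x num_training extracted samples into training and test sets."""
    if len(extracted) < 2 * num_training:
        raise ValueError("need twice as many samples as the training count")
    training = extracted[:num_training]
    test = extracted[num_training:2 * num_training]
    return training, test


# Stand-in for 5000 x 2 = 10,000 extracted samples.
extracted = list(range(10000))
training, test = split_samples(extracted, 5000)
print(len(training), len(test))
```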
All bands in the "feature" products are used.
The user has a choice of
- using the data in the "class" band as classes (the first band in the "class" product without "lat_band" or "lon_band" in its name is used as the "class" band); or
- picking the classes from the list of "regions" in the "class" product.
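The rule for selecting the "class" band can be sketched as follows. This is only an illustration of the rule as stated above; the function name pick_class_band and the band names in the example are hypothetical.

```python
def pick_class_band(band_names):
    """Return the first band name that contains neither 'lat_band' nor 'lon_band'."""
    for name in band_names:
        if "lat_band" not in name and "lon_band" not in name:
            return name
    return None  # no usable "class" band found


# Hypothetical band list of a "class" product.
bands = ["lat_band", "lon_band", "landcover", "biomass"]
print(pick_class_band(bands))
```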
To choose the first option, deselect all entries in the "Classes" list.
In this case, if the data in the training samples take only the
values 0 (representing, say, water) or 1 (representing, say, land),
then there will be only two classes.
If the data is biomass, there will be as many classes as there are
distinct biomass values in the training set. For continuous data values
such as biomass, it is recommended to quantize the values.
Remember to uncheck the "Quantize class value" box if the data consists of discrete labels such as landcover classes.
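To see why quantization helps, consider the following Python sketch. It assumes equal-width bins defined by a minimum value, a step size and a number of levels; the operator's actual quantization scheme is not described here, so the function quantize and its parameters are hypothetical.

```python
def quantize(value, min_value, step, num_levels):
    """Map a continuous value to the lower edge of its equal-width bin."""
    level = int((value - min_value) // step)
    # Clamp values that fall outside the covered range.
    level = max(0, min(num_levels - 1, level))
    return min_value + level * step


# Hypothetical biomass values: five distinct raw values.
biomass = [1.2, 3.4, 7.7, 12.7, 14.9]
raw_classes = set(biomass)
quantized_classes = {quantize(v, 0.0, 5.0, 20) for v in biomass}
print(len(raw_classes), len(quantized_classes))
```

Without quantization every distinct value becomes its own class; with a step of 5.0 the same values fall into a handful of bins.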
It is up to the user to ensure that the training and test sample sets contain roughly equal numbers of samples for each class.
To choose the second option, select the "regions" in the "Classes" list
that correspond to classes (e.g., a "region" called "water" will become
the class with label "water").
"Regions" are represented as "vectors" and can be created using the "New
Vector Data Container" tool and other drawing tools such as the "Rectangle
drawing tool".
A pixel inside a "region" will have the name of that region as its class instead of its data value.
The operator will endeavour to extract the same number of samples for
each class when constructing the training and test sample sets.
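One way to picture balanced extraction is the following Python sketch. It is an assumption about how equal per-class sampling could be done, not the operator's implementation; the function name balanced_sample and the pixel data are hypothetical.

```python
import random
from collections import defaultdict


def balanced_sample(labelled_pixels, total, seed=0):
    """Draw up to total // n_classes samples from each class."""
    by_class = defaultdict(list)
    for label, features in labelled_pixels:
        by_class[label].append((label, features))
    if not by_class:
        return []
    per_class = total // len(by_class)
    rng = random.Random(seed)
    selected = []
    for label in sorted(by_class):
        pool = by_class[label]
        # A class with too few pixels contributes everything it has.
        selected.extend(rng.sample(pool, min(per_class, len(pool))))
    return selected


# Hypothetical pixels: "water" is far more common than "land".
pixels = [("water", [i]) for i in range(100)] + [("land", [i]) for i in range(10)]
samples = balanced_sample(pixels, 20)
print(sum(1 for label, _ in samples if label == "water"),
      sum(1 for label, _ in samples if label == "land"))
```

When a region is too small to supply its share, the set cannot be fully balanced, which is why equal counts are described as something the operator endeavours to achieve.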
Two separate files are created: one with no extension and one with the extension .xml.
Load a previously saved classifier
The minimum information the user needs in order to use a saved
classifier is the list of features, which is stored in the XML
file along with other useful information.
At least one "feature" product is required, but the user can specify more than one.
For each name in "featureNames" in the XML file, the operator will search
for a band in the "feature" products whose name contains it.
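The name matching can be sketched in Python as a substring search. The function name match_feature_bands and the feature and band names below are hypothetical; the sketch returns the first matching band, which is an assumption about how ties are resolved.

```python
def match_feature_bands(feature_names, band_names):
    """For each feature name, find the first band whose name contains it."""
    matches = {}
    for feature in feature_names:
        matches[feature] = next(
            (band for band in band_names if feature in band), None
        )
    return matches


# Hypothetical names: features as listed in the XML file, bands in the products.
features = ["Sigma0_HH", "Sigma0_HV"]
bands = ["Sigma0_HH_db", "Sigma0_HV_db", "incidence_angle"]
print(match_feature_bands(features, bands))
```

Because the match is by containment, a short feature name such as "B1" would also be found inside a band named "B10", so distinctive band names are safer.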
If evaluation of the classifier is desired, then the user must provide
- a "class" product as the first product in the list ahead of all the "feature" products; and
- the number of test samples to use for evaluation.
If there is more than one product in the list, the operator determines
whether the first one is a "class" product by looking for a "feature"
band in it. If it does not contain any "feature" band, then it is
taken as the "class" product.
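The check described above can be sketched in Python as follows. Products are modelled as (name, band names) pairs, and a "feature" band is taken to be any band whose name contains one of the saved feature names; the function name split_products and all product and band names are hypothetical.

```python
def split_products(products, feature_names):
    """Return (class_product, feature_products) from an ordered product list."""

    def has_feature_band(band_names):
        return any(f in b for f in feature_names for b in band_names)

    # Only the first product can be the "class" product, and only when
    # more than one product is given.
    if len(products) > 1 and not has_feature_band(products[0][1]):
        return products[0], products[1:]
    return None, products


# Hypothetical input: a "class" product followed by one "feature" product.
products = [
    ("landcover_map", ["lat_band", "lon_band", "landcover"]),
    ("sar_scene", ["Sigma0_HH_db", "Sigma0_HV_db"]),
]
class_product, feature_products = split_products(products, ["Sigma0_HH", "Sigma0_HV"])
print(class_product[0], [name for name, _ in feature_products])
```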
If the saved classifier was constructed using "regions" as classes,
the "class" product must contain "vectors" with the same names.
The names can be found in the XML file under "regionVectorNames".
If "regions" were not used, then the operator will use the first band
in the "class" product as the "class" band.