.. _postprocessors-overview:
Postprocessors Overview
=======================
.. automodule:: perun.postprocess
.. _postprocessors-list:
Supported Postprocessors
------------------------
Perun's tool suite currently contains the following five postprocessors:
1. :ref:`postprocessors-normalizer` scales the resources of the given profile to the
interval (0, 1). The main intuition behind the usage of this postprocessor is to be able to
compare profiles from different workloads or parameters, which may have different scales of
resource amounts.
2. :ref:`postprocessors-regression-analysis` (authored by **Jirka Pavela**) performs a
regression analysis, i.e. it searches for the model that best describes how a dependent variable
depends on an independent one. Currently the postprocessor focuses on finding a well-suited
model (linear, quadratic, logarithmic, etc.) for the running time
depending on the size of the data structure the function operates on.
3. :ref:`postprocessors-clusterizer` tries to classify resources to uniquely identified clusters,
which can be used for further postprocessing (e.g. by regression analysis) or to group
similar amounts of resources.
4. :ref:`postprocessors-regressogram` (authored by **Simon Stupinsky**), also known as the binning
approach, is the simplest non-parametric estimator. The method fits the data by dividing the
interval into N equal-width buckets; the resulting value in each bucket is the result of the
selected statistical aggregation function (mean or median) applied to the values that fall into
that bucket. In short, the regressogram can be described as a step function
(i.e. a piecewise constant function).
5. :ref:`postprocessors-moving-average` (authored by **Simon Stupinsky**), also known as the rolling
average or running average, is a statistical analysis that belongs to the non-parametric approaches.
The method analyses the given data points by creating a series of values computed with
a specific aggregation function, most often the average or possibly the median. The resulting values
are derived from different subsets of the full data set. We currently support the two main
variants of this approach: the **Simple** Moving Average and the **Exponential** Moving
Average. The first variant offers a choice of two aggregation functions: **mean**
or **median**.
All of the listed postprocessors can be run from the command line. For more information about the
command line interface of individual postprocessors refer to :ref:`cli-postprocess-units-ref`.
Postprocessor modules are implementation independent and only require a simple Python interface
registered within Perun. For a brief tutorial on how to create and register your own postprocessor
refer to :ref:`postprocessors-custom`.
.. _postprocessors-normalizer:
Normalizer Postprocessor
~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: perun.postprocess.normalizer
Command Line Interface
""""""""""""""""""""""
.. click:: perun.postprocess.normalizer.run:normalizer
:prog: perun postprocessby normalizer
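The idea of scaling resource amounts to a common interval can be illustrated with a minimal sketch. It is not the normalizer's actual implementation: the function name ``normalize_resources``, the simplified resource dictionaries and the choice to divide by the largest amount of the same ``type`` are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize_resources(resources):
    """Return copies of the resources with 'amount' divided by the peak amount of the same type.

    Hypothetical helper: grouping by 'type' is an assumption, not Perun's documented behaviour.
    """
    peaks = defaultdict(float)
    for res in resources:
        peaks[res["type"]] = max(peaks[res["type"]], abs(res["amount"]))

    normalized = []
    for res in resources:
        peak = peaks[res["type"]]
        scaled = dict(res)
        # Guard against a group that contains only zero amounts.
        scaled["amount"] = res["amount"] / peak if peak else 0.0
        normalized.append(scaled)
    return normalized


# Two workloads with very different scales become directly comparable.
resources = [
    {"type": "memory", "amount": 512},
    {"type": "memory", "amount": 2048},
    {"type": "time", "amount": 0.5},
    {"type": "time", "amount": 0.25},
]
normalized = normalize_resources(resources)
amounts = [res["amount"] for res in normalized]
```

After the call, the largest amount of every type equals 1 and the remaining amounts are expressed as fractions of it.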
.. _postprocessors-regression-analysis:
Regression Analysis
~~~~~~~~~~~~~~~~~~~
.. automodule:: perun.postprocess.regression_analysis
.. _postprocessors-regression-analysis-cli:
Command Line Interface
""""""""""""""""""""""
.. click:: perun.postprocess.regression_analysis.run:regression_analysis
:prog: perun postprocessby regression_analysis
.. _postprocessors-regression-analysis-examples:
Examples
""""""""
.. literalinclude:: /../examples/complexity-sll-short-with-ra.perf
:language: json
:linenos:
:emphasize-lines: 44-153
The profile above shows the complexity profile taken from :ref:`collectors-trace-examples` and
postprocessed using the full method. The highlighted part shows all of the fully computed models of
the form :math:`y = b_0 + b_1*f(x)`, represented by their types (e.g. `linear`, `quadratic`, etc.),
the concrete coefficients found (:math:`b_0` and :math:`b_1`) and additional statistics such as the
coefficient of determination :math:`R^2`, which measures how well the model fits.
.. image:: /../examples/complexity-scatter-with-models-full.*
The :ref:`views-scatter` above shows the interpreted models of a different complexity example,
computed using the **full computation** method. In the picture, one can see that the dependency of
running time on the structural size is best fitted by `linear` models.
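Models of the form :math:`y = b_0 + b_1*f(x)` can be fitted with ordinary least squares on the transformed values :math:`f(x)`. The following is a minimal sketch of that idea, not the code used by the postprocessor; the helper ``fit_model``, the ``MODELS`` table and the sample data are illustrative assumptions.

```python
import math


def fit_model(xs, ys, transform):
    """Least-squares fit of y = b0 + b1 * f(x); returns (b0, b1, r_square)."""
    fx = [transform(x) for x in xs]
    mean_f = sum(fx) / len(fx)
    mean_y = sum(ys) / len(ys)
    sxx = sum((f - mean_f) ** 2 for f in fx)
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(fx, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_f
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (b0 + b1 * f)) ** 2 for f, y in zip(fx, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return b0, b1, 1.0 - ss_res / ss_tot


# Hypothetical model table: each model type is defined by its transformation f(x).
MODELS = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
    "logarithmic": math.log,
}

xs = [1, 2, 3, 4, 5, 6]
ys = [5.0, 7.0, 9.0, 11.0, 13.0, 15.0]  # generated from y = 3 + 2x

fits = {name: fit_model(xs, ys, f) for name, f in MODELS.items()}
best = max(fits, key=lambda name: fits[name][2])
```

Here ``best`` is the model type with the highest :math:`R^2`, which corresponds to comparing the fully computed models by their coefficient of determination.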
.. image:: /../examples/complexity-scatter-with-models-initial-guess.*
The next `scatter plot` displays the same data as the previous one, but regressed using the
`initial guess` strategy. This strategy first computes all models on a small sample of data points
(the initial sample is selected at random), which yields an initial estimate of the fitness of each
model. The best fitted model is then chosen and fully computed on the rest of the data points.
The picture shows only one model, namely `linear`, which was fully computed to best fit the given
data points. The rest of the models had worse estimates and hence were not computed at all.
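The `initial guess` strategy can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names ``fit`` and ``initial_guess`` are hypothetical, the winning model is recomputed on all points rather than incrementally on the remaining ones, and the data are synthetic.

```python
import math
import random

# Hypothetical model table: each model type is defined by its transformation f(x).
MODELS = {
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
    "logarithmic": math.log,
}


def fit(points, transform):
    """Least-squares fit of y = b0 + b1 * f(x); returns (b0, b1, r_square)."""
    fx = [transform(x) for x, _ in points]
    ys = [y for _, y in points]
    mean_f, mean_y = sum(fx) / len(fx), sum(ys) / len(ys)
    sxx = sum((f - mean_f) ** 2 for f in fx)
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(fx, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_f
    ss_res = sum((y - b0 - b1 * f) ** 2 for f, y in zip(fx, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return b0, b1, 1.0 - ss_res / ss_tot


def initial_guess(points, sample_size, rng):
    """Estimate every model on a random sample, then fully compute only the best one."""
    sample = rng.sample(points, sample_size)
    estimates = {name: fit(sample, f)[2] for name, f in MODELS.items()}
    best = max(estimates, key=estimates.get)
    # Simplification: the winner is refitted on all points.
    return best, fit(points, MODELS[best]), estimates


points = [(x, 1.0 + 0.5 * x * x) for x in range(1, 41)]  # synthetic quadratic data
best_name, (b0, b1, r_square), estimates = initial_guess(points, 8, random.Random(0))
```

Only the model with the best estimate is computed on the full data set, which is why the resulting plot contains a single model.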
.. _postprocessors-clusterizer:
Clusterizer
~~~~~~~~~~~
.. automodule:: perun.postprocess.clusterizer
.. _postprocessors-clusterizer-cli:
Command Line Interface
""""""""""""""""""""""
.. click:: perun.postprocess.clusterizer.run:clusterizer
:prog: perun postprocessby clusterizer
.. _postprocessors-clusterizer-examples:
Examples
""""""""
.. literalinclude:: /../examples/clusterized-profile.perf
:language: json
:linenos:
:emphasize-lines: 32
The profile above shows an example of a profile postprocessed by the clusterizer (note that this is
only an excerpt of the whole profile). Each resource is annotated with a new field named ``cluster``,
which can be used in further interpretation of the profiles (either by :ref:`views-bars`,
:ref:`views-scatter` or :ref:`postprocessors-regression-analysis`).
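The ``cluster`` annotation can be illustrated with a small sketch. The strategy shown here, ranking the distinct amounts in ascending order, is a hypothetical one chosen for brevity; the function name and the simplified resource dictionaries are assumptions and do not describe the clusterizer's actual strategies.

```python
def clusterize_by_sort_order(resources):
    """Annotate each resource with a 'cluster' id equal to the rank of its amount.

    Hypothetical strategy: resources with the same amount share a cluster, and
    clusters are numbered from 1 in ascending order of the amount.
    """
    distinct_amounts = sorted({res["amount"] for res in resources})
    ranks = {amount: rank for rank, amount in enumerate(distinct_amounts, start=1)}
    # dict(res, cluster=...) copies the resource and adds the new field.
    return [dict(res, cluster=ranks[res["amount"]]) for res in resources]


resources = [
    {"uid": "main", "amount": 32},
    {"uid": "main", "amount": 8},
    {"uid": "main", "amount": 16},
    {"uid": "main", "amount": 8},
    {"uid": "main", "amount": 32},
]
clustered = clusterize_by_sort_order(resources)
clusters = [res["cluster"] for res in clustered]
```

The resulting ``cluster`` values can then serve as the independent variable in a scatter plot or in a regression analysis.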
.. image:: /../examples/clusterizer-memory-scatter.*
The :ref:`views-scatter` above shows the memory profile of a simple example, which randomly
allocates memory with linear dependency and was collected by :ref:`collectors-memory`. Since
:ref:`collectors-memory` does not collect any information other than memory allocation records,
such a profile cannot be used to infer any models. However, the profile shown in the
:ref:`views-scatter` above was postprocessed by the clusterizer, and hence we can plot the
dependency of the amount of allocated memory on the clusters. The :ref:`views-scatter` itself
emphasizes the linear dependency of allocated memory on some unknown parameters (here represented
by `cluster`).
We can use :ref:`postprocessors-regression-analysis` to confirm our assumption, and on the plot below
we can see that the best model for the amount of allocated memory depending on clusters is indeed
**linear**.
.. image:: /../examples/clusterizer-memory-scatter-with-models.*
.. _postprocessors-regressogram:
Regressogram method
~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: perun.postprocess.regressogram
.. _postprocessors-regressogram-cli:
Command Line Interface
""""""""""""""""""""""
.. click:: perun.postprocess.regressogram.run:regressogram
:prog: perun postprocessby regressogram
.. _postprocessors-regressogram-examples:
Examples
""""""""
.. code-block:: json
{
"bucket_stats": [
13.0,
25.5
],
"uid": "linear::test2",
"bucket_method": "doane",
"method": "regressogram",
"r_square": 0.7575757575757576,
"x_interval_end": 9.0,
"statistic_function": "mean",
"x_interval_start": 0.0
}
.. _Doanes: https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram_bin_edges.html#numpy.histogram_bucket_edges
The example above shows a profile post-processed by the regressogram method (note that
this is only an excerpt of the whole profile). Each such model shows the computed values in
the individual buckets, which are represented by *bucket_stats*. The next value in this example
is *statistic_function*, which represents the statistic used to compute the value in each bucket.
The model further contains the name of the method (*bucket_method*) used to calculate the optimal
number of buckets, in this case Doanes_ formula, and the *coefficient of determination*
(:math:`R^2`), which measures how well the model fits. Each such model can be used in the further
interpretation of the models (either by :ref:`views-scatter` or :ref:`degradation-method-aat`).
.. image:: /../examples/exp_data_regressogram.*
The :ref:`views-scatter` above shows the interpreted model, computed using the **regressogram**
method. In the picture, one can see that the dependency of running time based on the structural
size is best fitted by `exponential` models.
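The computation behind a regressogram model can be sketched in a few lines. This is an illustrative sketch rather than the postprocessor's implementation: the function ``regressogram`` is hypothetical, the number of buckets is passed explicitly instead of being derived by a rule such as Doane's formula, and the data are synthetic. The keys of the returned dictionary mirror the fields shown in the excerpt above.

```python
from statistics import mean, median


def regressogram(xs, ys, buckets, statistic=mean):
    """Fit a step function: one aggregated value per equal-width bucket of the x interval."""
    start, end = min(xs), max(xs)
    if start == end:
        raise ValueError("the x interval must have a non-zero width")
    width = (end - start) / buckets

    grouped = [[] for _ in range(buckets)]
    for x, y in zip(xs, ys):
        # The right edge of the interval belongs to the last bucket.
        index = min(int((x - start) / width), buckets - 1)
        grouped[index].append((x, y))

    # Empty buckets are reported as 0.0 in this sketch.
    bucket_stats = [statistic([y for _, y in pts]) if pts else 0.0 for pts in grouped]

    # R^2 of the step function: every point is predicted by the value of its bucket.
    mean_y = mean(ys)
    ss_res = sum((y - stat) ** 2 for stat, pts in zip(bucket_stats, grouped) for _, y in pts)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return {
        "bucket_stats": bucket_stats,
        "x_interval_start": start,
        "x_interval_end": end,
        "r_square": 1.0 - ss_res / ss_tot,
    }


xs = [float(x) for x in range(10)]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 11.0, 12.0, 13.0, 14.0, 15.0]
model = regressogram(xs, ys, buckets=2)
median_model = regressogram(xs, ys, buckets=2, statistic=median)
```

Passing ``statistic=median`` switches the aggregation function, analogous to the *statistic_function* field.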
.. _postprocessors-moving-average:
Moving Average Methods
~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: perun.postprocess.moving_average
.. _postprocessors-moving-average-cli:
Command Line Interface
""""""""""""""""""""""
.. click:: perun.postprocess.moving_average.run:moving_average
:prog: perun postprocessby moving_average
.. click:: perun.postprocess.moving_average.run:simple_moving_average
:prog: perun postprocessby moving_average sma
.. click:: perun.postprocess.moving_average.run:simple_moving_median
:prog: perun postprocessby moving_average smm
.. click:: perun.postprocess.moving_average.run:exponential_moving_average
:prog: perun postprocessby moving_average ema
.. _postprocessors-moving-average-examples:
Examples
""""""""
.. code-block:: json
{
"bucket_stats": [
0.0,
3.0,
24.0,
81.0,
192.0,
375.0
],
"per_key": "structure-unit-size",
"uid": "pow::test3",
"x_interval_end": 5,
"r_square": 1.0,
"method": "moving_average",
"moving_method": "sma",
"x_interval_start": 0,
"window_width": 1
}
The example above shows a profile post-processed by the moving average postprocessor (note
that this is only an excerpt of the whole profile). Each moving average model shows
the computed values, which are represented by *bucket_stats*. An important role is played by the
value *moving_method*, which represents the method used to create the model. This field contains one
of the following shortcuts: *SMA*, *SMM* or *EMA*, which stand for the methods described above. The
value *r_square* serves to assess the suitability of the model and represents the *coefficient of
determination* (:math:`R^2`). Another significant value in the context of moving average
models is *window_width*, which represents the width of the window that was used to create
the model. Since each model can be used in further interpretation (either by :ref:`views-scatter`
or :ref:`degradation-method-aat`), the other values are auxiliary and serve different
purposes during interpretation. Additional values that contain information about the postprocessing
parameters can be found in the whole profile, specifically in the part about the used post-processors.
.. image:: /../examples/exp_data_ema.*
The :ref:`views-scatter` above shows the interpreted model, computed using the **exponential moving average**
method, running with default values of parameters. In the picture, one can see that the dependency of running
time based on the structural size is best fitted by `exponential` models.
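The three moving methods can be sketched with the standard library alone. The sketch below is illustrative and makes several assumptions: the functions ``simple_moving`` and ``exponential_moving_average`` are hypothetical, the window is a trailing one that is shortened at the start of the series, and the smoothing factor ``alpha`` is passed directly. The input values are generated from :math:`y = 3x^3`, which happens to reproduce the *bucket_stats* of the excerpt above for a window of width 1; whether the original data had this form is an assumption.

```python
from statistics import mean, median


def simple_moving(values, window_width, statistic=mean):
    """Simple moving average (statistic=mean) or median (statistic=median) over a trailing window."""
    result = []
    for i in range(len(values)):
        # The window is shorter than window_width at the start of the series.
        window = values[max(0, i - window_width + 1): i + 1]
        result.append(statistic(window))
    return result


def exponential_moving_average(values, alpha):
    """Exponential moving average with smoothing factor alpha in (0, 1]."""
    result = [values[0]]
    for value in values[1:]:
        result.append(alpha * value + (1.0 - alpha) * result[-1])
    return result


ys = [3.0 * x ** 3 for x in range(6)]  # 0, 3, 24, 81, 192, 375

sma_1 = simple_moving(ys, window_width=1)
sma_3 = simple_moving(ys, window_width=3)
smm_3 = simple_moving(ys, window_width=3, statistic=median)
ema = exponential_moving_average(ys, alpha=0.5)
```

With a window of width 1 the simple moving average reproduces its input exactly, so such a model fits the data perfectly; wider windows smooth the series at the cost of lagging behind it.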
.. _postprocessors-kernel-regression:
Kernel Regression Methods
~~~~~~~~~~~~~~~~~~~~~~~~~
.. automodule:: perun.postprocess.kernel_regression
.. _postprocessors-kernel_regression-cli:
Command Line Interface
""""""""""""""""""""""
.. click:: perun.postprocess.kernel_regression.run:kernel_regression
:prog: perun postprocessby kernel-regression
.. _postprocessors-kernel-regression-estimator_settings:
.. click:: perun.postprocess.kernel_regression.run:estimator_settings
:prog: perun postprocessby kernel-regression estimator-settings
.. _postprocessors-kernel-regression-user_selection:
.. click:: perun.postprocess.kernel_regression.run:user_selection
:prog: perun postprocessby kernel-regression user-selection
.. _postprocessors-kernel-regression-method_selection:
.. click:: perun.postprocess.kernel_regression.run:method_selection
:prog: perun postprocessby kernel-regression method-selection
.. _postprocessors-kernel-regression-kernel_smoothing:
.. click:: perun.postprocess.kernel_regression.run:kernel_smoothing
:prog: perun postprocessby kernel-regression kernel-smoothing
.. _postprocessors-kernel-regression-kernel_ridge:
.. click:: perun.postprocess.kernel_regression.run:kernel_ridge
:prog: perun postprocessby kernel-regression kernel-ridge
.. _postprocessors-kernel-regression-examples:
Examples
""""""""
.. code-block:: json
{
"per_key": "structure-unit-size",
"uid": "quad::test1",
"kernel_mode": "estimator",
"r_square": 0.9990518378010778,
"method": "kernel_regression",
"x_interval_start": 10,
"bandwidth": 2.672754640321602,
"x_interval_end": 64,
"kernel_stats": [
115.6085941489687,
155.95838478107163,
190.27598428091824,
219.36576520977312,
252.80699243117965,
268.4600214673941,
283.3744716372719,
282.7535719770607,
276.27153279181573,
269.69580474542016,
244.451017529157,
226.98819185034756,
180.72465187812492
]
}
The example above shows a profile post-processed by *kernel regression* (note that this
is only an excerpt of the whole profile). Each such kernel model shows the values of the resulting
kernel estimate, which are stored in the *kernel_stats* list. Another important value is stored in
the *kernel_mode* field and denotes the mode in which the *kernel regression* was executed for this
model. This field contains one of the keywords that identify the individual modes of the kernel
regression postprocessor. The value *r_square* serves to assess the suitability of the kernel model
and represents the *coefficient of determination* (:math:`R^2`). The field *bandwidth*, which
represents the kernel bandwidth used in the current kernel model, is important when computing
further kernel estimates with lower or higher accuracy. Since each model can be used in further
interpretation (either by :ref:`views-scatter` or :ref:`degradation-method-aat`), the other values
are auxiliary and serve different purposes during interpretation. Additional values that contain
information about the parameters selected for the kernel regression postprocessor and its modes can
be found in the whole profile, specifically in the part about the used post-processors.
.. image:: /../examples/example_kernel.*
The :ref:`views-scatter` above shows the interpreted model, computed using the *kernel regression* postprocessor,
specifically with the default parameter values in the **estimator-settings** mode of this postprocessor. The picture
shows the dependency of running time on the structural size.
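The role of the *bandwidth* can be illustrated with a generic kernel smoother. The sketch below uses the Nadaraya-Watson estimator with a Gaussian kernel, which is one common form of kernel regression; it is not claimed to be the estimator used by any particular mode of the postprocessor, and the function names and data are illustrative assumptions.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_estimate(xs, ys, bandwidth, at):
    """Nadaraya-Watson estimate at point `at`: a kernel-weighted average of the observed ys."""
    weights = [gaussian_kernel((at - x) / bandwidth) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]

# At the centre of symmetric data the estimate equals the observed value.
centre = kernel_estimate(xs, ys, bandwidth=1.0, at=1.0)

# A narrow bandwidth follows the data closely, a wide one smooths towards the mean.
narrow = kernel_estimate(xs, ys, bandwidth=0.1, at=2.0)
wide = kernel_estimate(xs, ys, bandwidth=2.0, at=2.0)
```

Evaluating ``kernel_estimate`` at a series of points on the x interval yields a list of values comparable in spirit to *kernel_stats*.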
.. _postprocessors-custom:
Creating your own Postprocessor
-------------------------------
New postprocessors can be registered within Perun in several steps. Internally they can be
implemented in any programming language; in order to work with Perun, one to three
phases have to be specified, as given in :ref:`postprocessors-overview`---``before()``, ``postprocess()``
and ``after()``. Each new postprocessor requires an interface module ``run.py``, which contains the
three functions and, moreover, a CLI function for the Click_ framework.
You can register your new postprocessor as follows:
1. Run ``perun utils create postprocess mypostprocessor`` to generate a new module in the
``perun/postprocess`` directory with the following structure. The command takes predefined
templates for new postprocessors and creates ``__init__.py`` and ``run.py`` according to the
supplied command line arguments (see :ref:`cli-utils-ref` for more information about the
interface of the ``perun utils create`` command)::
/perun
|-- /postprocess
|-- /mypostprocessor
|-- __init__.py
|-- run.py
|-- /normalizer
|-- /regression_analysis
|-- __init__.py
2. First, implement the ``__init__.py`` file, including the module docstring with a brief
postprocessor description and the definitions of constants that are used for internal
checks. The file has the following structure:
.. literalinclude:: /_static/templates/postprocess_init.py
:language: python
:linenos:
3. Next, implement the ``run.py`` module with the ``postprocess()`` function (and optionally with
the ``before()`` and ``after()`` functions). The ``postprocess()`` function should do the actual
postprocessing of the profile. Each function should return the integer status of the phase,
the status message (used in case of error) and a dictionary including the params passed to
additional phases and 'profile' with a dictionary w.r.t. :ref:`profile-spec`.
.. literalinclude:: /_static/templates/postprocess_run.py
:language: python
:linenos:
4. Additionally, implement the command line interface function in ``run.py``, named the same as
your postprocessor. This function will be called from the command line as ``perun postprocessby
mypostprocessor`` and is based on the Click_ library.
.. literalinclude:: _static/templates/postprocess_run_api.py
:language: python
:linenos:
:diff: _static/templates/postprocess_run.py
5. Finally, register your newly created module in :func:`get_supported_module_names` located in
``perun/utils/__init__.py``:
.. literalinclude:: _static/templates/supported_module_names_postprocess.py
:language: python
:linenos:
:diff: _static/templates/supported_module_names.py
6. Preferably, verify that the registration did not break anything in Perun and, if you are not
using the developer installation, reinstall Perun::
make test
make install
7. At this point you can start using your postprocessor either using ``perun postprocessby`` or using the
following to set the job matrix and run the batch collection of profiles::
perun config --edit
perun run matrix
8. If you think your postprocessor could help others, please consider making a `Pull Request`_.
.. _Pull Request: https://github.com/tfiedor/perun/pull/new/develop
.. _Click: http://click.pocoo.org/5/
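The contract of the three phases described in the steps above (each phase returns an integer status, a status message and a dictionary with parameters for the following phases, including ``profile``) can be sketched as follows. Everything in this sketch is hypothetical: the status constants, the ``factor`` parameter, the simplified profile dictionary and the ``run_phases`` driver are illustrative assumptions and do not reproduce Perun's templates or its runner.

```python
OK, ERROR = 0, 1  # hypothetical integer status codes


def before(**kwargs):
    """Optional phase: validate the parameters before the profile is touched."""
    if kwargs.get("factor", 1) <= 0:
        return ERROR, "factor must be positive", kwargs
    return OK, "", kwargs


def postprocess(profile, factor=1, **kwargs):
    """Mandatory phase: return a new profile with every resource amount multiplied by factor."""
    scaled = dict(profile)
    scaled["resources"] = [
        dict(res, amount=res["amount"] * factor) for res in profile.get("resources", [])
    ]
    return OK, "", {"profile": scaled, "factor": factor, **kwargs}


def after(profile, **kwargs):
    """Optional phase: clean up; here it only passes the profile on."""
    return OK, "", {"profile": profile, **kwargs}


def run_phases(profile, **params):
    """Hypothetical driver: run the phases in order and stop at the first error."""
    params = dict(params, profile=profile)
    for phase in (before, postprocess, after):
        status, message, updates = phase(**params)
        if status != OK:
            return status, message, params
        params.update(updates)
    return OK, "", params


profile = {"resources": [{"uid": "main", "amount": 2}, {"uid": "main", "amount": 5}]}
status, message, result = run_phases(profile, factor=2)
failed_status, failed_message, _ = run_phases(profile, factor=0)
```

Returning the parameters from every phase lets a later phase see what an earlier one computed, which is the purpose of the dictionary in the returned triple.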