Command Line Interface

Perun can be run from the command line (if correctly installed) using the command interface inspired by git.

The Command Line Interface is implemented using the Click library, which allows both effective definition of new commands and fine-grained parsing of command line arguments. The interface can be broken into several groups:

1. Core commands: namely init, config, add, rm, status, log, the run commands (consisting of run job and run matrix) and the check commands (consisting of check all, check head and check profiles). These commands automate the creation of performance profiles and the detection of performance degradation, and are used for the management of the Perun repository. Refer to Perun Commands for details about the commands.

2. Collect commands: group of collect COLLECTOR commands, where COLLECTOR stands for one of the collectors of Supported Collectors. Each COLLECTOR has its own API; refer to Collect units for a thorough description of the API of individual collectors.

3. Postprocessby commands: group of postprocessby POSTPROCESSOR commands, where POSTPROCESSOR stands for one of the postprocessors of Supported Postprocessors. Each POSTPROCESSOR has its own API; refer to Postprocess units for a thorough description of the API of individual postprocessors.

4. View commands: group of view VISUALIZATION commands, where VISUALIZATION stands for one of the visualizers of Supported Visualizations. Each VISUALIZATION has its own API; refer to Show units for a thorough description of the API of individual views.

5. Utility commands: group of commands used for developing Perun or for the maintenance of Perun instances. Currently this group contains the create command for faster creation of new modules.

A Graphical User Interface is currently in development and will hopefully extend the flexibility of Perun's usage.

perun

Perun is an open source light-weight Performance Versioning System.

In order to initialize Perun in the current directory run the following:

perun init

This initializes the basic structure in the .perun directory, together with a possible reinitialization of the git repository in the current directory. In order to set the basic configuration and define jobs for your project run the following:

perun config --edit

This opens an editor and allows you to specify the configuration of your project and choose the set of collectors for capturing resources. See Automating Runs and Perun Configuration files for more details.

In order to generate first set of profiles for your current HEAD run the following:

perun run matrix

perun [OPTIONS] COMMAND [ARGS]...

Options

--no-pager

Disables the paging of the long standard output (currently affects only status and log outputs). See paging to change the default paging strategy.

-v, --verbose

Increases the verbosity of the standard output. Verbosity is incremental, and each level increases the extent of output.

--version

Prints the current version of Perun.

Commands

add
check
collect
config
fuzz
init
log
postprocessby
rm
run
show
status
utils

Perun Commands

perun init

Initializes performance versioning system at the destination path.

The perun init command initializes Perun's infrastructure with a basic file and directory structure inside the .perun directory. Refer to Perun Internals for more details about Perun's storage. By default, the following directories are created:

  1. .perun/jobs: storage of performance profiles not yet assigned to concrete minor versions.
  2. .perun/objects: storage of packed contents of performance profiles and additional information about the minor versions of the wrapped vcs.
  3. .perun/cache: fast access cache of the latest selected unpacked profiles.
  4. .perun/local.yml: local configuration, storing the specification of the wrapped repository, the jobs configuration, etc. Refer to Perun Configuration files for more details.

The infrastructure is initialized at <path>. If no <path> is given, the current working directory is used instead. In case a performance versioning system already exists there, the infrastructure is only reinitialized.

By default, a version control system is initialized as well. This can be changed by setting the --vcs-type parameter (currently we support git and tagit, a lightweight git-based wrapper built on tags). Additional parameters can be passed to the wrapped control system initialization using --vcs-params.

perun init [OPTIONS] <path>

Options

--vcs-type <vcs_type>

In parallel with the initialization of Perun, initializes the vcs of the given type as well (by default git).

--vcs-path <vcs_path>

Sets the destination of the wrapped vcs initialization to <vcs_path>.

--vcs-param <vcs_param>

Passes an additional (key, value) parameter to the initialization of the version control system, e.g. separate-git-dir dir.

--vcs-flag <vcs_flag>

Passes an additional flag to the initialization of the version control system, e.g. bare.
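
For illustration, a hypothetical initialization that wraps a bare git repository and forwards an extra (key, value) parameter might look like this (the exact argument passing is an assumption):

$ perun init --vcs-type git --vcs-flag bare --vcs-param separate-git-dir .mygit ./project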

-c, --configure

After successful initialization of both systems, opens the local configuration using the editor set in shared config.

-t, --config-template <config_template>

States the configuration template that will be used for initialization of local configuration. See Predefined Configuration Templates for more details about predefined configurations.

Arguments

<path>

Optional argument

perun config

Manages the stored local and shared configuration.

Perun supports two external configurations:

  1. local.yml: the local configuration stored in the .perun directory, containing keys such as the specification of the wrapped repository or the job matrix used for quick generation of profiles (run perun run matrix --help or refer to Automating Runs for information on how to construct the job matrix).
  2. shared.yml: the global configuration shared by all perun instances, containing shared keys such as the text editor, formatting strings, etc.

The syntax of the <key> in most operations consists of sections separated by dots, e.g. vcs.type specifies the type key in the vcs section. The lookup of the <key> can be performed in three modes, --local, --shared and --nearest, locating or setting the <key> in the local, shared or nearest configuration respectively (e.g. when one is trying to get some key, there may be nested perun instances that do not contain the given key). By default, perun operates in the nearest config mode.

Refer to Perun Configuration files for full description of configurations and Configuration types for full list of configuration options.

E.g. using the following one can retrieve the type of the nearest perun instance wrapper:

$ perun config get vcs.type
vcs.type: git

perun config [OPTIONS] COMMAND [ARGS]...

Options

-l, --local

Sets the local config, i.e. .perun/local.yml, as the source config.

-h, --shared

Sets the shared config, i.e. shared.yml, as the source config.

-n, --nearest

Sets the nearest suitable config as the source config. The lookup strategy can differ for set and get/edit.

Commands

edit
get
reset
set

perun config get

Looks up the given <key> within the configuration hierarchy and returns the stored value.

The syntax of the <key> consists of sections separated by dots, e.g. vcs.type specifies the type key in the vcs section. The lookup of the <key> can be performed in three modes, --local, --shared and --nearest, locating the <key> in the local, shared or nearest configuration respectively (e.g. when one is trying to get some key, there may be nested perun instances that do not contain the given key). By default, perun operates in the nearest config mode.

Refer to Perun Configuration files for full description of configurations and Configuration types for full list of configuration options.

E.g. using the following one can retrieve the type of the nearest perun wrapper:

$ perun config get vcs.type
vcs.type: git

$ perun config --shared get general.editor
general.editor: vim

perun config get [OPTIONS] <key>

Arguments

<key>

Required argument

perun config set

Sets the value of the <key> to the given <value> in the target configuration file.

The syntax of the <key> consists of sections separated by dots, e.g. vcs.type specifies the type key in the vcs section. Perun sets the <key> in three modes, --local, --shared and --nearest, which set the <key> in the local, shared or nearest configuration respectively (e.g. when one is trying to get some key, there may be nested perun instances that do not contain the given key). By default, perun will operate in the nearest config mode.

The <value> is arbitrary and depends on the key.

Refer to Perun Configuration files for full description of configurations and Configuration types for full list of configuration options and their values.

E.g. using the following one can set the log format for the nearest perun instance:

$ perun config set format.shortlog "| %source% | %collector% |"
format.shortlog: | %source% | %collector% |

perun config set [OPTIONS] <key> <value>

Arguments

<key>

Required argument

<value>

Required argument

perun config edit

Edits the configuration file in the external editor.

The used editor is specified by the general.editor option, specified in the nearest perun configuration.

Refer to Perun Configuration files for full description of configurations and Configuration types for full list of configuration options.

perun config edit [OPTIONS]

perun add

Links the profile to a concrete minor version, storing its content in the .perun directory and registering the profile in the internal minor version index.

In order to link <profile> to given minor version <hash> the following steps are executed:

  1. We check in <profile> that its origin key corresponds to <hash>. This serves as a check that we do not assign profiles to different minor versions.
  2. The origin is removed and the contents of <profile> are compressed using the zlib compression method.
  3. A binary header for the profile is constructed.
  4. The compressed contents are appended to the header, and this blob is stored in the .perun/objects directory.
  5. The new blob is registered in the <hash> minor version's index.
  6. Unless --keep-profile is set, the original profile is deleted.

If no <hash> is specified, then the current HEAD of the wrapped version control system is used instead. Massaging of <hash> is taken care of by the underlying version control system (e.g. git uses git rev-parse).

<profile> can either be a pending tag, a pending tag range or a full path. Pending tags are in the form of i@p, where i stands for an index in the pending profile directory (i.e. .perun/jobs) and @p is a literal suffix. The pending tag range is in the form of i@p-j@p, where both i and j stand for indexes in the pending profiles. The pending tag range then represents all of the profiles in the interval <i, j>. When i > j, no profiles will be added; when j is bigger than the number of pending profiles, the non-existing pending profiles will be skipped. Run perun status to see the tag annotation of pending profiles. Tags respect the sort order specified by the format.sort_profiles_by option.

Example of adding profiles:

$ perun add mybin-memory-input.txt-2017-03-01-16-11-04.perf

This command adds the profile collected by the memory collector while profiling the mybin command with the input.txt workload on March 1st at 16:11 to the current HEAD.
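
Pending tag ranges work analogously; e.g. the following registers the first three pending profiles at once (assuming at least three profiles are pending):

$ perun add 0@p-2@p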

An error is raised if the command is executed outside of the scope of any perun instance, if <profile> points to an incorrect profile (i.e. one not conforming to the Specification of Profile Format) or if <hash> does not point to a valid minor version ref.

See Perun Internals for information on how perun handles profiles internally.

perun add [OPTIONS] <profile>

Options

-m, --minor <minor>

<profile> will be stored at this minor version (default is HEAD).

--keep-profile

Keeps the profile in filesystem after registering it in Perun storage. Otherwise it is deleted.

-f, --force

If set to true, then the profile will be registered in the <hash> minor version index, even if its origin <hash> is different. WARNING: This can corrupt the performance history of your project.

Arguments

<profile>

Required argument(s)

perun rm

Unlinks the profile from the given minor version, keeping the contents stored in .perun directory.

<profile> is unlinked in the following steps:

  1. <profile> is looked up in the <hash> minor version's internal index.
  2. In case <profile> is not found, an error is raised.
  3. Otherwise, the record corresponding to <profile> is erased from the index. However, the original blob is kept in .perun/objects.

If no <hash> is specified, then the current HEAD of the wrapped version control system is used instead. Massaging of <hash> is taken care of by the underlying version control system (e.g. git uses git rev-parse).

<profile> can either be an index tag or a path specifying the profile. Index tags are in the form of i@i, where i stands for an index in the minor version's index and @i is a literal suffix. Run perun status to see the tags of the current HEAD's index. The index tag range is in the form of i@i-j@i, where both i and j stand for indexes in the minor version's index. The index tag range then represents all of the profiles in the interval <i, j> registered in the index. When i > j, no profiles will be removed; when j is bigger than the number of registered profiles, the non-existing profiles will be skipped. Tags respect the sort order specified by the format.sort_profiles_by option.

Examples of removing profiles:

$ perun rm 2@i

This command removes the third profile (we index from zero) from the index of registered profiles of the current HEAD.
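
Index tag ranges work analogously; e.g. the following unlinks the first three registered profiles at once:

$ perun rm 0@i-2@i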

An error is raised if the command is executed outside of the scope of any Perun instance or if <profile> is not found inside the <hash> index.

See Perun Internals for information on how perun handles profiles internally.

perun rm [OPTIONS] <profile>

Options

-m, --minor <minor>

<profile> will be removed from this minor version (default is HEAD).

-A, --remove-all

Removes all occurrences of <profile> from the <hash> index.

Arguments

<profile>

Required argument(s)

perun status

Shows the status of vcs, associated profiles and perun.

Shows the status of both the nearest perun and the wrapped version control system. For the vcs this outputs e.g. the current minor version HEAD, the current major version and the description of the HEAD. Moreover, status prints the lists of tracked and pending profiles (found in .perun/jobs), lexicographically sorted, along with additional information such as their types and creation times.

Unless perun --no-pager status is issued, or the appropriate paging option is set, the output of status will be paged (by default using less).

An error is raised if the command is executed outside of the scope of any perun instance, or if the configuration is missing certain configuration keys (namely format.status).

Profiles (both registered in the index and stored in the pending directory) are sorted according to format.sort_profiles_by. The --sort-by option sets this key in the local configuration for further usage, which means that pending and index tags will respect this order.

Refer to Customizing Statuses for information on how to customize the output of status or how to set format.status in the nearest configuration.

perun status [OPTIONS]

Options

-s, --short

Shortens the output of status to include only the most necessary information.

-sb, --sort-by <format__sort_profiles_by>

Sets the key used for sorting profiles in the local configuration. Note that after setting it, the new sort order is also reflected in pending and index tags!

perun log

Shows history of versions and associated profiles.

Shows the history of the wrapped version control system and all of the associated profiles starting from the <hash> point, outputting information about the number of profiles, the descriptions of concrete minor versions, their parents, etc.

If perun log --short is issued, the shorter version of the log is outputted.

If no <hash> is given, then the HEAD of the version control system is used as a starting point.

Unless perun --no-pager log is issued, or the appropriate paging option is set, the output of log will be paged (by default using less).

Refer to Customizing Logs for information on how to customize the output of log or how to set format.shortlog in the nearest configuration.

perun log [OPTIONS] <hash>

Options

-s, --short

Shortens the output of log to include only the most necessary information.

Arguments

<hash>

Optional argument

perun run

Generates a batch of profiles w.r.t. the specification of a list of jobs.

Either runs the job matrix stored in local.yml configuration or lets the user construct the job run using the set of parameters.

perun run [OPTIONS] COMMAND [ARGS]...

Options

-ot, --output-filename-template <output_filename_template>

Specifies the template for the automatic generation of the output filename. This way the file with collected data will have a resulting filename w.r.t. this parameter. Refer to format.output_profile_template for more details about the format of the template.
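
For instance, assuming the template placeholders follow the same %key% convention as format.shortlog (the concrete placeholder names here are an assumption), a hypothetical template could be passed as:

$ perun run -ot "%collector%-of-%cmd%" matrix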

-m, --minor-version <minor_version_list>

Specifies the head minor version, for which the profiles will be collected.

-c, --crawl-parents

If set to true, then for each specified minor version, profiles for its parents will be collected as well.

-f, --force-dirty

If set to true, then even if the repository is dirty, the changes will not be stashed.

Commands

job
matrix

perun run job

Runs the specified batch of perun jobs to generate profiles.

This command corresponds to running one isolated batch of profiling jobs, outside of regular profiling. Run perun run matrix, after specifying the job matrix in the local configuration, to automate the regular profiling of your project. After the batch is generated, each profile is tagged with origin set to the current HEAD. This serves as a check to not assign such profiles to different minor versions.

By default the profiles computed by this batch job are stored inside the .perun/jobs/ directory as files with names of the form:

bin-collector-workload-timestamp.perf

In order to store the generated profiles run the following, with i@p corresponding to a pending tag, which can be obtained by running perun status:

perun add i@p

perun run job -c time -b ./mybin -w file.in -w file2.in -p normalizer

This command profiles two commands ./mybin file.in and ./mybin file2.in and collects the profiling data using the Time Collector. The profiles are afterwards normalized with the Normalizer Postprocessor.

perun run job -c complexity -b ./mybin -w sll.cpp -cp complexity targetdir=./src

This command runs one job ./mybin sll.cpp using the Trace Collector, which uses custom binaries targeted at the ./src directory.

perun run job -c mcollect -b ./mybin -b ./otherbin -w input.txt -p normalizer -p clusterizer

This command runs two jobs ./mybin input.txt and ./otherbin input.txt and collects the profiles using the Memory Collector. The profiles are afterwards postprocessed, first using the Normalizer Postprocessor and then with the Clusterizer.

Refer to Automating Runs and Perun's Profile Format for more details about the automation and lifetimes of profiles. For the list of available collectors and postprocessors refer to Supported Collectors and Supported Postprocessors respectively.

perun run job [OPTIONS]

Options

-b, --cmd <cmd>

Command that is being profiled. Either corresponds to some script, binary or command, e.g. ./mybin or perun. [required]

-a, --args <args>

Additional parameters for <cmd>. E.g. status or -al are possible command parameters.

-w, --workload <workload>

Inputs for <cmd>. E.g. ./subdir is a possible workload for the ls command.

-c, --collector <collector>

Profiler used for collection of profiling data for the given <cmd> [required]

-cp, --collector-params <collector_params>

Additional parameters for the <collector> read from the file in YAML format

-p, --postprocessor <postprocessor>

After each collection of data, the <postprocessor> will be run to postprocess the collected resources.

-pp, --postprocessor-params <postprocessor_params>

Additional parameters for the <postprocessor> read from the file in YAML format

perun run matrix

Runs the jobs matrix specified in the local.yml configuration.

This command loads the job configuration from the local configuration, builds the job matrix and subsequently runs the jobs, collecting a list of profiles. Each profile is then stored in the .perun/jobs directory and is moreover annotated by setting its origin key to the current HEAD. This serves as a check to not assign such profiles to different minor versions.

The job matrix is defined in the YAML format and consists of a specification of binaries with corresponding arguments, workloads, supported collectors of profiling data and postprocessors that alter the collected profiles.

Refer to Automating Runs and Job Matrix Format for more details on how to specify the job matrix inside the local configuration (a small sketch follows), and to Perun Configuration files for how to work with Perun's configuration files.
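
A minimal sketch of such a job matrix inside .perun/local.yml might look as follows (the key names follow the Automating Runs description and should be treated as illustrative rather than authoritative):

cmds:
  - ./mybin
workloads:
  - input.txt
  - input2.txt
collectors:
  - name: time
postprocessors:
  - name: normalizer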

perun run matrix [OPTIONS]

Options

-q, --without-vcs-history

Will not print the VCS history tree during the collection of the data.

perun check

Applies checks for possible performance changes to points of the version history.

This command group either runs the checks for one point of history (perun check head) or for the whole history (perun check all). For each minor version (called the target) we iterate over all of the registered profiles and try to find a predecessor minor version (called the baseline) with a profile of the same configuration (by configuration we mean the tuple of collector, postprocessors, command, arguments and workloads) and run the checks according to the rules set in the configuration.

The rules are specified as an ordered list in the configuration by degradation.strategies, where the keys correspond to the configuration (or the type) and the method key specifies the actual method used for checking for performance changes. The applied methods can then be specified either by their full name or by a short string consisting of the first letters of the method name.

An example of a configuration snippet that sets the rules and strategies for one project follows:


degradation:
  apply: first
  strategies:
    - type: mixed
      postprocessor: regression_analysis
      method: bmoe
    - cmd: mybin
      type: memory
      method: bmoe
    - method: aat

Currently we support the following methods:

1. Best Model Order Equality (BMOE)
2. Average Amount Threshold (AAT)

perun check [OPTIONS] COMMAND [ARGS]...

Options

-c, --compute-missing

Whenever there are missing profiles in the given point of history, the job matrix will be rerun and the newly generated profiles assigned.

Commands

all
head
profiles

perun check head

Checks for changes in performance between the specified minor version (or the current head) and its predecessor minor versions.

The command iterates over all of the registered profiles of the specified minor version (target; e.g. the head), and tries to find the nearest predecessor minor version (baseline), where the profile with the same configuration as the tested target profile exists. When it finds such a pair, it runs the check according to the strategies set in the configuration (see Configuring Degradation Detection or Perun Configuration files).

By default the hash corresponds to the head of the current project.

perun check head [OPTIONS] <hash>

Arguments

<hash>

Optional argument

perun check all

Checks for changes in performance for the specified interval of version history.

The command crawls through the whole history of project versions starting from the specified <hash>, and for all of the registered profiles (corresponding to some target minor version) tries to find a suitable predecessor profile (corresponding to some baseline minor version) and runs the performance check according to the set of strategies set in the configuration (see Configuring Degradation Detection or Perun Configuration files).

perun check all [OPTIONS] <hash>

Arguments

<hash>

Optional argument

perun check profiles

Checks for changes in performance between two profiles.

The command checks for changes between two isolated profiles, which can be stored among the pending profiles, registered in the index, or simply stored in the filesystem. Then, for the pair of profiles <baseline> and <target>, the command runs the performance check according to the set of strategies set in the configuration (see Configuring Degradation Detection or Perun Configuration files).

<baseline> and <target> profiles will be looked up in the following steps:

  1. If the profile is in the form i@i (i.e., an index tag), then the i-th record registered in the minor version <hash> index will be used.
  2. If the profile is in the form i@p (i.e., a pending tag), then the i-th profile stored in .perun/jobs will be used.
  3. The profile is looked up within the minor version <hash> index for a match. In case the <profile> is registered there, it will be used.
  4. The profile is looked up within the .perun/jobs directory. In case there is a match, the found profile will be used.
  5. Otherwise, the directory is walked for any match. The user is asked to confirm each found match.

perun check profiles [OPTIONS] <baseline> <target>

Options

-m, --minor <minor>

Will check the index of a different minor version <hash> during the profile lookup.

Arguments

<baseline>

Required argument

<target>

Required argument

Collect Commands

perun collect

Generates a performance profile using the selected collector.

Runs a single collector unit (registered in Perun) on the given profiled command (optionally with given arguments and workloads) and generates a performance profile. The generated profile is then stored in the .perun/jobs/ directory as a file, by default with a filename of the form:

bin-collector-workload-timestamp.perf

Generated profiles will not be postprocessed in any way. Consult perun postprocessby --help in order to postprocess the resulting profile.

The configuration of the collector can be specified in an external YAML file given by the -p/--params argument.

For a thorough list and description of supported collectors refer to Supported Collectors. For finer control over profiling jobs and more complex configurations consult either perun run matrix --help or perun run job --help.
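
For example, a basic collection of timing data for a single command and workload might look as follows (a sketch; see perun collect time --help for the collector-specific options):

$ perun collect -c ./mybin -w input.txt time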

perun collect [OPTIONS] COMMAND [ARGS]...

Options

-m, --minor-version <minor_version_list>

Specifies the head minor version, for which the profiles will be collected.

-cp, --crawl-parents

If set to true, then for each specified minor version, profiles for its parents will be collected as well.

-c, --cmd <cmd>

Command that is being profiled. Either corresponds to some script, binary or command, e.g. ./mybin or perun.

-a, --args <args>

Additional parameters for <cmd>. E.g. status or -al are possible command parameters.

-w, --workload <workload>

Inputs for <cmd>. E.g. ./subdir is a possible workload for the ls command.

-p, --params <params>

Additional parameters for called collector read from file in YAML format.

-ot, --output-filename-template <output_filename_template>

Specifies the template for the automatic generation of the output filename. This way the file with collected data will have a resulting filename w.r.t. this parameter. Refer to format.output_profile_template for more details about the format of the template.

Collect units

perun collect trace

Generates a trace performance profile, capturing the running times of functions depending on underlying structural sizes.

  • Limitations: C/C++ binaries
  • Metric: mixed (captures both time and size consumption)
  • Dependencies: SystemTap (+ corresponding requirements e.g. kernel -dbgsym version)
  • Default units: us for time, element number for size

An example of collected resources follows:

{
    "amount": 11,
    "subtype": "time delta",
    "type": "mixed",
    "uid": "SLList_init(SLList*)",
    "structure-unit-size": 0
}

The trace collector provides various collection strategies which are supposed to provide sensible default settings for collection. This allows the user to choose a suitable collection method without the need for a detailed rules / sampling specification. The currently supported strategies are:

  • userspace: this strategy traces all userspace functions / code blocks without the use of sampling. Note that this strategy might be resource-intensive.
  • all: this strategy traces all userspace + library + kernel functions / code blocks that are present in the traced binary without the use of sampling. Note that this strategy might be very resource-intensive.
  • u_sampled: sampled version of the userspace strategy. This method uses sampling to reduce the overhead and resource consumption.
  • a_sampled: sampled version of the all strategy. Its goal is to reduce the overhead and resource consumption of the all method.
  • custom: user-specified strategy. Requires the user to specify rules and sampling manually.

Note that manually specified parameters have higher priority than strategy specification and it is thus possible to override concrete rules / sampling by the user.

The collector interface operates with two seemingly identical concepts: the (external) command and the binary. The external command refers to the script, executable, makefile, etc. that will be called / invoked during the profiling, such as ‘make test’, ‘run_script.sh’, ‘./my_binary’. The binary, on the other hand, refers to the actual binary or executable file that will be profiled and that contains the specified functions / static probes etc. It is expected that the binary will be invoked / called as part of the external command script, or that the external command and the binary are the same.

The interface for rules (functions, static probes) specification offers a way to specify profiled locations both with sampling or without it. Note that sampling can reduce the overhead imposed by the profiling. Static rules can be further paired - paired rules act as a start and end point for time measurement. Without a pair, the rule measures time between each two probe hits. The pairing is done automatically for static locations with convention <name> and <name>_end or <name>_END. Otherwise, it is possible to pair rules by the delimiter ‘#’, such as <name1>#<name2>.
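
As a sketch, a custom strategy combining a sampled function probe with a pair of static probes might be specified as follows (the probe names are hypothetical and the argument passing of the sampled variant is an assumption):

$ perun collect -c ./mybin trace -m custom -fs SLList_insert 10 -s PROBE_START#PROBE_END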

Trace profiles are suitable for postprocessing by Regression Analysis, since they capture the dependency of time consumption on the size of the structure. This allows one to model and estimate the running-time behaviour of individual functions.

Scatter plots are a suitable visualization for profiles collected by the trace collector, plotting individual points along with regression models (if the profile was postprocessed by regression analysis). Run perun show scatter --help or refer to Scatter Plot for more information about scatter plots.
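
E.g. assuming the first indexed profile of the current HEAD was collected by the trace collector, a hypothetical invocation might be:

$ perun show 0@i scatter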

Refer to Trace Collector for more thorough description and examples of trace collector.

perun collect trace [OPTIONS]

Options

-m, --method <method>

Select strategy for probing the binary. See documentation for detailed explanation for each strategy. [required]

-f, --func <func>

Set the probe point for the given function.

-s, --static <static>

Set the probe point for the given static location.

-d, --dynamic <dynamic>

Set the probe point for the given dynamic location.

-fs, --func-sampled <func_sampled>

Set the probe point and sampling for the given function.

-ss, --static-sampled <static_sampled>

Set the probe point and sampling for the given static location.

-ds, --dynamic-sampled <dynamic_sampled>

Set the probe point and sampling for the given dynamic location.

-g, --global-sampling <global_sampling>

Sets the global sampling for all probes; sampling parameters for specific rules have higher priority.

--with-static, --no-static

The selected method will also extract and profile static probes.

-b, --binary <binary>

The profiled executable. If not set, then the command is considered to be the profiled executable and is used as the binary parameter.

-t, --timeout <timeout>

Set time limit for the profiled command, i.e. the command will be terminated after reaching the time limit. Useful for endless commands etc.

--cleanup, --no-cleanup

Enable/disable the pre-cleanup of possibly running systemtap processes that could cause the corruption of the output file due to multiple writes.

-vt, --verbose-trace

Set the trace file output to be more verbose, useful for debugging.

perun collect memory

Generates a memory performance profile, capturing memory allocations of different types along with the target address and full call trace.

  • Limitations: C/C++ binaries
  • Metric: memory
  • Dependencies: libunwind.so and custom libmalloc.so
  • Default units: B for memory

The following snippet shows an example of resources collected by the memory profiler. It captures allocations done by functions along with a more detailed description, such as the type of allocation, the trace, etc.

{
    "type": "memory",
    "subtype": "malloc",
    "address": 19284560,
    "amount": 4,
    "trace": [
        {
            "source": "../memory_collect_test.c",
            "function": "main",
            "line": 22
        },
    ],
    "uid": {
        "source": "../memory_collect_test.c",
        "function": "main",
        "line": 22
    }
},

Memory profiles can be efficiently interpreted using the Heap Map technique (together with its heat mode), which shows memory allocations (by functions) in a memory address map.

Refer to Memory Collector for more thorough description and examples of memory collector.

perun collect memory [OPTIONS]

Options

-s, --sampling <sampling>

Sets the sampling interval for profiling the allocations, i.e. memory snapshots will be collected every <sampling> seconds.

--no-source <no_source>

Will exclude allocations done from <no_source> file during the profiling.

--no-func <no_func>

Will exclude allocations done by the <no_func> function during the profiling.

-a, --all

Will record the full trace for each allocation, i.e. it will include all allocators and even unreachable records.
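
For illustration, the following sketch samples allocation snapshots every 0.05 seconds and excludes allocations from a (hypothetical) source file:

$ perun collect -c ./mybin memory -s 0.05 --no-source external.c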

perun collect time

Generates time performance profile, capturing overall running times of the profiled command.

  • Limitations: none
  • Metric: running time
  • Dependencies: none
  • Default units: s

This is a wrapper over the Linux time utility and captures resources in the following form:

{
    "amount": 0.59,
    "type": "time",
    "subtype": "sys",
    "uid": cmd
    "order": 1
}

Refer to Time Collector for a more thorough description and examples of the time collector.

perun collect time [OPTIONS]

Options

-w, --warm-up-repetition <warmup>

Before the actual timing, the collector will execute <warmup> warm-up executions.

-r, --repeat <repeat>

The timing of the given binaries will be repeated <repeat> times.

Postprocess Commands

perun postprocessby

Postprocesses the given stored or pending profile using selected postprocessor.

Runs a single postprocessor unit on the given looked-up profile. The postprocessed file will then be stored in the .perun/jobs/ directory, by default with a filename of the form:

bin-collector-workload-timestamp.perf

The postprocessed <profile> will be looked up in the following steps:

  1. If <profile> is in the form i@i (i.e., an index tag), then the i-th record registered in the minor version <hash> index will be postprocessed.
  2. If <profile> is in the form i@p (i.e., a pending tag), then the i-th profile stored in .perun/jobs will be postprocessed.
  3. <profile> is looked up within the minor version <hash> index for a match. In case the <profile> is registered there, it will be postprocessed.
  4. <profile> is looked up within the .perun/jobs directory. In case there is a match, the found profile will be postprocessed.
  5. Otherwise, the directory is walked for any match. The user is asked to confirm each found match.

Tags respect the sort order specified by the format.sort_profiles_by option.

For checking the associated tags to profiles run perun status.

Example 1. The following command will postprocess the given profile stored at the given path using the normalizer, i.e. for each snapshot, the resources will be normalized to the interval <0, 1>:

perun postprocessby ./echo-time-hello-2017-04-02-13-13-34-12.perf normalizer

Example 2. The following command will postprocess the second profile stored in the index of the commit preceding the current head using interval regression analysis:

perun postprocessby -m HEAD~1 1@i regression-analysis --method=interval

For a thorough list and description of supported postprocessors refer to Supported Postprocessors. For finer control over profiling jobs and more complex configurations consult either perun run matrix --help or perun run job --help.

perun postprocessby [OPTIONS] <profile> COMMAND [ARGS]...

Options

-ot, --output-filename-template <output_filename_template>

Specifies the template for the automatic generation of the output filename. This way the postprocessed file will have a resulting filename w.r.t. this parameter. Refer to format.output_profile_template for more details about the format of the template.

-m, --minor <minor>

Will check the index of a different minor version <hash> during the profile lookup.

Arguments

<profile>

Required argument

Postprocess units

perun postprocessby normalizer

Normalizes the performance profile into a flat interval.

  • Limitations: none
  • Dependencies: none

Normalizer is a postprocessor which iterates through all of the snapshots and normalizes the resources of the same type to the interval (0, 1), where 1 corresponds to the maximal value of the given type.

Consider the following list of resources for one snapshot generated by Time Collector:

[
    {
        'amount': 0.59,
        'uid': 'sys'
    }, {
        'amount': 0.32,
        'uid': 'user'
    }, {
        'amount': 2.32,
        'uid': 'real'
    }
]

Normalizer yields the following set of resources:

[
    {
        'amount': 0.2543103448275862,
        'uid': 'sys'
    }, {
        'amount': 0.13793103448275865,
        'uid': 'user'
    }, {
        'amount': 1.0,
        'uid': 'real'
    }
]

Refer to Normalizer Postprocessor for more thorough description and examples of normalizer postprocessor.

perun postprocessby normalizer [OPTIONS]

perun postprocessby regression_analysis

Finds fitting regression models to estimate models of profiled resources.

  • Limitations: Currently limited to models of amount depending on structural-unit-size
  • Dependencies: Trace Collector

Regression analyzer tries to find a fitting model to estimate the amount of resources depending on structural-unit-size.

The following strategies are currently available:

  1. Full Computation uses all of the data points to obtain the best fitting model for each type of model from the database (unless --regression_models/-r restricts the set of models).
  2. Iterative Computation uses a percentage of data points to obtain some preliminary models together with their errors or fitness. The most fitting model is then expanded, until it is fully computed or some other model becomes more fitting.
  3. Full Computation with initial estimate first uses some percentage of the data to estimate which model would be best fitting. The given model is then fully computed.
  4. Interval Analysis uses a finer set of data intervals and estimates models for each interval, providing more precise modeling of the profile.
  5. Bisection Analysis fully computes the models for the full interval. Then it splits the interval and computes new models for the halves. If the best fitting models changed for the subintervals, the splitting continues.
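
For instance, a full computation restricted to linear and constant models might be invoked as follows (a sketch mirroring the interval example of perun postprocessby above; that -r can be passed repeatedly is an assumption):

$ perun postprocessby 0@p regression-analysis --method=full -r linear -r constant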

Currently we support linear, quadratic, power, logarithmic and constant models and use the coefficient of determination (\(R^2\)) to measure the fitness of a model. The models are stored as follows:

{
    "uid": "SLList_insert(SLList*, int)",
    "r_square": 0.0017560012128507133,
    "coeffs": [
        {
            "value": 0.505375215875552,
            "name": "b0"
        },
        {
            "value": 9.935159839322705e-06,
            "name": "b1"
        }
    ],
    "x_interval_start": 0,
    "x_interval_end": 11892,
    "model": "linear",
    "method": "full",
}

Note that if your data are not suitable for regression analysis, check out Clusterizer to postprocess your profile to be analysable by this analysis.

For more details about regression analysis refer to Regression Analysis. For more details how to collect suitable resources refer to Trace Collector.

perun postprocessby regression_analysis [OPTIONS]

Options

-m, --method <method>

Will use the <method> to find the best fitting models for the given profile. [required]

-r, --regression_models <regression_models>

Restricts the list of regression models used by the specified <method> to fit the data. If omitted, all regression models will be used in the computation.

-s, --steps <steps>

Restricts the number of steps / data parts used by the iterative, interval and initial guess methods.

-dp, --depending-on <per_key>

Sets the key that will be used as a source of independent variable.

-o, --of <of_key>

Sets key for which we are finding the model.

perun postprocessby clusterizer

Clusters each resource to an appropriate cluster in order to be postprocessable by regression analysis.

  • Limitations: none
  • Dependencies: none

Clusterizer tries to find a suitable cluster for each resource in the profile. The clusters are either computed w.r.t. the sort order of the resource amounts, or according to a sliding window.

The sliding window can be further adjusted by setting its width (i.e. how many nearby values on the x axis will be fit to one cluster) and its height (i.e. how big an interval of resource amounts will be considered for one cluster). Both width and height can be further qualified. The width can either be absolute, where we take at most the absolute number of resources, relative, where we take at most a percentage of the number of resources for each cluster, or weighted, where we take the number of resources depending on the frequency of their occurrences. Similarly, the height can either be absolute, where we set the interval of amounts to an absolute size, or relative, where we set the interval of amounts relative to the first resource amount in the cluster (so e.g. if we have a window of height 0.1 and the first resource in the cluster has an amount of 100, we will cluster every resource in the interval 100 to 110 into this cluster).
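
As an illustrative sketch (the concrete strategy name is an assumption), a sliding window clustering with a relative height of 0.1 and a fixed width of 10 might be requested as:

$ perun postprocessby 0@p clusterizer -s sliding_window -wh 0.1 -rwh -ww 10 -fww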

For more details about clusterization refer to Clusterizer.

perun postprocessby clusterizer [OPTIONS]

Options

-s, --strategy <strategy>

Specifies the clustering strategy that will be applied to the profile.

-wh, --window-height <window_height>

Specifies the height of the window (either fixed or proportional)

-rwh, --relative-window-height

Specifies that the height of the window is relative to the point

-fwh, --fixed-window-height

Specifies that the height of the window is absolute to the point

-ww, --window-width <window_width>

Specifies the width of the window, i.e. how many values will be taken by window.

-rww, --relative-window-width

Specifies that the width of the window is relative (i.e. a percentage of the number of resources).

-fww, --fixed-window-width

Specifies that the width of the window is fixed (i.e. an absolute number of resources).

-www, --weighted-window-width

Specifies that the width of the window is weighted by the frequency of resource occurrences.

perun postprocessby regressogram

Interleaves the profiled resources with regressogram models.

  • Limitations: none
  • Dependencies: none

Regressogram belongs among the simplest non-parametric methods and its properties are the following:

Regressogram: can be described as a step function (i.e. a function constant by parts). The regressogram uses the same basic idea as a histogram for density estimation: the set of values of the x-coordinates (<per_key>) is divided into intervals, and the estimate of a point within a concrete interval is the mean/median of the y-coordinates (<of_resource_key>) on this sub-interval. We currently use the coefficient of determination (\(R^2\)) to measure the fitness of a regressogram. The fitness of the regressogram model estimation depends primarily on the number of buckets into which the interval is divided. The user can choose the number of buckets manually (<bucket_number>) or use one of the following methods to estimate the optimal number of buckets (<bucket_method>):

  • sqrt: square root (of data size) estimator, used for its speed and simplicity
  • rice: does not take variability into account, only data size and commonly overestimates
  • scott: takes into account data variability and data size, less robust estimator
  • stone: based on leave-one-out cross validation estimate of the integrated squared error
  • fd: robust, takes into account data variability and data size, resilient to outliers
  • sturges: only accounts for data size, underestimates for large non-gaussian data
  • doane: generalization of Sturges’ formula, works better with non-gaussian data
  • auto: max of the Sturges’ and ‘fd’ estimators, provides good all around performance

For more details about these methods for estimating the optimal number of buckets, or to view their implementations, you can visit SciPy.

For more details about this approach of non-parametric analysis refer to Regressogram method.
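
For example, a regressogram using the doane estimate of the bucket count and the mean as the statistic function might be computed as follows (a sketch; the option abbreviations are listed below):

$ perun postprocessby 0@p regressogram -bm doane -sf mean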

perun postprocessby regressogram [OPTIONS]

Options

-bn, --bucket_number <bucket_number>

Restricts the number of buckets into which the values of the selected statistic will be placed.

-bm, --bucket_method <bucket_method>

Specifies the method to estimate the optimal number of buckets.

-sf, --statistic_function <statistic_function>

Will use the <statistic_function> to compute the values for points within each bucket of the regressogram.

-of, --of-key <of_key>

Sets key for which we are finding the model (y-coordinates).

-per, --per-key <per_key>

Sets the key that will be used as a source variable (x-coordinates).

perun postprocessby moving_average

Interleaves the profiled resources with moving average models.

  • Limitations: none
  • Dependencies: none

Moving average methods are natural generalizations of the regressogram method. This method uses the local averages/medians of the y-coordinates (<of_resource_key>), but the estimate at the x-point (<per_key>) is based on the centered surroundings of this point. More precisely:

Moving Average: a widely used estimator in technical analysis that helps smooth the dataset by filtering out the ‘noise’. Among the basic properties of this method are the ability to reduce the effect of temporary variations in data and the improvement of the fitness of data to a line (so-called smoothing), showing the data’s trend more clearly and highlighting any value below or above the trend. The most important task with this type of non-parametric approach is the choice of the <window-width>. If the user does not choose it, we try to approximate this value using the coefficient of determination (\(R^2\)): at the beginning of the analysis an initial window width is set, and the current dataset is then interleaved repeatedly until the coefficient of determination reaches the required level. This guarantees the desired smoothness of the resulting models. The two basic and commonly used <moving-methods> are the simple moving average (sma) and the exponential moving average (ema).

For more details about this approach of non-parametric analysis refer to Moving Average Methods.

perun postprocessby moving_average [OPTIONS] COMMAND [ARGS]...

Options

-mp, --min_periods <min_periods>

Provides the minimum number of observations in the window required to have a value. If the number of available observations is smaller, the result is NaN.

-of, --of-key <of_key>

Sets key for which we are finding the model (y-coordinates).

-per, --per-key <per_key>

Sets the key that will be used as a source variable (x-coordinates).

Commands

ema
sma
smm

perun postprocessby moving_average sma

Simple Moving Average

In most cases, it is an unweighted Moving Average; this means that each x-coordinate in the data set (profiled resources) has equal importance and is weighted equally. The mean is then computed from the previous n data points (<no-center>), where n marks the <window-width>. However, in science and engineering the mean is normally taken from an equal number of data points on either side of a central value (<center>). This ensures that variations in the mean are aligned with variations in the data rather than being shifted in the x-axis direction. Since the window at the boundaries of the interval usually does not contain enough points, it is necessary to specify the value of <min-periods> to avoid a NaN result. The role of the weighting function in this approach belongs to <window-type>, which represents the suite of the following window functions for filtering:

  • boxcar: known as rectangular or Dirichlet window, is equivalent to no window at all
  • triang: standard triangular window
  • blackman: formed by using three terms of a summation of cosines, minimal leakage, close to optimal
  • hamming: formed by using a raised cosine with non-zero endpoints, minimize the nearest side lobe
  • bartlett: similar to triangular, endpoints are at zero, processing of tapering data sets
  • parzen: can be regarded as a generalization of k-nearest neighbor techniques
  • bohman: convolution of two half-duration cosine lobes
  • blackmanharris: minimum in the sense that its maximum side lobes are minimized (symmetric 4-term)
  • nuttall: minimum 4-term Blackman-Harris window according to Nuttall (so called ‘Nuttall4c’)
  • barthann: has a main lobe at the origin and asymptotically decaying side lobes on both sides
  • kaiser: formed by using a Bessel function, needs beta value (set to 14 - good starting point)

For more details about these window functions or for their visual representation you can see SciPyWindow.
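
For example, a simple moving average smoothed by a triangular window of width 10 might be requested as follows (a sketch using the options listed below):

$ perun postprocessby 0@p moving_average sma -wt triang -ww 10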

perun postprocessby moving_average sma [OPTIONS]

Options

-wt, --window_type <window_type>

Provides the window type; if not set, then all points are evenly weighted. For further information about window types see the notes in the documentation.

--center, --no-center

If set to False, the result is set to the right edge of the window; otherwise the result is set to the center of the window.

-ww, --window_width <window_width>

Size of the moving window. This is a number of observations used for calculating the statistic. Each window will be a fixed size.

perun postprocessby moving_average smm

Simple Moving Median

The second representative of the Simple Moving Average methods is the Simple Moving Median. The same rules as in the first described method apply, except for the option of choosing the window type, which does not make sense in this approach. The only difference between these two methods is the way the values in the individual sub-intervals are computed: the Simple Moving Median is not based on the computation of the average but, as the name suggests, on the median.

perun postprocessby moving_average smm [OPTIONS]

Options

--center, --no-center

If set to False, the result is set to the right edge of the window; otherwise the result is set to the center of the window.

-ww, --window_width <window_width>

Size of the moving window. This is a number of observations used for calculating the statistic. Each window will be a fixed size.

perun postprocessby moving_average ema

Exponential Moving Average

This method is a type of moving average method, also known as the Exponential Weighted Moving Average, that places a greater weight and significance on the most recent data points. The weighting for each farther x-coordinate decreases exponentially, never reaching zero. This approach of moving average reacts more significantly to recent changes than a Simple Moving Average, which applies an equal weight to all observations in the period. To calculate an EMA, the Simple Moving Average (SMA) over a particular sub-interval must first be computed. In the next step, the multiplier for smoothing (weighting) the EMA is calculated; it depends on the selected formula, and the following options are supported (<decay>):

  • com: specify decay in terms of center of mass: \({\alpha}\) = 1 / (1 + com), for com >= 0
  • span: specify decay in terms of span: \({\alpha}\) = 2 / (span + 1), for span >= 1
  • halflife: specify decay in terms of half-life, \({\alpha}\) = 1 - exp(log(0.5) / halflife), for halflife > 0
  • alpha: specify smoothing factor \({\alpha}\) directly: 0 < \({\alpha}\) <= 1

The computed coefficient \({\alpha}\) represents the degree of weighting decrease, a constant smoothing factor. A higher value of \({\alpha}\) discounts older observations faster, a smaller value to the contrary. Finally, the relevant formula is used to calculate the current value of the EMA. It is important not to confuse the Exponential Moving Average with the Simple Moving Average: an Exponential Moving Average behaves quite differently from the latter, because it is a function of the weighting factor or the length of the average.
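
For instance, the decay specifications above are interchangeable; a short worked example:

\[\alpha_{com=3} = \frac{1}{1 + 3} = 0.25 = \frac{2}{7 + 1} = \alpha_{span=7}\]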

perun postprocessby moving_average ema [OPTIONS]

Options

-d, --decay <decay>

Exactly one of “com”, “span”, “halflife”, “alpha” can be provided. Allowed values and the relationship between the parameters are specified in the documentation (e.g. --decay=com 3).

perun postprocessby kernel-regression

Interleaves the profiled resources with kernel models.

  • Limitations: none
  • Dependencies: none

In statistics, kernel regression is a non-parametric approach to estimate the conditional expectation of a random variable. Generally, the main goal of this approach is to find a non-parametric relation between a pair of random variables X (<per-key>) and Y (<of-key>). Unlike parametric techniques (e.g. linear regression), kernel regression does not assume any underlying distribution (e.g. linear, exponential, etc.) to estimate the regression function. The main idea of kernel regression is placing a kernel, which has the role of a weighting function, at each observation point in the dataset. Subsequently, the kernel assigns a weight to each point depending on its distance from the current data point. The kernel basis formula depends only on the bandwidth from the current (‘local’) data point X to the set of neighboring data points X.

Kernel selection is not important from an asymptotic point of view. It is nevertheless appropriate to choose an optimal kernel, since this group of kernels is continuous on the whole definition domain and the estimated regression function then inherits the smoothness of the kernel. For example, suitable kernels can be the epanechnikov or normal kernel. This postprocessor offers kernel selection in the kernel-smoothing mode, where five different types of kernels are available. For more information about these kernels or this kernel regression mode, see perun postprocessby kernel-regression kernel-smoothing.

Bandwidth selection is the most important factor in each approach of kernel regression, since this value significantly affects the smoothness of the resulting estimate. When an inappropriate value is chosen, one of the following two situations can be expected in most cases: a small bandwidth value reproduces the estimated data and, vice versa, a large value leads to over-smoothing, i.e. to averaging of the estimated data. Therefore, methods to determine the bandwidth value are used. One of the most widespread and most commonly used methods is the cross-validation method. This method is based on the estimate of the regression function in which the i-th observation is omitted. In this postprocessor, this method is available in the estimator-settings mode. Other methods to determine the bandwidth, which are available in the remaining modes of this postprocessor, are the scott and silverman methods. More information about these methods and their definitions can be found in the part perun postprocessby kernel-regression method-selection.

In summary, this postprocessor offers five different modes, which do not differ in the resulting estimate but in the way the resulting estimate is computed. In other words, the result of each mode is the kernel estimate with the relevant parameters selected according to the concrete mode. We briefly describe the individual modes below; for more information about them, you can visit the relevant parts of the documentation:

  • Estimator-Settings: Nadaraya-Watson kernel regression with specific settings for estimate
  • User-Selection: Nadaraya-Watson kernel regression with user bandwidth
  • Method-Selection: Nadaraya-Watson kernel regression with supporting bandwidth selection method
  • Kernel-Smoothing: Kernel regression with different types of kernel and regression methods
  • Kernel-Ridge: Nadaraya-Watson kernel regression with automatic bandwidth selection

For more details about this approach of non-parametric analysis refer to Kernel Regression Methods.
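
For instance, a hypothetical invocation of this postprocessor on the first indexed profile could look as follows (the profile tag, keys and bandwidth value are purely illustrative):

perun postprocessby 0@i kernel-regression --of-key 'amount' --per-key 'structure-unit-size' user-selection --bandwidth-value 2.0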

perun postprocessby kernel-regression [OPTIONS] COMMAND [ARGS]...

Options

-of, --of-key <of_key>

Sets key for which we are finding the model (y-coordinates).

-per, --per-key <per_key>

Sets the key that will be used as a source variable (x-coordinates).

Commands

estimator-settings
kernel-ridge
kernel-smoothing
method-selection
user-selection

perun postprocessby kernel-regression estimator-settings

Nadaraya-Watson kernel regression with specific settings for estimate.

As has been mentioned above, kernel regression aims to estimate the functional relation between the explanatory variable X and the response variable y. This mode of the kernel regression postprocessor calculates the conditional mean E[y|X] = m(X), where y = m(X) + \(\epsilon\). The variable X is represented in the postprocessor by the <per-key> option and the variable y by the <of-key> option.

Regression Estimator <reg-type>:

This mode offers two types of regression estimator <reg-type>. The local-constant (`lc`) type of regression provided by this mode is also known as Nadaraya-Watson kernel regression:

Nadaraya-Watson: assumes the conditional expectation E[y|X] = m(X), where the function m(*) represents the regression function to estimate. We can alternatively write the following formula: y = m(X) + \(\epsilon\), E(\(\epsilon\)) = 0. Suppose we have a set of independent observations {(\({x_1}\), \({y_1}\)), …, (\({x_n}\), \({y_n}\))}; then the Nadaraya-Watson estimator is defined as:

\[m_{h}(x) = \sum_{i=1}^{n}K_h(x - x_i)y_i / \sum_{j=1}^{n}K_h(x - x_j)\]

where \({K_h}\) is a kernel with bandwidth h. The denominator is a weighting term ensuring that the weights sum to 1. It is easy to see that this kernel regression estimator is just a weighted sum of the observed responses \({y_i}\). There are many other kernel estimators that differ from the one presented here; however, since they are all asymptotically equivalent, we will not deal with them further. The Kernel Regression postprocessor works in all modes only with the Nadaraya-Watson estimator.
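
For illustration, a minimal NumPy sketch (not the postprocessor's actual implementation) of the Nadaraya-Watson formula above, assuming a Gaussian kernel and synthetic data:

import numpy as np

def nadaraya_watson(x_grid, x, y, h):
    # Pairwise kernel weights K_h(x_grid[i] - x[j]) for a Gaussian kernel.
    u = (x_grid[:, None] - x[None, :]) / h
    weights = np.exp(-0.5 * u ** 2)
    # Weighted sum of the observed responses, normalized by the weights.
    return (weights @ y) / weights.sum(axis=1)

# Noisy quadratic dependency of <of-key> on <per-key> (synthetic data).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 200))
y = x ** 2 + rng.normal(0, 5, 200)
print(nadaraya_watson(np.array([2.0, 5.0, 8.0]), x, y, h=0.5))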

The second regression estimator supported in this mode of the postprocessor is the local-linear (`ll`) estimator. This type is an extension of the local-constant estimator that suffers less from bias issues at the edge of the support.

Local-Linear: an estimator that offers various advantages compared with other kernel-type estimators, such as the Nadaraya-Watson estimator. More precisely, it adapts to both random and fixed designs, and to various design densities such as highly clustered designs and nearly uniform designs. It turns out that the local-linear smoother repairs the drawbacks of other kernel regression estimators. An estimator \(\hat{m}\) of m is a linear smoother if, for each x, there is a vector \(l(x) = (l_1(x), ..., l_n(x))^T\) such that:

\[\hat{m}(x) = \sum_{i=1}^{n}l_i(x)Y_i = l(x)^TY\]

where \(Y = (Y_1, ..., Y_n)^T\). For kernel estimators:

\[l_i(x) = K(||x - X_i|| / h) / \sum_{j=1}^{n}K(||x - X_j|| / h)\]

where K represents the kernel and h its bandwidth.

As an interesting aside, the following estimators are linear smoothers too: Gaussian process regression and splines.

Bandwidth Method <bandwidth-method>:

As noted in the general description of kernel regression, one of the most important factors in the resulting estimate is the kernel bandwidth. Selecting an inappropriate value may lead to under-smoothing or over-smoothing of the resulting kernel estimate. Since the bandwidth of the kernel is a free parameter that exhibits a strong influence on the resulting estimate, the postprocessor offers methods for its selection. The two most popular data-driven methods of bandwidth selection with desirable properties are least-squares cross-validation (cv_ls) and the AIC-based method of Hurvich et al. (1998), which is based on minimizing a modified Akaike Information Criterion (aic):

Cross-Validation Least-Squares: determination of the optimal kernel bandwidth for kernel regression is based on minimizing

\[CV(h) = n^{-1} \sum_{i=1}^{n}(Y_i - g_{-i}(X_i))^2,\]

where \(g_{-i}(X_i)\) is the estimator of \(g(X_i)\) formed by leaving out the i-th observation when generating the prediction for observation i.

Hurvich et al.’s (1998) approach is based on the minimization of

\[AIC_c = \ln(\sigma^2) + (1 + tr(H)/n) / (1 - (tr(H) + 2)/n),\]

where

\[\sigma^2 = 1 / n \sum_{i=1}^{n}(Y_i - g(X_i))^2 = Y'(I - H)'(I - H)Y / n\]

with \(g(X_i)\) being a non-parametric regression estimator and H being an n x n matrix of kernel weights with its (i, j)-th element given by \(H_{ij} = K_h(X_i, X_j) / \sum_{l=1}^{n} K_h(X_i, X_l)\), where \(K_h(*)\) is a generalized product kernel.

Both methods for kernel bandwidth selection, the least-squares cross-validation and the AIC-based method, have been shown to be asymptotically equivalent.
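
The least-squares cross-validation criterion is simple enough to implement directly. The following naive NumPy sketch (with an illustrative candidate grid, not the postprocessor's actual code) picks the bandwidth minimizing CV(h) for the Nadaraya-Watson estimator:

import numpy as np

def cv_ls(h, x, y):
    # Leave-one-out N-W estimate: zero out the diagonal so that the i-th
    # observation does not contribute to its own prediction.
    u = (x[:, None] - x[None, :]) / h
    weights = np.exp(-0.5 * u ** 2)
    np.fill_diagonal(weights, 0.0)
    g_minus_i = (weights @ y) / weights.sum(axis=1)
    return np.mean((y - g_minus_i) ** 2)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 100))
y = np.sin(x) + rng.normal(0, 0.2, 100)
candidates = np.linspace(0.05, 2.0, 40)
best_h = min(candidates, key=lambda h: cv_ls(h, x, y))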

The remaining options of this mode of the kernel regression postprocessor are described within its usage below. All of these options are parameters of EstimatorSettings (see EstimatorSettings), which optimizes the kernel bandwidth based on the specified settings.

In the case of confusion about this approach to kernel regression, you can visit StatsModels.
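
Since this mode builds on StatsModels, a sketch of the roughly equivalent direct usage is shown below (assuming the postprocessor maps its options onto KernelReg and EstimatorSettings in this way; the data are synthetic):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 100))        # <per-key> values
y = np.sqrt(x) + rng.normal(0, 0.1, 100)    # <of-key> values

# Roughly mirrors: estimator-settings -rt ll -bw cv-ls --efficient
#                  -nsub 50 -nres 25 --return-median
settings = sm.nonparametric.EstimatorSettings(
    efficient=True, randomize=False, n_sub=50, n_res=25, return_median=True)
model = sm.nonparametric.KernelReg(
    endog=y, exog=x, var_type='c', reg_type='ll', bw='cv_ls',
    defaults=settings)
mean, marginal = model.fit()    # fitted values and marginal effects
print(model.bw)                 # the bandwidth selected by cross-validation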

perun postprocessby kernel-regression estimator-settings [OPTIONS]

Options

-rt, --reg-type <reg_type>

Provides the type for regression estimator. Supported types are: “lc”: local-constant (Nadaraya-Watson) and “ll”: local-linear estimator. Default is “ll”. For more information about these types you can visit Perun Documentation.

-bw, --bandwidth-method <bandwidth_method>

Provides the method for bandwidth selection. Supported values are: “cv-ls”: least-squares cross-validation and “aic”: AIC Hurvich bandwidth estimation. Default is “cv-ls”. For more information about these methods you can visit Perun Documentation.

--efficient, --uniformly

If True, the efficient bandwidth estimation is executed: smaller sub-samples are taken and the scaling factor of each sub-sample is estimated. It is useful for large samples and/or multiple variables. If False (default), all data are used at the same time.

--randomize, --no-randomize

If True, the bandwidth estimation is performed by taking <n-re-samples> random re-samples of size <n-sub-samples> from the full sample. If set to False (default), the estimation is performed by slicing the full sample into sub-samples of size <n-sub-samples>, so that all samples are used once.

-nsub, --n-sub-samples <n_sub_samples>

Size of the sub-samples (default is 50).

-nres, --n-re-samples <n_re_samples>

The number of random re-samples used for bandwidth estimation. It has an effect only if <randomize> is set to True. Default value is 25.

--return-median, --return-mean

If True, the estimator uses the median of all scaling factors for each sub-sample to estimate the bandwidth of the full sample. If False (default), the estimator uses the mean.

perun postprocessby kernel-regression user-selection

Nadaraya-Watson kernel regression with user bandwidth.

This mode of the kernel regression postprocessor is very similar to the estimator-settings mode. It also offers two types of regression estimator <reg-type>: the Nadaraya-Watson estimator, also known as the local-constant estimator (lc), and the local-linear estimator (ll). Details about these estimators are available in perun postprocessby kernel-regression estimator-settings. In contrast to the estimator-settings mode, which selects the kernel bandwidth using the EstimatorSettings and the chosen parameters, in this mode the user selects the kernel bandwidth <bandwidth-value> directly. This value will be used to execute the kernel regression. The kernel bandwidth in the resulting estimate may occasionally differ from the entered value, specifically when the entered bandwidth value is too low to execute the kernel regression. The bandwidth value is then approximated to the closest appropriate value, so that the accuracy of the resulting estimate is not decreased.

perun postprocessby kernel-regression user-selection [OPTIONS]

Options

-rt, --reg-type <reg_type>

Provides the type for regression estimator. Supported types are: “lc”: local-constant (Nadaraya-Watson) and “ll”: local-linear estimator. Default is “ll”. For more information about these types you can visit Perun Documentation.

-bv, --bandwidth-value <bandwidth_value>

The float value of the bandwidth defined by the user, which will be used in the kernel regression. [required]

perun postprocessby kernel-regression method-selection

Nadaraya-Watson kernel regression with supporting bandwidth selection method.

The last mode from the group of three modes based on a similar principle. The method-selection mode offers the same types of regression estimator <reg-type> as the first two described modes: ll represents the local-linear estimator, while lc represents the local-constant (Nadaraya-Watson) estimator. A more detailed description of these estimators is located in perun postprocessby kernel-regression estimator-settings. The difference between this mode and the first two modes lies in the way the kernel bandwidth is determined. This mode offers two methods that try to compute an optimal bandwidth from predefined formulas:

Scott’s rule of thumb determines the smoothing bandwidth for a kernel estimation. It is very fast to compute. This rule was designed for density estimation, but it is usable for kernel regression too. It typically produces a larger bandwidth and is therefore useful for estimating a gradual trend:

\[bw = 1.059 * A * n^{-1/5},\]

where n marks the length of X variable <per-key> and

\[A = min(\sigma(x), IQR(x) / 1.349),\]

where \(\sigma\) marks the StandardDeviation and IQR marks the InterquartileRange.

Silverman’s rule of thumb determines the smoothing bandwidth for a kernel estimation and belongs among the most popular rule-of-thumb methods. The rule was originally designed for density estimation and therefore uses the normal density as a prior for approximation. For the necessary estimation of \(\sigma\) of X <per-key>, Silverman proposes a robust version making use of the InterquartileRange. If the true density is uni-modal, fairly symmetric and does not have fat tails, it works fine:

\[bw = 0.9 * A * n^{-1/5},\]

where n marks the length of X variable <per-key> and

\[A = min(\sigma(x), IQR(x) / 1.349),\]

where \(\sigma\) marks the StandardDeviation and IQR marks the InterquartileRange.
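
Both rules are cheap to evaluate; an illustrative NumPy transcription of the two formulas above:

import numpy as np
from scipy.stats import iqr

def rule_of_thumb_bw(x, rule='scott'):
    # A = min(sigma(x), IQR(x) / 1.349), shared by both rules.
    a = min(np.std(x), iqr(x) / 1.349)
    factor = 1.059 if rule == 'scott' else 0.9
    return factor * a * len(x) ** (-1 / 5)

x = np.random.default_rng(3).normal(size=500)
print(rule_of_thumb_bw(x, 'scott'), rule_of_thumb_bw(x, 'silverman'))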

perun postprocessby kernel-regression method-selection [OPTIONS]

Options

-rt, --reg-type <reg_type>

Provides the type for regression estimator. Supported types are: “lc”: local-constant (Nadaraya-Watson) and “ll”: local-linear estimator. Default is “ll”. For more information about these types you can visit Perun Documentation.

-bm, --bandwidth-method <bandwidth_method>

Provides the helper method to determine the kernel bandwidth. The <bandwidth_method> will be used to compute the bandwidth, which will be used in the kernel regression.

perun postprocessby kernel-regression kernel-smoothing

Kernel regression with different types of kernel and regression methods.

This mode of the kernel regression postprocessor implements non-parametric regression using different kernel methods and different kernel types. The calculation in this mode can be split into three parts. The first part is represented by the kernel type, the second part by the bandwidth computation, and the last part by the regression method, which will be used to interleave the given resources. We will look at the individual supported options in each part of the computation in turn.

Kernel Type <kernel-type>:

In non-parametric statistics, a kernel is a weighting function used in estimation techniques. In kernel regression, it is used to estimate the conditional expectation of a random variable. As has been said, the kernel width must be specified when running a non-parametric estimation. In the mathematical sense, a kernel is a non-negative real-valued integrable function K. For most applications, it is desirable to define the function to satisfy two additional requirements:

Normalization:

\[\int_{-\infty}^{+\infty}K(u)du = 1,\]

Symmetry:

\[K(-u) = K(u),\]

for all values of u. The second requirement ensures that the average of the corresponding distribution is equal to that of the sample used. If K is a kernel, then so is the function \(K^*\) defined by \(K^*(u) = \lambda K (\lambda u)\), where \(\lambda > 0\). This can be used to select a scale that is appropriate for the data. This mode offers several types of kernel functions:

Kernel Name          Kernel Function, K(u)                                                            Efficiency
Gaussian (normal)    \(K(u)=(1/\sqrt{2\pi})e^{-(1/2)u^2}\)                                            95.1%
Epanechnikov         \(K(u)=3/4(1-u^2)\)                                                              100%
Tricube              \(K(u)=70/81(1-|u^3|)^3\)                                                        99.8%
Gaussian order4      \(\phi_4(u)=1/2(3-u^2)\phi(u)\), where \(\phi\) is the normal kernel             not applicable
Epanechnikov order4  \(K_4(u)=-(15/8)u^2+(9/8)\), where K is the non-normalized Epanechnikov kernel   not applicable

Efficiency is defined as \(\sqrt{\int u^2K(u)du}\int K(u)^2du\), and it is measured relative to the Epanechnikov kernel.
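
As a sanity check of the table, the kernels can be written down directly and verified against the normalization and symmetry requirements (an illustrative sketch; the Epanechnikov and tricube kernels are zero outside |u| <= 1, so the compact kernels are integrated over [-1, 1]):

import numpy as np
from scipy.integrate import quad

gaussian = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
epanechnikov = lambda u: 0.75 * (1 - u ** 2)
tricube = lambda u: (70 / 81) * (1 - abs(u) ** 3) ** 3

# (kernel, support) pairs used for the normalization integral.
kernels = {'gaussian': (gaussian, (-np.inf, np.inf)),
           'epanechnikov': (epanechnikov, (-1, 1)),
           'tricube': (tricube, (-1, 1))}
for name, (k, (lo, hi)) in kernels.items():
    integral, _ = quad(k, lo, hi)                       # should be ~1
    print(name, round(integral, 6), k(0.5) == k(-0.5))  # symmetry check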

Smoothing Method <smoothing-method>:

The kernel-smoothing mode of this postprocessor offers three different non-parametric regression methods to execute the kernel regression. The first of them is called spatial-average and performs a Nadaraya-Watson regression (i.e. a local-constant regression) on the data using a given kernel:

\[m_{h}(x) = \sum_{i=1}^{n}K((x - x_i) / h)y_i / \sum_{j=1}^{n}K((x - x_j) / h),\]

where K(x) is the kernel, which must be such that E(K(x)) = 0, and h is the bandwidth of the method. Local-constant regression was also described in perun postprocessby kernel-regression estimator-settings. The second regression method supported by this mode is called local-linear. Compared with the previous method, which supports computation with different types of kernel, this method is restricted to performing local-linear regression using only the Gaussian (normal) kernel. The local-linear regression was described in perun postprocessby kernel-regression estimator-settings and therefore no further attention will be given to it here. Local-polynomial regression is the last method in this mode and performs regression in N-D using a user-provided kernel. The local-polynomial regression is the function that minimizes, for each position:

\[m_{h}(x) = \arg\min_{a_0} \sum_{i=1}^{n}K((x - x_i) / h)(y_i - a_0 - P_q(x_i - x))^2,\]

where K(x) is the kernel such that E(K(x)) = 0, q is the order of the fitted polynomial <polynomial-order>, \(P_q(x)\) is a polynomial of order q in x, and h is the bandwidth of the method. The polynomial \(P_q(x)\) is of the form:

\[F_d(k) = \{ n \in N^d \mid \sum_{i=1}^{d}n_i = k \}\]
\[P_q(x_1, ..., x_d) = \sum_{k=1}^{q}\sum_{n \in F_d(k)} a_{k,n}\prod_{i=1}^{d}x_{i}^{n_i}\]

For example we can have:

\[P_2(x, y) = a_{110}x + a_{101}y + a_{220}x^2 + a_{221}xy + a_{202}y^2\]
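
In one dimension, the minimization above reduces to a weighted polynomial least-squares fit around each query point. A compact illustrative sketch (not the actual backend of this mode), using a Gaussian kernel:

import numpy as np

def local_polynomial(x_grid, x, y, h, q=3):
    estimates = []
    for x0 in x_grid:
        # Kernel weights K((x - x_i) / h) centred at the query point x0.
        w = np.exp(-0.5 * ((x - x0) / h) ** 2)
        # Weighted fit of a degree-q polynomial in (x_i - x0); np.polyfit
        # weights the residuals, not the squared residuals, hence the sqrt.
        # The constant term a_0 (last coefficient) is the estimate at x0.
        coeffs = np.polyfit(x - x0, y, deg=q, w=np.sqrt(w))
        estimates.append(coeffs[-1])
    return np.array(estimates)

x = np.linspace(0, 10, 200)
y = np.sin(x) + np.random.default_rng(4).normal(0, 0.1, 200)
print(local_polynomial(np.array([3.0, 7.0]), x, y, h=0.8))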

The last part of the calculation is the bandwidth computation. This mode allows the user to enter the value directly using the parameter <bandwidth-value>. Alternatively, the parameter <bandwidth-method> offers a selection from two methods to determine the optimal bandwidth value: Scott’s rule and Silverman’s rule, which are described in perun postprocessby kernel-regression method-selection. If <bandwidth-method> is entered in combination with <bandwidth-value>, the method is ignored and the value from <bandwidth-value> is accepted.

perun postprocessby kernel-regression kernel-smoothing [OPTIONS]

Options

-kt, --kernel-type <kernel_type>

Provides the set of kernels from which the user selects the kernel used to execute the kernel-smoothing. For exact definitions of these kernels and more information about them, you can visit the Perun Documentation.

-sm, --smoothing-method <smoothing_method>

Provides the kernel smoothing methods for executing the non-parametric regression: local-polynomial performs a local-polynomial regression in N-D using a user-provided kernel; local-linear performs a local-linear regression using a Gaussian (normal) kernel; and spatial-average performs a Nadaraya-Watson regression on the data (so-called local-constant regression) using a user-provided kernel.

-bm, --bandwidth-method <bandwidth_method>

Provides the helper method to determine the kernel bandwidth. The <bandwidth_method> will be used to compute the bandwidth, which will be used in the kernel-smoothing regression. If entered in combination with <bandwidth-value>, the method will be ignored and the value from <bandwidth-value> will be accepted.

-bv, --bandwidth-value <bandwidth_value>

The float value of the bandwidth defined by the user, which will be used in the kernel regression. If entered in combination with <bandwidth-method>, the method will be ignored.

-q, --polynomial-order <polynomial_order>

Provides the order of the polynomial to fit. The default value of the order is 3. It is accepted only by the local-polynomial <smoothing-method>; the other methods ignore it.

perun postprocessby kernel-regression kernel-ridge

Nadaraya-Watson kernel regression with automatic bandwidth selection.

This mode implements the Nadaraya-Watson kernel regression, which was described above in perun postprocessby kernel-regression estimator-settings. While the previous modes provided different ways to determine the optimal bandwidth, this mode selects the optimal kernel bandwidth from a given range of potential bandwidths <gamma-range> using leave-one-out cross-validation, a modification of the least-squares cross-validation introduced in perun postprocessby kernel-regression estimator-settings. Leave-one-out cross-validation is K-fold cross-validation taken to its logical extreme, with K equal to N, the number of data points in the set. The original gamma-range is divided according to the given step <gamma-step>. The specific value from this range is selected by minimizing the mean squared error in leave-one-out cross-validation. The selected bandwidth value then serves the Gaussian kernel in the resulting estimate: \(K(x, y) = exp(-gamma * ||x-y||^2)\).
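
A sketch of how such a gamma search can be realized with scikit-learn is shown below (assuming a kernel ridge model with an RBF kernel underlies this mode; the range, step and data are illustrative):

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, LeaveOneOut

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, 60)).reshape(-1, 1)   # <per-key>
y = np.log1p(x).ravel() + rng.normal(0, 0.05, 60)    # <of-key>

# Iterate over <gamma-range> with <gamma-step>, scoring each gamma by the
# mean squared error of leave-one-out cross-validation.
gammas = np.arange(1e-5, 1e-1, 1e-3)
search = GridSearchCV(KernelRidge(kernel='rbf'), {'gamma': gammas},
                      cv=LeaveOneOut(), scoring='neg_mean_squared_error')
search.fit(x, y)
print(search.best_params_['gamma'])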

perun postprocessby kernel-regression kernel-ridge [OPTIONS]

Options

-gr, --gamma-range <gamma_range>

Provides the range for automatic bandwidth selection of the kernel via leave-one-out cross-validation. One value from this range will be selected by minimizing the mean squared error of leave-one-out cross-validation. The first value will be taken as the lower bound of the range and cannot be greater than the second value.

-gs, --gamma-step <gamma_step>

Provides the size of the step with which the iteration over the given <gamma-range> is executed. It cannot be greater than the length of <gamma-range>; otherwise it will be set to the value of the lower bound of the <gamma-range>.

Show Commands

perun show

Interprets the given profile using the selected visualization technique.

Looks up the given profile and interprets it using the selected visualization technique. Some of the techniques output to the terminal (using ncurses), while others generate HTML files, which can be browsed in the web browser (using the Bokeh library). Refer to the concrete techniques for concrete options and limitations.

The shown <profile> will be looked up in the following steps:

  1. If <profile> is in the form i@i (i.e., an index tag), then the i-th record registered in the minor version <hash> index will be shown.
  2. If <profile> is in the form i@p (i.e., a pending tag), then the i-th profile stored in .perun/jobs will be shown.
  3. <profile> is looked up within the minor version <hash> index for a match. In case the <profile> is registered there, it will be shown.
  4. <profile> is looked up within the .perun/jobs directory. In case there is a match, the found profile will be shown.
  5. Otherwise, the directory is walked for any match. Each found match is asked for confirmation by the user.

Tags consider the sorted order as specified by the option format.sort_profiles_by.

Example 1. The following command will show the first profile registered in the index of the HEAD~1 commit. The resulting graph will contain bars representing the sum of amounts per each subtype of resources and will be shown in the browser:

perun show -m HEAD~1 0@i bars sum --of 'amount' --per 'subtype' -v

Example 2. The following command will show the profile at the given absolute path in raw JSON format:

perun show ./echo-time-hello-2017-04-02-13-13-34-12.perf raw

For a thorough list and description of supported visualization techniques refer to Supported Visualizations.

perun show [OPTIONS] <profile> COMMAND [ARGS]...

Options

-m, --minor <minor>

Will check the index of a different minor version <hash> during the profile lookup.

Arguments

<profile>

Required argument

Show units

perun show bars

Customizable interpretation of resources using the bar format.

  • Limitations: none.
  • Interpretation style: graphical
  • Visualization backend: Bokeh

Bars graph shows the aggregation (e.g. sum, count, etc.) of resources of given types (or keys). Each bar shows <func> of resources from the <of> key (e.g. sum of amounts, average of amounts, count of types, etc.) per each <per> key (e.g. per each snapshot, or per each type). Moreover, the graphs can either be (i) stacked, where the different values of the <by> key are shown above each other, or (ii) grouped, where the different values of the <by> key are shown next to each other. Refer to resources for examples of keys that can be used as <of>, <per> or <by>.

The Bokeh library is the current interpretation backend; it generates HTML files that can be opened directly in the browser. The resulting graphs can be further customized by adding custom labels for axes, a custom graph title or a different graph width.

Example 1. The following will display the sum of amounts of all resources for each subtype, stacked by uid (e.g. the locations in the program):

perun show 0@i bars sum --of 'amount' --per 'subtype' --stacked --by 'uid'

The example output of the bars is as follows:

                                <graph_title>
                        `
                        -         .::.                ````````
                        `         :&&:                ` # \  `
                        -   .::.  ::::        .::.    ` @  }->  <by>
                        `   :##:  :##:        :&&:    ` & /  `
        <func>(<of>)    -   :##:  :##:  .::.  :&&:    ````````
                        `   ::::  :##:  :&&:  ::::
                        -   :@@:  ::::  ::::  :##:
                        `   :@@:  :@@:  :##:  :##:
                        +````||````||````||````||````

                                    <per>

Refer to Bars Plot for more thorough description and example of bars interpretation possibilities.

perun show bars [OPTIONS] <aggregation_function>

Options

-o, --of <of_key>

Sets key that is source of the data for the bars, i.e. what will be displayed on Y axis. [required]

-p, --per <per_key>

Sets key that is source of values displayed on X axis of the bar graph.

-b, --by <by_key>

Sets the key that will be used either for stacking or grouping of values

-s, --stacked

Will stack the values by <resource_key> specified by option --by.

-g, --grouped

Will group the values by <resource_key> specified by option --by.

-f, --filename <filename>

Sets the output of the graph to the given file.

-xl, --x-axis-label <x_axis_label>

Sets the custom label on the X axis of the bar graph.

-yl, --y-axis-label <y_axis_label>

Sets the custom label on the Y axis of the bar graph.

-gt, --graph-title <graph_title>

Sets the custom title of the bars graph.

-v, --view-in-browser

The generated graph will be immediately opened in the browser (firefox will be used).

Arguments

<aggregation_function>

Optional argument

perun show flamegraph

Flame graph interprets the relative and inclusive presence of the resources according to the stack depth of the origin of resources.

  • Limitations: memory profiles generated by Memory Collector.
  • Interpretation style: graphical
  • Visualization backend: HTML

Flame graph intends to quickly identify hotspots that are the source of the resource consumption complexity. On the X axis, the relative consumption of the data is depicted, while on the Y axis the stack depth is displayed. The wider a bar on the X axis is, the more resources the function consumed relative to the others.

Acknowledgements: Big thanks to Brendan Gregg for creating the original perl script that generates flame graphs from a simple format. If you like this visualization technique, please check out his site (http://brendangregg.com) for more information about performance, profiling and useful talks and visualization techniques!

The example output of the flamegraph is more or less as follows:

                    `
                    -                         .
                    `                         |
                    -              ..         |     .
                    `              ||         |     |
                    -              ||        ||    ||
                    `            |%%|       |--|  |!|
                    -     |## g() ##|     |#g()#|***|
                    ` |&&&& f() &&&&|===== h() =====|
                    +````||````||````||````||````||````

Refer to Flame Graph for a more thorough description and examples of the interpretation technique. Refer to perun.profile.convert.to_flame_graph_format() for more details on how the profiles are converted to the flame graph format.
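
A hypothetical sketch of that conversion step (assuming the profile is a stored JSON file and that to_flame_graph_format() returns lines in the folded-stack format consumed by Brendan Gregg's flamegraph.pl; the paths are illustrative):

import json

import perun.profile.convert as convert

# Load a stored memory profile and dump it in the folded-stack format,
# which flamegraph.pl can turn into an interactive SVG.
with open('memory-profile.perf') as handle:
    profile = json.load(handle)

with open('profile.folded', 'w') as out:
    out.writelines(convert.to_flame_graph_format(profile))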

perun show flamegraph [OPTIONS]

Options

-f, --filename <filename>

Sets the output file of the resulting flame graph.

-h, --graph-height <graph_height>

Sets the height of the resulting flame graph.

perun show flow

Customizable interpretation of resources using the flow format.

  • Limitations: none.
  • Interpretation style: graphical, textual
  • Visualization backend: Bokeh, ncurses

Flow graph shows the values of resources depending on an independent variable as a basic graph. For each group of resources identified by a unique value of the <by> key, one graph shows the dependency of <of> values aggregated by <func> on the <through> key. Moreover, the values can either be accumulated (this way, when displaying the value of ‘n’ on the x axis, we accumulate the sum of all values for all m < n) or stacked, where the graphs are output on top of each other and then one can see the overall trend through all the groups and the proportions between each of the groups.

The Bokeh library is the current interpretation backend; it generates HTML files that can be opened directly in the browser. The resulting graphs can be further customized by adding custom labels for axes, a custom graph title or a different graph width.

Example 1. The following will show the average amount (in this case the function running time) of each function depending on the size of the structure over which the given function operated:

perun show 0@i flow mean --of 'amount' --through 'structure-unit-size'
    --accumulate --by 'uid'

The example output of the flow is as follows:

                                <graph_title>
                        `
                        -                      ______    ````````
                        `                _____/          ` # \  `
                        -               /          __    ` @  }->  <by>
                        `          ____/      ____/      ` & /  `
        <func>(<of>)    -      ___/       ___/           ````````
                        `  ___/    ______/       ____
                        -/  ______/        _____/
                        `__/______________/
                        +````||````||````||````||````

                                  <through>

Refer to Flow Plot for more thorough description and example of flow interpretation possibilities.

perun show flow [OPTIONS] <aggregation_function>

Options

-o, --of <of_key>

Sets key that is source of the data for the flow, i.e. what will be displayed on Y axis, e.g. the amount of resources. [required]

-t, --through <through_key>

Sets key that is source of the data value, i.e. the independent variable, like e.g. snapshots or size of the structure.

-b, --by <by_key>

For each <by_resource_key> one graph will be output, e.g. for each subtype or for each location of resource. [required]

-s, --stacked

Will stack the y axis values for different <by> keys on top of each other. Additionally shows the sum of the values.

--accumulate, --no-accumulate

Will accumulate the values for all previous values of X axis.

-ut, --use-terminal

Shows flow graph in the terminal using ncurses library.

-f, --filename <filename>

Sets the output of the graph to the given file.

-xl, --x-axis-label <x_axis_label>

Sets the custom label on the X axis of the flow graph.

-yl, --y-axis-label <y_axis_label>

Sets the custom label on the Y axis of the flow graph.

-gt, --graph-title <graph_title>

Sets the custom title of the flow graph.

-v, --view-in-browser

The generated graph will be immediately opened in the browser (firefox will be used).

Arguments

<aggregation_function>

Optional argument

perun show heapmap

Shows an interactive map of memory allocations to concrete memory addresses for each function.

  • Limitations: memory profiles generated by Memory Collector.
  • Interpretation style: textual
  • Visualization backend: ncurses

Heap map shows the underlying memory map, and links the concrete allocations to allocated addresses for each snapshot. The map is interactive, one can either play the full animation of the allocations through snapshots or move and explore the details of the map.

Moreover, the heap map contains a heat map mode, which accumulates the allocations into a heat representation: the hotter the colour displayed at a given memory cell, the more often memory was allocated there.

The heap map aims at showing the fragmentation of the memory and possible differences between different allocation strategies. On the other hand, the heat mode aims at showing the bottlenecks of allocations.

Refer to Heap Map for more thorough description and example of heapmap interpretation possibilities.

perun show heapmap [OPTIONS]

perun show scatter

Interactive visualization of resources and models in scatter plot format.

Scatter plot shows resources as points according to the given parameters. The plot interprets <per> and <of> as the x and y coordinates of the points. The scatter plot also displays the models located in the profile as curves/lines.

  • Limitations: none.
  • Interpretation style: graphical
  • Visualization backend: Bokeh

Features in progress:

  • uid filters
  • models filters
  • multiple graphs interpretation

Graphs are displayed using the Bokeh library and can be further customized by adding custom labels for axes, a custom graph title and a different graph width.

The example output of the scatter is as follows:

                          <graph_title>
                  `                         o
                  -                        /
                  `                       /o       ```````````````````
                  -                     _/         `  o o = <points> `
                  `                   _- o         `    _             `
    <of>          -               __--o            `  _-  = <models> `
                  `    _______--o- o               `                 `
                  -    o  o  o                     ```````````````````
                  `
                  +````||````||````||````||````

                              <per>

Refer to Scatter Plot for more thorough description and example of scatter interpretation possibilities. For more thorough explanation of regression analysis and models refer to Regression Analysis.

perun show scatter [OPTIONS]

Options

-o, --of <of_key>

Data source for the scatter plot, i.e. what will be displayed on Y axis. [default: amount]

-p, --per <per_key>

Keys that will be displayed on X axis of the scatter plot. [default: structure-unit-size]

-f, --filename <filename>

Outputs the graph to the file specified by filename.

-xl, --x-axis-label <x_axis_label>

Label on the X axis of the scatter plot.

-yl, --y-axis-label <y_axis_label>

Label on the Y axis of the scatter plot.

-gt, --graph-title <graph_title>

Title of the scatter plot.

-v, --view-in-browser

Will show the graph in browser.

Utility Commands

perun utils

Contains a set of developer commands, wrappers over helper scripts, and other functions that are not part of the main perun suite.

perun utils [OPTIONS] COMMAND [ARGS]...

Commands

create
stats
temp

perun utils create

According to the given <template>, constructs a new module in Perun for the <unit>.

Currently this supports creating new modules for the tool suite (namely collect, postprocess, view) or new algorithms for checking degradation (check). The command uses templates stored in the ../perun/templates directory and uses jinja as the template handler. The templates can be parametrized by the options below (if an option is not specified, ‘none’ is used).

Unless --no-edit is set, after the successful creation of the files, an external editor, which is specified by the general.editor configuration key, is opened.

perun utils create [OPTIONS] <template> <unit>

Options

-nb, --no-before-phase

If set to true, the unit will not have before() function defined.

-na, --no-after-phase

If set to true, the unit will not have after() function defined.

--author <author>

Specifies the author of the unit

-ne, --no-edit

Will not open the newly created files in the editor specified by the general.editor configuration key.

-st, --supported-type <supported_types>

Sets the supported types of the unit (i.e. profile types).

Arguments

<template>

Required argument

<unit>

Required argument