# Perun Internals

Conceptually, one Perun instance serves as a wrapper around an existing version control system (e.g. some repository). Perun specializes in storing performance profiles and manages the link between minor versions and their corresponding profiles. Currently, git is the only supported target vcs, with a custom lightweight vcs (called tagit) in development. The architecture of Perun contains an interface that can be used to register support for a new version control system, as described in Creating Support for Custom VCS. The internal structure of one Perun instance is inspired by git: performance profiles are similarly stored as objects compressed by the zlib method and identified by hashes. Perun Storage briefly describes the internal model of Perun.

The diagram above highlights the responsibilities and storage of individual systems. Version control systems manage the functionality of the project (its versions and precise code changes) but lack proper support for managing performance. On the other hand, performance versioning systems manage the performance of the project (its individual performance profiles and data visualizations of various statistics) but lack the precise functionality changes. This means that the vcs stores the actual code chunks and version references, while the pvs stores the actual profiling data.

This diagram shows one of the proper usages of Perun’s tool suite. Each developer keeps their own instance of both the versioning and the performance system. In this mode, one can share both code changes and performance measurements across a wider range of developers.

## Version Control Systems

A Version Control System manages the history of functionality of one project, i.e. it stores the changes between different versions (or snapshots) of the project. Each code change usually requires corresponding performance profiles in order to detect potential performance degradation early in development. The following subsection, Version Control System API, describes the layer that serves as an interface in Perun and passes the necessary information between the version control and performance versioning systems.

### Version Control System API

perun.vcs.init(vcs_init_params)

Calls the implementation of initialization of wrapped underlying version control system.

The initialization should take care of both reinitialization of existing version control system instances and newly created instances. Init is called during the perun init command from command line interface.

Parameters: vcs_init_params (dict) – dictionary of keyword arguments passed to the initialization method of the underlying vcs module

Returns: true if the underlying vcs was successfully initialized

perun.vcs.walk_minor_versions(head_minor_version)

Generator of minor versions for the given major version, which yields the MinorVersion named tuples containing the following information: date, author, email, checksum (i.e. the hash representation of the minor version), commit_description and commit_parents (i.e. other minor versions).

Minor versions are walked through this function during the perun log command.

Parameters: head_minor_version (str) – the minor version that is the root of the walk

Returns: iterable stream of minor version representations

perun.vcs.walk_major_versions()

Generator of major versions for the current wrapped repository.

This function is currently unused, but will be needed in the future.

Returns: iterable stream of major version representations

perun.vcs.get_minor_head()

Returns the string representation of head of current major version, i.e. for git this returns the massaged HEAD reference.

This function is called mainly during the outputs of perun log and perun status but also during the automatic generation of profiles (either by perun run or perun collect), where the retrieved identification is used as origin.

perun.vcs.get_head_major_version()

Returns the string representation of current major version of the wrapped repository.

Major version is displayed during the perun status output, which shows the current working major version of the project.

Returns: string representation of the major version

perun.vcs.get_minor_version_info(minor_version)

Returns the specification of a concrete minor version in the form of a MinorVersion named tuple containing the following information: date, author, email, checksum (i.e. the hash representation of the minor version), commit_description and commit_parents (i.e. other minor versions).

This function is a non-generator alternative of perun.vcs.walk_minor_versions() and is used during the perun status output to display the specifics of minor version.

Parameters: minor_version (str) – the specification of the minor version (e.g. in the form of a sha) for which we are retrieving the details

Returns: minor version named tuple

perun.vcs.check_minor_version_validity(minor_version)

Checks whether the given minor version specification corresponds to the wrapped version control system, and is not in wrong format.

Minor version validity is mostly checked during the lookup of the minor versions from the command line interface.

Parameters: minor_version (str) – the specification of the minor version (e.g. in the form of a sha) for which we are checking the validity

Raises: VersionControlSystemException – when the given minor version is invalid in the context of the wrapped version control system

perun.vcs.massage_parameter(parameter, parameter_type=None)

Conversion function for massaging (or unifying different representations of objects) the parameters for version control systems.

Massaging is mainly executed from the command line interface, where one can e.g. use references (like HEAD) to specify concrete minor versions. Massaging then unifies e.g. the references and proper hash representations into just one representation for internal processing.

Parameters:
• parameter (str) – vcs parameter (e.g. revision, minor or major version) which will be massaged, i.e. transformed to a unified representation
• parameter_type (str) – more detailed type of the parameter

Returns: string representation of the parameter

perun.vcs.is_dirty()

Tests whether the wrapped repository is dirty.

By a dirty repository we mean a repository that has either changes submitted to its index (i.e. we are in the middle of a commit) or unsubmitted changes to tracked files in the current working directory.

Note that this is crucial for performance testing, as any uncommitted changes may skew the profiled data, and hence the resulting profiles would not correctly represent the performance of minor versions.

Returns: whether the given repository is dirty or not

perun.vcs.save_state()

Saves the state of the repository in case it is dirty.

When saving the state of the repository, one should store all uncommitted changes to the working directory and index. Any issues that occur during this process should be handled by the user; hence no workarounds or mending should take place in this function.

perun.vcs.restore_state(saved, state)

Restores the previous state of the repository.

When restoring the state of the repository, one should pop the stored changes from the stash and reapply them to the current directory. This makes sure that after the performance testing the project is in its previous state and the developer can continue with their work.

Parameters:
• saved (bool) – whether something was stashed
• state (str) – the previous state of the repository

perun.vcs.checkout(minor_version)

Checks out the new working directory corresponding to the given minor version.

According to the supplied minor version, this command should remake the working directory so it corresponds to the state defined by the minor version.

Parameters: minor_version (str) – minor version that will be checked out
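As an illustration of how is_dirty, save_state, checkout and restore_state might fit together, the following is a minimal sketch against a hypothetical in-memory stand-in (`FakeVcs`), not Perun's actual implementation. It assumes that save_state returns a `(saved, state)` pair matching the parameters of restore_state; the helper `clean_checkout` is likewise invented for this example.

```python
from contextlib import contextmanager


class FakeVcs:
    """Hypothetical in-memory stand-in for a wrapped repository (not Perun code)."""

    def __init__(self, head, dirty=False):
        self.head = head
        self.dirty = dirty
        self.stash = []

    def is_dirty(self):
        return self.dirty

    def save_state(self):
        # Assumed return shape: (saved, state), mirroring restore_state's parameters.
        saved = self.is_dirty()
        if saved:
            self.stash.append("uncommitted changes")
            self.dirty = False
        return saved, self.head

    def checkout(self, minor_version):
        self.head = minor_version

    def restore_state(self, saved, state):
        # Return to the previous version and reapply stashed changes, if any.
        self.checkout(state)
        if saved:
            self.stash.pop()
            self.dirty = True


@contextmanager
def clean_checkout(vcs, minor_version):
    """Temporarily check out minor_version, then restore the previous state."""
    saved, state = vcs.save_state()
    try:
        vcs.checkout(minor_version)
        yield vcs
    finally:
        vcs.restore_state(saved, state)


repo = FakeVcs(head="abc123", dirty=True)
with clean_checkout(repo, "def456") as checked_out:
    inside = (checked_out.head, checked_out.is_dirty())
after = (repo.head, repo.is_dirty())
```

Wrapping the pair in a context manager keeps the restore step in a `finally` block, so the working directory would be restored even if profiling fails midway.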

### Creating Support for Custom VCS

You can register support for your own version control system as follows:

1. Create a new module in the perun/vcs directory implementing the functions from Version Control System API.
2. Register your newly created vcs wrapper in get_supported_module_names() located in perun.utils.__init__.py. The template diff below shows the analogous change for a collector; for a vcs wrapper, the new name would be added to the 'vcs' list instead:

        --- /mnt/f/phdwork/perun/gh-pages/docs/_static/templates/supported_module_names.py
        +++ /mnt/f/phdwork/perun/gh-pages/docs/_static/templates/supported_module_names_collectors.py
        @@ -6,7 +6,7 @@
             ))
             return {
                 'vcs': ['git'],
        -        'collect': ['trace', 'memory', 'time'],
        +        'collect': ['trace', 'memory', 'time', 'mycollector'],
                 'postprocess': ['filter', 'normalizer', 'regression-analysis'],
                 'view': ['alloclist', 'bars', 'flamegraph', 'flow', 'heapmap', 'raw', 'scatter']
             }[package]

3. Optionally implement a batch of automatic test cases (preferably based on pytest) in the tests directory. Verify that registering did not break anything in Perun and that your wrapper is correct, and optionally reinstall Perun:

        make test
        make install

4. If you think your wrapper could help others, please consider making a Pull Request.
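To make the first step above more concrete, here is a rough sketch of what part of a custom wrapper module could look like for a toy in-memory VCS. The function names mirror the Version Control System API, and the MinorVersion fields follow its description; the history data, the `_HISTORY` and `_HEAD` names, and the local exception class are all hypothetical. How Perun dispatches calls to the module is not shown.

```python
from collections import namedtuple

# Field names follow the description of MinorVersion in the API section.
MinorVersion = namedtuple(
    "MinorVersion",
    ["date", "author", "email", "checksum", "commit_description", "commit_parents"],
)

# Hypothetical history of a toy VCS: checksum -> MinorVersion.
_HISTORY = {
    "c3": MinorVersion("2024-01-03", "Ann", "ann@example.com", "c3", "merge", ["c2", "c1"]),
    "c2": MinorVersion("2024-01-02", "Bob", "bob@example.com", "c2", "feature", ["c1"]),
    "c1": MinorVersion("2024-01-01", "Ann", "ann@example.com", "c1", "root", []),
}
_HEAD = "c3"


class VersionControlSystemException(Exception):
    """Local stand-in for the exception named in the API section."""


def get_minor_head():
    return _HEAD


def check_minor_version_validity(minor_version):
    if minor_version not in _HISTORY:
        raise VersionControlSystemException(
            "invalid minor version: {}".format(minor_version)
        )


def massage_parameter(parameter, parameter_type=None):
    # Unify the symbolic reference HEAD and plain checksums into a checksum.
    return _HEAD if parameter == "HEAD" else parameter


def get_minor_version_info(minor_version):
    check_minor_version_validity(minor_version)
    return _HISTORY[minor_version]


def walk_minor_versions(head_minor_version):
    """Yield each reachable minor version once, starting from the head."""
    seen, stack = set(), [head_minor_version]
    while stack:
        checksum = stack.pop()
        if checksum in seen:
            continue
        seen.add(checksum)
        info = get_minor_version_info(checksum)
        yield info
        stack.extend(info.commit_parents)


walked = [mv.checksum for mv in walk_minor_versions(massage_parameter("HEAD"))]
```

A real wrapper would replace the dictionary lookups with calls into the underlying version control system and implement the remaining API functions as well.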

## Perun Storage

The current internal representation of Perun storage is based on git internals and is meant for easy distribution, flexibility and easier management. A possible extension of Perun to different kinds of storage is currently under consideration. Internal objects and files for one local instance of Perun are stored in the filesystem in the .perun directory, which has the following structure:

.perun/
|-- /jobs
|-- /logs
|-- /objects
|-- local.yml

.perun/jobs:

Contains pending jobs, i.e. profiles that were generated by collectors, postprocessed by some postprocessors, or automatically generated by perun run commands, but are not yet assigned to concrete minor versions. These profiles contain the origin tag, which maps the profile to a concrete minor version, i.e. the parent of the profile. This key prevents profiles from being assigned to incorrect minor versions.

.perun/jobs
|-- /baseline.perf
|-- /sll-comparison.perf
|-- /skip-lists-medium-height.perf
|-- /skip-lists-unlimited-height.perf

.perun/objects:

Corresponds to the main storage of Perun and contains object primitives. Every object of Perun is represented by a unique identifier (mostly a sha representation) and corresponds either to an object blob (containing a compressed profile) or to an index of a corresponding minor version, which lists the profiles assigned to the given minor version.

.perun/objects
|-- /07
|   |-- f2b4bfa06f6b1be5713f2bbae7740838456758
|   |-- 99dc4c5891947bdf7e26341231ca533432a1f1
|-- /3d
|   |-- 3859b46db4eea5866a0b2b28997fac25a95430
|-- /ff
    |-- d35c8962d8d2019d7762a7bc6980c1d0f2fcd7
    |-- d88aabca6e5427c78ea647e955ffa00d1cd615


Each object in .perun/objects is represented by a hash value, where the first two characters specify the directory and the rest of the hash value specifies the name of the file in which the index or compressed profile is stored.
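The mapping from a hash value to a location in the object store can be sketched as follows. The helper name `object_path` is made up for this example and is not part of Perun's API.

```python
import os


def object_path(objects_dir, checksum):
    """Split a hex checksum into a two-character directory and a file name."""
    if len(checksum) < 3:
        raise ValueError("checksum is too short to split")
    return os.path.join(objects_dir, checksum[:2], checksum[2:])


# Uses one of the object names from the listing above.
path = object_path(".perun/objects", "07f2b4bfa06f6b1be5713f2bbae7740838456758")
```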

.perun/logs:

Contains various logs for various phases. Currently this holds logs for each minor version for which new profiles were precollected during the perun check command. This behaviour can be enabled by setting degradation.log_collect to true.

local.yml:

Contains local configuration, e.g. the specification of the wrapped repository, job matrices or formatting strings corresponding to a concrete VCS. See Perun Configuration files for more information about the configuration of Perun.

### Perun Index Specification

Each minor version of the vcs that has any profile assigned has a corresponding index file in .perun/objects, located according to its identification. The index file itself is stored in a binary format with the following specification.

Index signature [4B]:
The signature is the first four bytes of the index, containing the ascii string pidx, which serves as a quick identification of a minor version index.
Index version [4B]:
Specification of the version of the encoding of the index. Versioning is introduced for potential future backward compatibility with possibly different specifications of the index.
Number of Entries [4B]:
Integer count of the number of entries found in the index. Each entry of the index is of variable length and lists the profiles with mapping to their corresponding objects.
Entries [variable length]:

One entry of the index corresponds to one assigned profile. Each entry is of variable length and contains the identification of the original profile file, together with the timestamp of its creation and the identification of the compressed object that contains the actual profiling data. Each entry can be broken into the following parts:

• Creation time [4B]: creation time of the profile represented as 4B timestamp.
• Profile ID [20B]: unique identification of the profile, i.e. specification of the concrete compressed object located in the .perun/objects. Profile ID is always in form of SHA-1 hash, which is obtained from the contents.
• Origin Path [variable length]: original path to the profile, represented as an ascii string of variable length terminated by a null byte.

Checksum [20B]:
Checksum of the whole index, which serves for error detection.
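The layout above can be illustrated with a small round-trip sketch using Python's `struct` module. Several details are assumptions not stated in the specification: big-endian unsigned integers, a version value of 1, and a trailing checksum computed as SHA-1 over all preceding bytes. The function names are hypothetical, and this is not Perun's own reader or writer.

```python
import hashlib
import struct

INDEX_SIGNATURE = b"pidx"
INDEX_VERSION = 1  # assumed value; the specification above does not state it


def pack_index(entries):
    """Serialize (creation_time, profile_id, origin_path) triples into an index."""
    # Signature [4B], version [4B], number of entries [4B]; big-endian is assumed.
    body = INDEX_SIGNATURE + struct.pack(">II", INDEX_VERSION, len(entries))
    for creation_time, profile_id, origin_path in entries:
        if len(profile_id) != 20:
            raise ValueError("profile ID must be a 20-byte SHA-1 digest")
        body += struct.pack(">I", creation_time) + profile_id
        body += origin_path.encode("ascii") + b"\0"
    # Trailing checksum: assumed to be SHA-1 over everything that precedes it.
    return body + hashlib.sha1(body).digest()


def parse_index(data):
    """Parse bytes produced by pack_index back into (version, entries)."""
    body, checksum = data[:-20], data[-20:]
    if hashlib.sha1(body).digest() != checksum:
        raise ValueError("index checksum mismatch")
    if body[:4] != INDEX_SIGNATURE:
        raise ValueError("not a minor version index")
    version, count = struct.unpack_from(">II", body, 4)
    offset, entries = 12, []
    for _ in range(count):
        (creation_time,) = struct.unpack_from(">I", body, offset)
        profile_id = body[offset + 4:offset + 24]
        end = body.index(b"\0", offset + 24)  # origin path is null-terminated
        origin_path = body[offset + 24:end].decode("ascii")
        entries.append((creation_time, profile_id, origin_path))
        offset = end + 1
    return version, entries


example = [(1700000000, hashlib.sha1(b"profile data").digest(), "sll-vs-skiplist.perf")]
packed = pack_index(example)
version, parsed = parse_index(packed)
```

Because every entry ends with a null-terminated path, entries have to be read sequentially; there is no way to jump to the n-th entry without scanning the ones before it.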

### Perun Object Specification

Each non-index object consists of a short header terminated by a zero byte, followed by the raw content of the performance profile w.r.t. Specification of Profile Format. The header consists of the header signature string, the type of the profile and the length of the content. First, we compute the checksum of this data, which serves as the identification in the minor version indexes and in the .perun/objects directory. Finally, the object is compressed using the zlib method and stored in .perun/objects.

Signature [7B]:
Signature is a 7B prefix containing ascii string “profile”. Serves for quick identification of profile.
Type [variable length]:
Ascii specification of the profile type. This serves for quick and easy parsing of profiles.
Content Length [4B]:
Size in bytes of the non-header data that follows the zero byte.
Content [variable length]:
Contents of the performance profile w.r.t. Specification of Profile Format.
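Reading such an object back could look roughly like the sketch below. It assumes the header fields are joined by single spaces and that the length is written as decimal text, as the registration steps later in this section suggest; the exact on-disk encoding of the length field is an assumption here. The function `read_object` and the profile type `memory` are used purely for illustration.

```python
import zlib


def read_object(compressed):
    """Decompress an object and split it into (profile_type, content)."""
    raw = zlib.decompress(compressed)
    # The header ends at the first zero byte; the rest is the profile content.
    header, _, content = raw.partition(b"\0")
    signature, profile_type, length = header.decode("ascii").split(" ")
    if signature != "profile":
        raise ValueError("not a profile object")
    if int(length) != len(content):
        raise ValueError("content length does not match the header")
    return profile_type, content


# Build a tiny object by hand for the demonstration.
content = b'{"resources": []}'
stored = zlib.compress(b"profile memory %d\0" % len(content) + content)
profile_type, restored = read_object(stored)
```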

### The Lifetime of profile: Internals

The following subsections describe in more detail the basics of profile manipulation, namely registering, removing and looking up profiles.

#### Registering new profile

Given a profile called sll-vs-skiplist.perf that conforms to Specification of Profile Format, registering it in the HEAD minor version index executes the following steps:

1. sll-vs-skiplist.perf is loaded and parsed into JSON. The profile is checked against the format specified by Specification of Profile Format.
2. The origin key is compared with the massaged HEAD minor version. If they differ, an error is raised and adding the profile is cancelled, as we are trying to register a performance profile corresponding to another point of history. Otherwise the origin is removed from the profile and will not be stored in persistent storage.
3. We construct the header for the profile, consisting of the profile prefix, the profile type specified by the type key, and the length of the unpacked JSON representation of the profile, joined by spaces and ended by a null byte.
4. The JSON contents of the performance profile are appended to the header, resulting in one object.
5. A SHA-1 hash checksum is computed from the object. The hash serves both as a check that the profile was not damaged before its next usage and as its identification in the filesystem.
6. The object is compressed using the zlib compression method and stored in the .perun/objects directory. The first two characters of the checksum specify the target directory and the rest specify the resulting filename.
7. An index corresponding to the HEAD minor version is opened (if it does not exist, it is newly created first). The minor version index is also represented by its hash, where the first two characters of the hash are used as the directory and the rest as the filename.
8. An entry for sll-vs-skiplist.perf with given modification time is registered within the index pointing to the checksum object with compressed data. The number of registered profiles in index is increased.
9. Unless it is specified otherwise, the sll-vs-skiplist.perf is removed from filesystem.
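Steps 2 to 6 above can be sketched as follows. This is an illustrative approximation rather than Perun's implementation: the function name `register_profile`, the JSON serialization settings and the UTF-8 encoding of the content are assumptions, and the index update (steps 7 and 8) is left out.

```python
import hashlib
import json
import os
import tempfile
import zlib


def register_profile(profile, head, objects_dir):
    """Store a profile as a compressed object and return its checksum."""
    # Step 2: the profile must originate from the HEAD minor version.
    if profile.get("origin") != head:
        raise ValueError("profile origin does not match HEAD")
    stored = {key: value for key, value in profile.items() if key != "origin"}
    # Steps 3-4: header "profile <type> <length>\0" followed by the JSON content.
    content = json.dumps(stored).encode("utf-8")
    header = "profile {} {}\0".format(stored["type"], len(content)).encode("ascii")
    obj = header + content
    # Step 5: the SHA-1 of the uncompressed object identifies it.
    checksum = hashlib.sha1(obj).hexdigest()
    # Step 6: compress and store under <first two characters>/<rest>.
    target_dir = os.path.join(objects_dir, checksum[:2])
    os.makedirs(target_dir, exist_ok=True)
    with open(os.path.join(target_dir, checksum[2:]), "wb") as handle:
        handle.write(zlib.compress(obj))
    return checksum


with tempfile.TemporaryDirectory() as objects_dir:
    # Hypothetical profile; only the origin and type keys come from the text above.
    profile = {"origin": "abc123", "type": "memory", "resources": []}
    checksum = register_profile(profile, "abc123", objects_dir)
    stored_path = os.path.join(objects_dir, checksum[:2], checksum[2:])
    with open(stored_path, "rb") as handle:
        round_trip = zlib.decompress(handle.read())
```

Hashing the object before compression means the identifier depends only on the header and profile contents, not on the zlib settings used to store it.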

#### Removing profile from index

Given a profile filename sll-vs-skiplist.perf, removing it from the HEAD minor version index requires the following steps:

1. An index corresponding to the HEAD minor version is opened. The minor version index is represented by its hash, where the first two characters of the hash are used as the directory and the rest as the filename. If the index does not exist, removing ends with an error.
2. An entry for sll-vs-skiplist.perf is looked up within the index. If it is not found, removing ends with an error. Otherwise, the entry is removed from the index and the number of registered profiles in the index is decreased.
3. The original compressed object, which was referenced by the entry, is kept in the .perun/objects directory.
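On an in-memory view of the index, the removal described above might be sketched like this. The representation of entries as `(creation_time, profile_id, origin_path)` tuples and the function name `remove_from_index` are assumptions for illustration; note that nothing in the object store is touched.

```python
def remove_from_index(entries, origin_path):
    """Return the entries without the first match; fail if none is registered."""
    for position, (_, _, path) in enumerate(entries):
        if path == origin_path:
            return entries[:position] + entries[position + 1:]
    raise LookupError("{} is not registered in the index".format(origin_path))


index_entries = [
    (1700000000, b"\x07" * 20, "baseline.perf"),
    (1700000100, b"\x3d" * 20, "sll-vs-skiplist.perf"),
]
remaining = remove_from_index(index_entries, "sll-vs-skiplist.perf")
```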

#### Looking up profile

Profiles are looked up during perun show, perun add, perun postprocessby or perun rm and can be found in several places, namely in the filesystem, in pending storage or registered in an index. Priorities during the lookup are usually as follows:

1. If the specification of the profile is in the form of i@i or i@p (i.e. the index and pending tags respectively), then the i-th profile registered in the index or stored in the pending jobs directory (.perun/jobs) is used.
2. Index of corresponding minor version is searched.
3. Absolute path in filesystem is checked.
4. .perun/jobs directory is searched for match, i.e. one can specify just partial name of the profile during the lookup.
5. Otherwise the whole scope of filesystem is walked. Each successful match asks user for confirmation until the profile is found.
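The first rule in the list above, resolving i@i and i@p tags, could be sketched as below. The function name `resolve_tag` and zero-based positions are assumptions; the remaining lookup rules are not covered, so the function returns None whenever the specification is not a tag.

```python
import re

# A tag is a non-negative number, then '@', then 'i' (index) or 'p' (pending).
TAG_PATTERN = re.compile(r"^(\d+)@([ip])$")


def resolve_tag(specification, index_profiles, pending_profiles):
    """Resolve an i@i or i@p tag, or return None when the input is not a tag."""
    match = TAG_PATTERN.match(specification)
    if match is None:
        return None
    position, source = int(match.group(1)), match.group(2)
    candidates = index_profiles if source == "i" else pending_profiles
    if position >= len(candidates):
        raise LookupError("no profile at position {}".format(position))
    return candidates[position]


registered = ["sll-vs-skiplist.perf"]
pending = ["baseline.perf", "sll-comparison.perf"]
from_index = resolve_tag("0@i", registered, pending)
from_pending = resolve_tag("1@p", registered, pending)
not_a_tag = resolve_tag("baseline.perf", registered, pending)
```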

Refer to Command Line Interface for precise specification of lookups during individual commands.