Processing
Basic Terms
Workflow: the sequence of actions that must be performed for data analysis. The processing workflow is a combination of data processing nodes configured by the user to solve a particular task.
The processing order is determined by connecting the output of one workflow node to the input of the next. A node's inputs and outputs are called input and output ports.
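To make the node-and-port model concrete, below is a minimal Python sketch. It is an illustration only: the Node class and its connect/run methods are hypothetical and not part of the Megaladata API.

```python
# Minimal, hypothetical model of a workflow: nodes exposing ports,
# wired output -> input. Illustration only, not the Megaladata API.
from typing import Any, Callable

class Node:
    def __init__(self, name: str, operation: Callable[[Any], Any]):
        self.name = name
        self.operation = operation           # the data-processing action
        self.downstream: list["Node"] = []   # nodes fed by our output port

    def connect(self, other: "Node") -> "Node":
        """Wire this node's output port to the other node's input port."""
        self.downstream.append(other)
        return other

    def run(self, data: Any) -> None:
        result = self.operation(data)        # process the incoming data
        for node in self.downstream:         # pass results along the workflow
            node.run(result)

# A three-node workflow: filter rows, then sort, then output.
source = Node("Row filter", lambda rows: [r for r in rows if r > 0])
sorter = Node("Sort", sorted)
sink = Node("Output", print)
source.connect(sorter).connect(sink)

source.run([3, -1, 2])   # prints [2, 3]
```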
Workflow Example
A workflow node performs a single operation on data. The set of available operations is represented by ready-made components; a component is thus a prototype, or template, of a future workflow node. To create a workflow node that performs the required operation, drag the corresponding component from the component panel to the workflow construction area (refer to "The first workflow").
Workflow nodes are created from two types of components:
- Standard components are provided by the platform.
- Derived components are created and configured by the user. A derived component can be built from a combination of workflow nodes implementing arbitrary processing logic.
Thus, the set of tools for implementing different data processing logic is not limited to the standard platform components; users can extend it.
A derived component is most commonly created using a supernode. The supernode is a special node that can contain other workflow nodes. Arbitrary logic can be implemented inside a supernode, while the rest of the workflow treats it as a "black box".
The supernode receives data through its input ports, performs the processing, and sends the results to its output ports. The input and output ports are defined by the user.
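As a rough illustration of the "black box" idea, a supernode can be thought of as an inner chain of processing steps exposed to the outside as a single unit. The sketch below is hypothetical and does not reflect the platform's internals.

```python
# Hypothetical sketch of a supernode: an inner chain of processing
# steps exposed to the outside as a single "black box".
def make_supernode(steps):
    def run(data):
        for step in steps:       # execute the inner workflow in order
            data = step(data)
        return data              # the result leaves via the output port
    return run

# Two inner steps: clean values, then deduplicate and sort.
prepare = make_supernode([
    lambda rows: [abs(r) for r in rows],
    lambda rows: sorted(set(rows)),
])
print(prepare([3, -3, 2]))   # prints [2, 3]
```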
Both data sets (tables) and variables (objects containing only a single value) can be transferred from handler to handler. Table statistics (for example, a column sum or an average value) can be converted into variables using a special handler.
In turn, variables can be used in handlers to transform tables. Because tables and variables have different structures, their ports cannot be connected to each other and are identified differently.
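A toy sketch of the table-to-variables conversion described above; the handler name and data shapes are illustrative, not the platform's API.

```python
# Hypothetical "table to variables" handler: collapse column statistics
# of a table into single-value variables.
table = {
    "amount": [10.0, 20.0, 30.0],
    "qty": [1, 2, 3],
}

def table_to_variables(table):
    variables = {}
    for column, values in table.items():
        variables[column + "_sum"] = sum(values)                # column sum
        variables[column + "_avg"] = sum(values) / len(values)  # average
    return variables

variables = table_to_variables(table)
print(variables["amount_avg"])   # 20.0 -- a single value, usable downstream
```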
Standard Components
Transformation
A set of components for initial preparation and routine processing of the source data sets.
- Grouping
- Date and time
- Enrich data
- Replace
- Calculator
- JS Calculator
- Cross table
- Union
- Features of fields
- Ungroup
- Collapse columns
- Lag
- Join
- Connection
- Sort
- Row filter
Control
The components of this group are intended for workflow optimization: creating supernodes, reusing nodes, and building workflow execution logic with conditions and loops (see the sketch below).
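Purely for orientation, the sketch below models condition and loop logic as plain Python functions; these helpers are hypothetical, not Control components.

```python
# Hypothetical condition and loop logic: route data down one of two
# branches, or repeat a step a fixed number of times.
def condition(predicate, if_true, if_false):
    return lambda data: if_true(data) if predicate(data) else if_false(data)

def loop(step, times):
    def run(data):
        for _ in range(times):   # repeat the wrapped step
            data = step(data)
        return data
    return run

double_twice = loop(lambda rows: [r * 2 for r in rows], times=2)
gate = condition(lambda rows: len(rows) > 2, double_twice, lambda rows: rows)
print(gate([1, 2, 3]))   # prints [4, 8, 12]
```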
Research
These handlers make it possible to assess and/or visualize the structure and statistical characteristics of data. They also support exploratory and descriptive analysis.
Preprocessing
These components prepare data for further use in Data Mining algorithms, applying methods such as imputation, sampling, and outlier elimination (a toy sketch follows).
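For orientation only, here is a toy sketch of two such methods, mean imputation and simple outlier elimination, in plain Python; it is not how the platform implements them.

```python
import statistics

# Toy preprocessing: fill missing values with the column mean, then
# drop values whose z-score exceeds a threshold.
def impute_mean(values):
    known = [v for v in values if v is not None]
    mean = statistics.fmean(known)
    return [mean if v is None else v for v in values]

def drop_outliers(values, z=1.5):   # small threshold to suit the toy data
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= z * std]

raw = [1.0, 2.0, None, 2.5, 100.0]
print(drop_outliers(impute_mean(raw)))   # the 100.0 outlier is removed
```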
Data Mining
The handlers in this group provide tools for applying various Data Mining methods, such as clustering and association rules (see the sketch after the list below).
- EM clustering
- Association rules
- Clustering
- Transaction clustering
- Logistic regression
- Neural network (classification)
- Neural network (regression)
- Self-organizing networks
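As an illustration of one such method, the sketch below clusters a few 2-D points with k-means via scikit-learn; it demonstrates the technique in general, not the platform's own implementation.

```python
from sklearn.cluster import KMeans

# Toy clustering: group 2-D points into two clusters with k-means.
points = [[1.0, 1.1], [0.9, 1.0], [8.0, 8.2], [8.1, 7.9]]
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)           # e.g. [0 0 1 1] -- cluster index per point
print(model.cluster_centers_)  # the two cluster centroids
```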
Variables
Megaladata supports creating and using variables. The handlers in this group perform various operations on variables: modifying them, creating variables from a table, and calculating new variables using different functions.
Integration
The integration mechanisms provide data exchange between third-party external systems and the Megaladata platform (a generic sketch follows).
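As a generic illustration of such an exchange, the sketch below pushes a small table to an external HTTP endpoint; the URL and payload shape are made up for the example and are not Megaladata's actual API.

```python
import json
import urllib.request

# Hypothetical push of a processed table to an external system over HTTP.
payload = json.dumps({"rows": [[1, "a"], [2, "b"]]}).encode("utf-8")
request = urllib.request.Request(
    "https://example.com/api/import",   # placeholder endpoint
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(response.status)   # e.g. 200 on success
```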