Stream processing made easy with Apache StreamPipes

Space Flyby

Article from Issue 244/2021
Author(s): , Author(s): , Author(s):

You don't need to be a stream processing expert to create useful custom solutions with Apache StreamPipes. We'll use StreamPipes to build a simple app that calculates when the International Space Station will fly overhead.

Our modern world is increasingly dependent on continuous data streams that generate large volumes of data in real time. These streams might come from science experiments, weather stations, business applications, or sensors on a factory shop floor. Many of the software systems that interact with these data streams follow an architecture in which events drive individual components. Continuous data sources (producers) such as sensors trigger events, and various components (consumers) process them. Producers and consumers are decoupled using a middleware layer that handles the distribution of the data, usually in the form of a message broker. This approach reduces complexity, because any number of services can receive and process incoming data streams virtually simultaneously. This flexible architecture gives rise to a new generation of tools that provide users with an easy way to create custom solutions that process data from incoming streams. One example is the open source framework Apache StreamPipes [1].

StreamPipes has been an incubator project at the Apache Software Foundation since November 2019 and is part of a growing number of solutions for the Internet of Things (IoT). The StreamPipes toolbox [2] is aimed at business users with limited technical knowledge. The main goal is to make stream-processing technologies accessible to nonexperts. Various modules are available to connect IoT data streams from a variety of sources, to generate analyses of these data streams, and to examine live or historical data.

StreamPipes offers a variety of connectors and algorithms for analyzing industrial data, with the focus on integrating data from the production and automation environment. But users without access to their own production line can also benefit from StreamPipes: For example, real-time data from publicly available APIs and widely used protocols such as MQTT can be used to connect existing data sources.

One important StreamPipes component is the Pipeline Editor. Users can rely on graphical, dataflow-oriented modeling to independently generate processing pipelines that the underlying stream processing infrastructure then automatically executes. On the application side, StreamPipes is useful for applications such as continuous monitoring (e.g., condition monitoring), detection of time-critical situations, live computation of key performance indicators, and integration of machine learning models. Figure 1 provides a rough overview of StreamPipes, from data connection, processing, and analysis through to deployment.

Figure 1: An overview of StreamPipes.

Stream Processing Made Easy

Figure 2 shows the different layers of the StreamPipes architecture. Most users will want to connect existing data streams in the first step. For this purpose, StreamPipes provides a library with the StreamPipes Connect module to connect data based on standard protocols or certain special systems already supported by StreamPipes. Connect adapters, which can also be installed on lightweight edge devices such as Raspberry Pis, handle the task of collecting and forwarding data streams to the internal message broker – Apache Kafka is used under the hood. In the Connect adapters, users can define their own transformation rules (e.g., to convert value units).

Figure 2: An overview of the StreamPipes architecture

One layer above the transport layer are reusable algorithms (e.g., for detecting statistical trends, preprocessing data, or image processing), each of which encapsulates a specific function and is available as an event-driven microservice. In addition to algorithms, StreamPipes also provides data sinks in this way, such as connectors for databases or dashboards.

Each individual microservice provides a machine-readable description of the algorithm's requirements and functionality. For example, certain required data types or measurement units can be specified that the data stream must provide to initialize the component. The algorithm kit can be extended at runtime with a software development kit, so that the user can install additional algorithms at any time, when new requirements arise, without restarting the application.

Users interact with the web-based front end, which makes it easy to build pipelines by linking data streams with algorithms and data sinks. In contrast to other graphical tools for modeling data flows, StreamPipes integrates a matching component directly into the core application. This component continuously checks the consistency of processing pipelines while the model is being built and relies on semantic checking to prevent modeling of faulty connections.

From Data to Application in a Few Clicks

For an example of a simple StreamPipes application, consider the International Space Station (ISS). Scientists rely on an open API to determine the current position of the ISS in its orbit around the Earth. The goal of the StreamPipes application will be to calculate other key figures based on incoming data and display the results on a live dashboard.

First you will need to install StreamPipes. The easiest way to set up StreamPipes is to use a Docker-based installation (Listing 1), which downloads and starts all the required components. Both Docker and Docker Compose must be present on the system; Docker needs a RAM allocation of 2 to 3GB.

Listing 1

Install and Launch StreamPipes

# download and unzip latest release from
$ cd incubator-streampipes-installer/compose
$ docker-compose up -d

During the initial installation, the Docker images for StreamPipes and other images used in the background (for example, Apache Kafka) are loaded. Once the system is started, you can complete the setup in a web browser. By default, the interface is accessible on port 80. After you log in with your choice of user credentials (they are only saved locally), the StreamPipes welcome page appears (Figure 3).

Figure 3: The StreamPipes welcome page.

Simple IoT Data Connection with Connect

The first step is for the application to receive the position data of the ISS as a continuous data stream. For this purpose, you need to change to the Connect module. The data marketplace, which is now visible, shows you the existing adapters, each of which can be configured individually (Figure 4). For example, you will find generic adapters for MQTT, PLC controls, Kafka, or databases, as well as some specific adapters for source systems such as Slack. For this ISS application, I will use the preconfigured ISS Location adapter.

Figure 4: The data marketplace within the Connect module.

Each adapter has a wizard to configure the required parameters. In this case, the matching adapter generates an event with only three parameters: a timestamp and the coordinates of the current ISS location (latitude and longitude in WGS84 format).

At the end of the wizard, assign a name to the new adapter (here ISS-Location) and start the process. From now on, regular updates of the ISS position will reach the underlying Apache Kafka infrastructure. A quick look at the pipeline editor shows a new icon in the Data Streams tab.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Stream Processing 101

    Batch processing strategies won't help if you need to process large volumes of incoming data in real time. Stream processing is a promising alternative to conventional batch techniques.

  • Loki Workshop

    Loki is a powerful, scalable, and easy-to-use solution for aggregating log data.

  • DCCP

    The DCCP protocol gives multimedia developers a powerful alternative to TCP and UDP.

  • FOSSPicks

    There's a collective groan in Graham's household whenever he gets a new device and finds a terminal prompt. The latest victim to his nmap skills is an LG OLED television!

  • Pipes in the Shell

    Pipes in the shell offer a surprising amount of versatility, including the ability to transfer data between computers.

comments powered by Disqus
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

Learn More