Stream processing made easy with Apache StreamPipes
Building Pipelines
After implementing the adapter, you can create a pipeline that computes something interesting from the data. The pipeline editor relies on the drag-and-drop principle: you drag data streams, data processors, and data sinks into the editing area and link them together.
A schematic of the ISS application is shown in Figure 5. The program will transform the geographic coordinates using a reverse geocoding procedure to find the location nearest to the current coordinates. To do this, I will use an integrated component that covers a selection of around 5,000 cities worldwide. In addition, I'll use a Speed Calculator that calculates an average speed based on several successive locations. When I'm done, the processing pipeline should generate a notification as soon as the ISS enters a defined radius around a certain location.
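To make the speed calculation tangible, here is a conceptual sketch of how an average speed can be derived from two successive, timestamped positions using the haversine formula. This is only an illustration under my own assumptions – the class and method names are invented, and it is not the code of the built-in Speed Calculator component.

// Conceptual sketch: average speed from two timestamped ISS positions.
public class SpeedSketch {

    // Haversine great-circle distance in kilometers between two WGS84 coordinates.
    public static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double earthRadiusKm = 6371.0;
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * earthRadiusKm * Math.asin(Math.sqrt(a));
    }

    // Average speed in km/h between two position events with timestamps in milliseconds.
    public static double speedKmh(double lat1, double lon1, long t1Millis,
                                  double lat2, double lon2, long t2Millis) {
        double hours = (t2Millis - t1Millis) / 3_600_000.0;
        return distanceKm(lat1, lon1, lat2, lon2) / hours;
    }
}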
Start assembling the pipeline by dragging the ISS-Location data stream you just created into the editing area, then select the Reverse Geocoding component from the Data Processors tab and connect the two elements. The StreamPipes core now checks their compatibility – in this case, the geocoder needs an input event with a coordinate pair consisting of Latitude and Longitude, which the ISS data stream provides.
After the check, a configuration dialog opens. You can parameterize many algorithms here, for example, by specifying configurable threshold values. For the Geocoder, the only possible configurations are already preselected. After pressing Save, move on to add the next pipeline element – in this case, the Speed Calculator component – and configure it. To visualize the results, click on the Data Sinks tab and select the Dashboard Sink item. This allows you to set up a matching visualization in the live dashboard later.
Now you just need some notification that the space station is approaching. To do this, connect the Static Distance Calculator component to a second output of the ISS data stream. The component requires two inputs: the latitude and longitude of the location to which you want to calculate the current distance – in this case, I will use the coordinates of Karlsruhe, Germany (Latitude 49.006889, Longitude 8.403653).
Then add a Numerical Filter to this component, with distance as the Field to Filter, < as the FilterOperation, and, say, 200 as the Threshold. The actual notification is generated by the Notification component, which you can now configure by adding a title and some additional text (Figure 6). Finally, add another dashboard sink to the Distance Calculator to visualize the distance.
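To make the filter settings concrete, the condition the filter evaluates boils down to a simple comparison (illustration only, not StreamPipes code; the class name is invented):

// Only events whose distance value lies below the threshold pass the filter
// and go on to trigger the Notification element.
public class DistanceFilterSketch {

    static final double THRESHOLD_KM = 200.0; // the configured Threshold

    public static boolean passes(double distanceKm) {
        return distanceKm < THRESHOLD_KM; // FilterOperation "<" on the "distance" field
    }
}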
After you enter a name, a click on Save Pipeline starts the pipeline and takes you to the pipeline overview. In the background, the existing microservices are instantiated with the selected configurations. The detailed view shows the configured distributed system; all components now exchange data via automatically created topics in Apache Kafka.
In addition to the standard wrapper used in this example, which runs directly on the Java Virtual Machine (JVM) and can also run on a Raspberry Pi, there are other wrappers for scalable applications based on Apache Flink or Kafka Streams.
Data Exploration
At this point, you still have to visualize the results. Two modules are available for this purpose: the Live Dashboard and the Data Explorer to display live or historical data. The live dashboard is the right choice to visualize the ISS. First of all, you need to set up a new dashboard; different widgets then handle the task of displaying the live data. I decided to display the speed, the closest city, and the distance to Karlsruhe by means of a single-value widget, and I added a map display of the current position (Figure 7). Dashboards like this can also be accessed separately from the actual StreamPipes web application via generated links.
With just a few clicks, I have created an application that analyzes a continuously incoming data stream. The algorithms described in this article specialize in geographic operations, but the library contains many other modules, including modules for calculating statistics and identifying trends, as well as image processing and object recognition. There is also a JavaScript evaluator that offers great flexibility when it comes to transforming data streams.
Extension of the Toolbox
No-code solutions for data stream analysis initially include only a limited set of algorithms or data sinks. Extensions are therefore necessary for applications that are not covered by existing components.
To simplify the development of new algorithms for StreamPipes, a software development kit is available. Currently, an SDK exists for Java, and support for JavaScript and Python is in the works. The Java SDK lets you create new components using a Maven archetype. The command in Listing 2 generates a new project, including the required Java classes for a new pipeline element.
Listing 2
Generate Project via Maven
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.streampipes \
  -DarchetypeArtifactId=streampipes-archetype-pe-processors-jvm \
  -DarchetypeVersion=0.67.0
The anatomy of a pipeline element follows the scheme shown in Figure 8. One (or more) data processors in StreamPipes are encapsulated in a standalone microservice that can be accessed via an API. The API provides StreamPipes' central pipeline management with a description of the available processors (or sinks) and is called whenever a pipeline starts or terminates.
The description contains information such as required user configurations (for example, input parameters or selection menus) that the web interface displays. It also defines stream requirements. In this way, the pipeline element developer can define requirements for the incoming data stream, such as the presence of a numerical value for a corresponding filter or, in the example described, geocoordinates (latitude and longitude in WGS84 format).
An output strategy defines the syntax of the outgoing events delivered by the component. It describes the transformation of incoming data streams into an outgoing data stream. For example, OutputStrategies.keep() can be used to specify that the output data stream has the same structure as the input data stream. Finally, event grounding defines the message formats the component supports for transmission. These can be JSON or binary formats such as Apache Thrift, as well as various supported protocols. StreamPipes supports Kafka and Java Message Service (JMS) out of the box.
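To make the element description more concrete, the following is a rough sketch of how such a description could be assembled with the Java SDK: it declares the stream requirements (latitude and longitude), two user-configurable coordinate parameters, a keep output strategy, and JSON/Kafka/JMS grounding. The element ID, labels, and class name are invented for the example, and the exact builder API varies between StreamPipes versions.

import org.apache.streampipes.model.graph.DataProcessorDescription;
import org.apache.streampipes.model.schema.PropertyScope;
import org.apache.streampipes.sdk.builder.ProcessingElementBuilder;
import org.apache.streampipes.sdk.builder.StreamRequirementsBuilder;
import org.apache.streampipes.sdk.helpers.EpRequirements;
import org.apache.streampipes.sdk.helpers.Labels;
import org.apache.streampipes.sdk.helpers.OutputStrategies;
import org.apache.streampipes.sdk.helpers.SupportedFormats;
import org.apache.streampipes.sdk.helpers.SupportedProtocols;
import org.apache.streampipes.vocabulary.Geo;

public class DistanceCalculatorDescription {

    // Describes the pipeline element to StreamPipes' central pipeline management.
    public DataProcessorDescription declareModel() {
        return ProcessingElementBuilder
                .create("org.example.geo.staticdistance",          // hypothetical element ID
                        "Static Distance Calculator (sketch)",
                        "Calculates the distance to a fixed location")
                // Stream requirement: the input stream must carry WGS84 latitude/longitude.
                .requiredStream(StreamRequirementsBuilder.create()
                        .requiredPropertyWithUnaryMapping(
                                EpRequirements.domainPropertyReq(Geo.lat),
                                Labels.from("latitude-field", "Latitude", "Latitude of the moving object"),
                                PropertyScope.MEASUREMENT_PROPERTY)
                        .requiredPropertyWithUnaryMapping(
                                EpRequirements.domainPropertyReq(Geo.lng),
                                Labels.from("longitude-field", "Longitude", "Longitude of the moving object"),
                                PropertyScope.MEASUREMENT_PROPERTY)
                        .build())
                // User configuration rendered by the web interface.
                .requiredFloatParameter(Labels.from("selected-latitude", "Latitude", "Latitude of the fixed location"))
                .requiredFloatParameter(Labels.from("selected-longitude", "Longitude", "Longitude of the fixed location"))
                // Output strategy: keep the structure of the input stream.
                .outputStrategy(OutputStrategies.keep())
                // Event grounding: supported transport formats and protocols.
                .supportedFormats(SupportedFormats.jsonFormat())
                .supportedProtocols(SupportedProtocols.kafka(), SupportedProtocols.jms())
                .build();
    }
}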
If you now start the newly created component via the integrated init method, it automatically registers with StreamPipes' pipeline management and can then be installed via the user interface. In addition to the required code classes, the Maven archetype generates a Dockerfile, which eases the transition to a production system. The online documentation contains several tutorials that explain how to create new components for StreamPipes.
As soon as the user starts a pipeline, the runtime and the implemented function are invoked via the API. Messages are continuously received via the selected protocol (Apache Kafka in this case), and the calculated results are then sent back to the broker.
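As an illustration of this runtime part, a per-event function in the style of the newer StreamPipesDataProcessor API might look like the sketch below. The field selectors, the fixed coordinates, and the reuse of the distanceKm() helper from the earlier SpeedSketch example are my own assumptions, and older releases such as 0.67.0 use a slightly different, declarer-based runtime API.

import org.apache.streampipes.commons.exceptions.SpRuntimeException;
import org.apache.streampipes.model.runtime.Event;
import org.apache.streampipes.wrapper.routing.SpOutputCollector;

public class DistanceCalculatorRuntime {

    // Fixed reference point, e.g. Karlsruhe; in a real element this would come
    // from the user configuration collected in the element description.
    private final double refLat = 49.006889;
    private final double refLng = 8.403653;

    // Called for every incoming event; enriches it with a distance field
    // and forwards it to the output stream (i.e., back to the broker).
    public void onEvent(Event event, SpOutputCollector collector) throws SpRuntimeException {
        double lat = event.getFieldBySelector("s0::latitude").getAsPrimitive().getAsDouble();
        double lng = event.getFieldBySelector("s0::longitude").getAsPrimitive().getAsDouble();

        // Reuses the haversine helper from the earlier SpeedSketch example.
        double distanceKm = SpeedSketch.distanceKm(lat, lng, refLat, refLng);

        event.addField("distance", distanceKm);
        collector.collect(event);
    }
}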