Indoor navigation with machine learning
Data Cleanup
If there is a clear relationship between the features and the target variable, the data contains redundancy. Unsupervised learning can reveal where this redundancy breaks down (e.g., because of a write error when the data was acquired).
Line 1 in Listing 15 transfers the input data to a pandas DataFrame, and line 2 assigns the data to one of the four clusters. The anonymous values 0 through 3 correspond to the rooms encountered in the supervised learning example. But which four rooms are identified?
Listing 15
Data Cleanup
01 dfu = pd.DataFrame(Xu, columns = [0, 1, 2, 3, 4, 5, 6])
02 dfu['Target'] = kmeans.predict(Xu)
03 kList = classifier.predict(clusterCenters)
04 transD = {i: el for i, el in enumerate(kList)}
05 dfu['Target'] = dfu['Target'].map(transD)
Line 3 in Listing 15 uses the Random Forest classifier (the classifier object trained in the supervised learning example) to map the four K-Means cluster centers back to the target values from supervised learning: the identifiers for the four rooms. Line 4 prepares a translation dictionary, which map applies in line 5 to replace the numbers with room names.
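The dictionary translation in lines 4 and 5 is a generic pandas pattern. The following self-contained sketch replays it with made-up room labels standing in for kList:

```python
import pandas as pd

# Hypothetical cluster-to-room mapping, standing in for kList in Listing 15
kList = ['Living_room', 'Kitchen', 'Hallway', 'Patio']

# Toy cluster assignments, standing in for the K-Means predictions
df = pd.DataFrame({'Target': [0, 2, 1, 0, 3]})

# Build the translation dictionary and replace numbers with room names
transD = {i: el for i, el in enumerate(kList)}
df['Target'] = df['Target'].map(transD)
print(df['Target'].tolist())
# ['Living_room', 'Hallway', 'Kitchen', 'Living_room', 'Patio']
```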
Now I can compare the values: Does the mapping of the rooms from the source data match the clusters that K-Means found? To do this, I add an additional column, Targetu, to the DataFrame object in Listing 16. The new DataFrame object dfgroup takes only the rows whose target columns differ. Line 5 counts the differences.
Listing 16
Room Assignments Source Data
01 dfDu = df.copy()
02 dfDu['Targetu'] = dfu['Target']
03 dfDu[dfDu['Target'] != dfDu['Targetu']].iloc[:,-2:]
04 dfgroup = dfDu[dfDu['Target'] != dfDu['Targetu']].iloc[:,-2:]
05 dfgroup.groupby(['Target', 'Targetu'])['Targetu'].count()
Listing 17 shows the output from Listing 16. K-Means recognizes that the living room is a better fit than the hallway in 75 cases and better than the kitchen in four cases. I already found in supervised learning that the hallway was interpreted as the living room eight times.
Listing 17
Room Assignments Output
Target       Targetu
Hallway      Living_room    75
Kitchen      Living_room     4
Patio        Kitchen         2
             Living_room     2
Living_room  Kitchen         2
             Patio           6
Further suggestions for cleaning up the data are only hinted at here. In my example, misassignments only occur between neighboring rooms: It is particularly difficult to distinguish between the hallway and the living room and, to a lesser extent, between the living room and the patio. Little points to errors caused by carelessness (i.e., completely misassigned rooms).
In addition, you could consult the decision-making statistics from supervised learning and remove the ambiguous values, which improves the classifier's ability to learn. For the evaluation, you would define a threshold to create additional categories. For example, the data would then be classified as "probably living room or hallway" rather than with a supposedly unambiguous but actually uncertain label.
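Such a threshold could be implemented with the classifier's class probabilities. The following sketch uses predict_proba on random stand-in data; the variable names and the 0.6 cutoff are my assumptions, not taken from the article:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Random stand-in data; the real features are the seven WiFi signal strengths
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
y = rng.integers(0, 4, size=200)  # four room labels

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

proba = clf.predict_proba(X)                # per-class probabilities
order = np.argsort(proba, axis=1)[:, ::-1]  # classes sorted, most likely first

threshold = 0.6  # assumed cutoff; tune for the real data
labels = []
for p, o in zip(proba, order):
    if p[o[0]] >= threshold:
        labels.append(str(clf.classes_[o[0]]))
    else:
        # Too uncertain: report the two most likely rooms instead
        labels.append(f"probably {clf.classes_[o[0]]} or {clf.classes_[o[1]]}")
```

Rows that clear the threshold keep their single label; all others are demoted to a two-room "probably" category instead of a misleadingly confident answer.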
Reducing the Dimensions
Two-dimensional diagrams show the dependence of two parameters, and three parameters span a three-dimensional space. The fourth dimension is often illustrated by a time stamp on consecutive diagrams. In looking for Tom, I am dealing with seven component values. In order to be able to show the dependency on a target value, I picked out two component values earlier.
I did this cautiously: From supervised learning, I know which components have the highest priority. PCA condenses the features' information and reduces the number of dimensions without knowing the target values. It is a powerful approach that also detects outliers; I have limited myself to a few use cases here.
Listing 18 turns out to be largely self-explanatory. After importing the PCA library, the code reduces the number of components, in this case from seven to seven – so nothing is gained initially. However, the method returns the explained_variance_ratio_ attribute, and the cumsum function returns its cumulative sum. Figure 14 shows that a single component already explains 65 percent of the variance, and two components as much as 85 percent.
Listing 18
Principal Component Analysis
from sklearn.decomposition import PCA
pca_7 = PCA(n_components=7)
pca_7.fit(Xu)
x = list(range(1,8))
plt.grid()
plt.plot(x, np.cumsum(pca_7.explained_variance_ratio_ * 100))
plt.xlabel('Number of components')
plt.ylabel('Explained variance')
plt.show()
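Listing 19 relies on a variable pca_2_reduced that the article does not show; presumably it holds the data projected onto the first two principal components. A minimal sketch of how it could be produced, with random stand-in data for Xu:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
Xu = rng.normal(size=(100, 7))  # stand-in for the seven signal strengths

# Keep only the two components with the largest explained variance
pca_2 = PCA(n_components=2)
pca_2_reduced = pca_2.fit_transform(Xu)
print(pca_2_reduced.shape)  # (100, 2)
```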
Given clusters = 4 (i.e., four clusters), Listing 19 outputs the Voronoi diagram in Figure 15, which generates a cluster distribution similar to Figure 10 without knowledge of the prioritized components. The axes have lost their old meaning: In my example, seven signal strengths give rise to two new scales with new metrics. The slightly distorted point clouds are mirror images of each other.
Listing 19
Voronoi Diagram
from scipy.spatial import Voronoi, voronoi_plot_2d
from matplotlib import cm
x1, x2 = 1, 0
clusters = 4
# clusters = 16
Xur = pca_2_reduced
kmeansp = KMeans(n_clusters=clusters, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeansp.fit(Xur)
y_pred = kmeansp.predict(Xur)
ccp = kmeansp.cluster_centers_[:,[x1, x2]]
fig, ax1 = plt.subplots(figsize=(4,5), dpi=120)
vor = Voronoi(ccp)
voronoi_plot_2d(vor, ax = ax1, line_width = 1)
plt.scatter(Xur[:,x1], Xur[:,x2], s= 3, c=y_pred, cmap = plt.get_cmap('viridis'))
plt.scatter(ccp[:, 0], ccp[:, 1], s=150, c='red', marker = 'X')
for i, p in enumerate(ccp):
    plt.annotate(f'$\\bf{i}$', (p[0]+1, p[1]+2))
While decision trees do not require metrics, K-Means and PCA compare distances between data points. Preprocessing typically brings the scales to a comparable level. The error introduced by omitting that step remains relatively small here, because the signal strengths of all attributes are of a similar magnitude.
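Such scaling can be sketched with scikit-learn's StandardScaler; the data here is random and merely stands in for the signal strengths:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=-55, scale=8, size=(100, 7))  # stand-in values in dBm

# Shift and scale each column to zero mean and unit variance,
# so that no attribute dominates the distance calculations
X_scaled = StandardScaler().fit_transform(X)
```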
Figure 16 illustrates that preprocessing plays an important role in estimating the number of clusters. Changing just one variable, clusters = 18, paints a whole new picture. Because of the Voronoi cells [7] and the coloring, the diagram looks quite convincing, but it doesn't tell me where Tom is located.
Conclusions
It is impossible to calculate Tom's location with an analytical solution, mainly because indoor obstacles weaken the signal and produce varying results. To find Tom, I instead relied on machine learning methods. The methods discussed in this article use weak artificial intelligence. So far, I have not seen any approaches from strong artificial intelligence (i.e., self-reflecting systems).
With supervised machine learning, I used the Random Forest classifier to categorize new data. K-Means, as an example of unsupervised learning, let me look at the data without a target variable, find interconnections, and evaluate the quality of the data. Combining the Random Forest classifier and K-Means, I cleaned up the data using semi-supervised learning.
In addition, using Python's scikit-learn libraries ensures easy access to machine learning programming. This gives users more time to explore the constraints and understand the dependencies of the results.
In the end, I think Tom is probably in the living room – or the hallway. Happy hunting!
Infos
- UCI dataset: https://archive.ics.uci.edu/ml/datasets/Wireless+Indoor+Localization
- Pandas library: https://pandas.pydata.org/
- Kernel Density Estimation: https://en.wikipedia.org/wiki/Kernel_density_estimation
- K-Means Classifier: https://en.wikipedia.org/wiki/K-means_clustering
- scikit-learn: https://scikit-learn.org
- PCA: https://en.wikipedia.org/wiki/Principal_component_analysis
- Voronoi diagram: https://en.wikipedia.org/wiki/Voronoi_diagram