Indoor navigation with machine learning
Data Cleanup
If there is a clear relationship between the features and the target variable, the data contains redundancy. Unsupervised learning reveals where this redundancy breaks down (e.g., because of a write error when the data was acquired).
Line 1 in Listing 15 transfers the input data to a pandas DataFrame, and line 2 assigns each sample to one of the four clusters. The anonymous values 0 through 3 correspond to the rooms encountered during the supervised learning example. But which cluster belongs to which of the four rooms?
Listing 15
Data Cleanup
01 dfu = pd.DataFrame(Xu, columns = [0, 1, 2, 3, 4, 5, 6])
02 dfu['Target'] = kmeans.predict(Xu)
03 kList = classifier.predict(clusterCenters)
04 transD = {i: el for i, el in enumerate(kList)}
05 dfu['Target'] = dfu['Target'].map(transD)
Line 3 in Listing 15 uses the Random Forest classifier, classifier (trained in the supervised learning example), to translate the four K-Means cluster focal points back into the target values from supervised learning: the identifiers of the four rooms. Line 4 prepares a dictionary for this translation, which the map call in line 5 uses to replace the cluster numbers with room names.
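Listing 15 assumes that the K-Means model and its cluster focal points already exist. A minimal sketch of that setup, with assumed parameter values, might look like this:

# Assumed setup for Listing 15: Xu holds the unlabeled signal strengths,
# classifier is the Random Forest model from the supervised learning example.
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, init='k-means++', n_init=10, random_state=0)
kmeans.fit(Xu)                              # one cluster per room
clusterCenters = kmeans.cluster_centers_    # focal points passed to classifier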
Now I can compare the values: Does the room assignment in the source data match the clusters that K-Means found? To do this, in Listing 16 I add an additional column, Targetu, to the DataFrame object. The new DataFrame object dfgroup keeps only the rows in which the two target columns differ. Line 5 counts the differences.
Listing 16
Room Assignments Source Data
01 dfDu = df.copy()
02 dfDu['Targetu'] = dfu['Target']
03 dfDu[dfDu['Target'] != dfDu['Targetu']].iloc[:,-2:]
04 dfgroup = dfDu[dfDu['Target'] != dfDu['Targetu']].iloc[:,-2:]
05 dfgroup.groupby(['Target', 'Targetu'])['Targetu'].count()
Listing 17 shows the output from Listing 16. In 75 cases, K-Means considers the living room a better fit than the hallway recorded in the source data, and in four cases a better fit than the kitchen. I had already found in supervised learning that the hallway was interpreted as the living room eight times.
Listing 17
Room Assignments Output
Target       Targetu
Hallway      Living_room    75
Kitchen      Living_room     4
Patio        Kitchen         2
             Living_room     2
Living_room  Kitchen         2
             Patio           6
Further suggestions for cleaning up the data are only hinted at here. In my example, errors in the assignments occur only between neighboring rooms. It is particularly difficult to distinguish between the hallway and the living room and, to a lesser extent, between the living room and the patio. There is little evidence of errors caused by carelessness (i.e., completely misassigned rooms).
In addition, you could consult the classifier's decision statistics from supervised learning and remove the unclear values, which improves the classifier's learning ability. For the evaluation, you would define a threshold and introduce additional categories: The data is then classified as "probably living room or hallway," for example, rather than receiving a supposedly unambiguous but actually uncertain label.
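A minimal sketch of this threshold idea, assuming the class probabilities from the Random Forest's predict_proba method and an example threshold of 0.8, might look like this:

# Keep only samples the Random Forest classifies with high confidence;
# the 0.8 threshold and the column name 'certain' are assumptions.
import numpy as np

proba = classifier.predict_proba(Xu)     # class probabilities per sample
confidence = proba.max(axis=1)           # probability of the most likely class
dfu['certain'] = confidence >= 0.8       # False marks cases such as
                                         # "probably living room or hallway"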
Reducing the Dimensions
Two-dimensional diagrams show the dependence of two parameters, and three parameters span a three-dimensional space. The fourth dimension is often illustrated by a time stamp on consecutive diagrams. In looking for Tom, I am dealing with seven component values. In order to be able to show the dependency on a target value, I picked out two component values earlier.
This choice was not arbitrary: From supervised learning, I know which components have the highest priority. PCA (principal component analysis) condenses the information in the features and reduces the number of dimensions without knowing the target values. It is a powerful approach that can also detect outliers; I have limited myself to a few use cases here.
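As a small aside, a minimal sketch of PCA-based outlier detection, using the reconstruction-error approach as an assumed example on the Xu data, might look like this:

# Points that a two-component PCA reconstructs poorly are outlier candidates.
from sklearn.decomposition import PCA
import numpy as np

pca = PCA(n_components=2).fit(Xu)
restored = pca.inverse_transform(pca.transform(Xu))
error = np.linalg.norm(Xu - restored, axis=1)             # per-sample error
suspects = np.where(error > error.mean() + 3 * error.std())[0]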
Listing 18 turns out to be largely self-explanatory. After importing the PCA library, the code reduces the number of components, in this case from seven to seven, so nothing is gained initially. However, the method provides the explained_variance_ratio_ attribute, and NumPy's cumsum function returns its cumulative sum. Figure 14 shows that a single component already explains 65 percent of the variance, and two components as much as 85 percent.
Listing 18
Principal Component Analysis
from sklearn.decomposition import PCA

pca_7 = PCA(n_components=7)
pca_7.fit(Xu)
x = list(range(1, 8))
plt.grid()
plt.plot(x, np.cumsum(pca_7.explained_variance_ratio_ * 100))
plt.xlabel('Number of components')
plt.ylabel('Explained variance')
plt.show()
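Listing 19 later relies on a variable named pca_2_reduced that holds the data reduced to two components. A minimal sketch of that step, assuming the same Xu as above, might look like this:

# Reduce the seven signal strengths to two principal components;
# pca_2_reduced is the array that Listing 19 works with.
pca_2 = PCA(n_components=2)
pca_2_reduced = pca_2.fit_transform(Xu)
print(pca_2.explained_variance_ratio_.sum())   # roughly 0.85 according to Figure 14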
Given clusters = 4 (i.e., four clusters), Listing 19 outputs the Voronoi diagram in Figure 15, which produces a cluster distribution similar to Figure 10 without any knowledge of the prioritized components. The axes have lost their old meaning: In my example, the seven signal strengths give rise to two new scales with new metrics. The slightly distorted point clouds are mirror images of those in Figure 10, because the sign of a principal component is arbitrary.
Listing 19
Voronoi Diagram
from scipy.spatial import Voronoi, voronoi_plot_2d
from matplotlib import cm

x1, x2 = 1, 0
clusters = 4        # clusters = 18 produces the picture in Figure 16
Xur = pca_2_reduced # data reduced to two principal components
kmeansp = KMeans(n_clusters=clusters, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeansp.fit(Xur)
y_pred = kmeansp.predict(Xur)
ccp = kmeansp.cluster_centers_[:,[x1, x2]]
fig, ax1 = plt.subplots(figsize=(4,5), dpi=120)
vor = Voronoi(ccp)
voronoi_plot_2d(vor, ax=ax1, line_width=1)
plt.scatter(Xur[:,x1], Xur[:,x2], s=3, c=y_pred, cmap=plt.get_cmap('viridis'))
plt.scatter(ccp[:, 0], ccp[:, 1], s=150, c='red', marker='X')
for i, p in enumerate(ccp):
    plt.annotate(f'$\\bf{i}$', (p[0]+1, p[1]+2))
While decision trees do not require a metric, K-Means and PCA compare distances between data points. Typically, a preprocessing step scales the features to a comparable level. The error caused by omitting this step remains relatively small here, because the signal strengths of all attributes are of a similar magnitude.
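A minimal sketch of such a preprocessing step, assuming scikit-learn's StandardScaler as the scaler of choice, might look like this:

# Put all features on a comparable scale before distance-based methods
# such as K-Means and PCA.
from sklearn.preprocessing import StandardScaler

Xu_scaled = StandardScaler().fit_transform(Xu)   # zero mean, unit variance per column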
Figure 16 illustrates that preprocessing plays an important role in estimating the number of clusters. Changing just one variable, clusters = 18, paints a whole new picture. Because of the Voronoi cells [7] and the coloring, the diagram looks quite convincing, but it doesn't tell me where Tom is located.
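One common way to gauge a plausible cluster count is the elbow method; a minimal sketch of it, as my own assumed example on the reduced data Xur, might look like this:

# Elbow method: plot the K-Means inertia (within-cluster sum of squares)
# for several values of k and look for the point where the curve flattens.
inertia = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0)
    inertia.append(km.fit(Xur).inertia_)
plt.plot(range(1, 11), inertia, marker='o')
plt.xlabel('Number of clusters k')
plt.ylabel('Inertia')
plt.grid()
plt.show()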
Conclusions
It is impossible to calculate Tom's location with an analytical solution, mainly because indoor obstacles weaken the signal and produce fluctuating readings. To find Tom, I instead relied on machine learning methods. The methods discussed in this article count as weak artificial intelligence; so far, I have not seen any approaches based on strong artificial intelligence (i.e., self-reflecting systems).
With supervised machine learning, I used the Random Forest classifier to categorize new data. K-Means, as an example of unsupervised learning, let me look at the data without a target variable, find interconnections, and evaluate the quality of the data. Combining the Random Forest classifier and K-Means, I cleaned up the data using semi-supervised learning.
In addition, Python's scikit-learn library provides easy access to machine learning programming, which gives users more time to explore the constraints and understand the dependencies in the results.
In the end, I think Tom is probably in the living room – or the hallway. Happy hunting!
Infos
- [1] UCI dataset: https://archive.ics.uci.edu/ml/datasets/Wireless+Indoor+Localization
- [2] Pandas library: https://pandas.pydata.org/
- [3] Kernel Density Estimation: https://en.wikipedia.org/wiki/Kernel_density_estimation
- [4] K-Means Classifier: https://en.wikipedia.org/wiki/K-means_clustering
- [5] scikit-learn: https://scikit-learn.org
- [6] PCA: https://en.wikipedia.org/wiki/Principal_component_analysis
- [7] Voronoi diagram: https://en.wikipedia.org/wiki/Voronoi_diagram