Indoor navigation with machine learning
Data Cleanup
If there is a clear relationship between the features and the target variable, the data contains redundancy. Unsupervised learning can reveal where this redundancy breaks down (e.g., because of a recording error when the data was acquired).
Line 1 in Listing 15 transfers the input data to a pandas DataFrame, and line 2 assigns the data to one of the four clusters. The anonymous values 0 through 3 correspond to the rooms encountered during the supervised learning example. But which four rooms are identified?
Listing 15
Data Cleanup
01 dfu = pd.DataFrame(Xu, columns = [0, 1, 2, 3, 4, 5, 6])
02 dfu['Target'] = kmeans.predict(Xu)
03 kList = classifier.predict(clusterCenters)
04 transD = {i: el for i, el in enumerate(kList)}
05 dfu['Target'] = dfu['Target'].map(transD)
Line 3 in Listing 15 uses the Random Forest classifier (the classifier object trained in the supervised learning example) to map the four K-Means cluster centers back to the target values from supervised learning: the identifiers for the four rooms. Line 4 prepares a translation dictionary, which the map call in line 5 uses to replace the cluster numbers with room names.
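Listing 15 assumes that a K-Means model and its cluster centers already exist; the actual setup appears earlier in the article. The following sketch shows one plausible way to create the kmeans object and clusterCenters – the variable names come from Listing 15, but the parameters are assumptions.

# Sketch of the objects Listing 15 relies on; parameters are assumptions,
# only the names kmeans, clusterCenters, and Xu come from the listings
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, init='k-means++', n_init=10, random_state=0)
kmeans.fit(Xu)                              # Xu: unlabeled 7-column signal data
clusterCenters = kmeans.cluster_centers_    # one 7-value center per cluster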
Now I can compare the values: Does the room assignment in the source data match the clusters that K-Means found? To do this, in Listing 16 I add an additional column, Targetu, to a copy of the DataFrame object. The new DataFrame object dfgroup keeps only the rows in which the two target columns differ, and line 5 counts the differences.
Listing 16
Room Assignments Source Data
01 dfDu = df.copy()
02 dfDu['Targetu'] = dfu['Target']
03 dfDu[dfDu['Target'] != dfDu['Targetu']].iloc[:,-2:]
04 dfgroup = dfDu[dfDu['Target'] != dfDu['Targetu']].iloc[:,-2:]
05 dfgroup.groupby(['Target', 'Targetu'])['Targetu'].count()
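The article does not show it, but a quick hypothetical follow-up to Listing 16 would be to express the disagreement as a share of all samples:

# Hypothetical addition: how many samples did K-Means assign differently?
mismatches = (dfDu['Target'] != dfDu['Targetu']).sum()
print(f"{mismatches} of {len(dfDu)} samples differ "
      f"({100 * mismatches / len(dfDu):.1f} percent)")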
Listing 17 shows the output from Listing 16. K-Means recognizes that the living room is a better fit than the hallway in 75 cases and better than the kitchen in four cases. I already found in supervised learning that the hallway was interpreted as the living room eight times.
Listing 17
Room Assignments Output
Target       Targetu
Hallway      Living_room    75
Kitchen      Living_room     4
Patio        Kitchen         2
             Living_room     2
Living_room  Kitchen         2
             Patio           6
Further suggestions for cleaning up the data are only hinted at here. In my example, incorrect assignments only occur between neighboring rooms: It is particularly difficult to distinguish the hallway from the living room and, to a lesser extent, the living room from the patio. There is little to suggest errors caused by carelessness (i.e., completely misassigned rooms).
In addition, you could take the decision statistics from supervised learning into account and remove unclear values, which improves the classifier's learning ability. For the evaluation, you would define a threshold and introduce additional categories: Instead of a supposedly unambiguous but actually uncertain statement, the data would then be classified as, say, "probably living room or hallway."
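The article does not spell this step out; a minimal sketch with scikit-learn's predict_proba might look as follows. The threshold value and the "uncertain" label are assumptions for illustration, and X and classifier are assumed to be the feature matrix and Random Forest from the supervised example.

# Sketch only: flag predictions below a confidence threshold instead of
# trusting a supposedly unambiguous answer
proba = classifier.predict_proba(X)              # per-room probabilities
best = proba.max(axis=1)                         # confidence of the top room
labels = classifier.classes_[proba.argmax(axis=1)].astype(object)
labels[best < 0.8] = 'uncertain'                 # e.g., "probably living room or hallway"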
Reducing the Dimensions
Two-dimensional diagrams show the dependence between two parameters, and three parameters span a three-dimensional space. A fourth dimension is often illustrated by time stamps on consecutive diagrams. In looking for Tom, I am dealing with seven component values; to show the dependency on a target value, I picked out two of them earlier.
I did not choose those two arbitrarily: From supervised learning, I know which components have the highest priority. Principal component analysis (PCA) condenses the information in the features and reduces the number of dimensions without knowing the target values. It is a powerful approach that can also detect outliers; I limit myself here to a few use cases.
Listing 18 turns out to be largely self-explanatory. After importing the PCA library, the code reduces the number of components, in this case from seven to seven, so nothing is gained initially. However, the method exposes the explained_variance_ratio_ attribute, and the cumsum function returns its cumulative sum. Figure 14 shows that a single component already explains 65 percent of the variance, and two components 85 percent.
Listing 18
Principal Component Analysis
from sklearn.decomposition import PCA
pca_7 = PCA(n_components=7)
pca_7.fit(Xu)
x = list(range(1,8))
plt.grid()
plt.plot(x, np.cumsum(pca_7.explained_variance_ratio_ * 100))
plt.xlabel('Number of components')
plt.ylabel('Explained variance')
plt.show()
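Listing 19 below uses a variable named pca_2_reduced that is not defined in the listings printed here. A minimal sketch of how it could be produced from the same input data Xu, assuming a two-component PCA, follows.

# Assumed construction of pca_2_reduced (not shown in the article's listings)
pca_2 = PCA(n_components=2)               # keep the two strongest components
pca_2_reduced = pca_2.fit_transform(Xu)   # shape: (number of samples, 2)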
Given clusters = 4 (i.e., four clusters), Listing 19 produces the Voronoi diagram in Figure 15, which arrives at a cluster distribution similar to Figure 10 without any knowledge of the prioritized components. The axes have lost their old meaning: In my example, the seven signal strengths give rise to two new scales with new metrics. The slightly distorted point clouds are mirror images of each other.
Listing 19
Voronoi Diagram
from scipy.spatial import Voronoi, voronoi_plot_2d
from matplotlib import cm
x1, x2 = 1, 0
# clusters = 4
clusters = 16
Xur = pca_2_reduced
kmeansp = KMeans(n_clusters=clusters, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeansp.fit(Xur)
y_pred = kmeansp.predict(Xur)
ccp = kmeansp.cluster_centers_[:,[x1, x2]]
fig, ax1 = plt.subplots(figsize=(4,5), dpi=120)
vor = Voronoi(ccp)
voronoi_plot_2d(vor, ax = ax1, line_width = 1)
plt.scatter(Xur[:,x1], Xur[:,x2], s= 3, c=y_pred, cmap = plt.get_cmap('viridis'))
plt.scatter(ccp[:, 0], ccp[:, 1], s=150, c='red', marker = 'X')
for i, p in enumerate(ccp):
    plt.annotate(f'$\\bf{i}$', (p[0]+1, p[1]+2))
While decision trees do not require a metric, K-Means and PCA compare distances between data points. Preprocessing typically brings the scales up to a comparable level. Omitting that step introduces only a small error here, because the signal strengths of all attributes are of a similar magnitude.
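If the scales did not match, a standard approach (not used in this article) would be scikit-learn's StandardScaler:

# Not part of the article's pipeline: put all attributes on a comparable scale
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
Xu_scaled = scaler.fit_transform(Xu)   # zero mean, unit variance per column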
Figure 16 illustrates that preprocessing plays an important role in estimating the number of clusters. Changing just one variable, clusters = 18, paints a whole new picture. Because of the Voronoi cells [7] and the coloring, the diagram looks quite convincing, but it doesn't tell me where Tom is located.
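One common way to estimate a plausible number of clusters is the elbow method: Plot the K-Means inertia for a range of cluster counts and look for the bend. The sketch below is an assumption, not the article's code, and reuses the PCA-reduced data Xur from Listing 19.

# Elbow-method sketch (assumption, not the article's approach)
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0)
    km.fit(Xur)                     # Xur: the PCA-reduced data from Listing 19
    inertias.append(km.inertia_)

plt.plot(range(1, 11), inertias, marker='o')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()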
Conclusions
It is impossible to calculate Tom's location with an analytical solution, mainly because indoor obstacles weaken the signal and produce varying results. To find Tom, I instead relied on machine learning methods. The methods discussed in this article are examples of weak artificial intelligence; so far, I have not seen any approaches based on strong artificial intelligence (i.e., self-reflecting systems).
With supervised machine learning, I used the Random Forest classifier to categorize new data. K-Means, as an example of unsupervised learning, let me look at the data without a target variable, find interconnections, and evaluate the quality of the data. Combining the Random Forest classifier and K-Means, I cleaned up the data using semi-supervised learning.
In addition, using Python's scikit-learn libraries ensures easy access to machine learning programming. This gives users more time to explore the constraints and understand the dependencies of the results.
In the end, I think Tom is probably in the living room – or the hallway. Happy hunting!
Infos
[1] UCI dataset: https://archive.ics.uci.edu/ml/datasets/Wireless+Indoor+Localization
[2] Pandas library: https://pandas.pydata.org/
[3] Kernel Density Estimation: https://en.wikipedia.org/wiki/Kernel_density_estimation
[4] K-Means Classifier: https://en.wikipedia.org/wiki/K-means_clustering
[5] scikit-learn: https://scikit-learn.org
[6] PCA: https://en.wikipedia.org/wiki/Principal_component_analysis
[7] Voronoi diagram: https://en.wikipedia.org/wiki/Voronoi_diagram