Safer Internet Searches

Alternative Engines

Author(s): Mats Tage Axelsson

If you are interested in data privacy, you might want to try an alternative search engine. We discuss a few search engines that serve up good results, along with an option for setting up your own search engine.

A majority of users rely on Google to find information on the Internet. Although convenient, Google can expose your personal information. Your searches result in companies tracking your online activities and then bombarding you with targeted advertising.

The privacy concerns alone might make you want to consider an alternative search engine, but there are other reasons you might consider another option. What if you want to include a search service on your personal network or website? By integrating an alternative search engine into your network or website, you can make it easier for people to find your information, while maintaining your independence from the big search giants like Google.

Luckily, there are several alternative search engines that give excellent results regardless of your motivation.

Tracking

When it comes to privacy concerns, you probably want to know how you are being tracked. Beyond knowing what data is being collected, you might also want to know how much data a company saves, how long they hold onto it, and how they use that data.

If you use Google, you can find the answers to these questions by checking your preferences. Keep in mind that many of these values are set by default, meaning you might not have selected these preferences in the first place. To see what Google has stored, use Google Takeout to download a copy of your data; you can then delete the data from your account. Google Takeout runs in the background and sends you an email when the export is ready.

Google also tracks you with its vast array of third-party cookies. These cookies are so unpopular that Google plans to retire them and replace them with a group profiling scheme. You can mitigate the privacy effects of these cookies by using ad blockers.

In Firefox, the privacy settings are stricter; you can also enhance Firefox with ad blockers. The Brave browser, on the other hand, blocks ads by default. Brave also lets you connect an account to your browser and get paid for viewing advertising in Brave's own cryptocurrency (Basic Attention Tokens) without revealing your identity to advertisers.

Keep in mind that you can expose your IP address in other ways besides using a search engine.

To avoid tracking, your best option is to choose a search engine that doesn't track you. Following are a few alternatives that protect your privacy and yield good results.

DuckDuckGo

DuckDuckGo [1], the best-known alternative search engine, does not track you (Figure 1). For instance, if you use DuckDuckGo's map function, DuckDuckGo will not collect your position unless you activate it yourself. (Google, on the other hand, collects everything you do and can use it across all of its services.) Nor does DuckDuckGo store your personal data.

Figure 1: DuckDuckGo promises not to track you, store your data, or follow you around with ads.

In addition, DuckDuckGo offers tools to help protect your privacy. For example, a Firefox extension tells you who is tracking you and how much. The extension will tell you all the details it can find about any website you visit, as well as which websites are the worst offenders.

Qwant

Qwant [2], based in France, does their own web indexing and uses their own algorithms, creating an independent search engine (Figure 2). Working with regulators in the European Union, as well as the US, Qwant wants to help balance the competition in the search engine field.

Figure 2: Based in Europe, Qwant promises to respect your privacy and not target you with ads.

Qwant offers products similar to those offered by Google. While Qwant does have advertisers, they don't track your every move. Features include news, images, videos, and maps (map results are sourced from Bing).

Ecosia

The Ecosia [3] search engine protects your privacy while protecting the environment. Using a portion of their profits from your searches, Ecosia plants trees (Figure 3). To see how they do this, check out their monthly financial reports online [4].

Figure 3: Ecosia's search screen keeps a running total of trees planted.

With Ecosia, your searches are encrypted, and Ecosia does not store searches permanently. They don't sell your data to advertisers, and they don't use external tracking tools. You can even turn off tracking for the small amount of data that they do collect in order to optimize their services.

While you have full insight into everything they do, your search results are a collaboration with Bing (although you can choose between Bing and Google maps). The search results are the same as what you'd find on Bing; you just get better privacy and the satisfaction of supporting new forests.

YaCy

YaCy [5] (Figure 4), a distributed search engine, gives you the option of joining a search engine based on peer-to-peer (P2P) networking or setting up your own portal. In the P2P option, users collaboratively host the search data, but this doesn't mean you have to host your own section.

Figure 4: With YaCy, you can join a distributed web search engine or set up your own portal.

To see how YaCy works, go to YaCy's demo page [6]. While the YaCy interface is much more complex, the demo still gives you results.

YaCy's distributed nature makes the search engine more secure, plus it gives you the option to run your own instance.

Searx

Searx [7], a metasearch engine, uses other search engines' indexes to get results (Figure 5). Searx anonymizes your search request in multiple ways, and the result is sent back to you securely. You can even run the search through the Tor network for more security.

Figure 5: You can customize Searx's appearance, but it starts out simply.

Searx uses 70 search engines, most of which have an open API. For a list of your search engine choices, visit Searx's GitHub page [8].

Setting Up a Search Engine

If you want to set up a search engine for a private network or personal website, Searx and YaCy can help you get started. Both open source packages are simple to install and offer several installation methods, including Docker images, which provide an especially easy way to become familiar with each search engine as well as how the search engine affects your system.

With both Searx and YaCy, you need to pay attention to your system load. With YaCy, you also need to be mindful of your bandwidth due to YaCy's distributed nature.

Both YaCy and Searx can be tested using public instances, and both offer detailed websites with information on how to change settings to match your system, as well as defaults for other settings.

Finally, Searx and YaCy are also available as source code and other binary packages. YaCy, which uses Java, is the simpler option; you run the ant binary to compile. However, you can also install the Searx package from your distribution's standard repository.

Searx as a Solution

If you like Docker, you can stick with it for installation. You need to pull the Searx image and then choose which ports to use when you start the container:

docker pull searx/searx
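Pulling the image is only half of the job, because the port is chosen when the container starts. The following is a minimal sketch and not the official procedure: it assumes that the searx/searx image listens on port 8080 inside the container, reads its configuration from /etc/searx, and accepts a BASE_URL environment variable. Check the Searx installation document [9] before relying on any of these. The host port 8888, the ./searx directory, and the DRY_RUN guard are arbitrary choices made for this example.

```shell
# Host port for the Searx web interface (arbitrary choice for this sketch).
PORT=8888
# Host directory that will hold settings.yml (assumed location).
SETTINGS_DIR="${PWD}/searx"

# Print the command when DRY_RUN=1 (the default here); execute it otherwise.
# This guard is a convenience of the sketch, not a Docker feature.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Map host port $PORT to the container's port 8080 (assumed default)
# and mount the settings directory at /etc/searx (assumed path).
result="$(run docker run --rm -d \
    -v "${SETTINGS_DIR}:/etc/searx" \
    -p "${PORT}:8080" \
    -e "BASE_URL=http://localhost:${PORT}/" \
    searx/searx)"
echo "$result"
```

By default, the sketch only prints the command so that you can inspect it. Set DRY_RUN=0 to start the container, in which case the final line shows the container ID that Docker returns.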

All Searx settings are available for both regular and Docker installations. A locally compiled version does have advantages, performance being one of them. To get deeper into the installation and settings, see the Searx installation document [9].

If you want to use another installation method, download the source code from GitHub [10] and use the appropriate script to install. You'll find several scripts. The NGINX and Apache web servers are supported. If you want to add Searx to an existing site, follow the instructions on the NGINX and Apache websites.

You should also be able to find a prebuilt Searx package in your distribution's repository. A generic binary like this works fine unless you want to squeeze all the performance possible out of your system.

Searx Settings

Regardless of installation method, you will use the default settings file (settings.yml) to set your preferences. You can use the default options, which contain many well-known search engines and some lesser known ones.

You also need to set up a directory for settings.yml and then point to it. The files in this directory are the ones controlling Searx. Since the settings file is written in YAML, you have access to all of YAML's features for creating links (like adding a site) that work as searches for your service.

When you compile, you will need the usual libraries for networking, building, and so on. You should also look at the uWSGI installation: uWSGI is the application server that runs Searx, and it talks to the web server in front of it over the compact binary uwsgi protocol.

In the main settings, you have one absolutely vital task to perform: setting the secret_key value (Listing 1). To choose this value, your best option is to let OpenSSL create it for you, but you can also use a password manager. The other options under the main settings are useful but not necessary to get started.

Listing 1

Setting secret_key

server:
    port : 8888
    bind_address : "127.0.0.1" # address to listen on
    secret_key : "SuperSecretKey" # change this!
    base_url : http://localhost:/ # Set custom base_url. Possible values: False or "https://your.custom.host/location/"
    image_proxy : False # Proxying image results through searx
    http_protocol_version : "1.0" # 1.0 and 1.1 are supported
    method: "POST" # POST queries are more secure as they don't show up in history but may cause problems when using Firefox containers
    default_http_headers:
        X-Content-Type-Options : nosniff
        X-XSS-Protection : 1; mode=block
        X-Download-Options : noopen
        X-Robots-Tag : noindex, nofollow
        Referrer-Policy : no-referrer
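One way to let OpenSSL create the key is sketched below. It generates 32 random bytes, prints them as 64 hexadecimal characters, and swaps them in for the SuperSecretKey placeholder from Listing 1. The key length is an assumption of this sketch rather than a Searx requirement. So that the example is safe to run anywhere, it works on a scratch file that imitates the relevant line; point settings_file at your real settings.yml to use it.

```shell
# Scratch copy that imitates the secret_key line from Listing 1.
settings_file="$(mktemp)"
printf 'server:\n    secret_key : "SuperSecretKey" # change this!\n' > "$settings_file"

# 32 random bytes, hex-encoded: a 64-character key.
secret_key="$(openssl rand -hex 32)"

# Replace the placeholder with the generated key, keeping a .bak backup.
sed -i.bak "s/SuperSecretKey/${secret_key}/" "$settings_file"

# Show the resulting line.
grep 'secret_key' "$settings_file"
```

Because the key consists only of hexadecimal characters, it can be passed to sed without any escaping.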

If you look through the rest of the settings file, you will find the valid search providers for your instance. You can also use this file to limit your searches if you have concerns about a particular provider.

To add more providers (you can even add a single page), you need to add a section to the settings file. Copy the default file and edit what you need. As a simple example, you can edit the Wikipedia settings entry in the file:

- name : wikipedia
  engine : wikipedia
  shortcut : wp
  base_url : 'https://{language}.wikipedia.org/'

In addition to the language variable, you'll find the query, page, and params variables, which can be used to control your searches. You also can control the type of results that are returned. Result options include strings, images, and videos, as well as torrent files.
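The shortcut value in the Wikipedia entry is what you type to send a query to that engine alone: Searx's search syntax accepts an exclamation mark followed by the shortcut, such as !wp. The sketch below builds such a search URL for a local instance. The searx_url helper and the SEARX_BASE variable are hypothetical names introduced for this example, and the default address simply mirrors bind_address and port from Listing 1.

```shell
# Base URL of the instance; matches bind_address and port in Listing 1.
SEARX_BASE="${SEARX_BASE:-http://127.0.0.1:8888}"

# Build a search URL that sends the terms to one engine via its shortcut.
# $1 = engine shortcut (for example wp), $2 = search terms.
# Only spaces are encoded here, so keep the terms to letters and digits.
searx_url() {
    terms="$(printf '%s' "$2" | tr ' ' '+')"
    # %%21 prints %21, the URL-encoded exclamation mark.
    printf '%s/search?q=%%21%s+%s\n' "$SEARX_BASE" "$1" "$terms"
}

searx_url wp "linux kernel"

# To fetch the results from a running instance:
# curl -s "$(searx_url wp 'linux kernel')"
```

The helper deliberately stops at building the URL, so you can check what will be sent before pointing curl or a browser at your instance.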

YaCy as a Solution

One of the biggest differences between YaCy and Searx is that YaCy runs independently of other search engines. YaCy creates its own distributed index. Much like BitTorrent clients that use distributed hash tables (DHTs), each peer keeps its own part of the tables.

To run YaCy, you need to set the amount of space that you will allow YaCy to occupy on your system, although the installation script has a default. Like Searx, you can use a Docker image to run YaCy. YaCy offers three different Docker images: amd64, arm64v8, and arm32v7.

To install YaCy with Docker, use the standard values found on YaCy's web page:

docker run -d --name yacy -p 8090:8090 -p 8443:8443 -v yacy_data:/opt/yacy_search_server/DATA --log-opt max-size=200m --log-opt max-file=2 yacy/yacy_search_server:latest

These standard values help you manage resource usage. Once the server is running, you can also access a management interface from your browser. If you want to be able to use the management interface from another computer, you need to set an administrator password. If you lose the password, you will need to go back to the command line in the root of the YaCy directory and run:

bin/passwd.sh <new password>

This command will handle changing the password, whether your server is running or not.

You can also clone the GitHub repository and compile the binaries [11]. Confusingly, the GitHub repo does not mention at the top that you must compile before running the standard script (startYACY.sh).

YaCy needs Java. When you download the GitHub repo, you need ant to compile. You'll find the details further down in the GitHub document. If you need to install YaCy on multiple machines, you can create a Debian package directly with the compiler.

Configuring YaCy

Whichever method you choose for installation, you need to set up some values to get the most out of your system. First, you should specify how you want to use YaCy. For the most basic configuration, you set an interface language, name, and search use case (Figure 6).

Figure 6: Under Basic Configuration, you can set up your interface language and use case.

The search use case sets the type of search. An internal search will just find files on your network; more common is a search of the entire YaCy community.

In the YaCy Administration dialog, you can edit all your settings, including working memory, disk space, and more.

Clicking on RAM/Disk Usage & Updates lets you adjust the settings for working memory and disk space. The default memory for the Java Virtual Machine (JVM) is set to 600MB.

The other values in the RAM/Disk Usage & Updates dialog save you from running out of disk space. You can use the Steady-state minimum option to disable crawls when free disk space falls below a specified minimum (in megabytes). This only becomes an issue when you have the ports open and contribute to the shared index or when you start your own crawl. HTCache configuration lets you control the size of the cache for content retrieved via HTTP or FTP; the default size is 4GB.
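The logic behind a steady-state minimum is easy to reproduce if you want to keep an eye on the same limit from outside YaCy, for example in a cron job. The sketch below is not YaCy's code. It illustrates the check itself: read the free space on the filesystem that holds the data directory and compare it with a threshold. The YACY_DATA variable and the 2048MB threshold are illustrative assumptions, not YaCy defaults.

```shell
# Free space, in megabytes, on the filesystem holding a directory.
# df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeeds (exit status 0) when free space is below the minimum.
# $1 = free megabytes, $2 = minimum megabytes.
below_minimum() {
    [ "$1" -lt "$2" ]
}

data_dir="${YACY_DATA:-.}"   # hypothetical variable; defaults to the current directory
min_free_mb=2048             # illustrative threshold, not a YaCy default

if below_minimum "$(free_mb "$data_dir")" "$min_free_mb"; then
    echo "pause crawls: less than ${min_free_mb} MB free"
else
    echo "crawls may continue"
fi
```

Separating the measurement (free_mb) from the decision (below_minimum) makes it possible to try out the threshold logic with fixed numbers before running it against a real disk.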

Putting YaCy to Work

Once you've configured YaCy, you can start a crawl from any web address. From the Administration dialog, click on Load Web pages, Crawler and enter the web address. YaCy will look through all the documents on the server and index them for you. You can use this to index your own internal network or add your new web page to the common index.

In addition to private searching, YaCy lets you share your search engine with others. You can customize YaCy for your website. Click on Portal Configuration to set color, title text, and even the logo that appears above the search box. From here, you also can see what the search engine will look like with your customizations.

If you use YaCy seriously, you should consider contributing to the YaCy index. To do this, you need to open your port to other peers on the network. In particular, you'll need to open port 8090, which is usually blocked by default.

Conclusion

If you care about privacy, you should consider using one of the alternative search engines discussed in this article. Along with a better search experience, you can protect your personal data and maybe even the environment.

With YaCy and Searx, you can even set up a personal search engine for your home or office. Searx lets you spread your search over many services, while making your search anonymous. Searx takes some work to set up, but you can achieve a standard install in less than an hour. If you want to go all out, you can join the YaCy network and index as much as your disk space and CPU can handle. The search results are not necessarily as good as commercial solutions, but you will probably find what you need.

Regardless of which search engine you choose, you can rest easy knowing your personal information is protected.

The Author

Mats Tage Axelsson is chasing around in circles trying to make his computer do more than ever.