Safer Internet Searches
YaCy as a Solution
One of the biggest differences between YaCy and Searx is that YaCy runs independently of other search engines. YaCy creates its own distributed index. As with BitTorrent clients that use distributed hash tables (DHTs), each peer keeps its own part of the tables.
To run YaCy, you need to set the amount of space that you will allow YaCy to occupy on your system, although the installation script has a default. Like Searx, you can use a Docker image to run YaCy. YaCy offers three different Docker images: amd64, arm64v8, and arm32v7.
To install YaCy with Docker, use the standard values found on YaCy's web page:
docker run -d --name yacy -p 8090:8090 -p 8443:8443 -v yacy_data:/opt/yacy_search_server/DATA --log-opt max-size=200m --log-opt max-file=2 yacy/yacy_search_server:latest
These standard values help you manage resource usage. Once the server is running, you can also access a management interface from your browser. If you want to be able to use the management interface from another computer, you need to set an administrator password. If you lose the password, you will need to go back to the command line in the root of the YaCy directory and run:
bin/passwd.sh
This command will handle changing the password, whether your server is running or not.
You can also clone the GitHub repository and compile the binaries [11]. Confusingly, the GitHub repo does not mention at the top that you must compile before running the standard script (startYACY.sh).
YaCy needs Java. When you download the GitHub repo, you also need ant to compile. You'll find the details further down in the GitHub document. If you need to install YaCy on multiple machines, you can create a Debian package directly from the build.
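The build steps might look roughly like the following sketch. The repository URL and the ant clean all target reflect my reading of the project's README, so check them against the current document before relying on them; the checkout path in SRC_DIR and the helper names check_prereqs and build_yacy are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch: build YaCy from a Git checkout (assumptions noted inline).

REPO_URL="https://github.com/yacy/yacy_search_server.git"  # assumed upstream repository
SRC_DIR="${SRC_DIR:-$HOME/yacy_search_server}"             # hypothetical checkout path

# Succeed only if every tool named in the arguments is on the PATH.
check_prereqs() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "missing tools:$missing" >&2
        return 1
    fi
}

# Clone the sources and compile them; nothing runs until you call this.
build_yacy() {
    check_prereqs git java ant || return 1
    git clone "$REPO_URL" "$SRC_DIR" || return 1
    # "ant clean all" is the compile target as I understand the README (assumption).
    ( cd "$SRC_DIR" && ant clean all )
}

# Uncomment to build and then start the server:
# build_yacy && "$SRC_DIR/startYACY.sh"
```

Checking for git, java, and ant up front means a missing tool is reported before anything is downloaded, rather than halfway through the build.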
Configuring YaCy
Whichever method you choose for installation, you need to set up some values to get the most out of your system. First, you should specify how you want to use YaCy. For the most basic configuration, you set an interface language, name, and search use case (Figure 6).
The search use case sets the type of search. An internal search will just find files on your network; more common is a search of the entire YaCy community.
In the YaCy Administration dialog, you can edit all your settings, including working memory, disk space, and more.
Clicking on RAM/Disk Usage & Updates lets you adjust the settings for working memory and disk space. The default memory for the Java Virtual Machine (JVM) is set to 600MB.
The other values in the RAM/Disk Usage & Updates dialog keep you from running out of disk space. The Steady-state minimum option disables crawls when free disk space falls below a minimum that you specify in megabytes. Disk space only becomes an issue when you have the ports open and contribute to the shared index or when you start your own crawl. HTCache configuration lets you control the size of the cache for content retrieved via HTTP or FTP; the default size is 4GB.
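To get a feel for what a sensible minimum might be on your machine, you can reproduce the threshold idea by hand. The sketch below is not how YaCy implements the check; it only illustrates the comparison. The 2048MB value, the DATA_PATH variable, and the function names free_mb and crawl_allowed are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: compare free disk space against a minimum, in megabytes.

# Print the free space, in whole megabytes, on the filesystem holding path $1.
free_mb() {
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024) }'
}

# Succeed when free space ($1) is at or above the minimum ($2), both in MB.
crawl_allowed() {
    [ "$1" -ge "$2" ]
}

MIN_FREE_MB=2048            # hypothetical threshold, not a YaCy default
DATA_PATH="${DATA_PATH:-.}" # hypothetical: point this at your YaCy data directory

if crawl_allowed "$(free_mb "$DATA_PATH")" "$MIN_FREE_MB"; then
    echo "Free space is above the minimum; crawls could continue."
else
    echo "Free space is below the minimum; crawls would be disabled."
fi
```

Running the script against the volume that holds your YaCy data shows how close you already are to the limit before you start a large crawl.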
Putting YaCy to Work
Once you've configured YaCy, you can start a crawl from any web address. From the Administration dialog, click on Load Web pages, Crawler and enter the web address. YaCy will look through all the documents on the server and index them for you. You can use this to index your own internal network or add your new web page to the common index.
In addition to private searching, YaCy lets you share your search engine with others. You can customize YaCy for your website. Click on Portal Configuration to set color, title text, and even the logo that appears above the search box. From here, you also can see what the search engine will look like with your customizations.
If you use YaCy seriously, you should consider contributing to the YaCy index. To do this, you need to open your port to other peers on the network. In particular, you'll need to open port 8090, which is usually blocked by default.
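How you open the port depends on your firewall. As a hedged example, the sketch below prints the command for two common front ends, ufw and firewalld, so you can review it before running it yourself. Your system may use neither of them, and the function name firewall_command is made up for this example. If your machine sits behind a home router, you will probably also need a port forward there, which this sketch does not cover.

```shell
#!/bin/sh
# Sketch: print (not run) the command that opens a TCP port.

# $1 = firewall front end (ufw or firewalld), $2 = port number
firewall_command() {
    tool=$1
    port=$2
    case "$tool" in
        ufw)
            echo "sudo ufw allow ${port}/tcp"
            ;;
        firewalld)
            echo "sudo firewall-cmd --permanent --add-port=${port}/tcp && sudo firewall-cmd --reload"
            ;;
        *)
            echo "unsupported firewall tool: $tool" >&2
            return 1
            ;;
    esac
}

# Show the command for YaCy's peer port on a ufw system.
firewall_command ufw 8090
```

Printing the command instead of executing it keeps the example safe to try and leaves the decision to change firewall rules with you.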