Troubleshooting for beginners
It's not easy for beginners to solve problems in an operating system they haven't used before. We show you how to deal with some common issues.
One thing that might be new to a Windows user is the Linux terminal: a special window in which you can type text commands. This may seem a bit antiquated at first glance, given that GUIs are everywhere, but the impression is deceptive. The so-called command line is often a powerful and very efficient tool. With it, you can exploit strengths of Linux that are otherwise inaccessible. This article demonstrates many such useful commands.
When working with these commands, there is one golden rule for Linux troubleshooting: Keep calm. Panicking and clicking blindly never helps. Such behavior not only prevents you from studying the cause of a problem but can also easily lead to undesirable and irreversible changes. Instead, the right approach is to try to understand the root cause of the problem. If the cause is not obvious, it often helps to systematically rule out one possible cause after another.
Becoming familiar with your own system can be a huge help. I've read a lot of letters to editors where readers talked about installing two, three, or even more different Linux distributions in parallel. This makes no sense to me. It is far better to understand the peculiarities of one distribution deeply than to know half a dozen superficially.
One thing you can do before an error occurs is baseline your system: Gather information on your freshly installed and healthy system that you can later compare with data from a system that might be in trouble. This comparison can tell you what is normal and what is probably a sign of problems.
For example, seeing a load average of 8 when you usually measure only 2 on your dual-core CPU is always suspicious. In this case, a tool such as top will show you what is eating up your compute power.
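The comparison against a known baseline can be sketched as a small shell helper. This is only an illustration: the function name load_is_suspicious and the default factor of 2 are assumptions, not something the tools themselves define.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the current load average is more than
# FACTOR times the value you usually measure on a healthy system.
# $1 = current load, $2 = usual (baseline) load, $3 = factor (assumed default: 2)
load_is_suspicious() {
    awk -v cur="$1" -v usual="$2" -v factor="${3:-2}" \
        'BEGIN { exit !(cur > usual * factor) }'
}

# Possible usage on a live system (the baseline value 2 is an example):
#   load_is_suspicious "$(cut -d' ' -f1 /proc/loadavg)" 2 && top
```

With the figures from the text, a load of 8 against a usual value of 2 would be flagged, and starting top is then the natural next step.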
Or, if you typically have 20 or 30 open Internet connections (type lsof -i4 in a terminal window) and then suddenly have 2,000, something malicious may be going on. You can make a simple check of your write performance with
dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync
If this command shows more than 80MBps on a regular day and then only 2MBps sometime later, you know you have to investigate your I/O stack. Listing 1 shows some commands you can use for simple baselining, just to get a feeling for what is normal for your system.
Listing 1: Simple Baselining Commands
# Memory usage (unit: MB)
free -m

# Load Average
uptime

# Available disk space
df -k /<mount point>

# All established TCP connections
lsof -i -sTCP:ESTABLISHED

# CPU, memory and I/O statistics
vmstat 2
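To make such a baseline comparable later, the output can be saved to a timestamped file. The following is a minimal sketch under a few assumptions: the function name take_baseline, the default directory, and the file naming scheme are all hypothetical, and only the three commands from the listing that terminate on their own are recorded (vmstat 2 runs until interrupted).

```shell
#!/bin/sh
# Hypothetical helper: write a snapshot of a healthy system to a file.
# $1 = target directory (assumed default: ~/baseline)
take_baseline() {
    dir=${1:-$HOME/baseline}
    mkdir -p "$dir" || return 1
    out="$dir/baseline-$(date +%Y%m%d-%H%M%S).txt"
    {
        for cmd in "free -m" "uptime" "df -k /"; do
            echo "### $cmd"
            # $cmd is left unquoted on purpose so that options are split off
            $cmd 2>&1 || echo "(command failed or not installed)"
            echo
        done
    } > "$out"
    # Print the file name so that the caller can keep track of it
    printf '%s\n' "$out"
}
```

Two snapshots taken at different times could then be compared with diff to see what has changed.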
Who Knows What?
Another general rule for the troubleshooter is to consult the logs. Log files contain a large variety of information about the system and its behavior, and you will often find valuable hints there. I used the Linux distribution Fedora 20 for this article, which comes with a peculiarity: It uses the so-called systemd journal instead of the classic syslog under
This means the admin has to use the special command journalctl to read the logs. If called without parameters, journalctl will show the full contents of the journal, starting with the oldest entry collected. The big advantage of the journal, however, shows up when parameters are passed to the command. With them, it is possible to filter any field of the log line without using grep. For example:
shows all log entries for the process with PID 1436.
journalctl --since "2014-06-18 10:00:00" --until "2014-06-18 13:00:00"
lists all entries between 10:00 and 13:00 on June 18. Using
brings up kernel messages.
As shortcuts for a few types of field/value matches, file paths may be specified. If such a path refers to an executable file, this is equivalent to an _EXE=</path/file> match. Similarly, if a path refers to a device node, this is equivalent to a _KERNEL_DEVICE=<device file> match. Thus,
shows all log messages that refer to the disk
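The journal filters described in this section can be wrapped in a few small shell functions. This is a hedged sketch, not the exact commands used for the article: the function names are hypothetical, the JOURNALCTL variable is an assumption added so that the helpers can be tried in a dry run (for example with JOURNALCTL=echo), and any device path you pass in is your own choice.

```shell
#!/bin/sh
# Hypothetical wrappers around journalctl. Set JOURNALCTL=echo to print
# the arguments instead of querying the journal.

journal_for_pid() {
    # Field match: only entries logged by the process with the given PID
    "${JOURNALCTL:-journalctl}" "_PID=$1"
}

journal_window() {
    # Entries on one day between two times
    # $1 = date (YYYY-MM-DD), $2 = start time, $3 = end time
    "${JOURNALCTL:-journalctl}" --since "$1 $2" --until "$1 $3"
}

journal_kernel() {
    # Kernel messages only
    "${JOURNALCTL:-journalctl}" -k
}

journal_for_path() {
    # Path shortcut: an executable is matched via _EXE=,
    # a device node via _KERNEL_DEVICE=
    "${JOURNALCTL:-journalctl}" "$1"
}
```

For instance, journal_for_pid 1436 would show the entries of the process with PID 1436, and journal_window 2014-06-18 10:00:00 13:00:00 covers the time range used in the text.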
The first problems you might encounter are related to the boot process, which can have a number of causes. The countermeasures required depend on the point in time during the boot process at which the error occurs. Most modern Linux distributions hide the boot messages behind a graphical splash screen.
The first step in this case is to remove the boot parameter quiet, as well as additional parameters like rhgb (Fedora) or splash (Ubuntu). To do so, choose an entry from the GRUB menu that you see after powering on and then press e. A window opens in which you can delete the above-mentioned parameter entries. Then, press Esc while booting and you should see all the messages during startup.
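On a system that still boots, you can check in advance which of these parameters are in effect. The sketch below assumes a hypothetical helper named list_quiet_params that takes a kernel command line as a single string and prints the parameters that hide boot messages.

```shell
#!/bin/sh
# Hypothetical helper: print each message-hiding parameter found in the
# kernel command line passed as $1.
list_quiet_params() {
    # $1 is left unquoted on purpose to split the line into parameters
    for param in $1; do
        case "$param" in
            quiet|rhgb|splash) printf '%s\n' "$param" ;;
        esac
    done
}

# Possible usage on a running system:
#   list_quiet_params "$(cat /proc/cmdline)"
```

Whatever this prints is what you would delete in the GRUB editor.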
If you see no messages except "operating system not found" or just a black screen, then the boot manager is not found or is damaged. If this happens, you first need to check whether the boot device is recognized by the BIOS. The second thing to check is the order of the boot devices. If you've placed an optical drive before the first HDD or SSD, it should contain no media. Remove any attached USB sticks. If this does not work, the partition table or the filesystem might be damaged. You should try to boot from an emergency disk, like SystemRescueCd, and repair the filesystem.
If the boot manager has found the kernel, but booting stops with a blinking cursor or a sudden reboot, then the kernel itself or the hardware is to blame. There can be numerous causes in this case, and making a diagnosis might be difficult. You could try a kernel parameter like noapic, although modern CPUs need both ACPI (Advanced Configuration and Power Interface) and APIC (Advanced Programmable Interrupt Controller) to perform well. Updating the BIOS, if possible, is a good idea, and you could additionally remove all non-mandatory hardware components on a trial basis for further testing.
If booting seems to work but you end up with a black screen, then the problem is likely with the graphics device driver. Try booting with the parameter nomodeset. This action will cause the kernel to use a simple VGA text mode. If this works, the problem may be the monitor detection. You could then test the kernel parameter video=1024x768-24@75, which configures a resolution of 1024x768 pixels, 24-bit color depth, and a 75Hz refresh rate. If necessary, you can play with the values until they match your monitor. Often, a good solution is to use another electrical connection to the monitor, for example, VGA, DVI, or HDMI instead of DisplayPort.
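The structure of the video parameter is easier to see when it is assembled from its parts. The helper name video_param below is hypothetical. It simply joins width, height, color depth, and refresh rate in the order used in the example above.

```shell
#!/bin/sh
# Hypothetical helper: build a video= kernel parameter.
# $1 = width, $2 = height, $3 = color depth in bits, $4 = refresh rate in Hz
video_param() {
    printf 'video=%sx%s-%s@%s\n' "$1" "$2" "$3" "$4"
}

# Example: video_param 1024 768 24 75
```

Calling it with other values gives you a quick way to generate candidates to try at the boot prompt.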
Last but not least, you might encounter the famous Unable to mount root fs message along with a kernel panic. This message indicates a problem with the initramfs, which is responsible for mounting the root device. In this case, you can run dmesg | less (dmesg is a little program that prints all messages from the kernel ring buffer) and scroll to the storage driver messages.
Is there an error message? If so, it probably contains a useful hint. Otherwise, you can try blkid as root user. This lists all block devices, their Label, and their UUID, which most contemporary Linux distributions use to identify the root filesystem. The root device value from the GRUB2 configuration I mentioned above should appear in the blkid output (Figure 1).
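The check of the root device against the blkid output can be sketched in shell as well. Both function names are hypothetical, and the sketch assumes that the root filesystem is given in the form root=UUID=... and that blkid prints its usual one-line-per-device format. A root device given as a plain device path would have to be compared by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the value of root= from a kernel command
# line passed as $1.
root_param() {
    for param in $1; do
        case "$param" in
            root=*) printf '%s\n' "${param#root=}" ;;
        esac
    done
}

# Hypothetical helper: read blkid-style output on stdin and succeed if a
# filesystem with the UUID given as $1 is listed. The leading space
# keeps PARTUUID entries from matching.
uuid_listed() {
    grep -qF " UUID=\"$1\""
}

# Possible usage as root on a running or rescue system:
#   value=$(root_param "$(cat /proc/cmdline)")
#   blkid | uuid_listed "${value#UUID=}" || echo "root device not found"
```

If the UUID from the boot configuration is not listed, the boot entry points to a filesystem that no longer exists under that ID.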