Source code browsers
Navigators
If you've ever struggled to get a sense of someone else's code, the right tool could save you hours of grepping.
Open source is all about code. Contributors read tons of code, of which they have written only a small fraction. Being able to comprehend a program is crucial to the contribution process, and free software is all about contribution. In other words, you need tools to read and understand code.
These tools are called "source code browsers" or "source navigators." Linux has many of them, and they generally fall into two broad categories. Older ones implement their own (simplified) parsers to recognize language symbols, such as function definitions, and record their location in the source code. This approach is fast and works reasonably well, yet most programming languages have complex grammars that simplified parsers can't fully cover. Newer browsers rely on a compiler tool set to build an abstract syntax tree (AST) [1]. This makes the index more precise, but also slower and more cumbersome to generate. Which approach to choose depends on the situation, and I hope this text provides you with some guidance.
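The tradeoff between the two categories can be seen in miniature with a toy Python sketch. It is only an illustration: the sample source and both function names are made up, and Python's standard ast module stands in for a compiler front end. The pattern-based indexer is fooled by text that merely looks like a definition, whereas the AST-based one is not.

```python
import ast
import re

# Made-up sample: one real function, plus text inside a string
# that only looks like a function definition.
SOURCE = '''
def real(x):
    return x + 1

HELP = """
def fake(y):
    pass
"""
'''

DEF_PATTERN = re.compile(r"^\s*def\s+(\w+)\s*\(")


def regex_index(source):
    """Simplified parser: record (name, line) for every line that looks like a def."""
    found = []
    for number, line in enumerate(source.splitlines(), 1):
        match = DEF_PATTERN.match(line)
        if match:
            found.append((match.group(1), number))
    return found


def ast_index(source):
    """AST-based indexer: record (name, line) for real function definitions only."""
    tree = ast.parse(source)
    return [(node.name, node.lineno)
            for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)]


print(regex_index(SOURCE))  # also reports the definition-like text in the string
print(ast_index(SOURCE))    # reports only the real function
```

The price of precision is that ast.parse needs a complete, syntactically valid file, which mirrors the extra setup that compiler-based browsers require.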
Ubiquitous Ctags
Ctags is the de facto standard for source code indexers in Linux. As the name suggests, it builds on the "tag" concept. Put simply, a tag is a syntax construct that has an index entry, such as a class, function, or macro definition. This index comes in a so-called "tags" file, and the main purpose of the non-interactive ctags command is to generate tags files from source-code trees. Tags files have a well-defined format, and virtually all code editors in Linux understand it.
Despite the "C" prefix, Ctags supports a wide range of programming languages. The output of ctags --list-languages varies between Ctags flavors, but C/C++/C#, Java, Perl, Python, and a few dozen others are usually there. Ctags guesses the programming language from the file's suffix. If it guesses wrong, you can override its choice with --language-force. For each programming language, Ctags recognizes different kinds of tags, and ctags --list-kinds displays the known tag kinds for each supported language.
To build an index, you supply ctags with a list of files to consider. The -R switch makes it recurse into subdirectories, and -f sets the output filename. This should be enough in most scenarios. However, because Ctags is "neither a preprocessor nor a compiler" [2], it may get certain things wrong. If that happens to you, use the -I switch to ignore or substitute identifiers that require special handling.
A tags file is text, yet it was designed to be machine-readable. Ctags can also build a human-friendly tabular cross-reference if you call it as
ctags -x <other arguments>
For each tag, Ctags reports its name, kind, and line number in the source code. The regular tags file is in fact quite similar, except that it stores not a line number but the ex editor command used to find the tag. Most often, this is a regular expression search, but Ctags provides --excmd and a few other command-line switches to adjust this behavior.
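To make the file layout concrete, here is a minimal Python sketch that reads entries in the extended tags format written by Exuberant and Universal Ctags: tag name, file, and ex command separated by tabs, then ;" and optional extension fields. The Tag type, the parse_tag_line helper, and the sample entries are made up for illustration, and the parser assumes that the ;" marker followed by a tab does not occur inside the search pattern itself.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    name: str
    path: str
    address: str  # ex command: a search pattern or a line number
    kind: str     # extension field; empty if absent


def parse_tag_line(line):
    """Parse one tags line; return None for blank lines and !_TAG_ headers."""
    line = line.rstrip("\n")
    if not line or line.startswith("!_TAG_"):
        return None
    name, path, rest = line.split("\t", 2)
    # In the extended format, ;" ends the ex command and
    # tab-separated extension fields follow it.
    address, marker, fields = rest.partition(';"\t')
    kind = ""
    if marker:
        for field in fields.split("\t"):
            if field.startswith("kind:"):
                kind = field[len("kind:"):]
            elif field and ":" not in field:
                kind = field  # short form: a bare kind letter
    else:
        address = rest
    return Tag(name, path, address, kind)


# Made-up sample entries, not taken from a real project.
SAMPLE = (
    "!_TAG_FILE_FORMAT\t2\t/extended format/\n"
    "main\tsrc/main.c\t/^int main(int argc, char **argv)$/;\"\tf\n"
    "MAX_LEN\tsrc/util.h\t12;\"\td\n"
)

tags = [tag for tag in map(parse_tag_line, SAMPLE.splitlines()) if tag]
for tag in tags:
    print(tag.name, tag.path, tag.address, tag.kind)
```

The first sample entry locates its tag with a search pattern, the second with a plain line number, which are the two address styles the text describes.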
The description above applies to the original Ctags. In so-called "Etags mode," the file format is different, and --excmd and friends are just ignored. "E" was for Emacs originally, but now many other programs (e.g., Midnight Commander's internal editor) recognize tags in this format (Figure 1). You can often tell the format by the filename: tags is for Ctags, while TAGS is for Etags.
The most popular Ctags implementation to date is Exuberant Ctags [3], and it is likely what your distro ships. It provides both ctags and etags commands; you can also enable Etags mode with the -e switch to ctags. Universal Ctags [4] is also gaining momentum. As the Exuberant Ctags homepage suggests, that project isn't actively maintained now, whereas Universal Ctags attempts to continue the development and sports completely rewritten C/C++, Python, and HTML parsers, as well as many new ones (e.g., for Rust). The downside is that you'll probably need to compile the program yourself. Luckily, the homepage describes this process in detail.
Whereas the original Ctags indexes only where a tag was defined, Universal Ctags can also track where it was referenced: see the Reference tags section on the Docs page. This puts Universal Ctags on a par with the second nominee, Cscope.
Venerable Cscope
Can you imagine software written in the PDP-11 era that still remains in use today? Can you imagine that the software was made free (as in speech) thanks to an infamous SCO Group predecessor? Meet Cscope [5]: a C code browser with some (limited) support for C++ and Java, born in Bell Labs, and open-sourced by Santa Cruz Operation in 2000. Cscope was briefly introduced in a Linux Voice cover feature last year [6], and now it is time to pay this tool the respect it truly deserves.
Cscope should be available in your package manager. Before you start using it, you'll need a cross-reference database for your source code, which you can build separately with cscope -b. This is not a must, because when you launch Cscope's curses-based interface, it automatically indexes all C, Bison/Yacc, and Lex files in the current directory. Add the -R switch to recurse into subdirectories, which is usually what you want to do. Some projects even provide dedicated makefile targets to generate a Cscope cross-reference database. For example, make cscope in the Linux kernel source tree builds the database together with a so-called inverted index that makes symbol lookup a bit faster. Should you want to achieve the same effect on your own, run cscope -bq. Also, consider -k to enable the "kernel mode." In this mode, Cscope doesn't look into standard locations like /usr/include, because kernels (and other low-level code) don't use them. On startup, Cscope detects changes to source code and rebuilds the cross-reference as necessary. This makes subsequent launches faster. Note that you still need to tell Cscope where to look for source code, even if the database already exists. To trigger a rebuild from within Cscope, type Ctrl+R.
Cscope records not only where symbols were defined, but also how they were used, so you can find all expressions that involve a given variable, or functions calling a given function, or functions that the given function calls. Many other tools restrict your searches to C language identifiers. In Cscope, you can grep for arbitrary text strings and regular expressions (see the info box titled "Ack"). For dessert, Cscope can look up a file by name or find all files that #include a specific header.
Ack
Source-code files are text. Language tokens are pieces of text. The first tool that springs to mind when you think about searching for text is Grep.
You can surely use grep to navigate source code, but you have a better alternative: ack [7]. Ack is pure Perl (Are you scared yet?), and you can install it from your distro's package manager or via CPAN.
What makes Ack a better Grep? Two things: It's fast, and it's designed to search code, which means fewer keystrokes for common tasks. Ack ignores non-code directories (e.g., .git or .svn), backup files, and the like, and it doesn't need -R to recurse into subdirectories. In a multilanguage project, you can tell it to look only in Python sources with ack --python. Ack sports Perl regular expressions (guess why) and happily highlights matches it finds.
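The filtering behavior described above can be approximated in a few lines of Python. This is a rough sketch of the idea, not how Ack is implemented: search_tree, SKIP_DIRS, and the suffixes parameter are made-up names, only two directory names are skipped, and Python's re module is used instead of Perl regular expressions.

```python
import os
import re

# Version-control metadata directories to skip (a tiny subset of what Ack ignores).
SKIP_DIRS = {".git", ".svn"}


def search_tree(root, pattern, suffixes=(".py",)):
    """Yield (path, line number, line) for every regex match under root."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [name for name in dirnames if name not in SKIP_DIRS]
        for filename in sorted(filenames):
            # Skip editor backup files and files of other types.
            if filename.endswith("~") or not filename.endswith(suffixes):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, 1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    # Search Python files below the current directory, recursing by default.
    for path, number, line in search_tree(".", r"def\s+\w+"):
        print(f"{path}:{number}:{line}")
```

Restricting the search by suffix plays the role of a type filter such as --python, and recursion needs no extra switch.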
Ack can't do the semantic analysis that Clang can – nor can it brew your coffee. However, none of the tools I cover here can do a free-text search (except Cscope), so Ack certainly deserves a place in your toolbox. Don't forget to share the ~/.ackrc snippets you found most useful in Linux Voice forums [8]!
Cscope's curses-based interface splits the screen into halves. You enter search terms in the lower half and get results in the upper (Figure 2). Cscope supports POSIX extended regular expression syntax. Filenames allow partial matches, while C symbols don't. Putting foo in the Find this file field matches foo.c, foo.h, and foobar.c. Putting foo in the Find this C symbol field matches the first, but not the second, statement below:
void foo();
int foobar = 1;
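The difference between the two matching rules can be mimicked with a short Python sketch. It is a toy model under stated assumptions, not Cscope's code: matches_file and matches_symbol are made-up helpers, and Python's re syntax stands in for POSIX extended regular expressions.

```python
import re

# A C identifier: a letter or underscore followed by letters, digits, underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def matches_file(query, filename):
    """Filename lookup: the query may match any part of the name."""
    return re.search(query, filename) is not None


def matches_symbol(query, source_line):
    """Symbol lookup: the query must match an entire identifier."""
    pattern = re.compile(query)
    return any(pattern.fullmatch(token)
               for token in IDENTIFIER.findall(source_line))


for name in ("foo.c", "foo.h", "foobar.c"):
    print(name, matches_file("foo", name))
for line in ("void foo();", "int foobar = 1;"):
    print(line, matches_symbol("foo", line))
```

In this model, a query such as foo.* would also match foobar as a symbol, because the whole identifier then fits the pattern.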
The Tab key lets you switch halves, and you select fields with the arrow keys. Some symbols, such as ^, are reserved (see cscope(1) [9]). To enter them, first type \ as an escape character. For each search result, Cscope displays the location (file, function, and line number) and some context. It also assigns the result a single-letter hotkey you can type to open it in the editor ($CSCOPE_EDITOR). The spacebar switches between pages of search results. You can save the results in a file with > or >>. Should you need them later, load this file with < or cscope -F. To refine the results, type ^ or |. Both filter through an external shell command. Entering ^ replaces the original results, whereas | simply displays the filtered lines and keeps the results untouched.
A few other hotkeys are available. Ctrl+C toggles case sensitivity. Ctrl+Y/Ctrl+A repeat your last search, and Ctrl+B/Ctrl+F do the same, yet in a search field above or below the current one. This comes in handy if you typed your query in the wrong box. For those accustomed to GNU Readline, history support in Cscope may feel limited, and it probably is. Pressing ? brings up the help page, and Ctrl+D exits Cscope.
The man page [9] describes more hotkeys and command-line switches. I suggest you spend some time learning them, because it greatly improves your Cscope experience. Cscope also runs in line mode or as a Vim extension. I leave exploring those options as an exercise to a curious reader (i.e., you).
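As a starting point for line mode, the man page describes each result as a line holding the file name, function name, line number, and line text, separated by spaces. Such output (for example, from a command like cscope -d -L -0 counter) could be post-processed as sketched below. The Hit type, the parse_hit helper, and the sample lines are made up for illustration, and the parser assumes that file names contain no spaces.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    path: str
    function: str
    line: int
    text: str


def parse_hit(output_line):
    """Split one line-mode result into file, function, line number, and text."""
    fields = output_line.rstrip("\n").split(" ", 3)
    if len(fields) < 3:
        raise ValueError(f"unexpected result line: {output_line!r}")
    path, function, number = fields[:3]
    text = fields[3] if len(fields) == 4 else ""
    return Hit(path, function, int(number), text)


# Made-up sample results, not produced by a real Cscope run.
SAMPLE_OUTPUT = """\
main.c main 161 main(argc, argv)
util.c <global> 12 int counter = 0;
"""

hits = [parse_hit(line) for line in SAMPLE_OUTPUT.splitlines()]
for hit in hits:
    print(f"{hit.path}:{hit.line} [{hit.function}] {hit.text}")
```

Limiting the split to three separators keeps the source text intact even though it contains spaces of its own.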
Woboq Code Browser
Once you understand the traditional tools, you can compare them to Clang-based alternatives. Naturally, this limits support to C/C++, but Clang is a real C/C++ compiler, so it should have no problem handling even the most convoluted syntax constructs, provided they are correct.
On the other hand, if you call Clang to index your code, you must supply it with all the information that the build system (CMake, Autoconf) normally provides. This is not the case with Ctags or Cscope, which can simply scan files one by one, looking for specific patterns, such as function declarations. For Clang, build information usually comes via a JSON compilation database (compile_commands.json). CMake introduced this format first, and in a nutshell, it contains the list of source files and exact commands used to build them.
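To show what such a database looks like, the Python sketch below parses a small made-up example. The directory, command, and file keys come from the compilation database format (which also allows an arguments list instead of command); the paths, the include_dirs and summarize helpers, and the choice to extract -I flags are assumptions for illustration only.

```python
import json
import shlex

# Made-up database with two entries; real ones are generated by the build system.
SAMPLE_DATABASE = """
[
  {
    "directory": "/home/user/project/build",
    "command": "cc -I../include -DNDEBUG -c ../src/main.c -o main.o",
    "file": "../src/main.c"
  },
  {
    "directory": "/home/user/project/build",
    "command": "cc -I ../include -I../third_party -c ../src/util.c -o util.o",
    "file": "../src/util.c"
  }
]
"""


def include_dirs(entry):
    """Collect the -I include directories from one entry's compiler command."""
    if "command" in entry:
        args = shlex.split(entry["command"])
    else:
        args = entry["arguments"]
    dirs = []
    for index, arg in enumerate(args):
        if arg == "-I" and index + 1 < len(args):
            dirs.append(args[index + 1])   # form: -I dir
        elif arg.startswith("-I") and len(arg) > 2:
            dirs.append(arg[2:])           # form: -Idir
    return dirs


def summarize(database_text):
    """Map each source file to the include directories it is compiled with."""
    return {entry["file"]: include_dirs(entry)
            for entry in json.loads(database_text)}


for source, dirs in summarize(SAMPLE_DATABASE).items():
    print(source, dirs)
```

Include paths and macro definitions like these are exactly the details a pattern-based indexer never sees but a compiler-based one needs.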
Woboq Code Browser [10] builds on Clang and produces a set of annotated HTML pages showing a project's source code. A bit of JavaScript makes them interactive, and no code is required on the back end; yet, you'll probably want to serve these pages with a web server, because most browsers don't allow Ajax requests to file:// URLs by default. (That's a security measure.)
You'd want to compile Code Browser yourself, because it probably hasn't made its way to your distro repositories. It uses CMake, which you'll need to tell where to find the llvm-config tool on your system. On Ubuntu, it's at /usr/bin/llvm-config; otherwise, the process is straightforward.
How you build compile_commands.json depends on the build system of the project you are trying to index. If it's CMake, the day just got better, because you only need to use
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
For other build systems, say Autoconf or Qmake, Woboq Code Browser provides a fake_compiler.sh helper script. However, there is a better tool: Build EAR (Bear), which you'll probably find in your package manager; it's also available online [11]. Bear sets LD_PRELOAD to inject a dynamic library that traces calls to the compiler and collects command-line arguments. To use Bear, you just need to prefix a make invocation with bear:
bear make
The bear --help command lists a few available options. With the JSON compilation database ready, Code Browser can index your project (Listing 1).
Listing 1
Code Browser Indexing
This implies you did an in-tree build of Code Browser. The $BUILDDIRECTORY argument is where $PROJECTNAME (your project) is built, and $VERSION is the project's version. The $OUTPUTDIRECTORY argument should be set to wherever your web server looks for static HTML (e.g., ~/public_html/$PROJECTNAME). The -a switch tells codebrowser_generator to process all source code found in compile_commands.json. The second command builds index.html for each subdirectory in the project, and the last line copies scripts and stylesheets.
The end result is worth the fuss. You may choose a theme of your liking (Qt Creator/KDevelop/Solarized) to feel at home. Mouse over a symbol to see a pop-up box containing the description and references. For global symbols, the reference kind (e.g., value read or address taken) is also shown. Click on a symbol to jump to the declaration. Location history is also supported, yet it is bare bones, with no way to clear the history and no indication of which files the history items belong to. Similarly, the sidebar on the right contains definitions collected from the current source code file (Figure 3); however, you can't tell whether the definition is a function, variable, or type.
Another downside is licensing. Code Browser is dual-licensed (CC BY-NC-SA 3.0 [12] and proprietary), so I feel it's okay to use the open source version to index open source code, as long as you keep the Woboq branding. For anything else, you'll probably need a commercial license. You should contact Woboq directly if you are serious about Code Browser deployment.