Surprises in SQL – State-of-the-art options in the standard query language

And the Performance?

The main benefit of window functions is the variety of options you have with simple syntax. Window functions are also typically much faster than their purely relational equivalents. Despite the huge range of options that window functions offer, you can easily estimate the additional overhead they cause in a database. You just need to remember that most window functions are performed in two simple steps: preparation and application.

In the preparation phase, you need to make sure the data is sorted as per the PARTITION BY and ORDER BY clauses. In other words, the columns of the PARTITION BY clause – if they exist – are placed in front of the ORDER BY clause, and the data is sorted accordingly. You can omit this sorting if the data is already sorted in this order. The data could be in order if it was sorted previously – for example, for a different window function – or, as in the previous example, by a global ORDER BY clause. Sorting is also unnecessary if the data comes from an index with appropriate sorting.

The additional overhead of the preparatory phase is thus mainly equivalent to the overhead for an ORDER BY clause for each different OVER clause. Identical OVER clauses do not cause this overhead multiple times. The previous example requires no preparatory overhead, because the global ORDER BY clause has already sorted the data as required in advance.

After this preparation, the individual window functions are then executed. This overhead can be roughly compared with normal use of aggregate functions with GROUP BY, with the exception that the function is executed for each row in the worst case. Of course, each SQL vendor does its best to avoid this worst case.

Time Travel with SQL:2011

The final trailblazing SQL feature I'll examine in this article is temporal and bi-temporal tables. Although the term temporal itself does not give you many clues, this feature encompasses functions that everyone has already used. Generally speaking, you can use temporal and bi-temporal tables to make data changes traceable. Consider a customer database: how does the user handle the fact that a customer has changed their name? Is it sufficient to overwrite the name with a simple UPDATE? Or do you need to be able to see that the name has changed? Where traceability was required, users previously had to take care of this themselves. That is, mark the old row in the customer database as deprecated and create a new row with the new name.

SQL:2011 now offers time-versioned tables. An UPDATE of this kind of table would look something like the following:

UPDATE customers
 FOR PORTION OF validity
 FROM DATE'2015-01-01'
 TO DATE'9999-12-31'
 SET name = 'New name'
 WHERE id = 123

The user defines in the UPDATE – immediately after the table name – the period of time for which the new name is valid. To allow this to happen, the table has to be prepared appropriately by creating two columns for the start and end times and grouping the data to create an application time period (validity in this example). SQL does all the rest. Whatever you need to achieve, the desired state is done  – for example, an UPDATE of the last valid row and an INSERT for the new name.

SQL:2011 naturally offers everything else you need to handle these tables, including constraints that understand the time concept (WITHOUT OVERLAPS), but also queries with a specific time:

SELECT *
 FROM customers
 WHERE id = 123
 AND validity CONTAINS DATE '2015-12-01'

This query returns the new name, because SQL periods always include the start time and exclude the end time.

Application time versioning is designed to reflect reality. For instance, the valid start for a new name is the day of the wedding – not the time when the new name is entered on the system. The application can choose validity arbitrarily – which explains the "application time" label.

Validity periods are different for system-versioned tables. Once a table is system versioned by means of appropriate ALTER TABLE instructions, the database handles versioning in a completely transparent way. All SELECT, UPDATE, … instructions still continue to work unchanged; you don't need to modify the application. However, the application cannot choose the validity periods itself. Instead, the database always uses the current system time.

All INSERT, UPDATE, and DELETE instructions always take immediate effect. System versioning thus does not reflect reality but always references the time at which the change was made in the database.

Both types of versioning – application time or system versioning – are optional and can be used independently of one another. It is also possible to apply both kinds of versioning to a single table at the same time; however, this approach creates bi-temporal tables. Application time versioning would use the wedding date as the date for the change, but system versioning uses the time at which the change was entered on the system, which could be weeks later. (Probably very few newlyweds inform their banks of a name change just one day after the ceremony.)

Which Databases?

If you try out one or more of the functions described in this article, you might discover they don't work! Not all SQL databases are state-of-the-art. To be more precise: the SQL standard only mandates a couple of functions. Most of the functions in the standard – including all the functions described in this article  – are optional. Additionally, many manufacturers still regard SQL:92 as the definitive statement on SQL. Just imagine this headline for comparison: "Leading NoSQL database now also supports Windows 3.1!"

These modern SQL functions are useful, even though just a few databases (including some open source databases) support them. If you work with SQL on a regular basis, it is worth your while to learn these state-of-the-art SQL features so you can select a database that supports the functionality when you need it. Figure 3 shows some SQL database systems that support the functionality described in this article.

Figure 3: Availability of selected SQL functions.

The Author

As a freelance trainer, Markus Winand specializes in teaching developers efficient techniques for SQL. He wrote SQL Performance Explained and is currently working on his next book, which is being published on an on-going basis at http://modern-sql.com.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • MySQL Workbench 5.1

    A small database is easy to plan on paper, but the structure quickly becomes more complex as you add more elements. MySQL Workbench can help you keep the tables arranged.

  • Data Management

    Open source database management systems offer greater flexibility and lower costs while avoiding vendor lock-in. Finding the right one depends on your project's needs.

  • Patterns in the Archive

    To help him check his Google Drive files with three different pattern matchers, Mike builds a command-line tool in Go to maintain a meta cache.

  • Workspace: phpMyFAQ

    Transform information into knowledge by deploying a phpMyFAQ-based database of frequently asked questions.

  • Top Ten Tops

    The famous Unix admin utility known as Top has many imitators. We take a look at some of the top Top tools.

comments powered by Disqus
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters

Support Our Work

Linux Magazine content is made possible with support from readers like you. Please consider contributing when you’ve found an article to be beneficial.

Learn More

News