dd(1): deceptively simple
Paw Prints: Writings of the maddog
Unix (and Linux) command line programs are like old friends. You get caught up in the day-to-day hustle of life and you may forget about them temporarily, but sooner or later you remember them and that warm feeling comes over you....
dd(1) is one of those programs that gives me a warm feeling. To most people dd(1) seems simple: it reads data in at one end and writes it out at the other, perhaps blocking or unblocking the data and perhaps doing a little conversion along the way. And of course many of us use dd(1) to clone disk drives and for other “utility” tasks, because dd(1) is simple, fast and works from the command line.
Yet on two separate occasions dd(1) “saved my bacon”, so there is a soft spot in my heart for the command.
The first time was around 1984, shortly after I had started working for Digital Equipment Corporation. A salesman had a nine-track tape with some data on it, and the customer had told him that if Digital could get the FORTRAN programs and the data off the tape, compiled on our Ultrix system and running, the customer would buy a lot of systems. Having nowhere else to go, the salesman found me and asked if I would help. He told me that “the format of the programs and the data on the tape was well documented.” I had heard stories about “well documented tapes” before, and I was skeptical, but I agreed to help.
Amazingly enough, the programs and the data on the tape WERE well documented. The tape had been made on an IBM system, so all the sources on the tape were in EBCDIC (pronounced “eb seh dik”, a character encoding used by IBM) instead of ASCII (you know how to pronounce that), and the instructions explained how the records (80-character card images) and the blocks (fixed size) were laid out on the tape. First came the programs, then the data. The data used a different block size, but still had 80-character records.
After I mounted the tape, one shell “for” loop running dd(1), set up to unblock the records and translate the EBCDIC to ASCII, pulled all the FORTRAN programs off the tape into separate source files for compilation. Another “for” loop and another dd(1) command put all the data from the tape onto the disk in separate files.
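To make the conversion step concrete, here is a minimal Python sketch of what happened to each tape block in that loop, roughly what asking dd(1) for `conv=ascii,unblock` with `cbs=80` does. It assumes the tape used IBM code page 037 (Python's `cp037` codec); dd(1)'s own EBCDIC table is not identical to any single IBM code page, and the function name `unblock_ebcdic` is hypothetical.

```python
# Hypothetical sketch: RECORD_LEN matches the 80-character card images
# described above; the cp037 code page is an ASSUMPTION, since dd's own
# EBCDIC-to-ASCII table does not exactly match any single IBM code page.
RECORD_LEN = 80


def unblock_ebcdic(block: bytes, cbs: int = RECORD_LEN, codepage: str = "cp037") -> str:
    """Translate one fixed-size EBCDIC block to text and unblock it.

    Like dd's conv=unblock: split the data into cbs-sized records,
    delete trailing spaces from each record, then append a newline.
    """
    text = block.decode(codepage)  # EBCDIC bytes -> characters
    lines = []
    for start in range(0, len(text), cbs):
        record = text[start:start + cbs]  # last record may be short
        lines.append(record.rstrip(" ") + "\n")
    return "".join(lines)
```

Each 80-byte card image loses its trailing blanks and gains a newline, which is what turns the card images into ordinary source lines a compiler can read.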
I looked at the FORTRAN source code and noticed that the programs' logical unit numbers for I/O more or less matched what Ultrix expected for standard input, standard output and standard error. I compiled and linked the programs and ran them, redirecting the data files to standard input and capturing the output by redirecting standard output to the disk.
At that point I had done everything the customer and the salesman had requested. Total time: about an hour.
Then I noticed that the customer also had a plotting program, written in FORTRAN, to help them visualize the data. I had no plotter available, but the VT125 character-cell terminal on my desk had a crude graphics mode called ReGIS. With ReGIS you could draw on the character-cell terminal by sending it byte codes. I decided to create a small set of subroutines that would match the customer's subroutine calls and output ReGIS byte codes to the terminal.
The subroutine calls were simple and few, so I finished the subroutines in a couple of hours, and linked them into the plotting program. Now I could see the customer's data on a simple, relatively fast and relatively inexpensive VT125 instead of having to put a piece of paper in a plotter and wait for a pen to draw the diagram.
I put all of this back onto the magnetic tape, this time in a tar file, and wrote equally careful instructions explaining how to pull it off and what I had done.
The salesman bought me dinner.
The next time that dd(1) played a role was with the TK50 streaming tape drive that I blogged about yesterday.
Everyone tried to keep the TK50 streaming, but the VAX processors, disks and memories of the day could not keep up with its data rate. The buffer feeding the TK50 would empty, and sooner or later the drive would stop, back up and reposition itself. This made backups take a very, very long time. It made no difference whether you were using tar(1), dump(8) or any other program to write the data; we just could not keep that tiny buffer full.
The engineers were discussing this one day, and I happened to mention that we had once had a similar problem on a large IBM mainframe. We had fixed it there by creating a ring buffer and using asynchronous I/O to fill the buffers as fast as possible and write them out as fast as possible. Of course asynchronous I/O was less useful on a data stream, but the engineers at Digital thought about it and made a version of dd(1) with an option to allocate “n” buffers for data I/O. This was enough to let dd(1) buffer data and stay ahead of the streaming tape drives.
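The idea of keeping several buffers in flight can be sketched in Python, with a bounded queue of `nbufs` chunks standing in for the ring of “n” buffers: a reader thread keeps filling buffers from the input while the main thread writes them out. This is an illustration under stated assumptions, not the Ultrix dd(1) code; `buffered_copy`, `nbufs` and `bufsize` are hypothetical names, and a thread stands in for the asynchronous I/O described above.

```python
import queue
import threading
from typing import BinaryIO


def buffered_copy(src: BinaryIO, dst: BinaryIO, nbufs: int = 8, bufsize: int = 10240) -> int:
    """Copy src to dst through up to `nbufs` queued buffers (hypothetical sketch).

    A reader thread keeps filling buffers while this thread drains them,
    so a short stall on the input side does not starve the output device.
    Returns the number of bytes written.
    """
    pending = queue.Queue(maxsize=nbufs)  # bounded: stands in for the ring
    errors = []

    def reader() -> None:
        try:
            while True:
                chunk = src.read(bufsize)
                if not chunk:
                    break
                pending.put(chunk)  # blocks only while nbufs chunks are queued
        except BaseException as exc:  # hand read errors to the writer side
            errors.append(exc)
        finally:
            pending.put(None)  # end-of-data marker

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()

    written = 0
    while True:
        chunk = pending.get()
        if chunk is None:
            break
        dst.write(chunk)
        written += len(chunk)

    worker.join()
    if errors:
        raise errors[0]
    return written
```

Used as the last stage of a pipeline, the source would be `sys.stdin.buffer` and the destination the tape device opened in binary mode (the device name varies by system). A real tape filter would also re-block the data into fixed-size writes, which this sketch does not do.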
We encouraged the engineers to also add this functionality to tar(1), dump(8) and other utilities, but they (quite correctly) pointed out that these programs were maintained by other people, and to spread these changes to all these programs would cause huge amounts of work integrating the code into future versions of those commands.
Then the engineers came up with a unique idea: people could put the new dd(1) as a filter in a pipeline after tar(1) or dump(8). If dd(1) was the last stage in the pipeline before the streaming device, it could do its buffering and keep the device streaming. This worked very well.
A couple of years later I was at a DECUS user group meeting when a developer of a backup utility approached me. He told me he had tried everything, but he could not get his TK50 to stream. I told him I could get it to stream any time I wanted, and I told him the story of the buffers in dd(1). His eyes lit up, and a couple of days later I got an email from him that said only “it's streaming, it's streaming!”
It is a shame that Ultrix and Digital Unix were never “open source”. The current versions of dd(1) seem to be missing the buffering technology. Perhaps in the days of gigabyte memories and large I/O buffers this buffering technology is not needed as much, but it was interesting nevertheless.
Carpe Diem!