System Software Notes : Unix Utilities

Useful online documentation is available, in the form of man pages, for all the UNIX utilities considered here. Reading these man pages is highly recommended for details about the usage of each utility. The books listed under some of the sections may also be consulted.

Make :: RCS :: sed :: grep :: awk

Make

The make utility provides a convenient way of preparing programs from source files. It is most commonly used when a program is built from several source files, such as C files (.c and .h files), a lex file (.l file), and so on. In such a situation the final program is generated by following a sequence of steps: run flex over the .l file, compile the C files using the -c option to generate .o files, and link the .o files to get the target executable program. Often, after the target has been generated, it becomes necessary to modify one or more of the source files. In that case only the steps affected by the modifications need to be executed again, i.e., only the modified file(s) are compiled to produce new .o files, the other .o files are left unchanged, and then all the .o files are linked to get the target. The make utility helps in this situation by performing only the required steps, and in the correct sequence.

For make to be able to perform only the essential steps, the steps have to be specified in a file called the makefile. By default the make utility assumes the name of this file to be makefile or Makefile. Any other name can be used for the makefile by giving the command line option -f <makefilename>. The contents of the makefile essentially describe the dependencies of the various generated files upon other files. For example, a .o file depends on the corresponding .c file, the target program depends on the .o files, and so on. When the utility is invoked it reads the dependencies and compares the time-stamps of the files. If it finds that a file does not exist, or is older than any of the files it depends on, then that file is generated again. For this purpose, each dependency description in the makefile is accompanied by the series of commands that are to be executed to generate the file from the files it depends on. To make the specifications easy to write, make allows several conventions to be used inside the makefile, such as macros, implicit rules, etc.
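The time-stamp comparison described above can be sketched in the shell itself. This is only an illustration of the idea, not how make is actually implemented. The function name needs_rebuild and the file names main.c and main.o are made up for the example, and the -nt test operator is assumed to be available (it is in the common shells such as bash, dash and ksh).

```shell
#!/bin/sh
# needs_rebuild TARGET PREREQ... : succeeds (status 0) when TARGET has to be
# regenerated, i.e. when it is missing or older than one of its prerequisites.
needs_rebuild() {
    target=$1
    shift
    # A missing target always has to be built.
    [ -e "$target" ] || return 0
    for prereq in "$@"; do
        # -nt is true when the left-hand file was modified more recently.
        if [ "$prereq" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Try it out in a scratch directory with explicitly chosen time-stamps.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

touch -t 202001010000 main.c    # source file
touch -t 202001020000 main.o    # object file produced after the source
if needs_rebuild main.o main.c; then first=stale; else first=fresh; fi

touch -t 202001030000 main.c    # source edited after the last compilation
if needs_rebuild main.o main.c; then second=stale; else second=fresh; fi

echo "before edit: $first, after edit: $second"
```

The script reports the object file as fresh before the source is edited and as stale afterwards, which is exactly the situation in which make would run the compilation step again.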

To use this utility for a program, one has to create a makefile. Then, in the directory containing the makefile, give the command -

make

Example
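A minimal makefile for the kind of project described above might look like the one written out by the following shell script. It is a sketch under assumptions: the file names main.c, defs.h and scanner.l are hypothetical, the compiler is assumed to be called cc, and no macros or implicit rules are used so that every step stays visible. The script runs make with the -n option, which only prints the commands that would be executed.

```shell
#!/bin/sh
# Work in a scratch directory so that nothing real is touched.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Empty stand-ins for the real sources (hypothetical file names).
touch main.c defs.h scanner.l

tab=$(printf '\t')   # every command line in a makefile must begin with a tab

cat > Makefile <<EOF
# The first rule names the default target built by a plain "make".
game: main.o lex.yy.o
${tab}cc -o game main.o lex.yy.o

main.o: main.c defs.h
${tab}cc -c main.c

lex.yy.o: lex.yy.c defs.h
${tab}cc -c lex.yy.c

# flex writes the generated scanner to lex.yy.c by default.
lex.yy.c: scanner.l
${tab}flex scanner.l
EOF

# With -n the commands are printed but not run, so neither cc nor flex
# has to be installed for this demonstration.
if command -v make >/dev/null 2>&1; then
    plan=$(make -n)
    printf '%s\n' "$plan"
fi
```

Each rule has the form "target: files it depends on", followed by the commands that regenerate the target. After a real build, modifying only main.c and giving the make command again would repeat just the compilation of main.c and the final linking step, while a change to defs.h would cause both .o files to be regenerated. A real scanner may also need the flex library (-lfl) on the link line.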

Books : Unix Utilities by Tare (Tata McGraw Hill), Sun Solaris Manual (Program Compilation and Linking)


RCS

RCS (Revision Control System) is a utility that deals with the fact that programs undergo continuous modification and that it is often necessary to retain multiple versions of the same program file(s). When several files together make up a program, it becomes necessary to keep track of which version of each file is to be taken to get a particular version of the target program. As a small-scale example, let us assume that a program called game is created using the files g1.c, g2.c, g3.h, and g4.l (game can be generated using the make utility). Now suppose that, to obtain a version of the program with a new feature, some modifications are made in the files g3.h and g2.c. To be able to have both versions of the program, one has to retain both versions of the files g2.c and g3.h. Without a better mechanism to handle this situation, one might resort to creating multiple directories and maintaining the entire set of files corresponding to each version of the program in a distinct directory. However, maintaining the program files in this way is likely to become difficult when the number of files is large, when there are many versions, and, more importantly, when several persons may modify the files.

Under RCS, a library of the different files is maintained. Initially each file has to be entered in the library. The library is simply the collection of the files in RCS's internal format. The collection can exist in the same directory as the files, or a separate directory (conventionally a subdirectory named RCS) may be maintained. Afterwards, to make changes to any file, it has to be fetched out from the library. After the modifications have been made, it may be redeposited in the library. There is a set of commands and command line options for all these operations. When a file is entered in the library for the first time, it is assigned the version number 1.1, and when revised versions of the file are redeposited in the library, the new versions are given the version numbers 1.2, 1.3, and so on. The filename used by the library to store a file is the same as the actual filename given by the user, but with the suffix ,v, i.e., for the file g1.c the RCS library will have a file g1.c,v in which all the versions of g1.c are stored. Internally, only the last version is stored in its actual form. For the version previous to the last, only the differences between the two versions are stored in the file, and so on for still earlier versions. To make modifications, a user can specify the particular version of a file that is to be fetched out from the library. RCS also allows a symbolic name to be attached to a version of a file. Thus, if at a point of time

g1.c has versions 1.1, 1.2 and 1.3
g2.c has versions 1.1 and 1.2
g3.h has versions 1.1, 1.2, 1.3 and 1.4
g4.l has version 1.1

then one can associate a string, say simple, with version 1.2 of g1.c, version 1.1 of g2.c, version 1.3 of g3.h and version 1.1 of g4.l. Similarly, the string fancy can be associated with version 1.2 of g1.c, version 1.2 of g2.c, version 1.4 of g3.h and version 1.1 of g4.l. Then, to generate the simple version of game, one can use RCS commands to automatically fetch the relevant versions of the files for simple. To generate the fancy version of game, one can fetch the relevant versions of the files for fancy. RCS also provides a way to control concurrent modifications to the same file by multiple users: once a file has been fetched for modification, it cannot be fetched for modification again until it has been redeposited. However, fetching for read-only purposes can be performed any number of times. While depositing or redepositing a file in the RCS library, one is also required to provide a short description of the nature of the modifications, which is recorded in the library file.

There are several operations possible through the RCS commands. Some commonly used commands are -

ci <filename> - To check-in (deposit or redeposit) a file in the RCS library.
co [-r<version number>] [-l] <filename> - To check-out (fetch) a file (of the given version number) from the RCS library. The version number is written immediately after -r, without a space. If -r<version number> is omitted, the last version is fetched out. The -l option checks the file out for modification and locks it, which prevents other users from fetching the same file for modification until a ci is done. If -l is omitted, the file is checked out for read-only purposes, i.e., a subsequent ci may not be allowed.
rlog <filename> - To see the description of the various versions of the file.
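The following shell session is a small sketch of how these commands fit together for one file. It assumes that the RCS tools are installed (the script does nothing otherwise) and that no subdirectory named RCS exists, so g1.c,v is created next to g1.c. The file contents and the log messages are invented for the illustration; simple and fancy are the symbolic names used in the discussion above.

```shell
#!/bin/sh
# Runs only when the RCS tools are installed.
if command -v ci >/dev/null 2>&1 && command -v co >/dev/null 2>&1; then
    demo_dir=$(mktemp -d) || exit 1
    cd "$demo_dir" || exit 1

    echo 'int score = 0;' > g1.c

    # The first check-in creates g1.c,v holding version 1.1.
    # -t- supplies the file description, -u keeps a read-only working copy,
    # -q suppresses the informational messages.
    ci -q -u -t-'Game state' g1.c

    # Check out with a lock, edit, and check in again as version 1.2.
    co -q -l g1.c
    echo 'int lives = 3;' >> g1.c
    ci -q -u -m'Add lives counter' g1.c

    # Attach symbolic names to particular versions.
    rcs -q -nsimple:1.1 g1.c
    rcs -q -nfancy:1.2 g1.c

    # -p prints the requested version on the standard output instead of
    # replacing the working file.
    simple_text=$(co -q -p -rsimple g1.c)
    fancy_lines=$(co -q -p -rfancy g1.c | wc -l)

    # Show the version history with the recorded descriptions.
    rlog g1.c
fi
```

In a project with several files, the same symbolic name would be attached to the appropriate version of every file, so that co -rsimple fetches a consistent set.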

RCS is distributed as part of the GNU project. Besides RCS, there are other systems that address these issues. Some of these are SCCS (Source Code Control System, distributed with the standard UNIX systems), CMS (Code Management System, in Digital Equipment Corporation's VMS operating system), etc.

Read the man pages for rcs, ci, co, rlog and rcsdiff.


sed

sed is a stream editor: it performs editing operations on an input stream. An input stream is input that becomes available in sequence; it may come from a file or from a pipe. The editing is performed according to a specification given to sed on the command line. The output, that is, the edited form of the input, is written to the standard output (normally the terminal); the input file itself is not modified. Usually the editing specification contains a condition to select certain lines from the input stream, and editing actions such as replacing an existing pattern in the selected lines by some other given text. The condition for selecting lines can be, for instance, lines containing a certain pattern specified as a regular expression. For example, to replace all occurrences of TU in a file campus_news.txt by Tezpur University, sed can be invoked as shown below. The trailing g flag makes the substitution apply to every occurrence on a line; without it only the first occurrence on each line is replaced.

sed "s/TU/Tezpur University/g" campus_news.txt
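The effect of the g flag, and of a condition that selects lines, can be tried out with a short script like the one below. The contents of campus_news.txt are invented for the illustration, and the file is created in a scratch directory.

```shell
#!/bin/sh
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Sample input (made up for this example).
printf '%s\n' \
    'TU hosts the TU sports meet' \
    'Library hours extended at TU' > campus_news.txt

# Without g: only the first TU on each line is replaced.
first_only=$(sed 's/TU/Tezpur University/' campus_news.txt)

# With g: every TU on each line is replaced.
every=$(sed 's/TU/Tezpur University/g' campus_news.txt)

# With a selecting condition: substitute only on lines containing "sports".
sports_only=$(sed '/sports/s/TU/Tezpur University/g' campus_news.txt)

printf '%s\n' "--- first only" "$first_only" "--- every" "$every" \
    "--- sports lines only" "$sports_only"
```

Note that the pattern TU also matches inside longer words, so a real substitution may need a more specific regular expression.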

Books : Unix Utilities by Tare (Tata McGraw Hill), The UNIX Programming Environment by Kernighan and Pike (Prentice Hall of India).


grep

grep is a utility that is used to locate the lines in a text file that contain some specified pattern. The pattern to be searched for is specified as a regular expression. The lines containing the pattern are printed on the terminal. grep has several related features which can be invoked by suitable command line options. A very common use of grep is to search for a single pattern in several files, e.g., to search for the pattern "software" or "Software" in all ".txt" files in a directory, the grep command can be -

grep '[sS]oftware' *.txt

Some other operations possible using grep are: finding the lines that do not match the pattern (-v option), counting the number of lines that match the pattern (-c option), and so on.
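The options mentioned above can be tried out as follows. The two text files and their contents are invented for the illustration. The pattern is quoted so that the shell passes it to grep unchanged instead of treating the square brackets as a filename pattern.

```shell
#!/bin/sh
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Sample files (made up for this example).
printf '%s\n' 'Software engineering lab' 'Hardware lab' \
    'System software notes' > notes.txt
printf '%s\n' 'Compiler design' 'Software tools' > syllabus.txt

# Searching several files: each matching line is prefixed by its file name.
matches=$(grep '[sS]oftware' *.txt)

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c '[sS]oftware' notes.txt)

# -v prints the lines that do NOT match the pattern.
others=$(grep -v '[sS]oftware' notes.txt)

printf '%s\n' "$matches" "matching lines in notes.txt: $count" \
    "non-matching lines in notes.txt: $others"
```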


awk

awk is a powerful pattern scanning and processing language. The name awk is derived from the names of the designers of the language: Aho, Weinberger, and Kernighan. Implemented as a tool, awk serves a purpose similar to that of sed, but it provides many more features. Briefly stated, the tool is used by specifying the pattern scanning and processing logic together with the input file(s) to be processed. The pattern scanning and processing logic is written in the awk language and is called the awk program. The awk program is applied to the input file(s) and the output is printed on the terminal. The awk program itself can be given in a file, or it can be specified completely on the command line (if the program is not large). Like most other UNIX utilities, awk allows a number of command line options to control its behaviour.

awk [ options ] -f <program-file> [ -- ] <file> ...
awk [ options ] [ -- ] <program-text> <file> ...

An AWK program consists of a sequence of pattern-action statements and optional function definitions.

pattern { action statements }
function name(parameter list) { statements }

Like in sed, the pattern specifies a condition for selecting lines from the input file, and the action statements are the processing logic to be applied to the selected lines. The pattern in awk offers more possibilities than its counterpart in sed. Similarly, since awk is a programming language in itself, the action statements in an awk program can be much more capable than the actions specified in sed. The awk programming language supports features such as variables, associative arrays, conditional statements, loops, etc.
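As a small illustration of these language features, the script below uses an array, a loop and a conditional statement on a file in the format of /etc/passwd. The file passwd.sample and its three entries are invented for the example, and the threshold of 1000 for ordinary user ids is only an assumption made for the demonstration.

```shell
#!/bin/sh
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Sample input in /etc/passwd format (made up for this example).
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/sh' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
    'bob:x:1001:1001:Bob:/home/bob:/bin/bash' > passwd.sample

# Array and loop: count the accounts per login shell (field 7).
# The order of a "for (s in users)" loop is unspecified, hence the sort.
shell_counts=$(awk -F: '
    { users[$7]++ }
    END { for (s in users) print s, users[s] }
' passwd.sample | sort)

# Conditional statement: classify each account by its numeric user id.
kinds=$(awk -F: '
{
    if ($3 >= 1000)
        kind = "regular"
    else
        kind = "system"
    print $1, kind
}' passwd.sample)

printf '%s\n' "$shell_counts" "$kinds"
```

The -F: option on the command line has the same effect as setting FS to ":" inside the program.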

Example :

Print and sort the login names of all users:
BEGIN { FS = ":" }
{ print $1 | "sort" }

Count lines in a file:

{ nlines++ }
END { print nlines }

Explanation : BEGIN is a special pattern that causes the corresponding action to be taken before input processing starts. In the first example, the action in the BEGIN block sets the built-in variable FS (the field separator to be used for the lines of input) to ":", i.e., colon. The statement that follows has no explicit pattern, which means that its action is taken for every line of input. Thus the overall effect, on an input such as /etc/passwd, is to treat the segments separated by colons in each input line as separate fields and to print the first field. Since the output of print is piped to the command "sort", the fields are printed in sorted order.

Similarly, in the second example, the action of incrementing the variable nlines is executed for each line of input (since no pattern is specified). END is a special pattern that causes the corresponding action to be taken when all the input has been processed. Hence, in the second example the variable nlines counts the number of lines in the input, and finally this value is printed on the terminal.

To count the number of lines containing the pattern "Tezpur" or "Guwahati" we can write

/Tezpur/ || /Guwahati/ { nlines++ }
END { print nlines }
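The three programs above can be run from the shell as sketched below. The input files passwd.sample and places.txt, and the program file count.awk, are invented for the illustration. The first and third programs are given on the command line, while the second is read from a file with the -f option.

```shell
#!/bin/sh
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# Sample input in /etc/passwd format (made up for this example).
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/sh' \
    'bob:x:1001:1001:Bob:/home/bob:/bin/bash' \
    'alice:x:1000:1000:Alice:/home/alice:/bin/bash' > passwd.sample

# Program text given directly on the command line.
logins=$(awk 'BEGIN { FS = ":" } { print $1 | "sort" }' passwd.sample)

# The same kind of program stored in a file and selected with -f.
printf '%s\n' '{ nlines++ }' 'END { print nlines }' > count.awk
total=$(awk -f count.awk passwd.sample)

# Counting only the lines that match one of two patterns.
printf '%s\n' 'Tezpur University' 'IIT Guwahati' 'Dibrugarh' \
    'Tezpur to Guwahati bus' > places.txt
matched=$(awk '/Tezpur/ || /Guwahati/ { nlines++ } END { print nlines }' places.txt)

printf '%s\n' "$logins" "lines: $total" "matching lines: $matched"
```

A line that contains both patterns is counted only once, because the action is executed once per selected line.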

Books : Unix Utilities by Tare (Tata McGraw Hill), The UNIX Programming Environment by Kernighan and Pike (Prentice Hall of India).
