PROGRAM LINKING

A whole program is usually not written in a single file. Apart from code and data definitions spread over multiple files, user code often refers to code and data defined in some "libraries". Linking is the process in which references to "externally" defined objects (code and data) are processed so as to make them operational. Traditionally, linking was performed as a separate step after the basic translation of the user program files, and the output of this stage was a single executable program file. This is known as static linking. A more versatile technique, called dynamic linking, is more commonly used these days.

        object modules of user files
                     +                         LINKING
          existing library modules      -------------------->  Executable program

Two important aspects of linking are locating the individual object modules in the combined executable program image, and adjusting the addresses used for external references at the various places in the program.

STATIC AND DYNAMIC LINKING

Static linking - In static linking, all the modules that are required to complete a program are physically placed together to generate an executable program file. The file can then be "loaded" at any subsequent time to run the program.

Dynamic linking - In dynamic linking, the actual task of linking is performed just prior to running the program, with the individual modules actually in memory. This approach has several advantages over static linking:

  1. A single copy of an object module in memory may form part of the execution image of several programs, thus reducing the overall memory requirement of the system.
  2. The executable program files remain small, since the component modules are not physically copied into them.
  3. Whether a program actually requires a particular module may be determined at run time, and linking can be done accordingly.
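The advantages of dynamic linking listed above can be illustrated with a small toy model. It is only an analogy, written in Python under stated assumptions: the "modules" are dictionaries mapping exported names to Python functions, and the names RuntimeLinker, libtrig and libstats are hypothetical. The sketch shows a single shared in-memory copy of each module (advantage 1) and modules being linked only when a run-time option calls for them (advantage 3).

```python
import math

# Hypothetical "library modules": each maps exported names to functions.
# A toy analogy only; real modules are files of machine code and data.
LIBRARY = {
    "libtrig": {"sin": math.sin, "cos": math.cos},
    "libstats": {"mean": lambda xs: sum(xs) / len(xs)},
}


class RuntimeLinker:
    """Binds names to module code just before use, loading modules on demand."""

    def __init__(self, library):
        self.library = library
        self.loaded = {}  # module name -> its single in-memory copy

    def load(self, module):
        # Advantage 1: a module already in memory is reused by every program,
        # so only one copy of it exists.
        if module not in self.loaded:
            self.loaded[module] = dict(self.library[module])
        return self.loaded[module]

    def resolve(self, module, name):
        # The reference is bound only now, with the module actually in memory.
        return self.load(module)[name]


def run_program(linker, want_stats):
    # Advantage 3: which modules get linked depends on a run-time option.
    if want_stats:
        return linker.resolve("libstats", "mean")([1, 2, 3, 4])
    return linker.resolve("libtrig", "cos")(0.0)


linker = RuntimeLinker(LIBRARY)
print(run_program(linker, want_stats=False))  # 1.0
print(sorted(linker.loaded))                  # ['libtrig']
```

With want_stats=False only libtrig is ever loaded; libstats is brought in only on a run where the option asks for it, so the set of modules actually linked is decided at run time.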
The third advantage matters when a program takes a particular course of execution, involving certain object modules, depending on some run-time conditions such as user options. Dynamic linking thus allows a program to control the choice of modules to be linked in a particular run.

CASE STUDY - I : Dynamic Linking in UNIX (Sun Solaris)

Sun Solaris implements both static and dynamic linking. Two utilities support this: ld - the link editor (see: man ld), and /usr/lib/ld.so.1 - the runtime linker.

The link editor, ld, is used for both static and dynamic linking. Usually the compiler driver (e.g., cc for the C language) invokes ld after it has produced the relocatable objects by carrying out the basic language translation. The default behaviour of ld is dynamic linking; static linking can be requested either through command-line options or by supplying input files that can only be used for static linking.

The input modules to ld can be of the following types:
  1. Relocatable objects - the output of the basic translation performed by a compiler.
  2. Archive libraries (see: man ar) - to be used for static linking.
  3. Shared objects - object modules to be used in dynamic linking.

Depending on the type of input modules and the command options, ld can produce the following types of output:
  1. Relocatable object - by concatenating input relocatable objects.
  2. Static executable - a statically linked, ready-to-run program file.
  3. Dynamic executable - built from input relocatable objects and shared objects. Such an executable requires the runtime linker to actually run.
  4. Shared object - a module that can be used by ld to create a dynamic executable (or another shared object), and by the runtime linker to run such an executable.

CASE STUDY - II : Dynamic Linking in MS-Windows
[From "Windows Internals", Matt Pietrek, 1993, Addison-Wesley Publishing Company]

Module : All of the code, data and resources that a file presents to a program.
It is either an executable program or a Dynamic Link Library (DLL) to be used by some program. The word module refers to the in-memory representation of the information from the file on disk. Each module primarily contains logically related code or data or both, which may form part of one or more programs. A module may be created from one or more source files. Further, a module may contain one or more segments, such as CODE segments and DATA segments.

File Format : Windows executable files and DLLs use the NE (New Executable) format. It is the successor of the older MZ (Mark Zbikowski) file format used in DOS. Other, newer file formats that are also used are LE (Linear Executable), LX and PE (Portable Executable). Most systems follow the convention of placing a distinct number, a "signature", at a known place in a file, so that any application reading the file can tell its type. An NE file actually begins with a DOS MZ header (a small DOS "stub" program) whose field at offset 3Ch gives the file offset of the NE header; the NE header in turn starts with the signature 454Eh, i.e., the ASCII characters 'N' and 'E' read as a little-endian 16-bit word. The contents of an NE file can be viewed in a readable format using the tools TDUMP (from Borland) or EXEHDR (from Microsoft).

Module Table (or Module DB) : When a program is to be run, information about several modules may be required. Hence Windows maintains an in-memory representation of the various tables in the corresponding NE files. The segment, or block of memory, that holds this information is called the module table, or Module Database (MDB). This representation need not be an exact copy of the image in the file: the file mainly serves as a permanent copy from which the module can be created again and again, whereas for the in-memory representation efficiency is the important consideration.

Module Handle : A global handle to the module table of a particular module is called the module handle. It is used to locate the various components of a module in memory.

Information in the NE file header and the Module Table :

  1.
     Segment Table : The number of segments in the module and their attributes, viz., type (code, data, etc.), size, offset in the file, whether relocation is required, etc. In the MDB (the in-memory representation of these tables) there is also a "segment selector" value for each segment, which identifies where the segment lies in memory. This value is used to initialise the appropriate segment registers inside the programs.

  2.
     Module Reference Table : A list of the other modules that this module requires.

  3.
     Imported Names Table : A list of the external names that this module uses. Normally, most references to external objects are converted to the entry numbers of those objects in the modules where they are respectively defined. Still, some names may remain and have to be resolved by the loader.

  4.
     Entry Table : For each segment, the offset (from the beginning of the segment) of each object (i.e., function or data) defined in this module. The names of the objects are not included in this table.

  5.
     Non-resident Name Table : Another module may occasionally need to link at run time, by name, to an object defined in this module. For such situations, the names of the objects, together with their entry numbers in the Entry Table, are maintained separately. Since this requirement is relatively rare, this table is not loaded into the in-memory Module Table by default; it is loaded only when the need actually arises.

  6.
     Resident Name Table : Certain names pertaining to this module are likely to be required frequently by other modules. These names are included in this table, e.g., the name of the module itself.

  7.
     Relocation Records : For each segment, the locations (offsets from the start of the segment) that contain a reference to an external object and must be filled in by the loader. Each such record holds the module name and the entry number of the object in the Entry Table of that module, so that the loader can easily locate that entry in that module at run time.

  8.
     Resource Table

Working Principles : After the source files are translated into machine language, the translated versions of the files (object files) that are to be combined into a module are linked together by a "linker", which resolves the cross references among them (in UNIX, this is done by the link editor, ld). However, the module thus formed may still contain references that can be resolved only by linking it to other modules. This task of resolving dependencies between modules is carried out by a "loader" when a program is about to be executed (in UNIX, this is done by the runtime linker). The linker must, however, prepare the modules so that the loader can accomplish this task efficiently at run time.

First of all, when the linker generates an EXE or a DLL module from a set of given files, it is told which other modules may be used to resolve the inter-module references of the target module (i.e., the module being generated). This information is available either in a specified import library (e.g., IMPORT.LIB or LIBW.LIB) or in the IMPORTS section of the DEF (module-definition) file. Only those external references (names) that are mentioned in the given import library or IMPORTS section are allowed to remain in the generated module. The import library or IMPORTS section contains the names of functions or data objects, the module in which each is defined, and an entry ordinal number that locates it within that module.

The linker uses this information to prepare "fixup records" for each external symbol reference in each segment of the target module. A fixup record contains the offset within a segment where an external symbol's address is required, the name of the module that contains the symbol's definition, and the entry ordinal number of the symbol in that module.

When a new program is to be executed, the loader loads the EXE module of that program (if it is not already loaded).
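The fixup records and the loader's work can be pictured with a toy model in Python. Everything here is a simplifying assumption: Fixup, Module and Loader are hypothetical names, the module APP, its entry ordinals and offsets, and the selector values are made up, and only KERNEL is taken from the text. Each fixup is patched as a 16:16 far pointer (offset word followed by selector word), which is an assumed fixup kind; a real loader works on the binary NE structures and obtains real selectors from the system.

```python
import struct
from dataclasses import dataclass, field


@dataclass
class Fixup:
    """One relocation (fixup) record, as described in the text."""
    segment: int        # 1-based number of the segment that needs patching
    offset: int         # offset in that segment where the address goes
    target_module: str  # module that defines the symbol
    ordinal: int        # entry number of the symbol in that module's Entry Table


@dataclass
class Module:
    """A much-simplified stand-in for an NE file (hypothetical layout)."""
    name: str
    references: list     # Module Reference Table: names of required modules
    entries: dict        # Entry Table: ordinal -> (segment number, offset)
    segment_sizes: list  # one size per segment
    fixups: list = field(default_factory=list)


class Loader:
    def __init__(self, files):
        self.files = files         # module name -> Module "on disk"
        self.loaded = {}           # module name -> (Module, segments, selectors)
        self.next_selector = 0x10  # made-up selector values, for illustration

    def load(self, name):
        """Load a module and, recursively, every module it references."""
        if name in self.loaded:    # already in memory: nothing to do
            return
        mod = self.files[name]
        segments = [bytearray(size) for size in mod.segment_sizes]
        selectors = []
        for _ in segments:
            selectors.append(self.next_selector)
            self.next_selector += 8
        # Record the module before recursing, so reference cycles terminate.
        self.loaded[name] = (mod, segments, selectors)
        for ref in mod.references:
            self.load(ref)

    def apply_fixups(self):
        """Resolve every fixup once all required modules are in memory."""
        for mod, segments, _ in self.loaded.values():
            for fx in mod.fixups:
                target, _, target_selectors = self.loaded[fx.target_module]
                seg_no, entry_offset = target.entries[fx.ordinal]
                # Patch a 16:16 far pointer: offset word, then selector word.
                struct.pack_into("<HH", segments[fx.segment - 1], fx.offset,
                                 entry_offset, target_selectors[seg_no - 1])


# Hypothetical example: APP refers to entry 2 of KERNEL at offset 4 of its code.
files = {
    "APP": Module("APP", ["KERNEL"], {}, [16], [Fixup(1, 4, "KERNEL", 2)]),
    "KERNEL": Module("KERNEL", [], {1: (1, 0x0000), 2: (1, 0x0120)}, [0x200]),
}
loader = Loader(files)
loader.load("APP")
loader.apply_fixups()
app_code = loader.loaded["APP"][1][0]
print([hex(v) for v in struct.unpack_from("<HH", app_code, 4)])  # ['0x120', '0x18']
```

In the sketch no fixup is applied until load() has brought in every referenced module, and the work starts, as in the text, with the loader loading the EXE module of the program.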
Loading the EXE module means that its segments are allocated memory and the in-memory representation of the NE file headers, called the Module Table or Module Database (MDB), is created. The other required modules are identified from the Module Reference Table in the MDB, and they too are loaded (if they are not already loaded). This may go on recursively. Once all the required modules are loaded, the external references in each of them are resolved using the relocation records in each MDB and by looking up the Entry Tables of the modules mentioned in those relocation records.

Windows also supports explicit dynamic linking, i.e., a user's program can determine the address of an external function and call that function through a function pointer. To do this, the function GetProcAddress() is called with the module handle and the function name as parameters.

Some dynamic linking primitives :
   GetModuleHandle()
   GetModuleFileName()
   GetExePtr()
   GetProcAddress()

Example : Portion of TDUMP output of PBRUSH.DLL

   Segment Table
     Offset: 00C0h
     Segment Number: 01h
       Segment Type: CODE
       Alloc Size  : 16BAh
       Sector Offset: 0019h
       File Length: 16B9h
       Attributes: Relocations
     Segment Number: 02h
       Segment Type: DATA
       Alloc Size  : 0180h
       Sector Offset: 019Ah
       File Length: 00CEh
       Attributes: Moveable Shareable Preloaded

   No Resource Table present

   Resident Name Table
     Offset: 00D0h
     Module Name: 'PBRUSH'

   Module Reference Table
     Offset: 00DAh
     Module 1: KERNEL
     Module 2: GDI

   Imported Names Table
     Offset: 00DEh
     name      offset
     KERNEL    0001h
     GDI       0008h

   Entry Table
     Offset: 00EAh
     Fixed Segment Records ( 8 Entries) Segment: 0001h
       Entry 1: Offset: 0075h   Exported Single data
       Entry 2: Offset: 009Eh   Exported Single data
       Entry 3: Offset: 0208h   Exported Single data
       Entry 4: Offset: 0220h   Exported Single data
       Entry 5: Offset: 0341h   Exported Single data
       Entry 6: Offset: 0541h   Exported Single data
       Entry 7: Offset: 0770h   Exported Single data
       Entry 8: Offset: 0D3Ch   Exported Single data

   Non-Resident Name Table
     Offset: 0105h
     Module Description: 'Virtual bitmap manager'
     Name VSTRETCHBLT      Entry: 6
     Name WEP              Entry: 1
     Name DISCARDBAND      Entry: 8
     Name VDELETEOBJECT    Entry: 7
     Name VPATBLT          Entry: 4
     Name VBITBLT          Entry: 5
     Name GETVCACHEDC      Entry: 3
     Name VCREATEBITMAP    Entry: 2

   Segment Relocation Records
     Segment 0001h relocations
       type   offset   target
       PTR    0061h    KERNEL.4
       PTR    0831h    KERNEL.132
       PTR    0B8Dh    KERNEL.5
       PTR    0138h    KERNEL.6
       PTR    0BF6h    KERNEL.7
       ... rest of file omitted
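The PBRUSH.DLL dump makes the two-step lookup concrete: a name in the Non-Resident Name Table gives an entry number, and the Entry Table turns that number into a segment and offset. The Python sketch below transcribes those two tables from the dump; the helper names lookup_export and parse_target are hypothetical. It shows roughly the kind of name lookup that explicit dynamic linking relies on, not the actual implementation of GetProcAddress().

```python
# Tables transcribed from the PBRUSH.DLL dump above.
# Non-Resident Name Table: exported name -> entry number.
NON_RESIDENT_NAMES = {
    "VSTRETCHBLT": 6, "WEP": 1, "DISCARDBAND": 8, "VDELETEOBJECT": 7,
    "VPATBLT": 4, "VBITBLT": 5, "GETVCACHEDC": 3, "VCREATEBITMAP": 2,
}

# Entry Table: entry number -> (segment, offset). All 8 entries are in segment 1.
ENTRY_TABLE = {
    1: (1, 0x0075), 2: (1, 0x009E), 3: (1, 0x0208), 4: (1, 0x0220),
    5: (1, 0x0341), 6: (1, 0x0541), 7: (1, 0x0770), 8: (1, 0x0D3C),
}


def lookup_export(name):
    """Return (segment, offset) for an exported name, or None if it is absent.

    Hypothetical helper: it follows name -> entry number -> Entry Table.
    """
    entry = NON_RESIDENT_NAMES.get(name)
    if entry is None:
        return None
    return ENTRY_TABLE[entry]


def parse_target(target):
    """Split a relocation target such as 'KERNEL.132' into ('KERNEL', 132)."""
    module, _, entry = target.rpartition(".")
    return module, int(entry)


segment, offset = lookup_export("VBITBLT")
print(f"VBITBLT -> {segment:04X}:{offset:04X}")  # VBITBLT -> 0001:0341
print(parse_target("KERNEL.132"))                 # ('KERNEL', 132)
```

A relocation record such as PTR 0831h KERNEL.132 would be resolved the same way, but against KERNEL's Entry Table, which this dump does not include.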