PROGRAM LINKING

	A whole program is usually not written in a single file. Besides the
code and data definitions spread over multiple files, user code often makes
references to code and data defined in "libraries". Linking is the process
in which references to "externally" defined objects (code and data) are
processed so as to make them operational. Traditionally, linking was
performed as a separate step after the basic translation of the user program
files, and the output of this step was a single executable program file.
This is known as static linking. A more versatile technique, called dynamic
linking, is more commonly used these days.


         object modules
	 of user files		LINKING
	      +		  --------------------> Executable program
	 existing library
	 modules

	Two important aspects of linking are locating the individual object
modules in the combined executable program image, and adjusting the addresses
used for external references at the various places in the program.
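
The two aspects named above can be illustrated with a toy static linker. The
sketch below is a minimal illustration in Python, not a real linker: the
module dictionaries, symbol names, sizes and offsets are all invented for the
example.

```python
# Toy static linker: places modules one after another and patches references.
# The module layout and symbol names are invented for illustration.

def link(modules):
    """Each module is a dict with:
       'name', 'size', 'defs' {symbol: offset}, 'refs' [(offset, symbol)]."""
    # Step 1: locate each module in the combined image.
    base = {}
    address = 0
    for m in modules:
        base[m["name"]] = address
        address += m["size"]

    # Build a global symbol table of final addresses.
    symbols = {}
    for m in modules:
        for sym, off in m["defs"].items():
            symbols[sym] = base[m["name"]] + off

    # Step 2: adjust every external reference to the final address.
    patches = {}
    for m in modules:
        for off, sym in m["refs"]:
            if sym not in symbols:
                raise KeyError(f"undefined symbol: {sym}")
            patches[base[m["name"]] + off] = symbols[sym]
    return base, symbols, patches


main_o = {"name": "main.o", "size": 0x40, "defs": {"main": 0x00},
          "refs": [(0x10, "printf")]}
printf_o = {"name": "printf.o", "size": 0x80, "defs": {"printf": 0x08},
            "refs": []}

base, symbols, patches = link([main_o, printf_o])
print(base)      # where each module was placed
print(symbols)   # final address of each defined symbol
print(patches)   # image address -> value to store there
```

Here the reference at offset 10h of main.o is patched with 48h, i.e. the base
of printf.o (40h) plus the offset of printf inside it (08h).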


STATIC AND DYNAMIC LINKING

Static linking - In static linking all the modules that are required to
complete a program are physically placed together to generate an executable
program file. The file can then be "loaded" at any subsequent time to run the
program.

Dynamic linking - In dynamic linking the actual task of linking is  performed
just prior to running the program and with the individual modules actually in
the memory. This approach  has  several  advantages  over  static  linking  -

    1. A single copy of an object module in memory may form part of the
    execution image of several programs, thus reducing the overall memory
    requirement of the system.

    2. The size of the executable program files remains small, since the
    component modules are not physically copied into those files.

    3. The actual need for individual modules in a program may be determined
    at run time, and linking can be done accordingly. This happens when a
    program takes a particular course of execution involving certain object
    modules, depending on run-time conditions such as user options. Dynamic
    linking can allow a program to control the choice of modules to be linked
    in a particular run.
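
The third advantage can be illustrated by analogy in Python, where a module
is bound only when the program asks for it. This is a sketch of the idea, not
of an operating system loader: the CODECS mapping and the load_codec() helper
are hypothetical names introduced here.

```python
import importlib

# Hypothetical mapping from a user option to the module that implements it.
CODECS = {"json": "json", "pickle": "pickle"}

def load_codec(option):
    """Bind the module for `option` only when it is actually requested."""
    return importlib.import_module(CODECS[option])

# The choice of module depends on a run-time condition (here, a fixed option).
codec = load_codec("json")
print(codec.dumps({"a": 1}))
```

A run that never selects "pickle" never links that module into the program.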


CASE STUDY - I : Dynamic Linking in UNIX (Sun Solaris)

    Sun Solaris implements both static and dynamic linking facilities. There
are two utilities that facilitate this : ld - the link editor (see: man ld),
and /usr/lib/ld.so.1 - the runtime linker.

    The link editor, ld, is used for both static and dynamic linking.
Usually the compiler driver (e.g., cc for the C language) invokes ld after
producing the relocatable objects by carrying out the basic language
translation. The default behaviour of ld is to link dynamically; static
linking can be forced either by command line options or by supplying input
object files of a type that can only be used for static linking. The input
modules to ld can be of the following types -

    1. Relocatable objects - output of the basic translation performed by a
    compiler.
    2. Archive libraries (see: man ar) - to be used for static linking.
    3. Shared objects - object modules to be used in dynamic linking.

    Depending on the type of input modules and the command options, ld can
produce the following types of output -

    1. Relocatable object - by concatenating input relocatable objects.
    2. Static executable - statically linked ready-to-run program file.
    3. Dynamic executable - by using input relocatable objects and shared
    objects. Such an executable requires a runtime linker to actually run.
    4. Shared object - a module that can be used by ld to create a dynamic
    executable (or another shared object) and by the runtime linker to run
    such an executable.


CASE STUDY - II : Dynamic Linking in MS-Windows

[From "Windows Internals", Matt Pietrek, 1993, Addison-Wesley Publishing
Company]

Module : All of the code, data and resources that a file presents to a
program. It is either an executable program or a Dynamic Link Library (DLL)
to be used by some program. The word module refers to the in-memory
representation of the information from the file on the disk. Each module
primarily contains logically related code or data or both, which may form a
part of one or more programs. A module may be created out of one or more
source files. Further, a module may contain one or more segments, such as
CODE segments (one or more) and DATA segments (one or more).

File Format : Windows executable files and DLLs use the NE (New Executable)
format. It is the successor of the older MZ (Mark Zbikowski) file format
used in DOS. Other, newer file formats that are also in use are LE (Linear
Executable), LX, and PE. Most systems follow the convention of putting a
distinct number at the very beginning of a file or of its header, which works
as a signature that indicates the type of the file to any application that
reads it. In NE files this signature, found at the start of the NE header, is
the number 454Eh (the characters "NE"). The contents of an NE file can be
viewed in a readable format using the tools TDUMP (from Borland) or EXEHDR
(from Microsoft).
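
A signature check of this kind can be sketched as follows. The sketch relies
on two facts not spelled out above: an NE, LE, LX or PE file starts with a
small MZ stub, and the 32-bit value at offset 3Ch of that stub gives the file
offset of the newer header. The function name executable_kind() is made up
for this example, and the image it is run on is synthetic, not a real file.

```python
import struct

# Signature words as they appear when read as little-endian 16-bit numbers.
SIGNATURES = {0x454E: "NE", 0x454C: "LE", 0x584C: "LX", 0x4550: "PE"}

def executable_kind(data: bytes) -> str:
    """Classify an executable image by its signature words."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "unknown"
    # Offset 3Ch of the MZ header holds the file offset of the newer header.
    (new_header,) = struct.unpack_from("<I", data, 0x3C)
    signature = data[new_header:new_header + 2]
    if len(signature) < 2:
        return "MZ"                      # plain DOS executable
    (word,) = struct.unpack("<H", signature)
    return SIGNATURES.get(word, "MZ")

# Build a synthetic image: a 40h-byte MZ stub followed by an "NE" header.
stub = bytearray(0x40)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x40)
image = bytes(stub) + b"NE" + bytes(62)
print(executable_kind(image))            # prints NE
```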

Module Table (or Module DB) : When a program is to be run, information about
several modules may be required. Hence Windows maintains an in-memory
representation of the various tables in the corresponding NE files. The
segment, or block of memory, that holds this information is called the module
table, or Module Database (MDB). This representation may not be an exact copy
of the image present in the file: the file serves as a permanent backup for
repeated creation of the module, whereas for the in-memory representation an
important consideration is efficiency.

Module Handle : A global handle to the module table of a particular module is
called the module handle. It is used to locate the various components of a
module in the memory.

Information in the NE file header and the Module Table :

    1. Segment Table : Number of segments in the module and their attributes,
    viz., type (code, data, etc.), size, offset in the file, whether
    relocation is required, etc. In the MDB (the in-memory representation of
    these tables), there is also the "segment selector" value for each
    segment, which locates the segment in memory. This value is used to
    initialise the appropriate segment registers inside the programs.

    2. Module Reference Table : List of other modules that are required by
    this module.

    3. Imported Names Table : List of external names that this module uses.
    Normally, most references to external objects are converted to their
    entry numbers in the modules where they are respectively defined. Some
    names may still remain and must be resolved by the loader.

    4. Entry Table : For each segment, the offset (from the beginning of the
    segment) of each object (i.e., function or data) that is defined in this
    module. The names of the objects are not included in this table.

    5. Non-resident Name Table : Some other module may, at run time, use the
    name of an object defined in this module for linking. For such cases the
    names of the objects, along with their entry numbers in the Entry Table,
    are maintained separately. Since this requirement is relatively rare,
    this table is not loaded by default into the in-memory Module Table. It
    is loaded only when the requirement actually arises.

    6. Resident Name Table : Certain names pertaining to this module are
    likely to be frequently required by other modules. These names are
    included in this table, e.g., the name of the module itself.

    7. Relocation Records : For each segment, the locations (offsets from the
    start of the segment) where a reference to an external object is present
    and needs to be filled in by the loader. Each such record holds the
    module name and the entry number of the object in the Entry Table of that
    module, so that the loader can easily locate that entry in that module at
    run time.

    8. Resource Table

Working Principles : After the source files are translated to machine
language, the translated versions of the files (object files) that are to be
combined to form a module are linked together by a "linker" to resolve cross
references amongst themselves (in case of UNIX, this is done by the "linkage
editor"). However, the module thus formed may still have references that can
only be resolved when linked to other modules. This task of resolving
dependencies between modules is carried out by a "loader" when a program is
about to be executed (in UNIX this is done by the "runtime linker"). But the
linker must prepare the modules in such a way that the task of the loader can
be efficiently accomplished at runtime. First of all, when the linker
generates an EXE or a DLL module from a set of given files, it is provided
information regarding which other modules may be used to resolve the
inter-module references from the target module (i.e., the module being
generated). This information is either available in some specified import
library (e.g. IMPORT.LIB or LIBW.LIB) or in the IMPORTS section of the DEF
file. Only those external references (names) are allowed to remain in the
module being generated which are mentioned in the given import library or
IMPORTS section.  The import library or IMPORTS section contains the names of
functions or data objects, the module where each is defined, and an entry
ordinal number to locate it within the module. The linker uses this
information to prepare "fixup records" for each external symbol reference in
each segment of the target module. Each fixup record contains the offset
within a segment where an external symbol's address is required, the name of
the module that contains the symbol definition, and the entry ordinal number
of the symbol in that module.
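
The way a linker might turn external references into fixup records can be
sketched as below. This is an illustration only: the Fixup tuple, the
make_fixups() helper, the function names ALLOCBLOCK and DRAWLINE, and their
ordinals are all hypothetical and do not describe any real import library.

```python
from collections import namedtuple

# One fixup record: where to patch, and which (module, ordinal) supplies it.
Fixup = namedtuple("Fixup", "segment offset module ordinal")

# Hypothetical import-library contents: name -> (defining module, ordinal).
IMPORT_LIB = {"ALLOCBLOCK": ("KERNEL", 4), "DRAWLINE": ("GDI", 9)}

def make_fixups(references, import_lib):
    """references: list of (segment, offset, external name)."""
    fixups = []
    for segment, offset, name in references:
        # Only names known to the import library may remain unresolved.
        if name not in import_lib:
            raise LookupError(f"unresolved external: {name}")
        module, ordinal = import_lib[name]
        fixups.append(Fixup(segment, offset, module, ordinal))
    return fixups

refs = [(1, 0x0061, "ALLOCBLOCK"), (1, 0x0138, "DRAWLINE")]
for f in make_fixups(refs, IMPORT_LIB):
    print(f"seg {f.segment} off {f.offset:04X}h -> {f.module}.{f.ordinal}")
```

Note that the symbol names themselves do not appear in the records; only the
module name and the ordinal survive into the target module.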

When a new program is to be executed, the loader loads the EXE module of that
program (if it is not already loaded). This means the segments of the EXE
module are allocated memory, and the in-memory representation of the NE file
headers is created which is called the Module Table or the Module Data Base.
From the MDB, the other required modules are identified from the Module
Reference Table. These modules are also loaded (if they are not already
loaded). This may go on recursively. Once all the required modules are
loaded, the external references in each of them are resolved by using the 
Relocation records in each MDB and looking up the Entry Table of various
modules that are mentioned in the relocation records.
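
The loading steps described above can be sketched roughly as follows. The
sketch models only the bookkeeping, not memory allocation; the module names
APP, GFX and CORE, their entry offsets and their relocation records are
invented for illustration, and load_module() and resolve() are hypothetical
helper names.

```python
# "Disk" images: per module, its Module Reference Table ("refs"), Entry Table
# ("entries": ordinal -> offset) and relocation records
# ("relocs": (offset to patch, target module, target ordinal)).
FILES = {
    "APP":  {"refs": ["GFX", "CORE"], "entries": {},
             "relocs": [(0x0010, "GFX", 1), (0x0020, "CORE", 2)]},
    "GFX":  {"refs": ["CORE"], "entries": {1: 0x0100},
             "relocs": [(0x0004, "CORE", 1)]},
    "CORE": {"refs": [], "entries": {1: 0x0040, 2: 0x0080}, "relocs": []},
}

loaded = {}        # module name -> in-memory module table (MDB)
load_order = []

def load_module(name):
    """Load `name` and, recursively, every module it references."""
    if name in loaded:                 # already in memory: reuse it
        return loaded[name]
    mdb = dict(FILES[name], patched={})
    loaded[name] = mdb
    load_order.append(name)
    for ref in mdb["refs"]:
        load_module(ref)
    return mdb

def resolve(name):
    """Fill in each relocation from the target module's Entry Table."""
    mdb = loaded[name]
    for offset, target, ordinal in mdb["relocs"]:
        mdb["patched"][offset] = (target, loaded[target]["entries"][ordinal])

load_module("APP")
for name in load_order:                # resolve only after all are loaded
    resolve(name)
print(load_order)
print(loaded["APP"]["patched"])
```

CORE is referenced by both APP and GFX but is loaded only once, which mirrors
the "if it is not already loaded" condition in the text.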

Windows also supports explicit dynamic linking, i.e., a user's program can
determine the address of an external function and call that function through
a function pointer. To do this, the function GetProcAddress() can be called
with the module handle and the function name as parameters.

Some Dynamic linking primitives :

    GetModuleHandle()
    GetModuleFileName()
    GetExePtr()
    GetProcAddress()


Example :

Portion of TDUMP Output of PBRUSH.DLL

Segment Table                     Offset: 00C0h
    Segment Number: 01h
    Segment Type:   CODE          Alloc Size : 16BAh
    Sector Offset:  0019h         File Length: 16B9h
    Attributes: Relocations

    Segment Number: 02h
    Segment Type:   DATA          Alloc Size : 0180h
    Sector Offset:  019Ah         File Length: 00CEh
    Attributes: Moveable  Shareable  Preloaded

No Resource Table present

Resident Name Table               Offset: 00D0h
    Module Name: 'PBRUSH'

Module Reference Table            Offset: 00DAh
    Module  1: KERNEL
    Module  2: GDI

Imported Names Table              Offset: 00DEh
    name                                 offset
    KERNEL                                0001h
    GDI                                   0008h

Entry Table                       Offset: 00EAh
  Fixed Segment Records (  8 Entries)    Segment: 0001h
    Entry    1: Offset: 0075h   Exported   Single data
    Entry    2: Offset: 009Eh   Exported   Single data
    Entry    3: Offset: 0208h   Exported   Single data
    Entry    4: Offset: 0220h   Exported   Single data
    Entry    5: Offset: 0341h   Exported   Single data
    Entry    6: Offset: 0541h   Exported   Single data
    Entry    7: Offset: 0770h   Exported   Single data
    Entry    8: Offset: 0D3Ch   Exported   Single data

Non-Resident Name Table           Offset: 0105h
    Module Description: 'Virtual bitmap manager'
    Name VSTRETCHBLT                   Entry:     6
    Name WEP                           Entry:     1
    Name DISCARDBAND                   Entry:     8
    Name VDELETEOBJECT                 Entry:     7
    Name VPATBLT                       Entry:     4
    Name VBITBLT                       Entry:     5
    Name GETVCACHEDC                   Entry:     3
    Name VCREATEBITMAP                 Entry:     2

Segment Relocation Records
    Segment 0001h relocations
    type    offset      target
    PTR     0061h       KERNEL.4
    PTR     0831h       KERNEL.132
    PTR     0B8Dh       KERNEL.5
    PTR     0138h       KERNEL.6
    PTR     0BF6h       KERNEL.7

        ... rest of file omitted
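
Using the values in the PBRUSH.DLL dump above, the lookup that explicit
dynamic linking needs (name to entry ordinal, ordinal to offset) can be
sketched as follows. The table contents are copied from the dump; the
find_entry() helper is a hypothetical name, and the sketch returns a segment
number and offset rather than a real memory address.

```python
# Entry Table of PBRUSH.DLL: entry ordinal -> offset within segment 0001h.
ENTRY_TABLE = {1: 0x0075, 2: 0x009E, 3: 0x0208, 4: 0x0220,
               5: 0x0341, 6: 0x0541, 7: 0x0770, 8: 0x0D3C}

# Non-Resident Name Table of PBRUSH.DLL: exported name -> entry ordinal.
NON_RESIDENT_NAMES = {"VSTRETCHBLT": 6, "WEP": 1, "DISCARDBAND": 8,
                      "VDELETEOBJECT": 7, "VPATBLT": 4, "VBITBLT": 5,
                      "GETVCACHEDC": 3, "VCREATEBITMAP": 2}

def find_entry(name_or_ordinal):
    """Return (segment, offset) for an exported name or an entry ordinal."""
    if isinstance(name_or_ordinal, int):
        ordinal = name_or_ordinal
    else:
        ordinal = NON_RESIDENT_NAMES[name_or_ordinal]
    # All eight entries in the dump belong to segment 0001h.
    return 1, ENTRY_TABLE[ordinal]

segment, offset = find_entry("VBITBLT")
print(f"{segment:04X}:{offset:04X}")     # prints 0001:0341
```

A lookup by ordinal, as used in the relocation records, skips the name table
entirely, which is why the names need not be resident.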