
Input/Output Organization

The computer system’s input/output (I/O) architecture is its interface to the outside world.
So far we have discussed two important modules of the computer system – the processor and the memory module.

The third key component of a computer system is a set of I/O modules.

Each I/O module interfaces to the system bus and controls one or more peripheral devices.

There are several reasons why an I/O device or peripheral device is not directly connected to the system bus. Some of them are as follows –

There are a wide variety of peripherals with various methods of operation. It would be impractical to include the necessary logic within the processor to control several devices.

The data transfer rate of peripherals is often much slower than that of the memory or processor. Thus, it is impractical to use the high-speed system bus to communicate directly with a peripheral.

Peripherals often use different data formats and word lengths than the computer to which they are attached.
Thus, an I/O module is required.


Input/Output Modules

The major functions of an I/O module are categorized as follows –

Control and timing

Processor Communication

Device Communication

Data Buffering

Error Detection

During any period of time, the processor may communicate with one or more external devices in an unpredictable manner, depending on the program’s need for I/O.

The internal resources, such as main memory and the system bus, must be shared among a number of activities, including data I/O.

Control and Timing:

The I/O function includes a control and timing requirement to co-ordinate the flow of traffic between internal resources and external devices.

For example, the control of the transfer of data from an external device to the processor might involve the following sequence of steps –

  1. The processor interacts with the I/O module to check the status of the attached device.
  2. The I/O module returns the device status.
  3. If the device is operational and ready to transmit, the processor requests the transfer of data, by means of a command to the I/O module.
  4. The I/O module obtains a unit of data from external device.
  5. The data are transferred from the I/O module to the processor.
  6. If the system employs a bus, then each of the interactions between the processor and the I/O module involves one or more bus arbitrations.
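As a rough illustration, the sequence above can be sketched in C as a simple polling loop. The helper functions, the STATUS_READY bit, and the simulated "device becomes ready after a few polls" behaviour below are illustrative assumptions standing in for the actual bus transactions, not part of any particular device:

    #include <stdio.h>
    #include <stdint.h>

    #define STATUS_READY 0x01    /* assumed "ready to transmit" bit in the status word */

    /* Stand-ins for the bus transactions of steps 1-2 and 4-5: in a real system
       these would be reads of the I/O module's status and data registers.        */
    static uint8_t read_status(void)
    {
        static int polls = 0;
        return (++polls >= 3) ? STATUS_READY : 0;   /* device becomes ready after a while */
    }

    static uint8_t read_data(void)
    {
        return 0x42;                                /* the unit of data obtained in step 4 */
    }

    int main(void)
    {
        /* Steps 1-2: the processor checks the device status via the I/O module. */
        while ((read_status() & STATUS_READY) == 0)
            ;                       /* not ready yet: keep asking */

        /* Steps 3-5: the processor requests the transfer and receives one unit of data. */
        uint8_t unit = read_data();
        printf("received data unit: 0x%02X\n", unit);
        return 0;
    }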

Processor & Device Communication

During the I/O operation, the I/O module must communicate with the processor and with the external device.

Processor communication involves the following –

Command Decoding:

The I/O module accepts commands from the processor, typically sent as signals on the control bus.

Data:

Data are exchanged between the processor and the I/O module over the data bus.

Status Reporting:

Because peripherals are so slow, it is important to know the status of the I/O module. For example, if an I/O module is asked to send data to the processor (read), it may not be ready to do so because it is still working on the previous I/O command. This fact can be reported with a status signal. Common status signals are BUSY and READY.

Address Recognition:

Just as each word of memory has an address, so does each I/O device. Thus, an I/O module must recognize one unique address for each peripheral it controls.

On the other hand, the I/O module must also be able to perform device communication. This communication involves commands, status information, and data.

Data Buffering:

An essential task of an I/O module is data buffering. Buffering is required because of the mismatch between the speeds of the CPU, memory, and the peripheral devices. In general, the speed of the CPU is higher than the speed of the peripheral devices. So, the I/O module stores the data in a data buffer and regulates the transfer of data according to the speed of the devices.

In the opposite direction, data are buffered so as not to tie up the memory in a slow transfer operation. Thus the I/O module must be able to operate at both device and memory speed.
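A minimal sketch of the buffering idea: the I/O module keeps a small FIFO so that a slow device can fill it at its own pace while the processor or memory drains it in bursts. The buffer size, names, and structure below are illustrative assumptions, not a description of a real controller:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define BUF_SIZE 64   /* small buffer inside the I/O module (assumed size) */

    typedef struct {
        uint8_t data[BUF_SIZE];
        unsigned head;   /* next slot the device writes into   */
        unsigned tail;   /* next slot the processor reads from */
    } io_buffer;

    /* Device side: deposit one byte at device speed; returns false if full. */
    bool buffer_put(io_buffer *b, uint8_t byte)
    {
        unsigned next = (b->head + 1) % BUF_SIZE;
        if (next == b->tail)
            return false;          /* buffer full: device must wait */
        b->data[b->head] = byte;
        b->head = next;
        return true;
    }

    /* Processor/memory side: drain one byte at bus speed; returns false if empty. */
    bool buffer_get(io_buffer *b, uint8_t *byte)
    {
        if (b->head == b->tail)
            return false;          /* nothing buffered yet */
        *byte = b->data[b->tail];
        b->tail = (b->tail + 1) % BUF_SIZE;
        return true;
    }

    int main(void)
    {
        io_buffer buf = { .head = 0, .tail = 0 };
        uint8_t out;

        buffer_put(&buf, 'A');            /* slow device deposits bytes ...   */
        buffer_put(&buf, 'B');
        while (buffer_get(&buf, &out))    /* ... processor drains them later  */
            printf("%c ", out);
        printf("\n");
        return 0;
    }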

Error Detection:

Another task of the I/O module is error detection and subsequently reporting errors to the processor. One class of error includes mechanical and electrical malfunctions reported by the device (e.g. a paper jam). Another class consists of unintentional changes to the bit pattern as it is transmitted from the device to the I/O module.
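One common way to catch the second class of error (bit changes in transit) is a parity bit. The sketch below computes and checks even parity over a byte; the choice of parity as the code is an assumption for illustration, since the text does not name a specific scheme:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    /* Compute even parity: returns the parity bit to append so that the total
       number of 1-bits (data + parity) is even.                               */
    uint8_t even_parity_bit(uint8_t data)
    {
        uint8_t ones = 0;
        for (int i = 0; i < 8; i++)
            ones += (data >> i) & 1u;
        return ones & 1u;           /* 1 if the data already has an odd count */
    }

    /* Receiver side: true if the received byte and parity bit are consistent. */
    bool parity_ok(uint8_t data, uint8_t parity)
    {
        return even_parity_bit(data) == parity;
    }

    int main(void)
    {
        uint8_t data = 0x6B;                   /* 0110 1011: five 1-bits */
        uint8_t p = even_parity_bit(data);     /* parity bit = 1         */
        printf("parity bit = %u, check passes: %d\n", p, parity_ok(data, p));
        return 0;
    }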


There will be many I/O devices connected through I/O modules to the system. Each device will be identified by a unique address.

When the processor issues an I/O command, the command contains the address of the device for which it is intended. The I/O module must interpret the address lines to check whether the command is for itself.

Generally, in most processors, the processor, main memory, and I/O share a common bus (data, address, and control lines).

Two types of addressing are possible –

  1. Memory-mapped I/O
  2. Isolated or I/O mapped I/O

Memory-mapped I/O:

There is a single address space for memory locations and I/O devices.

The processor treats the status and data registers of the I/O modules as memory locations.

For example, if the size of the address bus of a processor is 16 bits, then there are 2^16 combinations, and altogether 2^16 locations can be addressed with these 16 address lines.

Out of these 2^16 address locations, some can be used to address I/O devices and the others are used to address memory locations.

Since I/O devices are included in the same memory address space, the status and data registers of I/O modules are treated as memory locations by the processor. Therefore, the same machine instructions are used to access both memory and I/O devices.
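A toy sketch of the idea, simulated in C: one 16-bit address space is shared, and the address alone decides whether ordinary memory or an I/O register responds, so the same load operation serves both. The particular range reserved for I/O and the register contents are purely illustrative assumptions:

    #include <stdio.h>
    #include <stdint.h>

    #define ADDR_SPACE 65536u        /* 2^16 locations in the single address space */
    #define IO_BASE    0xFF00u       /* assumed: top 256 addresses claimed for I/O */

    static uint8_t ram[ADDR_SPACE];  /* ordinary memory locations    */
    static uint8_t io_regs[256];     /* registers of the I/O modules */

    /* The same "load" serves both memory and I/O: the address alone decides
       which one responds, so no special I/O instructions are needed.        */
    uint8_t load(uint16_t address)
    {
        if (address >= IO_BASE)
            return io_regs[address - IO_BASE];   /* an I/O module recognizes it */
        return ram[address];                     /* ordinary memory access      */
    }

    int main(void)
    {
        ram[0x1234] = 0x42;          /* a memory location               */
        io_regs[0x01] = 0x99;        /* status register of some device  */
        printf("memory read: %02X\n", load(0x1234));
        printf("I/O read:    %02X\n", load(0xFF01));
        return 0;
    }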

Isolated or I/O-mapped I/O:

In this scheme, the full range of addresses is available for both memory and I/O devices.

Whether an address refers to a memory location or an I/O device is specified with the help of a command line.

If this line = 1, it indicates that the address present on the address bus is the address of an I/O device.

If this line = 0, it indicates that the address present on the address bus is the address of a memory location.

Since the full range of addresses is available for both memory and I/O devices, with 16 address lines the system may now support both 2^16 memory locations and 2^16 I/O addresses.
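In contrast to the memory-mapped sketch above, isolated I/O can be modelled with two separate 2^16-entry spaces selected by the state of the command line (modelled here as a flag). The array names and register contents are illustrative assumptions:

    #include <stdio.h>
    #include <stdint.h>

    #define SPACE_SIZE 65536u             /* 2^16 addressable locations in each space */

    static uint8_t memory_space[SPACE_SIZE];
    static uint8_t io_space[SPACE_SIZE];  /* a separate set of 2^16 I/O addresses */

    /* io_line models the command line: 1 selects I/O space, 0 selects memory. */
    uint8_t bus_read(uint16_t address, int io_line)
    {
        return io_line ? io_space[address] : memory_space[address];
    }

    int main(void)
    {
        io_space[0x20] = 0x5A;            /* device register at I/O address 0x20     */
        memory_space[0x20] = 0x11;        /* unrelated memory byte at the same address */
        printf("I/O read:    %02X\n", bus_read(0x20, 1));
        printf("memory read: %02X\n", bus_read(0x20, 0));
        return 0;
    }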

 

LEARNING OBJECTIVES

  • To provide an overview of the principles of input-output (I/O) system organisation and the related design of I/O modules.
  • To enable a clear understanding of interrupts and their types, and the hardware and software needed to support them.
  • To detail the different types of I/O operations, including direct memory access (DMA) for high-speed devices and I/O processors (IOPs) for large systems.
  • To facilitate an understanding of buses, their different types (PCI (peripheral component interconnect), SCSI (small computer system interface), USB (Universal Serial Bus)), and distinctive features along with the corresponding bus designs.
  • To detail the features of FireWire, an advanced high-speed and high-performance serial bus for digital audio and video equipment.
  • To introduce the reader to InfiniBand, an advanced high-speed communication link for modern distributed systems, cluster architectures, and high-end servers.
  • To discuss ports and their different types, including serial, parallel, and USB ports.

The synergy between computers and communications is at the heart of today's information technology revolution. Apart from the CPU (central processing unit) and the variety of memories constituting the internal memory system, the third key resource of a modern computer system is a set of various I/O modules attached to numerous peripheral devices, which can be interfaced to the system bus or central control to communicate with the main system for the migration of information from the outside world into the computer or from the computer to the outside world. Besides, most computers also have a wired or wireless I/O connection to the Internet to communicate with the external world. Thus, I/O modules, on the one hand, work in tune with the CPU and main memory, and on the other hand, they establish a connection with the external peripheral devices (mostly electro-mechanical) to execute physical I/O operations. The continuous development in the design of I/O modules and their allied interfaces makes them more and more intelligent, and consequently enhances the performance of the computer system to which they are attached. Large computer systems nowadays have such powerful I/O modules that an I/O module can itself be treated as a stand-alone, full-fledged computer system. This chapter presents an overview of the I/O organisation, detailing different types of I/O modules and their interfaces with the main computer system, including the buses and ports, their structures and functions, and above all, the mechanisms employed to control these devices to achieve certain predefined goals.


FIGURE 5.1

Block diagram of an external device along with its I/O interface.

Input–Output System

The I/O system performs the task of transferring information between main memory or the CPU and the outside world. Figure 5.1 shows a model of an I/O system, which includes an I/O module that works as a mediator to communicate between the high-speed CPU or memory and the much slower electro-mechanical devices of different types, connected by well-defined links (buses). This link is used to communicate control, status, and physical data between the I/O module and the external devices, as shown in Figure 5.1, for realizing the desired I/O operation. Each of the different types of devices connected to the system must have a control logic of its own that manages the operation of the device as directed by the I/O module. The device should also have the necessary electronic circuits, in the form of a transducer, to convert the I/O data into appropriate forms for execution. Usually, a small buffer is associated with the transducer to temporarily hold the data during the data transfer operation to support smooth coordination between the slower I/O devices and the relatively faster memory or CPU.

Nowadays, the I/O organisation is considered one of the key parameters characterizing a particular generation of computers, which in turn determines the class and category of the computer and the devices attached to it. In fact, the differences between a small (micro, mini) and a large (supermini, mainframe, supercomputer) system, besides other characteristics, mostly depend on the capability and capacity of its attached I/O modules, the amount of hardware available to communicate with its peripherals, and also the number of peripherals that can be hooked up to it.

I/O Module: I/O Interface

The primary responsibility of the I/O module and its associated interfaces is to resolve the differences that exist between the central computer system and the different types of peripheral devices attached to it (mainly due to devices with various characteristics having different types of operations, numerous data formats and codes, and wide disparities in data transfer rate with the CPU and memory). To discharge this responsibility, a set of useful functions is executed by the I/O module (I/O interface) at the time of data transfer; these are twofold in nature: CPU-I/O module handshaking and I/O module-device negotiation. CPU-I/O communication is mostly carried out via memory, which is located centrally (Figure 5.1), supporting the CPU on the one hand and communicating with the I/O module on the other. Fundamentally, there exist many different methods, consisting of various arrangements of different sets of lines (buses), that connect the I/O modules with the memory in a system. Of course, each of these methods has its own performance merits and drawbacks, and the related cost implications also vary over a wide range.

A brief detail of I/O module, its different functions, and its communication with memory is given in the website: http://routledge.com/9780367255732.

I/O Module Design

The ultimate target of I/O module design is to relieve and release the CPU, as far as possible, from the tedious burden of I/O activities (non-intelligent activity); hence, this design varies considerably depending on the amount of intelligence injected into an I/O module, that is, on how much of an entire I/O operation it can bear independently using the different types of external peripherals attached to it. The I/O logic within this module interacts with the CPU (or the rest of the computer system), hiding the details of the devices, via a set of control lines (e.g. system bus lines), as shown in Figure 5.2. These lines are also used by the CPU to communicate with I/O modules in the form of issuing commands. Other control lines attached to the I/O logic are used to control and drive the peripherals. In addition, every I/O module is also equipped with a set of addresses; each address in the set is the address (identification) of a peripheral attached to the module.


FIGURE 5.2

Schematic I/O module between CPU (rest of the system) and devices.

If an I/O module is quite primitive and needs detailed control from the CPU during I/O operation, this type of I/O module is usually referred to as an I/O controller or device controller, which again may be of various types depending on its intelligence and capabilities, and on the extent to which it requires assistance from the CPU and its involvement in the execution of I/O operations. If the I/O module is equipped to handle most of the detailed I/O processing burden on its own, setting the CPU aside, relieving the CPU totally from the headache of I/O operations, and presenting to the CPU a high-level interface, then this type of I/O module is generally referred to as an I/O channel or I/O processor.

A brief detail of I/O module design is given in the website: http://routledge. com/9780367255732.

Types of I/O Operations: Definitions and Differences

I/O organisations (I/O systems) are usually distinguished by the extent to which the CPU is involved in the execution of I/O operations and is relieved from I/O activities. The CPU merely executes the I/O instructions and may accept the data temporarily, but the ultimate transfer of information to and from external devices involves the memory unit. That is why, the data transfer between the central computer and I/O devices may be handled in a variety of ways, mainly, either between an I/O device and the CPU or between an I/O device and main memory. This gives rise to four different I/O schemes, which are in current use:

A. Programmed I/O (PIO);

B. Interrupt-driven I/O;

C. DMA;

and the ultimate target is to find a technique that completely frees the CPU from any involvement in I/O operations, with the introduction of

D. IOP or I/O channel.

Programmed I/O (Using Buffer)

If I/O operations are completely controlled by the CPU (i.e. the CPU executes the programs that initiate, direct, and terminate the I/O operations), the computer is said to be using PIO. Here, data is exchanged directly between the CPU and the I/O module using the accumulator and the buffer register connected to the selected I/O device. The CPU is totally involved in this I/O activity, initiating, controlling, and executing the entire I/O operation (excluding the physical read/write from and to devices), including memory interactions, until the entire I/O is complete. This, however, causes a huge waste of costly (unproductive) CPU time. Each I/O device connected to the system via I/O modules has a unique address or identifier which the CPU uses when it issues an I/O command. Each I/O module must then interpret the address lines to determine whether the command is for itself.

Under this scheme, I/O devices, main memory, and the CPU are normally connected via a common shared bus (system bus). This gives rise to two possible modes of I/O addressing: (i) memory-mapped I/O and (ii) I/O-mapped I/O (isolated I/O). In memory-mapped I/O, the address lines of the system bus that are used to select main memory locations can also be used to select I/O devices. Each junction between the system bus and an I/O device is called an I/O port and is assigned a unique address. The I/O port includes a data buffer register, which is here a part of the main memory address space (Motorola 68000 series); that is why it is called memory-mapped I/O. A memory reference instruction that causes data to be fetched from or stored at address X automatically becomes an I/O instruction if X is the address of an I/O port. For most CPUs of this type, a relatively large set of different instructions is available for referencing memory. An advantage of memory-mapped I/O is that this large repertoire of instructions can also be used for I/O activities, thereby enabling more efficient programming.

The second strategy is sometimes called I/O-mapped I/O (or isolated I/O), where the memory and I/O address spaces are kept totally separate. Here, the I/O port includes a data buffer register associated with the I/O module (devices), and memory reference instructions do not affect the I/O devices; separate I/O instructions are required to activate the I/O. Consequently, an I/O device and a main memory location may have the same address. If the I/O-mapped I/O (isolated I/O) strategy is used, there are only a few I/O instructions. This scheme is used in the INTEL 8085 (using the 8255A chip) and 8086 series of processors.

More details of PIO, and a comparison between the two modes, are given in the website: http://routledge.com/9780367255732.


Page 2

The primary disadvantage of PIO is that the CPU is totally involved in the slow I/O operation and spends most of its time idle, a condition called busy waiting. The way to get rid of busy waiting is to have the CPU issue an I/O command to an I/O module as usual to start the I/O device, and tell the I/O module to generate an interrupt when it is done. While the I/O module works, the CPU is no longer involved in this activity and can proceed to do some other useful work. The I/O module will interrupt the CPU at the right time to request service when it is ready to exchange data with the CPU. The CPU then gets involved again, leaving its own ongoing processing, executes the data transfer as usual, and afterwards goes back to resume its former processing. This approach, known as interrupt-driven I/O, can be accomplished by setting the INTERRUPT ENABLE bit in a device register; the software then expects the hardware to give it a signal when the I/O module requests service.
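The pattern can be sketched roughly as follows, with the device's data register and the arrival of interrupts simulated in software; all names, sizes, and the simulation loop are illustrative assumptions, not from a specific chip:

    #include <stdio.h>
    #include <stdint.h>

    #define BUF_SIZE 8

    /* Simulated device data register; on real hardware this would be a volatile
       pointer to the I/O module's data register.                               */
    static uint8_t dev_data_reg;

    static volatile uint8_t rx_buffer[BUF_SIZE];
    static volatile unsigned rx_count;

    /* Interrupt service routine: entered (via the interrupt vector) each time
       the device signals that one word is ready; it moves the word and returns
       so the CPU can resume its interrupted work.                              */
    void device_isr(void)
    {
        if (rx_count < BUF_SIZE)
            rx_buffer[rx_count++] = dev_data_reg;
    }

    int main(void)
    {
        /* The CPU sets INTERRUPT ENABLE and carries on with other work; here we
           simply simulate eight interrupts arriving one after another.         */
        for (uint8_t word = 1; word <= BUF_SIZE; word++) {
            dev_data_reg = word;   /* device deposits the next word ... */
            device_isr();          /* ... and raises an interrupt       */
        }
        printf("received %u words via interrupt-driven I/O\n", rx_count);
        return 0;
    }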

Although interrupt-driven I/O is more efficient, and is a big step forward from PIO because it eliminates needless waiting by the CPU, it is still far from optimal. The problem is that an interrupt occurs, and the CPU must get involved, for every word (character) of data that goes from memory to the I/O module or from the I/O module to memory, so interrupt processing becomes expensive. However, the Intel 8259A chip supports interrupt-driven I/O and can handle up to 8 I/O modules, and a cascade arrangement can extend it to handle up to 64 modules.

A brief detail of interrupt-driven I/O and various forms of interrupt implementations are given in the website: http://routledge.com/9780367255732.

Interrupt-Driven I/O: Design Issues

In implementing interrupt-driven I/O, two aspects in the design are to be considered.

a. There will almost invariably be multiple I/O modules connected to the system. How would the CPU determine which device (source) issued the interrupt?

b. If multiple interrupts have occurred, how would the CPU decide the order of their processing (priority)?

These two aspects are logically linked with each other because multiple interrupting devices may be attached to a single I/O module or to multiple I/O modules. The first task of the interrupt system is to identify the source of the interrupt. There is also a high probability that several sources request interrupt services simultaneously, or that while an interrupt routine is running, a second I/O device wants to generate its interrupt. The solutions consist of techniques commonly in use, which fall broadly into four general categories:

i. Multiple interrupt lines (independent requesting);

ii. Software poll;

iii. Daisy Chaining (vectored, hardware poll);

iv. Bus arbitration (vectored).

5.3.2.1.1 Multiple Interrupt Lines

Providing multiple interrupt lines between the CPU and I/O modules is probably the most straightforward approach towards the solution of this problem. This method also corresponds to independent requesting of interrupts issued from devices. This scheme actually uses separate BUS REQUEST and BUS GRANT lines for each unit sharing the bus. The bus control unit immediately identifies all requesting units individually and is able to respond very rapidly to requests for bus access. Priority is also determined by the bus control unit, and may be programmable. While this method has some distinct advantages, it also suffers from severe drawbacks. The PDP-11 Unibus system has implemented this technique (bus control) by combining this independent requesting with a daisy chaining (hardware) approach (discussed next).

A brief detail of the different design approaches of interrupt-driven I/O is given in the website: http://routledge.com/9780367255732.

5.3.2.1.2 Priority and Level: Its Determination

The categories (ii), (iii), and (iv) mentioned above are significant when several devices (masters and/or slaves) connected to a shared bus (common bus) simultaneously issue interrupts, requesting access to the CPU at the same time. The problem of selecting one I/O device to service, from many such devices that have generated interrupts, bears a strong resemblance to the bus arbitration process for bus control. In fact, these categories correspond to essentially different selection mechanisms that make an appropriate choice, considering the priority of each individual interrupt being issued, while negotiating such concurrent competing requests. Priority among simultaneous interrupts can be realized by techniques implemented in software, hardware, or a combination of both.

5.3.2.1.2.1 Polling (Software Approach) The software method to identify the highest-priority interrupt is realized by a polling procedure. In this method, there is one common branch address for all interrupts. The program that takes care of interrupts begins at the branch address and polls the interrupt sources in the order of their priority. The highest-priority source is tested first, and if its interrupt signal is on, control branches to a service routine for this source; otherwise, the next-lower-priority source is tested, and so on. The particular service routine thus reached belongs to the highest-priority device among all devices that interrupted the computer at that instant. As depicted in Figure 5.3, this method uses a set of lines called poll count lines which are directly connected to all units in parallel on the


FIGURE 5.3

Polling priority interrupt.

bus. When a unit requests access to the bus, it puts a signal on the common BUS REQUEST line. In response to the received signal, the bus controller proceeds to generate a sequence of numbers on the poll count lines. These numbers, which can be thought of as device addresses, are compared by each unit with the unique address already assigned to that unit. When a requesting device Ij finds that its address matches the number on the poll count lines, it activates the BUS BUSY line. The bus controller responds by terminating the polling process, and Ij connects to the bus. The priority of a unit, however, is clearly determined by the position of its address in the polling sequence, which can be altered under program control, and hence does not depend on the physical position of the device unit on the bus.
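In software terms, polling amounts to testing each source's status flag in fixed priority order and branching to the first one found asserted. A minimal sketch, with the status flags and service routines modelled as plain variables and functions (all names and the four-device count are assumptions):

    #include <stdio.h>
    #include <stdbool.h>

    #define NUM_DEVICES 4

    /* Status flags as the poll routine would see them (index 0 = highest priority). */
    static bool pending[NUM_DEVICES];

    static void service_device(int device)
    {
        printf("servicing device %d\n", device);
        pending[device] = false;        /* acknowledge / clear the request */
    }

    /* Common interrupt entry point: test each source in priority order and
       branch to the service routine of the first one found asserted.       */
    void poll_interrupt_sources(void)
    {
        for (int i = 0; i < NUM_DEVICES; i++) {
            if (pending[i]) {
                service_device(i);
                return;                 /* one source serviced per entry */
            }
        }
    }

    int main(void)
    {
        pending[2] = true;              /* two simultaneous requests ...          */
        pending[1] = true;
        poll_interrupt_sources();       /* device 1 (higher priority) is selected */
        return 0;
    }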

Advantages

i. The sequence of numbers generated by the bus controller to match the unique unit address is normally programmable (the poll count lines are connected to a programmable register); hence, selection priority can be altered under program control;

ii. A failure in one unit need not affect any of the other units.

Disadvantages

i. The cost of more control lines (k poll count lines instead of one BUS GRANT line) to achieve flexibility (independence) over the connected units;

ii. The number of units that can share the bus is limited by the addressing capability of the poll count lines;

iii. If there are many interrupts, the time required to poll them may exceed the time permitted to service the I/O device. In this situation, a hardware-priority-interrupt unit can be used to speed up the operation.

The Intel 8259A is one such programmable interrupt controller, used with the respective CPUs, that single-handedly manages prioritized interrupts issued by numerous devices.

5.3.2.1.2.2 Daisy Chaining (Hardware-Serial Connection) The hardware-based priority function can be established by either a serial or a parallel connection of interrupt lines. The serial connection is also known as the daisy chaining method. Under this interrupt system environment, there is a hardware-priority-interrupt unit that functions as an overall manager performing the responsibility of:

i. Accepting interrupt requests from many sources;

ii. Determining which of the incoming requests has the highest priority;

iii. Accordingly issuing an interrupt request to the computer, based on this determination.

As shown in Figure 5.4, all the devices in this system are connected to a common BUS REQUEST line. The unit requesting the bus service issues an interrupt to activate this line by sending a signal to the bus control unit. The bus control unit responds to a BUS REQUEST signal only if BUS BUSY is inactive. This response takes the form of a signal placed on the BUS GRANT line. On receiving the BUS GRANT signal, the requesting unit enables its physical bus connections and activates BUS BUSY for the duration of its new bus activity. Since each interrupt source (device) has its own interrupt vector, it can then directly access its own interrupt service routine. As the BUS GRANT line is connected serially from unit to unit, if two units simultaneously request bus access, the one closest to the bus control unit receives the BUS GRANT signal first and gains access to the bus. Selection priority is therefore completely determined by the order in which the units are linked together (chained) over the BUS GRANT line. The device with the highest priority is placed in the first position (closest to the bus control unit), followed by lower-priority devices up to the device with the lowest priority, which is placed last in the chain.
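The selection rule can be sketched as a walk along the chain: the grant enters at the unit nearest the bus control unit and stops at the first unit that is requesting, so chain position alone fixes priority. This is a toy model under assumed names, not tied to any specific bus:

    #include <stdio.h>
    #include <stdbool.h>

    #define CHAIN_LENGTH 5

    /* requesting[i] is true if the unit in chain position i has raised
       BUS REQUEST; position 0 is closest to the bus control unit.       */
    int propagate_bus_grant(const bool requesting[CHAIN_LENGTH])
    {
        for (int i = 0; i < CHAIN_LENGTH; i++) {
            if (requesting[i])
                return i;    /* this unit absorbs the grant and takes the bus */
            /* otherwise the grant is passed on to the next unit in the chain */
        }
        return -1;           /* no unit requested; the grant goes unused */
    }

    int main(void)
    {
        bool requesting[CHAIN_LENGTH] = { false, false, true, false, true };
        printf("bus granted to unit %d\n", propagate_bus_grant(requesting));
        return 0;
    }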

Advantages

i. Daisy Chaining is much faster than any software control and requires very few control lines employing a very simple arbitration algorithm;

ii. It can be used essentially with an unlimited number of devices.


FIGURE 5.4

Daisy chain priority interrupt.

Disadvantages

i. Since priority is wired-in, the priority of each unit cannot be changed under program control, hence offering less flexibility;

ii. If bus requests are generated at a sufficiently high rate, a high-priority device can lock out a low-priority device;

iii. There is a susceptibility to failures involving the BUS GRANT line and its associated circuitry. If the first device is unable to propagate the BUS GRANT signal, then all other devices cannot gain access to the bus.

5.3.2.1.2.3 Bus Arbitration Technique This technique makes use of vectored interrupts. With bus arbitration, an I/O module must first gain control of the bus before it can raise the interrupt request line; thus, only one module can raise the line at a time. When the CPU detects the interrupt, it responds on the interrupt acknowledge line. The requesting module then places its vector (the address of its service routine) on the data lines, allowing the CPU to easily identify the source.
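With vectored interrupts the CPU does not search at all: the vector supplied by the module simply indexes a table of service-routine addresses. A tiny sketch of that dispatch, where the table contents, vector numbers, and device names are illustrative assumptions:

    #include <stdio.h>

    #define NUM_VECTORS 8

    static void disk_isr(void)     { printf("disk interrupt serviced\n"); }
    static void keyboard_isr(void) { printf("keyboard interrupt serviced\n"); }

    /* Interrupt vector table: the vector placed on the data lines by the
       requesting module selects its service routine directly.            */
    static void (*vector_table[NUM_VECTORS])(void) = {
        [3] = disk_isr,
        [5] = keyboard_isr,
    };

    void dispatch_interrupt(unsigned vector)
    {
        if (vector < NUM_VECTORS && vector_table[vector])
            vector_table[vector]();     /* jump straight to the right ISR */
    }

    int main(void)
    {
        dispatch_interrupt(5);          /* module placed vector 5 on the bus */
        return 0;
    }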

A brief detail of the different design approaches of interrupt-driven I/O as discussed is given in the website: http://routledge.com/9780367255732.


Page 3

The problem still remains: interrupts are required at different stages, more for PIO and relatively fewer for the more efficient interrupt-driven I/O, and interrupt processing, done only by the CPU, is expensive. In addition, in both cases all data transfers involve the CPU and must be routed through the CPU. As a result, costly CPU time is badly wasted, which adversely affects CPU activity and also limits the I/O transfer rate. A befitting mechanism is thus needed to relieve and release the CPU, to a large extent, from these time-consuming I/O-related activities; this can be accomplished simply by letting the peripheral devices themselves directly manage the memory buses without the intervention of the CPU. This would definitely improve not only the speed of data transfer but also the overall performance of the system, as the CPU is now released to carry out its own useful work.

Definition

With the development of hardware technology, the I/O device (or its controller) can be equipped with the ability to directly transfer a block of information to or from main memory without the CPU's intervention. This requires that the I/O device (or its controller) is capable of generating memory addresses and transferring data to or from the system bus: i.e. it must then be a bus master. The CPU is still responsible for initiating each block transfer, and the I/O device controller can then carry out the physical data transfer without further program execution by the CPU. The CPU and I/O controller interact only when the CPU must yield control of the system bus to the I/O controller in response to a request issued by the controller. This request is in the form of an interrupt, which is serviced by the CPU. After servicing the interrupt, the CPU is no longer involved and can go back to resume the execution of its previously ongoing program. Control of the bus now rests with the controller (the requesting device), which completes the required number of cycles for the data transfer and then hands control of the bus back to the CPU. This type of I/O capability is called direct memory access (DMA). For large volumes of data, DMA transfer is found to be much faster than if it were carried out by the CPU, and is adequately efficient. To implement DMA and interrupt facilities, most computers require the system's I/O interface to contain special DMA and interrupt control units.

Essential Features

Figure 5.5 shows a block diagram indicating how the DMA mechanism works.

  • The I/O device is connected to the system bus via a special interface circuit called a DMA controller;
  • The DMA controller contains a data buffer register IODR (input-output data register) to temporarily store the data, just as in the case of PIO. In addition, there is an address register IOAR (input-output address register) and a data count register DC;
  • The IOAR is used to store the address of the next word to be transferred. It is automatically incremented after each word transfer;
  • The data count register DC stores the number of words that remain to be transferred. It is automatically decremented after each transfer and tested for zero. When the data count reaches zero, the DMA transfer halts;
  • These registers allow the DMA controller to transfer data to and from a contiguous region of main memory.
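The register set described in the list above can be modelled as a small structure, with one function performing a single word transfer and updating IOAR and DC exactly as the bullets describe. This is a sketch: the field widths, the byte-array model of main memory, and the example values are assumptions:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    /* Register set of the DMA controller (widths assumed for illustration). */
    typedef struct {
        uint8_t  iodr;   /* IODR: data buffer register                 */
        uint32_t ioar;   /* IOAR: address of the next word to transfer */
        uint32_t dc;     /* DC:   number of words remaining            */
    } dma_controller;

    /* Transfer one word arriving from the device into main memory (modelled as
       a byte array). Returns true when DC reaches zero and the transfer halts. */
    bool dma_step(dma_controller *dma, uint8_t *memory, uint8_t device_word)
    {
        dma->iodr = device_word;          /* word buffered in IODR           */
        memory[dma->ioar] = dma->iodr;    /* ... then written to main memory */
        dma->ioar++;                      /* IOAR automatically incremented  */
        dma->dc--;                        /* DC automatically decremented    */
        return dma->dc == 0;              /* tested for zero                 */
    }

    int main(void)
    {
        uint8_t memory[32] = { 0 };
        dma_controller dma = { .iodr = 0, .ioar = 8, .dc = 4 };   /* set up by the CPU */

        uint8_t word = 0x10;
        while (!dma_step(&dma, memory, word++))
            ;                             /* controller works without the CPU */

        printf("4 words landed at memory[8..11]: %02X %02X %02X %02X\n",
               memory[8], memory[9], memory[10], memory[11]);
        return 0;
    }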

FIGURE 5.5

Schematic diagram of DMA I/O.

The controller is normally provided with an interrupt capability, and in this situation, it sends interrupts to the CPU to signal the beginning and end of the data transfer. The logic necessary to control the DMA activities can easily be placed in a single integrated circuit. DMA controllers are available that can supervise DMA transfers involving several I/O devices, each with a different priority of access to the system bus.

Processing Details

When the CPU intends to receive or send (read or write) a block of data, it issues a request to the DMA module with certain information, and the DMA transfer operation then proceeds as follows:

  • The read or write control line between the processor and the DMA module is used by the CPU to intimate to the DMA whether a read or a write is requested;
  • The identification (address) of the I/O device involved is intimated by the CPU to the DMA, communicated on the data lines;
  • The CPU executes two I/O instructions, which load the DMA registers: the IOAR with the base address of the main-memory region to be used in the data transfer, and the DC with the number of words to be transferred to or from this region;
  • When the DMA controller is ready to transmit or receive data, it activates the DMA REQUEST line to the CPU. The CPU waits for the next DMA breakpoint, then relinquishes control of the data and address lines, activates DMA ACKNOWLEDGE, and gets back to its own work. In fact, DMA REQUEST and DMA ACKNOWLEDGE are essentially the BUS REQUEST and BUS GRANT lines for the system bus. Simultaneous DMA requests from several DMA controllers, if any, can be resolved by using one of the bus-priority control techniques;
  • The DMA controller now transfers data directly to or from main memory. After a word is transferred, IOAR and DC are incremented and decremented, respectively;
  • If the DC is decremented to zero, the DMA controller relinquishes control of the system bus. It may also send an interrupt signal to the CPU, and the CPU responds either by halting the I/O device or by initiating a new DMA transfer;
  • If the DC is not decremented to zero, but the I/O device is not ready to send or receive the next batch of data, the DMA controller returns control to the CPU by releasing the system bus and deactivating the DMA REQUEST line. The CPU responds by deactivating DMA ACKNOWLEDGE and resumes normal operation.

The Intel 8257 chip supports four DMA channels, by which four peripheral devices can independently request DMA data transfer at a time. The DMA controller has an 8-bit internal data buffer, a read/write unit, a control unit, and a priority-resolving unit, along with a set of registers. The Intel 8237 differs critically in architecture from the 8257 and provides better performance. It is an advanced programmable DMA controller capable of transferring a byte or a bulk of data between system memory and peripherals in either direction. A memory-to-memory data transfer facility is also available in this chip. This DMA controller can be interfaced to the 80x86 processor family with DRAM memory. Similar to the 8257, the 8237 also supports four independent DMA channels (numbered 0, 1, 2, and 3), which may be expanded to any number by cascading more 8237 chips.

Here, each channel can be programmed independently, and any one of the channels may be made individually active at any point of time. But the distinctive feature of this chip is that it provides many programmable controls and dynamic reconfigurability attributes, which eventually enhance the data transfer rate of the system remarkably.

Many CPUs, like those of the MC 68000 series, have no internal mechanism for resolving multiple DMA requests; this must be done by external logic. The DMA controller 68450 chip contains four copies of the basic DMA controller logic, which enables the 68450 to carry out a sequence of DMA block transfers without reference to the CPU. When the current data count reaches zero, a DMA channel that has been programmed for chained DMA transfer (as mentioned in Bullet 4) fetches the new values of DC and IOAR from a memory region (MR) that stores a set of DC-IOAR pairs. A special memory address register in every DMA channel holds the base address of MR.

DMA has subsequently been accepted as a standard approach, commonly used in all personal computers, minicomputers, and mainframes for carrying out I/O activities.

Different Transfer Types

Under DMA control, data can be transferred in one of the following ways:

  1. DMA block transfer: This type transfers a sequence of data words of arbitrary length in a single continuous burst while the DMA controller is the master of the system bus. Block DMA supports the maximum I/O data transmission rate, but it may cause the CPU to remain inactive for relatively long periods. Auto-initialization may be programmed in this mode. This DMA mode is particularly required by secondary memory devices like magnetic disk drives, where data transmission cannot be stopped or slowed down without loss of data, and block transfers are the norm.
  2. Cycle stealing: This approach allows the DMA controller to use the system bus interspersed with CPU bus transactions while transferring long blocks of I/O data by a sequence of DMA bus transactions. During these cycles, the CPU has to wait to get control of the bus, because DMA always has a higher bus priority than the CPU, as I/O devices cannot tolerate delays. The process of taking bus cycles away from the CPU by a DMA controller, thereby forcing the processor to temporarily suspend its operation, is called cycle stealing. Cycle stealing not only reduces the maximum I/O transfer rate, but also reduces the interference by the DMA controller in the CPU's activities. It is possible to completely eliminate this interference by designing the DMA interface so that bus cycles are stolen only when the CPU is not actually using the system bus. This is known as transparent DMA. Thus, by varying the degree of overlap between CPU and DMA operations, it is possible to accommodate many I/O devices having different data-transfer characteristics.
  3. Demand transfer: In this mode, the device continues transfers until the DC (count) reaches zero, or an external condition (end of process) is detected, or the DMA REQUEST signal goes inactive.
  4. Cascading: In this mode, more than one 8237 can be connected level-wise to the host 8237 to provide more than four DMA channels. The priorities of the DMA requests, however, may be preserved at each level.

Implementation Mechanisms: Different Approaches

The DMA mechanisms can be implemented in a variety of ways.

  1. Here, the DMA module and the I/O devices individually share the system bus with the CPU and memory, as shown in Figure 5.6. The DMA module acts here as a surrogate (a substitute) for the CPU. The DMA controller uses PIO to exchange data between memory and an I/O device. Like CPU-controlled PIO, this approach also requires two bus cycles for each transfer of a word. This configuration, while it may look inexpensive, is clearly inefficient;
  2. This drawback of consuming more bus cycles at the time of data transfer can be reduced substantially if the DMA and I/O functions are integrated. This means that there is a separate path between the DMA module and one or more I/O modules that does not include the system bus. This is shown in Figure 5.7. Here, the DMA logic may be a part of an I/O module or may be a completely separate module that controls one or more I/O modules;
  3. The approach mentioned in (2) can be taken one step further by connecting I/O modules to the DMA module using an I/O bus. The transfer of data between the DMA and I/O modules can then take place off the system bus, and the system bus will be used by the DMA module only at the time of exchanging data with memory. This approach reduces the number of I/O interfaces in the DMA module to one, and at the same time offers an easily expandable configuration. Figure 5.8 shows a schematic design of this approach.

FIGURE 5.6

Single bus: DMA detached from I/O.


FIGURE 5.7

Single bus: DMA-I/O integrated.


FIGURE 5.8

DMA-I/O with separate I/O bus.

Introduction

While the introduction of a DMA controller in the I/O module is a radical breakthrough, it is, after all, not able to totally free the CPU. Moreover, DMA sometimes uses many bus cycles at a time, as in the case of a disk I/O, and during these cycles the CPU has to wait for bus access (as DMA always has a higher bus priority), which summarily restricts the performance from attaining the desired level. Further development is thus targeted at an enhanced I/O module that can control the entire I/O operation on its own, setting the CPU totally aside. This type of I/O module, which almost fully relieves the CPU from the burden of I/O execution, is often referred to as an I/O channel. The final and ultimate approach is then to convert this I/O channel into a full-fledged processor so that the CPU can be relieved almost totally. This is accomplished by including a local memory in this I/O channel so that it can manage a large set of different devices with minimal or almost no involvement of the CPU. This module then basically consists of a local memory attached to a specialized processor, together with the I/O devices, so that the unit as a whole itself becomes a stand-alone computer. An I/O module having this kind of architecture is known as an I/O processor (IOP). An IOP can perform several independent data transfers between main memory and one or more I/O devices without recourse to the CPU. Usually, an IOP is connected to the devices it controls by a separate bus system called the I/O bus or I/O interface. It is not uncommon for larger systems to use small computers as IOPs, which are primarily communication links between I/O devices and main memory, and hence the use of the term channel for IOP. IOPs are also called peripheral processing units (PPUs) to emphasize their subsidiary role with respect to the CPU.

A channel or IOP is essentially a dedicated computer with its own instruction set processor that independently carries out entire I/O operations along with other processing tasks, such as arithmetic, logic, branching, and code translation, required mostly for I/O processing. The CPU only initiates an I/O transfer by instructing the I/O channel to execute a specific program available in main memory, and then the CPU moves on to its own work. This program indicates the device or devices to be used, the area or areas of memory for storage, the priority, and the actions to be taken in case of certain error conditions. The I/O channel uses this information and executes the entire I/O data processing, while the CPU is fully devoted to its own work in parallel. When the I/O activity is over, the channel interrupts the CPU and sends all the related necessary information. Traditionally, the use of I/O channels has been associated with mainframe or large-scale systems attached to a large number of peripherals (disks and tape storage devices), which are used simultaneously by many different users in multitasking as well as in on-line transaction processing (OLTP) environments handling bulk volumes of data. As the development of chip-based microprocessors has dramatically progressed, the use of I/O channels has now extended to minicomputers and even to microcomputers. However, the fully developed I/O channel is best studied on the mainframe system, and possibly the best-known example in this regard is the flagship IBM/370 system.
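The idea of "execute a specific program available in main memory" can be sketched as a list of simple channel commands that the IOP walks through on its own. The command format below is hypothetical and used only for illustration; it is not the actual IBM/370 channel command word layout:

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    typedef enum { CH_READ, CH_WRITE, CH_SENSE, CH_STOP } ch_opcode;

    /* One hypothetical channel command: what to do, where in memory, how much. */
    typedef struct {
        ch_opcode op;
        uint32_t  mem_address;
        uint16_t  count;
        bool      chain;       /* continue with the next command when done */
    } ch_command;

    /* The IOP interprets the channel program without CPU involvement and raises
       a single interrupt to the CPU only when the whole program has finished.   */
    void run_channel_program(const ch_command *program)
    {
        for (const ch_command *c = program; ; c++) {
            printf("IOP: op=%d, %u bytes at address 0x%X\n",
                   (int)c->op, (unsigned)c->count, (unsigned)c->mem_address);
            if (c->op == CH_STOP || !c->chain)
                break;
        }
        printf("IOP: done, interrupting CPU with status\n");
    }

    int main(void)
    {
        ch_command prog[] = {
            { CH_READ,  0x1000, 512, true  },   /* read a block into memory */
            { CH_WRITE, 0x2000, 256, false },   /* then write one and stop  */
        };
        run_channel_program(prog);
        return 0;
    }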

I/O Channel

The IOP in the IBM 370 system is commonly called a channel. A typical computer system may have a number of channels, and each channel may be attached to one or more similar or different types of I/O devices through I/O modules. Three types of channels are in common use: the selector channel, the multiplexor channel, and the block multiplexor channel (a hybrid of the features of the multiplexor and selector channels). The interface used from an I/O module (channel) to a device (i.e. a device controller along with a device) is either serial or parallel. Although a parallel interface was traditionally the common choice for high-speed devices, with the emergence of next-generation advanced high-speed serial interfaces, parallel interfaces have eventually lost their importance to a considerable extent and hence are now much less common. However, the I/O channel is best implemented in the mainframe system, and possibly the best-known example in this regard is the flagship IBM/370 system.

A brief detail of I/O channel along with its different types, and its implementation in IBM/370 system, is given with figure in the website: http://routledge.com/9780367255732.

I/O Processor (IOP) And Its Organisation

The I/O channel has finally been promoted to a full-fledged IOP using a mechanism (already described in the "Introduction", Section 5.3.4) to make the CPU almost totally free from any I/O activities. The handshaking between the CPU and IOP at the time of establishing communications may take different forms depending mostly on the particular configuration of the computer being used. However, the memory unit in most cases acts as a mediator providing message centre (input-output communication region (IOCR)) facilities, where each processor leaves information for the other to follow. This is one form of indirect handshaking. Direct handshaking between the CPU and IOP is generally done through dedicated control lines. Standard DMA or bus grant/acknowledge lines are also used for arbitration of the system bus between these two processors. Figure 5.9 illustrates a schematic block diagram of a representative system containing an IOP. A sequence of steps is then required to be followed at the time of CPU-IOP interaction and communication for the needed information exchange.

During IOP operation, the CPU is free and executes its own tasks with other programs. There may be a situation when the CPU and the IOP both compete with each other to get simultaneous memory access, and hence, the IOP is often restricted to have only a limited number of devices so that the number of memory accesses can be minimized. In the case of operation of a slower device, this may even lead to a situation of memory-access saturation, since I/O operations use DMA, and the CPU may then have to wait during this transfer, which may cause a notable degradation in CPU performance.

All the Intel CPUs have explicit I/O instructions to read or write bytes, words, or longs. These instructions specify the desired I/O port number, either directly as a field within the


FIGURE 5.9

Block diagram of a representative system containing an IOP.

instruction or indirectly by using the register DX to hold it. In addition, of course, DMA chips (Intel 8257/8237) are frequently used to relieve the CPU of the burden of handling I/O. None of the Motorola chips have I/O instructions. It is expected that the I/O device registers will be addressed via memory mapping. Here too, DMA is widely used.

A brief detail of IOP along with its working is given with figure in the website: http://routledge.com/9780367255732.


Page 4

Three fundamental resources of a computer, namely the CPU, memory, and I/O, are interconnected via a collection of parallel wires called a bus. A bus is actually a communication medium connecting two or more resources. A key characteristic of a bus is that it is a shared transmission medium, on which multiple devices can be connected, and a signal issued by any one of these attached devices is available for reception by all other devices attached to the bus. A bus may consist of a single line, over which a sequence of binary digits (bits) can be transmitted serially, one bit at a time. A bus may also consist of multiple lines, which can be used in parallel to transmit several binary digits at a time; for example, 8 bits of data can be transmitted simultaneously over eight bus lines. However, a number of different types of buses, introduced by different vendors and employed by different systems built with a diverse spectrum of chips, have been in widespread use from the very early days. A few notable ones from the early days are the IBM PC/AT bus, EISA (Intel 80386), Multibus (Intel), VME bus (Motorola 68000 series), Omnibus (PDP-8), Unibus (PDP/11 series), PCI bus, SCSI bus, USB, etc. Standardization in this area seems very unlikely. Recent developments in modern bus technologies, however, have given rise to much faster and more versatile buses, such as the FireWire serial bus and the InfiniBand serial bus.

Bus Structure

A bus that connects major computer components such as the CPU, memory, and I/O is called a system bus. It typically consists of 50-100 parallel copper lines providing different functions, etched onto the motherboard, with connectors spaced at regular intervals for plugging in memory, I/O, and other add-on cards. Although there are many different bus designs, on any bus these lines can be classified into three basic functional groups: data lines, address lines, and control lines. In addition, there may be power distribution lines that supply power to the attached modules. Each of these groups carries various types of signals indicating the respective operations to be performed.

Bus Arbitration

Normally, in a computer, the CPU is the bus master, having control of the bus most of the time. In reality, an I/O module may become bus master while reading or writing directly to memory. A co-processor may also need to become bus master at a certain point of time. Since only one unit at a time is allowed to gain control of the bus (to be the bus master), it is essential to have some mechanism in place to prevent chaos when different resources attempt to access the bus simultaneously. This mechanism is known as bus arbitration.

Bus Protocol

At the time of designing a resource (e.g. a processor), designers have the liberty to use any kind of bus they want inside the chip, but the situation is different when a third party designs a circuit board on which the said resource will be placed on the system bus. Such designers must abide by certain well-defined rules concerning bus operation, the characteristics of the devices that may be attached to the bus, and other areas relating to bus usage. These rules are called the bus protocol.

Bus Design Parameters

Although a variety of different bus implementation techniques exist, there are a few basic parameters or design elements that serve to classify and differentiate buses. The key elements to be considered are as follows:

a. Type: This element may be (i) dedicated and (ii) multiplexed (time multiplexing);

b. Method of arbitration: This element may be (i) centralized and (ii) distributed;

c. Timing: This element may be (i) synchronous and (ii) asynchronous;

d. Bus width: This element may be (i) address bus and (ii) data bus;

e. Data transfer type: This element may be (i) read, (ii) write, (iii) read-modify-write, (iv) read-after-write, and (v) block.

Bus Interfacing: Tri-State Devices

A bus line represents a logic path with a potentially very large fan-in and fan-out, having many devices of different types connected to it. One of these devices is active (master) at any instant and can initiate bus transfers, whereas the others are passive (slaves) and wait for requests. When the CPU orders a disk controller to read or write a block, the CPU is acting as a master and the disk controller is acting as a slave. However, later on, the disk controller may act as a master when it commands the memory (slave) to accept the words it is reading from the disk drives. Several such master/slave combinations are possible; some typical configurations are shown in Table 5.1. It is interesting to note that in no situation can memory ever be a bus master.

The binary signals that computer resources yield are mostly not strong enough to power a bus, especially if the bus is relatively long or has many devices on it. For this reason, most bus masters are connected to the bus by a chip called a bus driver, which is essentially a digital amplifier. Similarly, most slaves are connected to the bus by a bus receiver. For devices that can act as both master and slave, a combined chip called a bus transceiver is used. These bus interface chips are often tri-state devices, to allow them to float (disconnect) when they are not needed, or are hooked up in a somewhat different way, called open collector, which provides a similar effect. Tri-state devices are those which can output 0, 1, or neither of these, z (open circuit). When two or more devices on an open collector line assert the line at the same time, the result is the Boolean OR of all the signals. This arrangement is often called wired-OR. On most buses, some of the lines are tri-state and others, which need the wired-OR property, are open collector. If two or more resources want to become bus master at the same time, bus arbitration is required to prevent this chaos by promptly making or breaking the connection of the competing resources with the

TABLE 5.1

Different Combinations of Master/Slave Configuration

Master          Slave           Example
CPU             Memory          Fetching instructions and data
CPU             I/O             Initiating data transfer
CPU             Co-processor    Handling of floating-point instructions
I/O             Memory          DMA
Co-processor    Memory          Fetching operands

bus using a tri-state device (non-inverting buffer) when required. Tri-state devices, when used in the design of shared buses, exhibit many distinct advantages, all of which have already been discussed in detail in Chapter 2.
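The tri-state and wired-OR behaviour just described can be modelled with three signal values. The resolution functions below are a toy software model of what the bus does electrically (0, 1, or Z for a floating output); the names and the example in main are assumptions for illustration:

    #include <stdio.h>

    typedef enum { SIG_0, SIG_1, SIG_Z } signal;   /* Z = high-impedance (floating) */

    /* Tri-state driver: drives the line only when enabled, else floats. */
    signal tri_state(int enable, int value)
    {
        return enable ? (value ? SIG_1 : SIG_0) : SIG_Z;
    }

    /* Open-collector (wired-OR) line: the result is the Boolean OR of every
       signal actually asserted; floating outputs do not contribute.        */
    signal wired_or(const signal outputs[], int n)
    {
        signal line = SIG_0;
        for (int i = 0; i < n; i++)
            if (outputs[i] == SIG_1)
                line = SIG_1;
        return line;
    }

    int main(void)
    {
        signal outs[3] = { tri_state(0, 1), tri_state(1, 1), tri_state(1, 0) };
        printf("line level = %d\n", wired_or(outs, 3));   /* prints 1 */
        return 0;
    }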

A brief detail of bus organisation using tri-state buffer is given with figure in the website: http://routledge.com/9780367255732.

Some Representative Bus Systems of Early Days

Some notable and worthy bus systems introduced by reputed giant vendors of early days with their own specifications are IBM PC/AT Bus (for PC with Intel 80286), IBM PS/2, The EISA (for PC with Intel 80386 and onwards), Intel MULTIBUS (Multibus I and Multibus II), Motorola VME bus, DEC PDP UNIBUS, DEC VAX SBI, and MASSBUS.

A more detailed description of each of these is, however, given in the website: http://routledge.com/9780367255732.

PCI (Peripheral Component Interconnect): Local Bus

New approaches in bus design gradually evolved, with the emergence of PCI, to make full use of constantly emerging advanced technology and also to fulfil the ever-increasing needs of the computing environment. It was developed by Intel (1992) for the Pentium processor as a truly processor-independent, low-cost, and high-performance local bus with a bus speed up to 133 MHz for interconnecting chips, expansion boards, and processor/memory subsystems, thereby replacing earlier bus architectures such as EISA, VL, and Micro Channel. Devices connected to the PCI bus appear to the processor as if they were directly connected to the processor bus, and they are assigned addresses in the memory address space of the processor. PCI provides a general-purpose set of functions, and the design was able to accommodate emerging high-speed devices (like disks) as well as to support both the single and the specialized needs of multiple-processor-based systems having a variety of microprocessor-based configurations. PCI was later adopted as an industry standard administered by the PCI Special Interest Group ("PCI SIG"), which extended its definition to also characterize it as a standard expansion bus interface connector for add-in boards; as such, it has maintained its position as an industry standard for about two decades since its inception.

A schematic diagram of a typical use of the PCI bus with its interconnection to different resources in a single-processor system is shown in Figure 5.10a. Here, a combined DRAM controller and bridge to the PCI bus provides a tight coupling with the processor that can deliver data at high speeds. The bridge acts as a data buffer that negotiates the speed disparity between the PCI bus and the processor's I/O capability. In a multiprocessor system, the main system bus supports processors, caches, main memory, and the PCI bridges. These PCI bridges may be connected with one or more PCI configurations on the other side, as shown in Figure 5.10b. The presence of the bridge not only keeps the PCI bus independent of the processor speed as usual, but also provides a rapid transmission of data. In addition, a PCI-to-PCI bridge mechanism has been defined by PCI SIG where bridges are ASICs (application-specific integrated circuits) that electrically isolate two PCI buses while allowing bus transfers to be forwarded from one bus to another. Each bridge device has a primary PCI bus and a secondary PCI bus. Multiple bridge devices may be cascaded to create a system with many PCI buses. In some processors, like Compaq Alpha, the PCI processor bridge circuit is built on the processor chip itself, thereby further simplifying system design and packaging.


FIGURE 5.10

A schematic representation of PCI configuration: (a) use of PCI bus in a typical stand-alone desktop system and (b) use of PCI bus in a typical server system.

One of the salient features that PCI pioneered is its plug-and-play capability for connecting I/O devices. When a new device needs to be added to the system, only the device interface board has to be plugged into one of the slots on the bus provided on the motherboard. The software then entirely bears the rest of the responsibility for its operation. PCI, however, requires very few chips to configure it as a 32-bit or a 64-bit bus, and easily supports other buses attached to it. The PCI interface is available for a wide range of I/O devices and is in use in systems based on many other processor families, including SUN.

Some of the key activities relating to the major characteristics of PCI bus are described one after another in brief in the website: http://routledge.com/9780367255732.