Operating System, Security and Networking: Distributed Applications

We can use either of the following two approaches in designing a distributed application.

Communication-Oriented Design: Begin with the communication protocol. Design a message format and syntax. Design the client and server components by specifying how each reacts to incoming messages and how each generates outgoing messages.
Application-Oriented Design: Begin with the application. Design a conventional application program to solve the problem. Build and test a working version of the conventional program, then divide it into two or more pieces and add communication protocols that allow each piece to execute on a separate computer.

Semantics of Applications:

Normal Application: A main program may call procedures defined within the program (proc A in this case). On return from the procedure, the program continues. The procedure (proc A) may itself call other procedures (proc B in this case). Refer to the figure below:

Distributed Application: A client program executing on machine 1 may call a procedure (proc A) which is defined and run on another machine (we say that machine 2 is the server for machine 1). Upon return from the call, the program on machine 1 continues. The server program on machine 2 may in turn act as a client and call procedures on a third machine (machine 3 is then a server for machine 2). Refer to the figure below:

Passing Arguments in Distributed Programs:

Problem: Incompatibility in argument storage

For example, some machines may use 7 bits for storing characters while others use 8 bits, and some machines may use big-endian representation while others use little-endian representation.

Possible Solutions:

One solution is to find out the architecture of the receiving end, convert the data to that architecture, and then send it. However, this leads to the following problems:

It is not easy to find out the architecture of a machine.
If the architecture of a machine changes, this information has to be conveyed to every machine that communicates with it.

Another solution is to have a standard format for networks. This may lead to inefficiency in the case when the two communicating machines have the same architecture because in this case the conversion is unnecessary.
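
The byte-order mismatch and the "standard network format" remedy can be illustrated with a minimal sketch. It assumes that the standard format is big-endian and that integers are 32 bits wide; the helper names to_network_int and from_network_int are made up for this example.

```python
import struct
import sys

def to_network_int(value: int) -> bytes:
    """Pack a signed 32-bit integer in big-endian (network) byte order."""
    return struct.pack(">i", value)

def from_network_int(data: bytes) -> int:
    """Unpack a signed 32-bit integer that arrived in big-endian order."""
    return struct.unpack(">i", data)[0]

value = 0x01020304
print("big-endian bytes:   ", struct.pack(">i", value).hex())  # 01020304
print("little-endian bytes:", struct.pack("<i", value).hex())  # 04030201
print("this machine is:    ", sys.byteorder)

# A round trip through the standard format gives back the original value,
# whatever the native byte order is on either side.
assert from_network_int(to_network_int(value)) == value
```

On a big-endian machine the conversion does no useful work, which is the inefficiency mentioned above.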

XDR (External Data Representation):

XDR was the solution adopted by Sun RPC. RPC itself was mainly the outcome of the need for a distributed file system (NFS).

Buffer Paradigm:

The program allocates a buffer large enough to hold the external representation of a message and adds items one at a time. The library routine used to set up an XDR stream over the buffer is xdrmem_create. After that, we append data to the buffer using conversion library routines such as xdr_int, which converts an integer to its external representation and appends it to the buffer. After all the data to be passed has been converted and appended, we send the buffer.

ASN.1:

ASN.1 takes a different approach: first add information describing the data being sent to the buffer, and then append the data itself. For example, to send a character followed by an integer (if the sending machine uses one byte for a char and two bytes for an integer), we send the information as: one-byte char, two-byte integer, and so on.
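
The idea of sending a description before each item can be sketched as a tag-length-value encoding. The code below is a simplified illustration, not a full ASN.1 encoder: it assumes the Basic Encoding Rules tags for INTEGER (0x02) and OCTET STRING (0x04), handles only lengths below 128 bytes, and represents the character as a one-byte OCTET STRING. All function names are hypothetical.

```python
TAG_INTEGER = 0x02       # BER universal tag for INTEGER
TAG_OCTET_STRING = 0x04  # BER universal tag for OCTET STRING

def encode_integer(value: int) -> bytes:
    """Encode an integer as tag, length, minimal two's-complement content."""
    magnitude = ~value if value < 0 else value
    length = (magnitude.bit_length() + 8) // 8
    if length >= 128:
        raise ValueError("only short-form lengths are supported in this sketch")
    return bytes([TAG_INTEGER, length]) + value.to_bytes(length, "big", signed=True)

def encode_octets(data: bytes) -> bytes:
    """Encode raw bytes as tag, length, content."""
    if len(data) >= 128:
        raise ValueError("only short-form lengths are supported in this sketch")
    return bytes([TAG_OCTET_STRING, len(data)]) + data

def decode_items(buffer: bytes) -> list:
    """Walk the buffer, using each tag and length to recover the items."""
    items, pos = [], 0
    while pos < len(buffer):
        tag, length = buffer[pos], buffer[pos + 1]
        content = buffer[pos + 2 : pos + 2 + length]
        if tag == TAG_INTEGER:
            items.append(int.from_bytes(content, "big", signed=True))
        elif tag == TAG_OCTET_STRING:
            items.append(content)
        else:
            raise ValueError(f"unsupported tag {tag:#x}")
        pos += 2 + length
    return items

# A character followed by an integer, each preceded by its type and size.
message = encode_octets(b"A") + encode_integer(300)
print(message.hex())          # 0401410202012c
print(decode_items(message))  # [b'A', 300]
```

Because every item carries its own type and length, the receiver does not need to know the layout of the message in advance.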

In XDR, the routines for encoding and decoding are the same. Whether encoding or decoding is performed depends on the type of the buffer, which is specified as XDR_ENCODE or XDR_DECODE at the time the buffer is set up.
For example, for the call xdr_int(xdrs, &i):

If the setup was done as xdrmem_create(xdrs, buf, BUFSIZE, XDR_ENCODE), then the value obtained by converting i to its external representation is appended to the buffer.
If the setup was done as xdrmem_create(xdrs, buf, BUFSIZE, XDR_DECODE), then an integer is extracted from the buffer, decoded, and stored in the variable i.
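
The way one routine serves both directions can be mimicked in a few lines. This is a sketch of the idea only, not the Sun RPC library: the class XdrMem, the enum XdrOp, and the Python function xdr_int are stand-ins invented for this example. It assumes the XDR integer encoding of four bytes in big-endian two's complement, and it uses a one-element list in place of the C pointer &i.

```python
import struct
from enum import Enum

class XdrOp(Enum):
    ENCODE = 0
    DECODE = 1

class XdrMem:
    """A stream over a caller-supplied buffer, fixed to one direction."""
    def __init__(self, buf: bytearray, op: XdrOp):
        self.buf = buf
        self.op = op
        self.pos = 0

def xdr_int(xdrs: XdrMem, cell: list) -> bool:
    """Encode cell[0] into the buffer, or decode the next integer into cell[0]."""
    if xdrs.pos + 4 > len(xdrs.buf):
        return False  # no room left (encode) or no data left (decode)
    if xdrs.op is XdrOp.ENCODE:
        struct.pack_into(">i", xdrs.buf, xdrs.pos, cell[0])
    else:
        cell[0] = struct.unpack_from(">i", xdrs.buf, xdrs.pos)[0]
    xdrs.pos += 4
    return True

buf = bytearray(8)

# Sender side: the stream is in ENCODE mode, so xdr_int appends.
enc = XdrMem(buf, XdrOp.ENCODE)
xdr_int(enc, [7])
xdr_int(enc, [-2])

# Receiver side: the same routine, but the stream is in DECODE mode.
dec = XdrMem(buf, XdrOp.DECODE)
a, b = [0], [0]
xdr_int(dec, a)
xdr_int(dec, b)
print(buf.hex(), a[0], b[0])  # 00000007fffffffe 7 -2
```

Keeping the direction in the stream rather than in the routine means the same sequence of calls can describe a message on both the sending and the receiving side.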

There are also routines (like xdrstdio_create) that attach an XDR stream to a standard I/O stream, so that data can be written to or read from files and sockets.

Applications:

FTP:

Given a reliable end-to-end transport protocol like TCP, file transfer might seem trivial. However, details such as authorization and data representation among heterogeneous machines make the protocol complex.

FTP offers many facilities:

Interactive Access: Most implementations provide an interactive interface that allows humans to easily interact with remote servers.
Format (representation) specification: FTP allows the client to specify the type and format of stored data.
Authentication Control: FTP requires clients to authenticate themselves by sending a login name and password to the server before requesting file transfers.

FTP Process Model

FTP allows concurrent accesses by multiple clients. Clients use TCP to connect to the server. A master server awaits connections and creates a slave process to handle each connection. Unlike most servers, the slave process does not perform all the necessary computation. Instead the slave accepts and handles the control connection from the client, but uses an additional process to handle a separate data transfer connection. The control connection carries the command that tells the server which file to transfer.

Data transfer connections and the data transfer processes that use them can be created dynamically when needed, but the control connection persists throughout a session. Once the control connection disappears, the session is terminated and the software at both ends terminates all data transfer processes.
In addition to passing user commands to the server, FTP uses the control connection to allow client and server processes to coordinate their use of dynamically assigned TCP protocol ports and the creation of data transfer processes that use those ports.
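
As an illustration of how a port can be communicated over the control connection, the sketch below formats and parses the argument of the FTP PORT command, which carries an IPv4 address and a 16-bit port as six comma-separated decimal numbers. The PORT command comes from the FTP specification rather than from the text above, and the helper names are hypothetical.

```python
def format_port_argument(ip: str, port: int) -> str:
    """Build the h1,h2,h3,h4,p1,p2 argument announcing a data port."""
    octets = ip.split(".")
    if len(octets) != 4 or not 0 <= port <= 65535:
        raise ValueError("expected a dotted IPv4 address and a 16-bit port")
    # The port is split into its high-order and low-order bytes.
    return ",".join(octets + [str(port >> 8), str(port & 0xFF)])

def parse_port_argument(argument: str) -> tuple:
    """Recover (ip, port) from an h1,h2,h3,h4,p1,p2 argument."""
    parts = [int(p) for p in argument.split(",")]
    if len(parts) != 6 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError("expected six numbers between 0 and 255")
    ip = ".".join(str(p) for p in parts[:4])
    return ip, parts[4] * 256 + parts[5]

argument = format_port_argument("192.168.1.2", 5001)
print("PORT " + argument)             # PORT 192,168,1,2,19,137
print(parse_port_argument(argument))  # ('192.168.1.2', 5001)
```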

Proxy commands allow one to copy files from any machine to any other machine, i.e. the machine the files are being copied to need not be the client.

Some servers also perform special processing that is not part of the protocol. For example, if a request for copying a file is made by issuing the command 'get file_A.gz' and the compressed file does not exist but the file file_A does, then the file is automatically compressed and sent.

Consider what happens when the connection breaks during an FTP session. One of two things may happen. Certain FTP servers restart the transfer from the beginning, and whatever portion of the file had already been copied is overwritten. Other FTP servers ask the client how much it has already received and simply continue from that point.

TFTP:

TFTP stands for Trivial File Transfer Protocol. Many applications do not need the full functionality of FTP, nor can they afford its complexity. TFTP provides an inexpensive mechanism that does not need complex interactions between the client and the server. TFTP restricts operations to simple file transfer and does not provide authentication.

Diskless devices have TFTP encoded in read-only memory (ROM) and use it to obtain an initial memory image when the machine is powered on. The advantage of using TFTP is that it allows bootstrapping code to use the same underlying TCP/IP protocols that the operating system uses once it begins execution. Thus it is possible for a computer to bootstrap from a server on another physical network.

TFTP does not rely on a reliable stream transport service. It runs on top of UDP or any other unreliable packet delivery system, using timeout and retransmission to ensure that data arrives. The sending side transmits a file in fixed-size blocks and awaits an acknowledgement for each block before sending the next.

Rules for TFTP:

The first packet sent requests a file transfer and establishes the interaction between server and client. It also specifies the file name and whether the file is to be transferred to the client or to the server. Blocks of the file are numbered starting from 1. Each data packet has a header that specifies the number of the block it carries, and each acknowledgement contains the number of the block being acknowledged. A block of less than 512 bytes signals the end of the file. There are five types of TFTP packets. The initial packet must use operation code 1 or 2, specifying either a read request or a write request, and also carries the file name. Once the read request or write request has been made, the server uses the IP address and UDP port number of the client to identify subsequent operations. Thus data and acknowledgement messages do not contain the file name. The final message type is used to report errors.
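
The packet rules above can be sketched as follows. Operation codes 1 and 2 are taken from the text; codes 3 (data), 4 (acknowledgement), and 5 (error), the two-byte network-order header fields, and the zero-terminated file name and mode strings follow the TFTP specification. The function names and the file name boot.img are made up for illustration.

```python
import struct

OP_RRQ, OP_WRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 2, 3, 4, 5
BLOCK_SIZE = 512

def build_request(opcode: int, filename: str, mode: str = "octet") -> bytes:
    """Initial packet: opcode, file name, and transfer mode."""
    if opcode not in (OP_RRQ, OP_WRQ):
        raise ValueError("the first packet must be a read or write request")
    return (struct.pack("!H", opcode)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")

def build_data(block: int, payload: bytes) -> bytes:
    """Data packet: opcode, block number, then up to 512 bytes of the file."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("a block carries at most 512 bytes")
    return struct.pack("!HH", OP_DATA, block) + payload

def build_ack(block: int) -> bytes:
    """Acknowledgement: opcode and the number of the block being acknowledged."""
    return struct.pack("!HH", OP_ACK, block)

def split_into_blocks(data: bytes) -> list:
    """Number blocks from 1; the last block is always shorter than 512 bytes."""
    return [(offset // BLOCK_SIZE + 1, data[offset:offset + BLOCK_SIZE])
            for offset in range(0, len(data) + 1, BLOCK_SIZE)]

print(build_request(OP_RRQ, "boot.img"))
print(build_ack(1))
print([(n, len(p)) for n, p in split_into_blocks(b"x" * 1024)])
```

Note that a file whose length is an exact multiple of 512 bytes ends with an empty data block, since only a short block signals the end of the file.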
TFTP supports symmetric retransmission: each side implements its own timeout and retransmission. If the side sending data times out, it retransmits the last data block. If the receiving side times out, it retransmits the last acknowledgement. This ensures that the transfer will not fail after a single packet loss.
Problem caused by symmetric retransmission: the Sorcerer's Apprentice Bug

When the ack for a data packet is delayed but not lost, the sender times out and retransmits the same data packet, which the receiver acknowledges again. Both acks eventually arrive at the sender, and the sender transmits the next data packet once for each ack. The receiver then acknowledges both copies of that packet and sends two acks, which in turn causes the sender to send two copies of the following packet. The cycle continues, with every subsequent packet being transmitted twice.
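
The cycle can be reproduced with a small simulation. It is a deliberately simplified model under stated assumptions: packets are delivered in order from a single queue, nothing is lost, and the only disturbance is one premature retransmission of block 1. The ignore_duplicate_acks option models one commonly described remedy, in which the sender does not react to an ack it has already seen; that remedy is not part of the text above.

```python
from collections import deque

def simulate(total_blocks: int, ignore_duplicate_acks: bool) -> dict:
    """Count how often each block is sent after one delayed ack for block 1."""
    sent = {block: 0 for block in range(1, total_blocks + 1)}
    # Block 1 was sent, its ack was delayed, and the sender timed out and
    # sent block 1 again: two copies are now on their way to the receiver.
    in_flight = deque([1, 1])
    sent[1] = 2
    highest_acked = 0
    while in_flight:
        block = in_flight.popleft()
        ack = block  # the receiver acknowledges every copy it receives
        if ignore_duplicate_acks and ack <= highest_acked:
            continue  # a repeated ack triggers nothing
        highest_acked = max(highest_acked, ack)
        next_block = ack + 1
        if next_block <= total_blocks:
            in_flight.append(next_block)  # each ack triggers the next block
            sent[next_block] += 1
    return sent

print(simulate(4, ignore_duplicate_acks=False))  # {1: 2, 2: 2, 3: 2, 4: 2}
print(simulate(4, ignore_duplicate_acks=True))   # {1: 2, 2: 1, 3: 1, 4: 1}
```

In the first run a single spurious retransmission doubles the traffic for the rest of the transfer; in the second run the duplication stops after block 1.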
TFTP supports multiple file types just like FTP, i.e. binary and ASCII data. TFTP may also be integrated with email. When the file type is mail, the FILENAME field is treated as the name of a mailbox, and the message is appended to that mailbox instead of being written to a new file. However, this mode is not commonly used.

Now we look at another very common application: email.

EMAIL (electronic mail - SMTP , MIME , ESMTP ):

Email is the most widely used application service. It differs from other uses of the network: most network protocols send packets directly to the destination, using timeout and retransmission for individual segments if no ack returns. An email system, however, must also provide for cases in which the remote machine or the network connection has failed, and take some special action. Email applications involve two components:

User agent (pine, elm, etc.)
Transfer agent (the sendmail daemon, etc.)

When an email is sent, the mail transfer agent (MTA) of the source contacts the MTA of the destination. The protocol used between the MTAs on the source and destination sides is SMTP, which stands for Simple Mail Transfer Protocol. Some protocols operate between the user agent and the MTA, e.g. POP and IMAP, which are discussed later.
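
To give a feel for the exchange between two MTAs, the sketch below lists the lines a sending MTA would transmit for one message. It is an outline only: the command names (HELO, MAIL FROM, RCPT TO, DATA, QUIT) and the rule of doubling a leading dot in the message body come from the SMTP specification, the server's numeric replies are omitted, and the function name, host names, and addresses are invented for the example.

```python
def smtp_commands(client_host: str, sender: str,
                  recipients: list, message_lines: list) -> list:
    """Return, in order, the lines the sending MTA transmits for one message."""
    commands = [f"HELO {client_host}", f"MAIL FROM:<{sender}>"]
    commands += [f"RCPT TO:<{recipient}>" for recipient in recipients]
    commands.append("DATA")
    for line in message_lines:
        # A body line that starts with a dot gets an extra dot so that it
        # cannot be mistaken for the end-of-message marker.
        commands.append("." + line if line.startswith(".") else line)
    commands.append(".")  # a line with a single dot ends the message
    commands.append("QUIT")
    return commands

for line in smtp_commands("client.example.org", "alice@example.org",
                          ["bob@example.net"],
                          ["Subject: greetings", "", "Hello Bob"]):
    print(line)
```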

Mail Gateways:

Mail gateways are also called mail relays or mail bridges. In such systems the sender's machine does not contact the receiver's machine directly but sends mail across one or more intermediate machines that forward it on. These intermediate machines are called mail gateways. Mail gateways introduce unreliability. Once the sender has transferred a message to the first intermediate machine, it discards its local copy, so a failure at an intermediate machine may result in message loss without either the sender or the receiver being informed. Mail gateways also introduce delays. Neither the sender nor the receiver can determine how long a delay will last or where the message has been delayed.

However, mail gateways have the advantage of providing interoperability, i.e. they provide connections among standard TCP/IP mail systems and other mail systems, as well as between TCP/IP internets and networks that do not support Internet protocols. When a message has to cross from one protocol to another, the mail gateway translates it, since it is designed to understand both.


Thursday, August 29, 2013
