Remote Procedure Call (RPC):
RPC falls under Application-Oriented Design, where client-server communication takes the form of procedure calls. We call the machine making the procedure call the client and the machine executing the called procedure the server. For every procedure being called there must exist a piece of code that knows which machine to contact for that procedure. Such a piece of code is called a stub. On the client side, every procedure being called needs its own stub. The stub on the server side can be more general: a single stub can handle more than one procedure (see figure). Also, two calls to the same procedure can be made using the same stub.
Now let us see how a typical remote procedure call is executed:
- The client program calls the stub procedure linked within its own address space. This is a normal local call.
- The client stub then collects the parameters and packs them into a message (Parameter Marshaling). The message is then given to the transport layer for transmission.
- The transport entity just attaches a header to the message and puts it out on the network without further ado.
- When the message arrives at the server, the transport entity there passes it to the server stub, which un-marshals the parameters.
- The server stub then calls the server procedure, passing the parameters in the standard way.
- After it has completed its work, the server procedure returns, the same way as any other procedure returns when it is finished. A result may also be returned.
- The server stub then marshals the result into a message and hands it off at the transport interface.
- The reply gets back to the client machine.
- The transport entity hands the result to the client stub.
- Finally, the client stub returns to its caller, the client procedure, along with the value returned by the server procedure.
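The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the procedure `add`, the stub names, and the in-process `transport` function are hypothetical stand-ins, and a direct function call replaces the real network.

```python
import struct

def add(a, b):
    """The server procedure: an ordinary local function."""
    return a + b

def server_stub(request: bytes) -> bytes:
    """Un-marshal the parameters, call the procedure, marshal the result."""
    a, b = struct.unpack("!ii", request)      # two 4-byte big-endian integers
    result = add(a, b)                        # normal local call on the server
    return struct.pack("!i", result)

def transport(message: bytes) -> bytes:
    """Stand-in for the network: delivers the request to the server stub
    and carries the reply back to the client."""
    return server_stub(message)

def client_stub_add(a, b):
    """Client stub: looks like a local procedure to its caller."""
    request = struct.pack("!ii", a, b)        # parameter marshaling
    reply = transport(request)                # hand the message to the transport
    (result,) = struct.unpack("!i", reply)    # un-marshal the result
    return result

print(client_stub_add(2, 3))  # prints 5
```

The caller of `client_stub_add` cannot tell that the addition happened elsewhere, which is the transparency the stubs are meant to provide.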
One solution to the problem of pointer parameters, whose addresses are meaningless in the server's address space, is copy-in/copy-out. What we pass is the value the pointer points to, instead of the pointer itself. A local pointer to this value is created on the server side (copy-in). When the server procedure returns, the modified value is sent back and copied to the address from which it was taken (copy-out). This is costly when the pointers involved point to huge data structures. The approach is also not foolproof. Consider the following example (C code):
The procedure 'myfunction()' resides on the server machine. If the program executes on a single machine, we expect the output to be '4'. But when run in the client-server model we get '3'. Why? Because 'x' and 'y' point to different memory locations holding the same value. Each copy is incremented separately, and the incremented value is copied back. Thus '3' is passed back and not '4'.
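The aliasing effect described above can be reproduced with a small Python simulation. It is a sketch under assumptions, not the original listing: it assumes that 'myfunction()' increments both of its reference parameters and that the caller passes the same variable, initially 2, for both. One-element lists stand in for C pointers, and the helper names `local_call` and `remote_call_copy_in_copy_out` are hypothetical.

```python
def myfunction(x, y):
    """Increment the integers that x and y refer to.
    One-element lists stand in for C pointers."""
    x[0] += 1
    y[0] += 1

def local_call(value):
    a = [value]
    myfunction(a, a)            # both parameters alias the same cell
    return a[0]

def remote_call_copy_in_copy_out(value):
    a = [value]
    # Copy-in: each pointer parameter gets its own copy on the server.
    x_copy, y_copy = [a[0]], [a[0]]
    myfunction(x_copy, y_copy)
    # Copy-out: each modified value is written back to where it came from.
    a[0] = x_copy[0]
    a[0] = y_copy[0]
    return a[0]

print(local_call(2))                    # prints 4
print(remote_call_copy_in_copy_out(2))  # prints 3
```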
Many RPC systems finesse the whole problem by prohibiting reference parameters, pointers, and function or procedure parameters on remote calls, allowing only copy-in. This makes the implementation easier but breaks transparency.
Protocol: Another key implementation issue is the protocol to be used: TCP or UDP. If TCP is used, there may be a problem in case of a network breakdown. No problem occurs if the breakdown happens before the client sends its request (the client will be notified), or after the request is sent but before the reply is received (a time-out will occur). But if the breakdown occurs just after the server has sent its reply, the server cannot tell whether the response has reached the client. This could be devastating for bank servers, which need to make sure that their reply has in fact reached the client (probably an ATM). So UDP is generally preferred over TCP for making remote procedure calls.
Idempotent Operations:
If the server crashes in the middle of computing a procedure on behalf of a client, what must the client do? Suppose it sends its request again when the server comes back up. Then some part of the procedure will be re-computed, and the procedure may contain instructions whose repeated execution gives different results each time. If the side effect of multiple executions of a procedure is exactly the same as that of one execution, we call it an idempotent procedure. In general, such operations are called idempotent operations. For example, consider ATM banking. If I send a request to withdraw Rs. 200 from my account and somehow the request is executed twice, then two transactions of 'withdrawing Rs. 200' will be recorded, whereas I will receive only Rs. 200. Thus 'withdrawing' is a non-idempotent operation. Now consider the case when I send a request to 'check my balance'. No matter how many times this request is executed, no inconsistency arises. This is an idempotent operation.
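The ATM example can be made concrete with a short sketch. The `Account` class, the two request functions, and the opening balance of 1000 are hypothetical; the check simply compares the effect of executing a request once with the effect of executing it twice.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

def check_balance(account):
    """Reads state without changing it."""
    return account.balance

def withdraw_200(account):
    """Changes state on every execution."""
    account.balance -= 200
    return account.balance

def is_idempotent(request, opening_balance=1000):
    """True if executing the request twice leaves the same state as once."""
    once = Account(opening_balance)
    request(once)
    twice = Account(opening_balance)
    request(twice)
    request(twice)                     # the duplicated (re-transmitted) request
    return once.balance == twice.balance

print(is_idempotent(check_balance))  # prints True
print(is_idempotent(withdraw_200))   # prints False
```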
Semantics of RPC:
If all operations could be cast into an idempotent form, then time-out and re-transmission would suffice. Unfortunately, some operations are inherently non-idempotent (e.g., transferring money from one bank account to another). So the exact semantics of RPC systems were categorized as follows:
- Exactly once: Here every call is carried out 'exactly once', no more and no less. But this goal is unachievable, because after a server crash it is impossible to tell whether a particular operation was carried out or not.
- At most once: When this form is used, control always returns to the caller. If everything went right, the operation will have been performed exactly once. But if a server crash is detected, re-transmission is not attempted, and further recovery is left to the client.
- At least once: Here the client stub keeps trying over and over until it gets a proper reply. When the caller gets control back, it knows that the operation has been performed one or more times. This is ideal for idempotent operations, but fails for non-idempotent ones.
- Last of many: This is a version of 'at least once' in which the client stub uses a different transaction identifier in each re-transmission. The client stub can then tell which reply belongs to which request and filter out all but the last one, so the result returned is guaranteed to be the result of the final operation, not of an earlier one.
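The difference between these semantics shows up when replies are lost. The following sketch simulates this in a single process; `LossyServer` and the three client functions are hypothetical names, and a reply of `None` stands for a time-out. Because the simulation is synchronous, no stale reply can arrive late, so the identifier check in `last_of_many` only hints at the filtering a real client stub would do.

```python
class LossyServer:
    """Executes every request it receives but loses the first few replies."""
    def __init__(self, lost_replies):
        self.lost_replies = lost_replies
        self.executions = 0

    def handle(self, xid):
        self.executions += 1                 # the operation is carried out
        if self.lost_replies > 0:
            self.lost_replies -= 1
            return None                      # reply lost: the client times out
        return (xid, f"result #{self.executions}")

def at_most_once(server):
    """Send once; on failure return None and leave recovery to the caller."""
    return server.handle(xid=1)

def at_least_once(server, max_tries=10):
    """Retry with the same identifier until a reply arrives."""
    for _ in range(max_tries):
        reply = server.handle(xid=1)
        if reply is not None:
            return reply
    raise TimeoutError("no reply")

def last_of_many(server, max_tries=10):
    """Retry with a fresh identifier each time; accept only the reply
    that matches the most recent request."""
    for xid in range(1, max_tries + 1):
        reply = server.handle(xid)
        if reply is not None and reply[0] == xid:
            return reply
    raise TimeoutError("no reply")

server = LossyServer(lost_replies=2)
print(at_least_once(server), server.executions)  # prints (1, 'result #3') 3
```

With two lost replies the operation has run three times by the time the caller regains control, which is harmless for an idempotent operation and wrong for a withdrawal.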
SUN RPC Model:
Sun RPC was originally developed to implement NFS (Network File System). Sun RPC extends the remote procedure call model by defining a remote execution environment. It defines a remote program at the server side as the basic unit of software that executes on a remote machine. Each remote program consists of one or more remote procedures and global data. The global data is static, and all the procedures inside a remote program share access to it. The figure below illustrates the conceptual organization of three remote procedures in a single remote program.
Sun RPC allows both TCP and UDP for communication between remote procedures and the programs calling them. It uses at-least-once semantics, i.e., the remote procedure is executed at least once. It uses the copy-in method of parameter passing but does not support copy-out. It uses XDR for data representation. It does not handle orphans (servers whose corresponding clients have died). Thus if a client sends a request to a server for execution of a remote procedure and dies before accepting the results, the server does not know whom to reply to. It also provides a tool called rpcgen to generate stubs automatically.
Thus we see that Sun RPC takes care of most of the details that would otherwise burden application programmers.
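To give a feel for the XDR data representation mentioned above, here is a minimal sketch of how two basic types are encoded: a signed integer as 4 big-endian bytes, and a string as a 4-byte length followed by the bytes, zero-padded to a multiple of 4. The function names `xdr_int` and `xdr_string` are hypothetical and are not the names used by the Sun RPC library.

```python
import struct

def xdr_int(value):
    """Encode a signed 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">i", value)

def xdr_string(text):
    """Encode a string: 4-byte length, the bytes, zero padding to a multiple of 4."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding

print(xdr_int(1))           # prints b'\x00\x00\x00\x01'
print(xdr_string("hello"))  # prints b'\x00\x00\x00\x05hello\x00\x00\x00'
```

Because both sides agree on this fixed format, a client and a server with different native byte orders can still exchange arguments and results.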
How A Client Invokes A Procedure On Another Host:
The remote procedure is part of a program executing on a remote host. Thus we have to locate the host, the program on it, and the procedure in the program. Each host can be specified by a unique 32-bit integer (its IP address). The Sun RPC standard specifies that each remote program executing on a computer must be assigned a unique 32-bit integer that the caller uses to identify it. Furthermore, Sun RPC assigns a 32-bit integer identifier to each remote procedure inside a given remote program. The procedures are numbered sequentially: 1, 2, ..., N. To help ensure that program numbers defined by separate organizations do not conflict, Sun RPC has divided the set of program numbers into eight groups.
Thus it seems sufficient to locate the host, the program on the host, and the procedure in the program in order to uniquely identify the remote procedure to be executed.
Accommodating Multiple Versions Of A Remote Program:
Suppose somebody wants to change the version of a remote procedure in a remote program. With the identification method described above, he or she would have to make sure that the newer version is compatible with the older one. This is a burden on the server side. Sun RPC provides a solution to this problem. In addition to a program number, Sun RPC includes a 32-bit integer version number for each remote program. Usually, the first version of a program is assigned version 1. Later versions each receive a unique version number.
Version numbers provide the ability to change the details of a remote procedure call without obtaining a new program number. The newer clients and the older clients are now disjoint, and no compatibility is required between the two. When no request has come for the older version for a long time, it can be deleted. Thus, in practice, each RPC message identifies the intended recipient on a given computer by a triple:
(program number, version number, procedure number)
Thus it is possible to migrate gracefully from one version of a remote procedure to another, and to test a new version of the server while an old version continues to operate.
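A server that keeps two versions alive side by side can be pictured as a dispatch table keyed by the triple. The sketch below is purely illustrative: the program number `0x20000001`, the procedure number, and both `get_time` functions are made-up values for the example.

```python
PROG = 0x20000001          # hypothetical program number
PROC_GET_TIME = 1          # hypothetical procedure number

def get_time_v1():
    return "12:00"

def get_time_v2():
    return "12:00:00"      # the newer version returns more detail

# Both versions of the same procedure are registered at once.
DISPATCH = {
    (PROG, 1, PROC_GET_TIME): get_time_v1,
    (PROG, 2, PROC_GET_TIME): get_time_v2,
}

def dispatch(prog, vers, proc, *args):
    """Look up the procedure named by the triple and call it."""
    try:
        procedure = DISPATCH[(prog, vers, proc)]
    except KeyError:
        raise LookupError(f"no such procedure: {(prog, vers, proc)}") from None
    return procedure(*args)

print(dispatch(PROG, 1, PROC_GET_TIME))  # prints 12:00
print(dispatch(PROG, 2, PROC_GET_TIME))  # prints 12:00:00
```

Retiring the old version then amounts to removing its entries from the table, after which old clients receive an error instead of a wrong answer.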
Mapping A Remote Program To A Protocol Port:
At the bottom of every communication in the RPC model there are transport protocols such as UDP and TCP, so every communication takes place through sockets. How, then, does the client know which port to use to reach the server? This is a real problem, because we cannot have a standard under which a particular program on a particular host always communicates through a particular port: the program number is 32 bits, so there can be 2^32 programs, whereas both TCP and UDP use 16-bit port numbers to identify communication endpoints. RPC programs can therefore potentially outnumber protocol ports, and it is impossible to map RPC program numbers onto protocol ports directly. More importantly, because RPC programs cannot all be assigned a unique protocol port, programmers cannot use a scheme that depends on well-known protocol port assignments. However, at any given time a single computer executes only a small number of remote programs. As long as the port assignments are temporary, each RPC program can obtain a protocol port number and use it for communication.
If an RPC program does not use a reserved, well-known protocol port, clients cannot contact it directly. When the server (remote program) begins execution, it asks the operating system to allocate an unused protocol port number, and it uses the newly allocated port for all communication. The system may choose a different port number each time the server begins (i.e., the server may have a different port assigned each time the system boots). The client (the program that issues the remote procedure call) knows the machine address and the RPC program number of the remote program it wishes to contact. However, because the RPC program (server) only obtains a protocol port after it begins execution, the client cannot know which protocol port the server obtained. Thus, the client cannot contact the remote program directly.
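Asking the operating system for an unused port can be seen with ordinary sockets. In this sketch, binding to port 0 lets the system pick any free port; the function name `start_server_socket` is hypothetical, and UDP on the loopback address is chosen only to keep the example local.

```python
import socket

def start_server_socket():
    """Create a UDP socket and let the operating system pick an unused port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))       # port 0 means "any free port"
    return sock

sock = start_server_socket()
port = sock.getsockname()[1]          # the port the system actually assigned
print(port)                           # the value can differ on every run
sock.close()
```

Since the assigned number is only known after the server has started, a client has no way to know it in advance.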
Dynamic Port Mapping:
To solve the port identification problem, a client must be able to map from an RPC program and a machine address to the protocol port that the server obtained on the destination machine when it started. The mapping must be dynamic because it can change if the machine reboots or if the RPC program starts execution again.
Whenever a remote program (i.e., a server) begins execution, it allocates a local port that it will use for communication. The remote program then contacts a designated registration server on its local machine, which adds a pair of integers to its database:
(RPC program number, protocol port number)
Once an RPC program has registered itself, callers on other machines can find its protocol port by sending a request to the registration server on its machine. To contact a remote program, a caller must know the address of the machine on which the remote program executes as well as the RPC program number assigned to the program. The caller first contacts the registration server on the target machine and sends an RPC program number. The server returns the protocol port number that the specified program is currently using. This server is called the RPC port mapper, or simply the port mapper. A caller can always reach the port mapper because it communicates using the well-known protocol port 111. Once a caller knows the protocol port number the target program is using, it can contact the remote program directly.
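The registration and lookup steps can be sketched as an in-memory table. This is a simplification under assumptions: the class and method names are hypothetical, the table stores only the pair described above, no network traffic is involved, and returning 0 for an unregistered program is a convention chosen for this sketch.

```python
PORTMAPPER_PORT = 111      # the well-known port on which callers reach the port mapper

class PortMapper:
    """In-memory stand-in for the port mapper's database."""
    def __init__(self):
        self._table = {}

    def register(self, program_number, port):
        """Called by a remote program after it has obtained its port."""
        self._table[program_number] = port

    def lookup(self, program_number):
        """Called on behalf of a client; 0 means 'not registered' in this sketch."""
        return self._table.get(program_number, 0)

mapper = PortMapper()
mapper.register(0x20000001, 40123)    # hypothetical program number and port
print(mapper.lookup(0x20000001))      # prints 40123
print(mapper.lookup(0x20000002))      # prints 0
```

If the remote program restarts and obtains a different port, it registers again and later lookups return the new value, which is what makes the mapping dynamic.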
RPC Programming:
RPC programming can be thought of at multiple levels. At one extreme, the user writing the application program uses the RPC library and need not worry about communication through the network. At the other extreme are the low-level details of network communication. To execute a remote procedure, the client would have to go through a lot of overhead, e.g., calling XDR to format the data, putting it in the output buffer, connecting to the port mapper, and subsequently connecting to the port through which the remote procedure communicates. The RPC library contains procedures that provide almost everything required to make a remote procedure call. It contains procedures for marshaling and un-marshaling the arguments and the results. Different XDR routines are available to convert data from native format to XDR and from XDR to native format. But a lot of overhead still remains in calling the library routines properly. To minimize the overhead faced by the application programmer in calling a remote procedure, a tool named rpcgen was devised, which generates client and server stubs. Because the stubs are generated automatically, they offer limited flexibility; for example, the time-out interval and the number of re-transmissions are fixed. The program specification file is given as input, and both the server and client stubs are generated automatically by rpcgen. The specification file should have a .x extension. It contains the following information:
- constant declarations,
- global data (if any),
- information about all remote procedures, i.e.:
  - procedure argument type,
  - return type.



