A course of lectures on network technologies. RTSP, RTP, UDP and TCP protocols in video surveillance systems

When we talk on an IP phone and hear the other party's voice in the handset, or communicate with colleagues and relatives through a video conferencing system, we are exchanging a continuous stream of data. When streaming data such as voice and video is transmitted over a packet network, it is essential to use mechanisms that solve the following tasks:

Eliminating the effect of packet loss
Restoring packet order and controlling packet sequence
Smoothing delay variation (jitter)

For these purposes RTP (Real-time Transport Protocol) was developed; it is the subject of today's article. The protocol was developed by the IETF Audio/Video Transport Working Group and is described in RFC 3550.

As a rule, RTP runs on top of UDP (User Datagram Protocol), because for multimedia data timely delivery is essential, and UDP does not hold packets back while waiting for retransmissions.

RTP includes the ability to determine the type of payload and assign a sequence number of the packet in the stream, as well as the use of timestamps.
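The fields mentioned above all sit in the 12-octet fixed RTP header. The following is a minimal sketch of decoding it, assuming the standard layout with one marker bit and a seven-bit traffic type (payload type) field; the names `RtpHeader` and `parse_rtp_header` and the sample packet are illustrative and do not come from any particular library.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-octet fixed RTP header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet carries at least a 12-octet header")
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=first >> 6,             # top two bits
        padding=bool(first & 0x20),     # P bit
        extension=bool(first & 0x10),   # X bit
        csrc_count=first & 0x0F,        # number of CSRC identifiers that follow
        marker=bool(second & 0x80),     # M bit
        payload_type=second & 0x7F,     # seven-bit traffic type (PT)
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hypothetical sample: version 2, PT 0, sequence 1, timestamp 160, 160 payload octets.
sample = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + b"\x00" * 160
header = parse_rtp_header(sample)
```

A receiver would typically use `sequence_number` to restore packet order and detect loss, and `timestamp` to schedule playback.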

On the transmitting side, each packet is marked with a timestamp. The receiving side compares this timestamp with the arrival time to determine the packet's transit delay; the difference between the transit delays of successive packets gives the jitter. This makes it possible to play packets out with a constant delay and thereby reduce the effect of jitter.
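As an illustration of the calculation described above, RFC 3550 defines a running estimate of interarrival jitter in which each new difference in transit time is weighted by 1/16. The sketch below assumes that send timestamps and arrival times are already expressed in the same clock units; the function names are illustrative.

```python
from typing import Iterable, Tuple


def update_jitter(jitter: float, transit_prev: int, transit_now: int) -> float:
    """One step of the running estimate: J = J + (|D| - J) / 16."""
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0


def estimate_jitter(packets: Iterable[Tuple[int, int]]) -> float:
    """packets yields (rtp_timestamp, arrival_time) pairs in the same clock units."""
    jitter = 0.0
    prev_transit = None
    for sent, arrived in packets:
        transit = arrived - sent  # relative transit time of this packet
        if prev_transit is not None:
            jitter = update_jitter(jitter, prev_transit, transit)
        prev_transit = transit
    return jitter
```

With a constant transit delay the estimate stays at zero; a single packet arriving 16 units late moves it to 1.0.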

Another function of RTP relates to packet loss in the IP network, which shows up as short pauses in the conversation. Sudden silence in the handset usually has a very negative effect on the listener, so, using the capabilities of RTP, such periods of silence are filled with so-called "comfort noise".

RTP works in conjunction with another IETF protocol, RTCP (Real-time Transport Control Protocol), which is also described in RFC 3550. RTCP is designed to collect statistics, to assess the quality of service (QoS), and to synchronize the media streams of an RTP session.

The main function of RTCP is to establish feedback with the application to report on the quality of the information received. Participants in an RTCP session exchange information about the number of received and lost packets, jitter value, delay, etc. Based on the analysis of this information, a decision is made to change the transmission parameters, for example, to reduce the compression ratio of information in order to improve the quality of its transmission.

To perform these functions, RTCP sends special messages of certain types:

SR - Sender Report - a report from the sender with statistical information about the RTP session
RR - Receiver Report - a report from the receiver with statistical information about the RTP session
SDES - Source Description - describes the parameters of the source, including the CNAME (canonical name)
BYE - signals the end of a participant's membership in the group
APP - Application-defined - carries application-specific functions
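These message types are usually sent several at a time in one datagram. The sketch below walks such a compound RTCP datagram using the packet type values 200 to 204 and the length field of each packet header, which counts 32-bit words minus one. The helper name and the sample datagram are hypothetical.

```python
import struct
from typing import List

RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def walk_compound_rtcp(datagram: bytes) -> List[str]:
    """Return the type names of the RTCP packets stacked in one datagram."""
    names = []
    offset = 0
    while offset + 4 <= len(datagram):
        first, packet_type, length_words = struct.unpack_from("!BBH", datagram, offset)
        if first >> 6 != 2:
            raise ValueError("unexpected RTCP version")
        names.append(RTCP_TYPES.get(packet_type, f"unknown({packet_type})"))
        # The length field counts 32-bit words minus one, header included.
        offset += (length_words + 1) * 4
    return names


# Hypothetical datagram: an RR without report blocks, then a BYE for one SSRC.
rr = struct.pack("!BBHI", 0x80, 201, 1, 0x11223344)
bye = struct.pack("!BBHI", 0x81, 203, 1, 0x11223344)
names = walk_compound_rtcp(rr + bye)
```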

RTP is a unidirectional protocol, so two-way communication requires two RTP sessions, one on each side.

An RTP session is defined by the IP addresses of the participants and a pair of unreserved UDP ports from the range 16384 - 32767. In addition, to provide feedback to the application, a two-way RTCP session must also be established. An RTCP session uses the port whose number is one greater than that of the RTP port. For example, if port 19554 is selected for RTP, the RTCP session takes port 19555. The formation of an RTP/RTCP session is shown in the figure below.

This section discusses some aspects of the transport of RTP packets by network and transport protocols. Unless otherwise specified by the specifications of other protocols, the following basic rules apply when transmitting packets.

RTP relies on lower layer protocols to separate RTP data streams from RTCP control information. For UDP and similar protocols, RTP uses an even port number, and the corresponding RTCP stream uses the next higher (odd) port number.

RTP information packets do not contain any length field, hence RTP relies on the underlying protocol to provide a length indication as well. The maximum length of RTP packets is only limited by lower layer protocols.

Several RTP protocol packets can be transmitted in one lower layer protocol data unit, for example, in a UDP packet. This reduces header redundancy and can simplify synchronization between different streams.

9. List of protocol constants

This section contains a list of constants defined in the RTP protocol specification.

Traffic type (PT, payload type) constants for RTP are defined in profiles. However, the octet of the RTP header that contains the marker bit(s) and the traffic type field must not take the reserved values 200 and 201 (decimal), so that RTP packets can be distinguished from RTCP SR and RR packets. For the standard format with one marker bit and a seven-bit traffic type field, this restriction means that traffic types 72 and 73 must not be used.

The values of the RTCP packet types (see Table 1) are chosen in the range from 200 to 204 to make it easier to check the validity of an RTCP packet header against RTP packets. When the RTCP packet type field is compared with the corresponding octet of the RTP header, this range corresponds to a marker bit of one (which is not normally the case in data packets) and a most significant bit of one in the standard traffic type field (whereas statically defined traffic types usually have PT values with a zero in the most significant bit). This range was also chosen to be distant from the values 0 and 255, since fields consisting entirely of zeros or ones are common data patterns.
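The arithmetic behind this restriction can be checked directly. The sketch below reads the second octet of a packet the way an RTP parser would (one marker bit plus a seven-bit traffic type) and shows why the RTCP types 200 to 204 collide with traffic types 72 to 76. Both function names are illustrative.

```python
from typing import Tuple


def rtp_view_of_octet(value: int) -> Tuple[int, int]:
    """Interpret an octet as (marker bit, seven-bit traffic type)."""
    return value >> 7, value & 0x7F


def looks_like_rtcp(packet: bytes) -> bool:
    """Heuristic: the second octet falls in the RTCP packet type range."""
    return len(packet) >= 2 and 200 <= packet[1] <= 204
```

For example, `rtp_view_of_octet(200)` gives a marker bit of 1 and traffic type 72, which is why an RTP packet using that traffic type with the marker bit set would be indistinguishable from an RTCP SR at this octet.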

Other RTCP packet types are assigned by IANA. Developers may register the values they need for experimental work and cancel the registration when the values are no longer needed.

The valid item types in the SDES packet are listed in Table 2. Other SDES item types are assigned by IANA. Developers may register the values they need for experimental work and cancel the registration when the values are no longer needed.

10. Description of the traffic profile and format

As noted above (see Section 2), a complete description of the RTP protocol for a specific application requires additional documents of two types: a profile description and a traffic format description.

RTP can be used for many classes of applications with widely differing requirements. The flexibility to adapt to these requirements is provided by using different profiles. Typically an application uses only one profile, and there is no explicit indication of which profile is currently in use.

An optional document of the second type, the traffic format specification, defines how a particular type of traffic (e.g., H.261-encoded video) should be carried in RTP. The same traffic format may be used with multiple profiles and may therefore be defined independently of any profile. Profile documents are only responsible for mapping this format to a PT value.

The profile description may define the following items, but this list is not exhaustive.

Header of the RTP data packet. The octet in the header of the RTP data packet that contains the marker bit and the traffic type field can be redefined by the profile to meet different requirements, for example to provide more or fewer marker bits (Section 3.3).

Traffic types. A profile typically defines a set of traffic formats (e.g., media encoding algorithms) and a default static mapping of these formats to PT values. Some of the traffic formats may be defined by reference to separate traffic format descriptions. For each defined traffic type, the profile must specify the RTP timestamp clock rate to be used (Section 3.1).

RTP data packet header additions. If a profile's class of applications needs additional functionality that does not depend on the traffic type, additional fields can be appended to the fixed header of the RTP data packet.

RTP data packet header extensions. The contents of the first 16 bits of the RTP data packet header extension structure must be specified if the profile allows this mechanism to be used.

Types of RTCP packets. New, application-class specific types of RTCP packets may be defined (and registered with IANA).

RTCP reporting interval. The profile must define the values used in calculating the RTCP reporting interval: the RTCP session bandwidth fraction, the minimum reporting interval, and the bandwidth split between senders and receivers.

SR/RR packet extension. If additional information about a sender or receiver needs to be transmitted regularly, an extension section can be defined for RTCP SR and RR packets.

Using SDES. The profile may define relative priorities for the RTCP SDES items to be transmitted or excluded (Section 4.2.2); an alternative syntax or semantics for the CNAME item (Section 4.4.1); the format of the LOC item (Section 4.4.5); the semantics and use of the NOTE item (Section 4.4.7); and new SDES items to be registered with IANA.

Security. A profile may define which security services and algorithms applications should use and may provide guidance on their use (Section 7).

Password-key matching. The profile can determine how the password entered by the user is converted into an encryption key.

The underlying protocol. The transmission of RTP packets may require the use of a particular underlying network or transport layer protocol.

Transport mapping. A mapping of RTP and RTCP to transport layer addresses, such as UDP ports, other than the standard mapping specified in Section 8 may be defined.

Encapsulation. An encapsulation of RTP packets may be defined to allow multiple RTP data packets to be transmitted in a single underlying protocol data unit (Section 8).

Not every new application should require a new profile. It is more expedient to extend an existing profile within the same class of applications than to create a new one. This makes it easier for applications to interoperate, since each application typically runs under only one profile. Simple extensions, such as defining additional PT values or RTCP packet types, can be made by registering them with IANA and publishing their descriptions in a profile specification or traffic format specification.

11. RTP profile for audio and video conferencing with minimal control

RFC 1890 describes a profile for using version 2 of the real-time transport protocol RTP and its associated control protocol RTCP within a group audio or video conference, the so-called RTP Profile for Audio and Video Conferences with Minimal Control. This profile defines aspects of RTP left unspecified in the RTP version 2 protocol specification (RFC 1889). Minimal control means that no support for parameter negotiation or membership control is required (e.g., when using the static traffic type mappings and the membership indications provided by RTCP). Let us consider the main provisions of this profile.

11.1. RTP and RTCP packet formats and protocol parameters

This section contains a description of a number of items that can be defined or modified in a profile.

The header of the RTP information packet. The standard fixed header format of RTP information packets (one bit of marker) is used.

Traffic types. Static values for traffic types are defined in sections 11.3 and 11.4.

Additions to the RTP information packet header. No additional fixed fields are appended to the RTP information packet headers.

RTP information packet header extensions. No RTP information packet header extensions are defined, but applications using this profile may use such extensions. That is, applications should not assume that the X bit of the RTP header is always zero and must be prepared to ignore a header extension. If a header extension is defined in the future, the contents of its first 16 bits must be specified so that different extensions can be identified.

Types of RTCP packets. No additional RTCP packet types are defined in this profile specification.

RTCP reporting interval. When calculating the RTCP reporting interval, the constants proposed in RFC 1889 shall be used.

SR/RR packet extensions. Extensions for RTCP SR and RR packets are not defined.

Using SDES. Applications can use any of the described SDES items. While the canonical name (CNAME) information is sent in every reporting interval, the other items need only be sent in every fifth reporting interval.

Security. The default security services of RTP are also the default for this profile.

Password-key matching. The password entered by the user is converted using the MD5 algorithm into a 16-octet digest. An N-bit key is obtained from the digest by taking its first N bits. The password should contain only ASCII letters, digits, hyphens, and spaces, to reduce the chance of corruption when passwords are passed on by phone, fax, telex, or email. The password may be preceded by an encryption algorithm specification. Any characters up to the first forward slash (ASCII code 0x2f) are taken as the name of the encryption algorithm. If there is no forward slash, the default encryption algorithm is DES-CBC.

The password entered by the user is converted to its canonical form before the MD5 algorithm is applied. To do this, the password is converted to the ISO 10646 character set using UTF-8 encoding as defined in Annex P of ISO/IEC 10646-1:1993 (ASCII characters do not require any conversion); spaces at the beginning and end of the password are removed; runs of two or more spaces are replaced with a single space (ASCII or UTF-8 0x20); and all letters are converted to lowercase.
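The steps above can be sketched as follows, using Python's standard `hashlib.md5`. This is a sketch under stated assumptions rather than a reference implementation: it assumes the canonicalization applies only to the part after the first slash, and when N is not a multiple of eight it zeroes the unused low bits of the last octet. The function names are illustrative.

```python
import hashlib
import re
from typing import Tuple


def split_algorithm(secret: str, default: str = "DES-CBC") -> Tuple[str, str]:
    """Split 'ALGORITHM/password'; without a slash the default algorithm applies."""
    if "/" in secret:
        algorithm, password = secret.split("/", 1)
        return algorithm, password
    return default, secret


def canonicalize(password: str) -> bytes:
    """Trim spaces, collapse runs of spaces, lowercase, then encode as UTF-8."""
    trimmed = password.strip(" ")
    collapsed = re.sub(r" {2,}", " ", trimmed)
    return collapsed.lower().encode("utf-8")


def derive_key(secret: str, key_bits: int) -> Tuple[str, bytes]:
    """Return (algorithm name, first key_bits bits of the MD5 digest)."""
    if not 1 <= key_bits <= 128:
        raise ValueError("MD5 provides at most 128 bits")
    algorithm, password = split_algorithm(secret)
    digest = hashlib.md5(canonicalize(password)).digest()  # 16 octets
    n_bytes = (key_bits + 7) // 8
    key = bytearray(digest[:n_bytes])
    spare = n_bytes * 8 - key_bits
    if spare:
        # Assumption: bits beyond the first key_bits are cleared.
        key[-1] &= (0xFF << spare) & 0xFF
    return algorithm, bytes(key)
```

For instance, `derive_key("  Hello   World ", 64)` and `derive_key("hello world", 64)` yield the same key, because both passwords share one canonical form.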

The underlying protocol. The profile defines the use of RTP over UDP in unicast and multicast mode.

Transport mapping. The standard mapping of RTP and RTCP to transport layer addresses is used.

Encapsulation. Encapsulation of RTP packets is not defined.

11.2. Registering traffic types

This profile defines the standard encoding types used with RTP. Other encoding types must be registered with IANA before use. When registering a new coding type, the following information must be provided:

the short name of the encoding type and the RTP timestamp clock rate (short names should be three or four characters long to allow a compact representation, if necessary);
an indication of who has the right to change the encoding type (for example, ISO, CCITT/ITU, other international standards organizations, a consortium, a particular company or group of companies);
any operating parameters;
references to available descriptions of the encoding algorithm, such as (in order of preference) an RFC, a published article, a patent filing, a technical report, codec source code, or a reference book;
for proprietary encoding types, contact information (postal address and e-mail address);
the traffic type value for this profile, if necessary (see below).

Note that not all encoding types to be used with RTP need to be statically assigned. To establish a dynamic mapping between a traffic type (PT) value in the range 96 to 127 and an encoding type, "non-RTP means", which are not covered in this article, can be used.

The available space of traffic type values is quite small. New traffic types are assigned statically (permanently) only if the following conditions are met:

the encoding is of great interest to the Internet community;
it offers benefits compared to existing encodings and/or is required for interoperability with existing, widely used conferencing or multimedia systems;
the description is sufficient to build a decoder.

11.3. Audio coding

For applications that do not send packets during pauses, the first burst of active speech (the first packet after the pause) is distinguished by setting the marker bit in the header of the RTP information packet to one. Applications without silence suppression set this bit to zero.

The RTP clock used when generating the RTP timestamp is independent of the number of channels and coding type; it is equal to the number of sampling periods per second. For N-channel coding (stereo, quad, etc.), each sampling period (say 1/8000 second) generates N samples. The total number of samples generated per second is equal to the product of the sample rate and the number of channels.
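A short worked example of this rule, assuming integer sample rates and packet durations in milliseconds (the function names are illustrative): the timestamp advances by the number of sampling periods in a packet, while the number of samples carried also scales with the channel count.

```python
def timestamp_increment(sample_rate_hz: int, packet_ms: int) -> int:
    """RTP timestamp ticks per packet: sampling periods only, channels ignored."""
    return sample_rate_hz * packet_ms // 1000


def samples_per_packet(sample_rate_hz: int, packet_ms: int, channels: int) -> int:
    """Each sampling period produces one sample per channel."""
    return timestamp_increment(sample_rate_hz, packet_ms) * channels
```

At 8000 Hz, a 20 ms packet advances the timestamp by 160 whether the stream is mono or stereo, although the stereo packet carries 320 samples.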

When using multiple audio channels, they are numbered from left to right, starting with the first. In RTP audio packets, data from lower-numbered channels precedes data from higher-numbered channels. For more than two channels, the following notation is used:

l - left;
r - right;
c - center;
S - surround;
F - front;
R - rear.

Number of channels	System name	Channel 1	Channel 2	Channel 3	Channel 4	Channel 5	Channel 6
2	stereo	l	r
3		l	r	c
4	quad	Fl	Fr	Rl	Rr
4		l	c	r	S
5		Fl	Fr	Fc	Sl	Sr
6		l	lc	c	r	rc	S

The samples of all channels belonging to the same sampling moment must be within the same packet. The interleaving of samples from different channels depends on the type of coding.

The sampling frequency should be selected from the set 8000, 11025, 16000, 22050, 24000, 32000, 44100 and 48000 Hz (Apple Macintosh computers have their own sampling frequencies of 22254.54 and 11127.27 Hz, which can be converted to 22050 and 11025 Hz with acceptable quality by dropping four or two samples in a 20 ms frame). However, most audio coding algorithms are defined for a more limited set of sample rates. Receivers must be prepared to receive multi-channel audio, but may choose to play only mono.

For audio packetization, the default packetization interval is 20 ms unless specified otherwise in the encoding description. The packetization interval sets the minimum end-to-end delay. Longer packets spend a relatively smaller share of their bytes on headers, but they cause more delay and make each packet loss more noticeable. For non-interactive applications such as lectures, or for channels with significant bandwidth constraints, a longer packetization interval may be acceptable. A receiver should accept packets carrying between 0 and 200 ms of audio. This limit keeps the buffer size required at the receiver within reasonable bounds.
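The trade-off between header overhead and packet duration can be quantified with a small calculation. The sketch assumes the 12-octet fixed RTP header without CSRC entries, an 8-octet UDP header, and a 20-octet IPv4 header without options; the function name is illustrative.

```python
from typing import Tuple

RTP_HEADER = 12   # fixed RTP header, no CSRC list
UDP_HEADER = 8
IPV4_HEADER = 20  # assumes no IP options


def header_share(sample_rate_hz: int, bytes_per_sample: int, packet_ms: int) -> Tuple[int, float]:
    """Return (payload octets, fraction of the packet spent on headers)."""
    payload = sample_rate_hz * packet_ms // 1000 * bytes_per_sample
    headers = RTP_HEADER + UDP_HEADER + IPV4_HEADER
    return payload, headers / (headers + payload)


payload_20ms, share_20ms = header_share(8000, 1, 20)
payload_200ms, share_200ms = header_share(8000, 1, 200)
```

For 8000 Hz audio at one octet per sample, a 20 ms packet carries 160 payload octets and spends a fifth of its size on headers, while a 200 ms packet carries 1600 octets and spends under 3 percent, at the cost of ten times the packetization delay.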

In sample-based encodings, each signal sample is represented by a fixed number of bits. Within compressed audio data, individual sample codes may cross octet boundaries. The duration of the signal transmitted in the audio packet is determined by the number of samples in the packet.

For sample-based encoding types producing one or more octets per sample, samples from different channels sampled at the same instant are packed into adjacent octets. For example, for stereo encoding, the sequence of octets is: left channel, first sample; right channel, first sample; left channel, second sample; right channel, second sample, etc. In multi-octet encodings, the most significant octet is transmitted first. The packing of sample-based encodings producing less than one octet per sample is determined by the encoding algorithm.
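The packing order described above can be sketched for a 16-bit linear encoding as follows. The choice of signed 16-bit samples is an assumption for illustration, and the function name is hypothetical.

```python
import struct
from typing import Sequence


def pack_interleaved(channels: Sequence[Sequence[int]]) -> bytes:
    """Interleave per-channel 16-bit samples; channels[0] is the lowest-numbered channel."""
    if len({len(c) for c in channels}) != 1:
        raise ValueError("all channels must carry the same number of samples")
    out = bytearray()
    for frame in zip(*channels):          # one sampling instant at a time
        for sample in frame:              # lower-numbered channels first
            out += struct.pack("!h", sample)  # most significant octet first
    return bytes(out)


left = [1, 2]
right = [-1, -2]
payload = pack_interleaved([left, right])
```

Here `payload` holds left sample 1, right sample 1, left sample 2, right sample 2, each as two octets in network byte order.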

The frame-based coding algorithm converts a fixed length audio block into another compressed data block, usually also of a fixed length. For frame-based encodings, the sender may combine several such frames into a single message.

For frame-based codecs, the channel order is defined for the whole block. That is, for stereo audio, the samples for the left and right channels are encoded independently, with the coded frame for the left channel preceding the frame for the right channel.

All frame-oriented audio codecs must be able to encode and decode multiple consecutive frames transmitted within a single packet. Since the frame size for frame-oriented codecs is specified, there is no need to use a separate notation for the same encoding but with a different number of frames per packet.

Table 3 shows the traffic type (PT) values defined by this profile for audio signals, their short names, and the main characteristics of the coding algorithms.

11.4. Video encoding

Table 4 shows the traffic type (PT) values, the short names of the coding algorithms, and the technical characteristics of the video coding algorithms defined by this profile, as well as the unassigned, reserved, and dynamically assigned PT values.

Traffic type values in the range 96 to 127 can be defined dynamically through a conference control protocol, which is not covered in this article. For example, the session directory may specify that, for a given session, traffic type 96 denotes PCMU coding, two channels, at 8000 Hz. The range of traffic types marked "reserved" is left unused so that RTCP and RTP packets can be reliably distinguished.

An RTP source only emits one type of traffic at any given time; interleaving of different types of traffic in one RTP session is not allowed. Multiple RTP sessions can be used in parallel to carry different types of traffic. The traffic types defined in this profile refer to either audio or video, but not both. However, it is possible to define combined traffic types that combine, for example, audio and video, with appropriate separation in the traffic format.

Audio applications using this profile must, at a minimum, be able to send and receive traffic types 0 (PCMU) and 5 (DVI4). This allows interoperability without format negotiation.
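The partition of the traffic type space described in this section can be sketched as a small classifier. The reserved range 72 to 76 is taken here as the RTCP packet types 200 to 204 with the top bit removed, which is an inference from the text above rather than a quotation from the tables; the names and the sample dynamic mapping are hypothetical.

```python
REQUIRED_AUDIO = {0: "PCMU", 5: "DVI4"}  # minimum set named in the text

# Hypothetical session-level mapping: PT 96 as two-channel PCMU at 8000 Hz.
dynamic_map = {96: ("PCMU", 8000, 2)}


def classify_payload_type(pt: int) -> str:
    """Classify a seven-bit traffic type value."""
    if not 0 <= pt <= 127:
        raise ValueError("traffic type is a seven-bit field")
    if 72 <= pt <= 76:
        return "reserved"   # would collide with RTCP types 200..204
    if pt >= 96:
        return "dynamic"    # bound to an encoding by non-RTP means
    return "static or unassigned"
```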

11.5. Port Assignment

As defined in the RTP protocol description, RTP data must be transmitted on an even-numbered UDP port, and the corresponding RTCP packets must be transmitted on the next higher (odd) port number.

Applications running with this profile may use any such pair of UDP ports. For example, a pair of ports may be assigned randomly by the session management program. A single fixed pair of port numbers cannot be required, because in some cases multiple applications using this profile must run correctly on the same host, and some operating systems do not allow multiple processes to use the same UDP port with different multicast addresses.

However, the default port numbers can be 5004 and 5005. Applications that use multiple profiles can choose this pair of ports as the indicator of that profile. But applications may also require that the port pair be explicitly specified.
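The port pairing rule lends itself to a one-function check. This is a minimal sketch with an illustrative function name; it only validates the numbers and does not open any sockets.

```python
def rtcp_port_for(rtp_port: int) -> int:
    """Return the RTCP port paired with an even RTP port."""
    if not 0 < rtp_port < 65535:
        raise ValueError("port out of range for an RTP/RTCP pair")
    if rtp_port % 2:
        raise ValueError("RTP is expected on an even port number")
    return rtp_port + 1


default_pair = (5004, rtcp_port_for(5004))
```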

12. List of used terms and abbreviations

ASCII (American Standard Code for Information Interchange) is the American standard code for information interchange. A seven-bit code for representing textual information, used with some modifications in most computing systems
CBC (cipher block chaining) - cipher block chaining, a mode of operation of the DES data encryption standard
CELP (code-excited linear prediction) - a type of audio coding using code-excited linear prediction
CNAME (canonical name) - canonical name
CSRC (contributing source) - contributing source. A source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. The mixer inserts into the RTP packet header a list of the SSRC identifiers of the sources that contributed to that packet. This list is called the CSRC list. Example: in an audio conference, the mixer lists the participants whose speech was combined to produce the outgoing packet, so that the receiver can identify the current talker even though all audio packets carry the same SSRC identifier (that of the mixer)
DES (Data Encryption Standard) - data encryption standard
IANA (Internet Assigned Numbers Authority) - Internet Assigned Numbers Authority
IMA (Interactive Multimedia Association) - Interactive Multimedia Association
IP (Internet Protocol) - internet protocol, network layer protocol, datagram protocol. Allows packets to cross multiple networks on their way to their destination
IPM (IP Multicast) - multicast using the IP protocol
LD-CELP (low-delay code excited linear prediction) - a speech coding algorithm using code-excited linear prediction with low delay
LPC (linear predictive encoding) - linear prediction coding
NTP (Network Time Protocol) - network time protocol. An NTP timestamp counts seconds from 00:00 on January 1, 1900. The full NTP timestamp format is a 64-bit unsigned fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits. In some cases a more compact representation is used, in which only the middle 32 bits of the full format are taken: the low 16 bits of the integer part and the high 16 bits of the fractional part
RPE/LTP (residual pulse excitation/long term prediction) - speech signal coding algorithm with differential pulse excitation and long-term prediction
RTCP (Real-time Transport Control Protocol) - real-time transport control protocol
RTP (Real-Time Transport Protocol) - real-time transport protocol
SSRC (synchronization source) - synchronization source. The source of a stream of RTP packets, identified by the 32-bit numeric SSRC identifier carried in the RTP header, independently of the network address. All packets from one synchronization source share the same timing and the same sequence number space, so the receiver groups packets for playback by synchronization source. Examples of synchronization sources: the sender of a stream of packets derived from a signal source such as a microphone or video camera, or an RTP mixer. A synchronization source may change its data format over time, for example its audio encoding. The SSRC identifier is a randomly chosen value that is meant to be globally unique within a particular RTP session. A teleconference participant is not required to use the same SSRC identifier for all RTP sessions in a multimedia session; the binding of SSRC identifiers is provided through RTCP. If a participant generates multiple streams in one RTP session, for example from multiple cameras, each stream must be identified by a separate SSRC
TCP (Transmission Control Protocol) is a transport layer protocol used in conjunction with the IP protocol
UDP (User Datagram Protocol) is a transport layer protocol without establishing a logical connection. UDP only provides for sending a packet to one or more stations on the network. Checking the correctness and ensuring the integrity (assured delivery) of data transmission is carried out at a higher level
ADPCM - adaptive differential pulse code modulation
jitter - deviations of the phase or frequency of a signal; in IP telephony, irregularity of datagram delays in the network
data link - data transmission link (the second layer of the Open Systems Interconnection reference model)
ICN - information and computer network
mixer - an intermediate system that receives RTP packets from one or more sources, possibly changes the data format, combines the packets into a new RTP packet and then transmits it. Since multiple signal sources are generally not synchronized, the mixer adjusts the timing of the component streams and generates its own timing for the combined stream. Thus, all data packets generated by the mixer are identified as having the mixer as their synchronization source.
monitor - an application that receives RTCP packets sent by RTP session participants, in particular reception reports, and evaluates the current quality of service for distribution control, fault diagnosis and long-term statistics. Normally, the monitor function is built into the applications used in the session, but a monitor can also be a separate application that does not otherwise participate and neither sends nor receives RTP data packets. Such applications are called third-party monitors.
ITU-T - Telecommunication Standardization Sector of the International Telecommunication Union
end system - an application that generates the content transmitted in RTP packets and/or consumes the content of received RTP packets. An end system may act as one or more (but usually only one) synchronization sources in each RTP session.
RTCP packet - a control packet consisting of a fixed header part, similar to the header of RTP data packets, followed by structured elements that vary depending on the RTCP packet type. Typically, multiple RTCP packets are transmitted together as a compound RTCP packet in a single packet of the underlying protocol; this is made possible by the length field in the fixed header of each RTCP packet
RTP packet - a protocol data unit consisting of the fixed RTP header, a possibly empty list of contributing sources, an optional extension, and the traffic (payload). Typically, one packet of the underlying protocol contains one RTP packet, but it may contain several
port - an abstraction used by transport layer protocols to distinguish between multiple destinations within a single host computer. A port is identified by its number. Thus, the port number identifies the specific application for which the forwarded data is intended. This number, along with information about which transport protocol (for example, TCP or UDP) is in use, is carried among other service information in datagrams sent over the Internet. The transport selectors (TSELs) used by the OSI transport layer are equivalent to ports
profile - a set of RTP and RTCP protocol parameters for a class of applications, which determines how they operate. The profile defines the use of the marker bit and traffic type fields in the RTP data packet header, the traffic types, additions to the RTP data packet header, the first 16 bits of the RTP data packet header extension, RTCP packet types, the RTCP reporting interval, the SR/RR packet extension, the use of SDES packets, the services and algorithms for securing communication, and the details of using the underlying protocol
RTP session - the association among a set of participants communicating through the RTP protocol. For each participant, a session is defined by a specific pair of destination transport addresses (one network address plus a pair of ports for RTP and RTCP). The destination transport address pair may be common to all participants (as in the case of IPM) or may be different for each (an individual network address and a common pair of ports, as in unicast communication). In a multimedia session, each type of traffic is carried in a separate RTP session with its own RTCP packets. Multicast RTP sessions are distinguished by different port pair numbers and/or different multicast addresses
non-RTP means - protocols and mechanisms that may be needed in addition to RTP to provide an acceptable service. In particular, for multimedia conferencing, a conference control application may distribute multicast addresses and encryption keys, negotiate the encryption algorithm to be used, and determine dynamic mappings between RTP traffic type values and the traffic formats they represent (for formats that do not have a predefined traffic type value). For simple applications, e-mail or a conference database can also be used
translator - an intermediate system that forwards RTP packets without changing their synchronization source identifier. Examples of translators: devices that convert encodings without mixing, replicators from multicast to unicast, and application-level filters in firewalls
transport address - A combination of network address and port number that identifies a transport endpoint, such as an IP address and a UDP port number. Packets are transmitted from the source transport address to the destination transport address
RTP traffic - multimedia data transmitted in an RTP protocol packet, such as audio samples or compressed video data
PSTN - Public Switched Telephone Networks
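The NTP timestamp formats described in the glossary entry above can be illustrated with a short sketch. It assumes a non-negative Unix time as input and uses the fixed offset of 2208988800 seconds between January 1, 1900 and January 1, 1970; the function names are illustrative.

```python
NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def to_ntp(unix_seconds: float) -> int:
    """Build the 64-bit fixed-point NTP timestamp (32.32 format)."""
    whole = int(unix_seconds)
    frac = int((unix_seconds - whole) * (1 << 32))
    return (((whole + NTP_UNIX_OFFSET) & 0xFFFFFFFF) << 32) | frac


def compact_ntp(ntp64: int) -> int:
    """Middle 32 bits: low 16 bits of the seconds, high 16 bits of the fraction."""
    return (ntp64 >> 16) & 0xFFFFFFFF


half_second = to_ntp(0.5)
```

Half a second past the Unix epoch has a fractional part of 0x80000000 in the full format and 0x8000 in the low half of the compact form.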

One of the most important trends in the evolution of modern telecommunications is the development of IP telephony, a set of new technologies for transmitting multimedia messages (voice, data, video) over information and computer networks (ICNs) built on the IP (Internet Protocol) protocol, including local, corporate, and global computer networks and the Internet. The concept of IP telephony includes Internet telephony, which makes it possible to organize telephone communication between Internet subscribers, between subscribers of public switched telephone networks (PSTN) over the Internet, and between PSTN subscribers and Internet subscribers.

IP-telephony has a number of undeniable advantages that ensure its rapid development and expansion of the computer telephony market. It is beneficial to end users who are provided with telephone communication at a fairly low per-minute payment. For companies with remote branches, IP technology allows you to organize voice communications using existing corporate IP networks. Instead of several communication networks, one is used. The undoubted advantage of IP-telephony over a regular phone is also the ability to provide additional services through the use of a multimedia computer and various Internet applications. Thus, with IP telephony, businesses and individuals can expand their communications capabilities by incorporating advanced videoconferencing, application sharing, whiteboard-type tools, and more.

What international standards and protocols regulate the main parameters and algorithms for the operation of the communication hardware and software used in IP telephony? As the name suggests, this technology is based on the IP protocol, which, however, is used not only for telephony: it was originally developed for transmitting digital data in packet-switched ICNs.

In networks that do not provide a guaranteed quality of service (these include networks built on the basis of the IP protocol), packets may be lost, the order of their arrival may change, the data transmitted in packets may be distorted. Various transport layer procedures are used to ensure reliable delivery of transmitted information under these conditions. When transmitting digital data, the TCP protocol (Transmission Control Protocol) is used for this purpose. This protocol provides reliable data delivery and restores the original packet order. If an error is detected in a packet or the packet is lost, the TCP procedures send a retransmission request.

For audio and video conferencing applications, packet delays have a much greater effect on signal quality than individual data distortions. Variations in delay can lead to pauses in playback. Such applications require a different transport layer protocol that provides packet resequencing, delivery with minimum delay, real-time playback at precisely specified moments, traffic type recognition, and multicast or two-way communication. Such a protocol is the real-time transport protocol RTP (Real-Time Transport Protocol). This protocol regulates the transmission of multimedia data in packets through the ICN at the transport level and is supplemented by the real-time transport control protocol RTCP (Real-Time Transport Control Protocol). The RTCP protocol, in turn, provides control over the delivery of multimedia data, quality of service monitoring, transfer of information about the participants in the current communication session, control and identification, and is sometimes considered part of the RTP protocol.

Many publications on IP telephony note that most of the network equipment and special software for this technology is developed on the basis of the Recommendation H.323 of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) (including TAPI 3.0, NetMeeting 2.0, etc.). How does H.323 relate to RTP and RTCP? H.323 is a broad conceptual framework that includes many other standards, each dealing with different aspects of information transfer. Most of these standards, such as audio and video codec standards, have wide application not only in IP-telephony. As for the RTP / RTCP protocols, they form the basis of the H.323 standard, are focused on providing exactly IP technology, and underlie the organization of IP telephony. This article is devoted to the consideration of these protocols.

2. Basic concepts

The RTP real-time transport protocol provides end-to-end real-time transmission of multimedia data such as interactive audio and video. This protocol implements traffic type recognition, packet sequence numbering, work with timestamps and transmission control.

The action of the RTP protocol is reduced to assigning each outgoing packet a timestamp. On the receiving side, packet timestamps indicate in what sequence and with what delays they need to be played back. Support for RTP and RTCP allows the receiving host to arrange the received packets in the proper order, reduce the effect of packet delay jitter on the network on signal quality, and restore synchronization between audio and video so that incoming information can be correctly heard and viewed by users.

Note that RTP itself does not have any mechanism to guarantee timely data transmission and quality of service, but relies on underlying services to provide this. It does not prevent out-of-order delivery, nor does it assume that the underlying network is absolutely reliable and delivers packets in the correct sequence. The sequence numbers included in RTP allow the receiver to reconstruct the sender's packet sequence.

The RTP protocol supports both bidirectional communication and data transfer to a group of destinations, if multicast is supported by the underlying network. RTP is designed to provide the information required by individual applications, and in most cases it is integrated into the application.

Although RTP is considered a transport layer protocol, it usually functions on top of another transport layer protocol, UDP (User Datagram Protocol). Both protocols contribute to the functionality of the transport layer. It should be noted that RTP and RTCP are independent of the underlying transport and network layers, so the RTP/RTCP protocols can be used with other suitable transport protocols.

RTP/RTCP protocol data units are called packets. Packets generated in accordance with the RTP protocol and used to transmit multimedia data are called information packets or data packets, and packets generated in accordance with the RTCP protocol and used to transmit the service information required for reliable teleconferencing are called control or service packets. An RTP packet includes a fixed header, an optional variable length header extension, and a data field. An RTCP packet starts with a fixed part (similar to the fixed part of RTP information packets) followed by variable length building blocks.

In order for the RTP protocol to be more flexible and applicable to various applications, some of its parameters are intentionally left undefined; instead, the protocol provides the concept of a profile. A profile is a set of parameters of the RTP and RTCP protocols for a specific class of applications that determines the features of their operation. The profile defines the use of individual packet header fields, traffic types, header additions and header extensions, packet types, communication security services and algorithms, features of the use of the underlying protocol, and so on (an example is the RTP profile for audio and video conferences with minimal control). Each application usually works with only one profile, and the profile is selected by choosing the appropriate application. There is no explicit indication of the profile type by port number, protocol identifier, or similar means.

Thus, a complete RTP specification for a particular application must include additional documents, which include a profile description, as well as a traffic format description that defines how a particular type of traffic, such as audio or video, will be processed in RTP.

Features of multimedia data transmission during audio and video conferences are discussed in the following sections.

2.1. Group audio conferencing

Group audio conferencing requires a multi-user group address and two ports. In this case, one port is required for the exchange of audio data, and the other is used for control packets of the RTCP protocol. The group address and port information is sent to the intended teleconference participants. If privacy is required, then the information and control packets may be encrypted as defined in Section 7.1, in which case the encryption key must also be generated and distributed.

The audio conferencing application used by each conference participant sends audio data in small chunks, for example 20 ms in duration. Each chunk of audio data is preceded by an RTP header; the RTP header and data are in turn encapsulated in a UDP packet. The RTP header indicates which type of audio coding (e.g., PCM, ADPCM, or LPC) was used to form the data in the packet. This makes it possible to change the coding type during the conference, for example when a new participant joins over a low bandwidth connection, or during network congestion.

On the Internet, as in other packet-switched data networks, packets are sometimes lost and reordered, and are also delayed by varying amounts of time. To counteract these effects, the RTP header contains a timestamp and a sequence number that allow receivers to reconstruct the timing so that, for example, portions of an audio signal are played continuously by the speaker every 20 ms. This timing reconstruction is performed separately and independently for each source of RTP packets in the teleconference. The sequence number can also be used by the receiver to estimate the number of lost packets.
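The use of the sequence number described above can be illustrated with a minimal sketch. It assumes that the receiver has collected the 16-bit sequence numbers of one source over a short window; the function names `seq_delta` and `reorder_and_count_lost` are hypothetical and are not part of any RTP library.

```python
def seq_delta(a: int, b: int) -> int:
    """Signed distance from sequence number b to a, modulo 2**16."""
    return ((a - b + 0x8000) & 0xFFFF) - 0x8000


def reorder_and_count_lost(seqs):
    """Return the sequence numbers in sending order and an estimate of losses.

    Assumes the window is much shorter than 32768 packets, so the
    wrap-around from 65535 to 0 can be resolved unambiguously.
    """
    if not seqs:
        return [], 0
    first = seqs[0]
    # Sort by signed offset from the first packet seen, dropping duplicates.
    ordered = sorted(set(seqs), key=lambda s: seq_delta(s, first))
    expected = seq_delta(ordered[-1], ordered[0]) + 1
    return ordered, expected - len(ordered)


received = [65534, 0, 65535, 2, 3]  # packet 1 never arrived
print(reorder_and_count_lost(received))  # prints ([65534, 65535, 0, 2, 3], 1)
```

The signed modular difference is what lets the sketch treat 0 as following 65535 rather than preceding it.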

Since participants can join and leave during a teleconference, it is useful to know who is currently taking part and how well the participants are receiving the audio data. For this purpose, each instance of the audio application periodically sends to the control (RTCP) port, for the applications of all other participants, a reception report that includes the name of its user. The reception report indicates how well the current speaker is being heard and can be used to control adaptive encoders. In addition to the user name, other identifying information may be included, subject to control bandwidth limits. When leaving the conference, a site sends an RTCP BYE packet.

2.2. Videoconferencing

If both audio and video signals are used in a teleconference, they are transmitted separately. For the transmission of each type of traffic, regardless of the other, the protocol specification introduces the concept of an RTP session (see the list of abbreviations and terms used). A session is defined by a specific pair of destination transport addresses (one network address plus a pair of ports for RTP and RTCP). Packets for each type of traffic are transmitted using two different pairs of UDP ports and/or multicast addresses. There is no direct RTP layer connection between audio and video sessions, except that a user participating in both sessions must use the same canonical name in the RTCP packets for both sessions so that the sessions can be linked.

One reason for this separation is that some conference participants need to be allowed to receive only one type of traffic if they wish to. Despite the separation, synchronous playback of source media data (audio and video) can be achieved using the timing information that is carried in the RTCP packets for both sessions.

2.3. The concept of mixers and translators

Not all sites are always able to receive multimedia data in the same format. Consider the case where participants in one area are connected via a low speed link to the majority of other conference participants, who have broadband network access. Instead of forcing everyone to use a narrower bandwidth and lower quality audio coding, an RTP layer relay called a mixer can be placed near the low bandwidth area. This mixer resynchronizes the incoming audio packets to restore the original 20 ms intervals, mixes the restored audio streams into a single stream, encodes it with a low bandwidth audio coding, and transmits the packet stream over the low speed link. In this case, packets can be addressed to one recipient or to a group of recipients with different addresses. In order for receiving endpoints to provide a correct indication of the source of messages, the RTP header includes means for mixers to identify the sources that contributed to a mixed packet.

Some of the participants in the audio conference may be connected by broadband communication lines, but may not be reachable through IP multicast (IPM). For example, they may be behind an application layer firewall that will not allow any transmission of IP packets. In such cases it is not mixers that are needed, but a different type of RTP layer relay, called a translator. Two translators are installed, one on each side of the firewall: the outer one forwards all multicast packets it receives over a secure connection to the translator behind the firewall. The translator behind the firewall sends them again as multicast packets to a multicast group restricted to the site's internal network.

Mixers and translators can be designed for a number of purposes. An example of a mixer is a video mixer that scales the images of individual people in independent video streams and composites them into a single video stream, simulating a group scene. Examples of translation: connecting a group of IP/UDP-only hosts to a group of ST-II-only hosts, or transcoding video packet by packet from individual sources without retiming or mixing. The details of how mixers and translators work are discussed in Section 5.

2.4. Byte order, alignment, and timestamp format

All integer fields of RTP/RTCP packets are transmitted over the network in network byte order, that is, the most significant byte (octet) is transmitted first. All header field data is aligned according to its length. Octets designated as padding have a value of zero.

Absolute time (wallclock time) in RTP is represented using the NTP (Network Time Protocol) timestamp format, which is a count of seconds relative to zero hours on January 1, 1900. The full NTP timestamp format is a 64-bit unsigned fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits. In some fields with a more compact representation, only the middle 32 bits are used: the low 16 bits of the integer part and the high 16 bits of the fractional part.
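The timestamp layout described above can be sketched as follows. The function names are hypothetical; the constant 2,208,988,800 is the number of seconds between January 1, 1900 and the Unix epoch of January 1, 1970, and the sketch assumes a non-negative Unix time as input.

```python
NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def unix_to_ntp64(unix_seconds: float) -> int:
    """Pack a Unix time into the 64-bit NTP fixed-point format."""
    whole = int(unix_seconds)
    seconds = (whole + NTP_UNIX_OFFSET) & 0xFFFFFFFF       # integer part, 32 bits
    fraction = int((unix_seconds - whole) * (1 << 32)) & 0xFFFFFFFF  # fractional part
    return (seconds << 32) | fraction


def ntp64_to_compact32(ntp64: int) -> int:
    """Keep the middle 32 bits: low 16 of the seconds, high 16 of the fraction."""
    return (ntp64 >> 16) & 0xFFFFFFFF


stamp = unix_to_ntp64(0.5)               # half a second after the Unix epoch
print(hex(stamp))                        # prints 0x83aa7e8080000000
print(hex(ntp64_to_compact32(stamp)))    # prints 0x7e808000
```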

The next two sections of this article (3 and 4) discuss the packet formats and features of the functioning of the RTP and RTCP protocols, respectively.

3. RTP data transfer protocol

3.1. Fixed RTP header fields

As noted above, an RTP packet includes a fixed header, an optional variable length header extension, and a data field. The fixed header of RTP protocol packets has the following format:

The first twelve octets are present in every RTP packet, while the list of contributing source (CSRC) identifiers is present only when inserted by a mixer. The fields have the following purposes.

Version (V): 2 bits. This field identifies the RTP version. This article focuses on version 2 of the RTP protocol (value 1 was used in the first draft version of RTP).

Padding (P): 1 bit. If the padding bit is set to one, then the packet contains one or more padding octets at the end that are not part of the traffic. The last padding octet contains the count of padding octets that are to be ignored. Padding may be required by some encryption algorithms with fixed block sizes or to carry multiple RTP packets in a single underlying protocol data unit.

Extension (X): 1 bit. If the extension bit is set, then the fixed header is followed by a header extension with the format defined in Section 3.4.

CSRC counter (CC): 4 bits. The CSRC counter contains the number of contributing source (CSRC) identifiers (see the list of abbreviations and terms used) that follow the fixed header.

Marker (M): 1 bit. The interpretation of the marker is determined by the profile. It is intended to allow significant events (e.g., video frame boundaries) to be marked in the packet stream. The profile may introduce additional marker bits or specify that there is no marker bit by changing the number of bits in the traffic type field (see Section 3.3).

Traffic type (PT): 7 bits. This field identifies the format of the RTP traffic and determines how the application will interpret it. The profile defines the default static mapping between PT values and traffic formats. Additional traffic type codes can be defined dynamically through non-RTP means. The sender of an RTP packet emits a single RTP traffic type value at any given time; this field is not intended for multiplexing individual media streams (see Section 3.2).

Sequence number: 16 bits. The sequence number is incremented by one for each RTP information packet sent and can be used by the receiver to detect lost packets and restore their original order. The initial value of the sequence number is chosen randomly to make attacks on encryption based on known values of this field more difficult (even if the source itself does not use encryption, since the packets may pass through a translator that does).

Timestamp: 32 bits. The timestamp reflects the sampling instant of the first octet in the RTP information packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculation (see Section 4.3.1). The resolution of the clock must be sufficient for the desired synchronization accuracy and for measuring packet arrival jitter (one tick per video frame is usually not enough). The clock frequency depends on the format of the transmitted traffic and is set statically in the profile or traffic format specification, or it can be set dynamically for traffic formats defined through non-RTP means. If RTP packets are generated periodically, the nominal sampling instants determined by the sampling clock should be used, not readings of the system clock. For example, for fixed-rate audio it is desirable that the timestamp clock be incremented by one for each sampling period. If an audio application reads blocks of 160 samples from the input device, the timestamp must be incremented by 160 for each such block, regardless of whether the block was transmitted in a packet or dropped as silence. The initial value of the timestamp, like the initial value of the sequence number, is random. Several consecutive RTP packets may have equal timestamps if they are logically generated at the same time, for example if they belong to the same video frame. Consecutive RTP packets may contain non-monotonic timestamps if the data is not transmitted in sampling order, as is the case with interpolated MPEG video frames (the packet sequence numbers will nevertheless still be monotonic).
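The rule for advancing the two counters in the 160-sample audio example can be sketched as follows. The function `packetize` and the convention of marking a suppressed (silent) block with `None` are assumptions made for illustration only.

```python
import random

SAMPLES_PER_BLOCK = 160  # block size taken from the example in the text


def packetize(blocks, first_seq=None, first_ts=None):
    """Assign (sequence number, timestamp) pairs to audio blocks.

    A block equal to None stands for a block dropped as silence: it
    advances the timestamp but consumes no sequence number.
    """
    # Both counters start from random values unless fixed by the caller.
    seq = random.getrandbits(16) if first_seq is None else first_seq
    ts = random.getrandbits(32) if first_ts is None else first_ts
    packets = []
    for block in blocks:
        if block is not None:
            packets.append((seq, ts, block))
            seq = (seq + 1) & 0xFFFF               # 16-bit wrap-around
        ts = (ts + SAMPLES_PER_BLOCK) & 0xFFFFFFFF  # 32-bit wrap-around
    return packets


print(packetize([b"a", None, b"b"], first_seq=65535, first_ts=0))
# prints [(65535, 0, b'a'), (0, 320, b'b')]
```

The gap of 320 between the two timestamps, together with consecutive sequence numbers, tells the receiver that one block was skipped deliberately rather than lost.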

SSRC: 32 bits. The SSRC (synchronization source) field identifies the synchronization source (see the list of abbreviations and terms used). This identifier is chosen randomly so that no two synchronization sources within the same RTP session have the same SSRC identifier. Although the probability of several sources choosing the same identifier is low, all RTP implementations must be prepared to detect and resolve such collisions. Section 6 discusses the probability of collisions along with a mechanism for resolving them and for detecting RTP layer loops based on the uniqueness of the SSRC identifier. If a source changes its source transport address, it must also choose a new SSRC identifier so that it is not interpreted as a looped source.

CSRC list: 0 to 15 items, 32 bits each. The contributing source (CSRC) list identifies the sources that contributed to the traffic contained in the packet. The number of identifiers is given by the CC field. If there are more than fifteen contributing sources, only 15 of them can be identified. CSRC identifiers are inserted by mixers, which use the SSRC identifiers of the contributing sources. For example, for audio packets, the SSRC identifiers of all sources that were mixed when the packet was created are listed in the CSRC list, providing a correct indication of message sources to the recipient.
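The field layout described in this section can be summarized in a short sketch that builds and parses the fixed header with Python's standard `struct` module. The function names and the dictionary keys are hypothetical; only version 2 headers are produced, and the header extension and payload are left untouched.

```python
import struct


def build_rtp_header(pt, seq, timestamp, ssrc, csrc=(),
                     marker=False, padding=False, extension=False):
    """Pack the 12-octet fixed header followed by the CSRC list."""
    if len(csrc) > 15:
        raise ValueError("the CC field is 4 bits: at most 15 CSRC identifiers")
    byte0 = (2 << 6) | (int(padding) << 5) | (int(extension) << 4) | len(csrc)
    byte1 = (int(marker) << 7) | (pt & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + b"".join(struct.pack("!I", c) for c in csrc)


def parse_rtp_header(data):
    """Unpack the fixed header and CSRC list from the start of a packet."""
    if len(data) < 12:
        raise ValueError("an RTP packet has at least 12 header octets")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F
    if len(data) < 12 + 4 * cc:
        raise ValueError("packet is shorter than its CSRC list")
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "marker": bool(byte1 & 0x80),
        "pt": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": struct.unpack(f"!{cc}I", data[12:12 + 4 * cc]),
    }


packet = build_rtp_header(pt=0, seq=1, timestamp=160, ssrc=0x11223344,
                          csrc=[5, 6], marker=True)
print(len(packet), parse_rtp_header(packet)["csrc"])  # prints 20 (5, 6)
```

The `!` prefix in the format strings selects network byte order without alignment padding, which matches the most-significant-byte-first rule of Section 2.4.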

3.2. RTP sessions

As mentioned above, in accordance with the RTP protocol, different types of traffic must be transmitted separately, in different RTP sessions. A session is defined by a specific pair of destination transport addresses (one network address plus a pair of ports for RTP and RTCP). For example, in a teleconference composed of separately encoded audio and video, each type of traffic needs to be sent in a separate RTP session with its own destination transport address. Audio and video are not expected to be carried in the same RTP session and separated based on the traffic type or SSRC fields. Interleaving packets that carry different types of traffic but use the same SSRC would cause several problems:

If one of the traffic types changes during a session, there will be no general means to determine which of the old values has been replaced by the new one.
The SSRC identifies a single timing and sequence number space. Interleaving multiple types of traffic would require different timing spaces if the clock rates of the streams differ, and different sequence number spaces to tell which type of traffic suffered a packet loss.
The RTCP sender and receiver reports (see Section 4.3) describe only one timing and sequence number space per SSRC and do not carry a traffic type field.
The RTP mixer is not capable of combining interleaved streams of different types of traffic into a single stream.
Carrying multiple types of traffic in a single RTP session would also preclude the following: using different network paths or network resource allocations where appropriate; receiving only a subset of the multimedia data when desired, for example audio only if the video signal would exceed the available bandwidth; and receiver implementations that use separate processes for the different types of traffic, whereas using separate RTP sessions allows both single-process and multiple-process implementations.

By using different SSRCs for each type of traffic, but sending them in the same RTP session, the first three problems can be avoided, but the last two cannot be avoided. Therefore, the specification of the RTP protocol requires each type of traffic to use its own RTP session.

3.3. Profile-defined RTP header changes

The existing RTP Information Packet header is complete for the set of features required in general for all classes of applications that might support RTP. However, for better adaptation to specific tasks, the header can be modified through modifications or additions defined in the profile specification.

The marker bit and traffic type field carry profile-specific information, but they are located in the fixed header because many applications are expected to need them. The octet containing these fields may be redefined by the profile to meet different requirements, for example with more or fewer marker bits. If there are any marker bits, one of them should be placed in the most significant bit of the octet, since profile-independent monitors may be able to observe a correlation between the packet loss pattern and the marker bit.

Additional information that is required for a particular traffic format (eg video coding type) MUST be carried in the data field of the packet. It can be placed at a certain place at the beginning or inside the data array.

If a particular class of applications needs additional functionality independent of the traffic format, then the profile that those applications operate with must define additional fixed fields to be placed immediately after the SSRC field of the existing fixed header. These applications will be able to quickly access additional fields directly, while profile-independent monitors or recorders will still be able to process RTP packets by interpreting only the first twelve octets.

If it turns out that additional functionality is needed in common across all profiles, then a new version of RTP should be defined to make a permanent change to the fixed header.

3.4. RTP header extension

To allow individual implementations to experiment with new traffic-format-independent features that require additional information to be carried in the information packet header, RTP provides a packet header extension mechanism. This mechanism is designed so that the header extension can be ignored by other cooperating applications that do not require it.

If the X bit in the RTP header is set to one, then a variable length header extension is appended to the fixed RTP header (following the CSRC list, if any). Note that this header extension is for limited use only. The RTP packet header extension has the following format:

The extension contains a 16-bit length field that indicates the number of 32-bit words in it, excluding the four-octet extension header (hence the length can be zero). Only one extension can be appended to the fixed header of an RTP information packet. To allow each of several cooperating implementations to experiment independently with different header extensions, or to allow a particular implementation to experiment with more than one type of header extension, the first 16 bits of the extension are left open for distinguishing identifiers or parameters. The format of these 16 bits must be determined by the profile specification that the applications are working with.
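As a minimal sketch of the length rule above, the following function skips over a header extension. It assumes that the caller has already checked the X bit and computed `offset`, the position just past the fixed header and CSRC list; the function name is hypothetical.

```python
import struct


def split_header_extension(data: bytes, offset: int):
    """Return (profile-defined 16 bits, extension body, offset of the data field)."""
    if len(data) < offset + 4:
        raise ValueError("packet too short for the extension header")
    # First 16 bits: defined by the profile. Next 16 bits: length in 32-bit words,
    # not counting these four octets.
    profile_defined, length_words = struct.unpack("!HH", data[offset:offset + 4])
    end = offset + 4 + 4 * length_words
    if len(data) < end:
        raise ValueError("packet shorter than the declared extension length")
    return profile_defined, data[offset + 4:end], end


packet = bytes(12) + struct.pack("!HH", 0xABCD, 1) + b"\x01\x02\x03\x04" + b"payload"
ident, body, payload_at = split_header_extension(packet, 12)
print(hex(ident), body, packet[payload_at:])
# prints 0xabcd b'\x01\x02\x03\x04' b'payload'
```

Because the length is carried explicitly, an application that does not understand the extension can still locate the data field, which is what allows the extension to be ignored.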

The requirement to support several types of traffic with different requirements for quality of service based on the TCP / IP protocol stack is now very relevant. This problem is addressed by the Real-Time Transport Protocol (RTP), which is an IETF standard for real-time transmission of data such as voice or video over a network that does not guarantee quality of service.

The RTP protocol is designed to deliver data to one or more recipients with a delay that stays within specified limits (RTP itself provides no delivery guarantee and relies on the underlying network for that). To do this, the protocol header contains the timestamps necessary for the successful restoration of audio and video information, as well as data on the method of encoding the information.

Although the TCP protocol guarantees the delivery of transmitted data in the correct sequence, the traffic is not uniform, that is, unpredictable delays occur during the delivery of datagrams. Since the RTP protocol is aware of the content of datagrams and has data loss detection mechanisms, it can reduce latency to an acceptable level.

IP protocol address scheme

The internetwork addressing scheme used in the IP protocol is described in RFC 990 and RFC 997. It is based on the separation of addressing networks from addressing devices in these networks. This scheme facilitates routing. In this case, addresses must be assigned in an orderly (consecutive) manner in order to make routing more efficient.

When the TCP/IP protocol stack is used on a network, end devices receive unique addresses. Such devices can be personal computers, media servers, routers, etc. However, some devices that have multiple physical ports, such as routers, must have a unique address on each of the ports. Based on the addressing scheme and the fact that some devices on the network may have multiple addresses, we can conclude that this addressing scheme describes not the device itself, but a specific connection of this device to the network. This scheme leads to a number of inconveniences. One of them is the need to change the address of a device when moving it to another network. Another drawback is that to work with a device that has several connections in a distributed network, you need to know all the addresses that identify these connections.

So, for each device in IP networks, we can talk about addresses of three levels:

The physical address of the device (more precisely, of a specific interface). For devices on Ethernet networks, this is the MAC address of the network card or router port. These addresses are assigned by the hardware manufacturers. The physical address has six bytes: the upper three bytes are the identifier of the manufacturer, the lower three bytes are assigned by the manufacturer;

An IP address consisting of four bytes. This address is used at the network layer of the OSI reference model;

Character identifier - name. This identifier can be assigned arbitrarily by the administrator.

When the IP protocol was standardized in September 1981, its specification required that every device connected to the network have a unique 32-bit address. This address is divided into two parts. The first part of the address identifies the network where the device is located. The second part uniquely identifies the device itself within the network. This scheme leads to a two-level address hierarchy (Figure 6.23).

The network number field in the address is now called the network prefix, because it identifies the network. All workstations on the same network share the same network prefix, but they must have unique device numbers. Two workstations located in different networks must have different network prefixes, but they can have the same device number.

To give flexibility in addressing computer networks, the designers of the protocol determined that the IP address space should be divided into three different classes: A, B, and C. Knowing the class, you know where the boundary between the network prefix and the device number lies in a 32-bit address. Figure 6.24 shows the address formats of these basic classes.

One of the main advantages of using classes is that you can determine from the class of the address where the boundary between the network prefix and the device number is. For example, if the most significant two bits of the address are 10, then the split point is between bits 15 and 16.

The disadvantage of this method is the need to change the network address when connecting additional devices. For example, if the total number of devices in a class C network exceeds 254, then its addresses will need to be replaced with class B addresses. Changing network addresses will require additional effort from the administrator to debug the network. Network administrators cannot make a smooth transition to a new class of addresses because the classes are clearly separated. They have to prohibit the use of an entire group of network addresses, change all addresses of devices in this group at the same time, and only then allow their use on the network again. In addition, the introduction of address classes significantly reduces the theoretically possible number of individual addresses. In the current version of the IP protocol (version 4) the total number of addresses could be 2^32 (4,294,967,296), since the protocol provides for the use of 32 bits to specify the address. Naturally, the use of some bits for service purposes reduces the available number of individual addresses.

Class A is for large networks. Each class A address has an 8-bit network prefix with the most significant bit set to 0 and the next seven bits used for the network number. The remaining 24 bits are used for the device number. At the moment, all class A addresses are already allocated. Class A networks are also referred to as "/8" because class A addresses have an 8-bit network prefix.

The maximum number of class A networks is 126 (2^7 - 2; two network numbers are subtracted, those consisting of all zeros and all ones). Each network of this class supports up to 16,777,214 (2^24 - 2) devices. Since the class A address block can contain a maximum of 2^31 (2,147,483,648) individual addresses, and IP version 4 can support a maximum of 2^32 (4,294,967,296) addresses, class A occupies 50% of the total IP address space.

Class B is intended for medium-sized networks. Each class B address has a 16-bit network prefix where the two most significant bits are 10 and the next 14 bits are used for the network number. 16 bits are allocated for the device number. Class B networks are also referred to as "/16" because class B addresses have a 16-bit network prefix.

The maximum number of class B networks is 16,382 (2^14 - 2). Each network of this class supports up to 65,534 (2^16 - 2) devices. Since the entire class B address block can contain a maximum of 2^30 (1,073,741,824) individual addresses, it occupies 25% of the total IP address space.

Class C addresses are used in networks with a small number of devices. Each class C network has a 24-bit network prefix, in which the three most significant bits are 110, and the next 21 bits are used for the network number. The remaining 8 bits are allocated for device numbers. Class C networks are also referred to as "/24" because class C addresses have a 24-bit network prefix.

The maximum number of class C networks is 2,097,152 (2^21). Each network of this class supports up to 254 (2^8 - 2) devices. Class C occupies 12.5% of the total IP address space.

Table 6.9 summarizes our analysis of network classes.
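The figures quoted for the three classes follow directly from the bit layouts. The short Python sketch below reproduces them; the function name class_stats and the CLASSES table are illustrative names chosen here, not part of any standard.

```python
# Sizes of the classful address blocks, derived from their bit layouts.
TOTAL_ADDRESSES = 2 ** 32

# (class name, number of fixed leading bits, network prefix length in bits)
CLASSES = [("A", 1, 8), ("B", 2, 16), ("C", 3, 24)]


def class_stats(leading_bits, prefix_len):
    """Return (network-number bits, devices per network, block size, share of IPv4 space)."""
    network_bits = prefix_len - leading_bits
    host_bits = 32 - prefix_len
    # Device numbers of all zeros and all ones are reserved, hence the "- 2".
    devices_per_network = 2 ** host_bits - 2
    # Every address that starts with the class's leading bits belongs to the block.
    block_size = 2 ** (32 - leading_bits)
    share = block_size / TOTAL_ADDRESSES
    return network_bits, devices_per_network, block_size, share


for name, leading, prefix in CLASSES:
    bits, devices, size, share = class_stats(leading, prefix)
    print(f"Class {name}: {bits} network bits, {devices} devices per network, "
          f"{size} addresses, {share:.1%} of the address space")
```

The sketch reports only the number of network-number bits rather than a network count, because how many network numbers are treated as reserved differs between the classes.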

Table 6.9. Network classes

In addition to these three classes of addresses, there are two more classes. In class D, the most significant four bits are 1110. This class is used for multicasting. In class E, the upper four bits are 1111. It is reserved for experimentation.
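Because each class is identified by the leading bits of the first octet (0 for class A, 10 for B, 110 for C, 1110 for D, 1111 for E), the class of an address can be determined with a few bit masks. The following is a minimal sketch; the function name address_class is an illustrative choice.

```python
def address_class(address: str) -> str:
    """Return the classful address class (A to E) of a dotted decimal IPv4 address."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError(f"invalid first octet in {address!r}")
    if first_octet & 0b10000000 == 0b00000000:   # leading bit 0
        return "A"
    if first_octet & 0b11000000 == 0b10000000:   # leading bits 10
        return "B"
    if first_octet & 0b11100000 == 0b11000000:   # leading bits 110
        return "C"
    if first_octet & 0b11110000 == 0b11100000:   # leading bits 1110
        return "D"
    return "E"                                   # leading bits 1111


for sample in ("10.1.2.3", "185.100.1.1", "192.168.0.1", "224.0.0.1", "240.0.0.1"):
    print(sample, "-> class", address_class(sample))
```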

To make addresses easier to read in technical literature, application programs, and so on, IP addresses are represented as four decimal numbers separated by dots. Each of these numbers corresponds to one octet (8 bits) of the IP address. This format is called dotted decimal notation (Figure 6.25).

Table 6.10 lists the ranges of decimal values for the three classes of addresses. In Table 6.10, the entry XXX means an arbitrary field.
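The relationship between the 32-bit value and its dotted decimal form can be sketched in a few lines of Python. The helper names to_dotted_decimal and from_dotted_decimal are illustrative.

```python
def to_dotted_decimal(value: int) -> str:
    """Format a 32-bit integer as four decimal octets separated by dots."""
    if not 0 <= value < 2 ** 32:
        raise ValueError("value does not fit in 32 bits")
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


def from_dotted_decimal(text: str) -> int:
    """Parse dotted decimal notation back into a 32-bit integer."""
    parts = [int(part) for part in text.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a dotted decimal IPv4 address: {text!r}")
    value = 0
    for part in parts:
        value = (value << 8) | part   # each octet occupies the next 8 bits
    return value


print(to_dotted_decimal(0xB9640101))
print(hex(from_dotted_decimal("127.0.0.1")))
```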

Table 6.10. Address value ranges

Some IP addresses cannot be assigned to devices on the network (Table 6.11).

As this table shows, a reserved IP address with all bits set to zero denotes either this device or this network, while IP addresses with all bits set to 1 are used for broadcasting. To refer to an entire IP network as a whole, an address is used in which all bits of the device number are set to 0. The class A network address 127.0.0.0 is reserved for loopback and was introduced to test communication between processes on the same machine. When an application uses a loopback address, the TCP/IP protocol stack returns the data to the application without sending anything to the network. This address can also be used for interaction between separate processes within the same machine. Therefore, in IP networks it is forbidden to assign IP addresses starting with 127 to devices.
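These rules can be checked programmatically. The sketch below uses Python's standard ipaddress module; the function name is_assignable and the use of prefix-length notation such as "185.100.0.0/16" to describe a classful network are assumptions made for the illustration.

```python
import ipaddress


def is_assignable(address: str, network: str) -> bool:
    """Return True if the address may be given to a device on the network."""
    addr = ipaddress.IPv4Address(address)
    net = ipaddress.IPv4Network(network)
    if addr.is_loopback:          # anything starting with 127
        return False
    if addr not in net:           # the address belongs to a different network
        return False
    # A device number of all zeros names the network itself,
    # and a device number of all ones is the broadcast address.
    return addr not in (net.network_address, net.broadcast_address)


print(is_assignable("185.100.1.1", "185.100.0.0/16"))
print(is_assignable("185.100.0.0", "185.100.0.0/16"))
print(is_assignable("127.0.0.1", "127.0.0.0/8"))
```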

In addition to directed data transmission to a specific workstation, broadcast transmission is actively used, in which all stations in the current or specified network receive information. There are two types of broadcasts in the IP protocol: directed and limited.

A directed broadcast allows a device on a remote network to send a datagram to all devices on a specified network. A datagram with a directed broadcast address can pass through routers, but it is delivered only to the devices on the specified network, not to all devices. In a directed broadcast, the destination address consists of a specific network number and a device number whose bits are all 0 or all 1. For example, the addresses 185.100.255.255 and 185.100.0.0 would be treated as directed broadcast addresses for the class B network 185.100.xxx.xxx. From an addressing point of view, the main disadvantage of directed broadcast is that the number of the target network must be known.

The second form of broadcast, called limited broadcast, delivers datagrams within the current network (the network where the sending device resides). A datagram with a limited broadcast address never passes through a router. In a limited broadcast, the bits of both the network number and the device number are all zeros or all ones. Thus, a datagram with a destination address of 255.255.255.255 or 0.0.0.0 will be delivered to all devices on the network. Figure 6.26 shows networks connected by routers. Table 6.12 lists the recipients of broadcast datagrams sent by workstation A.

The IP protocol supports three addressing methods: unicast, broadcast, and multicast.
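The distinction between the two kinds of broadcast can be illustrated with Python's standard ipaddress module. The sketch below is a simplification: it recognizes only the all-ones forms of the broadcast addresses and leaves out the all-zeros forms mentioned above. The function name broadcast_kind is an illustrative choice.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def broadcast_kind(destination: str, network: str) -> str:
    """Classify a destination address relative to a given network.

    Only the all-ones broadcast forms are recognized in this sketch.
    """
    dest = ipaddress.IPv4Address(destination)
    net = ipaddress.IPv4Network(network)
    if dest == LIMITED_BROADCAST:
        return "limited"            # never forwarded by routers
    if dest == net.broadcast_address:
        return "directed"           # network number plus an all-ones device number
    return "not broadcast"


print(broadcast_kind("185.100.255.255", "185.100.0.0/16"))
print(broadcast_kind("255.255.255.255", "185.100.0.0/16"))
print(broadcast_kind("185.100.1.1", "185.100.0.0/16"))
```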

Table 6.12. Broadcast Datagram Receivers

In unicast addressing, datagrams are sent to one specific device. This approach is not difficult to implement, but if the workgroup contains many stations, the throughput may be insufficient, since the same datagram will be transmitted many times.

With broadcast addressing, applications send a single datagram, which is delivered to all devices on the network. This approach is even simpler to implement, but if the broadcast traffic is not limited to the local network (for example, if routers forward it to another network), then the wide-area network must have significant throughput. If the information is intended only for a small group of devices, this approach is wasteful.

In multicast, datagrams are delivered to a specific group of devices. At the same time (which is very important when working in distributed networks), no excess traffic is generated. Multicast and unicast datagrams differ in their destination address. In the header of a multicast IP datagram, instead of an IP address of class A, B, or C, there is a class D address, that is, a group address.

A group address is assigned to some recipient devices or, in other words, to a group. The sender writes this multicast address in the header of the IP datagram. The datagram will be delivered to all members of the group. The first four bits of the class D address are 1110. The rest of the address (28 bits) is occupied by the group identifier (Figure 6.27).

In dotted decimal format, group addresses range from 224.0.0.0 to 239.255.255.255. Table 6.13 shows the class D address allocation scheme.

Table 6.13. Class D address allocation

As can be seen from Table 6.13, the first 256 addresses are reserved. In particular, this range is reserved for routing protocols and other low-level protocols. Table 6.14 contains some reserved class D IP addresses.

Above this range is a large group of addresses dedicated to applications running on the Internet. The uppermost address range (approximately 16 million addresses) is intended for administrative purposes in local networks. Class D group addresses are centrally managed and registered by a special organization, IANA.

Multicast can be implemented at two layers of the OSI model: the data link layer and the network layer. Link layer protocols, such as Ethernet and FDDI, can support unicast, broadcast, and multicast addressing. Link layer multicast is especially effective if it is supported in hardware on the NIC.

To support multicasting, IANA has allocated a block of multicast Ethernet addresses starting with 01-00-5E (in hexadecimal notation). A multicast IP address can be translated to an address in this block. The principle of translation is quite simple: the lower 23 bits of the IP group identifier are copied into the lower 23 bits of the Ethernet address. Note that this scheme associates up to 32 different IP groups with the same Ethernet address, since the upper 5 bits of the 28-bit group identifier are ignored.
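The translation rule can be sketched in Python as follows. The function name multicast_mac is illustrative; the standard ipaddress module is used only to parse and validate the group address.

```python
import ipaddress

ETHERNET_MULTICAST_PREFIX = 0x01005E   # upper 24 bits of the Ethernet address


def multicast_mac(group: str) -> str:
    """Map a class D IP address to its multicast Ethernet address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a class D (multicast) address")
    low23 = int(addr) & 0x7FFFFF                    # keep the lower 23 bits
    mac = (ETHERNET_MULTICAST_PREFIX << 24) | low23
    return "-".join(f"{(mac >> shift) & 0xFF:02X}" for shift in (40, 32, 24, 16, 8, 0))


print(multicast_mac("224.0.0.1"))
# 224.128.0.1 differs from 224.0.0.1 only in the ignored bits,
# so both groups share one Ethernet address.
print(multicast_mac("224.128.0.1"))
```

Because several IP groups can share one Ethernet address, a receiving host still has to check the IP destination address of each frame it accepts.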

Table 6.14. Reserved class D addresses

Address	Purpose
224.0.0.1	All devices on the subnet
224.0.0.2	All routers on the subnet
224.0.0.4	All DVMRP routers
224.0.0.5	All MOSPF routers
224.0.0.9	RIP version 2
224.0.1.7	Audio news
224.0.1.11	IETF audio
224.0.1.12	IETF video

If the sender and recipient belong to the same physical network, the process of transmitting and receiving multicast frames at the link layer is quite simple. The sender specifies the IP address of the group of recipients, and the NIC translates this address into the corresponding group Ethernet address and sends the frame.

If the sender and receiver are on different subnets connected by routers, datagram delivery is more complicated. In this case, the routers must support one of the multicast routing protocols (DVMRP, MOSPF, PIM; see below). Using these protocols, the router builds a delivery tree and forwards the multicast traffic correctly. In addition, each router must support the Internet Group Management Protocol (IGMP) to determine whether group members are present on directly connected subnets (Figure 6.28).

RTP Protocol

The main transport protocol for multimedia applications is the Real-time Transport Protocol (RTP), designed to carry packets of encoded speech signals over an IP network. RTP packets are transmitted over UDP, which in turn runs over IP (Fig. 1.5).

Fig. 1.5.

In fact, the layer to which RTP belongs is not defined as unambiguously as shown in Fig. 1.5 and as it is usually described in the literature. On the one hand, the protocol does run on top of UDP, is implemented by application programs, and by all indications is an application protocol. On the other hand, as stated at the beginning of this section, RTP provides transport services independently of multimedia applications and is, from this point of view, simply a transport protocol. The best definition is this: RTP is a transport protocol implemented at the application layer.

To transmit voice (multimedia) traffic, RTP uses packets, the structure of which is shown in Fig. 1.6.

An RTP packet consists of at least 12 bytes. The first two bits of the RTP header (the version field, V) indicate the version of the RTP protocol (currently version 2).

Clearly, with this header structure, at most one more RTP version is possible. The version field is followed by two single-bit fields: the P bit, which indicates whether padding has been added to the end of the payload field (padding is usually added if the transport protocol or encoding algorithm requires fixed-size blocks), and the X bit, which indicates whether an extended header is being used.

Fig. 1.6.

If an extended header is used, its first word contains the total length of the extension. Next, the four CC bits give the number of CSRC fields at the end of the RTP header, that is, the number of sources contributing to the stream. The marker bit M makes it possible to mark what the standard defines as significant events, for example, the beginning of a video frame or the beginning of a word in an audio channel. It is followed by the payload type field PT (7 bits), whose code determines the contents of the payload field (application data), for example, uncompressed 8-bit audio or MP3 audio. From this code, the application can learn how to decode the data. The rest of the fixed-length header consists of a Sequence Number field, a Timestamp field that records when the first octet of the packet's payload was created, and a synchronization source (SSRC) field that identifies the source. A source can be a single device with only one network address, or there can be multiple sources that represent different media (audio, video, etc.) or different streams of the same media. Since the sources may be different devices, the SSRC identifier is chosen randomly so that the chance of two sources using the same identifier within one RTP session is minimal. A mechanism for resolving conflicts, should they arise, is also defined. The fixed part of the RTP header can be followed by up to 15 separate 32-bit CSRC fields that identify the contributing sources.
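The fixed 12-byte header described above can be unpacked with Python's standard struct module. The sketch below is a minimal illustration, not a complete RTP implementation: it does not interpret the extended header or the padding, and the function name parse_rtp_header and the dictionary keys are choices made here.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header and any CSRC fields that follow it."""
    if len(packet) < 12:
        raise ValueError("an RTP packet has at least 12 bytes")
    # Two single bytes, a 16-bit sequence number, a 32-bit timestamp and
    # a 32-bit SSRC identifier, all in network (big-endian) byte order.
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = byte0 & 0x0F
    csrc_end = 12 + 4 * csrc_count
    if len(packet) < csrc_end:
        raise ValueError("packet is too short for the announced CSRC fields")
    return {
        "version": byte0 >> 6,               # V, 2 bits
        "padding": bool(byte0 & 0x20),       # P bit
        "extension": bool(byte0 & 0x10),     # X bit
        "csrc_count": csrc_count,            # CC, 4 bits
        "marker": bool(byte1 & 0x80),        # M bit
        "payload_type": byte1 & 0x7F,        # PT, 7 bits
        "sequence_number": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": list(struct.unpack(f"!{csrc_count}I", packet[12:csrc_end])),
    }


# A hand-built example header: version 2, payload type 18, no CSRC fields.
example = bytes([0x80, 0x12]) + struct.pack("!HII", 1, 160, 0x11223344)
print(parse_rtp_header(example))
```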

RTP is supported by the Real-time Transport Control Protocol (RTCP), which generates additional reports containing information about RTP sessions. Recall that neither UDP nor RTP provides QoS (Quality of Service). RTCP gives senders feedback from receivers, and it gives stream receivers some QoS-related information about packets (loss, delay, jitter) and about the user (application, stream). For flow control, there are two types of reports: those generated by senders and those generated by recipients. For example, information about the percentage of lost packets and the absolute number of losses allows the sender, on receiving a report, to detect that channel congestion may be preventing receivers from getting the packet streams they expected. In this case, the sender can lower the coding rate to reduce congestion and improve reception. The sender report contains information about when the last RTP packet was generated (it includes both an internal timestamp and real time). This information allows the recipient to coordinate and synchronize multiple streams, such as video and audio. If the stream is directed to several recipients, each of them sends its own stream of RTCP packets. To limit the bandwidth this consumes, the rate at which each participant generates RTCP reports is reduced in inverse proportion to the number of recipients.

It should be noted that although RTCP works separately from RTP, the RTP/UDP/IP chain itself leads to significant overhead (in the form of the protocol headers). The G.729 codec generates packets of 10 bytes (80 bits every 10 ms). The 12-byte RTP header alone is larger than this entire packet. In addition, an 8-byte UDP header and a 20-byte IP header (in IPv4) must be added, giving a combined header of 40 bytes, four times the size of the transmitted data.
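As a rough illustration of the loss figures such a report carries, the sketch below computes the absolute number of lost packets and the loss percentage from RTP sequence numbers. It is a simplification under stated assumptions: sequence number wrap-around and duplicate packets are ignored, and the function name loss_summary and its parameters are hypothetical rather than taken from the RTCP specification.

```python
def loss_summary(first_seq: int, highest_seq: int, received: int) -> dict:
    """Estimate packet loss for one stream over an observation period.

    first_seq   -- sequence number of the first packet seen
    highest_seq -- highest sequence number seen so far
    received    -- number of packets actually received
    """
    expected = highest_seq - first_seq + 1
    if expected <= 0:
        raise ValueError("highest_seq must not be below first_seq")
    lost = max(expected - received, 0)   # duplicates could otherwise make this negative
    return {
        "expected": expected,
        "lost": lost,
        "loss_percent": 100.0 * lost / expected,
    }


print(loss_summary(first_seq=100, highest_seq=199, received=95))
```

A sender that sees the loss percentage rise across successive reports could, for instance, switch to a lower coding rate.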
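The overhead arithmetic can be made explicit with a short calculation. The sketch below uses the header sizes given in the text and counts only the RTP, UDP, and IPv4 headers; link layer framing is not included. The function name overhead is illustrative.

```python
RTP_HEADER_BYTES = 12
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20


def overhead(payload_bytes: int, frame_ms: float) -> dict:
    """Header overhead and bit rates for one media packet sent every frame_ms."""
    header_bytes = RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES
    packets_per_second = 1000 / frame_ms
    return {
        "header_bytes": header_bytes,
        "header_to_payload": header_bytes / payload_bytes,
        "payload_kbps": payload_bytes * 8 * packets_per_second / 1000,
        "total_kbps": (header_bytes + payload_bytes) * 8 * packets_per_second / 1000,
    }


# G.729 as described above: 10 bytes of speech data every 10 ms.
print(overhead(payload_bytes=10, frame_ms=10))
```

For these values the headers are four times the payload, so an 8 kbit/s speech stream occupies 40 kbit/s at the IP level.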
