RTP and RTCP protocols in VoIP. IP telephony: from copper wires to digital signal processing

RTP and RTCP Protocols in VoIP

RTP is the main transport protocol in IP telephony networks. RTP (Real-time Transport Protocol) was created to transmit encoded and packetized multimedia information (audio, video) over IP networks within strict time limits. RTP packets are carried over UDP, which in turn runs over IP, so the protocols sit at different layers of the OSI model. UDP, which does not guarantee delivery, is used because real-time multimedia has strict delay limits and because TCP's retransmissions and in-order delivery introduce delays that are unacceptable in real time. In this case, timely delivery matters more than the loss of part of the data.
In general, the protocols are distributed over the layers of the OSI model as follows:
Transport layer: RTP over UDP
Network: IP
Data link: Ethernet
Physical: Ethernet
When transmitting multimedia information using the RTP protocol, the following encapsulation is used:

The minimum RTP header size is 12 bytes. The first two bits define the protocol version; RTP version 2 is used today. The next field, P, is 1 bit long and indicates that the payload ends with padding octets, for example when packets of equal length are required. The 1-bit X field specifies whether a header extension is used. The 4-bit CC field then gives the number of CSRC identifiers that follow the fixed header, i.e. the number of contributing sources. Then comes the M field, a marker bit used to flag significant events in the stream. The next field, PT, is 7 bits long and identifies the payload type, i.e. the format of the data carried for the application. Based on this code, the application determines the type of multimedia information and the decoding method.
The rest of the header consists of the sequence number field, which tracks the order of packets and reveals their loss; the timestamp field, which gives the sampling time of the first encoded sample in the payload and is used by playout buffers to remove the quality losses caused by delay variation; and the SSRC (synchronization source) field, a randomly chosen number that distinguishes one source from another within an RTP session. After the fixed part of the RTP header, up to 15 thirty-two-bit CSRC fields may be added to identify the contributing sources.
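The header layout described above can be illustrated with a short decoder. This is only a sketch: the names RtpHeader and parse_rtp_header are invented for the example, and the sample packet is hypothetical.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrc: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header and any CSRC identifiers after it."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # Network byte order: two single bytes, one 16-bit and two 32-bit fields.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F
    end = 12 + 4 * cc
    if len(packet) < end:
        raise ValueError("packet is too short for its CSRC list")
    csrc = struct.unpack("!%dI" % cc, packet[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )


# Hypothetical packet: version 2, payload type 0, 160 payload bytes.
example = struct.pack("!BBHII", 0x80, 0x00, 1000, 160, 0x12345678) + bytes(160)
header = parse_rtp_header(example)
```

The payload starts right after the CSRC list unless the X bit is set, in which case a header extension comes first.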
Let us describe the procedure for establishing an RTP session. The protocol requires that different types of traffic be transmitted in separate sessions. To establish a session, a pair of destination transport addresses must be defined, i.e. one network address and two ports, for RTP and RTCP. For a video conference, audio and video must therefore be transmitted in different sessions with different destination ports. Interleaving different types of traffic in the same session could cause the following problems:
- if one of the traffic types changes, there is no way to determine which parameter of the session should be replaced with the new value;
- a session has only one timing interval, whereas each type of heterogeneous traffic has its own interval, and these differ;
- an RTP mixer cannot combine interleaved streams of different types of traffic into one stream;
- carrying several types of traffic in one RTP session is hindered by the following: the use of different network paths or network resource allocations; the need to receive only a subset of the media when required, such as audio only if video would exceed the available bandwidth; and receiver implementations that use separate processes for different types of traffic, whereas separate RTP sessions allow both single-process and multi-process implementations.

However, RTP (Real-time Transport Protocol) and UDP (User Datagram Protocol) do not guarantee quality of service (QoS). Therefore, RTP is supplemented by RTCP (Real-time Transport Control Protocol), which provides additional information about the state of an RTP session.
The RTCP protocol performs four main functions:
I. The main purpose of RTCP is to provide feedback on the quality of data transmission. The feedback can be used directly to control adaptive encoding. When IP multicast is used, it is also extremely important for diagnosing delivery faults. Reception reports allow the sender to determine the cause of unsuccessful transmission, if any.
II. RTCP carries a persistent transport-level identifier for the RTP source, called the canonical name (CNAME). Since the SSRC identifier can change when a collision is detected, the receiver needs the CNAME to keep track of each participant. Receivers also use the CNAME to associate multiple data streams from one participant across several simultaneous sessions, for example to synchronize the audio and video channels.
III. The first two functions require all session participants to send RTCP packets, so the sending rate must be controlled to let RTP scale to a large number of participants. Because each participant sends its control packets to all the others, every participant can independently determine the total number of participants in the session. This number is needed to calculate the RTCP sending rate.
IV. An optional function is to convey minimal session control information, such as the participant identification displayed in the user interface. It is most useful in loosely controlled sessions, where participants enter and leave without membership control or parameter negotiation. RTCP serves as a convenient channel for reaching all participants, but it does not necessarily support all the control communication requirements of an application.
In IP networks using multicasting, functions one, two and three are mandatory when using RTP sessions. It is also recommended to use them for transmission in other networks and environments. Today, it is recommended that RTP application developers use tools that allow them to work in multicast mode, and not just in unicast mode.
Consider the RTCP packet format.
The protocol standard defines several types of RTCP packets. RTCP is intended for transmission of service information:
SR: Sender report. Carries transmission and reception statistics from participants that are active senders;
RR: Receiver report. Carries reception statistics from participants that are not active senders;
SDES: Source description items, including the CNAME;
BYE: Indicates the end of participation (leaving the session);
APP: Application-specific functions.
Each RTCP packet begins with a fixed part, similar to that of RTP data packets, followed by elements whose length varies with the packet type but is always a multiple of 32 bits. The alignment requirement and a length field in the fixed part make RTCP packets stackable. Several RTCP packets can be concatenated without any separators to form a compound RTCP packet, which is sent in a single packet of the lower-layer protocol, such as UDP. The compound packet carries no explicit count of individual RTCP packets, since the lower-layer protocol provides the overall length and thus determines the end of the compound packet.

The format of the RTCP sender report packet is shown in the figure above.

RTCP packets are subject to the following validity checks.
- The version field must be equal to 2.
- The packet type field of the first RTCP packet in a compound packet must be SR or RR.
- The padding bit (P) must be zero for the first packet of a compound RTCP packet, since padding can only be present in the last one.
- The length fields of the individual RTCP packets must sum to the total length of the compound packet.
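The checks listed above could be sketched as follows. Two details are not given in this article and are taken from RFC 3550: the packet type codes (200 for SR, 201 for RR) and the convention that the 16-bit length field holds the packet size in 32-bit words minus one. The function name validate_compound_rtcp is invented for the example.

```python
import struct

# Packet type codes from RFC 3550, not from this article.
RTCP_SR, RTCP_RR = 200, 201


def validate_compound_rtcp(data: bytes) -> bool:
    """Apply the four validity checks to a compound RTCP packet."""
    offset = 0
    first = True
    while offset < len(data):
        if len(data) - offset < 4:
            return False  # truncated fixed part
        b0, packet_type, length_words = struct.unpack_from("!BBH", data, offset)
        if b0 >> 6 != 2:
            return False  # version must be 2
        if first and packet_type not in (RTCP_SR, RTCP_RR):
            return False  # compound packet must start with SR or RR
        size = 4 * (length_words + 1)
        is_last = offset + size == len(data)
        if (b0 & 0x20) and (first or not is_last):
            return False  # padding is only allowed on the last packet
        offset += size
        first = False
    # The individual lengths must add up exactly to the datagram length.
    return offset == len(data) and not first


# Hypothetical minimal packets: an empty RR followed by a BYE for one source.
rr = struct.pack("!BBHI", 0x80, 201, 1, 0xAABBCCDD)
bye = struct.pack("!BBHI", 0x81, 203, 1, 0xAABBCCDD)
ok = validate_compound_rtcp(rr + bye)
```

A receiver would typically run such a check before handing the individual packets to per-type parsers.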

One of the most important trends in the evolution of modern telecommunications is the development of IP telephony, a set of new technologies for transmitting multimedia messages (voice, data, video) through information and computer networks (ICNs) built on the IP (Internet Protocol) protocol, including local, corporate and global computer networks and the Internet. The concept of IP telephony includes Internet telephony, which provides telephone communication between Internet subscribers, between subscribers of public switched telephone networks (PSTN) over the Internet, and between PSTN and Internet subscribers.

IP-telephony has a number of undeniable advantages that ensure its rapid development and expansion of the computer telephony market. It is beneficial to end users who are provided with telephone communication at a fairly low per-minute payment. For companies with remote branches, IP technology allows you to organize voice communications using existing corporate IP networks. Instead of several communication networks, one is used. The undoubted advantage of IP-telephony over a regular phone is also the ability to provide additional services through the use of a multimedia computer and various Internet applications. Thus, with IP telephony, businesses and individuals can expand their communications capabilities by incorporating advanced videoconferencing, application sharing, whiteboard-type tools, and more.

What international standards and protocols regulate the main parameters and operating algorithms of the hardware and software communication tools used in IP telephony? As the name suggests, this technology is based on the IP protocol, which, however, is not used for telephony alone: it was originally developed for transmitting digital data in packet-switched ICNs.

In networks that do not provide a guaranteed quality of service (these include networks built on the basis of the IP protocol), packets may be lost, the order of their arrival may change, the data transmitted in packets may be distorted. Various transport layer procedures are used to ensure reliable delivery of transmitted information under these conditions. When transmitting digital data, the TCP protocol (Transmission Control Protocol) is used for this purpose. This protocol provides reliable data delivery and restores the original packet order. If an error is detected in a packet or the packet is lost, the TCP procedures send a retransmission request.

For audio and video conferencing applications, packet delays affect signal quality much more than the corruption of individual data items. Variations in delay can lead to gaps in playback. Such applications require a different transport layer protocol, one that provides packet resequencing, delivery with minimum delay, real-time playback at precisely specified moments, traffic type recognition, and multicast or two-way communication. This protocol is the Real-time Transport Protocol (RTP). It governs the transmission of multimedia data in packets through ICNs at the transport level and is supplemented by the Real-time Transport Control Protocol (RTCP). RTCP, in turn, provides monitoring of multimedia data delivery, quality of service monitoring, transfer of information about the participants in the current session, control and identification, and is sometimes considered part of RTP.

Many publications on IP telephony note that most of the network equipment and special software for this technology is developed on the basis of Recommendation H.323 of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) (including TAPI 3.0, NetMeeting 2.0, etc.). How does H.323 relate to RTP and RTCP? H.323 is a broad conceptual framework that includes many other standards, each dealing with a different aspect of information transfer. Most of these standards, such as the audio and video codec standards, are widely used outside IP telephony as well. The RTP/RTCP protocols, by contrast, form the basis of the H.323 standard, are aimed specifically at IP technology, and underlie the organization of IP telephony. This article is devoted to these protocols.

2. Basic concepts

The RTP real-time transport protocol provides end-to-end real-time transmission of multimedia data such as interactive audio and video. This protocol implements traffic type recognition, packet sequencing, timestamping, and transmission control.

RTP works by assigning a timestamp to each outgoing packet. On the receiving side, the timestamps indicate in what order and with what delays the packets must be played back. Support for RTP and RTCP allows the receiving host to arrange the received packets in the proper order, reduce the effect of network delay jitter on signal quality, and restore synchronization between audio and video so that users hear and see the incoming information correctly.

Note that RTP itself has no mechanism to guarantee timely delivery or quality of service; it relies on underlying services for this. It does not prevent out-of-order delivery, nor does it assume that the underlying network is fully reliable and delivers packets in sequence. The sequence numbers included in RTP allow the receiver to restore the sender's packet order.

RTP supports both two-way communication and data transfer to a group of destinations, if multicast is supported by the underlying network. RTP is designed to provide the information required by individual applications and in most cases is integrated into the application.

Although RTP is considered a transport layer protocol, it usually functions on top of another transport layer protocol, UDP (User Datagram Protocol). Both protocols contribute to the functionality of the transport layer. It should be noted that RTP and RTCP are independent of the underlying transport and network layers, so the RTP/RTCP protocols can be used with other suitable transport protocols.

RTP/RTCP protocol data units are called packets. Packets generated in accordance with RTP and used to transmit multimedia data are called data packets, and packets generated in accordance with RTCP and used to transmit the service information required for reliable teleconferencing are called control packets. An RTP packet includes a fixed header, an optional variable-length header extension, and a data field. An RTCP packet starts with a fixed part (similar to the fixed part of RTP data packets) followed by variable-length structural elements.

To make RTP flexible and applicable to various applications, some of its parameters are intentionally left undefined; instead, the protocol provides the concept of a profile. A profile is a set of RTP and RTCP parameters for a specific class of applications that determines how the protocols operate. The profile defines the use of individual packet header fields, traffic types, header additions and header extensions, packet types, security services and algorithms, the use of the underlying protocol, and so on. An example is the RTP profile for audio and video conferences with minimal control. Each application usually works with only one profile, and the profile is chosen by selecting the appropriate application. There is no explicit indication of the profile type by port number, protocol identifier, or similar means.

Thus, a complete RTP specification for a particular application must include additional documents, which include a profile description, as well as a traffic format description that defines how a particular type of traffic, such as audio or video, will be processed in RTP.

Features of multimedia data transmission during audio and video conferences are discussed in the following sections.

2.1. Group audio conferencing

Group audio conferencing requires a multicast group address and two ports. One port is used for the audio data and the other for RTCP control packets. The group address and port information is distributed to the intended participants. If privacy is required, the data and control packets may be encrypted as defined in Section 7.1, in which case an encryption key must also be generated and distributed.

The audio conferencing application used by each participant sends audio data in small chunks of, say, 20 ms duration. Each chunk is preceded by an RTP header; the RTP header and data are in turn encapsulated in a UDP packet. The RTP header indicates which type of audio coding (e.g. PCM, ADPCM, or LPC) was used for the data in the packet. This makes it possible to change the coding during the conference, for example to accommodate a new participant on a low-bandwidth connection or to react to network congestion.
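As a rough illustration of the encapsulation just described, the sketch below prepends a minimal RTP header to one 20 ms chunk of audio. The function name build_rtp_packet and the sample values are invented for the example. Payload type 0 (PCMU at 8 kHz) comes from the audio/video profile in RFC 3551, not from this article. Sending the result in a UDP datagram is left out.

```python
import struct


def build_rtp_packet(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend a 12-byte RTP header (version 2, no padding, extension or CSRC)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    b0 = 2 << 6  # V=2, P=0, X=0, CC=0
    b1 = (0x80 if marker else 0) | payload_type
    header = struct.pack(
        "!BBHII", b0, b1, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )
    return header + payload


# 20 ms of 8 kHz audio at one byte per sample is 160 bytes (assumed codec).
chunk = bytes(160)
packet = build_rtp_packet(0, 1, 160, 0xDEADBEEF, chunk)
```

Changing the coding during the conference then amounts to passing a different payload_type for subsequent chunks.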

On the Internet, as in other packet-switched networks, packets are sometimes lost or reordered, and are delayed by varying amounts of time. To cope with this, the RTP header contains a timestamp and a sequence number that allow receivers to reconstruct the timing so that, for example, chunks of audio are played out by the speaker continuously every 20 ms. This timing reconstruction is performed separately and independently for each source of RTP packets in the conference. The receiver can also use the sequence number to estimate the number of lost packets.
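The use of sequence numbers for reordering and loss estimation can be sketched as follows. This is a simplified illustration with invented function names; it is not the bookkeeping algorithm of the RTP specification. It assumes that consecutive arrivals are never more than 32767 packets apart, so that 16-bit wraparound can be resolved.

```python
def unwrap_sequence_numbers(seqs):
    """Map 16-bit sequence numbers onto integers that keep growing across wraparound."""
    extended = []
    previous = None
    for seq in seqs:
        if previous is None:
            value = seq
        else:
            # Signed 16-bit distance from the previously seen packet.
            delta = (seq - previous) & 0xFFFF
            if delta >= 0x8000:
                delta -= 0x10000
            value = previous + delta
        extended.append(value)
        previous = value
    return extended


def playout_order(seqs):
    """Return the received sequence numbers in the order the sender emitted them."""
    extended = unwrap_sequence_numbers(seqs)
    return [seq for _, seq in sorted(zip(extended, seqs))]


def estimate_lost(seqs):
    """Estimate losses as packets expected minus distinct packets received."""
    extended = unwrap_sequence_numbers(seqs)
    if not extended:
        return 0
    expected = max(extended) - min(extended) + 1
    return expected - len(set(extended))


# Hypothetical arrival order around the 65535 -> 0 wraparound; packet 2 is missing.
arrived = [65534, 65535, 1, 0, 3]
ordered = playout_order(arrived)
lost = estimate_lost(arrived)
```

In this example the receiver would play the packets in the order 65534, 65535, 0, 1, 3 and count one lost packet.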

Since participants can join and leave a teleconference while it is in progress, it is useful to know who is participating at any given moment and how well they are receiving the audio data. For this purpose, each instance of the audio application periodically sends to the control (RTCP) port, for the applications of all other participants, a reception report that includes the name of its user. The reception report indicates how well the current speaker is being received and can be used to control adaptive encoders. In addition to the user name, other identifying information may be included, subject to bandwidth limits. When leaving the conference, a site sends an RTCP BYE packet.

2.2. Videoconferencing

If both audio and video signals are used in a teleconference, they are transmitted separately. For the transmission of each type of traffic, regardless of the other, the protocol specification introduces the concept of an RTP session (see the list of abbreviations and terms used). A session is defined by a specific pair of destination transport addresses (one network address plus a pair of ports for RTP and RTCP). Packets for each type of traffic are transmitted using two different pairs of UDP ports and/or multicast addresses. There is no direct RTP layer connection between audio and video sessions, except that a user participating in both sessions must use the same canonical name in the RTCP packets for both sessions so that the sessions can be linked.

One reason for this separation is that some conference participants need to be allowed to receive only one type of traffic if they wish to. Despite the separation, synchronous playback of source media data (audio and video) can be achieved using the timing information that is carried in the RTCP packets for both sessions.

2.3. The concept of mixers and translators

Sites cannot always receive multimedia data in the same format. Consider the case where participants in one area are connected through a low-speed link to the majority of the conference participants, who have broadband network access. Instead of forcing everyone to use a lower-bandwidth, lower-quality audio encoding, an RTP-level relay called a mixer can be placed near the low-bandwidth area. The mixer resynchronizes the incoming audio packets to restore the original 20 ms intervals, mixes the restored audio streams into a single stream, encodes it with a low-bandwidth audio codec, and forwards the packet stream over the low-speed link. These packets can be addressed to a single recipient or to a group of recipients with different addresses. So that receiving endpoints can correctly indicate who is speaking, the RTP header includes a means for mixers to identify the sources that contributed to a mixed packet.

Some participants in the audio conference may be connected by broadband links but may not be directly reachable via IP multicast (IPM). For example, they may be behind an application-level firewall that does not let any IP packets pass. Such cases call not for mixers but for another type of RTP-level relay, called a translator. Two translators are installed, one on each side of the firewall: the outside one forwards all multicast packets it receives over a secure connection to the translator behind the firewall, which sends them out again as multicast packets to a multicast group restricted to the site's internal network.

Mixers and translators can be designed for a variety of purposes. An example is a video mixer that scales the images of individual people in separate video streams and composites them into one video stream to simulate a group scene. Examples of translation include connecting a group of hosts that support only IP/UDP to a group of hosts that support only ST-II, or transcoding video packet by packet from individual sources without resynchronization or mixing. The details of how mixers and translators work are discussed in Section 5.

2.4. Byte order, alignment, and timestamp format

All fields of RTP/RTCP packets are transmitted over the network in bytes (octets), most significant byte first. All header fields are aligned to their natural length. Octets designated as padding have the value zero.

Absolute time (wallclock time) in RTP is represented using the NTP (Network Time Protocol) timestamp format, which counts seconds relative to 0h on January 1, 1900. The full NTP timestamp is a 64-bit unsigned fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits. Some fields with a more compact representation use only the middle 32 bits: the low 16 bits of the integer part and the high 16 bits of the fractional part.
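The two timestamp representations can be illustrated with a small conversion sketch. The function names are invented for the example, and the input is assumed to be a non-negative Unix time in seconds. The constant 2208988800 is the number of seconds between January 1, 1900 and January 1, 1970.

```python
NTP_UNIX_OFFSET = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def unix_to_ntp64(unix_seconds: float) -> int:
    """Convert non-negative Unix time to a 64-bit NTP fixed-point timestamp."""
    whole = int(unix_seconds)
    seconds = whole + NTP_UNIX_OFFSET
    fraction = int((unix_seconds - whole) * (1 << 32))
    return ((seconds & 0xFFFFFFFF) << 32) | (fraction & 0xFFFFFFFF)


def ntp64_to_compact32(ntp64: int) -> int:
    """Keep the middle 32 bits: low 16 of the seconds, high 16 of the fraction."""
    return (ntp64 >> 16) & 0xFFFFFFFF


full = unix_to_ntp64(0.5)  # half a second after the Unix epoch
compact = ntp64_to_compact32(full)
```

The compact form has a resolution of 1/65536 of a second and wraps around roughly every 18 hours, which is acceptable for the short intervals it is meant to express.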

The next two sections of this article (3 and 4) discuss the packet formats and features of the functioning of the RTP and RTCP protocols, respectively.

3. Data transfer protocol RTP

3.1. Fixed RTP header fields

As noted above, an RTP packet includes a fixed header, an optional variable length header extension, and a data field. The fixed header of RTP protocol packets has the following format:

The first twelve octets are present in every RTP packet, while the list of contributing source (CSRC) identifiers is present only when inserted by a mixer. The fields have the following purposes.

Version (V): 2 bits. This field identifies the RTP version. This article focuses on version 2 of the RTP protocol (value 1 was used in the first draft version of RTP).

Padding (P): 1 bit. If the padding bit is set to one, the packet ends with one or more padding octets that are not part of the payload. The last padding octet contains a count of how many padding octets are to be ignored. Padding may be required by some encryption algorithms with fixed block sizes or to carry several RTP packets in a single lower-layer protocol data unit.

Extension (X): 1 bit. If the extension bit is set, the fixed header is followed by a header extension with the format defined in Section 3.4.

CSRC count (CC): 4 bits. The CSRC count contains the number of contributing source (CSRC) identifiers (see the list of abbreviations and terms used) that follow the fixed header.

Marker (M): 1 bit. The interpretation of the marker is determined by the profile. It is intended to allow significant events (e.g. video frame boundaries) to be marked in the packet stream. The profile may introduce additional marker bits or specify that there is no marker bit by changing the number of bits in the traffic type field (see Section 3.3).

Traffic type (PT): 7 bits. This field identifies the format of the RTP traffic and determines how the application will interpret it. A profile defines a default static mapping of PT values to traffic formats. Additional traffic type codes can be defined dynamically through non-RTP means. The sender of an RTP packet emits a single RTP traffic type value at any given time; this field is not intended for multiplexing separate media streams (see Section 3.2).

Sequence number: 16 bits. The sequence number is incremented by one for each RTP data packet sent and can be used by the receiver to detect lost packets and restore their original order. The initial value is chosen randomly to make known-plaintext attacks on encryption more difficult (even if the source itself does not encrypt, since the packets may pass through a translator that does).

Timestamp: 32 bits. The timestamp reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increments monotonically and linearly in time to allow synchronization and jitter calculations (see Section 4.3.1). The resolution of the clock must be sufficient for the desired synchronization accuracy and for measuring packet arrival jitter (one tick per video frame is usually not enough). The clock frequency depends on the format of the transmitted traffic and is specified statically in the profile or traffic format specification, or may be set dynamically for traffic formats defined through non-RTP means. If RTP packets are generated periodically, the nominal sampling instants determined by the sampling clock should be used, not readings of the system clock. For example, for fixed-rate audio, the timestamp clock would normally increment by one for each sampling period. If an audio application reads blocks of 160 samples from the input device, the timestamp must be increased by 160 for each such block, regardless of whether the block was transmitted in a packet or dropped as silence. The initial value of the timestamp, like the initial value of the sequence number, is random. Several consecutive RTP packets may have equal timestamps if they are logically generated at the same time, e.g. belong to the same video frame. Consecutive RTP packets may contain non-monotonic timestamps if the data is not transmitted in sampling order, as is the case with interpolated MPEG video frames (the sequence numbers of the packets as transmitted will still be monotonic).

SSRC: 32 bits. The SSRC (synchronization source) field identifies the synchronization source (see the list of abbreviations and terms used). The identifier is chosen randomly so that no two synchronization sources within the same RTP session have the same SSRC identifier. Although the probability of several sources choosing the same identifier is low, all RTP implementations must be prepared to detect and resolve such collisions. Section 6 discusses the probability of collisions along with a mechanism for resolving them and for detecting RTP-level loops based on the uniqueness of the SSRC identifier. If a source changes its source transport address, it must also choose a new SSRC identifier to avoid being interpreted as a looped source.

CSRC list: 0 to 15 items, 32 bits each. The CSRC list identifies the contributing sources for the payload contained in the packet. The number of identifiers is given by the CC field. If there are more than fifteen contributing sources, only 15 of them can be identified. CSRC identifiers are inserted by mixers, using the SSRC identifiers of the contributing sources. For example, for audio packets, the SSRC identifiers of all sources that were mixed together to create the packet are listed in the CSRC list, allowing the receiver to indicate the talkers correctly.
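The different behaviour of the sequence number and timestamp fields described above can be shown with a small sender-side sketch. The generator name packetize and its parameters are invented for the example. A block of None stands for a 160-sample block that is suppressed as silence.

```python
import random

SAMPLES_PER_BLOCK = 160  # block size used in the timestamp example in the text


def packetize(blocks, first_seq=None, first_ts=None):
    """Yield (sequence_number, timestamp, block) for every block that is sent.

    A block equal to None represents silence that is not transmitted.
    """
    # Both counters start at random values unless fixed by the caller.
    seq = random.getrandbits(16) if first_seq is None else first_seq
    ts = random.getrandbits(32) if first_ts is None else first_ts
    for block in blocks:
        if block is not None:
            yield seq, ts, block
            seq = (seq + 1) & 0xFFFF  # advances only for packets actually sent
        # The timestamp advances for every block, sent or suppressed.
        ts = (ts + SAMPLES_PER_BLOCK) & 0xFFFFFFFF


# Hypothetical input: a voiced block, one silent block, another voiced block.
sent = list(packetize([b"a", None, b"b"], first_seq=65535, first_ts=0))
```

Because the sequence number has no gap while the timestamp jumps by 320, the receiver can tell that nothing was lost and that the sender simply paused.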

3.2. RTP sessions

As mentioned above, RTP requires different types of traffic to be transmitted separately, in different RTP sessions. A session is defined by a specific pair of destination transport addresses (one network address plus a pair of ports for RTP and RTCP). For example, in a teleconference composed of separately encoded audio and video, each type of traffic must be sent in a separate RTP session with its own destination transport address. Audio and video are not meant to be carried in the same RTP session and separated on the basis of the traffic type or SSRC fields. Interleaving packets of different traffic types that use the same SSRC would cause several problems:

If one of the traffic types changes during a session, there will be no general means to determine which of the old values has been replaced by the new one.

The SSRC identifies a single timing interval value and sequence number space. Interleaving multiple types of traffic would require different synchronization intervals if the clock rates of the different streams differ, and different sequence number spaces to indicate the type of traffic to which the packet loss is related.

The RTCP sender and receiver messages (see Section 4.3) describe only one timing interval value and sequence number space for SSRC and do not carry a traffic type field.

The RTP mixer is not capable of combining interleaved streams of different types of traffic into a single stream.

Carrying multiple types of traffic in a single RTP session is hindered by the following factors: the use of different network paths or network resource allocations; the need to receive only a subset of the media when required, such as audio only if video would exceed the available bandwidth; and receiver implementations that use separate processes for different types of traffic, whereas separate RTP sessions allow both single-process and multi-process implementations.

By using different SSRCs for each type of traffic, but sending them in the same RTP session, the first three problems can be avoided, but the last two cannot be avoided. Therefore, the specification of the RTP protocol requires each type of traffic to use its own RTP session.

3.3. Profile-defined RTP header changes

The existing RTP Information Packet header is complete for the set of features required in general for all classes of applications that might support RTP. However, for better adaptation to specific tasks, the header can be modified through modifications or additions defined in the profile specification.

The marker bit and traffic type field carry profile specific information, but are located in a fixed header as many applications are expected to need them. The octet containing these fields may be redefined by the profile to meet different requirements, for example with more or less marker bits. If any marker bits are present, they should be placed in the high-order bits of the octet, since profile-independent monitors may be able to observe a correlation between the packet loss pattern and the marker bit.

Additional information that is required for a particular traffic format (eg video coding type) MUST be carried in the data field of the packet. It can be placed at a certain place at the beginning or inside the data array.

If a particular class of applications needs additional functionality independent of the traffic format, then the profile that those applications operate with must define additional fixed fields to be placed immediately after the SSRC field of the existing fixed header. These applications will be able to quickly access additional fields directly, while profile-independent monitors or recorders will still be able to process RTP packets by interpreting only the first twelve octets.

If additional functionality turns out to be needed in common by all profiles, a new version of RTP should be defined to make a permanent change to the fixed header.

3.4. RTP header extension

To allow individual implementations to experiment with new traffic-format-independent features that require additional information to be carried in the information packet header, RTP provides a packet header extension mechanism. This mechanism is designed so that the header extension can be ignored by other cooperating applications that do not require it.

If the X bit in the RTP header is set to one, then a variable length header extension is appended to the fixed RTP header (following the CSRC list, if any). Note that this header extension is for limited use only. The RTP packet header extension has the following format:

The extension contains a 16-bit length field that indicates the number of 32-bit words in it, excluding the four-octet extension header (hence the length can be zero). Only one extension can be appended to the fixed RTP data packet header. To allow each of several cooperating implementations to experiment independently with different header extensions, or to allow a particular implementation to experiment with more than one type of header extension, the first 16 bits of the extension are left open for distinguishing identifiers or parameters. The format of these 16 bits must be defined by the profile specification that the applications operate under.
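The layout described above can be illustrated with a minimal Python sketch. It assumes the packet is already available as a byte string; the function name parse_header_extension and its return tuple are hypothetical choices made for this illustration, not part of any RTP library.

```python
import struct


def parse_header_extension(packet: bytes):
    """Locate the RTP header extension, if any.

    Returns (profile_defined_16_bits, extension_body, payload_offset),
    or None when the X bit is not set. Hypothetical helper for illustration.
    """
    if len(packet) < 12:
        raise ValueError("packet is shorter than the 12-octet fixed header")
    first_octet = packet[0]
    has_extension = bool(first_octet & 0x10)  # X bit
    csrc_count = first_octet & 0x0F           # CC field
    offset = 12 + 4 * csrc_count              # extension follows the CSRC list
    if not has_extension:
        return None
    if len(packet) < offset + 4:
        raise ValueError("truncated extension header")
    # First 16 bits are profile-defined, next 16 bits are the length in
    # 32-bit words, not counting this four-octet extension header.
    profile_defined, length_words = struct.unpack_from("!HH", packet, offset)
    body_start = offset + 4
    body_end = body_start + 4 * length_words
    if len(packet) < body_end:
        raise ValueError("truncated extension body")
    return profile_defined, packet[body_start:body_end], body_end
```

A receiver that does not understand the extension can simply skip to payload_offset, which is how the extension stays ignorable for other cooperating applications.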


RTP protocol

The main transport protocol for multimedia applications has become the real-time protocol RTP (Real-time Transport Protocol), designed to carry packets of coded speech signals over an IP network. RTP packets are transmitted over the UDP protocol, which in turn runs over IP (Fig. 1.5).

Fig. 1.5.

In fact, the layer to which RTP belongs is not defined as unambiguously as Fig. 1.5 shows and as the literature usually describes it. On the one hand, the protocol does run on top of UDP, is implemented by application programs and, by all indications, is an application protocol. On the other hand, as stated at the beginning of this paragraph, RTP provides transport services independently of multimedia applications and is, from this point of view, precisely a transport protocol. The best definition: RTP is a transport protocol implemented at the application layer.

To transmit voice (multimedia) traffic, RTP uses packets, the structure of which is shown in Fig. 1.6.

An RTP packet consists of at least 12 bytes. The first two bits of the RTP header (version bit field, V) indicate the version of the RTP protocol (currently version 2).

Clearly, with this header structure, at most one more RTP version is possible. The version field is followed by two single-bit fields: the P bit, which indicates whether padding characters have been added to the end of the payload field (they are usually added if the transport protocol or encoding algorithm requires fixed-size blocks), and the X bit, which indicates whether an extended header is being used.

Fig. 1.6.

If the extension is used, the first word of the extended header contains the total length of the extension. Next, the four CC bits give the number of CSRC fields at the end of the RTP header, i.e. the number of sources contributing to the stream. The marker bit M marks what the standard defines as significant events, for example the beginning of a video frame or the beginning of a word in an audio channel. It is followed by the PT (payload type) field of 7 bits, which carries the code identifying the contents of the payload field, i.e. the application data, for example uncompressed 8-bit audio, MP3, and so on. From this code the application learns how to decode the data. The rest of the fixed-length header consists of the Sequence Number field, the Time Stamp field, which records when the first sample in the packet was generated, and the SSRC (synchronization source) field, which identifies the source.

The source may be a single device with only one network address, one of several sources representing different media (audio, video, etc.), or one of several streams of the same medium. Since the sources may be on different devices, the SSRC identifier is chosen randomly so that the chance of two sources in one RTP session picking the same identifier is minimal. A mechanism for resolving such conflicts, should they arise, is also defined. The fixed part of the RTP header can be followed by up to 15 separate 32-bit CSRC fields that identify the contributing sources.
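The field layout described above can be made concrete with a short Python sketch that unpacks the 12-byte fixed header and the optional CSRC list. The class RtpHeader and the function parse_rtp_header are names invented for this illustration; they are not taken from any particular RTP library.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int          # V, 2 bits
    padding: bool         # P, 1 bit
    extension: bool       # X, 1 bit
    csrc_count: int       # CC, 4 bits
    marker: bool          # M, 1 bit
    payload_type: int     # PT, 7 bits
    sequence_number: int  # 16 bits
    timestamp: int        # 32 bits
    ssrc: int             # 32 bits
    csrc: List[int]       # 0 to 15 entries of 32 bits each


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header and the CSRC list (illustrative helper)."""
    if len(packet) < 12:
        raise ValueError("an RTP packet is at least 12 bytes long")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    cc = b0 & 0x0F
    if len(packet) < 12 + 4 * cc:
        raise ValueError("packet too short for the declared CSRC count")
    csrc = list(struct.unpack_from("!%dI" % cc, packet, 12))
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrc=csrc,
    )
```

For example, a packet whose first two octets are 0x80 and 0x88 decodes as version 2 with no padding, no extension, no CSRC entries, the marker bit set and payload type 8.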

RTP is complemented by the RTP Control Protocol (RTCP), which generates additional reports containing information about RTP sessions. Recall that neither UDP nor RTP provides QoS (Quality of Service). RTCP gives senders feedback and gives stream receivers some means of improving QoS, together with information about packets (loss, delay, jitter) and about the user (application, stream). For flow control there are two types of reports: those generated by senders and those generated by recipients. For example, the percentage of lost packets and the absolute number of losses in a report allow the sender to detect that channel congestion may be preventing receivers from getting the packets they expected. In this case the sender can lower the coding rate to reduce congestion and improve reception. The sender report states when the last RTP packet was generated, as both an internal timestamp and real time. This information allows the recipient to coordinate and synchronize multiple streams such as video and audio. If the stream is directed to several recipients, each of them sends its own stream of RTCP packets. Steps are then taken to limit the bandwidth these reports consume: the rate at which RTCP reports are generated is inversely proportional to the number of recipients.
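How the report rate shrinks as the number of recipients grows can be sketched as follows. This is a deliberately simplified model, not the full interval algorithm of the RTP specification: it assumes that RTCP is allowed a fixed share of the session bandwidth (5% is the figure recommended in RFC 3550), that all reports have the same average size, and that a minimum interval of 5 seconds applies. The function name and the default average report size of 100 bytes are assumptions made for this example.

```python
def rtcp_report_interval(session_bandwidth_bps: float,
                         participants: int,
                         avg_rtcp_packet_bytes: float = 100.0,
                         rtcp_fraction: float = 0.05,
                         min_interval_s: float = 5.0) -> float:
    """Seconds between RTCP reports from one participant (simplified model)."""
    if participants < 1:
        raise ValueError("at least one participant is required")
    # Share of the session bandwidth reserved for RTCP, in bytes per second.
    rtcp_bytes_per_s = session_bandwidth_bps * rtcp_fraction / 8
    # All participants share that budget, so each one must wait longer
    # between reports as the group grows.
    interval = participants * avg_rtcp_packet_bytes / rtcp_bytes_per_s
    return max(interval, min_interval_s)
```

With a 64 kbit/s session the RTCP budget is 400 bytes per second. Ten participants could report every 2.5 seconds, so the 5-second minimum applies; with 1000 participants each one reports only about every 250 seconds, keeping the total RTCP traffic constant.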

It should be noted that although RTCP works separately from RTP, the RTP/UDP/IP chain itself introduces significant overhead in the form of headers. The G.729 codec generates packets of 10 bytes (80 bits every 10 ms). A single RTP header of 12 bytes is already larger than this entire packet. In addition, an 8-byte UDP header and a 20-byte IP header (in IPv4) must be added, which gives 40 bytes of headers, four times the size of the transmitted data.
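The arithmetic behind this overhead can be checked with a few lines of Python. The header sizes are the ones given in the text (no RTP CSRC entries, no IP options); the function names are made up for this sketch.

```python
RTP_HEADER_BYTES = 12
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20


def header_overhead(payload_bytes: int):
    """Return (header bytes, header-to-payload ratio, payload share of packet)."""
    headers = RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES
    total = headers + payload_bytes
    return headers, headers / payload_bytes, payload_bytes / total


def ip_level_bitrate(payload_bytes: int, packet_interval_ms: float) -> float:
    """Bit rate in bit/s at the IP level, headers included."""
    headers = RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES
    return (headers + payload_bytes) * 8 * 1000 / packet_interval_ms


headers, ratio, payload_share = header_overhead(10)
print(headers, ratio, payload_share)  # 40 bytes of headers, ratio 4.0, share 0.2
print(ip_level_bitrate(10, 10))       # 40000.0 bit/s for an 8000 bit/s codec
```

In other words, with one 10-byte G.729 frame per packet only a fifth of each IP packet is speech, and the 8 kbit/s codec stream occupies 40 kbit/s at the IP level. Packing more frames into one packet improves the ratio at the cost of added delay.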

When RTP is used, two ports are opened for communication: one for the media stream (an even port number) and one for RTCP (by convention the next higher, odd port number), which carries control data (feedback for QoS and media flow control). The port numbers are not fixed; they depend on the application being used.
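The port pairing can be sketched in Python as below. The sketch follows the convention from RFC 3550 that RTP takes an even port and RTCP the next higher odd one, and that an odd number offered as the RTP port is replaced by the next lower even number. The function name is hypothetical, and real applications may negotiate ports differently through signaling.

```python
def rtp_rtcp_port_pair(requested_port: int):
    """Return (rtp_port, rtcp_port) for a requested base port (illustrative)."""
    if not 1 <= requested_port <= 65535:
        raise ValueError("port out of range")
    # Round an odd request down to the even port below it.
    rtp_port = requested_port - (requested_port % 2)
    if rtp_port == 0:
        raise ValueError("no even port below the requested one")
    return rtp_port, rtp_port + 1
```

For example, both 5004 and 5005 as requested values yield the pair (5004, 5005).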

RTP - Real-time Transport Protocol

RTCP - RTP Control Protocol. Optionally includes information about:
Packet loss
Jitter buffering
Delays
Signal strength
Signal quality metrics (Call Quality Metrics)
Echo return loss, etc.

RTCP XR - RTP Control Protocol Extended Reports. All fields described for RTCP, plus:
R factor - signal quality parameter
MOS - signal quality parameter
and others

In VoIP calls, the packets that carry the voice are transmitted using RTP/RTCP. RTP can carry media data identified by parameters that are registered with the Internet Assigned Numbers Authority (IANA). The same registered values are used in the fields of protocol messages.

Some payload type (PT) field values:

PT       Codec name     A/V  Clock rate (Hz)  Channels  Document
0        PCMU           A    8000             1         RFC3551
3        GSM            A    8000             1         RFC3551
4        G723           A    8000             1         Kumar
5        DVI4           A    8000             1         RFC3551
6        DVI4           A    16000            1         RFC3551
7        LPC            A    8000             1         RFC3551
8        PCMA           A    8000             1         RFC3551
9        G722           A    8000             1         RFC3551
10       L16            A    44100            2         RFC3551
11       L16            A    44100            1         RFC3551
12       QCELP          A    8000             1         -
13       CN             A    8000             1         RFC3389
14       MPA            A    90000                      RFC3551, RFC2250
15       G728           A    8000             1         RFC3551
16       DVI4           A    11025            1         DiPol
17       DVI4           A    22050            1         DiPol
18       G729           A    8000             1
19       reserved       A
20       not assigned   A
21       not assigned   A
22       not assigned   A
23       not assigned   A
24       not assigned   V
25       CelB           V    90000                      RFC2029
26       JPEG           V    90000                      RFC2435
27       not assigned   V
28       nv             V    90000                      RFC3551
29       not assigned   V
30       not assigned   V
31       H261           V    90000                      RFC2032
32       MPV            V    90000                      RFC2250
33       MP2T           AV   90000                      RFC2250
34       H263           V    90000                      Zhu
35--71   not assigned
72--76   reserved for RTCP to avoid conflicts           RFC3550
77--95   not assigned
96--127  dynamic                                        RFC3551

IANA: Registered RTP Protocol Parameters: http://www.iana.org/assignments/rtp-parameters
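As an illustration of how an application might act on the PT code, the following Python sketch maps a payload type number to a short description. The dictionary contains only a few static entries copied from the table; the range boundaries (72 to 76 reserved to avoid conflicts with RTCP, 96 to 127 dynamic) follow the IANA registry. The names STATIC_PAYLOAD_TYPES and describe_payload_type are invented for this example.

```python
# A small excerpt of the static assignments: PT -> (codec name, clock rate in Hz).
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", 8000),
    3: ("GSM", 8000),
    4: ("G723", 8000),
    8: ("PCMA", 8000),
    9: ("G722", 8000),
    18: ("G729", 8000),
    26: ("JPEG", 90000),
    31: ("H261", 90000),
    34: ("H263", 90000),
}


def describe_payload_type(pt: int) -> str:
    """Classify a 7-bit RTP payload type value (illustrative helper)."""
    if not 0 <= pt <= 127:
        raise ValueError("the PT field is 7 bits wide: 0..127")
    if pt in STATIC_PAYLOAD_TYPES:
        name, clock_rate = STATIC_PAYLOAD_TYPES[pt]
        return f"static: {name}/{clock_rate}"
    if 72 <= pt <= 76:
        return "reserved to avoid conflicts with RTCP"
    if pt >= 96:
        return "dynamic"
    return "not in this excerpt (other static, reserved, or unassigned)"
```

For a dynamic value the codec cannot be deduced from the number alone; the mapping has to be agreed on by the endpoints through signaling.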

RTP Protocol and IP Address Translation (NAT)

During a VoIP session, two RTP streams are generated, one in each direction. If one of the participants uses an IP address from a private network, the stream sent from the subscriber in the public network towards the NAT server will not be able to reach the subscriber in the internal network. To solve this problem, symmetric RTP is often used. For additional information about using VoIP in NAT networks, see: NAT and VOIP.

Articles:
RTCP XR measures VoIP performance. Network World, 11/17/03.

RFC documents:
IETF RFC 3550. RTP: A Transport Protocol for Real-Time Applications.
IETF RFC 3611. RTP Control Protocol Extended Reports (RTCP XR).
IETF RFC 1890. RTP Profile for Audio and Video Conferences with Minimal Control.
IETF RFC 2508. Compressing IP/UDP/RTP Headers for Low-Speed Serial Links.
IETF RFC 3545. Enhanced Compressed RTP (CRTP) for Links with High Delay, Packet Loss and Reordering.
