Introduction
When operating an Asterisk-based system such as PBXware over the SIP protocol, it is not uncommon to face call quality issues caused by various factors. These factors may include the network connections between end users and PBXware, as well as the overall connectivity of PBXware itself.
Many elements influence call quality, so each reported issue should be approached with diligence and a holistic perspective.
Through comprehensive analysis, we try to identify and mitigate anything that disrupts the communication experience facilitated by PBXware.
This document covers the troubleshooting path for call quality issues and how to approach them.
Preliminary Checks
When it comes to call quality issues, it is important to recognize a pattern in when the issue occurs:
Does the issue happen on local calls, on calls via a trunk, or on all calls?
Does the issue happen only to a specific user or multiple users?
Is the issue isolated to one site/network?
Is the issue isolated only to one device?
Is the issue constant or intermittent?
Depending on the answers, we can form a picture of the likely cause:
The issue is isolated to calls via a specific trunk – The issue almost certainly lies with that trunk provider.
The issue happens only to one user – The issue is isolated to that user; however, it should be determined whether it lies in the Extension’s settings, their device, or their network. This can be determined by testing on our end: if we replicate the issue on our network, the issue is with the Extension’s settings; if we do not, we should instruct the user to try another device to determine whether the issue is with their device or their network.
The issue occurs only on one Tenant where all users share the same network – The issue is most likely within that network, and the first steps should be rebooting their router and disabling SIP ALG on it.
Troubleshooting path
The first step in the troubleshooting path is to answer the questions listed in the previous section, as they point us in the right direction. It is also important to note that call quality issues, like any type of audio issue, cannot be diagnosed from the written report (CLIR report), because the CLIR report only contains pieces of Asterisk output and does not provide the SIP flow. A SIP trace that includes RTP is the only way to check the RTP flow and detect potential packet loss.
If the issue cannot be troubleshot live, a SIP trace is required. However, if we are using the SNGREP tool to capture the trace, we must use the -r flag so that RTP is captured along with SIP.
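For reference, a capture command could look like the following sketch (the output file name and the SIP port are placeholders and should be adjusted to the actual deployment):

# Capture SIP signaling together with RTP and write everything to a PCAP file
sngrep -r -O /tmp/call-quality.pcap port 5060

The -r option tells sngrep to capture RTP packets in addition to SIP, and -O writes the captured traffic to the given PCAP file so it can later be opened in Wireshark.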
First, we check the call and confirm that there is RTP from both ends (which would not be the case with a no-audio or one-way-audio issue):
Then, we download this SIP trace and analyze it in Wireshark.
Wireshark is a powerful network protocol analyzer widely used by network engineers and administrators for troubleshooting, analysis, and security auditing. With its intuitive graphical interface and extensive protocol support, Wireshark enables users to inspect network traffic in real-time or from captured packet capture (PCAP) files.
When analyzing SIP (Session Initiation Protocol) traffic in Wireshark, one typically focuses on SIP messages exchanged between endpoints and the PBX system. This involves examining the request and response headers, as well as the message bodies, to diagnose call setup issues, codec negotiation problems, and other SIP-related issues.
For RTP (Real-Time Transport Protocol) analysis in Wireshark, the focus shifts to the actual audio/video payload exchanged during a call. Upon loading a PCAP file containing RTP traffic, one can navigate to the "Telephony" menu and select "RTP" to access the RTP stream analysis feature. Here, Wireshark provides comprehensive statistics and metrics regarding packet loss, jitter, and other performance parameters.
To delve deeper into RTP analysis, one can drill down into individual RTP packets to inspect payload types, sequence numbers, timestamps, and other pertinent details. Additionally, Wireshark offers the option to play back captured audio streams, facilitating auditory verification of call quality issues.
In summary, Wireshark serves as an indispensable tool for SIP and RTP analysis, offering insights into both signaling and media aspects of VoIP communication. By leveraging Wireshark's robust feature set and analytical capabilities, network professionals can effectively diagnose and resolve issues impacting call quality and overall VoIP performance.
In the picture above, we can notice that the SIP trace in question contains RTP packets.
To compare, we can see the PCAP below that does not contain RTP packets:
As mentioned, to analyze RTP, we would navigate to Telephony → RTP → RTP Streams:
Here, the Source IP identifies the sender of each stream and the Destination IP identifies its receiver. For each stream we can also see the payload, which represents the codec used in that exchange, along with the ‘Lost’ and ‘Jitter’ columns. In the screenshot above, we can notice that several packets are lost and that in some cases packet loss goes up to 100%.
Furthermore, we can select these entries and analyze them:
Here, we can check Stream 4 which has the biggest loss:
In this scenario, encountering a "wrong sequence number" status in the RTP stream indicates a deviation from the expected sequence of packets. Sequence numbers in RTP packets are used to ensure the correct ordering of data packets during transmission and to detect packet loss or misordering.
In the observed RTP stream, the sequence numbers progress consistently from 10472 to 10495, indicating a continuous sequence of packets. However, at packet 1627, the sequence number reverts to 10240, which is lower than the expected sequence number based on the preceding packets. This discrepancy triggers the "wrong sequence number" status.
This discrepancy typically suggests packet loss or misordering within the network. When packets arrive out of order or are lost during transmission, it disrupts the sequential flow of data and can impact the quality of real-time communication. In this case, the deviation from the expected sequence number likely indicates a loss or misordering of packets, leading to the observed "wrong sequence number" status.
To address this issue, further investigation into the network conditions, including potential congestion, latency, or network errors, may be necessary. Implementing Quality of Service (QoS) mechanisms or optimizing network configurations can help mitigate packet loss and ensure smoother RTP transmission, thereby improving overall call quality.
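If a graphical Wireshark session is not available, the same RTP statistics can be pulled from the capture on the command line with tshark (here, capture.pcap is a placeholder for the downloaded trace):

# Summarize all RTP streams: packet counts, lost packets, and jitter per stream
tshark -r capture.pcap -q -z rtp,streams

# Dump SSRC, sequence number, and timestamp per packet to spot gaps or sequence resets
tshark -r capture.pcap -Y rtp -T fields -e rtp.ssrc -e rtp.seq -e rtp.timestamp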
Additionally, we can examine the codec utilized in these calls and look for any correlation between problematic calls and specific codecs. If a pattern emerges where certain codecs coincide with call quality issues, we should consider switching to an alternate codec to assess its impact. Among codecs, G.711 ulaw/alaw is recommended due to its simplicity compared to G.729, a more complex codec.
G.711, specifically its ulaw and alaw variants, is widely used in VoIP systems for its simplicity and compatibility across various devices. The codec transmits audio essentially uncompressed, maintaining high audio fidelity with minimal processing overhead. However, it requires more bandwidth than compressed codecs such as G.729.
In contrast, G.729 is a compressed codec designed to minimize bandwidth consumption while maintaining acceptable audio quality. It achieves this by employing algorithms that compress audio data, resulting in reduced data size and bandwidth requirements. However, the compression process introduces some level of complexity, potentially leading to higher processing demands on both endpoints.
When troubleshooting call quality issues, opting for G.711 ulaw/alaw can simplify the codec negotiation process and reduce the likelihood of compatibility issues. Moreover, the uncompressed nature of G.711 ensures minimal audio degradation, particularly in scenarios where network conditions may not be optimal.
Conversely, while G.729 conserves bandwidth, its compression techniques may introduce artifacts and degrade audio quality, especially in networks with limited bandwidth or high packet loss. Therefore, G.729 is often preferred in situations where bandwidth conservation is critical, such as congested networks or environments with constrained resources.
In summary, when evaluating codec selection for call quality optimization, prioritizing G.711 ulaw/alaw over G.729 can provide a more straightforward and robust solution, particularly in environments where maintaining audio fidelity and minimizing processing complexity are paramount.
In recent years, Opus has emerged as a versatile codec renowned for its exceptional audio quality and bandwidth efficiency. Opus is particularly well-suited for real-time communication applications, offering robust performance across a wide range of network conditions. Its adaptive bitrate capabilities and support for both speech and music make it a popular choice for VoIP and streaming applications. With Opus, organizations can achieve high-fidelity audio transmission while optimizing bandwidth utilization, ultimately enhancing the overall call quality experience.
In summary, the selection of the appropriate codec involves balancing factors such as audio quality, bandwidth efficiency, and computational complexity. While G.711 (ulaw/alaw) offers simplicity and compatibility, G.729 prioritizes bandwidth efficiency, and Opus excels in both audio quality and bandwidth optimization. By understanding the characteristics of each codec and evaluating their suitability for specific use cases, organizations can make informed decisions to optimize call quality and network performance.
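On PBXware, the allowed codecs are normally selected per Extension or Trunk in the GUI, which ultimately maps to Asterisk codec lists. Purely as an illustration (the section name and codec order below are examples, not actual PBXware-generated configuration), a PJSIP endpoint that prefers G.711 with Opus as a fallback would look roughly like this:

[1001]                ; hypothetical endpoint section
type=endpoint
disallow=all          ; clear any inherited codec list
allow=ulaw            ; prefer G.711 ulaw
allow=alaw            ; then G.711 alaw
allow=opus            ; keep Opus available as a bandwidth-efficient option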
Also, on PBXware, we can enter the Asterisk CLI and monitor live calls (if the issue can be replicated) to see whether any codec-related errors are reported. Sometimes there are transcoding issues where a certain codec cannot be translated to the required format.
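A few standard Asterisk CLI commands that can help while a problematic call is active (exact output varies between Asterisk versions):

asterisk -rvvv                (connect to the Asterisk console from the server shell)
core show channels verbose    (list active calls and their state)
core show translation         (show available codec translation paths and their costs)
rtp set debug on              (print RTP packets as they are exchanged)
pjsip set logger on           (log SIP messages; on chan_sip systems use "sip set debug on" instead)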
Network checks
When suspecting that the issue lies in the network connectivity on the client’s end, it is crucial to obtain the evidence to support that. The first step is to check MTR or Traceroute results from PBXware to their site and from their site to PBXware. For example, if the PBXware IP is 185.59.92.136 and the customer’s public IP is 213.91.124.149, on PBXware we would run
mtr 213.91.124.149
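For a result that is easy to attach to a ticket, mtr can also be run in report mode with a fixed number of probes, for example:

mtr -rwb -c 100 213.91.124.149

Here -r produces a plain-text report instead of the interactive view, -w prevents hostnames from being truncated, -b shows both hostnames and IP addresses, and -c 100 sends 100 probes per hop.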
When conducting an MTR (My TraceRoute) from PBXware to the end user's public IP address, it's crucial to closely examine each hop along the network path for potential packet loss. This comprehensive analysis allows us to discern any irregularities or disruptions that may occur during data transmission.
During the MTR process, packets traverse through various network devices, including gateways, routers, and service providers, as they journey from the originating host (PBXware) to the end user's destination. Each hop represents a distinct node in the network infrastructure, and any loss encountered at these points can significantly impact the overall quality of communication between PBXware and the end user.
By scrutinizing the MTR results, we gain insight into the performance and reliability of each network segment traversed by the data packets. Any instances of packet loss observed at particular hops warrant further investigation to identify the underlying causes, which could range from network congestion and routing issues to hardware malfunctions or configuration errors.
Furthermore, analyzing the MTR data enables us to pinpoint the precise location and severity of packet loss, facilitating targeted troubleshooting efforts and informed decision-making regarding network optimization and mitigation strategies. Through proactive monitoring and analysis of MTR metrics, we can ensure robust and uninterrupted connectivity between PBXware and end users, thereby enhancing the overall quality and reliability of communication services.
It is important to add that loss which appears on only one hop (e.g. in the middle of the path) can usually be ignored, because a genuine problem at that provider would show as loss on all subsequent hops as well. For instance:
We can notice a 40% loss on the Telia hop; however, the loss does not continue on the following hops, so this most likely represents ICMP rate limiting or filtering on that router rather than an actual connectivity issue with that provider.
After getting an MTR from PBXware to the end user, we should request an MTR in the opposite direction: the end user can run it from their PC (the mtr tool on Linux, or WinMTR on Windows) towards the PBXware IP address. For example:
PBXware setup
By default, PBXware settings are optimized for high call quality; however, we can slightly modify a few settings that might have an impact on the exchanged RTP. Within the Protocols section, navigate to the RTP tab:
Strict RTP Protection
This will drop RTP packets that do not come from the recognized source of the RTP stream. Strict RTP qualifies RTP packet stream sources before accepting them upon initial connection and when the connection is renegotiated (e.g., transfers and direct media).
Initial connection and renegotiation start a learning mode to qualify stream source addresses. Once Asterisk has recognized a stream it will allow other streams to qualify and replace the current stream for 5 seconds after starting learning mode. Once learning mode completes the current stream is locked in and cannot change until the next renegotiation.
This option is enabled by default.
Strict RTP Probation Interval (seconds)
Number of packets containing consecutive sequence values needed to change the RTP source socket address. This option only comes into play while using strictrtp=yes.
Consider changing this value if RTP packets are dropped from one or both ends after a call is connected.
This option is set to 4 by default.
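Both options map to Asterisk's rtp.conf. Changing them through the PBXware GUI is preferred, but for reference the underlying settings look roughly like this (the values shown are the defaults mentioned above):

[general]
strictrtp=yes    ; Strict RTP Protection: drop RTP that does not come from the recognized source
probation=4      ; number of consecutive packets required before a new RTP source address is accepted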
Opus Adaptive Bitrate
When enabled, Opus adapts its bitrate to the current network conditions, which helps mobile devices deliver better audio on a poor network.
Furthermore, if UDP or TCP is used as the SIP transport, we should consider switching to TLS in order to improve the overall call quality.
Transport Layer Security (TLS) offers several advantages over UDP (User Datagram Protocol) and TCP (Transmission Control Protocol) when it comes to VoIP (Voice over Internet Protocol) calls, ultimately contributing to better call quality in various ways:
Reliability: Unlike UDP, which is connectionless and lacks built-in error correction mechanisms, TLS operates over TCP, which offers reliable, connection-oriented communication. TCP ensures the orderly delivery of data packets, re-transmitting lost or corrupted packets and maintaining packet sequencing. This reliability minimizes the risk of packet loss, jitter, and out-of-order delivery, thereby improving call quality and reducing disruptions or distortions during conversations.
Quality of Service (QoS) Management: TLS can leverage existing network infrastructure features, such as Quality of Service (QoS) mechanisms, to prioritize VoIP traffic and ensure optimal call quality. By assigning appropriate priority levels to TLS-encrypted VoIP packets, network administrators can mitigate latency, jitter, and packet loss, thereby enhancing the overall user experience and maintaining consistent call quality, even in congested or bandwidth-constrained networks.
Overall, TLS offers a comprehensive solution for securing and optimizing VoIP calls, addressing key concerns related to security, reliability, compatibility, QoS management, and regulatory compliance. By leveraging TLS encryption in VoIP deployments, organizations can mitigate risks, enhance privacy, and deliver superior call quality, thereby fostering trust and satisfaction among users.
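On PBXware, TLS is enabled through the GUI (per Extension or Trunk transport settings); purely as an illustration of what the underlying Asterisk PJSIP transport definition looks like (the section name, port, and certificate paths below are placeholders):

[transport-tls]                               ; hypothetical transport section
type=transport
protocol=tls
bind=0.0.0.0:5061                             ; common TLS signaling port, adjust as needed
cert_file=/etc/asterisk/keys/pbxware.crt      ; placeholder certificate path
priv_key_file=/etc/asterisk/keys/pbxware.key  ; placeholder private key path
method=tlsv1_2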
When using the Communicator/gloCOM application, we can enable QoS, which lets us track the Quality of Service in the application itself and helps determine potential issues with network connectivity. More about QoS within Communicator can be found in this article.
SERVERware checks
If you run VPSs on a SERVERware level and notice call quality issues across multiple VPSs on one SERVERware infrastructure, network connectivity should be checked on SERVERware itself. Here, a network inspection should be done, which includes checking the physical connections as well as running software checks (MTR, Traceroute).
Additionally, please check CPU usage and resources in general, as high CPU load can cause call quality issues.
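Basic checks that can be run from the SERVERware host's shell to confirm or rule out resource exhaustion (exact tooling depends on what is installed on the host):

uptime                 (load averages; sustained load well above the CPU core count is a warning sign)
top                    (identify processes consuming CPU or memory in real time)
free -h                (check available memory and swap usage)
mtr <destination_ip>   (verify network connectivity from the host itself; <destination_ip> is a placeholder)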
Recognizing and Categorizing Symptoms of Voice Quality Problems
Different types of noise can occur on VoIP calls, and they can be categorized by their symptoms. Some of the most common ones are described below:
Clicking
Symptom - Clicking is an external sound, similar to a knock, that is usually inserted at intervals.
Cause - Clock slips or other digital errors are common causes.
Crackling
Symptom - Crackling is an irregular form of very light static, similar to the sound a fire makes.
Cause - A common cause is poor electrical connections, in particular poor cable connections. Other causes are electrical interference and a defective power supply on the phone.
Tunnel Voice
Symptom - Tunnel voice is similar to talking in a tunnel or on a poor quality mobile phone car kit.
Cause - A common cause is tight echo with some loss. For example, 10 ms delay and 50 percent loss on the echo signal.
Choppy Voice
Symptom - Choppy voice describes the sound when there are gaps in the voice. Syllables appear to be dropped or badly delayed in a start and stop fashion.
Cause - Common causes are consecutive packets that are lost or excessively delayed, such that DSP predictive insertion cannot be used and silence is inserted instead. For example, delay inserted into a call through contention caused by large data packets.
…………………...