As encryption proliferates, network management faces new challenges. One of them is delivering application awareness: with traffic encrypted, deep packet inspection (DPI), which is commonly used to identify the underlying applications, must turn to more advanced techniques to maintain this visibility.
First packet classification
First packet classification, also called first packet inspection, is a method for reliably classifying network traffic from the very first packet of a flow instead of waiting for the 3 to 5 packets otherwise needed before the underlying application is identified. First packet inspection matters to network operators, SD-WAN vendors, SASE providers and anyone relying on real-time traffic classification, as it lets them enforce network management and security rules from the initial packet onwards, ensuring consistent treatment of all of an application's packets.
First packet classification relies on caching: data from prior classifications is kept readily available in an application cache and matched against the first packet of a flow to identify the underlying application instantly. There are two types of caching, DNS caching and service caching. DNS caching reads the hostname from a domain name system (DNS) query and stores the IP addresses returned in the corresponding DNS response. Service caching, on the other hand, identifies a packet's underlying application or service through the DPI engine and then caches its IP address together with the corresponding application or service. Any subsequent packet with the same IP address is recognized immediately and no longer requires processing through additional DPI algorithms.
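The two caches described above can be sketched in a few lines of Python. This is a simplified, hypothetical illustration; the class and method names are invented for this example and do not reflect any real DPI product's API.

```python
class FirstPacketClassifier:
    """Illustrative sketch of caching-based first packet classification."""

    def __init__(self):
        self.dns_cache = {}      # IP address -> hostname (from DNS responses)
        self.service_cache = {}  # IP address -> application (from prior DPI runs)

    def learn_dns(self, hostname, resolved_ips):
        # DNS caching: store the IPs returned for a hostname in a DNS response.
        for ip in resolved_ips:
            self.dns_cache[ip] = hostname

    def learn_service(self, ip, application):
        # Service caching: remember which application the full DPI engine
        # previously identified behind this server IP.
        self.service_cache[ip] = application

    def classify_first_packet(self, dst_ip):
        # Check the service cache first, then fall back to the DNS cache.
        if dst_ip in self.service_cache:
            return self.service_cache[dst_ip]
        if dst_ip in self.dns_cache:
            return self.dns_cache[dst_ip]
        return None  # unknown: hand the flow to the full DPI pipeline


fpc = FirstPacketClassifier()
fpc.learn_dns("video.example.com", ["203.0.113.10", "203.0.113.11"])
fpc.learn_service("203.0.113.10", "ExampleVideo")
print(fpc.classify_first_packet("203.0.113.10"))  # ExampleVideo (service cache hit)
print(fpc.classify_first_packet("203.0.113.11"))  # video.example.com (DNS cache hit)
print(fpc.classify_first_packet("198.51.100.1"))  # None -> full DPI needed
```

Note how the very first packet of a new flow to a known IP is classified without any payload inspection, which is exactly what allows policies to apply from packet one.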
Newer encryption technologies
Newer encryption technologies such as TLS 1.3, DNS over HTTPS (DoH) and DNS over TLS unfortunately limit DNS caching, as the relationship between IP addresses and their respective domain names and applications becomes impossible to establish, leaving the cache with little usable information. DPI software that can identify traffic encrypted with widely used technologies such as HTTPS, TLS 1.2 and SSL will no longer be able to do so for traffic protected by these newer schemes. This will inevitably render first packet inspection ineffective, or not applicable altogether, especially when two or more of these new encryption technologies are combined.
Service caching also has its weaknesses. The use of proxy servers and content delivery networks (CDNs) may result in false positives as the IP addresses of certain applications are concealed and therefore cannot be reliably matched to the correct applications or services. The same applies to obfuscated traffic, where traffic payloads are disguised as something entirely different to enable the packets to be sent via protocols that are otherwise not supported by the network or to escape network security policies, as in the case of domain fronting, data tunneling and randomization. In this case, the classification information provided by DPI can end up delivering misleading information about the underlying traffic.
Machine learning and deep learning to the rescue
Our DPI engine R&S®PACE 2 is widely deployed across traffic management and network security functions, providing network traffic visibility via application classification and metadata extraction. To do so, R&S®PACE 2 employs three major methods: pattern matching, behavioral analysis and statistical/heuristic analysis. Pattern matching compares network traffic against thousands of verified application and protocol signatures from the R&S®PACE 2 signature library, which is updated weekly. Behavioral analysis examines the size, order and frequency of a flow's packets, coupled with information on the subscriber and host. Statistical/heuristic analysis, in turn, identifies wider traffic attributes such as the entropy of a flow by calculating statistical measures such as the mean and median across behavioral indicators. Using these techniques, R&S®PACE 2 inspects each IP flow to determine the underlying application along with wider attributes such as speed, latency and jitter.
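To make the statistical side concrete, here is a minimal sketch of the kind of indicators mentioned above: byte entropy of a payload plus mean and median packet sizes. The function names and feature set are illustrative assumptions, not the engine's actual internals.

```python
import math
import os
from statistics import mean, median

def byte_entropy(payload: bytes) -> float:
    """Shannon entropy in bits per byte (0..8). Encrypted or compressed
    payloads approach 8; plaintext protocols score noticeably lower."""
    if not payload:
        return 0.0
    counts = {}
    for b in payload:
        counts[b] = counts.get(b, 0) + 1
    n = len(payload)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def flow_features(packet_sizes, payload: bytes) -> dict:
    # A few behavioral/statistical indicators of the kind described above.
    return {
        "packet_count": len(packet_sizes),
        "mean_size": mean(packet_sizes),
        "median_size": median(packet_sizes),
        "entropy": byte_entropy(payload),
    }

# Plaintext-like payload vs. random-looking (e.g. encrypted) payload:
low = byte_entropy(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n")
high = byte_entropy(os.urandom(4096))
print(low < high, high > 7.5)  # entropy separates the two payload classes
```

In practice, indicators like these are computed per flow and fed into heuristics or models alongside signature and behavioral results.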
The challenges introduced by new encryption technologies, the use of proxies and CDNs, and the prevalence of traffic obfuscation across modern networks call for techniques that future-proof deep packet inspection, given the limitations inherent in caching-based first packet classification. In response, ipoque is enhancing R&S®PACE 2 with encrypted traffic intelligence (ETI), which combines traditional packet analysis methods with deep learning (DL) and machine learning (ML). DL and ML work hand in hand to identify deeper correlations between traffic attributes, ranging from packet and flow patterns to bandwidth and speed or the frequency and entropy of a flow. These correlations are used to deduce the underlying application accurately even when packets are encrypted or obfuscated.
The resulting classifications also help identify subsequent flows on the network. With DL and ML, this identification information is dynamically updated to reflect changes in IP addresses, flow and packet attributes as well as updates in service-level attributes.
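The idea of classifying encrypted flows from correlated traffic attributes can be sketched with a deliberately tiny model. The nearest-centroid classifier, feature set and labels below are illustrative assumptions standing in for the far richer ML/DL models described above; none of it reflects the actual ETI implementation.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class NearestCentroid:
    """Toy stand-in for ML-based flow classification: each application is
    represented by the mean of its training feature vectors, and a new
    flow is assigned to the closest centroid."""

    def fit(self, flows, labels):
        sums, counts = {}, {}
        for vec, label in zip(flows, labels):
            acc = sums.setdefault(label, [0.0] * len(vec))
            for i, v in enumerate(vec):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {
            label: [s / counts[label] for s in acc]
            for label, acc in sums.items()
        }
        return self

    def predict(self, vec):
        return min(self.centroids, key=lambda lb: euclidean(self.centroids[lb], vec))


# Features per flow: (mean packet size, packets/sec, payload entropy) -- illustrative
train = [(1200, 90, 7.9), (1150, 80, 7.8), (140, 15, 7.6), (160, 20, 7.7)]
labels = ["video_stream", "video_stream", "messaging", "messaging"]
model = NearestCentroid().fit(train, labels)
print(model.predict((1100, 85, 7.8)))  # video_stream
print(model.predict((150, 18, 7.6)))   # messaging
```

The key point the sketch shares with the real approach: no payload decryption is involved, only observable flow attributes, which is why such models keep working when the payload itself is opaque.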
Amping up ML & DL capabilities
To achieve this, and to ensure that our next-generation DPI engine R&S®PACE 2 continues to deliver reliable, accurate, real-time classification of network traffic across newer and emerging encryption technologies, ipoque has established research collaborations with major universities on the classification of IP traffic encrypted via TLS 1.3, DoH and ESNI or obfuscated via domain fronting. This research also extends to advanced techniques for circumventing obfuscation, for example through self-learning network management. In addition, researchers at ipoque are applying advanced statistical and classical ML, high-dimensional data analysis and DL to continuously detect and identify the latest encryption and obfuscation technologies as well as other traffic masking methods.