Telecom CDRs Data Serialization (with Simulation)

Part I (CDRs)

In the telecom industry, any event taking place within a node or system is commonly referred to as a CDR, short for Call Detail Record. Whether it's a call event, a charging record, a data session log, a customer account creation/deletion log, a voucher refill log, or an invoicing/payment record, the term has become a kind of shorthand for any captured log of events within a process or operation. To avoid misconceptions, some resources use the term XDR, where X stands for anything. For the purposes of this article, however, we will continue using the term CDR.

CDRs first appeared in 1965 with the introduction of Bell's No. 1 Electronic Switching System (ESS). These records were created to bill long-distance calls, which were expensive and charged based on duration. Early CDRs were very basic, including only the calling number, the called number, the start time, and the duration.


Bell's No. 1 Electronic Switching System

In the 1970s and 1980s, CDRs became more advanced with digital switching systems like Ericsson's AXE and Siemens' EWSD. They started capturing more details, such as call type (local, international, or mobile) and switch or trunk information. The growth of mobile networks in the 1980s introduced roaming-specific CDRs, including identifiers like the IMSI and cell locations. These records were still processed in batches on mainframes and stored on magnetic tapes, and they helped telecom companies manage billing for mobile and international calls.

As networks expanded, standards organizations played a crucial role in shaping CDRs. ITU-T influenced early call data recording standards, while ETSI introduced the TAP format (for roaming CDRs) in 1988, standardizing CDRs for mobile network roaming. Later, 3GPP defined detailed CDR formats for 2G, 3G, 4G, and 5G networks, ensuring consistent billing and usage data collection across operators. These efforts facilitated seamless data sharing and billing between telecom companies worldwide.

Nowadays, CDRs are used not only for charging and billing but also for analytics, fraud prevention, revenue assurance, and OSS/BSS operations, playing a critical role in telecom business processes. Operators typically rely on a mix of systems and technologies from multiple vendors, each with different versions and generations, making it vital to establish a standardized framework for exchanging CDRs seamlessly. To address this complexity, 3GPP introduced specification 32.298, which provides a common format and structure for CDRs, covering voice, SMS, data, and IMS services. This standard ensures interoperability across systems and networks globally, enabling efficient data exchange. While most vendors comply with 32.298, they often add proprietary extensions or customizations to enhance functionality, integrate with existing infrastructure, or meet unique operational needs.

These standard specifications do not define the CDR structure for other types of events within telecom operations, such as voucher refills, offer subscriptions, adjustments, or SIM card creation. In these cases, vendors have the flexibility to design and implement their own logic and formats for constructing CDRs, allowing them to tailor the records to their specific system requirements and business needs.

CDRs can include hundreds of fields and attributes, ranging from simple data types like integers and strings to complex, hierarchical structures with nested objects and arrays. This flexibility allows CDRs to capture a wide variety of information, and with the growing demand and increasing advancements in telecom systems, CDRs have evolved in both volume and complexity. Despite these advancements, however, CDRs continue to be represented mostly in ASN.1, a standard notation used to serialize data between systems and nodes, which we will explain further in this article.

Part II (CDRs Representation)

In the context of CDRs, a record can be represented in JSON format like the following:


CDR Represented in JSON (Human Readable)
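A simplified, hypothetical voice-call CDR in JSON might look like the following (the field names are illustrative only, not taken verbatim from 3GPP 32.298):

```json
{
  "recordType": "moCallRecord",
  "servedIMSI": "606011234567890",
  "callingNumber": "218911234567",
  "calledNumber": "218927654321",
  "startTime": "2024-01-15T10:23:45Z",
  "duration": 183,
  "cellId": "21806-3021-11842",
  "chargeAmount": 0.45
}
```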

A human-readable and developer-friendly representation; or it can be represented in a compact binary format, as shown below:


CDR Represented in ASN.1 PER Encoding (Packed Encoding Rules)

Both represent the same CDR, just in different formats. Back in the 1980s, computational capability was limited and network throughput was measured in a few kilobytes, so ASN.1 (Abstract Syntax Notation One) was the optimum choice at the time. It was first introduced as a universal way to describe data independently of any specific encoding or communication protocol; it was adopted by the ITU and 3GPP, and it is also used in the LDAP protocol and in X.509 certificates (used in SSL/TLS).

In contrast, JSON was introduced much later, in 2001. It looks prettier and is simpler for a human to read, which made it an instant favorite for web development and modern applications. But CDRs are not processed by humans; they are consumed by machines, in terabytes of volume per node. And JSON takes up more space, which means more resources to store it and more bandwidth to transfer it.

This is a comparison between only two data serialization formats in terms of Efficiency and Performance, which are the most important aspects from a software engineering perspective. In the telecom industry, however, a third factor, Standardization, is equally significant. As mentioned earlier, the CDRs generated within a telecom operator are processed by multiple systems, which may be used for billing, network monitoring, business analytics, or fraud control, or even transferred between operators, as in the case of TAP CDRs for roaming settlement. These systems are often interconnected in a tightly cascading, integrated structure. This is why 3GPP adopted ASN.1: for its efficiency and performance, and because it has maintained its relevance through decades of standardization.


An example of systems integration within the telecom operator

The purpose of Standardization is to achieve and facilitate integration, alignment, uniformity, consistency, and harmony among all these highly complicated and critical systems, functions, and components. So imagine you have a new upgrade in the charging and billing systems, and this upgrade includes a new version of the charging CDR structure. This will lead to an upgrade, or at least a change request (CR), in all the connected systems, and that means cost and time. So, is that an easy thing in ASN.1?

In summary, our goal is to achieve maximum efficiency and optimal performance, enabling seamless CDR exchange between systems without delays, inaccuracies, or data integrity issues, and this will be achieved by Standardization. All of that is covered in Part III.

Part III (Data serialization)

Data serialization is the process of converting data into a format that can be easily stored, transmitted, or reconstructed later. This is often necessary when data needs to be sent over a network or saved to a file. The serialized data can be in various formats, from human-readable ones like JSON, XML, and CSV to compact binary formats. Once serialized, the data can be deserialized, or converted back into its original structure.
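As a tiny illustration of that round trip, here is a sketch in Python using the standard json module (the record fields are made up for the example):

```python
import json

# A toy record; the fields are illustrative only
record = {"callingNumber": "218911234567", "duration": 183}

# Serialize: convert the in-memory structure to bytes for storage or transmission
wire = json.dumps(record).encode("utf-8")

# Deserialize: reconstruct the original structure from the bytes
restored = json.loads(wire.decode("utf-8"))
assert restored == record
```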

OK, let's get back to our previous CDR and view it in XML format; it will look like this:


CDR in XML format

We have already previewed the JSON and ASN.1 formats, so what is the difference? Let's look at the difference in terms of size.

If one of your systems generates 100 million CDRs per day and you want to keep these CDRs stored in files for later use, let's compare the size in the different formats:

It's obvious that the optimum choice here is to keep these CDRs in ASN.1 format, in terms of both storage size and transfer bandwidth over the network. This is the Efficiency part.
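To make the storage argument concrete, assume (purely for illustration; the real figures depend on the record structure) that one CDR takes about 2 KB in XML, 1 KB in JSON, and 300 bytes in ASN.1 PER. At 100 million CDRs per day:

```python
CDRS_PER_DAY = 100_000_000

# Assumed per-CDR sizes in bytes -- illustrative, not measured values
sizes = {"XML": 2048, "JSON": 1024, "ASN.1 PER": 300}

for fmt, size in sizes.items():
    daily_gb = CDRS_PER_DAY * size / 1024**3
    print(f"{fmt:>9}: {daily_gb:,.0f} GB/day")
```

Even under these rough assumptions, the binary encoding saves well over a hundred gigabytes per day on a single system.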

Let's go a little deeper into ASN.1, with a simpler example than our CDR.

The following JSON represents the data from a survey in which a person chose their favorite fruit:
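The survey payload is as small as something like this (a minimal guess at the original example):

```json
{ "favoriteFruit": "APPLE" }
```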

OK, how do we represent this in ASN.1 format? First, we need to specify its structure, or schema, as shown below.
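Based on the description that follows (a module named FavoriteThing containing a FavoriteFruit enumeration), the schema would look roughly like this; the specific enum values are assumptions:

```asn1
FavoriteThing DEFINITIONS ::= BEGIN

    FavoriteFruit ::= ENUMERATED {
        apple  (0),
        banana (1),
        orange (2)
    }

END
```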

As shown, ASN.1 is a standard interface description language used to define data structures for efficient data serialization and deserialization.

Line 1 defines a module or schema called FavoriteThing. Everything between BEGIN and END is part of this module.

The FavoriteFruit type is defined as an enumeration, meaning it represents a list of specific choices.

Therefore, in ASN.1 format our data will be encoded to binary, so it will look like this:

It's represented in only 3 bytes. The following graph shows how ASN.1 works in more detail:
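Those 3 bytes fall directly out of the tag-length-value layout of ASN.1's Basic Encoding Rules. A hand-rolled, stdlib-only sketch (this is an illustration, not the asn1tools encoder):

```python
def ber_encode_enumerated(value: int) -> bytes:
    """Minimal BER TLV for an ENUMERATED value that fits in one byte."""
    TAG_ENUMERATED = 0x0A                         # universal tag for ENUMERATED
    return bytes([TAG_ENUMERATED, 0x01, value])   # tag, length, content

FRUITS = {"apple": 0, "banana": 1, "orange": 2}
encoded = ber_encode_enumerated(FRUITS["apple"])
print(encoded.hex(), "->", len(encoded), "bytes")  # 0a0100 -> 3 bytes
```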

Of course, there are hundreds of resources on the internet explaining how ASN.1 works, which you can search for. The point of our example is how we managed to replace the JSON representation with a compact ASN.1 binary format.

Besides ASN.1, there are other popular protocols and frameworks for serializing data into compact binary formats. One of the most common is Protocol Buffers (Protobuf), a lightweight, language-neutral data serialization protocol developed by Google and the core of the gRPC framework. It is similar to ASN.1 in that you must define the data structure in order to encode the data and be able to store or transmit it. Below is the FavoriteFruit data structure in Protobuf:
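A Protobuf definition of the same data could be written as follows; the message wrapper and field number are assumptions, but APPLE = 0 matches the default-value behavior the article describes:

```protobuf
syntax = "proto3";

enum FavoriteFruit {
  APPLE = 0;   // proto3 enums must start at 0, and 0 is the default value
  BANANA = 1;
  ORANGE = 2;
}

message Survey {
  FavoriteFruit favorite_fruit = 1;
}
```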

It is similar to ASN.1 in representing our data structure. Below is how the JSON is converted to Protobuf binary format:

It's null, or nothing, and you have to believe it! So, how?

In Protocol Buffers, if you decode a null payload against the given FavoriteFruit schema, it translates to the default value, and the default value in the schema is APPLE. So instead of the 3 bytes of ASN.1, we can now store a single byte with a null value to represent the favorite fruit data.
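This default-value trick is visible in Protobuf's wire format: a field holding its default value is simply not written at all. Here is a stdlib-only sketch of how a small enum field is encoded (real Protobuf uses full varints; this shortcut only holds for values under 128):

```python
def encode_enum_field(field_number: int, value: int) -> bytes:
    """Proto3-style varint field encoding for small values (< 128)."""
    if value == 0:
        return b""                    # default value: omitted entirely
    tag = (field_number << 3) | 0     # wire type 0 = varint
    return bytes([tag, value])

print(encode_enum_field(1, 0).hex() or "(empty)")  # APPLE: zero bytes on the wire
print(encode_enum_field(1, 1).hex())               # BANANA: 2 bytes, '0801'
```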

Of course, that is not always the case with Protobuf; most of the time the encoded data size is similar to ASN.1's, which means the two are comparable from an Efficiency perspective. But Protobuf exceeds ASN.1 when it comes to the serialization and deserialization process itself (the time it takes to encode and decode the data). Here Protobuf's performance is far better: it is optimized for fast transmission and is much simpler than ASN.1. In addition, the open-source community offers highly optimized, actively maintained Protobuf libraries for a wide range of programming languages, compared to the limited ASN.1 frameworks. (Its performance will be demonstrated in Part IV.)

There is another popular data serialization framework called Apache Avro, used for efficient and compact data storage and exchange. It is particularly popular in big-data ecosystems, such as those using Apache Hadoop, Apache Kafka, or Apache Spark. It is similar to ASN.1 and Protobuf, but the key difference is that Avro attaches the data structure (schema) to the encoded binary data. This lets it overcome ASN.1 and Protobuf in terms of ease of integration, and it facilitates direct alignment between the source and the destination. The following graphs illustrate this more clearly.

Schema based Encoding and Decoding (ASN.1 & ProtoBuf)

In both ASN.1 and Protobuf, encoding and decoding are tightly linked to the schema. If the source schema differs from the target schema, it can lead to mismatches and errors that affect subsequent processes. In my experience, ASN.1 frameworks are not very good at handling these discrepancies, often raising immediate errors when data doesn't match the schema. Protobuf is better in this regard because it ignores unknown fields and tags. However, Avro surpasses both ASN.1 and Protobuf, as its schema is attached to the data itself. This means the destination system can read the new schema instantly, making it more flexible when schema changes occur, and in telecom systems this happens often. In software engineering these concepts are known as backward/forward compatibility; in telecom they are referred to as Standardization.


Avro mechanism of data Encoding & Decoding
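The mechanism in the figure above can be mimicked with a stdlib-only toy: embed the writer's schema ahead of the encoded value, the way an Avro container file carries it in its header. (This is an illustration of the idea only, not the real Avro binary format, which fastavro implements.)

```python
import json

schema = {"type": "enum", "name": "FavoriteFruit",
          "symbols": ["APPLE", "BANANA", "ORANGE"]}

def write_with_schema(symbol: str) -> bytes:
    # Prefix the payload with the writer's schema, length-framed
    header = json.dumps(schema).encode("utf-8")
    index = schema["symbols"].index(symbol)
    return len(header).to_bytes(4, "big") + header + bytes([index])

def read_with_schema(blob: bytes) -> str:
    n = int.from_bytes(blob[:4], "big")
    embedded = json.loads(blob[4:4 + n])   # the reader recovers the writer's schema
    return embedded["symbols"][blob[4 + n]]

print(read_with_schema(write_with_schema("BANANA")))
```

Because the schema travels with the data, the reader never has to agree on a schema version in advance, which is exactly the compatibility property the article highlights.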

Therefore, before we go to the last part, I hope you've gained a solid understanding of our three main metrics, Efficiency, Performance, and Compatibility (for Standardization), when it comes to CDR data serialization within telecom systems.


Part IV (Simulation)

So far we have gone through what CDRs are and how they are serialized between different telecom systems, and we explored data serialization methods such as ASN.1, Protobuf, and Avro, and how they differ in terms of Efficiency, Performance, and Compatibility (Standardization).

In this part, we will simulate these three protocols over telecom CDRs for experimental purposes. We will measure the following:

  • Efficiency: the ratio of the original CDR size to its encoded size (the higher the better).
  • Performance: the time consumed to encode and decode the CDRs using a single CPU core (the lower the better).
  • Compatibility (Standardization): as illustrated in Part III, the ranking is Avro, then Protobuf, then ASN.1.

Therefore, we will have the following table, with the Compatibility column already determined:

The simulation was conducted on a single virtual machine (Ubuntu 20), entirely in Python, using the following frameworks:

  • For ASN.1 we used asn1tools library.
  • For ProtoBuf we used google.protobuf library.
  • For Avro we used fastavro library.
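The measurement loop itself is the same regardless of codec. Here is a hedged sketch of the harness, using the stdlib json module as a stand-in codec; the actual scripts in the repository plug in asn1tools, google.protobuf, and fastavro encoders instead:

```python
import json
import time

def measure(encode, decode, records):
    """Return (encode_seconds, decode_seconds, total_encoded_bytes)."""
    t0 = time.perf_counter()
    blobs = [encode(r) for r in records]
    t1 = time.perf_counter()
    for b in blobs:
        decode(b)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, sum(len(b) for b in blobs)

records = [{"id": i, "duration": i % 3600} for i in range(10_000)]
enc_s, dec_s, size = measure(
    lambda r: json.dumps(r).encode(),   # stand-in for each codec's encoder
    lambda b: json.loads(b),
    records,
)
print(f"encode {enc_s:.3f}s, decode {dec_s:.3f}s, {size} bytes")
```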

OK, now in order to encode and decode CDRs we must have CDRs. Obviously, we will not use real CDRs here, and there are no standard 3GPP 32.298 CDRs available to the public.

Therefore, I've written a Python script to generate something similar to real CDRs in JSON format; here is a snapshot of it:
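The actual generator lives in the repository linked at the end of this part; the sketch below is only a guess at its shape, with illustrative field names:

```python
import json
import random
import uuid
from datetime import datetime, timedelta

def random_cdr() -> dict:
    # Illustrative fields only -- not 3GPP 32.298 names and not
    # necessarily what the published script generates.
    start = datetime(2024, 1, 1) + timedelta(seconds=random.randrange(86_400))
    return {
        "recordId": str(uuid.uuid4()),
        "recordType": random.choice(["moCall", "mtCall", "sms", "data"]),
        "servedIMSI": "60601" + "".join(random.choices("0123456789", k=10)),
        "startTime": start.isoformat(),
        "duration": random.randrange(0, 3600),
    }

# One file's worth of records, matching the 4,000-4,500 range used below
cdrs = [random_cdr() for _ in range(random.randrange(4_000, 4_501))]
print(len(cdrs), "CDRs, e.g.", json.dumps(cdrs[0])[:60], "...")
```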

Using this script, we generated 150 JSON files, each containing between 4,000 and 4,500 CDRs, for a total of 642,587 CDRs.

The total files size in the directory is shown below,

Starting with the ASN.1 encoding/decoding process, we got the following consumed time:

Encoded data size:

Doing the same with Protocol Buffers:


Finally, Avro

All the scripts used are published in the following repository: https://github.com/abdofrea/data_serialization.git

Now we can fill our table with the obtained results,

It's obvious that when it comes to Efficiency, in terms of size compression, the three methods are similar or close to each other. However, the performance of ASN.1 is very slow compared to Avro and Protobuf, which are very similar, almost identical, to each other.

In conclusion, this article and simulation highlight why Apache Avro and Protocol Buffers have become widely adopted in modern applications, ranging from APIs like gRPC to big-data ecosystems such as Hadoop. Their superior performance, ease of use, and compatibility make them ideal choices for data serialization in diverse use cases. Meanwhile, ASN.1, though efficient in size compression, remains mostly limited to legacy 3GPP standards.

For this article, we leveraged publicly available libraries and frameworks to explore data serialization methods and evaluate potential alternatives to ASN.1 for representing CDRs. While these findings provide valuable insights, there may be additional factors or considerations not fully addressed here, which could further influence the choice of serialization protocols in specific contexts.

Abdalwahed Frea