Transforming Data
June 29, 2024

Importance of Data Enrichment

How the best ETL apps enrich data after it is extracted

Custom Fields Enrichment

Many business apps offer custom fields which are often used to capture specific business data that either doesn't fit into the app's existing data fields or is just unique to that particular business.

As an example, take Copper CRM:

Custom fields can be added to Leads, People, Companies, Opportunities, Project and Tasks.

In a Lead record, assume one wants to add a custom field that is an array of 2 options:  [1] Home Owner or [2] Home Renter.

In the context of ETL, Copper CRM by default will simply output an identifier to represent the value for this custom field in a particular Lead record.  For example:

{ "custom_field_definition_id": 100764, "value": 2 }

Both the definition_id and value are useless to a human user downstream trying to understand this data.  Thus, an ETL solution must enrich this data by automatically looking up the definitions of custom fields and replacing both raw values (2) and their field labels ("definition_id") with enriched values:

{ "Property Status": "Renter" }

Only the best data extraction apps do this enrichment automatically.

Do-it-yourself data extraction apps will always miss this, since a multi-step API request process fine tuned for data enrichment is required.

Timestamp Enrichment

Moving onto timestamp/datestamp enrichment, in data analytics, timestamps often appear as an unintelligible Unix format. This is important in keeping track of the order and period when events happen but makes it hard to interpret by an average person.

Example:  A Unix timestamp, also known as a Unix epoch, is a system for tracking time that stems from computing history. It refers to the total number of seconds that have elapsed since the Unix Epoch, which started at 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds. For instance, the Unix timestamp "1629122288" signifies the moment of "16th August 2021 12:38:08 GMT".

In a practical scenario, you might receive a raw API response that includes a Unix timestamp. For instance, the response could appear as follows:

{
 "name": "John Doe",
 "age": 30,
 "timestamp": 1629122288
}


In this example, the Unix timestamp (1629122288) represents the last activity of the user "John Doe".

When it comes to setting up analytics, a Unix timestamp, such as 1629122288, can present challenges for human users. While this timestamp precisely logs the last activity of a user, its format is incompatible with human cognition. Essentially, Unix timestamps are a system of time measurement used by computers, marking the number of seconds that have passed since the 'epoch' - the start of 1970 in Coordinated Universal Time (UTC). This choice of format naturally results in a large and somewhat obtuse number that holds little inherent meaning for a human analyst.

For those setting up analytics, it's crucial to have a clear, real-world understanding of the time frames they're working with. A Unix timestamp doesn't provide this intuitively. Instead, it necessitates conversion processes to translate it into a more interpretable format, such as a standard date and time. This additional step can introduce errors and complications, slowing down the analytics setup and complicating the interpretation of the data.

The task of data enrichment here involves converting these unix timestamps into human-readable date and time formats. This process increases the understandability of data, allowing easier identification of trends over time, and enhances the ability to visualize the data effectively. So when your data has a clear timestamp, you'll be able to keep tracking and analyzing on your fingertips.

For instance, a Unix timestamp like '1615891632' translates into a more human-readable form such as '2021-03-16 09:27:12'.

Report-Formatted Data Enrichment

Accounting reports for example, are not in inherently ready to be read by business intelligence or analytics apps, they need to be reformatted into tabular data.

Data is invariably the lifeblood of any organization. The more actionable insights a company can derive from its data, the more strategic decisions it can make towards growth. However, data isn't inherently useful in its raw form. In specific cases, applications output data in a format that's designed for human consumption, a perfect example being accounting reports. Although these reports are ready to be interpreted by professionals in their respective fields, they pose a challenge when it comes to analysis by business intelligence (BI) applications.

Why is this so? Business intelligence apps are built on the premise of interpreting data in tabular form, where information is organized into rows and columns. When presented with data structured in report format, BI apps struggle to decode it. They do not understand the text and cannot make sense of the structure. Ultimately, this results in a failure to extract meaningful insights, defeating the whole purpose of having such technological tools in the first place.

The solution to this predicament lies in data enrichment, specifically report-formatted data enrichment. This process involves converting the report format into a more digestible, tabular format that can be easily understood and analyzed by BI apps. Through data enrichment, businesses enhance the value of their data, enabling their BI tools to provide more accurate, meaningful insights for strategic decision-making.

Recent blog