June 29, 2024

Selecting a Vendor

Key Factors to Consider When Choosing an ETL Vendor

Gauge your size

The ETL community comprises vendors serving every size of data and use cases all along the spectrum of volume and complexity. You can find micro-sized ETL apps that only work with single rows of data from single data source apps, point solutions that process only one type of data set or output really well, all the way up to Enterprise-level ETL apps that replicate your business data in very high-capacity data warehouses such as Snowflake, often requiring large, expensive IT teams to manage.

A data warehouse is a centralized repository of integrated and structured data from various sources within an organization. It is designed to support business intelligence (BI) activities, such as reporting, analysis, and decision-making. The data in a data warehouse is typically historical and subject-oriented, meaning it is organized around specific subjects or areas of interest for the business.

For most small and medium-sized teams, a popular spreadsheet app will suffice.

  • Airtable
  • Coda
  • Excel Online
  • Google Sheets
  • Smartsheet

Some of these spreadsheet apps cap the size of a sheet, typically as a maximum number of cells or rows.

If your data set has grown, or is about to grow, into several million cells, consider using an Enterprise-grade cloud storage drive or a data warehouse instead.
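As a rough way to gauge your size, cell count is simply rows times columns. The sketch below estimates how many months of growth remain before a sheet hits its ceiling. The 10,000,000-cell figure and the function names are assumptions for illustration; check the documented limit of your own spreadsheet app.

```python
# Assumed ceiling for illustration only; replace with your app's documented limit.
ASSUMED_CELL_LIMIT = 10_000_000


def cell_count(rows: int, columns: int) -> int:
    """Total cells occupied by a rectangular data set."""
    return rows * columns


def months_until_limit(rows, columns, new_rows_per_month, limit=ASSUMED_CELL_LIMIT):
    """Whole months of growth before the limit is reached.

    Returns 0 if the data set is already at or over the limit,
    and None if the data set is not growing.
    """
    current = cell_count(rows, columns)
    if current >= limit:
        return 0
    if new_rows_per_month <= 0:
        return None
    remaining_cells = limit - current
    cells_per_month = new_rows_per_month * columns
    # Ceiling division without importing math
    return -(-remaining_cells // cells_per_month)


# Hypothetical example: 200,000 rows of 20 columns, growing by 50,000 rows a month
months_left = months_until_limit(200_000, 20, 50_000)
```

If the answer is only a few months, it is probably worth planning for a cloud storage drive or data warehouse now rather than later.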

Extension or Not

Some ETL apps function directly within the spreadsheet app interface and others operate externally on their own domains.

A tiny sidebar can be stifling for integration management

If you want a full-screen mobile-ready sync management dashboard without being confined to a miniature sidebar inside your spreadsheet app, consider using an ETL app that functions outside of your spreadsheet app.

If you plan to eventually switch from a spreadsheet app to a larger warehouse (one that goes beyond millions of cells), such as BigQuery, Azure, Amazon S3 or Google Cloud Storage, an externally functioning ETL app can reduce the migration time considerably compared with an extension.

In many cases your data may be synced to more than one warehouse, for example to both Google Sheets and Google Cloud Storage, in which case an externally functioning ETL app is the obvious choice.

A full-screen integration app is a better user experience



Avoid row-syncers

It is not advisable to use a row-syncer that relies on triggers to replicate business data for business intelligence.

The reason is that new records added to your data warehouse may be duplicates of existing records. Worse, the existing records in your warehouse will never change, so they become stale and of no use to your dashboard viewers. You can attempt to set up row-level mappings with your row-syncer, but it's probably not worth the effort for data sets beyond 10 rows. Appended rows will also eventually breach the spreadsheet app's cell limit, which is why a filtered or date-limited data set within the context of a comprehensive bulk ETL solution is preferable to a row-syncer.
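The difference can be sketched in a few lines. This is a simplified model, not how any particular product works: `append_sync` stands in for a trigger-based row-syncer that adds a row every time a trigger fires, and `upsert_sync` stands in for a bulk sync that keeps one row per record key. Both function names and the sample records are hypothetical.

```python
def append_sync(destination: list, incoming: list) -> None:
    """Trigger-style syncing: every incoming record becomes a new row."""
    destination.extend(dict(record) for record in incoming)


def upsert_sync(destination: dict, incoming: list, key: str = "id") -> None:
    """Keyed syncing: one row per key, and the latest version replaces the old one."""
    for record in incoming:
        destination[record[key]] = dict(record)


# Record 1 is created as "open" and later updated to "closed"
events = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "open"},
    {"id": 1, "status": "closed"},
]

appended = []
append_sync(appended, events)   # 3 rows: record 1 appears twice, first copy is stale

upserted = {}
upsert_sync(upserted, events)   # 2 rows: record 1 holds its latest status
```

The appended sheet keeps growing and still shows the stale "open" row for record 1, while the keyed version stays at one current row per record.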

Time Considerations

Decide how much time and money you want to spend on your Business Intelligence / analytics / dashboard project.

The fundamental question you should ask yourself is:

How much time do I really want to spend on this project?

Are you part of a multi-staff effort with complex requirements involving data mapping and cleansing, or are you a solo professional or a non-technical staff member who needs a dashboard ready in a day?

Avoiding Common Traps for Small Teams

  • Is unraveling the intricacies of an "Application Programming Interface" appealing to you?
  • Does the term "JSON" spark your curiosity?
  • Are you interested in dedicating time to learn about "Pagination"?
  • How comfortable are you tackling OAuth and Bearer tokens?
  • Do you have the patience to delve into details such as "arrays" or "objects" in your business data?
  • Does "Base64" seem like an exciting topic?
  • Would you enjoy digging into concepts like "Rate Limiting" or "Chunking"?

If you're eager to move quickly past the ETL phase of your project, it's advisable not to opt for a do-your-own-API-configuration tool, often found in spreadsheet applications as extensions or add-ons. Understanding APIs can become an endless labyrinth because of their complexity and their infrequent adherence to standards. The substantial ownership cost of an API configuration is often hidden, and it must be taken into account in your budgeting process.
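To give a sense of what a do-it-yourself API configuration involves, here is a minimal sketch of just two of the concepts listed above, pagination and rate limiting. Everything in it is an assumption for illustration: the response shape (`status`, `items`, `next_cursor`), the `fetch_page` callback, and the fake API that stands in for a real service. Real APIs differ from one another in exactly these details, which is where the time goes.

```python
import time


def fetch_all(fetch_page, max_retries=3, backoff_seconds=0.0):
    """Collect every record from a cursor-paginated source, retrying on rate limits."""
    records = []
    cursor = None
    while True:
        for attempt in range(max_retries + 1):
            page = fetch_page(cursor)
            if page.get("status") != 429:   # 429 is the HTTP "Too Many Requests" code
                break
            time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
        else:
            raise RuntimeError("rate limit retries exhausted")
        records.extend(page["items"])
        cursor = page.get("next_cursor")
        if cursor is None:                  # no more pages
            return records


def make_fake_api(data, page_size=2):
    """Hypothetical stand-in for a real API; rate-limits the second call once."""
    calls = {"count": 0}

    def fetch_page(cursor):
        calls["count"] += 1
        if calls["count"] == 2:
            return {"status": 429}
        start = cursor or 0
        end = start + page_size
        next_cursor = end if end < len(data) else None
        return {"status": 200, "items": data[start:end], "next_cursor": next_cursor}

    return fetch_page


rows = fetch_all(make_fake_api([1, 2, 3, 4, 5]))
```

This covers only two items from the list. Authentication, nested arrays and objects, and encoding would each need similar handling, per data source.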

Instead, look for a turnkey ETL vendor that removes all of the technical complexity from your life and simply replicates your data in a format ready for consumption by downstream analytics applications.

(Turnkey definition from Oxford Languages: of or involving the provision of a complete product or service that is ready for immediate use. "turnkey systems for telecommunications customers")
