You run a small or mid-size business, perhaps a SaaS company, an e-commerce store, a coaching business, an SMMA agency, or an advertising company, and your data is spread across many sources: advertising platforms, CRM, ERP, OLTP databases, email marketing platforms, etc.
You used to manually build your reports using a bunch of Google Sheets or Excel files, but now you have encountered many issues:
Your Excel/Google Sheets files are getting so large that you can’t even open some of them.
Human error: because these files are filled in manually, they end up full of mistakes and corrupted data.
You are now looking to build an automated, scalable, and simple solution that will:
Ingest all your data from your sources to a single location (data warehouse) on a scheduled basis.
Transform the raw ingested data and build custom reports (by joining data from different sources, aggregating them, etc.), such as a table displaying KPIs across your ad data: total spend by ad platform, conversions, and so on.
Visualize the data using a simple BI tool.
Let me break down how to build such a solution.
Ingestion
I used to write all my ingestion pipelines using Python, just for the sake of writing code.
With 1, 2, or 3 data sources, it’s fine. Your code can break from time to time, and you will have to adjust it.
But what if you have more than 10 different sources? Or 20?
You will have to hire more people because this is too much for a single person (who also has to manage the transformation part and the visualization part).
Luckily, some SaaS companies have made our lives easier.
Using a visually appealing UI, you can easily connect all your sources to your analytic database/data warehouse in less than 3 hours (including the time to familiarize yourself with the documentation of the tool you chose).
The magic is that even sources without a pre-built connector (like custom APIs) can be handled: Airbyte in particular has a feature I really like called the ‘Connector Builder’.
There are many ingestion tools available out there, and they pretty much offer similar services:
Airbyte — although some people don’t like it, I use it a lot and I’m quite satisfied.
Fivetran — sometimes if you don’t find a connector on Airbyte, you will find it here.
Meltano
DataChannel
Estuary — this tool is really interesting because it allows you to ingest data in real-time (streaming).
Custom Python scripts — sometimes you will still have to write custom Python scripts, orchestrated with tools like Airflow or Dagster.
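For the custom-script case, here is a minimal sketch of what such a pipeline can look like. Everything here is illustrative: the extract step is stubbed out (a real script would page through the source API), and sqlite3 stands in for your warehouse.

```python
import sqlite3

# Hypothetical extract step: in a real pipeline this would call the
# source API (e.g. an ads platform) and handle auth + pagination.
def extract_ad_spend():
    return [
        {"date": "2024-01-01", "platform": "google_ads", "spend": 120.50},
        {"date": "2024-01-01", "platform": "facebook_ads", "spend": 98.30},
    ]

def load(rows, conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_ad_spend (
               date TEXT, platform TEXT, spend REAL,
               PRIMARY KEY (date, platform))"""
    )
    # Upsert on (date, platform) so a scheduled re-run is idempotent:
    # re-ingesting the same day updates rows instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO raw_ad_spend VALUES (:date, :platform, :spend)",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for your warehouse
load(extract_ad_spend(), conn)
load(extract_ad_spend(), conn)  # second scheduled run: still 2 rows, not 4
row_count = conn.execute("SELECT COUNT(*) FROM raw_ad_spend").fetchone()[0]
print(row_count)
```

The idempotent load is the part worth copying: scheduled ingestion jobs inevitably re-run, and an upsert keyed on the natural key keeps that harmless.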
Transformation

Remember, we want the simplest solution possible. The simplest solution means it can be easily managed and understood by many people. And what’s the language that every data professional speaks (or at least should speak)? SQL.
But SQL alone isn’t enough — there is no way to track changes in your code, and you can easily encounter the ‘spaghetti SQL code’ problem, and trust me, you don’t want this type of problem.
That’s why you should use dbt. What is dbt?
dbt = SQL + Jinja templates
In other words, you will be able to write SQL++ code: SQL with loops, variables, and lists, plus version control with a tool like Git.
Instead of the single 10,000-line SQL query you would normally write to build a report, you can write two or three small ‘dbt models’ that are easy to read and maintain.
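To make the SQL + Jinja idea concrete, here is a plain-Python sketch of what a Jinja for-loop inside a dbt model expands to. The platform list and table name are hypothetical; in a real dbt model this loop would be written in Jinja syntax, not Python.

```python
# Instead of hand-writing one pivoted spend column per ad platform,
# generate them from a list -- the same trick a Jinja loop in a dbt
# model performs at compile time.
platforms = ["google_ads", "facebook_ads", "tiktok_ads"]

columns = ",\n  ".join(
    f"SUM(CASE WHEN platform = '{p}' THEN spend ELSE 0 END) AS {p}_spend"
    for p in platforms
)
sql = f"SELECT\n  date,\n  {columns}\nFROM raw_ad_spend\nGROUP BY date"
print(sql)
```

Adding a new ad platform then means appending one item to a list instead of copy-pasting another CASE expression, which is exactly the maintainability win dbt models give you over raw SQL.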
Using the cloud version of the tool, you can set your dbt models (i.e., your SQL code) to run on a schedule. (An alternative is dbt Core plus the scheduling feature of GitHub Actions.)
The result? Tables in your database are automatically refreshed and always up to date, without you having to touch anything.
There are many things to say about this tool, and I tried to keep it short in this part. For sure, there are downsides, but there are also many upsides.
At the end of the day, it depends on how and who uses it.
Business Intelligence and Visualization
Your data is now ingested and transformed on a schedule, and with a simple SELECT * FROM the_report_table_created_with_DBT, you can see key metrics about your data.
But a good dashboard is always easier to understand than a table with many rows.
For this part, you have a lot of choices:
Looker Studio
Superset
Power BI
Metabase (which I like a lot; it’s easy to deploy and use)
Streamlit
…
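Whichever BI tool you pick, behind every chart it just issues a query against the report table your dbt models built. You can sanity-check that table with a few lines of Python first; the table, columns, and numbers below are all hypothetical, and sqlite3 again stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for your warehouse
# Hypothetical report table that a dbt model would have built.
conn.execute(
    "CREATE TABLE kpi_ad_report (platform TEXT, total_spend REAL, conversions INTEGER)"
)
conn.executemany(
    "INSERT INTO kpi_ad_report VALUES (?, ?, ?)",
    [("google_ads", 1200.0, 60), ("facebook_ads", 800.0, 50)],
)

# The kind of query a BI tool would run behind a bar chart:
# cost per conversion by platform, cheapest first.
rows = conn.execute(
    "SELECT platform, total_spend / conversions AS cost_per_conversion "
    "FROM kpi_ad_report ORDER BY cost_per_conversion"
).fetchall()
for platform, cpc in rows:
    print(platform, round(cpc, 2))
```

If this query looks right in a script, pointing Metabase or Looker Studio at the same table is just configuration.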
Congrats: you now have a fully automated data infrastructure, with dashboards on the front end displaying KPIs that refresh on a schedule.
Wrap-up
Keep in mind that the design of your data platform is your own responsibility and depends on your specific requirements.
The purpose of this article was to show what a data and analytics platform can look like for a small or mid-size business with basic analytics requirements (no complex ML or AI).
— Dorian Teffo