StitchData is a no-code platform that simplifies the ETL process. In this post we are going to cover setting up StitchData for the first time and getting your first extract, transform, and load process running smoothly.
In this example, we are focusing on moving data from a Google Sheet over to Google BigQuery. This is a great example of how a business might be scaling up and starting to expand beyond the capabilities of spreadsheets and into databases. If you are unfamiliar with BigQuery, it is a cloud-based data warehouse that utilizes SQL. Databases like this have quite a few advantages over traditional spreadsheets. To name a few, you can leverage far larger datasets, they enforce stricter organization and structure, reduce the risk of becoming compromised or creating accidental errors, and allow for greater collaboration with more accessibility.
StitchData allows just about anyone to move data because it requires no coding and has straightforward instructions. You can follow along with the walkthrough in this post by using the free trial found on their website. All of the site links will be provided in the steps associated with them.
Before you begin, you need to make sure you have a game plan. Step one is to go through your sheet and make sure the data is clean and ready for the ETL process. Cleaning data will involve looking through your dataset and properly formatting data types, removing duplicates, correcting incomplete or inconsistent data, and generally smoothing out the dataset. The data I’m working with can be found at this link.
https://www.kaggle.com/datasets/harlfoxem/housesalesprediction
Now that your data is set up, we need to decide on what destination we are going to send it to. Again, for this case, we will be sending it to Google BigQuery.
It’s time to jump into StitchData and set up your account.
For the configuration of Google Sheets, all you need to do is decide on a name that will be transferred over to BigQuery, link your sheet ID, set your historical data sync date, set a replication frequency, and choose what tables/columns you want. There are details about these choices included in the next steps.
At this point, your integration should look similar to this.
StitchData will now connect to your source, and if successful, you will be moved to a screen where you can select what data you want to work with. StitchData is highly useful in this aspect because you can select exactly what you are wanting to put into a table at your destination.
If you need further assistance, you can view this guide. https://www.stitchdata.com/docs/integrations/saas/google-sheets
Your integration is now complete so it’s time to configure the warehouse, Google BigQuery.
Now you can select the destination you want to use. For the demo, we are going to select BigQuery. Within the form that loads you will see there are a couple of steps in BigQuery that need to be completed for StitchData to load properly.
At this point, it might seem like you’re good to go, however, for the load to run properly you need to set up a couple of permissions to give administrator rights.
The service account is now mostly set up, so click done.
We can move back over to StitchData and in the connection window, fill out the information, upload your key, and click done.
If you need further assistance, you can view this guide.
https://www.stitchdata.com/docs/destinations/google-bigquery/v2/connecting-google-bigquery-to-stitch
Your destination configuration is now complete, so let’s run the first extract and load.
Everything is set up and it’s time to see if it works.
This page will show you everything you need to know about the extraction process, when they were run, and the logs to tell you if it was successful or any errors that were encountered. Depending on the size of your data it might take a little while for it to complete. Once it finishes, the in-progress notation will disappear.
Finally, we can move over to GCP one last time and confirm that it was a success.
For now, we can just focus on the sheet you want to work with and run a test query on it.
If you have confirmed that your data is correct, you are good to go and your integration is complete!
Congratulations, you have gotten started with your first StitchData ETL integration! Now you can take this concept and apply it to whatever business case you might have. All in all, StitchData is a very intuitive and powerful tool so there’s not much to complain about. However, I’ll touch on a few key points I found noteworthy.
In addition to StitchData, here are some competitors you might want to check out. Coupler.io, Fivetran, Hevo Data, and Integrate.io. Compare their features and find out what is going to work best for you and your organization!
Cassandra.Link is a knowledge base that we created for all things Apache Cassandra. Our goal with Cassandra.Link was to not only fill the gap of Planet Cassandra but to bring the Cassandra community together. Feel free to reach out if you wish to collaborate with us on this project in any capacity.
We are a technology company that specializes in building business platforms. If you have any questions about the tools discussed in this post or about any of our services, feel free to send us an email!
Subscribe to our monthly newsletter below and never miss the latest Cassandra and data engineering news!