Snowflake is a cloud-based data warehousing platform that runs on AWS, Azure, and Google Cloud. Unlike many other data warehouse solutions, Snowflake is easy to set up, fast, and very flexible.
No hardware or software infrastructure is required to run Snowflake; it runs entirely on public cloud infrastructure. Snowflake's multi-cluster, shared-data architecture delivers concurrency, performance, and elasticity, along with extensive data sharing capabilities. The service handles optimization, resource management, authentication, data protection, availability, and configuration for you.
Benefits of Snowflake Cloud Data Warehouse
- High performance and speed: With its built-in elasticity, you can load data faster or run high volumes of queries whenever necessary by scaling your virtual warehouse up or down, ensuring that you pay only for the compute you use.
- Supports a wide variety of tools: BI and analytics tools such as Power BI and Tableau connect to Snowflake and can run queries even against large data sets.
- Effortless data sharing: Snowflake’s simple and intuitive architecture enables seamless data sharing for any customer.
- Supports multiple data formats: Snowflake is a versatile cloud-based platform that supports various file formats, including XML, JSON, and several others. It can import any kind of structured or semi-structured data users supply, and it can also handle various unstructured data types within one cohesive system.
- Scalability: Snowflake delivers instant data warehouse scaling for handling concurrency bottlenecks during peak periods. It can be scaled without redistributing data, which would otherwise be a significant disruption to end users.
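As a sketch of that scalability in practice, a warehouse can be resized with a single statement (the warehouse name my_wh below is hypothetical):

```sql
-- Scale a hypothetical warehouse up for a heavy load...
alter warehouse my_wh set warehouse_size = 'LARGE';

-- ...and back down afterwards, so you only pay for what you use.
alter warehouse my_wh set warehouse_size = 'X-SMALL';
```

Queries already running finish on the old size; new queries pick up the new size, so no data redistribution or downtime is involved.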
Introductory tutorial: Getting Started with Snowflake
Before you get started, make sure you have a Snowflake account on your preferred cloud platform (AWS, Azure, or Google Cloud) and have installed the SnowSQL command-line client. See the Snowflake documentation for detailed installation instructions.
1. Login to SnowSQL
To login to SnowSQL, open a terminal window and start SnowSQL in the command prompt.
Type: $ snowsql -a <account_identifier> -u <user_name>
Explanation of command:
- <account_identifier>: This is the unique identifier for your Snowflake account.
- <user_name>: This is your Snowflake login name.
Next, enter the password for your Snowflake account when prompted.
2. Create Snowflake Objects
Before loading data, you need to create a database and a table. Create a database named sf_test using the create database command:
create or replace database sf_test;
Every database you create in Snowflake has a default schema named public. You can use the following function to view your created database and schema:
select current_database(), current_schema();
To make a table named emp_table in sf_test.public use the create table command:
create or replace table emp_table (
  first_name string,
  last_name string,
  email string,
  streetaddress string,
  city string,
  start_date date
);
Creating a virtual warehouse
You can create an X-small warehouse named sf_test_wh using the create warehouse command:
create or replace warehouse sf_test_wh with warehouse_size='X-SMALL' auto_suspend = 180 auto_resume = true initially_suspended=true;
Here the warehouse is set to auto-resume, so it will automatically start running when you execute your first SQL statement that requires compute resources.
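If you want to control compute costs manually rather than relying on auto_suspend, you can also suspend and resume the warehouse yourself (a sketch using the warehouse created above):

```sql
-- Stop the warehouse so it consumes no credits while idle.
alter warehouse sf_test_wh suspend;

-- Start it again explicitly (auto_resume would also do this on the next query).
alter warehouse sf_test_wh resume;
```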
You can switch to your newly created warehouse with the use warehouse command:
use warehouse sf_test_wh;
3. Staging the data files
Snowflake enables you to load data from files that have previously been staged in the internal stage (Snowflake) or external stage (Google Cloud Storage, Amazon S3, or Microsoft Azure).
To stage files, execute PUT to upload your local data files to the table stage provided for emp_table.
Since the PUT command is OS-specific, use the appropriate command for your operating system:
- Linux or macOS
put file:///tmp/employees0*.csv @sf_test.public.%emp_table;
- Windows
put file://c:\temp\employees0*.csv @sf_test.public.%emp_table;
Explanation of command:
- file://: This specifies the full directory path and names of the local files to be staged.
- @<namespace>.%<table_name>: This indicates using the stage provided for the specified table, in this case emp_table.
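Before copying the data, it can help to confirm the upload worked. A quick check, assuming the PUT above succeeded, is to list the contents of the table stage:

```sql
-- Show the staged files and their sizes.
list @sf_test.public.%emp_table;
```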
4. Copying data to the target table
To copy data into the target table, execute COPY INTO <table>.
This command requires an active, running warehouse; the sf_test_wh warehouse created earlier will auto-resume when the statement executes:
copy into emp_table
  from @%emp_table
  file_format = (type = csv field_optionally_enclosed_by = '"')
  pattern = '.*employees0[1-5].csv.gz'
  on_error = 'skip_file';
Explanation of command:
- from: This clause identifies the internal stage location, here the table stage for emp_table.
- file_format: This specifies that the files are CSV and that field values may be enclosed in double quotes.
- pattern: This applies pattern matching, loading data from all staged files that match the regular expression .*employees0[1-5].csv.gz (PUT gzip-compresses uploads by default, hence the .gz suffix).
- on_error: This specifies what to do when an error is encountered in a file; skip_file skips any file containing errors.
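As a quick sanity check after the load (assuming all five sample files loaded cleanly), you can count the rows that arrived in the target table:

```sql
-- COPY INTO reports per-file load status; a row count confirms
-- the data actually landed in emp_table.
select count(*) from emp_table;
```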
5. Querying the loaded data
To query the data loaded in emp_table table, you can use standard SQL and any supported operators or functions.
To query all data, use select * from emp_table;
To insert additional rows, use:
insert into emp_table values
  ('Clementin','Adam','cadamou@sf_test.com','10510 Sakh Road','Klnak','2018-9-25'),
  ('Marlow','De Anes','madamouc@sf_test.co.uk','36768 North Plaza','Fagshen','2018-1-27');
To query rows based on email address, use:
select email from emp_table where email like '%.uk';
To query rows based on start date, use:
select first_name, last_name, dateadd('day', 90, start_date) from emp_table where start_date <= '2017-01-02';
Snowflake is a cloud-native data warehouse service that helps businesses forecast future growth and improve their performance. This tutorial covered the basics of Snowflake and how you can get started with Snowflake. If you’re interested in learning more about Snowflake, we encourage you to check out the Snowflake Training Program.