Bulk Loading from a Local File System into Snowflake

sujitmeshram

In this tutorial, we will gonna learn how to load data from files in your local file system into a table.

Introduction

In this tutorial, you’ll learn how to:

Create named file format objects to describe your data files.
Create named stage objects.
Upload your data to the internal stages.
Load your data into tables.
Resolve errors in your data files.

Please note that this tutorial covers loading both CSV and JSON data.

Prerequisites

Before starting, ensure you have:

A Snowflake account with a user role that grants necessary privileges.
SnowSQL installed.
Download the sample files provided for this exercise from here : data-load-internal.zip
Unzip the sample files. The tutorial assumes you unpacked files in to the following directories:

Linux/macOS: /tmp/load

Windows: C:\tempload

The data files include sample contact data in CSV and JSON formats.

Creating the Database, Tables, and Warehouse

Execute the following statements to create a database, two tables (for CSV and JSON data), and a virtual warehouse needed for this tutorial:

CREATE OR REPLACE DATABASE mydatabase;

CREATE OR REPLACE TEMPORARY TABLE mycsvtable (
id INTEGER,
last_name STRING,
first_name STRING,
company STRING,
email STRING,
workphone STRING,
cellphone STRING,
streetaddress STRING,
city STRING,
postalcode STRING);

CREATE OR REPLACE TEMPORARY TABLE myjsontable (
json_data VARIANT);

CREATE OR REPLACE WAREHOUSE mywarehouse WITH
WAREHOUSE_SIZE='X-SMALL'
AUTO_SUSPEND = 120
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED=TRUE;

Step 1. Create File Format Objects

Before loading data from a file into a table, describe the file’s format. You can create named file format objects using the CREATE FILE FORMAT command.

For CSV data, create the mycsvformat file format:

CREATE OR REPLACE FILE FORMAT mycsvformat
TYPE = 'CSV'
FIELD_DELIMITER = '|'
SKIP_HEADER = 1;

The setting FIELD_DELIMITER = ‘|’ specifies that the character ‘|’ is used to separate fields. By default, fields are separated by ‘,’.
SKIP_HEADER = 1 indicates that the source file contains a single header line. When using the COPY command to load data, this header line is skipped. The default value is 0.

For JSON data, create the myjsonformat file format:

CREATE OR REPLACE FILE FORMAT myjsonformat
TYPE = 'JSON'
STRIP_OUTER_ARRAY = TRUE;

Note: STRIP_OUTER_ARRAY = TRUE directs the COPY command to exclude the root brackets ([]) when loading data to the table.

Step 2. Create Stage Objects

A stage specifies where data files are stored for loading into a table. Create named stages for the different types of sample data files.

For CSV data files, create the my_csv_stage stage:

CREATE OR REPLACE STAGE my_csv_stage
FILE_FORMAT = mycsvformat;

Note that you don’t need to repeat the FILE_FORMAT option in the COPY command if you’ve already specified it when creating the stage.

For JSON data files, create the my_json_stage stage:

CREATE OR REPLACE STAGE my_json_stage
FILE_FORMAT = myjsonformat;

Step 3. Stage the Data Files

please note that, for staging, I have provided the sample files in prerequisites section and you have to unzip that files to the mentioned temp location as per your operating system.

Staging the CSV Sample Data Files

To upload the CSV files from your local file system, execute the PUT command:

Linux or macOS:
PUT file:///tmp/load/contacts*.csv @my_csv_stage AUTO_COMPRESS=TRUE;

Windows:PUT file://C:\temp\load\contacts*.csv @my_csv_stage AUTO_COMPRESS=TRUE;

Let’s break down the command:

file://<file-path>[/]contacts*.csv specifies the full path and names of the files on your local machine to be staged. File system wildcards are supported.
@my_csv_stage indicates the stage where the data will be uploaded.
AUTO_COMPRESS=TRUE instructs the command to compress the data during staging. This is the default behavior.

Staging the JSON Sample Data Files

To upload the JSON file from your local file system to the named stage, execute the PUT command:

Linux or macOS:
PUT file:///tmp/load/contacts.json @my_json_stage AUTO_COMPRESS=TRUE;

Windows:
PUT file://C:\temp\load/contacts.json @my_json_stage AUTO_COMPRESS=TRUE;

The command returns the following output, showing the staged files:

Listing the Staged Files (Optional)

You can list the staged files using the LIST command:

```
CSV:
LIST @my_csv_stage;
```
Snowflake returns a list of your staged CSV files.
JSON:
```
LIST @my_json_stage;
```
Snowflake returns a list of your staged JSON files.

Step 4: Load Data into Target Tables

To load data into the target tables, use the COPY INTO <table> command to import staged data.

Loading CSV Data

To begin, load data from a specific CSV file (e.g., contacts1.csv.gz). Execute the following command:

COPY INTO mycsvtable FROM @my_csv_stage/contacts1.csv.gz FILE_FORMAT = (FORMAT_NAME = mycsvformat) ON_ERROR = 'skip_file';

Here:

The FROM clause indicates the location of the staged file (combining the stage name and file name).
The ON_ERROR clause specifies that if the COPY command encounters any errors, it should skip the problematic file and continue with the next one. This is just for demonstration; none of the files used in this tutorial contain errors.

Next, load data from the remaining files in the mycsvtable table using pattern matching to target multiple files:

COPY INTO mycsvtable FROM @my_csv_stage FILE_FORMAT = (FORMAT_NAME = mycsvformat) PATTERN='.*contacts[1-5].csv.gz' ON_ERROR = 'skip_file';

Here, the PATTERN clause specifies the regular expression to match filenames (.*contacts[1-5].csv.gz), ensuring all relevant files are loaded.

Note:

contacts1.csv.gz data is ignored as it was already loaded.
Files contacts2.csv.gz, contacts4.csv.gz, and contacts5.csv.gz were loaded successfully.

Loading JSON Data

To load the JSON file contacts.json.gz into the myjsontable table, execute the following command:

COPY INTO myjsontable FROM @my_json_stage/contacts.json.gz FILE_FORMAT = (FORMAT_NAME = myjsonformat) ON_ERROR = 'skip_file';

The command returns a result set indicating the file loaded and related details:

Step 5: Validate the Loaded Data

To confirm that the data has been successfully loaded, execute a SELECT query.

Run the following query to check the contents of the mycsvtable table:

SELECT * FROM mycsvtable;

Verifying JSON Data

Run the following query to check the contents of the myjsontable table:

SELECT * FROM myjsontable;

The query returns the following data:

Step 6: Delete Successfully Copied Data Files

Once you have verified that the data has been successfully copied from the stage into the tables, you can delete the data files from the internal stage to save on storage costs. Use the REMOVE command for this purpose.

To remove CSV files from the stage, use:

REMOVE @my_csv_stage PATTERN='.*.csv.gz';

This command returns the following results:

To remove JSON files from the stage, use:

REMOVE @my_json_stage PATTERN='.*.json.gz';

This command returns the following results:

Step 7: Clean Up

Congratulations on successfully completing the tutorial!

Tutorial Clean Up (optional process)

If you wish to revert your system to its pre-tutorial state, execute the following DROP commands:

DROP DATABASE IF EXISTS mydatabase; 
DROP WAREHOUSE IF EXISTS mywarehouse;

Dropping the database will also automatically delete all associated objects, such as tables.

Conclusion

By following this tutorial, you have learned how to load data from local files into Snowflake tables, handle CSV and JSON formats, and manage staging and file formats. You have also seen how to verify the loaded data and clean up resources to maintain an efficient data environment.

Solutions

Industry

Our thinking

Bulk Loading from a Local File System into Snowflake

sujitmeshram

Table of Contents

Introduction

Prerequisites

Creating the Database, Tables, and Warehouse

Step 1. Create File Format Objects

Step 2. Create Stage Objects

Step 3. Stage the Data Files

Staging the CSV Sample Data Files

Staging the JSON Sample Data Files

Listing the Staged Files (Optional)

Step 4: Load Data into Target Tables

Loading CSV Data

Loading JSON Data

Step 5: Validate the Loaded Data

Verifying JSON Data

Step 6: Delete Successfully Copied Data Files

Step 7: Clean Up

Tutorial Clean Up (optional process)

Conclusion

sujitmeshram

Leave a Comment Cancel Reply

Suggested Article

NashTech

Solutions

Useful links

Connect with us

Our achievements