At Localytics, one of the tools we use for data processing is the Snowflake data warehouse. We connect to Snowflake in a couple different ways, but our main data retrieval application is a Ruby on Rails API. To accomplish this we use a combination of unixODBC (an open-source implementation of the ODBC standard), Snowflake’s ODBC driver, and our own ODBC ActiveRecord adapter for Ruby on Rails. This sequence of tools allows us to take full advantage of ActiveRecord’s query generation and general ease-of-use while still enjoying all the benefits of a fully cloud-enabled data warehouse such as Snowflake.
ODBC
First, a bit of background on the ODBC standard. ODBC is a common interface through which you can connect to multiple backend databases in the same manner. In this way it enables users to write code now and maintain the ability to migrate later, while also mitigating the pain of learning each DBMS’s idiosyncrasies. You connect to a data store through an ODBC adapter, which implements the ODBC interface for that specific DBMS.
For example, the following code will execute the query SELECT id, name FROM users
on any database without you needing to make changes to the code, just by passing in a data store name (DSN) as the first command-line argument to this script.
DSNs are a string of key-value pairs representing the connection configuration. They correspond to an entry in an odbc.ini
file that you configure. You can then reference the configured DSN using an implementation of ODBC (e.g., unixODBC) to connect to an ODBC DBMS like Snowflake. For example, in your odbc.ini
file you might have:
[LocalyticsProductionSnowflake]
Driver = SnowflakeDSIIDriver;
Locale = en-US;
Server = yoursnowflakeaccount.snowflakecomputing.com;
Port = 443;
Account = yoursnowflakeaccount;
Database = PRODUCTION;
Schema = PRODUCTION;
Warehouse = QUERY_WH;
Role = QUERY;
SSL = on;
Query_Timeout = 270;
uid = ...;
pwd = ...;
The configuration above operates under the assumption that you’ve previously installed the adapter for each type of DBMS to which you’re attempting to connect.
Installation
Installing unixODBC
is relatively straightforward on *NIX-based machines (on Windows ODBC actually ships with the OS by default). Run whichever package manager your machine uses (e.g., brew
, apt-get
, yum
, etc.) to install unixodbc
and unixodbc-dev
(to get the headers needing for linking). Fortunately Snowflake provides great documentation on how to handle the Snowflake-specific steps of getting ODBC set up, so follow those instructions as well.
Once you do, make sure to take full advantage of the isql
utility that comes with unixODBC
, as it can be invaluable for debugging. isql
will drop you into an SQL terminal connected to any given DSN; for example:
[17:38:30] ~ $ isql LocalyticsProductionSnowflake
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL> SELECT COUNT(*) FROM fact_events WHERE app_name = ‘Localytics Test’;
+----------+
| COUNT(*) |
+----------+
| 226975 |
+----------+
SQLRowCount returns 1
1 rows fetched
SQL>
odbc_adapter
Once you’re comfortably set up with unixODBC
and Snowflake’s adapter, you can configure your Ruby on Rails app to connect to Snowflake like you would any other data store. First, add the odbc_adapter
gem to your Gemfile
like so:
gem ‘odbc_adapter’, ‘~> 5.0.3’
Then run bundle install
to download the gem to your system. (Note that the major and minor version of the gem are linked to the dependent Rails version, so if your app is not yet running Rails 5.0.x
, you’ll need to specify 4.2.3
or 3.2.0
). Then, edit your config/database.yml
to specify the Snowflake connection for a given environment, like so:
snowflake:
adapter: odbc
dsn: LocalyticsProductionSnowflake
This tells Rails to use those connection settings when running in production mode. The final step is to register Snowflake as a valid connection option within the odbc_adapter
gem. By default, odbc_adapter
ships with support for MySQL
and PostgreSQL
. Fortunately, it also ships with the ability to register you own adapters as well. To accomplish this, add the following code to an initializer, e.g. config/initializers/odbc.rb
:
This code does a couple of things. It tells the odbc_adapter
gem that if when ODBC reports back the connected DBMS’s type it matches the /snowflake/
regex, to use the subsequent block to create a class to act as the adapter. We’re then using the PostgreSQL adapter as the superclass, because the syntax is close enough so as it work. Finally, it handles the Snowflake-specific setup of turning off prepared statements, quoting column names correctly, and forcing strings to come back in UTF-8 encoding.
ActiveRecord
Once you’ve configured the odbc_adapter
gem, you can take advantage of it by connecting your models to that connection. First, create a model that corresponds to a table in your Snowflake schema. For instance, in our production schema we have a table called fact_events
. Second, call establish_connection
to tell ActiveRecord
to connect to the correct database configuration from database.yml
. For example:
class FactEvent < ApplicationRecord
establish_connection(:snowflake)
end
Note that if all of your models are going to be reading and writing from Snowflake for a given environment (development, production, etc.) then you can name the connection after the environment and the establish_connection
call becomes unnecessary. With these models in place, you can perform any of the normal ActiveRecord
queries.
Roadmap / OSS opportunities
This configuration works for us, and we’ve been happily running this code in production since January of 2017. That being said, there are still a couple of things that we’d like to build into our adapter to make it even better.
Out-of-the-box Snowflake support
Currently, every project that uses Snowflake needs the initializer mentioned above because the odbc_adapter
gem doesn’t come with Snowflake support baked in. At the moment subclassing the PostgreSQL adapter works for us, but we’d like to fully support Snowflake’s driver so that we can take advantage of some of the more advanced UDF capabilities that Snowflake has to offer.
Rails 5.1
The latest version of Rails was recently released, so in order to upgrade our applications we need to go through and ensure that our adapter works with all of the new capabilities of the latest version of ActiveRecord.
Prepared statements
Our adapter supports prepared statements for the PostgreSQL adapter, but it’s explicitly turned off for MySQL and Snowflake. We’d like to take advantage of caching prepared statements to cut down on memory allocations and generally improve performance by enabling it for these two adapters.
Wrapping up
Snowflake is a great option for a cloud-based data warehouse, and solved a lot of problems that we’ve had with previous solutions to the problem of storing massive amounts of data. By being ODBC compliant, it enables us to connect using all of our favorite tools with minimal setup. If you also would like to use Snowflake with Ruby on Rails, feel free to install our odbc_adapter
gem and give it a shot. When you do please share your experience, approach, and any feedback in a gist, on a blog, or in the comments.