Mysql sharding at our company - Part3 (Shard schema)

As discussed in Part2 of the series we do Horizontal sharding for any schemas that will store 100M+ rows. As we are a cloud file server no data is shared between two customers a perfectly isolation can be achieved easily. One year back when I was thinking on designing the schema there were many alternatives

One shard per customer mapped to one database schema : Rejected this idea because mysql stores 1 or more files per table in physical file system and linux file system chokes after some no of files in a folder. We had faced this problem when storing the real files on filers (topic of another blog post).
One shard per customer maped to one set of tables in database schema : This would solve the issue of multiple files in a folder but again it would lead to too many files on the disk and operating system can choke on it. Also we have customers to do a trial for 15 day and never signup, so too much for ops team to manage for these trials.
Many customers in one shard mapped to one database schema: This would solve both issue one and two, but this is again too many schemas to manage for operations team when they have to setup replication or write any scripts to manage the schemas.
Many customers in one shard mapped to one set of tables in one database schema. : This is the approach we finally ended up picking as it suits both engineering and operations needs.

On each Mysql server we create a set of schemas and within each schema we have a cluster of tables that comprises a shard. To figure out what customer lives in what set of tables we use a master db called a lookup db aka DNS db. Each query first looks up the master db to figure out what shard this customer lives in, this is a highly read intensive db so we cache this data in memcache. Once we figure out the shard then we lookup what schema and what server this shard lives on. based on that information the application looks up appropriate connection pool in spring and executes the query. This is how the topology looks like at 10000 ft.

This is the structure of our dns db tables

A typical schema in a shard db has tables with name like
folders_${TBL_SUFFIX}, file_${TBL_SUFFIX}. Here TBL_SUFFIX is unique within cluster so that shard can be moved easily. To make it unique for now we just append schema_name and table set number to it to it. So let say for schema c1_db1 the tables for shard 10 and 15 would look like
folders_c1_db1_t1
folders_c1_db1_t2
files_c1_db1_t1
files_c1_db1_t2

We could have just appended shard_id to the table names also to make then unique but this makes logical and physical mapping hard. Later if we move the shard to a different host all we need to do is move the entire schema containing many shards to a diff host and change metadata db mappings and flush cache and post a message to zookeeper. App nodes listen to zookeeper for such events and refresh connection pools.

Part1 of series

Part2 of series

Part3 of series

Part4 of series

Programming fun at startup

Search This Blog

Mysql sharding at our company - Part3 (Shard schema)

Comments

Post a Comment

Popular posts from this blog

RabbitMQ java clients for beginners

Spring 3.2 quartz 2.1 Jobs added with no trigger must be durable.

Haproxy and tomcat JSESSIONID