As discussed in Part2 of the series we do Horizontal sharding for
any schemas that will store 100M+ rows. As we are a cloud file server
no data is shared between two customers a perfectly isolation can be
achieved easily. One year back when I was thinking on designing the
schema there were many alternatives
This is the structure of our dns db tables
A typical schema in a shard db has tables with name like
folders_${TBL_SUFFIX}, file_${TBL_SUFFIX}. Here TBL_SUFFIX is unique within cluster so that shard can be moved easily. To make it unique for now we just append schema_name and table set number to it to it. So let say for schema c1_db1 the tables for shard 10 and 15 would look like
folders_c1_db1_t1
folders_c1_db1_t2
files_c1_db1_t1
files_c1_db1_t2
We could have just appended shard_id to the table names also to make then unique but this makes logical and physical mapping hard. Later if we move the shard to a different host all we need to do is move the entire schema containing many shards to a diff host and change metadata db mappings and flush cache and post a message to zookeeper. App nodes listen to zookeeper for such events and refresh connection pools.
Part1 of series
Part2 of series
Part3 of series
Part4 of series
- One shard per customer mapped to one database schema : Rejected this idea because mysql stores 1 or more files per table in physical file system and linux file system chokes after some no of files in a folder. We had faced this problem when storing the real files on filers (topic of another blog post).
- One shard per customer maped to one set of tables in database schema : This would solve the issue of multiple files in a folder but again it would lead to too many files on the disk and operating system can choke on it. Also we have customers to do a trial for 15 day and never signup, so too much for ops team to manage for these trials.
- Many customers in one shard mapped to one database schema: This would solve both issue one and two, but this is again too many schemas to manage for operations team when they have to setup replication or write any scripts to manage the schemas.
- Many customers in one shard mapped to one set of tables in one database schema. : This is the approach we finally ended up picking as it suits both engineering and operations needs.
This is the structure of our dns db tables
folders_${TBL_SUFFIX}, file_${TBL_SUFFIX}. Here TBL_SUFFIX is unique within cluster so that shard can be moved easily. To make it unique for now we just append schema_name and table set number to it to it. So let say for schema c1_db1 the tables for shard 10 and 15 would look like
folders_c1_db1_t1
folders_c1_db1_t2
files_c1_db1_t1
files_c1_db1_t2
We could have just appended shard_id to the table names also to make then unique but this makes logical and physical mapping hard. Later if we move the shard to a different host all we need to do is move the entire schema containing many shards to a diff host and change metadata db mappings and flush cache and post a message to zookeeper. App nodes listen to zookeeper for such events and refresh connection pools.
Part1 of series
Part2 of series
Part3 of series
Part4 of series
Comments
Post a Comment