
MySQL sharding at our company - Part 3 (Shard schema)

As discussed in Part 2 of the series, we shard horizontally any schema that will store 100M+ rows. Because we are a cloud file server, no data is shared between two customers, so perfect isolation is easy to achieve. A year ago, when I was designing the schema, there were several alternatives:

  1. One shard per customer, mapped to one database schema: Rejected, because MySQL stores one or more files per table on the physical file system, and the Linux file system chokes once a folder holds too many files. We had faced this problem when storing the real files on filers (topic of another blog post).
  2. One shard per customer, mapped to one set of tables in a database schema: This would solve the issue of too many files in a folder, but it would still lead to too many files on the disk, and the operating system can choke on that. Also, we have customers who do a 15-day trial and never sign up, so this is too much for the ops team to manage just for trials.
  3. Many customers in one shard, mapped to one database schema: This would solve both issues one and two, but it still leaves too many schemas for the operations team to manage when they have to set up replication or write scripts to manage the schemas.
  4. Many customers in one shard, mapped to one set of tables in one database schema: This is the approach we finally picked, as it suits both engineering and operations needs.
On each MySQL server we create a set of schemas, and within each schema a cluster of tables makes up a shard. To figure out which customer lives in which set of tables, we use a master db called the lookup db, aka DNS db. Each query first looks up the master db to find the shard the customer lives in. This db is highly read-intensive, so we cache its data in memcache. Once we know the shard, we look up which schema and which server the shard lives on. Based on that information, the application picks the appropriate connection pool in Spring and executes the query. This is what the topology looks like from 10,000 ft.
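
The lookup chain described above (customer to shard, shard to schema and server, then connection pool) can be sketched roughly as follows. This is only an illustration, not our actual code: plain dicts stand in for the lookup db tables and for memcache, and the names `ShardLocation`, `ShardResolver`, `mysql-host-1` and the customer ids are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardLocation:
    shard_id: int
    server: str     # MySQL host the shard lives on (hypothetical name)
    schema: str     # schema on that host
    table_set: int  # which set of tables inside the schema


class ShardResolver:
    """Resolves customer -> shard -> location, caching lookup db reads."""

    def __init__(self, customer_to_shard, shard_to_location):
        # Both dicts stand in for tables in the lookup (DNS) db.
        self._customer_to_shard = customer_to_shard
        self._shard_to_location = shard_to_location
        self._cache = {}          # stand-in for memcache
        self.lookup_db_reads = 0  # counts trips to the lookup db

    def resolve(self, customer_id):
        cached = self._cache.get(customer_id)
        if cached is not None:
            return cached
        # Cache miss: go to the lookup db, then remember the answer.
        self.lookup_db_reads += 1
        shard_id = self._customer_to_shard[customer_id]
        location = self._shard_to_location[shard_id]
        self._cache[customer_id] = location
        return location

    def pool_key(self, customer_id):
        # Key the application would use to pick a connection pool.
        location = self.resolve(customer_id)
        return f"{location.server}/{location.schema}"


resolver = ShardResolver(
    customer_to_shard={101: 10, 102: 15},
    shard_to_location={
        10: ShardLocation(10, "mysql-host-1", "c1_db1", 1),
        15: ShardLocation(15, "mysql-host-1", "c1_db1", 2),
    },
)
print(resolver.pool_key(101))  # mysql-host-1/c1_db1
```

Repeated calls for the same customer are served from the cache, so only the first one touches the lookup db.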


This is the structure of our DNS db tables:

 A typical schema in a shard db has tables with names like
folders_${TBL_SUFFIX}, files_${TBL_SUFFIX}.  Here TBL_SUFFIX is unique within the cluster so that a shard can be moved easily. To make it unique, for now we just append the schema name and the table set number to it. So, for schema c1_db1, the tables for shards 10 and 15 would look like
folders_c1_db1_t1
folders_c1_db1_t2
files_c1_db1_t1
files_c1_db1_t2
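
The naming rule above is simple enough to sketch in a few lines. The helper names `table_suffix` and `shard_tables` are hypothetical, and the list of base table names is assumed to be just the two shown in the example.

```python
def table_suffix(schema_name, table_set):
    """Build TBL_SUFFIX from the schema name and the table set number."""
    return f"{schema_name}_t{table_set}"


def shard_tables(schema_name, table_set, base_names=("folders", "files")):
    """Return the physical table names for one table set in a schema."""
    suffix = table_suffix(schema_name, table_set)
    return [f"{base}_{suffix}" for base in base_names]


print(shard_tables("c1_db1", 1))  # ['folders_c1_db1_t1', 'files_c1_db1_t1']
print(shard_tables("c1_db1", 2))  # ['folders_c1_db1_t2', 'files_c1_db1_t2']
```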

We could also have just appended the shard_id to the table names to make them unique, but that makes the logical and physical mapping hard. Later, if we need to move a shard to a different host, all we have to do is move the entire schema, which contains many shards, to the new host, change the metadata db mappings, flush the cache, and post a message to ZooKeeper. App nodes listen to ZooKeeper for such events and refresh their connection pools.
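
The order of steps in such a move (update the mapping, flush the cache, notify the app nodes) can be sketched like this. It is a toy in-memory model under stated assumptions: no real ZooKeeper or memcache client is used, a list of callbacks stands in for ZooKeeper watchers, copying the schema data itself is out of scope, and `ShardDirectory`, `AppNode` and the host names are hypothetical.

```python
class ShardDirectory:
    """In-memory stand-in for the metadata db, its cache, and ZooKeeper."""

    def __init__(self, schema_to_host):
        self.metadata = dict(schema_to_host)  # stand-in for the metadata db
        self.cache = dict(schema_to_host)     # stand-in for memcache
        self.listeners = []                   # stand-in for ZooKeeper watchers

    def subscribe(self, callback):
        self.listeners.append(callback)

    def host_for(self, schema):
        # Read through the cache, falling back to the metadata db.
        if schema not in self.cache:
            self.cache[schema] = self.metadata[schema]
        return self.cache[schema]

    def move_schema(self, schema, new_host):
        # 1. Change the metadata db mapping.
        self.metadata[schema] = new_host
        # 2. Flush the stale cache entry.
        self.cache.pop(schema, None)
        # 3. Post the event so app nodes can react.
        for notify in self.listeners:
            notify(schema, new_host)


class AppNode:
    """App node that refreshes its connection pool when a schema moves."""

    def __init__(self, directory):
        self.directory = directory
        self.pools = {}  # schema -> pool handle (a string in this sketch)
        directory.subscribe(self.on_schema_moved)

    def pool_for(self, schema):
        if schema not in self.pools:
            self.pools[schema] = f"pool:{self.directory.host_for(schema)}"
        return self.pools[schema]

    def on_schema_moved(self, schema, new_host):
        # Replace the pool so new queries go to the new host.
        self.pools[schema] = f"pool:{new_host}"


directory = ShardDirectory({"c1_db1": "mysql-host-1"})
node = AppNode(directory)
before = node.pool_for("c1_db1")
directory.move_schema("c1_db1", "mysql-host-2")
after = node.pool_for("c1_db1")
print(before, "->", after)  # pool:mysql-host-1 -> pool:mysql-host-2
```

Because every shard in the schema shares the same mapping entry, one update moves all of them at once.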

Part1 of series

Part2 of series

Part3 of series

Part4 of series 
