Skip to main content

Mysql sharding at our company - Part3 (Shard schema)

As discussed in Part2 of the series we do Horizontal sharding for any schemas that will store 100M+ rows. As we are a cloud file server no data is shared between two customers a perfectly isolation can be achieved easily. One year back when I was thinking on designing the schema there were many alternatives

  1. One shard per customer mapped to one database schema : Rejected this idea because mysql stores 1 or more files per table in physical file system and linux file system chokes after some no of files in a folder. We had faced this problem when storing the real files on filers (topic of another blog post).
  2. One shard per customer maped to one set of tables in database schema : This would solve the issue of multiple files in a folder but again it would lead to too many files on the disk and operating system can choke on it. Also we have customers to do a trial for 15 day and never signup, so too much for ops team to manage for these trials.
  3. Many customers in one shard mapped to one database schema: This would solve both issue one and two, but this is again  too many schemas to manage for operations team when they have to setup replication or write any scripts to manage the schemas.
  4. Many customers in one shard mapped to one set of tables in one database schema. : This is the approach we finally ended up picking as it suits both engineering and operations needs.
On each Mysql server we create a set of schemas and within each schema we have a cluster of tables that comprises a shard. To figure out what customer lives in what set of tables we use a master db called a lookup db aka DNS db. Each query first looks up the master db to figure out what shard this customer lives in, this is a highly  read intensive db so we cache this data in memcache. Once we figure out the shard then we lookup what schema and what server this shard lives on. based on that information the application looks up appropriate connection pool in spring and executes the query. This is how the topology looks like at 10000 ft.

This is the structure of our dns db tables

 A typical schema in a shard db has tables with name like
folders_${TBL_SUFFIX}, file_${TBL_SUFFIX}.  Here TBL_SUFFIX is unique within cluster so that shard can be moved easily. To make it unique for now we just append schema_name and table set number to it to it. So let say for schema c1_db1 the tables for shard 10 and 15 would look like

We could have just appended shard_id to the table names also to make then unique but this makes logical and physical mapping hard. Later if we move the shard to a different host all we need to do is move the entire schema containing many shards to a diff host and change metadata db mappings and flush cache and post a message to zookeeper. App nodes listen to zookeeper for such events and refresh connection pools.

Part1 of series

Part2 of series

Part3 of series

Part4 of series 


Popular posts from this blog

RabbitMQ java clients for beginners

Here is a sample of a consumer and producer example for RabbitMQ. The steps are
Download ErlangDownload Rabbit MQ ServerDownload Rabbit MQ Java client jarsCompile and run the below two class and you are done.
This sample create a Durable Exchange, Queue and a Message. You will have to start the consumer first before you start the for the first time.

For more information on AMQP, Exchanges, Queues, read this excellent tutorial
import com.rabbitmq.client.Connection; import com.rabbitmq.client.Channel; import com.rabbitmq.client.*; public class RabbitMQProducer { public static void main(String []args) throws Exception { ConnectionFactory factory = new ConnectionFactory(); factory.setUsername("guest"); factory.setPassword("guest"); factory.setVirtualHost("/"); factory.setHost(""); factory.setPort(5672); Conne…

Spring query timeout or transaction timeout

If you are using spring to manage transactions then you can specify default transaction timeout using

    <bean id="transactionManager"
        <property name="dataSource" ref="dataSource" />
        <property name="defaultTimeout" value="30" /> <!--30 sec--->             

or you can override the timeout in the annotation

    @Transactional(readOnly = false, timeout=30)

or if you are doing it programatic transactions then you can do

DataSourceTransactionManager transactionManager = new DataSourceTransactionManager(dataSource);

 or you can override the timeout for one particular transaction

TransactionTemplate transactionTemplate = new TransactionTemplate();

Python adding pid file

I have a thumbnail generator that launches multiple processes and the correct way to shut it down is to send kill -HUP to the parent process. To automate I had to write a pid file from python, it was a piece of cake
def writePidFile(): pid = str(os.getpid()) f = open('', 'w') f.write(pid) f.close()