abnormal data migration

Cloud storage is a funny and interesting field. I just analyzed a data pattern where a customer sent in a 4 TB hard drive, and migrating just 1.2 TB of it into the cloud created 5M files. A rough calculation puts the average file size at around 300 KB, which is weird. Digging deeper into the system revealed that the customer had scanned all his documents into TIF files, with sizes ranging from 10 KB through 100 KB up to 300 KB. WTH.

They also had a special LFT (or "loft") file type: each file was 4112 bytes, and there were 2M of them.
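For anyone who wants to redo the back-of-the-envelope math, here is a small sketch using only the figures quoted above. It assumes decimal units (1 TB = 10^12 bytes), which the post does not specify, and the function names are just for illustration. The straight division comes out a little under the rough 300 KB figure, but in the same ballpark.

```python
# Back-of-the-envelope check of the numbers quoted in the post.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 GB = 10**9 bytes).
KB = 10**3
GB = 10**9


def average_file_bytes(total_bytes: int, file_count: int) -> float:
    """Mean file size in bytes for a given data volume and file count."""
    return total_bytes / file_count


def total_bytes_for(file_count: int, bytes_per_file: int) -> int:
    """Total volume occupied by file_count files of identical size."""
    return file_count * bytes_per_file


migrated_bytes = 1200 * GB      # the 1.2 TB migrated so far
migrated_files = 5_000_000      # files created by that migration
avg_bytes = average_file_bytes(migrated_bytes, migrated_files)

lft_files = 2_000_000           # count of the 4112-byte LFT files
lft_total = total_bytes_for(lft_files, 4112)

print(f"average file size: {avg_bytes / KB:.0f} KB")
print(f"LFT files total:   {lft_total / GB:.1f} GB")
```

The takeaway is the same either way: millions of tiny files, so the file count grows much faster than the data volume.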

As of right now, the sharding approach I implemented pins a customer to a single shard, which means that if we migrate the entire 4 TB we would end up with 30M+ files on that one shard. Life is going to be interesting in the next few months.

It seems the solution I built a year ago is already reaching its limits. I need a different one that federates a customer's data across multiple shards and machines, while still allowing consistent MySQL backups and replication, plus a two-phase commit across multiple servers.
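To make the difference concrete, here is a minimal sketch of "pinned" versus "federated" placement. This is not the code running in production. The shard count, the function names, and the choice of hashing customer ID plus file path are all assumptions for illustration.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count, for illustration only


def _stable_hash(key: str) -> int:
    """Deterministic hash that is stable across processes and machines."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pinned_shard(customer_id: str) -> int:
    """Pinned placement: every file of a customer maps to the same shard."""
    return _stable_hash(customer_id) % NUM_SHARDS


def federated_shard(customer_id: str, file_path: str) -> int:
    """Federated placement (assumed scheme): hash customer and path together
    so that one customer's files spread across shards."""
    return _stable_hash(f"{customer_id}\x00{file_path}") % NUM_SHARDS


if __name__ == "__main__":
    paths = [f"/scans/doc_{i}.tif" for i in range(1000)]
    pinned = {pinned_shard("customer-42") for _ in paths}
    federated = {federated_shard("customer-42", p) for p in paths}
    print(f"pinned placement uses {len(pinned)} shard(s)")
    print(f"federated placement uses {len(federated)} shard(s)")
```

This only covers placement. Plain modulo hashing also reshuffles most keys whenever the shard count changes, and it does nothing for the hard parts mentioned above: consistent backups, replication, and two-phase commit across servers.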

Or we put some arbitrary limit on the number of files in a domain and give customers multiple domains to put their data into (which doesn't sound good).

Programming fun at startup
