Cloud storage is a funny and interesting field. I just analyzed a data pattern where a customer sent us a 4 TB hard drive, and migrating just 1.2 TB of it into the cloud created 5M files. A rough calculation put the average file size at around 240 KB, which seemed weird. Digging deeper into the system revealed that the customer had scanned all of his documents as TIFs, with average sizes anywhere from 10 KB to 100 KB to 300 KB. wth.
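Just for my own sanity, the back-of-the-envelope math (decimal units, my own rough numbers):

```python
# Rough average file size for the migrated slice of the drive.
migrated_bytes = 1.2 * 10**12   # 1.2 TB migrated so far
file_count = 5_000_000          # files that migration created

avg_kb = migrated_bytes / file_count / 1000
print(f"average file size: ~{avg_kb:.0f} KB")  # ~240 KB
```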
They also had a special LFT (loft) file type that was exactly 4,112 bytes, and there were 2M of those.
As of right now, the sharding approach I implemented pins a customer to a single shard, which means that if we migrate the entire 4 TB we would end up with 30M+ files on one shard. Life is going to be interesting in the next few months.
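For context, "pins a customer to a shard" means the routing never splits one customer's data across machines. A minimal sketch of that idea (shard names and the hashing scheme here are illustrative, not the real code):

```python
import hashlib

SHARDS = ["shard-01", "shard-02", "shard-03", "shard-04"]  # illustrative shard pool

def shard_for_customer(customer_id: str) -> str:
    """Every file a customer owns lands on the same shard.

    This is exactly why one 4 TB customer can push a single shard
    past 30M files: the mapping never spreads a customer out.
    """
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for_customer("customer-1234"))
```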
It seems the solution I built a year ago is already reaching its limits, and I need a way to federate a customer's data across multiple shards and machines while still being able to take a consistent MySQL backup, keep replication running, and do a two-phase commit across multiple servers (sketched below).
Or we put some arbitrary limit on the number of files in a domain and give them multiple domains to put data into (which doesn't sound good).
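The cross-shard write path is the hard part of the federation option. A minimal sketch of the two-phase commit idea, assuming a prepare/commit/rollback interface per shard (with MySQL this would roughly map onto XA PREPARE / XA COMMIT / XA ROLLBACK); all names here are made up for illustration:

```python
class Participant:
    """One shard taking part in a distributed write.

    prepare() must make the work durable but not yet visible;
    commit() / rollback() finish or undo it.
    """
    def __init__(self, name: str):
        self.name = name

    def prepare(self, txn_id: str) -> bool:
        print(f"{self.name}: prepared {txn_id}")
        return True

    def commit(self, txn_id: str) -> None:
        print(f"{self.name}: committed {txn_id}")

    def rollback(self, txn_id: str) -> None:
        print(f"{self.name}: rolled back {txn_id}")

def two_phase_commit(txn_id: str, participants: list[Participant]) -> bool:
    # Phase 1: collect votes; every shard must say yes before anything is visible.
    votes = [p.prepare(txn_id) for p in participants]
    if all(votes):
        # Phase 2: commit everywhere.
        for p in participants:
            p.commit(txn_id)
        return True
    # Any "no" vote aborts everywhere (a no-op for shards that never prepared).
    for p in participants:
        p.rollback(txn_id)
    return False

two_phase_commit("txn-42", [Participant("shard-01"), Participant("shard-02")])
```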