Posts

Showing posts from February, 2013

Spring and Quartz JobDataMap serialization exception

We don't run our nodes in HA yet: once a customer registers, he is assigned a node and he lives there. The problem is that if the node dies we incur downtime for that customer, and we also need to over-allocate hardware to prepare for the worst-case scenario. For the past 6 months I have been working on making the code stateless so that we can do HA and reduce our node count by a factor of 4. We used to run Quartz with the in-memory scheduler, but for HA I need to run Quartz in a cluster. We chose org.quartz.impl.jdbcjobstore.JobStoreTX for this. As soon as I tried it I ran into issues: I was injecting Spring beans into our Quartz jobs using JobDataMap, JobStoreTX was trying to serialize the job data into a table, and our Spring beans are not serializable. There were two options: 1) Load the entire applicationContext in each job and read the bean from there. 2) Use the schedulerContextAsMap. After evaluating the options I found the scheduler context to be the best o...
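To make the failure mode concrete, here is a minimal sketch in plain Java with no Quartz or Spring on the classpath. It only imitates the two ideas: a JDBC-backed store has to serialize the job data, and a scheduler-level context lives in memory on each node and is never persisted. The names `MailService`, `canPersist` and `SCHEDULER_CONTEXT` are made up for illustration. In a real setup the context would presumably be filled through Spring's `schedulerContextAsMap` property and read inside the job through the scheduler's context, but that wiring is an assumption here and not the code from this post.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

public class JobDataSerializationSketch {

    // Hypothetical stand-in for a Spring-managed bean; deliberately NOT Serializable.
    static class MailService {
        String send(String to) {
            return "sent to " + to;
        }
    }

    // Plays the role of the scheduler context: an in-memory, per-node map that is never persisted.
    static final Map<String, Object> SCHEDULER_CONTEXT = new HashMap<>();

    // Imitates what a JDBC job store must do: turn the job data into bytes for a table column.
    static boolean canPersist(Map<String, Object> jobData) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(new HashMap<>(jobData));
            return true;
        } catch (NotSerializableException e) {
            return false; // the error class hit when a bean sits in the job data
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Approach that breaks: the bean itself is placed in the job data.
    static boolean beanInJobDataPersists() {
        Map<String, Object> jobData = new HashMap<>();
        jobData.put("mailService", new MailService());
        return canPersist(jobData);
    }

    // Approach that works: job data holds only plain values, the bean lives in the context.
    static boolean plainJobDataPersists() {
        Map<String, Object> jobData = new HashMap<>();
        jobData.put("recipient", "customer@example.com");
        return canPersist(jobData);
    }

    // What a job would do at execution time: look the bean up in the context by key.
    static String runJob(Map<String, Object> jobData) {
        MailService mail = (MailService) SCHEDULER_CONTEXT.get("mailService");
        return mail.send((String) jobData.get("recipient"));
    }

    public static void main(String[] args) {
        SCHEDULER_CONTEXT.put("mailService", new MailService());

        Map<String, Object> jobData = new HashMap<>();
        jobData.put("recipient", "customer@example.com");

        System.out.println("bean in job data persists: " + beanInJobDataPersists());
        System.out.println("plain job data persists: " + plainJobDataPersists());
        System.out.println(runJob(jobData));
    }
}
```

The design point is that only small, serializable values (strings, ids) travel through the job data, while heavyweight beans are registered once per node and looked up by key when the job fires.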

Final nail in BDB coffin

This weekend we will finally put the final nail in the BDB coffin. We were using BDB in our WebDAV, cloud file and backup products. Over the course of the last 6 months my team was able to remove BDB from WebDAV and cloud file, and MySQL is scaling fine. We now have billions of rows in MySQL, and last weekend we did a pilot migration of a few backup product nodes. This weekend we will try to migrate all the backup nodes. I am hoping this will increase the number of rows in MySQL by 30%. MySQL and sharding rock!!

Being ruthless

A lot of the time our system has to deal with abuse. For example, some customer will move the same file between 2 folders the whole day; normally it doesn't cause issues, but in some extreme cases it generates hundreds of thousands of records. There are also customers who have written bots that make millions of calls in a day. And sometimes a customer will put malware on FTP and use our servers as a way to spread it, which causes antivirus software to flag our site as a spammer and leads to field issues. One of the strategies we use to deal with abuse is to throttle the user for a while, but sometimes that hurts good users too. In some cases the abuse is so heavy that it can bring down the system or hurt other genuine users. In the malware case we just block the customer, as there is no time to reach the customer and solve the issue. Some users might have shared the file accidentally, but we have to be ruthless.
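As a rough illustration of "throttle the user for a while", here is a minimal per-customer limiter. It is a sketch under assumptions, not our production code: the class name, the fixed-window counting, the penalty period and the explicit `block` call are all invented for this example, and the clock is passed in so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AbuseThrottleSketch {

    private final int maxCallsPerWindow;
    private final long windowMillis;
    private final long penaltyMillis;

    // Per-customer state: [0] = window start, [1] = calls in window, [2] = throttled until.
    private final Map<String, long[]> state = new HashMap<>();

    // Customers blocked outright (for example after malware is found); hypothetical policy.
    private final Set<String> blocked = new HashSet<>();

    public AbuseThrottleSketch(int maxCallsPerWindow, long windowMillis, long penaltyMillis) {
        this.maxCallsPerWindow = maxCallsPerWindow;
        this.windowMillis = windowMillis;
        this.penaltyMillis = penaltyMillis;
    }

    // Hard block: every later call from this customer is rejected.
    public synchronized void block(String customerId) {
        blocked.add(customerId);
    }

    // Returns true if the call may proceed at time nowMillis, false if it is rejected.
    public synchronized boolean allow(String customerId, long nowMillis) {
        if (blocked.contains(customerId)) {
            return false;
        }
        long[] s = state.computeIfAbsent(customerId, id -> new long[] {nowMillis, 0, 0});
        if (nowMillis < s[2]) {
            return false; // still inside the penalty period
        }
        if (nowMillis - s[0] >= windowMillis) {
            s[0] = nowMillis; // start a fresh counting window
            s[1] = 0;
        }
        s[1]++;
        if (s[1] > maxCallsPerWindow) {
            s[2] = nowMillis + penaltyMillis; // over the limit: throttle for a while
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // At most 3 calls per second; offenders are throttled for 5 seconds.
        AbuseThrottleSketch throttle = new AbuseThrottleSketch(3, 1000, 5000);
        for (int i = 1; i <= 4; i++) {
            System.out.println("call " + i + " allowed: " + throttle.allow("bot-customer", 0));
        }
        System.out.println("other customer allowed: " + throttle.allow("good-customer", 0));
    }
}
```

Because limits are tracked per customer id, one noisy customer does not slow down the others. The trade-off is the one described above: a legitimate burst from a good customer trips the same limit.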