Posts

From NOSQL to MySql (BDB to mysql) - Part3

We finally migrated 14 nodes this weekend. As there were millions of files to be migrated, even after running 5 parallel threads on each node it took close to 3-4 hours per node. We were running 3 nodes at a time; otherwise the db load was shooting up too high. As we can't afford downtime on weekdays, the migration has to happen on weekend nights. On Friday night we were up till 6:00 AM and on Saturday night we were up till 4:00 AM. We wanted to do more nodes, but the number of files per host was going overboard and I wanted to be conservative to start with; if the mysql servers can handle more, then later we would consolidate shards or move them to this host. New mysql hosts are going to be provisioned next week, so I am hoping to migrate more nodes this week.  Monday was a calm day after a long time, as we chose all the nodes that were spiking to be migrated first. Db loads in all DCs are low and even app node loads are low. Surprisingly, slave dbs are getting pounded more than the master db after las...
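The per-node migration described above (5 parallel threads working through a node's files) can be sketched roughly as below; `migrateFile` and the file list are hypothetical stand-ins for the real migrator, not the actual code:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class NodeMigration {
    static final int THREADS = 5; // 5 parallel threads per node, as described above

    // Hypothetical per-file migration step (BDB record -> mysql row).
    static void migrateFile(String path, AtomicInteger done) {
        done.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the millions of files on a real node.
        List<String> files = List.of("f1", "f2", "f3", "f4", "f5", "f6");
        AtomicInteger done = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (String f : files) {
            pool.submit(() -> migrateFile(f, done));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        System.out.println("migrated " + done.get() + " files");
    }
}
```

The fixed pool caps concurrency at 5 per node, which is why running more than 3 nodes at a time was the knob that pushed db load up.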

Mysql Sharding at our company- Part1

Sharding is a double-edged sword: on one hand it allows you to scale your applications with increasing growth in customers, and on the other hand it is very hard for junior developers to grasp the concept. I try to abstract and encapsulate as much complexity as I can in the framework. Typically people do either 1. Functional sharding/Vertical sharding: for huge datasets this will only buy you time. When I joined the company everything was stored in Berkeley DB or Lucene or in some files on filesystems. I tried moving all the things to Mysql in one project but it became a monster, so I started moving pieces of applications out into mysql one at a time and moving them to production. This also boosted the team's confidence in mysql, as more features went live on Mysql and they saw that Mysql was reliable and scalable. I started with smaller pieces that didn't have >10-20M rows but needed mysql for its ACID properties. Anticipating the growth, we decided to create one schema per functionality and avoided joins ...
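The "one schema per functionality" approach above boils down to routing by feature rather than by key. A minimal sketch of such a router, with hypothetical schema and host names:

```java
import java.util.Map;

public class FunctionalShardRouter {
    // One schema per functional area; these names are illustrative only.
    private static final Map<String, String> FEATURE_TO_SCHEMA = Map.of(
            "files", "files_db",
            "audit", "audit_db",
            "users", "users_db");

    // Resolve the JDBC URL for a feature's schema.
    public static String jdbcUrl(String feature) {
        String schema = FEATURE_TO_SCHEMA.get(feature);
        if (schema == null) {
            throw new IllegalArgumentException("unknown feature: " + feature);
        }
        return "jdbc:mysql://dbhost:3306/" + schema;
    }

    public static void main(String[] args) {
        System.out.println(jdbcUrl("audit"));
    }
}
```

Keeping this lookup inside the framework is what lets junior developers stay unaware of where a given feature's data physically lives.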

From NOSQL to MySql (BDB to mysql) - Part2

So we are now in the third week of Mysql deployment and slowly we are moving nodes from BDB to Mysql. Results so far have been promising. 1) We ran into some query optimisation issues with Mysql that I blogged about here http://neopatel.blogspot.com/2012/04/mysql-subquery-multiple-columns.html 2) Just ran into a 4-byte unicode issue with Mysql where utf8 with mysql won't support 4-byte characters and apparently you have to use utf8mb4 for it; how stupid of Mysql. Will write a separate blog post for it. But as far as performance is concerned it's rocking so far. On the 20th we did the deployment, and as soon as the node went live with Mysql you can see an immediate drop in load avg, consistently for this week. This is because most of the processing that was done by the in-memory db on the app node is now transferred to Mysql.  But in the second graph you would see that Mysql is sitting idle and is nowhere close to the app node load avg. Now this is with 3 nodes pounding; we would add 10+ nodes to...

Quartz delete/unschedule a job at runtime

I had a requirement to remove a quartz job from a tomcat without restarting it, so the best solution is to write a jsp to do this. I had earlier written a Quartz Admin jsp blog, so I went and added a Delete button to that jsp, and it appears all I need to do is call

<%
    if ("Delete".equals(btnInterrupt)) {
        scheduler.deleteJob(jobNameToRun, groupNameToRun);
%>
    Job <%=jobNameToRun%> Deleted.
<%
    }
%>

There is an unschedule API also, but that will only unschedule the job's next execution, and I wanted to remove the job entirely. When tomcat is restarted the job will come back.

Adding Jitter to cache layer

Thundering herd is an issue common to webapps that rely on heavy caching: if lots of items expire at the same time due to a server restart or a temporal event, then suddenly lots of calls go to the database at the same time. This can even bring down the database in extreme cases. I won't go into much detail, but the app needs to do two things to solve this issue. 1) Add consistent hashing to the cache layer: this way, when a memcache server is added/removed from the pool, the entire cache is not invalidated.  We use memcache from both the python and Java layers and I still have to find a consistent hashing solution that is portable across both languages; hash_ring and spymemcached both use different points for servers, so I need to read/test more. 2) Add a jitter to the cache or randomise the expiry time: we expire long-term cache records 8 hours after the key was added, and short-term cache expiry is 2 hours. As our customers usually come to work in the morning and access the cloud file server it ...
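Randomising the expiry as described in 2) can be as simple as adding a bounded random offset to the base TTL. A minimal sketch, where the 8-hour base and the ±30-minute jitter window are illustrative numbers, not the production values:

```java
import java.util.concurrent.ThreadLocalRandom;

public class CacheJitter {
    static final int BASE_TTL_SECONDS = 8 * 60 * 60; // long-term cache: 8 hours
    static final int JITTER_SECONDS = 30 * 60;       // spread expiries over +/- 30 min

    // Base TTL plus a random offset in [-JITTER_SECONDS, +JITTER_SECONDS],
    // so keys cached at the same moment do not all expire together.
    static int jitteredTtl() {
        int offset = ThreadLocalRandom.current()
                .nextInt(-JITTER_SECONDS, JITTER_SECONDS + 1);
        return BASE_TTL_SECONDS + offset;
    }

    public static void main(String[] args) {
        int ttl = jitteredTtl();
        System.out.println(ttl >= BASE_TTL_SECONDS - JITTER_SECONDS
                && ttl <= BASE_TTL_SECONDS + JITTER_SECONDS);
    }
}
```

With a restart, all keys re-populated in the same minute now expire over an hour-wide window instead of all at once, so the database sees a trickle rather than a herd.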

shell file $10 parameter

Normally you don't have a shell file that calls a java program with 10 parameters, but I had a migration script with 10+ arguments and I kept getting the first parameter in the 10th argument: the shell was seeing $10 and passing the first param concatenated with 0. The solution is to enclose all parameters > 9 in curly brackets. So use something like

$JAVA_HOME/bin/java -verbose:gc -Duser.timezone=GMT -Dfile.encoding=UTF8 -server -Xms512m -Xmx2048m com.xxx.NoSql2MysqlMigrator $1 $2 $3 $4 $5 $6 $7 $8 $9 ${10} ${11} ${12}

instead of

$JAVA_HOME/bin/java -verbose:gc -Duser.timezone=GMT -Dfile.encoding=UTF8 -server -Xms512m -Xmx2048m com.xxx.NoSql2MysqlMigrator $1 $2 $3 $4 $5 $6 $7 $8 $9 $10 $11 $12

mysql subquery multiple columns

If you need a subquery to return multiple columns for comparison purposes, that is allowed, so you can do something like

select x from table1 where (a,b) = (select a,b from table2)

but if you want something like

select (select a,b from table1 where table1.x=table2.x) fields, table2.y, table2.z from table2 join table3 on table2.aa=table3.bb where table2.xx=10

then mysql won't allow you to select 2 columns in the first subquery. To avoid this you can turn the query into a join query and write a wrapper query on top, like

select a,b,y,z from (table1 join table2 on table1.x=table2.x) join table3 on table2.aa=table3.bb where table2.xx=10

but I ran into an issue over the weekend where table1 had 1.4M rows, table2 had 40K rows and table3 had 3K rows. Joining 1.4M to 40K rows was not coming out of the Mysql database even in 10 minutes, so this approach was out of the question. The final result set size was 44K. So finally the best solution I could find on the weekend to fix thi...