Showing posts from 2012

Making life easy at mysql command line

I was annoyed by the same recurring problems at the mysql command line: paging query results, grepping within results, and recording queries for later use. For anyone debugging production issues related to MySQL, here is an excellent blog post from Percona to make your life easy

CGlib enhancer and finalize method

Yesterday was a bad day because one of the nodes, after updating to Tomcat 7, ran into full garbage collection. I took a heap dump and finally found that one of the enhanced classes had 300K references hanging around in the finalizer thread. I was enhancing SimpleJdbcTemplate to measure the time taken by each query and log it. This happened because CGLib also enhanced the finalize method, and in the method interceptor I was delegating the call to the delegate object, which did not exist at that time. Anyway, the solution was to skip enhancing finalize by adding a callback filter and no-oping on it.

As you can see in the code below, the filter returns 0 when the finalize method is called, which means use the callback at index 0 of the callbacks (a NoOp callback); for all other methods it uses the callback at index 1, which is my real code.

The simple code fix was

    public static SimpleJdbcTemplate createPerfInterceptedTemplate(Class callerClass, DataSource dataSource) {
        final Simp…

Tomcat7 session fixation and session listener

Ran into an interesting issue. We use a Flash uploader in our web UI for browsers that don't support HTML5, and Flash has a problem: each request it makes to the server gets a new session because it won't send any cookies back to the server. The only way around this is to send the original sessionId as a POST parameter, cache all sessions in Tomcat memory on the server, and then join the new session to the original one using the sessionId from the POST.

Long story short, we updated to Tomcat 7 and suddenly one of our features, which allows us to impersonate a user, broke. I finally traced it to a security fix in Tomcat 7 that renews the sessionId on basic authentication. The issue is that for Flash-based file upload we relied on HttpSessionListener.sessionCreated to cache all sessions by sessionId, and when Tomcat 7 renewed the sessionId it did not fire the sessionCreated event for the new session. There were two ways to solve it:

1) Disable session fixation security …

Biggest relief from NOSQL to SQL migration

This year my biggest accomplishment was moving our old NoSQL system from BDB/Cassandra to MySQL; so far it's holding billions of rows and working fine. The move has given me and my buddy peace and good sleep, and I can now focus on other fires. But the biggest relief comes from being able to delegate some tasks to the junior team and to quickly script ad hoc requirements.

For example, today I got an ad hoc requirement to find the list of customers with more than 1000 versions of a single file. Had it been BDB, I would have had to write a program and run it on each app node to find the answer, and it would have taken days to get this info. With MySQL all I had to do was write a script that executes a federated query and gets me the output, so all I need to do is run something like

nohup python "select pid, max(cnt) from (select customerid,file_id,count(version_id) cnt from \${SCHEMA_NAME}.version_\${TBL_SUFFIX} group by customer_id,fi…

Got first mysql table with 84M records

Wow, this is the first time I have scaled a database with 84M records in one table. I didn't expect it to grow this big, but one customer's odd behaviour did it: he had 17K versions of a file and moved it 5K times. 17K versions * 5K moves = 85M events generated in one table in one shard.

Within a month these will get purged, as we retain only the last month of events, so the table size will go down. Still, it feels good that the system behaved nicely after I added the missing index on this 85M-row table; today it's sleeping like a baby again.

A copy paste mistake can bring down a server

We recently migrated our event store to MySQL and I made a small boo-boo. I had two tables, event and event_details, and I had created two indexes on them

CREATE INDEX events_${TBL_SUFFIX}_i2 ON events_${TBL_SUFFIX} (event_detail_event_guid, pid);

CREATE INDEX event_details_${TBL_SUFFIX}_i1 ON events_${TBL_SUFFIX} (event_detail_event_guid, pid);

As you can see, the boo-boo is in the second index: instead of creating it on the event_details table, I created the same index on the event table again :(.

Last night one shard's event_details table ballooned to 85M records and there were 20 threads doing a full table scan on it.

So I fixed the copy-paste mistake and generated DDLs for each shard, but the MySQL server kept running out of memory every time it tried to create the index on this 85M-record table. Ultimately the only way to get it done was to start MySQL on a different port so no one would connect to it, give InnoDB more memory, create the index, reset the InnoDB settings, and then restart the server.

All …

solr making AND as default operator

I had a requirement to make all optional terms in a query match, i.e. make all of them mandatory. We didn't want to write parsing logic and add a + sign in front of each expression.

It seems edismax was the answer: you just have to set mm to 100%. All you need to do from SolrJ is

        SolrQuery solrQuery = new SolrQuery();
        solrQuery.set("defType", "edismax");
        solrQuery.set("mm", "100%");

I also had a requirement to match the user's input against 5-6 fields. There are two solutions:

1) use copyField in schema.xml to append everything to a field called "textDump" and then make this the default field.

2) use the dismax parser.

I chose dismax because copyField would almost double the index size, and Solr performance degrades as the index grows.

Hurray for dismax, as it neatly solved both my requirements :).

Throwing more Hardware vs developer time in tuning on a single db

Last weekend I read two interesting articles on scaling up vs scaling out, and then I read Jeff Atwood's post on the subject. By the way, I am a big fan of Jeff Atwood, and if you haven't read him you should start :).

At our company we planned for a scale-out model because, being a startup, management sometimes won't order hardware worth 100-200K or more in one shot. Also, if you have a table with 1B rows and you are doing agile programming, you are bound to build something, throw it in prod, and then refine it. This can sometimes lead to alter tables and data migration. While scaling up is good, an alter table to add a column with a default value on even a 100M-row table will incur significant downtime, and we can't afford that. So when I designed our metadata db we chose to scale out.  As of now we are sto…

hierarchical locking using mysql database

We are a cloud file server company, and one of our requirements is to be able to lock paths hierarchically, i.e. for a particular customer, if one user is moving the path /shared/marketing/Dallas, then no one should be allowed to add or delete files/folders in Dallas and its children, and no one should be able to make any edits to the shared or marketing folders themselves. But users should still be able to add files to any sibling hierarchies like /Shared/Engineering/test or /private/kpatel or /Shared/QA.
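The conflict rule described above can be sketched in memory as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, paths are compared component by component and case-insensitively (the examples mix /shared and /Shared), and it is not the MySQL-backed implementation this post is about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: two paths conflict when one equals the other or is an ancestor of it. */
class PathLockRules {

    /** Splits "/Shared/Marketing/Dallas" into lower-cased components (assumed case-insensitive). */
    static List<String> components(String path) {
        List<String> parts = new ArrayList<>();
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                parts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return parts;
    }

    /** True when the shorter path is a component-wise prefix of the longer one. */
    static boolean conflicts(String lockedPath, String requestedPath) {
        List<String> a = components(lockedPath);
        List<String> b = components(requestedPath);
        int shared = Math.min(a.size(), b.size());
        return a.subList(0, shared).equals(b.subList(0, shared));
    }

    public static void main(String[] args) {
        String locked = "/shared/marketing/Dallas";
        System.out.println(conflicts(locked, "/shared/marketing/Dallas/q1.xls")); // child of the locked path
        System.out.println(conflicts(locked, "/Shared/marketing"));               // ancestor of the locked path
        System.out.println(conflicts(locked, "/Shared/Engineering/test"));        // sibling hierarchy
    }
}
```

Comparing whole components rather than raw string prefixes avoids treating /Shared/QA and /Shared/QA2 as related.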

So the requirement is to have an API where I can do

try {
lockManger.lockPaths("/Shared/source", "/Shared/sbc", "/Shared/
Do some long running task (may be 1msec to 10 minutes).
} finally {

I evaluated some alternatives so here is the summary :
1) Apache commons has this locking api with class org.apache.commons.transaction.locking.DefaultHierarchicalLockManager
and it has a method called as lockIn…

User perception using elevator joke

I am attending a startup lab workshop by Steve Souders and he makes a very interesting point about user perception. I had read about this in his book, but he explains it much better using this joke. It goes like this:

An apartment complex has two elevators, and tenants are complaining about long wait times and slow elevators, so the owner calls a civil engineer and asks him what it would take to fix the issue. The civil engineer replies: we need to add a few more elevators, it will cost 5 million dollars, and we have to close the complex for 6 months.

The owner was like hmm, so he called a computer engineer and asked him what it would take to fix the issue. The computer engineer was like hmm, I would first need to monitor the pattern over time as to when and how people are using the elevators, and then come up with some AI algorithms to optimize the solution.

The owner was like hmm so he called a systems engineer and he was like hmm all we need are TVs. The owner was like what? So the syst…

Should you join a startup?

You should if you are passionate about learning new things and going through the pains of scaling the team and inventing something new.  But there are some cons also:

1) your healthcare benefits may suck. So if your wife has good benefits then this may be a moot point.

2) Your HR may suck: you might have little or no HR, and sometimes small things can take an ample amount of time to get done. For example, getting a document for your green card process can take 2-3 months, so don't join a startup if you are waiting for a green card, as you can run into complications if the startup closes shop.

3) A startup may be demanding at times, but at other times it can be very flexible.

So if points 1 and 2 are not a concern, then yes, of course you should join one.

Brother MFC 490CW unable to clean 46

The day before yesterday, after printing one page, the printer started giving "unable to clean 46". I thought it was an ink issue, as the yellow ink had run out. I ordered new ink and plugged it in, but it gave the same error again. Finally I found this blog post with detailed steps, and solution 2 worked for me

Thanks to whoever wrote this post.

Exposing jmx mbeans via spring for tomcat7

I needed to expose some JMX MBeans in Tomcat for Nagios monitoring and was reading up on it, but it seems it's a PITA to write and expose a bean. This didn't sound right, because I wanted to make it easy for junior developers to expose JMX monitors without dealing with all this complexity. Then I landed on Spring's JMX support and voila, life became easy. It took me some time to read and understand it, but exposing JMX then took 5 minutes.

So all I need to do is

1) create a file spring_jmx_mbeans.xml and add

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns=""
    xmlns:xsi="" xmlns:aop=""

SLF4j logging exception with template params and stacktrace

I learned a new thing today.

I knew  that slf4j had two logger methods

org.slf4j.Logger.error(String, Object[])
org.slf4j.Logger.error(String, Throwable)

Now if I wanted to call

        logger.error("This is {} message {}", "test" ,"to kp" ,e);

then I was under the impression that it would not print the stack trace, as it would use the signature with the object array.

But I was wrong :). A fellow programmer told me that slf4j already handles this: it has a nice trick where it prints the stack trace if the last argument is an exception object.

No java programmers in Africa or south america?

Huh, I was checking the stats for my blog and there are no hits from Africa or South America. So either this is a bug in Google Analytics or very few people are programming in Java in Africa or South America.

log4j TCP SyslogAppender

The current SyslogAppender in log4j uses UDP to transmit logs to the syslog server, which risks losing data. Our ops guy was looking for a TCP-based appender, found one, and plugged it in. Then something weird happened: one app node kept running out of file handles and another ran into an issue where all threads were stuck. The implementation he found is buggy, so I rewrote it and am publishing it here in case anyone is interested.

/**
 * TCP appender to syslog. This class uses a blocking queue with 10K message capacity and any requests beyond that would be rejected.
 * The append method from all caller threads inserts the message into blocking queue and there is a single background thread that logs to the syslog.
 * This complex queueing is introduced to relieve the user thread as soon as possible.
 */
public class Syslog4jTCPAppender extends Syslog4jAppender {
    private static final l…
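The queueing idea from the class comment above (bounded queue, callers never block, one background writer) can be sketched with plain JDK classes. This is not the author's Syslog4jTCPAppender: BoundedLogQueue and its sink parameter are hypothetical stand-ins, and the real TCP write to syslog is deliberately left out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sketch: callers enqueue without blocking, one daemon thread drains to the sink. */
class BoundedLogQueue {
    private final BlockingQueue<String> queue;
    private final Consumer<String> sink; // stands in for the real TCP syslog writer

    BoundedLogQueue(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
    }

    /** Called from application threads; returns false (message rejected) when the queue is full. */
    boolean append(String message) {
        return queue.offer(message);
    }

    /** Starts the single background writer thread. */
    Thread startWriter() {
        Thread writer = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    sink.accept(queue.take()); // blocks until a message is available
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        }, "syslog-writer");
        writer.setDaemon(true);
        writer.start();
        return writer;
    }

    /** Number of messages waiting to be written. */
    int pending() {
        return queue.size();
    }
}
```

Using offer() instead of put() is the design choice that keeps a slow or dead syslog server from stalling request threads: once the queue fills up, new messages are dropped rather than waited on.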

Dire need for centralized logging

We have HA in some components of the application, and any request can go to any of the Tomcats in the data centre. The problem is that lately I have been asked to chase a lot of customer issues, and it's a PITA to grep logs across all machines, build a timeline, and then figure out what's going on. What we need is to log to a central server in addition to the local box, so all this debugging can be done by searching in one place. We are finally picking logstash for this purpose because you can do Lucene-like search queries on customer name (we append the customer name in the thread). We also generate a unique requestId for every request coming into the data centre, which gets passed through many components. We would put all those logs in logstash so we can see how a request flows through all components. We would keep only 1-2 months of logs, otherwise it would be tons of logs.

We are finally picking for this but runnin…

Bcrypt and slow tests

We use bcrypt to hash users' passwords. One of the strengths of bcrypt is that it's inherently slow to compute, which makes brute-forcing passwords impractical. But I recently ran into an interesting issue where, almost overnight, build times on Jenkins blew up from 12 mins to 22 mins, and the build engineer identified the bulk user test case as the culprit.

I ran it locally and found that the class had close to 15 tests and each test method was taking 20 sec; all I could see was that we were creating 5 users in setup and deleting 5 users in teardown. As usual, the best way to debug a performance problem is to look at a thread dump. I did a ps -ef to find the running build and a kill -QUIT to print the thread dump, and immediately saw time being spent in the bcrypt computation. I took 3-4 more thread dumps and all of them were stuck in bcrypt. So I did two things:

1) I saw many methods in the test class that were testing pure validations and didnt needed the cr…

Ubuntu 12.04 Toshiba satellite L875D keyboard, mouse issues on login

I just bought a new laptop and tried installing Ubuntu 12.04 on it, and ran into all sorts of issues: first the wireless not working, and then the keyboard and mouse not working after installing the software updates.

The keyboard and mouse work fine on the initial listing of operating systems, but when the login prompt comes up I can't enter anything on the screen. I was like WTF man. Googling around gave me the idea of plugging in an external mouse and USB keyboard, and that worked, which left me even more frustrated. Finally, after combining things from various posts, I figured out that I had to change the grub2 config and add some more settings to it.

Open a terminal and enter

sudo vi /etc/default/grub

and then change the line GRUB_CMDLINE_LINUX="" to GRUB_CMDLINE_LINUX="i8042.nomux=1 i8042.reset"

then run

sudo update-grub

Reboot the system; after that all the issues were gone.

Brother MFC-490cw unable to print 70

My wife is a final-year college student, and one of her friends knows that I am a software engineer, so yesterday she called my wife to say that her printer was not working. It's funny: I asked her to do a video chat, bring the laptop near the printer, and show me the issue.

The first issue was that she had recently changed the ink, so I asked her to take out the cartridge and show me, and it turned out she hadn't pulled off the plastic wrapper in front of the ink nozzle.

Even after removing it the printer was not working, so I asked her to put something on the scanner tray and try to copy it, and again it was the same issue: "Unable to print 70".

It turned out the real issue was paper stuck near the cartridge. I asked her to remove it and voila, the problem was solved.

The funniest part of the story was that I was going ROFL when I found that "unable to print 70" was related to a paper jam. Being an engineer, I can feel the pain of a non-IT user who would not know how the heck to connect unable to print 70 to pap…

Orion skyquest finder scope alignment

I got my Orion SkyQuest XT6 telescope from Craigslist, and on the first good day of viewing I was able to see moon craters and Saturn. But even though I could see the moon at the cross-hair intersection of the finder scope, I would literally spend 1-2 mins trying to bring it into the main eyepiece. And man, I was frustrated: the moon is at least big, but Saturn took me almost 5 mins to bring into the eyepiece. Something was not right, and I remembered the owner had said to collimate the telescope. So I used this youtube video to collimate the telescope the next day, and that night it was the same frustration again. Something wasn't right.

Then I remembered that in order to bring it home I had to disassemble the finder scope, tube and base to transport it safely in my car, and when putting it back together I had tightened some screws on the finder scope, and voila, that was it. I read the manual, and it seems that, as shown in the image, the finder scope mirror needs to be aligned to be in…

Got Orion skyquest Xt6 telescope

My friend recently bought an Orion SkyQuest XT8; I saw the moon with it and boy, I was hooked. I had to get one, but I wasn't sure whether I would really use it, so I got a used one for $75 from Craigslist. The next few days were cloudy, but on the third day I was finally able to see Saturn and I was thrilled. The moon was awesome. The only problem is that it's hard to get an object into view in the main eyepiece even though the finder shows it. I have finally fixed the issue and will write a post about it (debugging it was fun).

Javascript developer interview and design patterns

I have been interviewing JavaScript developers lately, and one of the questions I ask is "Can you tell me what design patterns you have used in JavaScript besides MVC?".

People with 6-10 years of experience have gone blank on this question. WTH, any decent developer would have seen:

1) Composite pattern if you have built a custom widget.
2) Observer/Observable pattern to do event handling.
3) Proxy pattern if you have mocked server APIs.
4) Delegate pattern.
5) Factory pattern.
6) Singleton pattern.

Fizz buzz and interviews of any engineers

Lately I have been doing lots of interviews, from Java developers to UI/JavaScript developers to build/release engineers. My forte is Java, but unfortunately I get pulled into the other interviews too, so how do I test a JavaScript developer when I can't write decent JavaScript myself (I can google and get things done, but I consider myself a newbie in JavaScript)?

So I resorted to basics and started asking fizzbuzz-style questions like

1) given an array, find the max.

People with 10 years of experience have flunked it. One guy wrote code like
for(var i=0;i++;i<arr.length-1) {

Another guy told me to sort the array and take the max from that.

2) given an array find second max out of an array.

3)sort an array

4) reverse an array

5) binary search in a sorted array
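For reference, the five warm-up questions above can be answered in a few lines each. The sketch below is in Java (my forte) rather than JavaScript, assumes non-empty int arrays, and uses hypothetical names; it is one straightforward way to answer them, not a model answer key.

```java
import java.util.Arrays;

/** Straightforward answers to the five warm-up questions; assumes non-empty int arrays. */
class ArrayWarmups {

    static int max(int[] arr) {
        int best = arr[0];
        for (int i = 1; i < arr.length; i++) {
            if (arr[i] > best) {
                best = arr[i];
            }
        }
        return best;
    }

    /** Largest value strictly below the max, found in one pass without sorting. */
    static int secondMax(int[] arr) {
        Integer best = null;
        Integer second = null;
        for (int v : arr) {
            if (best == null || v > best) {
                second = best;
                best = v;
            } else if (v < best && (second == null || v > second)) {
                second = v;
            }
        }
        if (second == null) {
            throw new IllegalArgumentException("array has no second max");
        }
        return second;
    }

    /** In-place insertion sort, ascending. */
    static void sort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            int current = arr[i];
            int j = i - 1;
            while (j >= 0 && arr[j] > current) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = current;
        }
    }

    /** In-place reversal by swapping from both ends. */
    static void reverse(int[] arr) {
        for (int lo = 0, hi = arr.length - 1; lo < hi; lo++, hi--) {
            int tmp = arr[lo];
            arr[lo] = arr[hi];
            arr[hi] = tmp;
        }
    }

    /** Returns the index of target in an ascending array, or -1 if absent. */
    static int binarySearch(int[] sorted, int target) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2; // avoids int overflow of (lo + hi)
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {7, 2, 9, 4};
        System.out.println(max(data) + " " + secondMax(data));
        sort(data);
        System.out.println(Arrays.toString(data) + " index of 7: " + binarySearch(data, 7));
        reverse(data);
        System.out.println(Arrays.toString(data));
    }
}
```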

I have seen experienced people take anywhere from 5-15 mins just to find the max of an array. I mean, even if you wake me up at 3:00 AM and ask me to find the max of an array, I can do it without a syntax error in…

Java Annotations/AOP and being lazy

I have found that if you try to be lazy, you end up using AOP and annotations more. I try to be lazy, and my blood boils when DRY is violated in the code. My goal is to make code simple yet powerful and to hide complexity from the average developer; if the complexity can't be avoided, then keep it in a single layer as much as possible.

AOP allows you to hide the complexity in a single layer, transparent to an average junior developer.  For e.g.
1)  I was given a requirement to allow X read and Y write requests per node, as Berkeley DB couldn't handle more than X+Y requests. AOP helped me sort it out. I created an interceptor, MetadataThrottlingInterceptor, and two annotations, @MetadataWriter and @MetadataReader. Then I just annotated the methods in the storage layer with these annotations and used a thread pool in the interceptor class to limit the read/write requests. This allowed me to hide the complexity in one layer, transparent to the developer. A junior de…
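The throttling idea (at most X concurrent reads and Y concurrent writes per node) can be illustrated without any AOP framework. Note the swapped-in technique: the sketch below uses semaphores where the interceptor described above used a thread pool, and MetadataThrottle with its methods is a hypothetical name, not the actual MetadataThrottlingInterceptor. In the real setup the read/write wrapping would be applied by the interceptor to annotated methods.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical sketch: caps concurrent reads and writes with two semaphores. */
class MetadataThrottle {
    private final Semaphore readPermits;
    private final Semaphore writePermits;

    MetadataThrottle(int maxReads, int maxWrites) {
        this.readPermits = new Semaphore(maxReads);
        this.writePermits = new Semaphore(maxWrites);
    }

    /** Runs a read, waiting if maxReads reads are already in flight. */
    <T> T read(Supplier<T> work) {
        return run(readPermits, work);
    }

    /** Runs a write, waiting if maxWrites writes are already in flight. */
    <T> T write(Supplier<T> work) {
        return run(writePermits, work);
    }

    private static <T> T run(Semaphore permits, Supplier<T> work) {
        permits.acquireUninterruptibly();
        try {
            return work.get();
        } finally {
            permits.release(); // always give the permit back, even if work throws
        }
    }

    int availableReads() {
        return readPermits.availablePermits();
    }

    int availableWrites() {
        return writePermits.availablePermits();
    }
}
```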

Tracking missing cobertura code coverage

I recently plugged the code coverage tool Cobertura into our startup's build and got X% coverage. I was happy because this was the first time we had test cases and code coverage working in Jenkins, and we were planning to steadily increase the coverage. But then one of the developers complained that some of the Jersey tests were showing proper coverage while the server classes they called were not showing any coverage at all. When I had initially plugged in the coverage tool I had seen server classes showing coverage, and those classes were still showing coverage. I thought this had to do with the Grizzly REST container we were using for the Jersey tests. Then, while tracing the missing code coverage like a detective, I had an aha moment. Our code structure is something like


The ui module depends on server module classes for testcase execution. I had a common_module.xml ant build file that is…

Mysql release column name

Ran into an issue where MySQL would throw a weird exception saying there was a syntax error in my insert query. After 20 mins of investigation I found that I had named a column in the table 'release', and it's a reserved keyword :).

Java code coverage using cobertura

Code coverage is important as it gives developers confidence in the code they are checking in. For a startup, automated tests and measured code coverage are equally important for shipping early and staying nimble. I recently automated unit testing in Jenkins and wanted to measure how much code coverage we have, so I integrated Cobertura into the build framework. Cobertura is a build-time instrumentation plugin, so the steps are:

1) compile the code
2) Instrument the classes to generate instrumented classes with cobertura hooks.
3) modify junit class path to put instrumented classes before real classes
4) add a system property in junit ant task, so that cobertura knows where to write its statistics.
5) generate a coverage report in html to be used by developers
6) generate a coverage report in xml so that it can be published to sonar so we can do trends on code coverage release after release.

Here is how a typical code coverage report looks like…

Sequence diagram tool on ubuntu

The web is amazing. I was trying to find a good tool to create sequence diagrams on Ubuntu, as my handwriting is awful, and fortunately landed on one.

All you need to do is write some text in the left pane like

UI->Jersey:Invoke Rest api
Jersey->Spring:Spawn a transaction

and it would generate an image for you in right pane.

Kudos to the guy who created it.

Spring query timeout or transaction timeout

If you are using spring to manage transactions then you can specify default transaction timeout using

    <bean id="transactionManager"
        class="org.springframework.jdbc.datasource.DataSourceTransactionManager">
        <property name="dataSource" ref="dataSource" />
        <property name="defaultTimeout" value="30" /> <!-- 30 sec -->
    </bean>

or you can override the timeout in the annotation

    @Transactional(readOnly = false, timeout=30)

or if you are doing programmatic transactions then you can do

DataSourceTransactionManager transactionManager = new DataSourceTransactionManager(dataSource);
transactionManager.setDefaultTimeout(30); // seconds

or you can override the timeout for one particular transaction

TransactionTemplate transactionTemplate = new TransactionTemplate();
transactionTemplate.setTimeout(30); // seconds

memcached evictions though memory is available

Ran into a performance issue where we would see intermittent slowness in the application, and it was traced to memcached evictions. Running the stats command on the memcached telnet port gave

STAT limit_maxbytes 8589934592
STAT bytes 6792297037
STAT total_items 1149728020
STAT evictions 192571322

As you can see, even though roughly 1.7 GiB of the 8 GiB limit is still free (limit_maxbytes minus bytes), we were seeing evictions. This article explained this in great detail and cleared up some of my concepts
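The headroom can be read straight off the stats above. A small sketch of the arithmetic, with hypothetical class and method names, using only the four numbers shown:

```java
/** Hypothetical helper for reading the memcached stats shown above. */
class MemcachedStatsMath {

    /** Memory still unused according to the stats: limit_maxbytes minus bytes. */
    static long freeBytes(long limitMaxBytes, long bytesUsed) {
        return limitMaxBytes - bytesUsed;
    }

    /** Fraction of all items ever stored that were evicted. */
    static double evictionRatio(long evictions, long totalItems) {
        return (double) evictions / totalItems;
    }

    public static void main(String[] args) {
        long free = freeBytes(8589934592L, 6792297037L);
        System.out.println("free bytes: " + free);
        System.out.println("free GiB: " + free / (1024.0 * 1024 * 1024));
        System.out.println("eviction ratio: " + evictionRatio(192571322L, 1149728020L));
    }
}
```

So roughly one in six items ever stored was evicted while a fifth of the memory sat unused, which is why the aggregate bytes figure alone is a misleading signal here.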

Adding hostname or IP address to log4j logging for centralized logging

We use logstash for centralized logging, and every Tomcat needs to write to it using SyslogAppender. One requirement was to identify each log line by adding an IP address and port to it. This can be done using the MDC feature of log4j or slf4j, but the problem is that there can be lots of entry points into the logger. The solution I came up with is to introduce template parameters (_@server_ip_@ and _@server_port_@) in log4j.xml as shown below and let the installer replace them at install time on each node.

            <param name="ConversionPattern" value="%d{yyyy-MM-dd'T'HH:mm:ss.SSSZ}{GMT+0} %p %t H-_@server_ip_@:_@server_port_@ D-%X{MDC_DOMAIN} %c - %m %throwable"/>
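The install-time substitution itself is simple string replacement. Our installer is not shown here, so the following is only a sketch of what that step could look like; TemplateFiller, the example IP address, and the port are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the install-time step that fills in _@name_@ placeholders. */
class TemplateFiller {

    /** Replaces every _@key_@ token in the template with its value. */
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("_@" + entry.getKey() + "_@", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("server_ip", "10.0.0.5");   // example value, not a real node
        values.put("server_port", "8080");     // example value
        System.out.println(fill("%p %t H-_@server_ip_@:_@server_port_@ %c - %m", values));
    }
}
```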

Can you spot the bug here? Findbugs can

I love this tool because it can find bugs that are hard to spot for humans.

                for (U user : users) {
                        if (user.getPrefix() != null && user.getPrefix().equals(userPrefix)
                                        && user.getOrgName() != null & user.getOrgName().equals(orgName)) {
                                return user;

There is a bug in the if statement above that I wasn't able to spot, and it's an honest mistake on the developer's part :).

There is a & instead of && in the if condition.
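Why this matters: on booleans, & evaluates both sides even when the left side is already false, so the null check no longer protects the call that follows it. A minimal standalone demonstration (the class and method names are hypothetical, not from our code base):

```java
/** Hypothetical demo of non-short-circuit '&' versus short-circuit '&&'. */
class NonShortCircuitDemo {

    /** Mirrors the buggy condition: the right side runs even when orgName is null. */
    static boolean buggyMatch(String orgName, String expected) {
        return orgName != null & orgName.equals(expected);
    }

    /** The intended condition: equals() is skipped when orgName is null. */
    static boolean fixedMatch(String orgName, String expected) {
        return orgName != null && orgName.equals(expected);
    }

    public static void main(String[] args) {
        System.out.println(fixedMatch(null, "acme"));
        try {
            buggyMatch(null, "acme");
        } catch (NullPointerException e) {
            System.out.println("buggy version threw NullPointerException");
        }
    }
}
```

With non-null values both versions return the same result, which is exactly why the bug is hard to spot in review and easy to miss in tests.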

Memcached bulk api (spymemcached v/s memcached-client)

We use memcached as a cache in front of MySQL, and we cache at a granular level. As we are a cloud filesystem company, we cache files by path and folders by path. The problem comes when someone renames a top-level folder that has 100K or more files in its hierarchy. The db operation is fast because in the db all we need is one folder update (files are stored in normalized fashion, so no rename is required at the file level), but a rename in memcached means a delete and an add for each of the 100K keys. That's about 200K operations, and production code was spending close to 2 mins in memcached alone. We use the memcached-client Java library in our code. The memcached protocol doesn't support a bulk API, so there was no easy solution other than trying to set keys from different threads, but a colleague came across an interesting optimization that the spymemcached guys have done: spymemcached write optimizations.

I thought of writing a test program to compare memcached-client v/s spymemcached and spymemcached rocks.

1. On my …

From NOSQL to MySql (BDB to mysql) - Final Part

Finally, last weekend we migrated all nodes to MySQL. As usual BDB gave last-minute hiccups during the migration, but we were able to solve them. BDB is still used in some other parts of the system, through background jobs. My next goal is to get rid of BDB in those areas and convert them to MySQL.

Life after BDB is cool. Today is Tuesday and there were no spikes all Monday or so far today, so I can focus on writing more code and solving other issues that were sidetracked before.


Engineering discipline in real life

I had signed up with Frymire Services, who come and tune up your AC twice a year for some $300. This time the technician came and did some work on the AC in the attic, then went outside and did some work on the compressor, and then he left. That day a cold front came in and the temperature never went above 78, so my AC did not turn on. For the last 2 days the temperature went above 78, so the AC turned on, but instead of decreasing the temperature it was increasing it. All last night the temperature kept climbing from 78 to 84 and then dropped from 84 to 80 (as the outside temperature cooled off). There was also a smell of mist in the air. I was puzzled as to why the heck it was increasing instead of decreasing. There had been a thunderstorm a day earlier and an electrical outage too, so my first suspicion was that this was the cause; I went and tripped the power supply, but that had no effect. Now, being an engineer, I didn't want to give up easily. I had no idea what an AC compressor is, so I googled a bit and thought of tri…

From NOSQL to MySql (BDB to mysql) - Part5

This weekend was painful. The hardware came, but out of 12 MySQL servers 3 had hardware issues, so only 8 were usable (4 master-slave pairs). I didn't want to take the risk of bringing a cluster up with only 1 master, so I intentionally left that one out.

We started the migration in the 1st DC, and after 4 hours the DC guys, who were doing some inbound network maintenance, took the inbound network to the DC down. It took them 3 hours to restore it, so 3 hours were lost. We were up till 6:00 AM on Saturday morning and were only able to finish 8-10 nodes.

We worked on Saturday night and as usual Berkeley DB gave us last-minute jitters: one node was badly fragmented and took 10 hours to migrate, and another node got stuck in the middle of its migration, so we had to stop the migration, run db_recovery, and then resume. In all, 12-15 nodes were migrated that day.

We are now almost 60% done across all DCs over the last 3 weeks.

But all this pain is worth having a calm Monday where no system alerts or no Sales/Operations team h…

Useless comments in code

I hate it when developers write useless comments in code like this
}finally {
            // Closing the connection

Anyone reading the function name close connection can figure out that this is closing the connection.

I try to give functions meaningful names that convey the intent, so that comments are not required. I write method comments only when a function implements some complex algorithm or trick.

I also hate comments like this

     * @param customer
     * @param ids
     * @return

This is also a useless comment: Eclipse generated it automatically and the developer didn't add anything to it.

Screen size is limited and I would rather see code in it than useless comments.

Mysql sharding at our company - Part4 (Cell architecture)

It is interesting to see how different people can arrive at the same architecture to solve scalability issues. I just read this article published today on cell architecture, and I came up with the same architecture at our company, as highlighted in the diagram below. I call it a "cluster" instead of a "cell" or "pod", but the concept is the same.

You can read a bit more at the link I published above, but the main reasons we chose this architecture were:

1) Failure of a cell doesn't cause the entire DC to go down.
2) We can update one cell and watch it for a week before pushing the release to all cells.
3) We can have different capacity for different cells (enterprise customers vs trial customers).
4) We can add more cells, or more MySQL hosts to one cell, if it has a capacity problem in one component.
5) Ideally you want to make it as homogeneous as possible but let say for some reason in one cell people a…

Mysql sharding at our company - Part3 (Shard schema)

As discussed in Part2 of the series, we do horizontal sharding for any schema that will store 100M+ rows. As we are a cloud file server, no data is shared between two customers, so perfect isolation can be achieved easily. A year back, when I was designing the schema, there were many alternatives:

1) One shard per customer mapped to one database schema: Rejected this idea because MySQL stores 1 or more files per table in the physical file system, and the Linux file system chokes after some number of files in a folder. We had faced this problem when storing the real files on filers (topic of another blog post).
2) One shard per customer mapped to one set of tables in a database schema: This would solve the issue of multiple files in a folder, but again it would lead to too many files on the disk, and the operating system can choke on it. Also, we have customers who do a trial for 15 days and never sign up, so it is too much for the ops team to manage for these trials.
3) Many customers in one shard mapped to …

From NOSQL to MySql (BDB to mysql) - Part4

No hardware this weekend due to delays by Dell, so only 2 nodes were migrated. Ran into another crappy Berkeley DB issue where for no reason it would get stuck in native code.
java.lang.Thread.State: RUNNABLE
at com.sleepycat.db.internal.db_javaJNI.Dbc_get(Native Method)
at com.sleepycat.db.internal.Dbc.get(
at com.sleepycat.db.Cursor.getNext(
It was already 4:00 AM and I was pissed that Berkeley DB was giving last-minute pains. In India they say "Bujhta Diya last main Tej Jalta hai", or "a dying lantern burns brighter at the end". We tried doing a defrag of the database but it didn't help. Ultimately we moved the customer db to its own folder, ran catastrophic recovery, and restarted the migration. From past Berkeley DB migrations I had the experience that you would get these kinds of issues and might have to restart migrations. I had delibe…

Mysql Sharding at our company- Part2 (picking least used shard)

In Mysql sharding at my company Part1 I covered that when a customer registers we assign him the next free shard. This problem seems simple at first but it has its own twists. You could have different strategies to determine the next free shard id:
1. Based on number of rows in shard
2. Based on number of customers in shard

After lots of thinking I went with the first approach. When a customer registers I pick the next free shard by querying the information schema for the 8 least used shards and then randomly picking one of those. Picking shards based on number of customers was rejected because we use a Nagios monitor to test registration, which causes lots of dummy registrations. The QA team also does registrations, and sometimes people register a trial for 15 days and won't convert, as they are either spammers or just want to use us for sharing large files for <15 days.
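To make the selection step concrete, here is a minimal sketch of the "least used 8, then random" pick. It is not the production code: the class name ShardPicker, the method pickShard, and the in-memory map of row counts are all hypothetical, and the sketch assumes the per-shard row counts have already been read from the information schema (for example from an estimate such as TABLE_ROWS).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper, for illustration only.
class ShardPicker {
    // Size of the candidate pool; the post uses the 8 least used shards.
    static final int CANDIDATE_POOL_SIZE = 8;

    /**
     * Picks one shard id at random from the least used shards.
     *
     * @param rowCounts shard id -> approximate row count (assumed to be
     *                  loaded beforehand from the information schema)
     * @param random    source of randomness, injected so tests can seed it
     */
    static int pickShard(Map<Integer, Long> rowCounts, Random random) {
        if (rowCounts.isEmpty()) {
            throw new IllegalArgumentException("no shards available");
        }
        // Sort shards by row count, smallest first.
        List<Map.Entry<Integer, Long>> shards = new ArrayList<>(rowCounts.entrySet());
        shards.sort(Map.Entry.comparingByValue());
        // Choose uniformly among the first N entries (or all, if fewer exist).
        int poolSize = Math.min(CANDIDATE_POOL_SIZE, shards.size());
        return shards.get(random.nextInt(poolSize)).getKey();
    }

    public static void main(String[] args) {
        // Made-up row counts for 12 shards, just to exercise the method.
        Map<Integer, Long> rowCounts = new HashMap<>();
        for (int shardId = 1; shardId <= 12; shardId++) {
            rowCounts.put(shardId, shardId * 100_000L);
        }
        System.out.println("picked shard " + pickShard(rowCounts, new Random()));
    }
}
```

Picking randomly inside a pool, rather than always taking the single least used shard, spreads new registrations over several shards instead of sending them all to one.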

The reason to not always pick the first least used shard is that 6 months down the line if we add 2 more shards to cluster then every…

From NOSQL to MySql (BDB to mysql) - Part3

We finally migrated 14 nodes this weekend. As there were millions of files to be migrated, even after running 5 parallel threads on each node it took close to 3-4 hours per node. We were running 3 nodes at a time, otherwise the db load was shooting up. As we can't afford downtime on weekdays, the migration has to happen on weekend nights. On Friday night we were up till 6:00 AM and on Saturday night we were up till 4:00 AM. We wanted to do more nodes but the number of files per host was going overboard and I wanted to be conservative to start with; if the MySQL servers can handle more, then later we would consolidate shards or move them to this host. New MySQL hosts are going to be provisioned next week, so I am hoping to migrate more nodes this week. Monday was a calm day after a long time, as we chose all the nodes that were spiking to be migrated first.

Db loads in all DCs are low and even app nodes are low. Surprisingly, slave dbs are getting pounded more than the master db after last wee…

Mysql Sharding at our company- Part1

Sharding is a double-edged sword: on one hand it allows you to scale your applications as your customer base grows, and on the other hand it is very hard for junior developers to grasp the concept. I try to abstract and encapsulate as much complexity as I can in the framework.

Typically people either do
1. Functional Sharding/Vertical sharding: For huge datasets this will only buy you time. When I joined the company everything was stored in Berkeley DB or Lucene or in some files on filesystems. I tried moving all the things to MySQL in one project but it became a monster, so I started moving pieces of the application out into MySQL and pushing them to production one at a time. This also boosted the team's confidence in MySQL as more features went live on it and they saw that MySQL was reliable and scalable. I started with smaller pieces that didn't have >10-20M rows but needed MySQL for its ACID properties. Anticipating the growth we decided to create one schema per functionality and avoided joins bet…