
Posts

Showing posts from November, 2014

Webhooks and Integrating Assembla svn commits with Pivotal Tracker stories

I am intrigued by the webhook concept; it seems a very nice way to do B2B communication. Webhooks are powerful because they eliminate polling when integrating with third parties. All you need is a registered REST endpoint that gets called when an event occurs. A good example of a webhook for a cloud storage provider would be "automatically print this document on registered printers when a file is dropped in this folder". Let's assume all the customer needs to do is register a webhook such as "http://xyz.foo.com/printer/xxd344/print?token=authTokenXXX"; the cloud storage provider can then call this URL and POST the document as the request body.
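The flow described above can be sketched in a few lines of Java. This is only a minimal sketch under stated assumptions, not how any particular provider implements it: the class `WebhookSketch`, the in-memory `registeredHooks` list, the `/printer/print` path and both method names are made up for illustration. It uses the JDK's built-in `com.sun.net.httpserver.HttpServer` for the customer side and `java.net.http.HttpClient` (Java 11+) for the provider side.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class WebhookSketch {
    /** Provider side: callback URLs registered by customers (hypothetical in-memory registry). */
    static final List<URI> registeredHooks = new CopyOnWriteArrayList<>();

    /**
     * Provider side: when a file is dropped, POST the document to every registered URL.
     * Returns how many hooks answered with a 2xx status.
     */
    static int notifyFileDropped(byte[] document) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        int delivered = 0;
        for (URI hook : registeredHooks) {
            HttpRequest request = HttpRequest.newBuilder(hook)
                    .POST(HttpRequest.BodyPublishers.ofByteArray(document))
                    .build();
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            if (response.statusCode() / 100 == 2) {
                delivered++;
            }
        }
        return delivered;
    }

    /**
     * Customer side: a tiny endpoint that checks the token in the query string
     * and "prints" the body by appending it to the given list.
     */
    static HttpServer startPrinterEndpoint(String expectedToken, List<String> printed)
            throws IOException {
        // Port 0 asks the OS for any free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/printer/print", exchange -> {
            String query = exchange.getRequestURI().getQuery();
            boolean authorized = ("token=" + expectedToken).equals(query);
            if (authorized) {
                byte[] body = exchange.getRequestBody().readAllBytes();
                printed.add(new String(body, StandardCharsets.UTF_8));
            }
            // -1 means "no response body".
            exchange.sendResponseHeaders(authorized ? 204 : 403, -1);
            exchange.close();
        });
        server.start();
        return server;
    }
}
```

The point of the design is that the customer never polls: the provider pushes the event as soon as it happens, and the token in the URL is the only thing the endpoint checks. A real integration would likely also want retries and HTTPS.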

Recently I had a chance to play with webhooks while moving Jenkins to EC2 for a friend; as part of that I moved his svn repository to assembla.com. I saw that webhooks were available and thought I could link commits to the svn repository hosted on assembla.com to pivotaltracker.com tickets. It took just 1 hour and it was fun; apparently there are post commi…

Data driven performance issue and NewRelic

NewRelic really shines at discovering these data-driven performance issues. Earlier we would find them late, or they would stay buried, but now they seem obvious if the engineer is paying attention. I was casually trawling through New Relic and sorted all apps by average time per API call; one of our core applications in one DC was taking twice the average time per call compared with all other DCs. I immediately compared that DC with the others, and I saw a graph like the one below in DC1

and I saw this in DC2

Clearly DC1 is spending an abnormal amount of time in the database. So I went to the database view and saw this in DC1

and I saw this in DC2



Clearly something is weird in DC1 even though it's the same codebase. 309K queries per minute seems abnormal. Within 5 minutes I found out it's an N+1 query problem. Apparently one customer has 4000 users and has created 3000 groups, and the group_member table has 40K rows for this customer. Normally our customers create 10-50 groups, and there is code that iterates over …
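To make the shape of the problem concrete, here is a minimal sketch. The excerpt does not show the real code, so this assumes the loop issues one query per group. `FakeDb`, `membersOf` and `membersOfAll` are hypothetical stand-ins for the group_member table and its queries, and the "database" only counts how many calls it receives.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GroupMemberLookup {
    /** In-memory stand-in for the group_member table; counts "queries" instead of running SQL. */
    static final class FakeDb {
        final Map<Integer, List<Integer>> groupMembers = new HashMap<>();
        int queryCount = 0;

        /** One round trip per group, e.g. SELECT ... WHERE group_id = ? */
        List<Integer> membersOf(int groupId) {
            queryCount++;
            return groupMembers.getOrDefault(groupId, List.of());
        }

        /** One round trip for all groups, e.g. SELECT ... WHERE group_id IN (...) */
        Map<Integer, List<Integer>> membersOfAll(Collection<Integer> groupIds) {
            queryCount++;
            Map<Integer, List<Integer>> result = new HashMap<>();
            for (int groupId : groupIds) {
                result.put(groupId, groupMembers.getOrDefault(groupId, List.of()));
            }
            return result;
        }
    }

    /** Query-in-a-loop pattern: the number of queries grows with the number of groups. */
    static int countMembersOnePerGroup(FakeDb db, List<Integer> groupIds) {
        int total = 0;
        for (int groupId : groupIds) {
            total += db.membersOf(groupId).size();
        }
        return total;
    }

    /** Batched pattern: a single query regardless of how many groups there are. */
    static int countMembersBatched(FakeDb db, List<Integer> groupIds) {
        int total = 0;
        for (List<Integer> members : db.membersOfAll(groupIds).values()) {
            total += members.size();
        }
        return total;
    }
}
```

With 10-50 groups the loop version is cheap enough that nobody notices; with 3000 groups the same code issues 3000 queries per call, which is the kind of jump that shows up in the database view.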

AWS and rise of devops

I always used to wonder how Snapchat, Pinterest and Instagram were able to scale to millions of users with just 10-15 engineers. I am a Java architect, but when it comes to networking, operations and the like I am a noob beyond basic skills. Recently our ops team made some subnet and IP changes and added a 10G network between some services. All of this is a grey area to me, and my reaction was that you really need to hire operations people for this, so how did these other startups manage without so many people? One of my friends had been after me for months to help him move his Jenkins servers from Ukraine to EC2, as Ukraine is in turmoil. I have no ops expertise, so this was tricky, but here is how I got it done over 2 weekends, while Dallas was freezing because of a cold front and I had no driver's license because of an immigration fiasco with USCIS. So my friend really benefited, as I had nothing else to do on the weekends.

I took a vanilla CentOS AMI and launched an instance in EC2. But when launching, it asked m…

A well-intentioned public API can bring down a server

APIs are powerful creatures and people can use them to do tons of weird things. We had exposed a public API to create a link, but because our UI can select multiple files and generate links for them in bulk in one call, our public API mimicked this behavior.

Today a server was running hot with full GCs, so I took a heap dump and restarted it. On analysing the heap dump in Eclipse Memory Analyzer I found that the syslog appender was choked: it had a queue of 10K messages, each 2MB in size. I copied the value of one message and found the class name in the log message.

Apparently, whenever a link was created, the code iterated over the files and logged a line for each one. There was a bug in the log statement: it logged the entire event instead of just that file.

// BUG (shown as found): every iteration logs the whole event, not the current target
for (Object target : linkRequest.getTargets()) {
    logger.info("queuing preview generation for {}", event);
}

QA and developers can't easily detect this, and most people in code review pay less attention to logger messages.
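The reason this only bites on bulk requests is that the log volume grows quadratically: N targets produce N log lines, and each line contains all N targets. The sketch below is an illustration only, not the original code. It assumes the targets are plain strings and that the event's string form is simply the list of targets, and it measures characters instead of writing real log lines; `LogVolumeSketch` and its methods are hypothetical names.

```java
import java.util.List;

final class LogVolumeSketch {
    private static final String PREFIX = "queuing preview generation for ";

    /** Mirrors the buggy loop: each iteration formats the whole list of targets. */
    static long charsLoggedBuggy(List<String> targets) {
        long total = 0;
        for (String target : targets) {
            // The loop variable is ignored, exactly as in the bug.
            total += (PREFIX + targets).length();
        }
        return total;
    }

    /** Intended behavior: each iteration formats only the current target. */
    static long charsLoggedFixed(List<String> targets) {
        long total = 0;
        for (String target : targets) {
            total += (PREFIX + target).length();
        }
        return total;
    }
}
```

For a single file the two versions log almost the same amount, which is why neither QA nor a quick test would catch it; the gap only opens up as the number of files in one call grows.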

But there w…