
Posts

Showing posts from 2015

Data motivates you to do more

I have always believed in "trusting data" over human gut, but lately I have been observing a simple fact: being data driven has a side effect, "it motivates you and keeps you on track". Some recent examples: Fitbit: Last year I started taking an afternoon walk because by 2:00 PM, after doing calls and replying to a lot of emails, my brain would be fried and I wouldn't have energy left to code or think. Doing this 30-minute walk daily recharges the brain. I had an Omron pedometer sitting around for almost a year and I seldom took it with me on walks. The problem with it was that it stored the last 30 days of data about my steps and other things, but it didn't have a good way to see the data graphically. When it comes to data, "less is more", but another important aspect is that you need to present your data in graphs so it doesn't take a huge amount of cognitive effort to make sense of it. Recently my employer gave a Fitbit to everyone who participated in summe

Have I started hating MySQL and falling in love with distributed databases?

It seems MySQL is rock solid if you want transactions and ACID support. So I would still recommend MySQL for anything that is mission-critical data, and it should be the primary datastore for your transactions. But what about derived data, immutable data or analytical data? In the past I have built a large-scale cluster of MySQL servers storing metadata about billions of files and folders used by tens of thousands of customers daily, and it is scaling fine and working well; it is still growing at a healthy rate and holding up. But this requires a lot of babysitting if you have 100s of nodes and you need to handle replication, adding more nodes, rebalancing data, monitoring the entire cluster, sharding, and backup/restore. You have to write a lot of tooling and do a lot of monitoring/babysitting to scale the cluster. Plain stock MySQL will scale up to a limit, but vertical scaling has its own issues. So +1 for MySQL, but not everything should be stuffed there. Recently my team and I built full

Rate limiting APIs and Java services when operating at Scale to solve Thundering herd problem

When you are operating at scale and handling peak traffic of 1K+ requests per second on a JVM, then no matter what you do, you will get hit by a thundering herd problem. There are operations that happen once in a while but take more than 10 seconds, and if too many of them happen at once they can choke backend services or, worse, cause downtime. So you need to rate limit these long-running operations so that only X can run at a time; this way you leave room for running lots of short-lived transactions. When you have millions of users, not all users are doing these long-running operations and not all traffic is coming from online users. We are a cloud storage company and we give a sync client to users, so 80%+ of the traffic at a given time comes from these clients, which are trying to sync changes between the cloud and the local system behind the scenes. Our application is written using REST APIs, and these clients use the same REST APIs that our web UI uses. Also some cust
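A minimal sketch of the "only X can run at a time" idea, assuming a plain `java.util.concurrent.Semaphore` guards the long-running operations. The class name `LongRunningOpLimiter` and the wait-timeout parameter are hypothetical, not the code from our service. Callers that cannot get a permit within a short wait are rejected rather than queued, so long-running work can never tie up all the request threads that short-lived transactions need.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical limiter: at most maxConcurrent long-running operations run
 * at once. Callers that cannot get a permit within maxWaitMillis are
 * rejected instead of piling up on the backend.
 */
final class LongRunningOpLimiter {
    private final Semaphore permits;
    private final long maxWaitMillis;

    LongRunningOpLimiter(int maxConcurrent, long maxWaitMillis) {
        // fair = true: waiting callers get permits roughly in arrival order
        this.permits = new Semaphore(maxConcurrent, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Runs the task if a permit frees up in time, otherwise throws RejectedExecutionException. */
    <T> T run(Callable<T> task) throws Exception {
        if (!permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS)) {
            throw new RejectedExecutionException("too many long-running operations in flight");
        }
        try {
            return task.call();
        } finally {
            // Always hand the permit back, even if the task throws
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) throws Exception {
        // Allow at most 2 concurrent long-running jobs; wait up to 100 ms for a slot
        LongRunningOpLimiter limiter = new LongRunningOpLimiter(2, 100);
        System.out.println(limiter.run(() -> "report generated"));
    }
}
```

In a REST layer, a `RejectedExecutionException` would typically be translated into an HTTP 503 or 429 so that sync clients back off and retry later. Whether to reject immediately or wait briefly for a permit is a tuning choice.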

Cognitive overhead of email

Looks like these days bitching about email is my favourite topic. What can I do? No matter what I do, I just can't keep up with the email flow. New kinds of emails just keep increasing. Two years ago I was doing a good job, close to inbox zero daily, and writing a lot of code, but two things have recently increased the email flow to me: (1) internal reports about production monitoring from New Relic and HAProxy, and exception analysis by the team; (2) review requests. This is a recent one. Most review requests are sent to the Java review group, and I need to spend at least 1-2 min on every email just to decide whether it is for me or not. Even if there are 30 review requests per day, it consumes 30 min of my time just to make sense of them. When I see 100 emails to be answered every day, I feel disheartened that I can't write any code today. Both items above are a necessary evil: #1 has increased production stability, so my weekends are less busy, and #2, I am hoping, will eventually increase code quality

New Relic and statuspage.io integration

Monitoring tools expose a lot of data, and we use Nagios, Cacti, Graphite, New Relic, Mixpanel, Flurry, Boundary and many more tools. But one of the asks from the support and marketing teams is: how can they internally know if something is wrong? We can't expect them to wade through so many systems and so many applications to make sense of what is operational and what is not. For example, we use a lot of services to serve the cloud file server solution, and just the status list of our services in New Relic spans 2 pages. The support team and management rely on the operations team to notify them if an issue is ongoing. To make this easy, I did a proof-of-concept integration of the application responsible for serving the main website with Statuspage.io. The idea is simple: (1) create public metrics in statuspage.io that are human readable; (2) query New Relic and various systems for application status; (3) map New Relic green/red/yellow status and other system statuses to statuspage.io status. If the
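The mapping step could look something like this sketch. It assumes each public Statuspage component is backed by several New Relic applications and that a worst-status-wins policy is acceptable; the class and enum names and the aggregation rules are hypothetical. The lowercase strings are meant to mirror Statuspage's component status values (check them against the Statuspage API documentation). Querying New Relic and calling the Statuspage API are left out.

```java
import java.util.List;

final class StatusMapper {
    /** Health colors as reported per application (assumed input). */
    enum AppHealth { GREEN, YELLOW, RED }

    /** Component statuses; apiValue mirrors the string sent to Statuspage. */
    enum ComponentStatus {
        OPERATIONAL("operational"),
        DEGRADED_PERFORMANCE("degraded_performance"),
        PARTIAL_OUTAGE("partial_outage"),
        MAJOR_OUTAGE("major_outage");

        final String apiValue;

        ComponentStatus(String apiValue) {
            this.apiValue = apiValue;
        }
    }

    /**
     * Collapses the health of every app behind one public component into a
     * single status. Hypothetical policy: all red -> major outage,
     * any red -> partial outage, any yellow -> degraded, else operational.
     */
    static ComponentStatus toComponentStatus(List<AppHealth> apps) {
        if (apps.isEmpty()) {
            throw new IllegalArgumentException("component has no backing apps");
        }
        long red = apps.stream().filter(h -> h == AppHealth.RED).count();
        if (red == apps.size()) return ComponentStatus.MAJOR_OUTAGE;
        if (red > 0) return ComponentStatus.PARTIAL_OUTAGE;
        if (apps.contains(AppHealth.YELLOW)) return ComponentStatus.DEGRADED_PERFORMANCE;
        return ComponentStatus.OPERATIONAL;
    }

    public static void main(String[] args) {
        // e.g. one public component backed by three app servers
        ComponentStatus s = toComponentStatus(
                List.of(AppHealth.GREEN, AppHealth.YELLOW, AppHealth.GREEN));
        System.out.println(s.apiValue); // prints degraded_performance
    }
}
```

Requiring every backing app to be red before declaring a major outage keeps a single sick node from alarming support and marketing; a stricter policy would be equally reasonable.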

Move fast break things but with monitoring

We run a complex system with multiple services, and every 2 or 3 weeks we update the Java applications. I want to do it every week, as most applications are stateless and can be patched anytime, but the application serving the main website uses sticky sessions. We are working to make its sessions fail over; once we do that, we can do mid-week deployments, and that will allow us to go faster than every 3 weeks. This week I pushed a huge infrastructure change related to user ID generation. I had asked the ops team to check the status in New Relic after the midnight deployment, and it looked fine, so everyone was happy. I woke up and checked the New Relic mobile app, and things looked OK to me. After finishing my morning routine I ran my daily exception report, and one thing that caught my eye was 90K exceptions in the last 12 hours in one of the files I had changed. To gauge the impact I went into New Relic, and it showed me an error rate of 0.07 in one of the apps. I then checked New Relic and I
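A rough sketch of the kind of daily exception report that can surface such a spike, assuming each exception has already been reduced to its first stack-trace frame (the `ExceptionReport` class, that input format and the example file names are hypothetical): count exceptions per source file and list the worst offenders first, so a file with 90K exceptions stands out immediately.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ExceptionReport {
    // Matches a frame such as "at com.example.IdGenerator.next(IdGenerator.java:42)"
    // and captures the source file name.
    private static final Pattern FRAME = Pattern.compile("at [\\w.$]+\\((\\w+\\.java):\\d+\\)");

    /** Returns source file -> exception count, highest count first, at most topN entries. */
    static Map<String, Long> topFiles(List<String> frames, int topN) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : frames) {
            Matcher m = FRAME.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1L, Long::sum);
            }
            // Lines without a recognizable frame are skipped
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(topN)
                // LinkedHashMap keeps the sorted order
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> frames = List.of(
                "at com.example.IdGenerator.next(IdGenerator.java:42)",
                "at com.example.IdGenerator.next(IdGenerator.java:42)",
                "at com.example.UploadServlet.doPost(UploadServlet.java:88)");
        System.out.println(topFiles(frames, 10)); // prints {IdGenerator.java=2, UploadServlet.java=1}
    }
}
```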

Email slavery

It seems I have become an EmailSlave. The first half of the day is spent just answering emails. There are so many emails where I am copied but need not be. There are many emails that run 1-2 pages, and somewhere down the page someone says "@KP please answer this". So my daily work schedule seems to be: Sign in to New Relic and check anomalies for 15 min. Check emails related to production exception reports, and yes, there are a ton of these reports daily. I need a better tool here, as this model is not scalable; I need to reduce the incoming data so I only see relevant data, like what New Relic does. Maybe I need to create a webapp out of these emails. Check emails for the next few minutes before team calls. Do team calls. Then go back to checking emails until I have taken my best shot at answering everyone waiting for my reply. Attend team meetings on Tue/Thu. Being an architect and coder at heart, I don't feel satisfied at the end of the day if there is nothing tangible getting d

Lint in mouse :)

The wired mouse in my home setup is as old as my time working from home for my employer. Suddenly this week the scroll wheel would scroll up fine, but it would not scroll down more than one page; you had to scroll up half a page before it would resume scrolling down. At first I thought it was my Firefox upgrade, but soon I noticed the same issue in Eclipse and the terminal. In the evening I use the laptop without the wired mouse and it works fine. That led me to conclude that the issue was with the mouse. In the USA people have a replace mentality: if something is not working and it is cheap, you replace it. I was this close to ordering a new mouse on Amazon when I thought, let me shake it to see if something is stuck, but nothing came out. I then saw a small screw on the bottom, I unscrewed it and hit the jackpot. I was like, wth, "Lint". My friend had a similar issue with lint in his iPhone 5 when it wouldn't charge; we used a bread tie to get the lint out and it charged fine. I was like yay, problem solved an