
Data motivates you to do more

I have always believed in trusting data over gut feeling, but lately I have noticed that being data-driven has a side effect: it motivates you and keeps you on track. Some recent examples:

Fitbit: Last year I started taking an afternoon walk because by 2:00 PM, after calls and replying to a lot of emails, my brain would be fried and I wouldn't have the energy left to code or think. A 30-minute walk every day recharges the brain. I had an Omron pedometer sitting around for almost a year and seldom took it with me on walks. The problem was that it stored the last 30 days of my steps and other data but had no good way to view that data graphically. When it comes to data, "less is more", but it is just as important to present the data in graphs so that making sense of it doesn't take a huge amount of cognitive effort. Recently my employer gave a Fitbit to everyone who participated in the summer challenge. It isn't a better pedometer than my old Omron, but I immediately saw that it can sync the data stored on the pedometer to my Fitbit app over Bluetooth, and after a week I see this. Within a second I can tell that I am lagging this week and need to catch up.

It's another matter that I need to remember to carry this dongle with me. I always carry my phone on walks, and I saw that the iPhone 6 has a pedometer built in, so that would eliminate the need for the dongle when I upgrade.

Large-scale migration to Elasticsearch: We recently migrated billions of files to Elasticsearch. The migration took months, but data kept us on our toes and told us whether the compass was pointing north. We built various dashboards to monitor migration rates, and as the migration ran day and night I would start my day by checking how many more files we had migrated and whether we needed to add more servers, JVMs, memory or CPU to meet the goal. Here is a graph from one data center.
Data kept us on track, and we were able to spot many issues before they turned into a disaster.
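
The morning check described above boils down to simple arithmetic: given what is left and the observed rate, will we finish in time? Here is a minimal sketch of that calculation. The numbers and method names are made up for illustration; the real counts and rates are not given in this post.

```java
public class MigrationPaceSketch {

    // Days needed to finish at the observed daily rate, rounded up.
    public static long daysRemaining(long totalFiles, long migratedFiles, long filesPerDay) {
        if (filesPerDay <= 0) {
            throw new IllegalArgumentException("filesPerDay must be positive");
        }
        long remaining = Math.max(0, totalFiles - migratedFiles);
        return (remaining + filesPerDay - 1) / filesPerDay;
    }

    // True if the observed rate is enough to finish before the deadline.
    public static boolean onTrack(long totalFiles, long migratedFiles,
                                  long filesPerDay, long daysUntilDeadline) {
        return daysRemaining(totalFiles, migratedFiles, filesPerDay) <= daysUntilDeadline;
    }

    // Daily rate needed to finish exactly at the deadline, rounded up.
    public static long requiredRate(long totalFiles, long migratedFiles, long daysUntilDeadline) {
        if (daysUntilDeadline <= 0) {
            throw new IllegalArgumentException("daysUntilDeadline must be positive");
        }
        long remaining = Math.max(0, totalFiles - migratedFiles);
        return (remaining + daysUntilDeadline - 1) / daysUntilDeadline;
    }

    public static void main(String[] args) {
        // Hypothetical figures: 5 billion files in total, 3.2 billion done,
        // 40 million migrated per day, 30 days left.
        long total = 5_000_000_000L;
        long migrated = 3_200_000_000L;
        long perDay = 40_000_000L;
        long daysLeft = 30;

        System.out.println("Days remaining at current rate: " + daysRemaining(total, migrated, perDay));
        System.out.println("On track: " + onTrack(total, migrated, perDay, daysLeft));
        System.out.println("Rate needed per day: " + requiredRate(total, migrated, daysLeft));
    }
}
```

If the required rate is well above the observed one, that is the signal to add servers, JVMs, memory or CPU.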

Exception analysis: We analyze exceptions and 5xx statuses on incoming requests daily, and in each two-week sprint we strive to fix as many as we can. After a release, though, either new issues pop up or customers use a flow in a different way, causing some components to buckle under pressure. The one thing that has kept us on our toes is data. By looking at a one-screen report we can tell how a data center did yesterday and which areas require immediate attention versus attention in 1-2 days. This leads to fewer customer escalations because we spot many issues before customers do. Little things add up, and having this report every day pushes us to optimize more, so we can spend quality time on what we love: writing code for scalable systems.

In short, data shows you that there is a problem, and once the idea that a problem exists has been planted in your brain, you will try to fix it so you can get back to your normal routine.


Popular posts from this blog

RabbitMQ java clients for beginners

Here is a sample consumer and producer for RabbitMQ. The steps are:
1) Download Erlang.
2) Download the RabbitMQ server.
3) Download the RabbitMQ Java client jars.
4) Compile and run the two classes below, and you are done.
This sample creates a durable exchange, queue and message. You will have to start the consumer before you start the producer for the first time.

For more information on AMQP, exchanges and queues, read this excellent tutorial.
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.*;

public class RabbitMQProducer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setUsername("guest");
        factory.setPassword("guest");
        factory.setVirtualHost("/");
        factory.setHost("");
        factory.setPort(5672);
        Conne…

What a rocky start to labor day weekend

I was woken up by an earthquake at 7:00 AM and couldn't get back to sleep. I took a bath, made my tea and started checking emails, and saw that after last night's deployment three storage nodes out of hundreds were running into full GC. What was special about the three nodes was that each one was in a different data centre but all had the same name, app02. This got me curious, so I asked for one node to be taken out of rotation and a heap dump taken. A new release had gone out the night before, and in it I had upgraded the spymemcached library version because New Relic now natively supports instrumenting it, so that was a suspect. The hunch was a bullseye: the heap dump clearly showed it taking 1.3 GB, and full GCs were taking 6 seconds but not reclaiming anything.

I have a Quartz job in each JVM that takes a thread dump every 5 minutes and keeps the last 300 of them. Quickly checking a few of them showed a common thread among all three data centres. It seems there was a long-running job that was trying to replicate pending…

Logging to Graphite monitoring tool from java

We use Graphite to monitor stats and watch trends. One requirement is to monitor the impact of new releases as a build is deployed to the app nodes, to see things like:
1) Has the memcached usage increased?
2) Has the number of Java exceptions gone up?
3) Is the app using more Tomcat threads?
Here is a screenshot.

We changed the installer to log a deploy event when a new build is deployed. I wrote a simple Spring bean to log Graphite events from Java. Logging to Graphite is easy: all you need to do is open a socket and send lines of events.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import;
import;
import;
import java.util.HashMap;
import java.util.Map;

public class GraphiteLogger {
    private static final Logger logger = LoggerFactory.getLogger(GraphiteLogger.class);
    private String graphiteHost;
    private int graphitePort;

    public String getGraphiteHost() {
        return graphiteHost;
    }

    public void setGraphite…
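
The excerpt above is cut off, so here is a minimal stand-alone sketch of the "open a socket and send lines" idea. It is not the GraphiteLogger bean from the post. It assumes Graphite's Carbon plaintext listener, where each line is a metric path, a value and a Unix timestamp in seconds, and which conventionally listens on port 2003. The class name, host and metric name are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class GraphiteLineSketch {

    // Builds one plaintext line: "<metric.path> <value> <epoch-seconds>\n".
    public static String formatLine(String metricPath, long value, long epochSeconds) {
        return metricPath + " " + value + " " + epochSeconds + "\n";
    }

    // Opens a TCP connection, writes the line, then closes the socket.
    public static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer writer = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            writer.write(line);
            writer.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        long now = System.currentTimeMillis() / 1000L;
        // "events.deploy.app" is a made-up metric name for a deploy marker.
        String line = formatLine("events.deploy.app", 1, now);
        System.out.print(line);

        // Only talk to the network when a host is passed on the command line.
        if (args.length > 0) {
            send(args[0], 2003, line);
        }
    }
}
```

An installer could call something like send once per deploy, so the marker lines up with the memcached, exception and thread graphs for the same time window.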