
Java count open file handles

Encountered an issue in production where the JVM ran out of file handles due to a code bug. It took only five minutes for the file handles to build up, but had we been trending open file handles we would have caught the problem as soon as the release was pushed: some nodes didn't exhaust their file handles, but the count was high enough to raise suspicion. I could run lsof from cron, but I am not fond of crons because you have to configure them manually, and if a box runs 4 Tomcats you have to configure it for each of them across 20-30 nodes. So I wanted to get the count of open file handles from inside the JVM every five minutes and push it to Graphite for trending. Here is sample code to do it:

 import java.lang.management.ManagementFactory;
 import java.lang.management.OperatingSystemMXBean;
 import com.sun.management.UnixOperatingSystemMXBean;

 public long getOpenFileDescriptorCount() {
     OperatingSystemMXBean osStats = ManagementFactory.getOperatingSystemMXBean();
     // On Unix-like platforms the MXBean exposes the open file descriptor count
     if (osStats instanceof UnixOperatingSystemMXBean) {
         return ((UnixOperatingSystemMXBean) osStats).getOpenFileDescriptorCount();
     }
     // Not available on this platform (e.g. Windows)
     return 0;
 }
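For the "push it to Graphite every five minutes" part, here is a minimal sketch of what that could look like. This is not the exact code we run; it assumes a carbon daemon listening on the Graphite plaintext protocol (the host graphite.example.com, port 2003, and the metric path are placeholders) and uses a ScheduledExecutorService so the sampling lives inside the JVM instead of a cron entry:

 import java.io.PrintWriter;
 import java.net.InetAddress;
 import java.net.Socket;
 import java.util.concurrent.Executors;
 import java.util.concurrent.ScheduledExecutorService;
 import java.util.concurrent.TimeUnit;

 public void startFileHandleReporting() {
     ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
     scheduler.scheduleAtFixedRate(new Runnable() {
         @Override
         public void run() {
             long count = getOpenFileDescriptorCount();
             long epochSeconds = System.currentTimeMillis() / 1000;
             // Graphite plaintext protocol: "<metric.path> <value> <timestamp>\n"
             try (Socket socket = new Socket("graphite.example.com", 2003);
                  PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
                 String host = InetAddress.getLocalHost().getHostName();
                 out.printf("servers.%s.jvm.open_file_descriptors %d %d%n",
                            host, count, epochSeconds);
             } catch (Exception e) {
                 // Don't let a reporting failure kill the scheduler thread
                 e.printStackTrace();
             }
         }
     }, 0, 5, TimeUnit.MINUTES);
 }

Because this runs inside each Tomcat, every JVM reports its own metric and there is nothing to configure per box beyond the Graphite host.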
