
Posts

Showing posts from December, 2013

Penetration testing and crowdsourcing

We take the security of our data and our customers seriously, and any security issue found is patched ASAP. We hired outside security testing companies to do our testing in case the developers missed something. Initially we hired a company, let's call it YYYhat, and they were great: in the first few months they found many standard issues like XSS, XSRF, and session hijacking, but after some time no new issues were found. Then one day a customer reported an issue that YYYhat had not detected, so we hired another company, let's call it YYYsec, and this company was good at finding some SQL injection and some XSS in other parts of the system that YYYhat had consistently missed. But again the pipeline from YYYsec dried up, and we thought we were secure. We even asked our engineers to start downloading penetration test tools and automating them to detect standard issues. But again they didn't find much. Lately I am observing that these specialized security testing companies are a one skill or so

Combine mysql alter statements for performance benefits

We have 1200+ shards spread across 20 MySQL servers. I am working on a denormalization project to remove joins and improve the performance of a big snapshot query. I had to alter a table to add 8 denormalized columns, so initially my script looked like:

    alter table file_p1_mdb1_t1 add column ctime BIGINT;
    alter table file_p1_mdb1_t1 add column size BIGINT;
    alter table file_p1_mdb1_t1 add column user_id BIGINT;

When I generated the alter script in the performance environment and started applying it, I saw the output below, where each alter takes ~30 seconds. At that rate it could take 80 hours to alter 1200 shards with 8 alters per table; even running 20 servers in parallel, it would take 4 hours.

    Query OK, 1446841 rows affected (33.58 sec)
    Records: 1446841  Duplicates: 0  Warnings: 0
    Query OK, 1446841 rows affected (31.66 sec)
    Records: 1446841  Duplicates: 0  Warnings: 0
    Query OK, 1446841 rows affected (31.86 sec)
    Records: 1446841  Duplicates: 0  Warnings: 0
    Query OK, 1446841 rows affe
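The win comes from combining the ADD COLUMN clauses into a single ALTER, so each table is rebuilt once instead of eight times. A sketch using the three columns shown above (the remaining five columns of the real script are not listed here):

```sql
-- One table rebuild instead of eight; MySQL accepts multiple
-- ADD COLUMN clauses separated by commas in a single ALTER.
ALTER TABLE file_p1_mdb1_t1
  ADD COLUMN ctime BIGINT,
  ADD COLUMN size BIGINT,
  ADD COLUMN user_id BIGINT;
```

Since the ~30 seconds per statement is dominated by the full table copy, one combined ALTER costs roughly the same as one single-column ALTER, cutting the 8x rebuild overhead.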

Mysql replication and DEFAULT Timestamp

If you are using MySQL replication to replicate data across data centres, and you are using statement-based replication, then don't use DEFAULT and ON UPDATE timestamp fields:

    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

We recently set up a simple ring replication across 3 data centres to manage our user account identities and ran into this issue. Because of this, when the same statement is applied on another data centre, the timestamp column gets a different value. For now we will remove this and generate the time from Java, so that we can run a hash-based consistency checker on replicated rows.
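A minimal sketch of the alternative described above: drop the server-side default and have the application supply the value explicitly, so every replica applies the identical literal (the table name and values here are illustrative, not from the original schema):

```sql
-- Remove the server-generated timestamp; the application now sets it,
-- so statement-based replication carries the same literal everywhere.
ALTER TABLE account
  MODIFY last_updated TIMESTAMP NULL DEFAULT NULL;

-- The Java layer binds the timestamp as a parameter, e.g.:
UPDATE account SET last_updated = '2013-12-15 10:30:00' WHERE id = 42;
```

With identical literals on every replica, a hash over the row contents matches across data centres, which is what makes the consistency checker possible.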

Url rewriting XSS and tomcat7

If you have URL rewriting enabled and your site has an XSS vulnerability, then the site can be hacked by reading the URL in JavaScript and sending session IDs to remote servers. To disable URL rewriting, if you are on Tomcat 7 the fix is simple; add this to your web.xml:

    <session-config>
        <tracking-mode>COOKIE</tracking-mode>
    </session-config>

Also you should mark your session ID cookie secure and httpOnly.
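Since Servlet 3.0 (which Tomcat 7 implements), the secure and httpOnly flags can also be set declaratively in the same web.xml block, so all three hardening steps live together; a sketch:

```xml
<session-config>
    <!-- never fall back to ;jsessionid=... URL rewriting -->
    <tracking-mode>COOKIE</tracking-mode>
    <cookie-config>
        <!-- hide JSESSIONID from document.cookie / XSS payloads -->
        <http-only>true</http-only>
        <!-- only send JSESSIONID over HTTPS -->
        <secure>true</secure>
    </cookie-config>
</session-config>
```

Note that `secure=true` assumes the site is served over HTTPS; on plain HTTP the browser would never send the cookie back.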

Velocity, code quality and manual QA

Unless you can remove manual QA, you can either have code quality or you can have release velocity. You can't have both goals; you need to compromise on one of them. We recently did a big release where we replaced a lot of old DWR APIs with REST APIs and revamped a big part of the old ExtJS UI with a Backbone-based UI. The release was done over a duration of 4 weeks, and even after delaying it by 1 week, some bugs were found by customers. I had expected some issues, and I am a proponent of "fail fast and fix fast". Many of these issues we were able to find and fix over the weekend itself. In the following week we found an issue on Friday afternoon, and we had a similar bug fixed for the next release that would fix this bug also, but QA was unsure about merging this fix as most of the QA team was out. This is the second-last week of December and it seems everyone is out for Christmas, so it was decided to delay the fix until QA was available. Now I have to dail

nginx enable compression and JS/CSS files

So one of my old colleagues was trying to find a quick way to increase performance for his pet-project website, and he asked me to take a look. The first thing I did was run PageSpeed, and immediately I saw that compression was off and CSS/JS was not aggregated. JS/CSS aggregation would require either build changes or installing PageSpeed, so I thought a quick 1-hour fix would be to just enable compression. I went and enabled "gzip on;" in the nginx conf, and it gave a good bump on the login page, but on the home page after login it kept complaining about JS/CSS not being compressed. I was scratching my head and left it as is that day. Last night I finally found out that turning gzip on only compresses text/html by default in nginx. You need to explicitly turn it on for other content types. So I went and added the below, and the problem was solved:

    gzip_types  text/plain application/xml text/css text/js text/xml application/x-javascript text/javascript application/javascript application/json;
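Putting it together, a minimal sketch of the relevant nginx http block; the `gzip_min_length` and `gzip_vary` directives are my additions for completeness, not part of the original fix:

```nginx
http {
    gzip            on;
    # nginx compresses only text/html unless told otherwise
    gzip_types      text/plain text/css text/xml application/xml
                    application/x-javascript text/javascript
                    application/javascript application/json;
    # skip tiny responses where gzip overhead outweighs the savings
    gzip_min_length 1024;
    # emit "Vary: Accept-Encoding" so proxies cache both variants
    gzip_vary       on;
}
```

There is no need to list text/html in gzip_types; nginx always compresses it once gzip is on (and logs a warning if you add it anyway).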

120K lines of code changed and 7000 commits in 4+ years and going strong

Wow, svn stats reports are interesting. I ran a report recently showing that I have made 7000+ commits to svn and changed 120K+ lines at my startup. Still, I feel like I just joined and do not know enough parts of the system. I wonder if there is a tool that can detect how many lines of code I have read. If I changed 120K lines and assume I read 3 times that much code, it still seems low.

how a loose wire can lead to strange network issues

On Thursday I started having weird network issues where my Skype would disconnect and the network would be choppy. I had to use the 3G on my phone to make Skype calls with the team. But streaming services like Netflix would work fine. I tried everything from changing DNS to restarting the router to shutting down extra devices. Finally I remembered that the last time this happened, Time Warner Cable had asked me to plug the laptop directly into the modem to rule out router issues. When I connected to the modem over ethernet, it was also choppy, and suddenly I thought, let me check the connections; I found that the incoming cable connection to the modem needed 1-2 turns to tighten it. That's it, all issues solved, but this loose wire definitely gave me some heartburn for 2 days, and even to my son, who kept complaining because his Netflix would constantly show the red download bar :).

eclipse kepler pydev not working

On my new laptop I installed Eclipse Kepler and then tried installing PyDev, and no matter what I did, PyDev wouldn't show up. Finally I figured out that the latest PyDev requires JDK 1.7, and unless I boot Eclipse in a JDK 1.7 VM it won't show up. WTF, it's pretty lame; at the least it could show a warning dialog or something. Anyway, my startup uses JDK 1.6 and I didn't want to pollute the default JDK, so the solution was to download the .tar.gz version from Oracle and explode it in some directory. I opened the eclipse.ini in the Eclipse install directory and added -vm /home/kpatel/software/jdk1.7.0_45/bin/java, and that's it. After I restarted Eclipse, PyDev showed up.
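One gotcha with eclipse.ini: the -vm flag and its path must be on two separate lines, and they must appear before the -vmargs section, or Eclipse silently ignores them and falls back to the default JVM. A sketch of the relevant fragment (the -vmargs values are illustrative defaults, not from my actual file):

```
-vm
/home/kpatel/software/jdk1.7.0_45/bin/java
-vmargs
-Xms256m
-Xmx1024m
```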

Change

Change is always hard for me. I installed Ubuntu 12.04 and didn't like Unity, as I was so used to Ubuntu 10. I did everything possible to make the new Ubuntu look like classic Ubuntu. But if you want to use classic Ubuntu in Ubuntu 12.04, it's bare bones and you have to install everything manually. Many things kept crashing, with no fix available. Also, some things, like seeing Skype and Pidgin in the systray, were a must for me. Ultimately I gave up and tried Unity. Honestly, I still don't prefer Unity that much, but I have adjusted to its nuances and have slowly started liking it.

Inertia

A lot of inertia builds up when you have been using a laptop for 4 years. I got a new company laptop, and I kept procrastinating about switching to it. The primary reason was that some of the things I use daily were not working on the new laptop, mostly Pidgin and PyDev. Also, Evolution and other settings have changed in Ubuntu 12.04. Finally I found out why Pidgin was not working on my new laptop. It seems I have an 8-year-old router, and somehow one of the devices in my home keeps hijacking the IP address assigned to the new laptop. I found this out because web browsing was randomly slow on the new laptop, and when I did ifconfig I realized it was assigned the 192.168.2.2 IP, and I remembered that IP being hijacked by the Roku or some other device. Anyway, assigning a static IP to my wireless and ethernet connections seems to have solved the issue. I am up and running on the new laptop, but still lots of small things are annoying when you change to a new laptop, like the new unity sucks so I sw

Building reusable tools saves precious time in future

I hate cases where I have made a code fix, but the fix is going live in 4 days and the customer can't wait for it. Then you have to go and fix the data manually or give some workaround. Anyway, today I ran into an issue where, due to a recent code change, a memcache key was not getting flushed, and the code fix for it is supposed to go live in 3 days. But a production support guy called me to fix it for the customer, and I can understand the frustration on the part of the customer. So the choices were:

1) Ask the production support guy to flush the key from all memcached instances with the help of ops.
2) Manually go and flush the key myself. This could take anywhere from 10-15 minutes, plus data cleanup is boring.
3) Flush the entire memcached cluster. This is a big NO NO.

I was about to go down path #2 when I realized that there is an internal REST API that the production support guy can call to flush an arbitrary key from memcached. Hurray, I just told him what keys to flush and he knew how to call suc
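For reference, deleting one key from every node of a memcached cluster is small enough to script; below is a sketch using the plain memcached text protocol (the host names and key are hypothetical, and this is my guess at the kind of thing the internal REST API wraps, not its actual implementation):

```python
import socket

def build_delete_command(key: str) -> bytes:
    """Build a memcached text-protocol delete command for one key."""
    return b"delete " + key.encode("utf-8") + b"\r\n"

def flush_key(hosts, key, port=11211, timeout=2.0):
    """Send 'delete <key>' to every memcached node; return per-host replies."""
    replies = {}
    for host in hosts:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_delete_command(key))
            # memcached answers "DELETED\r\n" or "NOT_FOUND\r\n"
            replies[host] = sock.recv(64).decode("ascii").strip()
    return replies

# Hypothetical usage:
# flush_key(["cache1.internal", "cache2.internal"], "user:42:profile")
```

Wrapping this in a REST endpoint is exactly the kind of reusable tool the title is about: the next time a stale key shows up, production support can fix it without an engineer.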

you just have to be desperate

I recently set up a new Ubuntu machine, and coincidentally my company Jabber account stopped working on both the new and the old laptop. I was busy, so I didn't spend much time on it and sent it to ops, and they said it's an issue on my side. I was puzzled, because how can it be an issue on both laptops? But using Skype is a pain and it's also not secure, so I had to find a way to fix it. Finally I found that you can start Pidgin in debug mode using

    pidgin -d > ~/debug.log 2>&1

Then I tailed the logs and immediately saw:

    (17:44:44) account: Connecting to account kpatel@mycompany.com/.
    (17:44:44) connection: Connecting. gc = 0x1fc9ae0
    (17:44:44) dnssrv: querying SRV record for mycompany.com: _xmpp-client._tcp.mycompany.com
    (17:44:44) dnssrv: found 1 SRV entries
    (17:44:44) dns: DNS query for '127.0.0.1' queued
    (17:44:44) dnsquery: IP resolved for 127.0.0.1

Doing a dig on the record shows it was correct: dig +short SRV _xmpp-client._tcp.mycompany.

Human touch and scalable systems

I am a big fan of eliminating human touch points when it comes to scalable systems. When I say "human touch" I mean a step that requires a human to perform the operation rather than relying on bots. A good example is DB schema updates. We use MySQL sharding, and recently we split our global DB into 12 sharded databases with identical schemas but different hosts. As we were crunched for time, I didn't get a chance to write an automatic schema applier, so the process agreed upon was to put the DDL statements in the deployment notes, and devops would apply them to all databases. As usual, I am always skeptical of human touch points, so 2 months after going live, just out of curiosity, I wrote a MySQL schema differ that would dump all 12 schemas and diff them, and to my surprise there were schema differences. Four of the MySQL servers were set up with the latin1 character set rather than utf8_bin. No one caught the issue because that table was created in sleeper mode and the feature is yet to go l
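The differ itself can be tiny. Below is a sketch (not the author's actual tool) that normalizes schema dumps and compares every shard against the first one; it assumes the `SHOW CREATE TABLE` output for each shard has already been fetched as a string:

```python
import re

def normalize_schema(dump: str) -> str:
    """Normalize a schema dump so cosmetic differences don't count:
    strip AUTO_INCREMENT counters and collapse whitespace."""
    dump = re.sub(r"AUTO_INCREMENT=\d+\s*", "", dump)
    return " ".join(dump.split())

def diff_shards(schemas: dict) -> list:
    """Compare each shard's normalized schema to the first shard's;
    return the names of shards that differ."""
    names = sorted(schemas)
    baseline = normalize_schema(schemas[names[0]])
    return [n for n in names[1:] if normalize_schema(schemas[n]) != baseline]

# Example: one shard was created with latin1 instead of utf8
shards = {
    "shard1": "CREATE TABLE t (id BIGINT) DEFAULT CHARSET=utf8",
    "shard2": "CREATE TABLE t (id BIGINT)  DEFAULT CHARSET=utf8",
    "shard4": "CREATE TABLE t (id BIGINT) DEFAULT CHARSET=latin1",
}
print(diff_shards(shards))  # → ['shard4']
```

Run from cron, a checker like this turns a silent drift into an alert, which is the whole point of replacing the human touch point with a bot.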