I manage a small application that is hosted on a single Linux virtual machine and have noticed some occasional performance and stability issues.

The server runs CentOS, MySQL and a GlassFish server. I chose GlassFish several years ago because at the time it was the only EJB3 container around, as well as being very well integrated with the NetBeans IDE.

Like most JavaEE containers, it also makes clustering and scaling easier.

I support a web application that is hosted on a virtual private server. The application architecture is JavaEE running under GlassFish on CentOS.

Like most ISPs, my hosting provider builds vanilla Linux boxes that can be configured with various flavors of the OS.

Out of the box these images have their timezone set to UTC, and since my end users are in California, and I’m not doing anything on the client to handle timezone conversion, the times that come up in the application are off by 7 or 8 hours (depending on whether it’s Daylight Saving Time or not).
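To make the 7 or 8 hour gap concrete, here is a minimal Python sketch. It is only an illustration, not the application's code: the fixed offsets `PST` and `PDT`, the helper names, and the sample timestamp are all assumptions made up for this example, and a real application would use a proper timezone database rather than hard-coded offsets.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration only: a real application would use a
# timezone database so the daylight-saving switch is handled automatically.
PST = timezone(timedelta(hours=-8), "PST")  # Pacific Standard Time
PDT = timezone(timedelta(hours=-7), "PDT")  # Pacific Daylight Time


def hours_behind_utc(local_zone: timezone) -> float:
    """Return how many hours a wall clock in local_zone lags a UTC wall clock."""
    offset = local_zone.utcoffset(None)
    return -offset.total_seconds() / 3600


def as_local(utc_moment: datetime, local_zone: timezone) -> datetime:
    """Convert a timezone-aware UTC datetime to the given local zone."""
    return utc_moment.astimezone(local_zone)


# Hypothetical timestamp of the kind a server running on UTC might store.
stored = datetime(2011, 1, 15, 20, 30, tzinfo=timezone.utc)

if __name__ == "__main__":
    print("server (UTC):  ", stored.strftime("%Y-%m-%d %H:%M %Z"))
    print("user (winter): ", as_local(stored, PST).strftime("%Y-%m-%d %H:%M %Z"))
    print("user (summer): ", as_local(stored, PDT).strftime("%Y-%m-%d %H:%M %Z"))
```

If the application shows the stored value unchanged, a user in California sees a time that is 8 hours ahead of their clock in winter and 7 hours ahead in summer.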

I am a big fan of Test Driven Development (TDD), and tools like Hudson/Jenkins that automate a continuous integration build are key to making it work.

On my current project we recently started moving things to Amazon EC2, and rather than put everything on one big server, I thought I’d follow the best practices in cloud computing and make a number of small special purpose servers to take care of the project’s needs.

We’ve had a Jenkins server running for a bit, so rather than reinventing the wheel, I figured I could copy my Jenkins configuration to a new server and get things up and running.

I fired up a new Tomcat server on Amazon Elastic Beanstalk, and loaded up the Jenkins WAR file, which quickly got me to a working Jenkins server. This project is written in PHP, so I had to install PHP after that, which meant logging in to the server and running through the whole PHP and PHPUnit setup.

Once that was done, I scp’d the Jenkins folders from the old server, edited the Tomcat startup files to include the environment variable to point Jenkins to the right place, changed a few permissions, and everything appeared to be working.

I could log in, I fired up the build, and it appeared to be running – very cool.

But that was when the flaw in the design of the PHP unit tests was exposed ….

I was watching the output of the phpunit tests, and noticed two things:

  1. The tests seemed to be taking a really long time
  2. Every test was failing

Watching the console, each time a test would fail, the little “E” would print, then a few seconds would go by and another “E” would appear. Finally, after many minutes (because we have a LOT of classes to test), the error output appeared, and it reported a PHP_Invoker_TimeoutException for EVERY test.

And of course there were 5297 of these … I did some Google searches for the PHP_Invoker_TimeoutException, which mostly pointed to issues with upgrading from one version of PHPUnit to another, but the versions on the old server and this one were the same.

So my next step was debugging an individual test. Running the test from the command line gave me the same error, which was odd. But then I ran the test using php instead of the phpunit call, and found the problem: I was getting a timeout trying to open a database connection.

The issue, as it turns out, is a design flaw in our code that hadn’t shown up before: all the classes invoke a database connection class that sets up the connection to the database as soon as they are loaded.

Since the Elastic Beanstalk server was in a different security group from the one allowed to connect to the RDS database, it was unable to connect at all, and PHPUnit would simply time out before the connection attempt failed (by default phpunit sets 1 second as the acceptable time for a test to run, in order to catch endless loops).

Now in theory our tests shouldn’t be hitting the database (at least not for these unit tests since we don’t want them updating anything on the backend), so this problem turned out to be very fortuitous. Because the Jenkins server couldn’t reach the database, it exposed a flaw in our unit tests: we weren’t mocking all the things we needed to, so the tests were actually opening connections to the database.

With some refactoring of the test classes to mock the database access layer, the tests all succeeded. Next we’ll need to do the actual DBUnit tests for the database, and Selenium or HTTPUnit tests for all the front-end and AJAX stuff.
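The project itself is PHP, but the shape of the fix is language-neutral, so here is a rough Python sketch of the idea rather than our actual code. Every name in it (`Database`, `UserRepository`, `fetch_one`, the `users` table) is hypothetical. It assumes two changes: the connection is opened lazily on first use instead of when a class is loaded, and the data-access class receives its database object so a test can hand it a mock.

```python
from unittest import mock


class Database:
    """Hypothetical connection wrapper that connects only when first used."""

    def __init__(self, connect):
        self._connect = connect      # callable that opens the real connection
        self._connection = None

    def connection(self):
        # Open the connection on first use and reuse it afterwards.
        if self._connection is None:
            self._connection = self._connect()
        return self._connection


class UserRepository:
    """Hypothetical data-access class that is handed its Database instead of
    opening a connection as a side effect of being loaded."""

    def __init__(self, database):
        self._database = database

    def count_users(self):
        row = self._database.connection().fetch_one("SELECT COUNT(*) FROM users")
        return row[0]


def test_count_users_without_a_real_database():
    fake_connection = mock.Mock()
    fake_connection.fetch_one.return_value = (3,)
    connect = mock.Mock(return_value=fake_connection)

    repository = UserRepository(Database(connect))
    connect.assert_not_called()      # constructing the objects opens nothing

    assert repository.count_users() == 3
    connect.assert_called_once()     # opened lazily, and only the fake


if __name__ == "__main__":
    test_count_users_without_a_real_database()
    print("ok")
```

With this structure a unit test never needs network access to the database, so a build server that cannot reach it is no longer a problem for these tests.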

The other day I got a notification from the Plesk control panel for Carticipate telling me that the user account was about to expire. I logged into Plesk only to find out that the disk on the VPS was full.

Now this was really confusing, since the entire Carticipate system consists of a web2py install on Linux with a fairly small MySQL database, and the VPS had 20 gigabytes of disk space.

I looked around and found that there were three full backups, each about 14GB, with several incremental backups of a gig or so. Thinking this was the issue, I deleted all but the latest backup.

Well apparently, those backups are stored somewhere not on the server, so that made no difference to the space problem. So I did the next natural thing which was to look for big log files, clear out /tmp, and anything that might be causing the problem. I only managed to clear out a few megabytes of space.

I dug around some more and found a lot of files under the /root directory that seemed to be related to updating Plesk. Since that’s provided as part of my 1and1 VPS service, I called the support number to see whether they could help me figure out where my space had gone. Unfortunately, for the first time, they were not very helpful, suggesting that I needed to research and find out where the big files were on my system.

After some frustrating arguing with the support guy, where I pointed out that I couldn’t get support from Plesk, only they could, I finally gave up and went to my old standby, Google.

First I ran a quick “find” command to see where the mysterious gigabytes of space were being consumed. Going to a few of the folders it turned up and running “du -hs” showed me which ones had a lot of files in them and were eating up a lot of space.
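The exact commands I ran aren't reproduced here, but the same hunt for space hogs can be sketched in Python. This is only an approximation under stated assumptions: the function names are made up for this example, and it sums apparent file sizes rather than allocated disk blocks, so the numbers will not match “du” exactly.

```python
import os
import sys


def directory_sizes(root):
    """Return {immediate subdirectory: total bytes of regular files beneath it}."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        # os.walk does not follow symlinked directories by default.
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    if not os.path.islink(path):
                        total += os.lstat(path).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        sizes[entry.path] = total
    return sizes


def largest(root, count=10):
    """Return the biggest subdirectories of root as (path, bytes), largest first."""
    ranked = sorted(directory_sizes(root).items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:count]


if __name__ == "__main__":
    start = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size in largest(start):
        print(f"{size / 1024 ** 3:8.2f} GiB  {path}")
```

Running it against a directory and then again against the biggest result is the same drill-down as repeating “du -hs” one level at a time.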

A few more searches on the Parallels site turned up a couple more references telling me that both of the big folders were OK to clean out.

The /root/psa folder is where Plesk was configured to download updates, and apparently it doesn’t clean out those folders after the update is successful.

The other folder is where all the dumps from the local backups get placed, and that was the primary problem area. The /var/lib/psa/dumps folder was taking up over 14GB of the 20GB, so cleaning that out got me started again.

Looking at the dump directory, it appears to have daily dumps of all of the Plesk stuff going back forever, so this may happen again, but for now, my VPS backup is down to a much more reasonable 2GB.

I was on my way home from a business trip and wanted to print my boarding pass. Most hotels have a PC set up that is primarily for that purpose, and this one was no different.

The PC they provide is set up so that it wipes itself every time somebody logs in, which in theory protects you from somebody eavesdropping and/or stealing your information. The downside to this approach is that you’re stuck with whatever they decide is the right version of software to work with.

The problem (for me) came in because I was flying JetBlue. Now I love JetBlue, but they (for some unknown reason) use Flash in the page to print out your boarding pass.

You’d think this wouldn’t really be much of a problem, since I’m sure they probably have some sophisticated check that will catch if the browser doesn’t have Flash and redirect you.

… Except …

The PC did have Flash loaded, but the JetBlue page wanted a higher version installed than was on the machine. So the browser helpfully asked if I wanted to upgrade, which I said “yes” to, only to find that the generic login doesn’t have install permissions (surprise, surprise).

So I asked the desk to help, but they had even more locked-down machines, so I figured I’d just have to punt.

Until I remembered that I had my Mac with me. I went to my room and printed the boarding pass from the JetBlue site straight to PDF. Then I uploaded the PDF to my Google Docs.

Back down at the hotel computer, I logged into my Google Docs, opened the PDF and presto – the boarding pass was printed!

Bottom line is that by relying on Google to connect me, I was able to get what I needed in spite of the software incompatibility. Once again I’m loving the cloud (and Google Apps).

I’ve been playing with the beta of Live Mesh from Microsoft for some time now, and find it a very useful technology. So far the only problem I’ve run into has been some bug that was introduced when I upgraded to Snow Leopard.

For some reason, after restarting or hibernating my machine, Live Mesh gets left in an odd state where it is unable to connect to the mesh and the login action is greyed out.

After a bit of Googling and searching around on the Microsoft Connect site for people experiencing this bug, I found a couple of different solutions.

There are two possible workarounds, and both require Live Mesh to be shut down:

Method 1: delete the Live Mesh preferences file ~/Library/Preferences/com.microsoft.LiveMesh.plist.

This method is what I typically use, since it is the least intrusive. It reconnects all the folders that I’ve added to my mesh, and re-establishes the synchronization. It does tend to fill up my hard drive with files, since the initial synch puts most (if not all) of the files in the folders into the Trash.

Method 2: Start with a clean slate:

  1. Quit the Live Mesh client.
  2. Delete the Live Mesh settings in Application Support (~/Library/Application Support/Live Mesh).
  3. Delete the Live Mesh preference (~/Library/Preferences/com.microsoft.LiveMesh.plist).
  4. Launch Live Mesh client.
  5. Log in and select the folders you want to synch like you did originally.

This method is effectively like doing a complete uninstall, since it removes all the settings and preferences. It does cause a complete re-synch of the folders, and you can choose if you want to “merge” or “replace” based on whether you think you might need to or not.

This will also end up with lots of files in the Trash, so watch out for your disk filling up.
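If you find yourself doing this after every reboot, the two methods can be scripted. The following is a minimal Python sketch, not anything supplied by Microsoft: the function names and the `clean_slate` and `dry_run` parameters are made up for this example, the Application Support path is my reading of where the settings folder lives, and it defaults to a dry run that only reports what it would remove. Quit the Live Mesh client before running it for real.

```python
import shutil
from pathlib import Path


def live_mesh_paths(home, clean_slate=False):
    """Paths removed by Method 1, plus the settings folder for Method 2."""
    home = Path(home)
    paths = [home / "Library" / "Preferences" / "com.microsoft.LiveMesh.plist"]
    if clean_slate:
        # Assumed location of the Live Mesh settings folder.
        paths.append(home / "Library" / "Application Support" / "Live Mesh")
    return paths


def reset_live_mesh(home, clean_slate=False, dry_run=True):
    """Remove the listed paths and return those that existed.

    With dry_run=True nothing is deleted; the existing paths are only reported.
    """
    affected = []
    for path in live_mesh_paths(home, clean_slate):
        if not path.exists():
            continue
        affected.append(path)
        if dry_run:
            continue
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return affected


if __name__ == "__main__":
    for path in reset_live_mesh(Path.home(), clean_slate=False, dry_run=True):
        print("would remove", path)
```

Calling `reset_live_mesh(Path.home(), dry_run=False)` corresponds to Method 1, and adding `clean_slate=True` corresponds to Method 2.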

Method 3: Never shut down or let your machine sleep ;-)

Obviously, this method isn’t practical, but I figured I’d mention it. Until Microsoft adds some code to the Mac client, it is probably worth trying to remember to shut down the Live Mesh client before you reboot or leave your machine in a state where it loses its connection with the network.

My guess is that the Microsoft developers aren’t listening for the right events, and therefore leaving things in a state where they don’t know how to recover. Most Mac apps are pretty smart about knowing when the machine is going to shut down, or when the network connection goes away, and handle the problem as gracefully as possible.

Live Mesh is still in beta, so it’s likely they will fix this before it becomes a real product. Like most Microsoft beta products, Live Mesh is still incredibly useful and solid on Windows. I’m hopeful it will get there on Snow Leopard as well.