Wednesday, December 7, 2011

Entering the Black Arts of Automation

Well, this is fun.  With all of the testing I do now on Ubuntu ARM, it is time to look at automating as much as possible.  The "fun" part is that I haven't done much coding or scripting since 2001 (when Perl ruled the day, and Python was barely in existence).  The other part of the problem (for me, anyway) is that most of the automation tools we currently use in x86/amd64 testing rely on things that just don't work on existing ARM platforms (kvm/libvirt, kexec, and ipmi, to name a few).  So existing scripts that do things like, say, "reimage a platform" need to be rewritten from scratch.

And I am torn between using the antiquated tools I know (or remember using in the past) and learning a bunch of new ones.  The overwhelmingly "helpful" responses I get when asking for help in certain areas usually amount to "Use this tool, it is easy."  Of course, those people are already well versed in that tool.  Try telling someone who only knows how to edit in vi how to use LaTeX.  Yeah, not going to happen.  Don't get me wrong, I am all for learning new tricks.  But I can't justify spending days learning how to do one task that someone who already knows the tool can whip together in a few minutes.

Also, I learn from books and examples.  I've already spent a lot of money updating my book library (2 Python books, XML, Expect, and a few others).  My 4x8 foot bookshelf is starting to sag under the weight of all the books I have accumulated over the years.  And, no, I don't like ebooks for this.  I just bought the Exploring Expect book for my Nook Color, and while the information is proving very helpful, it isn't as easy to bounce between sections as it is in a good print edition.

But I am making very good progress.  It used to take me 2-3 days to fully test each kernel SRU update cycle (4 platforms across 3 releases), most of it hands-on.  Now I can (almost) start a full test sequence with the click of a mouse and check the results in a day or two.  There are still a few kinks to work out (like automating the preseed configuration and monitoring the reimaging progress), but we're getting there.  Other parts of this testing were outside of my control (tests that fail because of configuration differences between ARM and x86/amd64 kernels, for example), but those are being resolved too.  I have also hit an issue in the past where an SRU kernel picked up a vendor update that disabled video on my test systems (well, it disabled HDMI in favor of the LCD port, which I don't have the hardware to test).  Had that kernel gone to the general public after testing only in an automated, headless environment....
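Back to the kinks: one option I am looking at for the "monitoring the reimaging progress" piece is pexpect, the Python port of Expect (fitting, given the books I just bought).  A rough sketch of what I have in mind, with the console command, timeout, and prompt strings all being placeholders:

# Minimal sketch: watch a board's serial console during a reimage and wait
# for a login prompt.  The console server command, log file name, and the
# prompt/panic strings are hypothetical; adjust them for the real setup.
import sys
import pexpect

CONSOLE_CMD = 'telnet console-server 7001'   # assumed serial console server
TIMEOUT = 60 * 60                            # a full reimage can take a while

child = pexpect.spawn(CONSOLE_CMD, timeout=TIMEOUT)
child.logfile_read = open('reimage.log', 'wb')   # keep a transcript for debugging

# Wait for either a login prompt (success) or a kernel panic (failure).
index = child.expect(['login:', 'Kernel panic', pexpect.TIMEOUT])
if index == 0:
    print('Reimage finished; the board is at the login prompt.')
    sys.exit(0)
print('Reimage failed or timed out; see reimage.log')
sys.exit(1)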

Once I have the SRU process fully automated, I can focus on automating other jobs.  They should be fairly straightforward, as the core work (reimaging) will be done, and I can just launch a job to install & run packages at will.
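Something along these lines is what I have in mind for those follow-on jobs (the host name, package list, and smoke test below are all placeholders):

# Minimal sketch: once a board has been reimaged, install a package set over
# SSH and run a quick smoke test.  The host name, package list, and smoke
# test are placeholders.
import subprocess

HOST = 'panda1.local'                    # hypothetical test board
PACKAGES = ['apache2', 'mysql-server']   # whatever stack the job needs

def ssh(host, command):
    """Run a command on the board and return its exit status."""
    return subprocess.call(['ssh', 'ubuntu@' + host, command])

ssh(HOST, 'sudo apt-get update')
ssh(HOST, 'sudo apt-get -y install ' + ' '.join(PACKAGES))

# Crude smoke test: is Apache answering on the board's localhost?
status = ssh(HOST, 'wget -q -O /dev/null http://localhost/')
print('smoke test ' + ('passed' if status == 0 else 'failed'))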

The other big (bigger) problem is (drum roll) infrastructure.  When I started down this route, I had 4 systems.  I now have 15.  Some of this can be deployed in our QA lab, mainly the headless stuff.  I'm not sure I could justify the expense of equipping the lab to do remote desktop testing (KVM/IP for HDMI is expensive, and then there is audio, bluetooth, etc).  Server stuff, yes.  Add to that the power relay I have (see earlier blog), which works fantastically...on 4 systems.  It is expandable up to 256 relays, but my personal budget...well...

Some people also think I should focus on automating the desktop testing somewhat.  Well: 1) the interfaces enabling any type of desktop automation are currently broken in GTK3, and 2) the desktop changes too rapidly to automate (GNOME 2 -> netbook-launcher -> Unity/Unity 2D in 4 cycles).  That means scrapping/rewriting a lot of tests every cycle.  Much better to get tests for the server/core stuff running now and hit the desktop later.  There is a lot that can be tested at that level that the desktop will also benefit from.


Hmm, I wonder if this blog could be automated.

Wednesday, October 19, 2011

The only constant in this equation is change.

With winter rolling in (practically overnight), it is time to focus my spare time on inside-the-house projects. I am remodeling our 1934 two-story (plus full basement) house: rewiring to current code, new plumbing, insulation, and drywall. First on my project list for this winter is my office. The ceiling fan/light is on the same knob & tube circuit as the rest of the lights in the house. And the single wall outlet in the room is shared with an outside plug, with a ground wire that is almost invisible (and to think I had as many as 15 computers running on this). There is almost no insulation, so over the last year the temperature in this room went from 65F (18C) in the winter with the furnace running to 85F (30C) in the summer with the air conditioning on. The problem is that my office is an addition on the south end of the house, so it gets sun all day long (when the sun is out). It also doesn't have a full basement, only a crawl space with a large opening next to the stairs to the back door.

In years past, I could regulate the temperature better by turning computers on or off, but since I am now doing ARM-based testing, my systems don't generate nearly the same amount of heat. My tower of 4 Pandas with drives consumes less than 20 watts of power. For comparison, the average x86 processor today uses 65 watts, and that is just the processor. Think in terms of light bulbs: a 60 watt bulb generates enough heat to burn your fingers if you try to remove it while it is on, whereas a 20 watt bulb is only slightly warm.

Back to this relocation project: I have moved all of my test systems and their 8' table to a big room in the basement, and everything is back online after reimaging my serial console server (serial-killer) with 11.10 server. The rest of the systems are booting fine and waiting for me to start slamming them with tasks. Now I just need to move my other 8' table downstairs, along with my desktop system and netbooks. This brings its own set of problems, as my current firewall is also up here, along with the DSL modem. The firewall is an old Pentium III-450MHz (clocked to 300MHz for passive cooling) running Mandriva 9.0 (2.4 kernel and just plain ancient). It has been my firewall since ~1999 (well before Ubuntu existed). I reinstalled it in 2003 when I moved, mainly because the 10GB drive in it had failed. It has been running since then, with the only downtime due to power outages. If it isn't broken ...

I have a new firewall system in a 1U rack case currently running in the basement. This system is based on a Pentium-M 800MHz with dual gigabit ethernet ports, and uses an 8GB CF card on an IDE adapter for the OS. It is currently running Ubuntu Server and manages DNS and DHCP for the house. It will also provide IPv6 control once it is the primary firewall. My only reservation is the downtime that will be inflicted when I move the DSL modem. When I built and configured my old firewall, it was set up so that it looks like a black hole at that IP address, neither responding nor timing out (unauthorized connections are put into a holding pattern instead of being dropped, a trick I learned during the Code Red worm days of 2001). I am having to relearn how to set up a firewall, though, as a lot has changed since 2003 (like, say, the kernel). I also had to run new phone wiring, as the existing wires are the old style 4-wire phone lines, all terminating at a ceramic block with 2 brass studs and nuts in the basement. Talk about scary. With any luck, I should have the new system fully online in short order. Hopefully the new phone line will also improve my connection speed.

Old Phone Wiring

New Phone Wiring
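As for the firewall rules themselves, I haven't settled on a final ruleset, but the default-deny skeleton will look something like the sketch below. The WAN interface name is a placeholder, and the TARPIT target (one way to approximate the old "holding pattern" trick) comes from the xtables-addons package:

# Minimal sketch of a default-deny inbound policy, driven from Python since
# that is where the rest of my scripting is headed.  The WAN interface name
# is a placeholder, and the TARPIT rule needs xtables-addons installed.
import subprocess

WAN = 'eth0'   # hypothetical WAN-facing interface

def iptables(*args):
    subprocess.check_call(['iptables'] + list(args))

iptables('-P', 'INPUT', 'DROP')       # silently drop anything not allowed below
iptables('-P', 'FORWARD', 'DROP')
iptables('-A', 'INPUT', '-i', 'lo', '-j', 'ACCEPT')
iptables('-A', 'INPUT', '-m', 'state',
         '--state', 'ESTABLISHED,RELATED', '-j', 'ACCEPT')
# Optional: tie up unsolicited TCP connections instead of just dropping them
# (the "holding pattern"); TARPIT comes from xtables-addons.
iptables('-A', 'INPUT', '-i', WAN, '-p', 'tcp', '-j', 'TARPIT')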

Friday, September 30, 2011

I survived (sort of).

It has been almost 3 months since my last post. During that time, I have expanded my pool of ARM systems and made some significant infrastructure improvements, including a 4TB dedicated file server to replace the heavily overloaded 1TB server that runs everything. This new server was built for a little more than the cost of a good monitor (~$250 US). The advantage is that instead of two striped 500GB drives, I now have four 1TB drives in a proper RAID. I can now mirror the entire ARM archive, not just main and restricted. This will also include the source packages, but I will wait until just before release to start pulling those.
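The mirror itself is nothing fancy; roughly the debmirror call sketched below, run from cron (the target path and suite list here are just for illustration):

# Minimal sketch: mirror the armel port of the archive with debmirror.
# The target path and the suite list are illustrative only.
import subprocess

subprocess.call([
    'debmirror', '/srv/mirror/ubuntu-ports',   # hypothetical target path
    '--host=ports.ubuntu.com',
    '--root=ubuntu-ports',
    '--method=http',
    '--dist=oneiric,oneiric-updates,oneiric-security',
    '--section=main,restricted,universe,multiverse',
    '--arch=armel',
    '--nosource',                              # hold off on source until near release
    '--progress',
])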

I have also learned a great deal about the various server loads and setups. The stuff that wasn't well documented I have written up on my testing wiki. Other tests are already documented within ubuntu.com, either under testcases or general wiki pages. Now that I have these documented and tested, I can turn that documentation into some sort of automation run (well, for most of the tests).

The biggest test I ran was taking all 6 Pandas and turning them into a clustered filesystem. Imagine a cluster of cell phones...how cool is that? Slow, but still cool. I also helped track down an issue with USB on these devices where USB drive I/O performance would increase 10x if you flood-pinged the system while it was doing heavy I/O. Still slower than USB on a PC, but at least it is now respectable.
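If anyone wants to see the USB oddity for themselves, reproducing it boils down to something like this sketch (the board address, mount point, and transfer size are placeholders, and ping -f needs root):

# Rough sketch: time a USB-drive write on the board with and without a flood
# ping running against it.  The board address, mount point, and transfer size
# are placeholders; ping -f has to run as root.
import os
import subprocess
import time

BOARD = '192.168.1.50'   # hypothetical Panda address
DD = ('dd if=/dev/zero of=/media/usb/testfile '
      'bs=1M count=256 conv=fdatasync')

def timed_write():
    start = time.time()
    subprocess.call(['ssh', 'ubuntu@' + BOARD, DD])
    return time.time() - start

print('quiet network:    %.1f s' % timed_write())

devnull = open(os.devnull, 'wb')
flood = subprocess.Popen(['sudo', 'ping', '-f', BOARD],
                         stdout=devnull, stderr=devnull)
try:
    print('under flood ping: %.1f s' % timed_write())
finally:
    flood.terminate()   # stop the flood ping (sudo relays the signal)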

Unfortunately, with all of the server work I have been doing, I have been unable to work on porting Ubuntu to my Nook Color. The current port that I have seen online relies on VNC, with no on-screen support. WTF? Since the main SoC is very close to that of a BeagleBoard-xM or Droid 2, it shouldn't be too hard to figure out. Just need time.

Well, there are 3 weeks left before release. And since we are in Final Freeze, only critical bugs are getting fixed, which lets me relax on my daily testing a little. I plan to use the time to work on automating the server tests and to write up blueprints for the next release cycle. Maybe I will get more hardware, like an actual ARM server.

One can dream.

Tuesday, July 5, 2011

Validate This!

So, for my first work-related post, I thought I would describe the insane world I am creating for myself.  I am a QA tester working on the Ubuntu armel platform.

I have one specific talent for this: I find the bugs that make engineers bang their heads on the wall in frustration.  Some of my early work (early '90s) is still listed in Microsoft DevNet (Access date formatting bug, database corruption issue, etc).

My office is (was) an unused bedroom in my house.  As part of my remodeling project, I have wired the entire house with gigabit Ethernet and 802.11n WiFi (for when I am either outside enjoying the Oregon weather or in the bathroom running things remotely).  I have 4 Intel systems and 15 ARM systems currently online in my office, plus a rack cabinet in the basement housing my main server (Apache, MySQL, PostgreSQL, Quassel Core, MediaTomb, Jenkins, and a mirror of the Ubuntu binaries updated hourly), my firewall, and two other ARM systems that I keep alive for SRU update testing.  Other systems are also running in the house, but I don't care about them as much (my wife's Windows box, for instance).

The last few years have revolved around testing the desktop/netbook images for Ubuntu on ARM platforms (Freescale i.MX51, Marvell, TI OMAP3/4).


Most of this testing involves making sure the apps just work.  Not a lot of them were developed with ARMv7 in mind, let alone SMP on ARM (which opens up a whole new barrel of monkeys).  For example, one fine piece of code had this comment in its atomic memory handler for SMP:
/* SWP on ARM is very similar to XCHG on x86.  Doesn't lock the
* bus because there are no SMP ARM machines. */
Erm, yeah.  Sure. 
This cycle, we are starting to do server stacks on ARM (cue maniacal laughter).  To add to the "fun", the current hardware I have available is essentially equivalent to a cell phone (Droid cluster computing, anyone?).  With the greatly expanded number of systems now cluttering my newly dedicated test table, I "can" test a plethora of server stacks.

I say "can" figuratively, as most of the test automation tools I need to implement revolve around x86/amd64 (virtualization, CD ISO files, etc).  Not easy when the boot method revolves around an SD card.  Access to these test systems will be primarily over serial console, with SSH only available after install.

Don't get me wrong, I am in no way saying this hardware is bad.  Just that this particular system (of which I have an odd abundance - 5 online and 2 on order) is really designed for cell phone and tablet/smartbook use.  There is no SATA, only 100Mb Ethernet hanging off the USB bus, etc.  What this system excels at is driving two 1080p monitors with HD video playback.  It does have 1GB of memory and built-in WiFi.  It will make a great nettop unit, and I recently heard that it is the first platform to get HD certification from Netflix on Android.

But running a server stack can be a lot different.  Sure, it should be able to handle DHCP and DNS loads without yawning, but when you get into NFS, SQL, or LAMP stacks, things may get a little dicier.  At least in theory.

My main goal for this release (Oneiric Ocelot) is to ensure that the software we want to run can actually run on ARM SMP (with none of the code issues listed above).  Don't expect benchmarks from me, other than a cursory "we ran it and it worked".  I'm not interested in making it fail any faster than it already might at this stage; my goal is to make sure it runs.  I will be running benchmark suites, mainly because they are already written and are good at stress-testing the entire stack.  Any benchmarks I generate will only be useful for making sure things don't get worse between now (Alpha 2) and release in October as features are added and fine-tuned.

Stick with me as I add more systems to my rack cabinet in the basement for various tests (iSCSI host, web client traffic generators, NFS root server, etc).  I may even throw a Windows Server system down there for Active Directory testing.

So, with that in mind, cue the maniacal laughter, get your freak on, and join me on my journey into madness.