21 Jun 2012

Rails test coverage: sometimes 100% is just right

DHH, the éminence grise of the Ruby on Rails world, took a swipe at the test-first cult with his provocative article "Testing like the TSA", saying in effect that 100% test coverage is mad, bad, crazy behaviour, worthless, and an overall affront to good taste and a crime against humanity. [I paraphrase.] Since we enforce 100% code coverage at all points through our development process, I want to explain how this does not necessarily make us time-wasting, genital-fondling idiots, how the needs of our business drive our quality strategy, and how this pays off for us.

Quality Stamp

At Sage our customers demand and deserve the best we can deliver. We are very quality focused because we build accounting solutions in which getting the right answer matters a great deal: perhaps some customers don't care about quality, but ours demonstrably do. Perhaps in some cases time-to-market is much more important than reliability or maintainability: it is a business decision, and there is no one-size-fits-all answer. However, if you're building for the future and want to avoid years of functional paralysis and a costly rewrite, building an application on a solid quality foundation makes a lot of economic sense.

Write less code

The most effective way to maintain 100% test coverage is by writing less code. We refactor like crazy, and we refactor our tests just as much as our code. We don't repeat ourselves. We spend time creating the right abstractions and evolving them. Having 100% test coverage makes it much easier for us to do this: it is a virtuous cycle.

We've been doing Rails development at Sage for five years now, and we've learned a few lessons. Even if you're writing unit tests with 100% code coverage, you're doing it wrong if:

  • Generators are used to build untested code (i.e. using the default Rails scaffolds to build controllers and views)
  • Partials are the most sophisticated method of generating views, and they look like PHP or ASP
  • The tests are harder to understand than the code


What is the alternative? Well, if all of the controllers and views look pretty much the same, factor them out. The Rails generators create enormous amounts of crappy, unmaintainable boilerplate code – every bit as much as a Visual Studio wizard. On the other hand, if the controllers and views are each completely different and unique flowers, is it for a good reason or is the code just a mess? Chances are, if the code looks like a mess, so does the app.

In my experience it's also basically useless to attempt to retrofit unit test code coverage onto a project that doesn't have it: the tests that wind up written are always written to pass, and they rarely help much. I haven't yet seen a project that could be rescued from this situation.

Whom do you trust?

When DHH says that the use of ActiveRecord associations, validations, and scopes (basic Rails infrastructure) shouldn't be tested, he's claiming that Rails is never wrong: not now, not in the future, not ever. It's his choice to make that promise, but it would be irresponsible of us to believe it:

  • Rails changes all of the time. Sometimes there are even bugs! (Crazy talk, I know!) But ActiveRecord associations and scopes are complex and ornery, and can easily be broken indirectly (through a change elsewhere in the code).
  • Because we operate on the Internet, new security risks and fixes appear constantly: zero day attacks are real. We need to react to these threats quickly, and being able to prepare and deploy new versions of our apps based on updated components immediately is crucial. Having a robust test suite makes it much cheaper and less stressful to implement these changes, which drives down technical debt and makes development more responsive, and oh yeah, helps prevent a costly rewrite.
  • We use components that extend and complement the behaviour of Rails. DHH calls out the example of testing validations to be particularly useless. Well, what about when the validation methods change in a Rails upgrade? Or you want to adopt a new plugin that changes core Rails behaviour? Or you want to refactor an application to move validation to a more useful place? In all of those cases the tests on validation code would be useful.

Often this means a function in a spec mirroring a function in a model (but with enough difference in naming and syntax to be truly maddening). Yes, this feels stupid sometimes, but it is a very cheap insurance policy, and sometimes it pays off.
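To make the "spec mirroring a model" point concrete, here is a minimal sketch in plain Ruby. The CreditNote class, its two rules, and the check helper are all hypothetical stand-ins so the example runs without Rails or RSpec; in a real application the model would be an ActiveRecord class and the checks would live in a spec file.

```ruby
# Hypothetical stand-in for a Rails model. In a real app these rules would be
# declared with ActiveRecord validations rather than written by hand.
class CreditNote
  attr_reader :number, :amount, :errors

  def initialize(number:, amount:)
    @number = number
    @amount = amount
    @errors = []
  end

  # Model side: two rules, stated once.
  def valid?
    @errors = []
    @errors << "number can't be blank" if number.to_s.strip.empty?
    @errors << "amount must be greater than 0" unless amount.is_a?(Numeric) && amount > 0
    @errors.empty?
  end
end

# Hypothetical, framework-free replacement for an RSpec "it" block.
def check(description, passed)
  puts "#{passed ? 'ok' : 'FAILED'}: #{description}"
  passed
end

# Spec side: the same two rules again, stated from the outside.
results = [
  check("valid with a number and a positive amount",
        CreditNote.new(number: "CN-1", amount: 10).valid?),
  check("invalid without a number",
        !CreditNote.new(number: "", amount: 10).valid?),
  check("invalid with a zero amount",
        !CreditNote.new(number: "CN-1", amount: 0).valid?)
]
```

The duplication is the whole point: if an upgrade or a plugin quietly changes how either rule behaves, the mirrored check is what notices.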

Time split

Coffee mug reading 'I ♥ Spreadsheets'

DHH says that you shouldn't be spending more than 1/3 of your time writing tests. This leads to a question: how are you characterizing your time? Is the person doing the implementation also the person making design decisions? If you are doing behaviour-driven development you are actually vetting the requirements at the time you write the tests, so is it a good idea to skip that part and move on to the coding? If you spend time refactoring tests to speed up the test process, should that be counted? Should the time spent writing tests before fixing bugs be counted? Have you decided to outsource quality to a bunch of manual testers? What is your deployment model? I'm reluctant to put a cap on the time writing tests. I find this metric as useful as dictating the time spent typing vs. reading, or the amount of time thinking vs. talking: my answer is not yours, and the end result is what matters.

Risk assessment

We enforce 100% test coverage because it ensures that no important line of code goes completely untested. One can decide to write tests for "important" code and ignore the "unimportant" code, but unfortunately a line of code only becomes "important" after it has failed and caused a major outage and data loss. Oops!

Road sign: reality check ahead

DHH avers that the criticality and likelihood of a mistake should be considered before deciding to write a test about something. However, this ignores a third criterion: cost. Is it cheaper to spend time deciding the criticality and likelihood of writing vs ignoring tests for every single line of code, or is it cheaper to just write the stupid test and be done with it? Given the cost of doing a detailed long-term risk analysis on every line of code, does anybody ever really do it, or is the entire argument just an elaborate cop-out? The answer gets a lot clearer once you elect to write a lot less code, and it gets easier once you resign yourself to learning a new skill and changing your behaviour.


Code coverage is a great way to measure the amount of exposure you have to future changes, and depending on your business, it might be necessary to have 100% coverage. A highly respected figure speaking ex cathedra can be very wrong when it comes to the choices you need to make, and sometimes it shows. 100% code coverage may seem like an impossible goal, especially if you've never seen it done. I'm here to tell you it's not impossible: it's how we work, and in our case it makes a lot of sense.

20 Jun 2012

Sympathy for the trolls: do everyone a favour and walk away from fights online

Everybody's been a troll at one point or another. Sometimes we know when we're trolling, but mostly we're just having a bad day (or week, month, year) and we take it out on someone else.

Sometimes we take it out on customer service representatives on the phone. That's the trolling I'm particularly guilty of, and the one I'm most ashamed of: for some reason I always find myself railing at the phone company. (Which phone company? Any of them.  All of them.)  Thankfully, although those conversations are recorded, they aren't (yet) transcribed and posted publicly.  (Now that would make people behave better on the phone.)

But mostly, it's a case of Duty Calls (when Someone is Wrong on the Internet) and we allow a disagreement to escalate.  I had one of these happen to me today, and I (for once) didn't make it worse.  I was proud of myself because I responded with grace and humour, attempted to defuse the situation, and didn't respond when it got truly nasty.

Instead of mixing it up and making things worse, I went for a walk with my husband and dog, where we saw flames peeking out of a big paper recycling bin.  My husband got someone to call the fire department while I ran to find a fire extinguisher, and then another when the police had used that one up and the fire had reignited.  Very exciting, but at least the fire didn't get out of control for very long before the firefighters arrived.

Later I was especially pleased that I didn't bite because I looked at the troll's Twitter account and saw that he was genuinely upset, having left the forum because of poor quality conversations (apparently a pattern he is experiencing). A woman I know once cried in frustration, "why is it that wherever I work, there's always some bitch who makes life miserable for me?"  Indeed.

Nothing I said could have made it better: he needed me to be the villain.  Okay then.  But that doesn't mean I had to feed the fire; I didn't need to correct him, I didn't need to have the last word, and I didn't need to humiliate him.  He took care of it himself.  And it turns out I really did have better things to do.

In closing: not the most mature choice of videos, but hey, gotta be me.

"Fire! Fire! Come in through the back door...
Fire! Fire! I want to be a fireman... and handle your hose."

13 Jun 2012

Rails i18n translations in Yaml: translation tool support

With Rails 2.2 the i18n API was introduced with a new method for translations.  Instead of embracing the venerable gettext which had been the previous standard, the Rails team invented a new way using Yaml files.  The result is a particularly graceful, flexible and very Rubylike way of specifying translations.  It also is much more reliable than gettext, which had many inscrutable issues with locales and caching, and sometimes caused people to get things in the wrong language.  So: bravo, great job.

But to do this, they specified their own translation format, the very flexible Yaml file. There are already a lot of formats floating around, and translation tool vendors and open-source translation developers have been working for a long time on conversion tools between them.  The Translate Toolkit and Pootle emerged from South Africa (a country which groans beneath the weight, or rather revels in the glory, of eleven official languages) and provide an excellent web-based tool for collaboration, centered around gettext PO files.  However, poor little Pootle started a migration from Python to Django, and we all know how rewrites go.  [Halfway. Badly.]  But Translate Toolkit supported a lot of formats:

  • moz2po - Mozilla .properties and .dtd converter. Works with Firefox and Thunderbird
  • oo2po - OpenOffice.org SDF converter (See also oo2xliff).
  • odf2xliff - Convert OpenDocument (ODF) documents to XLIFF and vice-versa.
  • prop2po - Java property file (.properties) converter
  • php2po - PHP localisable string arrays converter.
  • sub2po - Converter for various subtitle files
  • txt2po - Plain text to PO converter
  • po2wordfast - Wordfast Translation Memory converter
  • po2tmx - TMX (Translation Memory Exchange) converter
  • pot2po - initialise PO Template files for translation
  • csv2po - Comma Separated Value (CSV) converter. Useful for doing translations using a spreadsheet.
  • csv2tbx - Create TBX (TermBase eXchange) files from Comma Separated Value (CSV) files
  • html2po - HTML converter
  • ical2po - iCalendar file converter
  • ini2po - Windows INI file converter
  • json2po - JSON file converter
  • web2py2po - web2py translation to PO converter
  • rc2po - Windows Resource .rc (C++ Resource Compiler) converter
  • symb2po - Symbian-style translation to PO converter
  • tiki2po - TikiWiki language.php converter
  • ts2po - Qt Linguist .ts converter
  • xliff2po - XLIFF (XML Localisation Interchange File Format) converter

On its heels, Google introduced the Google Translator Toolkit, which lets you use the Google Translate engine to suggest translations (based on its own databases or translation memories you provide).  It also does the core of what Pootle does (collaboration and access) but without the crashing and flakiness, and it works with its own list of formats.

But neither of them supports Yaml files.  Unfortunately, tooling support libraries have not embraced this format in the intervening two and a half years.  I did find one solution: i18n-translators-tools, which supports conversion between Yaml and gettext PO files, but it's somewhat idiosyncratic, and it turns out there's a good reason why there isn't a straightforward Yaml ↔ PO converter: the PO format consists of name-value pairs with metadata, and the Yaml format is a tree.
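The tree-versus-pairs mismatch is easy to see in a few lines of Ruby. This is only a sketch using the standard yaml library, not how i18n-translators-tools does it; the flatten_tree helper, the EN_YAML constant and the "en" root key are assumptions for illustration, with key names modelled on the credit note example.

```ruby
require "yaml"

# Hypothetical English source file; the "en" root is an assumption.
EN_YAML = <<~YAML_TEXT
  en:
    page_info:
      fuji_sales/sales_credit_notes:
        title:
          default: "Sales Credit Note"
          new: "New Sales Credit Note"
YAML_TEXT

# Walk the tree and emit one "dotted.path" => "text" pair per leaf,
# which is the shape a PO file wants.
def flatten_tree(node, prefix = nil)
  node.each_with_object({}) do |(key, value), flat|
    path = [prefix, key].compact.join(".")
    if value.is_a?(Hash)
      flat.merge!(flatten_tree(value, path))
    else
      flat[path] = value
    end
  end
end

tree = YAML.safe_load(EN_YAML)
pairs = flatten_tree(tree["en"])
pairs.each { |key, text| puts "#{key} => #{text}" }
```

Going from tree to pairs loses nothing as long as the dotted path travels with each string; the awkward part is carrying that path through a format that was designed around the source text as the key.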

English source Yaml file:

    date: "Date"
      default: "Sales Credit Note"
      new: "New Sales Credit Note"

Spanish Yaml file produced by i18n-translators-tools from a PO file:

    date: "Fecha"
        default: "Sales Credit Note"
        translation: "Crédito de venta"
        default: "New Sales Credit Note"
        translation: "New Sales Credit Note"

There are some interesting things going on here: the Spanish Yaml file provides fallbacks so untranslated strings don't come through as blank.  The intermediate gettext PO file keeps the tree structure in the msgctxt metadata, and looks like this:

msgctxt "page_info.fuji_sales/sales_credit_notes.title.default"
msgid "Sales Credit Note"
msgstr "Crédito de venta"

msgctxt "page_info.fuji_sales/sales_credit_notes.title.new"
msgid "New Sales Credit Note"
msgstr "New Sales Credit Note"

So it's possible to use Google Translator Toolkit to translate your Rails Yaml files, provided you use the i18n-translators-tools library to do the conversions, and configure your Rails applications to support fallbacks.
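As a rough illustration of the return trip, the sketch below rebuilds a tree from PO-style entries by splitting msgctxt on dots, then looks a key up with a fallback to the source string. It is not the i18n-translators-tools implementation: the entries are hand-written hashes copied from the PO example rather than parsed from a file, and build_tree and lookup are hypothetical helpers.

```ruby
# PO entries from the example, written as plain hashes. A real converter
# would parse these out of the .po file.
ENTRIES = [
  { msgctxt: "page_info.fuji_sales/sales_credit_notes.title.default",
    msgid: "Sales Credit Note", msgstr: "Crédito de venta" },
  { msgctxt: "page_info.fuji_sales/sales_credit_notes.title.new",
    msgid: "New Sales Credit Note", msgstr: "New Sales Credit Note" }
]

# Rebuild the tree: every msgctxt segment except the last becomes a branch,
# and the leaf keeps both the source text and the translation.
def build_tree(entries)
  entries.each_with_object({}) do |entry, tree|
    *parents, leaf = entry[:msgctxt].split(".")
    branch = parents.reduce(tree) { |node, key| node[key] ||= {} }
    branch[leaf] = { "default" => entry[:msgid], "translation" => entry[:msgstr] }
  end
end

# Prefer the translation; fall back to the source text when it is missing.
def lookup(tree, dotted_key)
  node = tree.dig(*dotted_key.split("."))
  translation = node["translation"]
  (translation.nil? || translation.empty?) ? node["default"] : translation
end

tree = build_tree(ENTRIES)
puts lookup(tree, "page_info.fuji_sales/sales_credit_notes.title.default")
puts lookup(tree, "page_info.fuji_sales/sales_credit_notes.title.new")
```

Keeping the default next to the translation is what lets an untranslated string show up in English instead of as a blank.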

6 Jun 2012

Job satisfaction: the passionate dermatologist, the chair, and the metal hook

As a teenager I had terrible skin.  It was just average-bad until I was nineteen, at which point it went absolutely Vesuvian, requiring Accutane to tame it.  Each pill came in a nearly impregnable (hah) blister pack which required multiple steps to open: first, slide open the box and see the outline of a hydrocephalic fetus; second, remove a serrated paper tab with the silhouette of a pregnant woman and the red ban-symbol through it, and finally pop the liquid-filled gel pill through a rather tough plastic laminated foil surface.  (The packaging shown at right is nowhere near as extreme.) I would later give the paper tabs with the "no pregnancy" symbol to friends and co-workers to encourage contraception.  But I digress.

Rewind slightly: at sixteen I was taken to a dermatologist.  I went with my mother, who always tried very hard to get me to take care of my skin.  I can't count the number of doctors who looked at my face, looked again, mumbled uncomfortably, then handed me a bar of Purpose soap and gently indicated that I might try using it.  I didn't, of course - the only thing I was willing to do was take the erythromycin that would stain my teeth.  I went through a couple of dermatologists over the years.  But I digress.

So, the dermatologist: somewhere in the northern woods of Fulton County, a rather long drive from my parents' house in Marietta.  Picture me hurtling across the suburbs in a gray 1979 Ford Mustang: one time I recall taking a corner through a freshly red light, tires squealing as I accelerated through second gear, laying on the horn to make sure the other people didn't act too soon on their green light and get in my way.  But I digress: obviously I'm stalling.

The dermatologist was a woman in her mid-to-late thirties, average height, and sensible black hair.  Her skin had obviously been ravaged by acne.  She did the usual first visit: yes, here's your bar of Purpose soap, your prescription for Retin-A, your bottle of pills, whatever.  Then she put me in a reclining dentist's chair, fixed the spotlight on my face, and proceeded to squeeze out each and every one of my blackheads through use of a metal implement.  It hurt like hell: she pressed with that damn thing really hard, all across my nose, forehead and cheekbones.  Push down, scrape across, wipe on a tissue, repeat.  No small talk, no lectures, just intense concentration.

This took at least forty-five minutes, and at least a half hour on each of my subsequent visits.  Each time my face would eventually stop producing the goods and she'd reluctantly let me go.  This was back in the days of full insurance, no copays, and no referrals, when you could go to a specialist all you wanted, for anything you felt like doing.  I have no idea what this woman charged, and it probably wasn't cheap, but she was doing the whole thing herself: no receptionist, no assistants.  Just her versus the zits.

The last time I went she came out to the waiting room to get me.  I stood up and said hello to her, and she never looked me in the eyes: she mumbled hello as she started scanning my face.  I sat through that last agonizing session as she pushed, scraped, and wiped my throbbing face.  Yes, I stopped going because it was a long way from home, because it was painful, and because it didn't stop the pimples that actually bothered me, but mostly I stopped going because that woman creeped me out.

She obviously loved her job, but she loved it way too much: she was a sebum junky, a zit juice vampire, a woman on a mission of vengeance against the acne that had obviously scarred her for life.  To this day, when people talk about being "passionate" about their job I think of her.

27 Apr 2012

An elegy for sweet forgetfulness, soon to be lost forever

In my memory, I'm standing on the Île Saint-Louis, looking at a butcher shop.

But was I ever there? I've provably been in Paris. I've likely been on the Île Saint-Louis. After that I don't know. In my mind's eye I can picture it and picture myself there, but my mind's eye is a notoriously filthy liar. I can remember any number of events that never happened, and I have forgotten many important events that did.

If I was never there, where did this memory come from? It could have been Edmund White, whose evocative novels of his life in Paris have always brought the city to life for me. Reading Declare by Tim Powers brought these memories back, and added wartime paranoia and Nazi intrigue to the mix.

I'll never know for sure whether I've been there before. My previous visits to Paris were before the era of ubiquitous surveillance, GPS cellphone tracking, Google Latitude, Foursquare and ultrazillions of digital photos being taken of absolutely everything at every moment and being pasted online. So even once all of the artificial "privacy" barriers are dropped, once indexing and face recognition systems correlate every sparrow fart since the dawn of the digital age, once every credit-card purchase record is cracked open and something like Vernor Vinge's GreenInc provides a complete personal history of every human, nobody will be able to say with any degree of clarity whether that memory is true or false.

I weep for the children. Their digital trail will never allow them to erase their personal history and start over. No more retrospective virginity restorations. No more he-said, she-said he-did. No more bonfire of the diaries for personal reinvention. Everyone will become a politician denying their words of the day before, followed by an immediate multi-POV video playback with subtitles, location tags, and links to probable original sources shown in the goggles of everyone around them.

On the other hand, I weep with joy for the children. Memory prostheses will make arguments quite different: instead of arguing whose recollection is more accurate, people's agents will automatically debate the relative authoritativeness of the certificate chains and trust authorities of the different sources of evidence. When professionally photoshopped memories, reputation laundering, real-time distributed consensus auctions and whitelisted memory attestation services become common we just won't worry about it anymore. We won't argue about trivia.

Maybe I'll steer clear of the Île Saint-Louis on my next trip and leave the past alone, whether it's mine or borrowed from somebody else. I'll just preserve my own personal mythology a little bit longer.

17 Apr 2012

Homogeneous web development: Meteor, Derby, Firebase and the portents of doom

A variety of new web frameworks are being cooked up that allow you to write one set of seamless code for the client and server.  It's a problem that has haunted the web development community since the dawn of JavaScript and the DOM.  One approach is to basically define the database operations on the client.  Does that sound like a good idea, or does that sound like a great idea?

Meteor: exposes the MongoDB API directly on the client to work on automatically-synced data subsets. What could possibly go wrong?  Let's name the project after a flaming ball of rock and find out for sure!

Derby: Is client-side MVC too confusing? Is Node.js too immature? Let's combine them and see what happens!  (It remains to be seen whether Derby is named after a hipster hat or a county fair event.)

Firebase: "We have a full security system in the works that will allow you to control read and write access on individual locations in Firebase on a per-user basis. However, it’s not ready for widespread use yet, so right now all data in Firebase is publicly accessible. Please keep this in mind when building apps! Please contact us if you need security or want to be one of the first to try out the new system." *

Despite my scornful tone, I'm actually very optimistic on these technologies and very hopeful that at least one of these will be ultimately successful.  I'm also really happy that I'm not going to be the first person trying to build an application on this stuff. Given the theme of the project names, it's fair to say that most early adopters will get burned.

* Yes, that's a direct quote.

11 Apr 2012

Taste matters: why I should have known better than to use GoDaddy

Years ago I registered several domain names.  They were a lot cheaper then, and because I didn't want to think about which registrar to use, I went with the cheapest and most popular one: GoDaddy.

I did it despite their stupid, vaguely patriarchal name.  I did it even despite the blatantly sexist advertisements.  I told myself that they were just doing what they had to do to bring in customers, that it really didn't matter.  I silenced my doubts and gave them my money.

Since then, GoDaddy's behaviour has been increasingly tacky, insulting, and just plain bad for the Internet and its users.  I'm moving all of my domains onto another registrar, and although it is a pain in the neck, it's the right thing to do.  The lesson for me is that taste matters.  If a company seems distasteful to you initially, they're likely to offend you later — and they'll be doing it with your money.

28 Mar 2012

Second, Third, and Fourth-Order Effects of Social Marketing and Mass Securitization

Several years ago, Facebook founder Mark Zuckerberg crowed that he was able to use the database to retroactively predict with 33% accuracy with whom people would hook up a week later. This was widely viewed as very creepy (and was not spoken about again until recently) but you can guess that this was a dog whistle meant for potential advertisers. The advertisers have listened, and now Google is scrambling to catch up with Facebook on social search (and then advertising).

It’s impossible to get clear numbers on how well this stuff works. Even Facebook and Google probably have no clear numbers, but they certainly have clear enough indications. Google obviously has a clear enough indication to reform their entire company around this. So we can assume it is real. It all seems plausible enough, right?

So we can easily assume that this trend will continue, and that Google and Facebook will correlate increasing amounts of data on us, our friends, our coworkers, and the people we encounter, and will sell this data to advertisers who will essentially be placing bets on our behaviour. If there is a 27% chance that a given couple will marry within the next nine months, then there is a 14% chance that each of their closest long-distance friends will want to buy a plane ticket to the ceremony. Therefore, as an advertiser, you buy a tranche of ads for people whose out-of-town friends are soon to marry. The MapReduce job is an exercise for Google’s new Malaysian coding shop; the tranche is sold to the highest bidder via AdWords. Bada-bing, ca-ching.

As a second-order effect, this advertising activity begins to affect the behaviour of these out-of-town friends. A measurable jump in the number of people attending out-of-town weddings results, and the price on these ads consequently rises. Advertising grows markets all the time, so this is not surprising.

Now we emerge into science fiction-ville. An analyst-bot for a huge trading firm is trawling the AdWords marketplace, looking for interesting tranches for which the price has become overweight, and happens upon the out-of-town weddings advertising market, which is suddenly hugely oversubscribed. It pops up on the screen of a junior analyst (of the human variety) who clicks through to approve the creation of an out-of-town weddings futures market, which the trading firm then (automatically) proceeds to sell to its customers, and then (automatically) takes a short position.

An analyst-bot for one of the advertising agencies flags this new offering, and raises it to the desk of the (human) product manager for this market. She promptly buys into the futures market, betting that the market will rise. She talks to an executive VP and gets approval to buy a large product placement with a popular television show to feature a destination wedding as an upcoming plot. She does not get approval for a proposed contribution to a PAC formed by the National Organization for Marriage, as the VP is gay and cites the growing market for same-sex weddings.

Of course, this assumes that the securitization of everything will continue apace. Certainly there has been no progress in stemming the tide, and I don’t expect it to happen (barring a bloody worldwide insurrection against the dominant economic order).

What are some other examples of the weird things that could result from social marketing combined with this level of financial automation?

  • A new global baby boom triggered by businesses embracing new market development, caused by an algorithmic storm of projected demand for diapers, crude oil, softwood lumber, and manual labour. [The whole thing is triggered by a rounding bug in an Excel spreadsheet.]
  • Investment banks engage in wide-scale manipulation of tampon supply futures indexes by using sponsored advertisements to influence birth control method preferences so that women favour Depo-Provera over oral contraceptives.
  • The Corrections Corporation of America gets into a bidding war with Indian defense contractors on a cheap-labour-supply futures index, which is based on the relative probability of incarceration due to attempted drug sales by American teens.  The Indian defense contractors are shorting this to offset their own risk (due to the effect of rural broadband penetration shortfalls on the gold mining talent pool), and the market becomes very volatile.  To ease this situation, the CCA makes a large automated contribution to a tough-on-crime SuperPAC.
  • Asperger's patients become a new hot dating commodity, as their profiles are moved to the top of the activity ranking by social networks who wish to boost their visibility to advertisers who are bidding extremely highly for their ad dollars.  Social networks optimize their users' lives to improve their value to advertisers.  This results in nerds getting laid a whole lot more, and lots more little Asperger's-prone nerdlings (who have truly wonderful advertising potential).

So just remember kids, just because you don't click on those ads in Facebook doesn't mean that those ads aren't clicking on you. And with Google+ and Facebook embedded in every single webpage, you can run, and you can hide, but you cannot avoid being aggregated, and those aggregations will be monetized until they control your every move. Resistance is futile.

Re-reading this hours later I realized that what I'm describing here is a much less rosy portrait of the same technological trends outlined by Bruce Sterling in his seminal short story Maneki Neko back in 1998. Except of course his story has excellent characterization, plot, and narrative drive.

26 Nov 2011

Bankers cautious on Eurozone breakup

While some banks are calling for contingency plans in the case of the breakup of the Eurozone, a large majority of investment banks, hedge fund managers, and responsible commentators are saying "hold on a minute."

"Where am I supposed to put the money?" said Lloyd Blankfein, CEO of Goldman Sachs. "As CEO, I'm committed to maximizing corporate profits, and while Goldman Sachs will obviously make bank on this one that will make the real estate bubble look like a pinprick, I personally haven't even figured out what to do with my bonus from that yet."

Most economists agree that a breakup of the Eurozone would mean an immense realignment of riches, and complain that the term "percent" is becoming particularly unwieldy in describing the distribution of wealth. "My friends and I can't even explain how rich we are anymore. Saying 'We're the 0.00001%' is too hard to understand," lamented hedge-fund manager Peter Thiel. U.S. lawmakers are studying proposals for mandating use of a new term "permega", in which percentages would be replaced by fractions based on a million, as part of the recently introduced Division Is So Hard (DISH) Act.

Tax experts welcome the possibility of change, saying that they are running out of zeros when expressing tax liability to their clients. "'You need to pay 0.0000001% of your income in tax this year' is just too hard for many of my esteemed colleagues to follow," said H&R Block CEO William Cobb. "Keep in mind, most investment bankers have graduated in the past 5-10 years from top educational institutions in the United States, and although they all got straight As and graduated at the tops of their classes, so did everyone else."

Although momentum has been significant on the DISH Act, with ten co-sponsors in the Senate, opposition is growing. Senator Chuck Schumer (D-NY) is leading the charge against such a change, arguing that math isn't the problem and the tax code itself is at fault. "We can't keep applying the same logic, putting another zero in front of the tax burden of our nation's most productive workers. My smartest staffers tell me that by simply putting a minus sign in front of the tax rate we could reverse the trend and start getting those numbers under control, especially for workers in the affected brackets. We wouldn't have to invent a new symbol, either."

Notwithstanding these efforts by legislators to soften the blow, plans are proceeding apace to launch new currencies in place of the Euro. "We're still trying to work out how we can absorb all of the wealth of the Euro, as well as all of the wealth in the new currencies that will be introduced," said Mr. Blankfein. "We applaud the efforts of our friends in European governments to include us in discussions on how to build a variety of efficient banking systems that will benefit world economic growth."

However, Mr. Blankfein sounded a note of caution on current U.S. legislative efforts to ease the transition. "Of course we appreciate all of the help we can get, but right now that's up to the European Union. The U.S. government should stay focused on efforts to finance the next round of bailouts."

8 Nov 2011

AGPL revisited: how MongoDB licensing differs from MySQL

Now that the Affero General Public License (AGPL3) is actually being used by successful projects, I'm looking at it again. Specifically, MongoDB is AGPL3 licensed, and it is being used for commercial applications. But how?!? I thought the AGPL was complete communism, and that's what excited me so much about it - one touch of the brush, and the whole batch of milk is stained vermillion, and your entire enterprise now belongs to Richard Stallman so he can use it to fund GNU HURD.

The AGPL actually has some pretty fixed boundaries:

A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate.

Upon reflection, the AGPL isn't as restrictive as I once thought. Let's take what I consider to be the most successful GPL (v2) product: MySQL*, and consider what would have happened if it had been released under AGPL instead. Since Amazon used MySQL code to build RDS, under the AGPL Amazon would be forced to release the code they use to provide the RDS service. They would not be forced to release the code for Amazon.com** however: that would clearly be outside the boundaries set out in AGPL.

Also consider that Facebook uses MySQL internally, with something like 4000 MySQL databases to power much of their site, and they've made many changes to MySQL in order to make that possible, some of which they've made public. If MySQL had been AGPL-licensed, they would have been required to make those changes publicly available under the same license.

Google is also reportedly one of the largest users of MySQL, and in a similar spirit they have released some of their tools. However, they released these tools under the more permissive Apache 2.0 license: if MySQL had been released under AGPL3, Google would most likely have been forced to release these tools under AGPL3 as well.*** And now that Google is also offering Google Cloud SQL made with GPL-based MySQL, they don't have to share their work as they would if MySQL were AGPL3-based.

All of this to say: if you want to use MongoDB to power a web app, have fun: the boundaries within the AGPL3 are there to help you, and probably won't require you to hand over your code to every visitor. However, if you see MongoDB and think "hey, that's cool, I'm going to offer a web service with the MongoDB API and become a cloud provider of NoSQL data storage, just like Amazon SimpleDB" then you will have made a derivative work, and you'll have to share those changes with the world under AGPL3.

Finally, IANAL, not in any jurisdiction, and if you base your legal strategy on lay analyses found on personal blogs, then sadly you're not alone and you're in very risky company. Best of luck, however, in finding a copyright attorney who will dig through these issues for you and give you an opinion for less than $500k.

* The Linux kernel is more widely used than MySQL, but it's so mixed up with other licences that it can't just be GPL anymore, not honestly - and the copyrights are owned by so many different people that nobody can claim ownership. MySQL, on the other hand, was always extremely diligent about maintaining ownership of every line of code they include in their distribution (which made acquisition by Sun and Oracle all the more attractive).
** ... that is, provided Amazon.com was built using MySQL, which it isn't AFAIK.
*** They could still license their code any other way they want, as they own it, but they'd also be required to license it under AGPL3.