Friday, August 7, 2009

A critique of an open-source software critique, or why OSS can be awesome

It's been years since I've written a critique that anyone but me has had the (perhaps unfortunate) opportunity to read, but this particular entry - http://blog.bitquabit.com/2009/06/30/one-which-i-say-open-source-software-sucks - just begged for it.

For full disclosure - my bias here about open-source software (OSS) should be noted. I run Ubuntu on my work computer with all sorts of OSS and freeware installed (OpenOffice, the GIMP, Inkscape, FF, NXServer+NXClient, etc.). I also possess a custom-built (by me) machine with Windows 2000 Pro, MS Office, Visual Studio, and Adobe Photoshop. I work almost entirely from my Ubuntu machine, heavily use the wonderfully free Google Docs for interaction with clients, and have not run into any problems that have hindered my productivity. I daresay that my Windows 2000 Pro machine is also very stable, and I can be just as productive on it when I work from home.

So in short, I am totally thrilled with the state of OSS; some of it - probably a lot of it - sucks, so I just don't use it. Same reason I never bought Neverwinter Nights 2.... But I have my biases, as does everyone, and now you know them!


OK - back to the article. I'll start with the positive - there are definitely one or two sound tidbits. I certainly agree that Apple makes a product that is becoming far more desirable than the Microsoft + Dell/HP/whoever model, so long as there's enough money in the bank to afford one.

Also, I do agree with the basic premise that commodities developed without any motivation to make them good will - in most cases - suffer from being not-good.

However, overall, the article's argument is not sound - it is not the case that open-source programmers suffer from an inherent lack of motivation that necessarily results in a sucky product. On a less logical note, I also found it annoying that before presenting its primary argument, the article defames open-source software with a smattering of ad hominem attacks (more on this later). Great political technique, sure, but come on - it's not like we should all argue the way Fox News reports.

SO ... on the surface, a lot of the points do pass validity checks:

a. Good software is "well thought-out, and easy-to-use, and easy-to-install, and highly efficient, and bug free."
b. Open source software is "painful to use" and "incontrovertibly, a total usability clusterfuck."
c. Therefore, open source software fails to be easy-to-use, so it is not good.

OK, so now we need to know why open-source software fails to be usable. Thankfully, there is some language to back this up:

a. Programmers, left to their own devices, are lazy - both in the sense of wanting to create efficient software efficiently, and of not wanting to do any extra work that doesn't make their software more efficient.
b. Programmers primarily care about creating an "underlying, beautiful implementation," and on their own are unmotivated - as follows from a. - to make intuitive, usable UIs.
c. Companies need to make usable, bug-free software so people will buy and use it, and so less money will need to be put into maintenance.
d. A company employs these lazy programmers and uses a salary (and probably pushy managers) as motivation to create usable, bug-free software.

e. (a. and b. restated more simply) Without motivation, the programmer will develop unusable software.
f. Since a programmer who makes open-source software lacks corporate-provided motivation, and since we've already established b., he will therefore create unusable software.
g. From f. we can conclude that open-source software is developed without proper motivation and is consequently unusable. And thus, the suck.

I just have to question premises a. and b., and consequently f. It's hard to argue over a.: aside from simply disagreeing (which I do) and being a bit offended (I'm not that lazy), neither of us has statistical proof, so I'll just leave it be. There may be some truth to b., but it may be more that a good number of programmers are unable to create intuitive UIs because doing so is hard. I don't know for sure - an HCI person might have answers.

But I can certainly attack f. by proposing alternative motivations. There are certainly other reasons a programmer would be motivated to create a great, usable UI - hubris, for example? How many cocky programmers do you know? Or perhaps they simply want to take pride in their work, and expect a windfall from the product that is not immediately monetary in nature. Maybe they'll pull a Google, release massive amounts of open-source and freely available software, and slowly gain possession of the world's information and trust until they assimilate us all into Google-Borgs. Motivation enough for ya?

So I do agree in principle with e., that an unmotivated developer would be likely to cut corners, and that the end product would suffer. But an unmotivated producer of any commodity will probably develop a pretty sucky product, and it is not the case that all open-source software suffers from an inherent lack of motivation. Throwing the baby out with the bathwater because it's not immediately monetizable is just short-sighted.


OK - now for some more fun: let's pick through all of the mud-slinging in the opening argument and try to find some meat:

a. If a business entity achieves successful ends by certain means, those means are inferred to be "good".
b. Open source software is not used by successful business entities - that "I can point to" - as a means to their end.
c. Therefore, open source software lacks "goodness".

Phew - who sees validity or non-questionable premises in that?

It does not follow that because a successful business uses product X, a business that uses product Y will not be successful. If some data were presented showing a statistically significant difference between unsuccessful companies running open-source software and successful companies running closed-source software, perhaps this would be acceptable. Instead, the argument simply runs out of steam after "I don't know of any successful companies that use open-source software."


OK, so this wasn't a ridiculously rigorous academic analysis, but I think I hit a few good points - and if I missed something, by all means, critique me! And if you made it all the way down here, thanks for getting through it, and be sure to let me know what you think.

Friday, June 19, 2009

GWT / AJAX Performance Tools

Performance is important in your Web applications, especially in your client code. If you're a GWT user, I want to highlight a particularly useful benchmarking library - Jiffy - and its Firebug counterpart. Safari's developer tools and Firebug on their own make user-injected benchmarking code less relevant for macro-benchmarks, but for fine-grained measurements and a rolling log of them, I found the developer tools a bit lacking.

Benchmarking is really just about specifying two points - a start and a stop. Giving that benchmark a label can be useful, and being able to review logs of these benchmarks over time would also be nice. Jiffy does all of this with minimal setup time and - if you install the Firebug plugin - very immediate results.

First I downloaded the jiffy.js source and plugged this script tag into the HEAD of my GWT application's main HTML page:

<script type="text/javascript" src="jiffy.js"></script>

Then I created this simple class to wrap the required Jiffy calls as Java methods:

public class JiffyUtils {

    // Records a named start point; a no-op if jiffy.js failed to load.
    public static native void mark(String markerName) /*-{
        if ($wnd.Jiffy)
            $wnd.Jiffy.mark(markerName);
    }-*/;

    // Records the time elapsed since the named marker under measurementName.
    public static native void measure(String measurementName, String markerName) /*-{
        if ($wnd.Jiffy)
            $wnd.Jiffy.measure(measurementName, markerName);
    }-*/;
}

...and I was ready to go! I inserted numerous marks and measures into a particularly slow, esoteric series of operations and determined where my bottlenecks were; oddly enough, it led me to discover the slowness of HashMaps in GWT 1.4 and, to a lesser extent, in 1.5 (Google for "HashMap optimizations gwt" for a bit more info).
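For reference, the instrumentation looked something like this (the method and label names here are hypothetical stand-ins for my actual code):

JiffyUtils.mark("buildTable");
buildGiantTable(); // stands in for whatever slow, esoteric operation you're timing
JiffyUtils.measure("table-construction", "buildTable");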

Turns out I didn't need continual performance measurements, but Jiffy does offer a bunch of support for them. You can get a JSON object out of Jiffy containing all of your measurements by wrapping a call to $wnd.Jiffy.getMeasures(); in a JSNI method - I used this briefly, but didn't rely upon it heavily. You can also use Jiffy's Batch and Realtime logging, which is explained on their site and would be useful for those interested in maintaining backlogs of their performance metrics for analysis.
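For completeness, a minimal sketch of such a wrapper, assuming jiffy.js is loaded as above:

import com.google.gwt.core.client.JavaScriptObject;

public class JiffyMeasures {
    // Returns Jiffy's accumulated measurements as a raw JavaScriptObject,
    // or null if jiffy.js failed to load.
    public static native JavaScriptObject getMeasures() /*-{
        return $wnd.Jiffy ? $wnd.Jiffy.getMeasures() : null;
    }-*/;
}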

Wednesday, April 8, 2009

GWT 1.6! Has it been that long...?

Google had some rather sizable announcements yesterday afternoon - the launch of GWT 1.6, Java support for App Engine, and an Eclipse plug-in for better integration with these two products.

App Engine had been a small pain to develop for in all of its Python-y goodness, but it is a fantastic platform in theory, and will definitely yield results once enough bugs get jiggled off the farm. It's good to see the chains coming off, and it will be fun to watch it continue to evolve.

More important (to me, at least!), GWT 1.6 boasts performance benefits (parallel compilation is wicked fast on multicore boxes ... just pass the compiler the -localWorkers flag and you no longer have to wait 8 minutes for your app to build!), slicker Widgets, and a nifty new event handling model. Do watch out - if your events suddenly aren't being fired after an upgrade, make sure that if you've overridden the onBrowserEvent method, you invoke super.onBrowserEvent(...) - that's where the default event handling happens! The WAR-based project structure is probably cool, but I haven't taken advantage of it at all (RESTlet is my friend), so let me know what you all come across!
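To make that gotcha concrete, here's a minimal sketch (the widget itself is hypothetical):

import com.google.gwt.user.client.Event;
import com.google.gwt.user.client.ui.Label;

public class ClickableLabel extends Label {
    @Override
    public void onBrowserEvent(Event event) {
        // Forgetting this call is what silently kills the new-style handlers.
        super.onBrowserEvent(event);
        // ...any custom low-level event handling goes here...
    }
}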

For more details, check out their blog.

...and now, back to your regularly-scheduled life...

Friday, August 29, 2008

GWT 1.5 is here!

Just a quick huzzah for the GWT folks ... GWT 1.5 is done! One of the best new features: Java 5 compatibility.

In their own words: 1.5 blog announcement

Enjoy!

Monday, July 28, 2008

Data Conversion Libraries 'R' Us

I've noticed a good chunk of the traffic to this blog comes from people looking for help turning their relational data (or Java classes) into XML/JSON/(insert Web-friendly format here), and vice versa. Well, this entry's for you.

Are you developing a Web application that requires object serialization, and don't want to write custom mapping files for every object? Have a database you want to access in an object-oriented way? Well, you're in luck! There are plenty of solutions that will do most of the heavy lifting of these processes for you. I will quickly introduce a few of them - Hibernate, Middlegen, XStream, and JAXB - with which you can rapidly access your database as objects, serialize the data to XML documents or JSON representations, and send it around the Internets at will.


XStream

If you want to use XML or JSON as your translated medium, XStream provides automatic Object -> XML (and back again) mechanisms, requiring only a JVM and some knowledge of Java. In my experience with XStream, it is a very good answer to a problem that otherwise took a lot of manual effort to do correctly. Best of all, you typically don't have to create any custom mappings, so if you're in a pinch, this will provide a big bang with little effort. Also worth mentioning is that XStream has a very impressive performance profile, so if your server has high throughput, XStream should not be a bottleneck.
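To give you a taste of how little ceremony is involved, a minimal round-trip looks roughly like this (TShirt is a made-up example class):

import com.thoughtworks.xstream.XStream;

public class XStreamExample {
    // A hypothetical domain class to serialize.
    static class TShirt {
        String color;
        String size;
        TShirt(String color, String size) { this.color = color; this.size = size; }
    }

    public static void main(String[] args) {
        XStream xstream = new XStream();
        xstream.alias("tshirt", TShirt.class); // optional: a friendlier root element name

        TShirt shirt = new TShirt("blue", "XL");
        String xml = xstream.toXML(shirt);            // Object -> XML
        TShirt back = (TShirt) xstream.fromXML(xml);  // XML -> Object
    }
}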


Hibernate and Middlegen

If you have your data in a database and want to move it to Object-land, I'd advocate a classical solution: Hibernate and Middlegen. Most of you might know about Hibernate, but if you don't, it's an "object/relational persistence and query service" - a service that will transition your data between its persisted state and handy Objects, based on queries you specify and mapping files you provide. The rub is in the generation of the mapping files, as it is not a task I imagine anyone enjoys. That's where Middlegen comes in; it will auto-generate mapping files based on the structure of your database, and for the most part does a fantastic job of it. It isn't perfect, but it will at least yield a result that you can tweak to your needs.

On the usability front, Middlegen can be a bit of a pain to set up and plug in, but if you have a lot of data to deal with (it generated over 100 mapping files in my case), it's worth it. For what it is, Hibernate is pretty easy to set up and run (although it has a LOT of dependencies to clutter up your classpath). The biggest issue I've come across is its performance. At first use, Hibernate will probably make you cringe. The memory footprint can get ludicrous very quickly, and in the time it typically takes to bootstrap, I could build an Ark. But once it's up and running on a machine with sufficient memory, it will perform quite well, querying the database and yielding a list of objects to play with in about the time it would take to do a raw SQL query. And if there are still performance issues, the cache-control mechanisms Hibernate allows for can give your application a big boost (either minimizing memory usage or increasing throughput), depending on your traffic profile.
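Once the mapping files are in place, day-to-day usage is pleasantly small. A minimal sketch, assuming a mapped (hypothetical) TShirt class and a hibernate.cfg.xml on the classpath:

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateExample {
    public static void main(String[] args) {
        // The expensive part: bootstrap the SessionFactory once and reuse it.
        SessionFactory factory = new Configuration().configure().buildSessionFactory();

        // Then queries come back as handy Objects.
        Session session = factory.openSession();
        List shirts = session.createQuery("from TShirt where color = 'blue'").list();
        session.close();
    }
}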


JAXB

If your client application runs inside a Java container, or if your server-side data consists heavily of XML documents, JAXB provides a way to take those documents and unmarshal their data into Java objects. I'm not going to lie and say I've used it, read more about it than a few articles, or even say it looks painless to use, but it has been recommended to me by a few reliable sources, so I figured I'd give it a mention.
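That said, from the articles I've skimmed, the basic shape looks roughly like this - untested by me, and the House class is hypothetical:

import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

public class JaxbExample {
    public static void main(String[] args) throws Exception {
        // Build a context for the (hypothetical) annotated House class,
        // then unmarshal an XML document into an instance of it.
        JAXBContext context = JAXBContext.newInstance(House.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        House house = (House) unmarshaller.unmarshal(new File("house.xml"));
    }
}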

Have some better or alternative methods? Used anything I've mentioned above? Please share it in the comments section!

Friday, July 25, 2008

GWT and RESTlets

I'm back, with some big news!

RESTlet ported to GWT!

I have been busy with a large-scale, enterprise AJAX Web application, and I'm happy to say I've had to write at most 200 lines of JavaScript; instead, we have about 80,000 lines of Java. Most of you probably know why ... we're using Google Web Toolkit (GWT) (+ RESTlets), and I thought I would share my love for these libraries. (Before we delve deep into my ramblings, I'd like to point out there are other libraries that rival RESTlet's fantastic-ness, such as Jersey, but I've not used them extensively enough to write about them, so feel free to post your links to other blogs!)

If you haven't heard of GWT or just haven't taken the time to check it out, you really, really should [ http://code.google.com/p/googlewebtoolkit/ ]. In short, Google developed a compiler that translates Java source into JavaScript, so you develop your client-side code just as if you're writing a Java app (don't worry, custom JavaScript can be invoked using JSNI), and Google does the rest of the work.

If you're going to build both the client-side and server-side bits, GWT has some built-in support for you. It provides RPC mechanisms, as well as ways to make plain ol' HTTP requests. First up: RPC! Simply explained, an RPC-based application is built from just a few pieces and parts - an interface so the client knows what methods it can invoke, and an implementation of those methods on the server. Having used it, though, I find RPC a bit arduous to maintain, clunky to build, and sometimes slow as molasses. But it works, it makes the client-side code easy to read, and it makes methods more intuitive to invoke.
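For the unfamiliar, those pieces look roughly like this (the service and method names are hypothetical):

import java.util.List;
import com.google.gwt.user.client.rpc.AsyncCallback;
import com.google.gwt.user.client.rpc.RemoteService;

// The synchronous interface, implemented by a servlet on the server.
public interface TShirtService extends RemoteService {
    List<String> getColors();
}

// The matching async interface that client code actually calls.
interface TShirtServiceAsync {
    void getColors(AsyncCallback<List<String>> callback);
}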

Now, if you want to provide server-side functionality via HTTP requests instead of RPC calls, the RequestBuilder class is what you want. It has a simple and relatively robust API, though it has its drawbacks. If I had to pick a fight, I'd say it requires a bit of knowledge about the lower-level workings of HTTP requests (e.g. header syntax), and it doesn't provide a quick way to get XML out of the Response object as a Document, but there aren't any really pressing issues that I've encountered.
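A minimal GET with RequestBuilder looks something like this (the URL and the handling are placeholders):

import com.google.gwt.http.client.Request;
import com.google.gwt.http.client.RequestBuilder;
import com.google.gwt.http.client.RequestCallback;
import com.google.gwt.http.client.RequestException;
import com.google.gwt.http.client.Response;

public class TShirtClient {
    public void fetchShirts() {
        RequestBuilder builder = new RequestBuilder(RequestBuilder.GET, "/storefront/tshirts");
        try {
            builder.sendRequest(null, new RequestCallback() {
                public void onResponseReceived(Request request, Response response) {
                    String payload = response.getText();
                    // parse payload yourself, e.g. with com.google.gwt.xml.client.XMLParser
                }
                public void onError(Request request, Throwable exception) {
                    // network failure, timeout, etc.
                }
            });
        } catch (RequestException e) {
            // the request could not even be dispatched
        }
    }
}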

So enough about GWT - what is this RESTlet stuff? Well, it's based on a particular architectural style, developed in the wonderful world of Academia (don't run away now): Representational State Transfer (REST). It's just a way of designing how to manage and access resources, and once you figure it out, you'll notice a lot of the Internet basically works this way already. Let's say you're building a store that sells T-shirts of many colors. A RESTful way of modeling your storefront would be to provide the resources of your store, e.g. the T-shirts, in a representational way. So to see a representation of all T-shirts sold at the store, one might visit the URL http://www.tshirtstore.com/storefront/tshirts. To see a representation of all blue T-shirts, one could visit /storefront/tshirts/blue. To visit the cart, one could go to /storefront/cart. These resources can be delivered as XML, XHTML, JSON, binary, plain text ... you name it. Each URL basically represents a resource and, where applicable, ways to get to other resources. Intuitive? I tend to think so.

By the by, my colleague explains it well (i.e. better) in non-example-based terminology, if you'd like to see it go here: http://gogocarl.blogspot.com/

In any case, REST is a wonderful way to expose resources and functionality to your client code in an intuitive way, and the greatest part of all is that it highly decouples your client and server code bases - so if you deliver your resources in a client-independent way (e.g. XML), you can show them off in many different ways.

If you'd like to create a GWT + RESTlet application, check out the RESTlet-GWT module [ http://blog.noelios.com/2008/07/25/restlet-ported-to-gwt/ ]. If you give it a try, let me know how it goes! I've had such good luck with it that I'm rather biased towards it, but I'm still a bit of a n3wb Web app developer, and I'd be very interested to know what your opinions are.

Wednesday, October 24, 2007

Relational Data as XML

Everything is fodder for argument with XML. Note these 50 pages of point-counterpoint that discuss nearly every aspect of XML's use and usability - it's a pretty quick read [http://www.zapthink.com/report.html?id=ZT-XMLPROCON].

The short of this entry is that storing data on the server side as XML documents is a flexible, readable, maintainable, and - most importantly - scalable option for a Web application. The document-size issues that tend to accompany XML-based storage can be avoided by keeping documents small and moving contextual data (i.e. indexing information) outside of them. Performance and scalability, even for the most in-demand applications, can be as outstanding as any highly-trafficked Web site's, if you keep in mind the metaphor of serving your XML documents the way a Web site serves pages.

The traditional rule of thumb is to put "data" in a database and store documents on a file system. But breaking a document's worth of data up to shove into a database is not a difficult task, and composing a table or two into a document of information is also not difficult. Where things become complicated is when mounds of contextual metadata are generated to cope with relational data. Separating contextual data from content can give you immense power over what you serve, when you serve it, and how you manage it.

-----

There are many time-sinks and concerns when using XML to describe your data; a few of them are engineering the documents' structures, handling data serialization, performance issues, and information bloat. Before you can massage your data into XML, you need to know what structure the documents should have - sometimes this is not trivial, especially when representing relational data, and it will take time. Binary serialization is brittle and not always platform/hardware independent, so massaging your data into String representations may also take time; complex objects need to be broken up into primitives to be represented nicely.

Once this is finished, one might immediately notice that some XML representations are thousands of lines long (I made this wonderful mistake more than once), slowing your parser and pushing the transfer time towards O(eternity); the more complex the object, the more contextual descriptions and metadata are needed to describe it. Ugh.

Since there is (obviously) a connection between the size of your XML document and the time it takes to parse its content, a good way to increase performance is to break your data into finer levels of granularity. For instance, if you have a document representing the furniture in your house by room, you can easily break the document up into many documents, each representing one room.

With each room separated out, a natural idea would be to add an additional tag under the "room" parent node (or as an attribute) that identifies which house it belongs to. If we break things down even further and separate each piece of furniture into its own document, we would need to add a tag that identifies which room each piece of furniture belongs in.

This method of "contextualizing" XML documents has a problem. Since the information is internal to the document, each time the context of a document needs to be determined, the document must be parsed and searched. Additionally, this method only lets the "child" document know who its "parent" is - a house would not know what furniture it has; only the furniture would know which house it belongs to. There are at least two ways to solve this, and the first is obvious - after breaking up a large document, build a "virtual" document that holds references to its pieces and parts. This is not a terrible way of doing things - it can, however, lead to an enormous number of files.

The alternative is to externalize all "virtual resources" into one document - a single large indexing file. So if multiple house documents - "FooHouse" and "BarHouse" - each have their rooms stored as separate files, a master document maps each house identifier to the identifiers of its room documents. When a user requests the resource "FooHouse", the master document (which is presumably kept in memory for quick traversal) will either assemble the full document or - assuming a screen could not handle showing every room's contents at once - simply serve each room document as it is requested.
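Concretely, such a master index might look something like this (the element and file names are made up for illustration):

<index>
  <house id="FooHouse">
    <room ref="rooms/FooHouse-kitchen.xml"/>
    <room ref="rooms/FooHouse-bedroom.xml"/>
  </house>
  <house id="BarHouse">
    <room ref="rooms/BarHouse-kitchen.xml"/>
  </house>
</index>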

This solution can scale well, requires no particular storage model (the master document could refer to tables in a database, files on a disk or even a URL) and allows a conceptual resource to be as complex as needed. It also allows for performance tuning, as the client can receive only the portions of the resource pertinent to their particular task.