Thursday, July 5, 2007

Building a Greasemonkey Mashup Tutorial

Ever wonder what a mashup between Tim O'Reilly (technology guru) and Alfred Chuang (BEA CEO) would look like? Well, that might not be pretty, but a mashup between O'Reilly Safari and BEA dev2dev holds more promise. What's not to like: embed a world-class online technical library (Safari) into a world-class enterprise developer web site (dev2dev). Sounds like a winner. Let me show how this is easily done with the browser-based mashup tool called Greasemonkey.

NOTE: this blog entry was originally posted July 5th, 2007 on my previous blogging system (dev2dev.bea.com). Comments on the old blog were not transferred.

This is part 2 of a series of blog entries on Greasemonkey mashups.

The Magic of Greasemonkey

As I demonstrated in my last blog entry, it is very easy to meddle with any web site you visit. The tool that makes this possible is Greasemonkey, a project started by Aaron Boodman, which enables you to inject, modify or delete pieces of a web page. And all of this is done client-side in the Firefox browser. As an example, in my previous blog entry I showed how easy it is to add a new link to the dev2dev web site. For that case, the Greasemonkey script simply found the right injection point in the page, and then blasted a new link into that spot.

If you read Mark Pilgrim's Dive Into Greasemonkey online manual or his Greasemonkey Hacks book, you will find all sorts of things to do with an HTML document. You can slice and dice a page any way you want. For example, one scripter wrote the Ad Blocker user script that magically deletes ads from any page. Also, writing Greasemonkey scripts for GMail seems to have become a cottage industry.

If you stretch the term mashup, you can even call the results of these activities mashups. The target web site is being mashed up with the Greasemonkey user script. But how about creating a more definitive mashup - can the script inject elements from a second site into that target web site? Once again, this is easily done in Greasemonkey and this blog will show you how.

O'Reilly Safari + BEA dev2dev

As mentioned in the preamble, the mashup will target BEA's dev2dev website, as in my previous blog entry. However, in this tutorial, I will show you how to inject dynamic content from another site: O'Reilly's book catalog. The idea is this: as you read blogs and articles on BEA dev2dev about web technologies (JSON, Ajax, JavaScript, Greasemonkey, etc.), it would be nice to have a list of embedded links to O'Reilly's book catalog. This will be helpful if you wish to find a book to learn more about the technology you are seeing in the blog or article. Further, this list will include links to the O'Reilly Safari online library, giving you instant access to in-depth books and papers on that topic. Safari eliminates the need to order an O'Reilly book via mail - almost the entire O'Reilly catalog is conveniently available for reading online.

BEA dev2dev does not currently have this O'Reilly feature, but that is not a problem. I will build a mashup in this blog entry using Greasemonkey to create this feature.

Let's get started...

Building a Web Mashup with Greasemonkey

I will assume you have read my previous blog entry and that you understand:

  • What Greasemonkey is, and how it performs its magic in the Firefox browser without modifying the target site
  • How to inject HTML into a location on a web page
  • The basics of a programming language like Java, JavaScript, etc.

Additionally, you need to understand in essence what the XmlHttpRequest is, which I will cover now. The XmlHttpRequest (spelled XMLHttpRequest in JavaScript code) is the heart and soul behind that Ajax buzzword you have probably heard about. It allows the browser to issue HTTP requests programmatically from JavaScript. The XmlHttpRequest quietly operates behind the scenes on a page to gather information from a web server. Combined with code that modifies the current page (called DOM manipulation), it lets a page update small pieces of itself without a full page refresh. This is Ajax. There are detailed articles on the subject, like this one from David Teare on dev2dev, if you would like to learn more.
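
As a rough sketch of the pattern just described, the function below issues an asynchronous GET and hands the response to a callback, which would normally update one piece of the page. The name fetchText and its optional constructor parameter are my own illustrative assumptions, not part of any library; the parameter exists only so the sketch can be exercised outside a browser.

```javascript
// Minimal Ajax sketch: asynchronous GET, then hand the body to a callback.
// fetchText and the XhrCtor parameter are hypothetical names for illustration.
function fetchText(url, onDone, XhrCtor) {
  // In a browser page the global XMLHttpRequest is used.
  var Ctor = XhrCtor || XMLHttpRequest;
  var xhr = new Ctor();
  xhr.open('GET', url, true); // true = asynchronous
  xhr.onreadystatechange = function () {
    // readyState 4 means the response has been fully received
    if (xhr.readyState === 4) {
      onDone(xhr.status, xhr.responseText);
    }
  };
  xhr.send(null);
}
```

In a page, the callback would typically check that the status is 200 and then assign the text to the innerHTML of some element, which is the DOM manipulation half of Ajax.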

To build the O'Reilly Safari + BEA dev2dev mashup, the XmlHttpRequest will be the cornerstone of the solution. When the user browses to a blog or article on dev2dev, the Greasemonkey script will issue an XmlHttpRequest to the O'Reilly web site to gather a list of books that are relevant. It will pass the title of the blog or article as the search query, and it will specify that only 3 books are wanted in the result set. Once the response is received, the Greasemonkey script will extract the summaries of those 3 books and inject that list into the blog or article.
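
The query construction can be sketched as a small helper. The base URL and the sp-* parameters are copied from the script listing later in this entry; I am assuming sp-c is the result-count parameter because it is the one set to 3 there. The helper name buildSearchUrl is hypothetical. The one thing this sketch adds is URL-encoding of the title, so that spaces and characters like & do not break the query string.

```javascript
// Build the book-search URL for a given page title.
// The base URL and sp-* parameters come from the script listing in this
// entry; buildSearchUrl is a hypothetical helper name.
function buildSearchUrl(title, maxResults) {
  var base = 'http://search.atomz.com/search/?sp-a=sp1000a5a9' +
             '&sp-t=store&sp-x-1=cat&sp-x-2=cat2&sp-q-2=Books';
  return base +
    '&sp-c=' + encodeURIComponent(String(maxResults)) + // assumed: result count
    '&sp-q=' + encodeURIComponent(title);               // the search query
}
```

In the user script this would be called with document.title and 3.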

As a starting point, look at this mashup tutorial article of mine on dev2dev: Mashup Article. The image below shows the end result of the mashup, with the O'Reilly links injected into that article:

greasemonkey_oreilly_injected.png

What about the Same Origin Policy?

If you have read about mashups in general or the XmlHttpRequest specifically, you will perhaps see a problem with the plan I outlined above. Browsers have universally implemented a security policy that affects the XmlHttpRequest. This policy is called the Same Origin Policy (SOP), and it is designed to limit the damage of cross-site attacks such as Cross-Site Scripting (XSS). The details of XSS aren't important here, but the SOP is. The SOP prevents an XmlHttpRequest from targeting a URI whose origin differs from that of the parent page. For example, if a user types http://dev2dev.bea.com in the address bar of the browser, an XmlHttpRequest triggered from that page cannot target http://hack.evil.com. The SOP limits the XmlHttpRequest to the origin of the page, here dev2dev.bea.com. This is security and this is good.
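
To make the rule concrete, here is a minimal sketch of the comparison, under the usual definition that two URLs share an origin when scheme, host, and port all match. The helper name sameOrigin is hypothetical (browsers apply this check internally), and the sketch assumes a runtime that provides the standard URL class.

```javascript
// Two URLs share an origin when scheme, host, and port all match.
// sameOrigin is a hypothetical helper; browsers perform this check internally.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}
```

With this definition, two pages on http://dev2dev.bea.com share an origin, while http://dev2dev.bea.com and http://hack.evil.com do not.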

So how about my plan to implement this mashup? If the user targets a blog or article on dev2dev.bea.com, how is it possible to send an XmlHttpRequest to an O'Reilly web property? The answer is that Greasemonkey scripts operate at a higher privilege level than ordinary page JavaScript. The SOP does not apply to requests that a Greasemonkey script issues through the GM_xmlhttpRequest function. So we are cleared for takeoff with this mashup implementation!

greasemonkey_oreilly_arch.png

Building the Script

I will once again assume you have read my previous blog entry and understand how Greasemonkey can inject HTML into a page. What I will show here is how to issue an XmlHttpRequest to a website from a Greasemonkey script and how to process the results to extract the required information. That information will then be injected into the dev2dev blog or article using the technique already covered. The full script is publicly available on the Greasemonkey script repository here: dev2dev Oreilly Script. I encourage you to look at the full script, install it, and use it when visiting dev2dev in the future.

What follows is a partial script listing. It performs these steps:

  • Constructs the URI: sets a parameter to constrain the query to 3 results, and sets the blog/article title as the book query
  • Issues the XmlHttpRequest to the URI, passing a callback function to invoke when the response is received
  • In the callback function, parses the response and injects the list of books into the page
// SNIP - code removed for brevity.
// The full script finds the right place in the DOM
// to inject the O'Reilly results

if (commentsAnchor)
{
  // build our not-quite-REST URL, not oreilly.com but
  // nevertheless it is the O'Reilly book query.
  // Note: passing blog title as the query string,
  // URL-encoded so spaces and '&' do not break the query
  var oreillyUrl =
    'http://search.atomz.com/search/?sp-a=sp1000a5a9'+
    '&sp-t=store&sp-x-1=cat&sp-x-2=cat2&sp-q-2=Books'+
    '&sp-c=3&sp-q='+
    encodeURIComponent(document.title);
  GM_log('URL: '+oreillyUrl);

  var user_agent = 'Mozilla/4.0 Greasemonkey';

  // go ask Tim what he would recommend based on the
  // blog/article title
  // (the Greasemonkey function is spelled GM_xmlhttpRequest)
  GM_xmlhttpRequest({
    method: 'GET',
    url: oreillyUrl,
    headers: {
      'User-agent': user_agent,
      'Accept': 'text/html'
    },
    onload: processOreillyResponse
  });
}
else {
  GM_log(' Error: did not find a comment anchor');
}

// callback from the XmlHttpRequest with response
function processOreillyResponse(responseDetails)
{
  var oreillyHTML = responseDetails.responseText;

  // we want to find this HTML in the response since
  // it delineates the list of books:
  /*
  <!-- ResultListStart -->
  blah blah
  <!-- ResultListEnd -->
  */
  var start = oreillyHTML.indexOf(
    "<!-- ResultListStart");
  var end = oreillyHTML.indexOf(
    "<!-- ResultListEnd -->", start);
  GM_log("Clipping start: "+start+" end: "+end);

  // indexOf returns -1 when a marker is missing, so
  // only clip when both markers were found in order
  if (start != -1 && end > start)
  {
    var result = oreillyHTML.substring(start, end);

    // create the HTML element containing the list
    var resultsElement = document.createElement(
      "placeholder");
    resultsElement.innerHTML = result;

    // inject the list into the page
    commentsAnchor.parentNode.insertBefore(
      resultsElement, commentsAnchor);
  }
  else {
    GM_log("Nothing was returned from the clipping");
  }
}
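
The clipping step in the callback can also be pulled out into a small standalone function, which makes the missing-marker case explicit. This is only a sketch: clipBetween is a hypothetical helper name, not part of Greasemonkey, and unlike the listing above it returns just the text between the two markers rather than including the start marker.

```javascript
// Return the text between two markers, or null when either is missing.
// clipBetween is a hypothetical helper name, not part of Greasemonkey.
function clipBetween(text, startMarker, endMarker) {
  var start = text.indexOf(startMarker);
  if (start === -1) return null;            // start marker not found
  var from = start + startMarker.length;    // skip past the start marker
  var end = text.indexOf(endMarker, from);
  if (end === -1) return null;              // end marker not found
  return text.substring(from, end);
}
```

A null result then maps directly onto the "nothing was returned" branch of the callback.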

Once this script is installed into Firefox, Greasemonkey will properly inject O'Reilly book results into dev2dev pages. Furthermore, the results returned from O'Reilly are conveniently marked up with links to the traditional book catalog, and also to Safari. So with a single click from the blog or article, you can navigate to Safari and dive into a full length book on the technology you just read about on dev2dev. Nice!

Next: Is Greasemonkey a Good Mashup Solution in the Enterprise?

If you have been following my blog, you will know that I like to evaluate how to use various mashup technologies in the enterprise. I do have an opinion in this regard with Greasemonkey, but it's more than a few paragraphs' worth of material. I will delay diving into this topic until my next post. Actually, I am debating how to cover it - I may stretch this discussion across the next 3 blog posts, or perhaps I will blast it all out in just one. We need to look at various issues such as versioning, Greasemonkey's inverted security model, and Greasemonkey's mashup sweet spot. Subscribe to my blog feed and you'll be sure to catch the discussion however I package it.


