Thursday, July 5, 2007

Building a Greasemonkey Mashup Tutorial

Ever wonder what a mashup between Tim O'Reilly (technology guru) and Alfred Chuang (BEA CEO) would look like? Well, that might not be pretty, but a mashup between O'Reilly Safari and BEA dev2dev holds more promise. What's not to like: embed a world-class online technical library (Safari) into a world-class enterprise developer web site (dev2dev). Sounds like a winner. Let me show how this is easily done with the browser-based mashup tool called Greasemonkey.

NOTE: this blog entry was originally posted July 5th, 2007 on my previous blogging system (dev2dev.bea.com). Comments on the old blog were not transferred.

This is part 2 of a series of blog entries on Greasemonkey mashups.

The Magic of Greasemonkey

As I demonstrated in my last blog entry, it is very easy to meddle with any web site you visit. The tool that makes this possible is Greasemonkey, a project started by Aaron Boodman, which enables you to inject, modify or delete pieces of a web page. All of this is done client-side in the Firefox browser. As an example, in my previous blog entry I showed how easy it is to add a new link to the dev2dev web site. For that case, the Greasemonkey script simply found the right injection point in the page, and then inserted a new link at that spot.

If you read Mark Pilgrim's Dive Into Greasemonkey online manual or his Greasemonkey Hacks book, you will find all sorts of things to do with an HTML document. You can slice and dice a page any way you want. For example, one scripter wrote the Ad Blocker user script that magically deletes ads from any page. Also, writing Greasemonkey scripts for GMail seems to have become a cottage industry.

If you stretch the term mashup, you can even call the results of these activities mashups. The target web site is being mashed up with the Greasemonkey user script. But how about creating a more definitive mashup - can the script inject elements from a second site into that target web site? Once again, this is easily done in Greasemonkey and this blog will show you how.

O'Reilly Safari + BEA dev2dev

As mentioned in the preamble, the mashup will target BEA's dev2dev website like in my previous blog entry. However, in this tutorial, I will show you how to inject dynamic content from another site: O'Reilly's book catalog. The idea is this: as you read blogs and articles on BEA dev2dev about web technologies (JSON, Ajax, JavaScript, Greasemonkey, etc), it would be nice to have a list of embedded links to O'Reilly's book catalog. This will be helpful if you wish to find a book to learn more about the technology you are seeing in the blog or article. Further, this list will include links to the O'Reilly Safari online library, giving you instant access to in-depth books and papers on that topic. Safari eliminates the need to order an O'Reilly book via mail - almost the entire O'Reilly catalog is conveniently available for reading online.

BEA dev2dev does not currently have this O'Reilly feature, but that is not a problem. I will build a mashup in this blog entry using Greasemonkey to create this feature.

Let's get started...

Building a Web Mashup with Greasemonkey

I will assume you have read my previous blog entry and that you have:

  • An understanding of what Greasemonkey is, and how it performs its magic in the Firefox browser without modifying the target site
  • An understanding of how to inject HTML into a location on a web page
  • A basic familiarity with a programming language like Java, JavaScript, etc.

Additionally, you need to understand in essence what the XMLHttpRequest is, which I will cover now. The XMLHttpRequest is the heart and soul behind that Ajax buzzword you have probably heard about. It allows the browser to issue HTTP requests programmatically from JavaScript. The XMLHttpRequest quietly operates behind the scenes on a page to gather information from a web server. It is used together with code that modifies the current page (called DOM manipulation) to update small pieces of the page without causing a full page refresh. This is Ajax. There are detailed articles on the subject, like this one from David Teare on dev2dev, if you would like to learn more.
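To make the mechanics concrete, here is a minimal sketch of issuing a request and handing the response text to a callback. The names fetchText and XhrImpl are my own illustrative choices, not part of any library; the optional XhrImpl parameter exists only so the sketch can be exercised outside a browser. Note that the object is spelled XMLHttpRequest in JavaScript.

```javascript
// Minimal sketch: fetch a URL asynchronously and pass the body to a callback.
// fetchText and XhrImpl are illustrative names, not part of any library.
function fetchText(url, onText, XhrImpl) {
  // Use the browser's XMLHttpRequest unless a stand-in is supplied.
  var Xhr = XhrImpl || XMLHttpRequest;
  var request = new Xhr();
  request.open('GET', url, true); // true = asynchronous
  request.onreadystatechange = function () {
    // readyState 4 means the whole response has arrived
    if (request.readyState === 4 && request.status === 200) {
      onText(request.responseText);
    }
  };
  request.send(null);
}
```

In a page, the callback is where the DOM manipulation would happen: it receives the response text and updates one small piece of the document, with no full page refresh.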

To build the O'Reilly Safari + BEA dev2dev mashup, the XMLHttpRequest will be the cornerstone of the solution. When the user browses to a blog or article on dev2dev, the Greasemonkey script will issue an XMLHttpRequest to the O'Reilly web site to gather a list of relevant books. It will pass the title of the blog or article as the search query, and it will specify that only 3 books are wanted in the result set. Once the response is received, the Greasemonkey script will extract the summaries of those 3 books and inject that list into the blog or article.
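As a rough sketch of the first step, the query URL could be assembled as shown here. The base URL is the one used in the script listing later in this entry; buildBookQueryUrl is a hypothetical helper name, and I am assuming that sp-c carries the result count and sp-q the query text. The title is passed through encodeURIComponent so that spaces and characters like & do not corrupt the query string.

```javascript
// Base query taken from the script listing later in this entry.
var OREILLY_SEARCH_BASE =
  'http://search.atomz.com/search/?sp-a=sp1000a5a9' +
  '&sp-t=store&sp-x-1=cat&sp-x-2=cat2&sp-q-2=Books';

// buildBookQueryUrl is a hypothetical helper name.
// Assumption: sp-c is the result-count parameter and sp-q the query text.
function buildBookQueryUrl(title, maxResults) {
  return OREILLY_SEARCH_BASE +
    '&sp-c=' + maxResults +
    '&sp-q=' + encodeURIComponent(title);
}
```

In the user script, the call would look like buildBookQueryUrl(document.title, 3).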

As a starting point, look at this mashup tutorial article of mine on dev2dev: Mashup Article. The image below shows the end result of this mashup, with the O'Reilly links injected:

greasemonkey_oreilly_injected.png

What about the Same Origin Policy?

If you have read about mashups in general or the XMLHttpRequest specifically, you may see a problem with the plan I outlined above. Browsers have universally implemented a security policy that affects the XMLHttpRequest. This policy is called the Same Origin Policy (SOP), and it is designed to keep a script loaded from one site away from data that belongs to another site. The details of the attacks it guards against aren't important here, but the SOP is. The SOP prevents an XMLHttpRequest from targeting a URI whose origin (scheme, host and port) differs from that of the parent page. For example, if a user types http://dev2dev.bea.com in the address bar of his browser, an XMLHttpRequest triggered from that page cannot target http://hack.evil.com. The SOP limits the XMLHttpRequest to the dev2dev.bea.com origin. This is security and this is good.

So how about my plan to implement this mashup? If the user targets a blog or article on dev2dev.bea.com, how is it possible to send an XMLHttpRequest to an O'Reilly web property? The answer is that Greasemonkey scripts operate at a higher privilege level than ordinary page JavaScript. The SOP does not apply to requests that a Greasemonkey script issues through the GM_xmlhttpRequest function. So we are clear for takeoff with this mashup implementation!
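The comparison at the heart of this policy (usually called the same-origin policy) can be sketched in a few lines. This is only an illustration of the rule, not how any browser implements it; isSameOrigin is a made-up name, and the sketch relies on the standard URL class found in current browsers and Node.js.

```javascript
// Two URLs share an origin only if scheme, host, and port all match.
// isSameOrigin is an illustrative name; browsers enforce this internally.
function isSameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

// A page on dev2dev may call back to dev2dev...
var allowed = isSameOrigin(
  'http://dev2dev.bea.com/blog/entry',
  'http://dev2dev.bea.com/pub/data');

// ...but ordinary page script may not reach a different host.
var blocked = isSameOrigin(
  'http://dev2dev.bea.com/blog/entry',
  'http://hack.evil.com/steal');
```

Here allowed is true and blocked is false. Ordinary page script is held to this check; a Greasemonkey script is not, which is what makes the mashup possible.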

greasemonkey_oreilly_arch.png

Building the Script

I will once again assume you have read my previous blog entry and understand how Greasemonkey can inject HTML into a page. What I will show here is how to issue an XMLHttpRequest to a website from a Greasemonkey script and how to process the results to extract the required information. That information will then be injected into the dev2dev blog or article using the technique already covered. The full script is publicly available on the Greasemonkey script repository here: dev2dev Oreilly Script. I encourage you to look at the full script, install it, and use it when visiting dev2dev in the future.

What follows is a partial script listing. It does the following:

  • Constructs the URI - sets a parameter to constrain the query to 3 results, and sets the blog/article title as the book query
  • Invokes the XMLHttpRequest to the URI, passing a callback function to invoke when the response is received
  • Parses the response in the callback function and injects the list of books into the page
// SNIP - code removed for brevity.
// The full script finds the right place in the DOM
// to inject the O'Reilly results

if (commentsAnchor)
{
    // build our not-quite-REST URL, not oreilly.com but
    // nevertheless it is the O'Reilly book query.
    // Note: passing blog title as the query string,
    // URL-encoded so spaces and '&' do not break the query
    var oreillyUrl =
        'http://search.atomz.com/search/?sp-a=sp1000a5a9'+
        '&sp-t=store&sp-x-1=cat&sp-x-2=cat2&sp-q-2=Books'+
        '&sp-c=3&sp-q='+
        encodeURIComponent(document.title);
    GM_log('URL: '+oreillyUrl);

    var user_agent = 'Mozilla/4.0 Greasemonkey';

    // go ask Tim what he would recommend based on the
    // blog/article title
    // (the function name is case-sensitive: GM_xmlhttpRequest)
    GM_xmlhttpRequest({
        method: 'GET',
        url: oreillyUrl,
        headers: {
            'User-agent': user_agent,
            'Accept': 'text/html'
        },
        onload: processOreillyResponse
    });
}
else {
    GM_log(' Error: did not find a comment anchor');
}

// callback from the XMLHttpRequest with response
function processOreillyResponse(responseDetails)
{
    var oreillyHTML = responseDetails.responseText;

    // we want to find this HTML in the response since
    // it delineates the list of books:
    /*
    <!-- ResultListStart -->
    blah blah
    <!-- ResultListEnd -->
    */
    var start = oreillyHTML.indexOf(
        "<!-- ResultListStart");
    var end = oreillyHTML.indexOf(
        "<!-- ResultListEnd -->", start);
    GM_log("Clipping start: "+start+" end: "+end);

    // indexOf returns -1 when a marker is missing, so
    // only clip when both markers were actually found
    if (start >= 0 && end > start)
    {
        // create the HTML element containing the list
        var resultsElement = document.createElement(
            "placeholder");
        resultsElement.innerHTML =
            oreillyHTML.substring(start, end);

        // inject the list into the page
        commentsAnchor.parentNode.insertBefore(
            resultsElement, commentsAnchor);
    }
    else {
        GM_log("Nothing was returned from the clipping");
    }
}
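The clipping step is easy to get subtly wrong, because indexOf returns -1 when a marker is absent. As a sketch, the same idea can be pulled into a small helper and tried against a sample string. The name clipBetween and the sample response are invented for illustration; only the two marker comments come from the listing.

```javascript
// Return the text from startMarker up to (not including) endMarker,
// or null when either marker is missing. clipBetween is an illustrative name.
function clipBetween(text, startMarker, endMarker) {
  var start = text.indexOf(startMarker);
  if (start === -1) { return null; }
  var end = text.indexOf(endMarker, start + startMarker.length);
  if (end === -1) { return null; }
  return text.substring(start, end);
}

// Invented sample standing in for the search response.
var sampleResponse =
  '<html><body>header' +
  '<!-- ResultListStart --><li>Book A</li><li>Book B</li>' +
  '<!-- ResultListEnd -->footer</body></html>';

var clipped = clipBetween(
  sampleResponse, '<!-- ResultListStart', '<!-- ResultListEnd -->');
```

With this sample, clipped holds the start comment and the two list items, and a response without the markers yields null rather than a stray slice of the page.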

Once this script is installed into Firefox, Greasemonkey will properly inject O'Reilly book results into dev2dev pages. Furthermore, the results returned from O'Reilly are conveniently marked up with links to the traditional book catalog, and also to Safari. So with a single click from the blog or article, you can navigate to Safari and dive into a full length book on the technology you just read about on dev2dev. Nice!

Next: Is Greasemonkey a Good Mashup Solution in the Enterprise?

If you have been following my blog you will know that I like to evaluate how to use various mashup technologies in the enterprise. I do have an opinion in this regard with Greasemonkey, but it's more than a few paragraphs worth of material. I will delay diving into this topic until my next post. Actually I am debating how to cover it - I may stretch this discussion across the next 3 blog posts, or perhaps I will blast it all in just one. We need to look at various issues such as versioning, Greasemonkey's inverted security model, and Greasemonkey's mashup sweet spot. Subscribe to my blog feed and you'll be sure to catch the discussion however I package it.

Monday, July 2, 2007

More Mashups: Using Greasemonkey to Weave New Features into Web Sites

With this blog entry I am continuing the theme of demonstrating tools to help you build mashups. In this case, I will show how a tool called Greasemonkey can be a powerful approach for building browser-side mashups. Greasemonkey is a plugin for Firefox that allows a script developer to inject useful JavaScript into any web page. This capability enables you to add new features to sites that you do not own. I will show in this blog entry how you can add a new feature to the dev2dev website, without having access to the dev2dev code. Later, a follow-up blog entry will show how this same technique can create a feature on dev2dev that includes data from a different web site, producing a true mashup.

NOTE: this blog entry was originally posted July 2nd, 2007 on my previous blogging system (dev2dev.bea.com). Comments on the old blog were not transferred.

This is part 1 of a series of blog entries on Greasemonkey mashups.

Client side Code Injection with Greasemonkey

Greasemonkey is just cool - and I have been having a lot of fun playing around with it. It operates on a simple principle, but delivers massive power from that simplicity.

Greasemonkey allows you, a JavaScript developer, to write a script that gets included into web pages that a user visits with the Firefox browser. This script can do almost anything, but usually it will inject or update HTML elements in the page to create entirely new features for that web site. The targeted web site need not have any idea that this is going on - the Greasemonkey script is executed in the browser after the original page comes back from the web server.

To be useful, a Greasemonkey script generally will be targeted towards a specific web site, though a general purpose script is possible. The developer specifies which URLs (with wildcards) are valid for the script. The Greasemonkey plug-in in Firefox will silently watch the user's URLs as they browse, and will inject a script when appropriate.

To be clear, Greasemonkey is not installed by default, nor are the scripts that you write. It is an opt-in activity for the user to install both Greasemonkey and each script that they want to have. This means that Greasemonkey is not quite as accessible as other mashup solutions that I have written about - the mashup here is not a URL. The user must be skilled enough to install Greasemonkey and your script.

greasemonkey_arch

Greasemonkey Hello World

Installing Greasemonkey is straightforward: you will find the download link at the Greasemonkey Home Page. You will need to use Firefox of course, but otherwise you can't go wrong. After you have it installed, the next step is to install your first Greasemonkey script.

Unlike a lot of cutting edge projects out on the web, Greasemonkey has excellent documentation. The best place to go for an introduction, tutorials, and documentation is Mark Pilgrim's online book, Dive Into Greasemonkey. The book is quite comprehensive and is geared towards getting you on your feet quickly.

To that point, the Hello World example for Greasemonkey that I will show is derived directly from Mark's book. The example below has been stripped of comments for brevity, but otherwise is identical to his original found here. This script is offered as a public file named helloworld.user.js. By the file extension, you know that this is nothing but JavaScript. And a closer look will show that it contains some meta-data within the comments, and then a single JavaScript function call to alert().

The meta-data is hopefully self-explanatory: it instructs Greasemonkey to execute it on any (*) site except for a couple of exclusions. If you install Greasemonkey, and then install this script helloworld.user.js by clicking on the link, you will see that it just pops up an alert box for every page view.

// Hello World! example user script
// version 0.1 BETA!
// 2005-04-22
// Copyright (c) 2005, Mark Pilgrim
// Released under the GPL license
// http://www.gnu.org/copyleft/gpl.html
// ==UserScript==
// @name Hello World
// @namespace http://diveintogreasemonkey.org/download/
// @description example script to alert "Hello world!"
// @include *
// @exclude http://diveintogreasemonkey.org/*
// @exclude http://www.diveintogreasemonkey.org/*
// ==/UserScript==

alert('Hello world!');
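To illustrate how the @include and @exclude lines decide where a script runs, here is a rough sketch of wildcard matching. It is not Greasemonkey's actual implementation; includePatternToRegExp and urlMatches are hypothetical names, and the only rule assumed is that * stands for any run of characters.

```javascript
// Convert an @include-style pattern, where * matches any run of
// characters, into a regular expression. Illustrative only.
function includePatternToRegExp(pattern) {
  // Escape regex metacharacters other than *, then expand * to .*
  var escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// A URL qualifies when it matches some include and no exclude.
function urlMatches(url, includes, excludes) {
  function anyMatch(patterns) {
    return patterns.some(function (p) {
      return includePatternToRegExp(p).test(url);
    });
  }
  return anyMatch(includes) && !anyMatch(excludes || []);
}

// The Hello World patterns: run everywhere except the book's own site.
var helloIncludes = ['*'];
var helloExcludes = [
  'http://diveintogreasemonkey.org/*',
  'http://www.diveintogreasemonkey.org/*'
];
```

Under these assumptions, urlMatches('http://dev2dev.bea.com/', helloIncludes, helloExcludes) is true, while any page under diveintogreasemonkey.org is skipped.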

greasemonkey_install

Mr. Wong and BEA dev2dev

If you visit BEA's dev2dev site, you will notice that Jon has helpfully included links to popular tagging services on all articles and blogs. This allows you to quickly tag the article or blog using del.icio.us, Digg, DZone, Furl or Reddit. See the image below to see how the site works today:

greasemonkey_mrwong_orig

However, what if that list is not sufficient for your needs? That is my problem - there is one tagging site that I use that is not listed. As an aside, I have been doing research into the spread of Web 2.0 in China. In doing that work, I have been tagging Chinese articles that I find, and some English ones just to be helpful. I am not using an English language tagging site; I am using Mr. Wong, which is a Chinese language tagging site. I would like to have Mr. Wong as one of my tagging options when browsing dev2dev. But since I don't have access to the dev2dev code, I cannot add it.

Weaving Mr. Wong into BEA dev2dev Using Greasemonkey

Enter Greasemonkey.

The process for developing a Mr. Wong feature on dev2dev is as follows:

  • Navigate to dev2dev, and View Source on the HTML for blogs and articles
  • Note that the tagging links box is easily found in both by looking for the last div tag with class box_gray
  • The div tag contains a simple HTML table; all we need to do is add a new row
  • Formulating the link to Mr. Wong is easy: we just need the document title and URL. Both are easily available in JavaScript
  • Write the JavaScript!

Before I show the code for the script, here is the result:

greasemonkey_mrwong

As you can see, even though I did not have access to dev2dev code, I was able to easily add a brand new feature to it!

The Mr. Wong Script

The script used to add this feature to dev2dev is below. But if you would like to try it out, or view the live version, navigate to my script repository for Greasemonkey (or here if that link does not work). Install the script named dev2dev Mr Wong. The code roughly follows this path:

  • Find the last box_gray div tag on the page; this is the tagging box
  • Find the inner table in that div
  • Add a new row to that table
  • Populate the row with a link and image for Mr. Wong

// ==UserScript==
// @name dev2dev Mr Wong
// @namespace http://dev2dev.bea.com
// @description Injects a link to Mr Wong on each blog/article page
// @include http://dev2dev.bea.com/blog/*
// @include http://dev2dev.bea.com/pub/*
// ==/UserScript==

var divTagsWithClass, taggingDiv, tableTag;
GM_log('Running dev2dev Mr Wong script');

// get the existing tagging div box
// by looking for the class (box_gray);
// an ORDERED snapshot keeps document order,
// so "last" really is the last div on the page
divTagsWithClass = document.evaluate(
    "//div[@class='box_gray']",
    document,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null);
// the tagging div appears last
taggingDiv = divTagsWithClass.snapshotItem(divTagsWithClass.snapshotLength-1);

if (taggingDiv)
{
    // find the table
    tableTag = taggingDiv.getElementsByTagName('table')[0];
    if (tableTag)
    {
        // create the new row
        var lastIndex = tableTag.rows.length;
        var newLinkTR = tableTag.insertRow(lastIndex);

        // create the td for the image and set styles
        // (DOM property names are case-sensitive: vAlign)
        var imageTD = newLinkTR.insertCell(0);
        imageTD.vAlign = 'bottom';
        imageTD.width = '20';

        // build the HTML
        imageTD.innerHTML = '<img src='+
            '"http://www.mister-wong.cn/favicon.ico"'+
            ' alt="Mr. Wong" border="0" height="18" '+
            'hspace="8" width="18">';

        // create the td for the link
        var linkTD = newLinkTR.insertCell(1);

        // set the styles
        linkTD.noWrap = true;
        linkTD.vAlign = 'bottom';

        // build the HTML; URL-encode the page URL and title
        // so they survive as query parameters
        linkTD.innerHTML =
            '<a href="http://www.mister-wong.cn/index.php?'+
            'action=addurl&v=1&bm_url='+
            encodeURIComponent(window.location.href)+
            '&bm_description='+
            encodeURIComponent(document.title)+
            '">'+
            'Mr. Wong</a>';
    }
    else {
        GM_log(' Error: did not find the tagging inner table');
    }
}
else {
    GM_log(' Error: did not find a tagging div');
}
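The link-building step deserves a closer look, because the page URL and title travel as query parameters and a raw & or ? inside either one would be read as part of the Mr. Wong URL itself. Here is a small sketch of that step in isolation. The endpoint and parameter names come from the script; buildMrWongUrl is an illustrative name of my own.

```javascript
// buildMrWongUrl is an illustrative helper name. The endpoint and the
// parameter names (action, v, bm_url, bm_description) come from the script.
function buildMrWongUrl(pageUrl, pageTitle) {
  return 'http://www.mister-wong.cn/index.php?' +
    'action=addurl&v=1' +
    '&bm_url=' + encodeURIComponent(pageUrl) +
    '&bm_description=' + encodeURIComponent(pageTitle);
}

var exampleLink = buildMrWongUrl(
  'http://dev2dev.bea.com/blog/a?b=1&c=2', 'Mashups & More');
```

In exampleLink the page's own ?b=1&c=2 is percent-encoded, so it stays inside bm_url instead of leaking into the outer query string.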

Pros and Cons

Hopefully you can see the power and simplicity of the Greasemonkey plugin. In a follow up blog I will show how you can take this technique further by injecting data from other sites to create a true mashup. Here is what I see as the pros and cons of this tool:

  • Pro: it is easy for a developer to install both the plugin and scripts
  • Pro: a seasoned JavaScript developer can crank out features in little time
  • Pro: you do not need access to the web site's code to add new features
  • Pro: the documentation and developer community around Greasemonkey are superb
  • Con: mashups created with Greasemonkey aren't exactly URL accessible; they require install steps
  • Con: Greasemonkey is not supported on IE or any browser other than Firefox
  • Con: there is a significant security issue that I will cover in my next blog entry