Web

Web scraping with JavaScript

KarmaDude Dec 23, 2007

UPDATE: Here is a better technique for scraping websites using Node.js

Web scraping is a very common process that gathers content from web pages. That content is then put either to good use, as in search engines, or to bad use, such as stealing content. It’s mostly a server-side process, in which bots and crawlers visit pages and parse their content using pattern matching, string comparison, and regular-expression-based techniques.

But today, with the popularity of JavaScript, flexible access to the DOM structure, and the availability of libraries such as jQuery, page scraping can be approached differently: with less code, and less intrusively, using JavaScript. So I decided to give it a try, using a well-structured site like Digg as an example, and build a page scraper in JavaScript.

DiggStripper is the result of this experiment. The functionality is simple: it takes the Digg home page, traverses its DOM structure, extracts the stories, and builds a JSON object containing them. Digg does provide an API to access its information, so there is probably not much use for this page scraper other than to serve as an example of page scraping using JavaScript, or to get around any limits set by the Digg API.
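DiggStripper’s own source isn’t reproduced here, but the traverse, extract, and serialize flow can be illustrated with a minimal sketch in plain JavaScript (no jQuery). The selectors `.news-summary`, `h3 a` and `.digg-count`, and the function names `extractStories` and `storiesToJSON`, are hypothetical placeholders, not Digg’s actual markup or DiggStripper’s API.

```javascript
// Sketch of a DOM-based story scraper. The CSS selectors are HYPOTHETICAL
// placeholders; a real page's markup will differ and must be inspected first.
const SELECTORS = {
  story: ".news-summary", // hypothetical: one container element per story
  link: "h3 a",           // hypothetical: the story's title link
  diggs: ".digg-count"    // hypothetical: element holding the vote count
};

// Walks any DOM-like root (e.g. `document` in a browser) and returns
// an array of plain story objects.
function extractStories(root) {
  const stories = [];
  for (const node of root.querySelectorAll(SELECTORS.story)) {
    const link = node.querySelector(SELECTORS.link);
    if (!link) continue; // skip containers that have no title link
    const count = node.querySelector(SELECTORS.diggs);
    stories.push({
      title: link.textContent.trim(),
      url: link.getAttribute("href"),
      // Fall back to 0 when the count is missing or not a number.
      diggs: count ? parseInt(count.textContent, 10) || 0 : 0
    });
  }
  return stories;
}

// Wraps the extracted stories in an object and serializes it as JSON.
function storiesToJSON(root) {
  return JSON.stringify({ stories: extractStories(root) });
}
```

In a browser console on the target page, `storiesToJSON(document)` would return the JSON string. Taking the root as a parameter rather than hard-coding `document` keeps the extraction logic reusable on a fragment or a fetched, parsed page.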

The DiggStripper code is available as open source under the MIT License, so feel free to download it, and please share your feedback and ideas for taking it to levels I have not thought of yet.

Web: Ladybug

KarmaDude Dec 12, 2007

Ladybug Cakes & Catering website
Client: Ladybug Cakes & Catering
Notes:

STOP resizing my browser window

KarmaDude Oct 8, 2007

This is to all you web developers, especially flash developers, who feel it’s uber cool to resize browser windows. STOP IT NOW! The browser window on my computer is mine, and not yours to resize. As a user, I like to size the browser window a certain way, and it’s extremely annoying when a site resizes the browser window. So, back off and stop resizing my browser window!

If you are a Firefox user, then there is a way around this annoyance. Open the Options dialog, go to the Content tab, and click the “Advanced” button shown in the image below.

Firefox Options

In the “Advanced JavaScript Settings” dialog which pops up, uncheck the first option, “Move or resize existing windows”, and this will prevent scripts from resizing or moving your browser window.

Firefox Advanced
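As far as I know, that checkbox is backed by the Firefox preference `dom.disable_window_move_resize`. Assuming that mapping holds for your Firefox version, the same effect can be had by setting the preference in `about:config`, or by adding this line to the `user.js` file in your Firefox profile folder:

```javascript
// user.js in the Firefox profile folder.
// Assumed equivalent to unchecking "Move or resize existing windows":
// prevents page scripts from moving or resizing the browser window.
user_pref("dom.disable_window_move_resize", true);
```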

Web: myHimachal

KarmaDude Oct 3, 2007

myHimachal website
Client: myHimachal
Notes: Avnish Katoch from the myHimachal blog approached me last week seeking a new design, using the Revolution News theme by Brian Gardner as a base. After a couple of iterations, Avnish settled on this final design. I tried not to stray too far from the existing layout of the Revolution theme, which made it quite easy to update the Revolution News theme to the new look, and the site is now live.

There is still some ongoing work to make the rest of the site adhere to the new design, but overall it was a quick and fun redesign.

Slider Red: A WordPress Theme

KarmaDude Sep 12, 2007

Slider Red
The original Slider theme was released back in March, and one of the most frequent requests has been an option to have post contents displayed by default, like a regular blog. So here it is: Slider Red, a variant of the Slider theme that shows post contents at all levels.

Demo | Download
