Web crawler – Link Visualisation

// March 16th, 2010 // Experiments

Yet another small weekend project (as in “OMG, I have 2 days to make something for myself before I have to go back to working on commercial projects!”).

This time I made a simple web crawler that visualises HTML pages and the links between them.

In short: every circle is a page, and pages try to group per domain. The biggest circle is the start page; the smaller the circle, the more clicks you are away from it (the smallest ones are 3 clicks away). If two pages are connected, there is a line between them (the deepest items aren’t checked for connections between each other).
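
A minimal sketch of the rules described above, in Python rather than the ActionScript of the original source. It is not the crawler's actual code: the link map `LINKS`, the `example` domains, the base radius and the halving rule in `radius` are all assumptions made for illustration. It walks the links breadth-first, records how many clicks each page is from the start page, and does not follow links out of the deepest pages.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link map standing in for fetched pages: URL -> URLs it links to.
LINKS = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/", "http://b.example/post"],
    "http://b.example/": ["http://b.example/post"],
    "http://b.example/post": ["http://c.example/"],
    "http://c.example/": ["http://a.example/"],
}

MAX_DEPTH = 3       # the smallest circles are 3 clicks from the start page
BASE_RADIUS = 40.0  # assumed radius of the start page's circle


def crawl(start, links, max_depth=MAX_DEPTH):
    """Breadth-first walk: return ({url: clicks from start}, set of edges)."""
    depth = {start: 0}
    edges = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # the deepest pages are not checked for further links
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
            edges.add((page, target))  # one line per discovered connection
    return depth, edges


def radius(clicks):
    """Assumed sizing rule: halve the circle for every click from the start."""
    return BASE_RADIUS / (2 ** clicks)


def domain(url):
    """Pages sharing this value would be pulled into the same group."""
    return urlparse(url).netloc


depth, edges = crawl("http://a.example/", LINKS)
for url, clicks in depth.items():
    print(domain(url), url, clicks, radius(clicks))
```

Breadth-first order matters here: it guarantees that the depth stored for a page is the smallest number of clicks needed to reach it, which is what the circle size is meant to show.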

An example, click to view in full size (starts from neuroproductions.be):

flash_web_crawler

It only works locally due to cross-domain issues, and spring-graphing almost 2000 nodes is not what you would call fast, so there is no live example.
You can always download the source code and try it yourself. (Warning: the source code is a crappy mess.)
Source code: WebCrawler_src
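
To give an idea of why spring-graphing that many nodes is slow, here is a rough Python sketch of one step of a generic spring (force-directed) layout. It is not taken from WebCrawler_src: the function name `spring_step` and the constants `rest`, `k_spring` and `k_repel` are assumptions. Every pair of nodes pushes each other apart, and every link acts as a spring pulling its two pages towards a rest length.

```python
import math


def spring_step(pos, edges, rest=50.0, k_spring=0.05, k_repel=2000.0):
    """Apply one layout step and return new positions; `pos` is not modified."""
    force = {node: [0.0, 0.0] for node in pos}
    nodes = list(pos)

    # Repulsion between every pair of nodes: this is the O(n^2) part.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 0.01  # avoid dividing by zero
            push = k_repel / (dist * dist)
            fx, fy = push * dx / dist, push * dy / dist
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy

    # Springs along links: pull together when stretched, push when squeezed.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy) or 0.01
        pull = k_spring * (dist - rest)
        fx, fy = pull * dx / dist, pull * dy / dist
        force[a][0] += fx
        force[a][1] += fy
        force[b][0] -= fx
        force[b][1] -= fy

    return {n: (pos[n][0] + force[n][0], pos[n][1] + force[n][1]) for n in pos}


# Two linked pages that start too far apart drift towards each other.
positions = {"start": (0.0, 0.0), "linked": (200.0, 0.0)}
for _ in range(100):
    positions = spring_step(positions, [("start", "linked")])
print(positions)
```

With 2000 nodes the repulsion loop alone visits 1,999,000 pairs on every step, and a layout usually needs many steps before it settles.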

I should have put more time into it, but time flies, and I already have some new exciting ideas for next weekend… ;)

9 Responses to “Web crawler – Link Visualisation”

  1. Ronny says:

    Cool! I did something similar a few years ago with a proxy.php file (that loads the complete HTML and then returns that to a Flash client). The cool thing about your experiment is that you actually used it to create a visualisation. I used it to create a simple (and utterly boring!) sitemap of different sites.

    I think I might just use your source and add a few features. Will get back to you later! :)
    Love the idea!

    As always: great stuff man!

  2. Manhattan says:

    WebCrawler is the application entry point. Is it not necessary to import the classes page and pageparser?

    If I set WebCrawler as the document class and change the URL to a local domain, then it should run?

    I am working on a project to build a sitemap visualization.

    thanks for the help

  3. Manhattan says:

    Never mind, got it working after importing the Flex libs into Flash CS4.

    thanks

  4. KurajeHuda says:

    Hi,

    I’ve created a new project in Flash Builder, imported the .as files, but now I’m not sure what to do with them. I’ve tried to import them into the main project file, but without any luck.

    Any idea on how to get this running in Flash Builder or in general? :p

    Thanks for any kinda help in advance.

    PS: Great website :)

  5. Kris says:

    Hi KurajeHuda,
    you have to create an ActionScript project and use WebCrawler.as as your document class.
    That should work?

  6. KurajeHuda says:

    Thanks for the quick response!

    I’ve created a new ActionScript project and I’ve set WebCrawler.as as the default application; I hope that’s what you meant by “document class”.

    But now I’m getting two errors:
    1172: Definition mx.controls:Text could not be found. Page.as line 6
    1172: Definition mx.graphics.codec:JPEGEncoder could not be found. WebCrawler.as line 4

    Same error in both cases I guess, I tried googling a bit and saw Flex 3 apps that work with those two lines, so I tried changing the SDK to 3.5, but with no effect. Seems like those two classes are missing? :o

    Thanks again for your time.

  7. KurajeHuda says:

    Finally got it to work,

    after adding library found here:
    http://code.google.com/p/as3corelib/downloads/detail?name=as3corelib-.93.zip&can=2&q=

    I replaced the line:
    import mx.graphics.codec.JPEGEncoder;
    with:
    import com.adobe.images.JPGEncoder;

    Also had to remove import mx.controls.Text; in Page.as.

    Weird that mx.graphics isn’t available in an ActionScript project, but works like a glove in a Flex project. There’s probably an easier way to do this in Flash Builder, but I’m a beginner, so at least I got it working :P

    Thanks again Kris, great stuff you got on here!

  8. You should consider the Flare Visualization Toolkit as the visual renderer for your data. It could work REAL-time for 10K or more nodes; use Flash or Flex for your application with the toolkit. https://github.com/prefuse/Flare

  9. david says:

    Wow… I can’t get this working… Anyone have a project they would be willing to share?
