created: Aug 14, 2009
I thought it would be neat to write a program to spider the Web autonomously... so I wrote a Perl script that does just that!
The computer science folks would call the spidering aspect a breadth-first search that uses a queue data structure to hold the collection of hostnames. That sounds like a mouthful, but the idea is simple (I promise!).
The program works by looping through the following phases:
- do an HTTP dump on the next host (port 80) in the queue
- scrape unique hostnames from the output
- add the newly found hostnames to the crawl queue
- do something fun with the output
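To make the loop concrete, here is a minimal sketch of those phases. It is written in Python rather than Perl and is not the linked script, just an illustration of the same idea. The names `fetch_dump`, `extract_hosts`, `crawl`, and `print_summary`, the seed host, the `max_hosts` cap, and the hostname regex are all assumptions made for the example.

```python
import re
from collections import deque
from http.client import HTTPConnection, HTTPException

# Assumed pattern: pull the hostname out of anything that looks like an http(s) URL.
HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)


def fetch_dump(host, timeout=5):
    """Phase 1: GET / on port 80 and return headers plus body ("" on failure)."""
    try:
        conn = HTTPConnection(host, 80, timeout=timeout)
        try:
            conn.request("GET", "/")
            resp = conn.getresponse()
            headers = "".join(f"{k}: {v}\n" for k, v in resp.getheaders())
            body = resp.read().decode("utf-8", errors="replace")
            return headers + "\n" + body
        finally:
            conn.close()
    except (OSError, HTTPException, UnicodeError):
        return ""


def extract_hosts(text):
    """Phase 2: return unique lowercased hostnames in order of first appearance."""
    hosts = []
    for match in HOST_RE.findall(text):
        host = match.lower().strip(".-")
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def print_summary(host, dump):
    """Phase 4 placeholder: the 'something fun' here is just a size report."""
    print(f"{host}: {len(dump)} characters")


def crawl(seed, fetch=fetch_dump, handle=print_summary, max_hosts=25):
    """Breadth-first crawl: a FIFO queue of hosts plus a set of hosts already queued."""
    queue = deque([seed])
    seen = {seed}
    visited = []
    while queue and len(visited) < max_hosts:
        host = queue.popleft()          # oldest host first, which gives breadth-first order
        dump = fetch(host)
        visited.append(host)
        for found in extract_hosts(dump):
            if found not in seen:       # Phase 3: queue each hostname only once
                seen.add(found)
                queue.append(found)
        handle(host, dump)
    return visited
```

Calling `crawl("example.com", max_hosts=5)` would visit at most five hosts, starting from the seed. The `seen` set is what keeps the spider from re-queuing a host it has already met, and passing a different `fetch` function lets you try the loop without touching the network.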
The link to the code is here if you want to check that out.
tags: perl, programming

