Web Crawler

Overview

This web crawler crawls a given website and generates a report of all the internal and external links found during the crawl.
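
The idea can be sketched roughly as follows: each discovered link is resolved against the page it was found on, and it counts as internal when it points to the same host as the site being crawled, external otherwise. Below is a minimal, hypothetical Go sketch of that classification (not the project's actual code), using only the standard library:

package main

import (
	"fmt"
	"net/url"
)

// classify resolves href against base and reports whether the resulting
// absolute link points to the same host (internal) or not (external).
func classify(base *url.URL, href string) (link string, internal bool, err error) {
	ref, err := url.Parse(href)
	if err != nil {
		return "", false, err
	}
	abs := base.ResolveReference(ref)
	return abs.String(), abs.Host == base.Host, nil
}

func main() {
	base, _ := url.Parse("https://crawler-test.com")
	for _, href := range []string{"/links/page-1", "https://example.org/about"} {
		link, internal, err := classify(base, href)
		if err != nil {
			continue
		}
		fmt.Printf("internal=%-5v %s\n", internal, link)
	}
}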

Requirements

  • Go: A minimum version of Go 1.23.0 is required for building/installing the web crawler. Please visit https://go.dev/dl/ to download the latest version.

Build the application

Clone this repository to your local machine.

git clone https://github.com/dananglin/web-crawler.git

Build the application.

  • Build with go
    go build -o crawler .
    
  • Or build with mage if you have it installed.
    mage build
    

Run the application

Run the application, specifying the website that you want to crawl.

Format

./crawler [FLAGS] URL

Examples

  • Crawl the Crawler Test Site.
    ./crawler https://crawler-test.com
    
  • Crawl the site using 3 concurrent workers and stop the crawl after discovering a maximum of 100 unique pages.
    ./crawler --max-workers 3 --max-pages 100 https://crawler-test.com
    
  • Crawl the site and print out a CSV report.
    ./crawler --max-workers 3 --max-pages 100 --format csv https://crawler-test.com
    
  • Crawl the site and save the report to a CSV file.
    mkdir -p reports
    ./crawler --max-workers 3 --max-pages 100 --format csv --file reports/report.csv https://crawler-test.com
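
A saved CSV report can be post-processed with any CSV-aware tool. As a hypothetical illustration (the exact columns are whatever the crawler writes; the reports/report.csv path matches the example above), here is a minimal Go program that reads the file with the standard encoding/csv package and prints each record:

package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
)

func main() {
	// Open the report produced with the --file flag (path from the example above).
	f, err := os.Open("reports/report.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Read every record and print it; no particular column layout is assumed.
	records, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	for _, record := range records {
		fmt.Println(record)
	}
}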
    

Flags

You can configure the application with the following flags.

Name         Description                                                           Default
max-workers  The maximum number of concurrent workers.                             2
max-pages    The maximum number of pages the crawler can discover before           10
             stopping the crawl.
format       The format of the generated report. Currently supports text and csv.  text
file         The file to save the generated report to. Leave this empty to print
             the report to the screen instead.
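
For a sense of what max-workers and max-pages control conceptually, here is a minimal, hypothetical Go sketch of the bounded worker pool pattern they describe. It is not the crawler's actual implementation and does no network I/O; it only shows how a semaphore channel caps concurrency while a set of seen pages caps the total number of unique pages processed.

package main

import (
	"fmt"
	"sync"
)

// process handles at most maxPages unique pages, with at most maxWorkers
// of them being handled concurrently at any one time.
func process(pages []string, maxWorkers, maxPages int) {
	var wg sync.WaitGroup
	sem := make(chan struct{}, maxWorkers) // bounds concurrency
	seen := map[string]bool{}              // counts unique pages only

	for _, page := range pages {
		if len(seen) >= maxPages { // stop queueing once the page cap is hit
			break
		}
		if seen[page] { // skip duplicates
			continue
		}
		seen[page] = true

		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it when done
			fmt.Println("visiting", p)
			// A real crawler would fetch the page here and queue any new links.
		}(page)
	}
	wg.Wait()
}

func main() {
	pages := []string{
		"https://crawler-test.com/",
		"https://crawler-test.com/links/page-1",
		"https://crawler-test.com/links/page-1", // duplicate, counted once
		"https://crawler-test.com/links/page-2",
	}
	process(pages, 3, 100)
}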