
Web Crawler

Overview

Starting from a given URL, this web crawler visits the website's pages and generates a report of all the internal links it finds.
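
A link is typically counted as internal when it points to the same host as the starting URL. The sketch below is illustrative only, not the project's actual code; it shows one way to make that check using Go's net/url package:

// Hypothetical sketch: classify a link as internal or external by
// comparing its hostname with that of the start URL.
package main

import (
	"fmt"
	"net/url"
)

// isInternal reports whether link belongs to the same host as base.
func isInternal(base, link *url.URL) bool {
	return base.Hostname() == link.Hostname()
}

func main() {
	base := mustParse("https://example.com")
	fmt.Println(isInternal(base, mustParse("https://example.com/about"))) // true
	fmt.Println(isInternal(base, mustParse("https://other.org/page")))    // false
}

// mustParse parses a raw URL, panicking on error (fine for a short demo).
func mustParse(s string) *url.URL {
	u, err := url.Parse(s)
	if err != nil {
		panic(err)
	}
	return u
}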

Repository mirrors

This repository is mirrored on GitHub: https://github.com/dananglin/web-crawler

Requirements

  • Go: Go 1.23.0 or later is required to build the web crawler. You can download the latest version from https://go.dev/dl/, and verify your installed version as shown below.
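
To check which version of Go you have installed, run:

go version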

How to run the application

Clone this repository to your local machine.

git clone https://github.com/dananglin/web-crawler.git

Build the application.

go build -o crawler .
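
If the crawler registers its flags with Go's standard flag package (likely, though an assumption here), you can print its usage text with:

./crawler -h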

Run the application, specifying the website that you want to crawl.

  • To crawl https://example.com using 3 concurrent workers and generate a report of up to 20 unique discovered pages:
    ./crawler --max-workers 3 --max-pages 20 https://example.com
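
  • To crawl https://example.com using the default settings (2 workers, up to 10 pages, as described in the Flags section below):
    ./crawler https://example.com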
    

Flags

You can configure the application with the following flags.

Name         Description                                                          Default
max-workers  The maximum number of concurrent workers.                            2
max-pages    The maximum number of pages discovered before stopping the crawl.    10
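
The sketch below illustrates, in simplified and hypothetical form, how a fixed-size worker pool and a page budget such as these might interact. It is not the project's actual implementation, and every name in it is made up:

// Hypothetical sketch of a bounded crawl: a buffered channel caps the
// number of concurrent workers, and a shared map enforces the page budget.
package main

import (
	"fmt"
	"sync"
)

const (
	maxWorkers = 3  // corresponds to --max-workers
	maxPages   = 20 // corresponds to --max-pages
)

func main() {
	visited := make(map[string]bool) // unique pages discovered so far
	var mu sync.Mutex                // guards visited
	var wg sync.WaitGroup
	sem := make(chan struct{}, maxWorkers) // caps concurrent workers

	var crawl func(pageURL string)
	crawl = func(pageURL string) {
		defer wg.Done()
		sem <- struct{}{}        // acquire a worker slot
		defer func() { <-sem }() // release the slot when done

		mu.Lock()
		if visited[pageURL] || len(visited) >= maxPages {
			mu.Unlock() // already seen, or the page budget is spent
			return
		}
		visited[pageURL] = true
		mu.Unlock()

		// A real crawler would fetch pageURL here, parse the HTML, and
		// queue each internal link it finds:
		//   wg.Add(1)
		//   go crawl(link)
	}

	wg.Add(1)
	go crawl("https://example.com")
	wg.Wait()

	fmt.Printf("discovered %d unique pages\n", len(visited))
}

Using a buffered channel as a semaphore is a common Go idiom for capping concurrency without a dedicated worker-pool type.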