Web Crawler
Overview
This web crawler crawls a given URL and generates a report of all the internal links it finds.
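Here, an internal link means a link that resolves to the same host as the site being crawled. As a minimal, hypothetical sketch (not the crawler's actual code), same-host detection in Go could look like this, using only the standard net/url package:

```go
package main

import (
	"fmt"
	"net/url"
)

// isInternal reports whether link points to the same host as base.
// Illustrative only; the crawler's real logic may differ.
func isInternal(base, link string) (bool, error) {
	b, err := url.Parse(base)
	if err != nil {
		return false, err
	}
	l, err := b.Parse(link) // resolves relative links against base
	if err != nil {
		return false, err
	}
	return l.Hostname() == b.Hostname(), nil
}

func main() {
	for _, link := range []string{"/about", "https://example.com/blog", "https://other.org"} {
		internal, _ := isInternal("https://example.com", link)
		fmt.Printf("%-30s internal=%v\n", link, internal)
	}
}
```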
Repository mirrors
- Code Flow: https://codeflow.dananglin.me.uk/apollo/web-crawler
- GitHub: https://github.com/dananglin/web-crawler
Requirements
- Go: A minimum version of Go 1.23.0 is required to build or install the web crawler. You can download the latest version from https://go.dev/dl/.
How to run the application
Clone this repository to your local machine.

```
git clone https://github.com/dananglin/web-crawler.git
```

Build the application.

```
go build -o crawler .
```

Run the application, specifying the website that you want to crawl.

- To crawl https://example.com using 3 concurrent workers and generate a report of up to 20 unique discovered pages:

```
./crawler --max-workers 3 --max-pages 20 https://example.com
```
Flags
You can configure the application with the following flags.
| Name          | Description                                                       | Default |
|---------------|-------------------------------------------------------------------|---------|
| `max-workers` | The maximum number of concurrent workers.                         | 2       |
| `max-pages`   | The maximum number of pages discovered before stopping the crawl. | 10      |
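To illustrate how a setting like `max-workers` can bound concurrency, below is a hypothetical worker-pool sketch that uses a buffered channel as a counting semaphore; the crawler's real implementation may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// crawlAll fans pages out to at most maxWorkers concurrent goroutines.
// Hypothetical sketch of bounding concurrency with a semaphore channel;
// not the crawler's actual code.
func crawlAll(pages []string, maxWorkers int) {
	sem := make(chan struct{}, maxWorkers) // counting semaphore
	var wg sync.WaitGroup
	for _, page := range pages {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxWorkers crawls are in flight
		go func(p string) {
			defer wg.Done()
			defer func() { <-sem }() // release the worker slot
			fmt.Println("crawling", p) // a real worker would fetch and parse p
		}(page)
	}
	wg.Wait()
}

func main() {
	crawlAll([]string{"https://example.com/", "https://example.com/about"}, 2)
}
```

Blocking on the semaphore before spawning each goroutine guarantees that no more than `max-workers` crawls run at any one time.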