Web Crawler

Overview

This web crawler crawls a given website and generates a report for all the internal and external links found during the crawl.
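
At its core, a crawl fetches a page, extracts the links it contains, and repeats the process for every internal link it has not yet visited. The sketch below illustrates the link-extraction step only, using Go's golang.org/x/net/html parser; it is a minimal example and not this project's actual implementation (the function name getURLsFromHTML is hypothetical).

    package main

    import (
        "fmt"
        "net/http"

        "golang.org/x/net/html"
    )

    // getURLsFromHTML fetches a page and returns the href values of its
    // anchor tags. Error handling is kept deliberately simple here.
    func getURLsFromHTML(pageURL string) ([]string, error) {
        resp, err := http.Get(pageURL)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()

        doc, err := html.Parse(resp.Body)
        if err != nil {
            return nil, err
        }

        var links []string
        var visit func(node *html.Node)
        visit = func(node *html.Node) {
            if node.Type == html.ElementNode && node.Data == "a" {
                for _, attr := range node.Attr {
                    if attr.Key == "href" {
                        links = append(links, attr.Val)
                    }
                }
            }
            for child := node.FirstChild; child != nil; child = child.NextSibling {
                visit(child)
            }
        }
        visit(doc)

        return links, nil
    }

    func main() {
        links, err := getURLsFromHTML("https://crawler-test.com")
        if err != nil {
            fmt.Println("error:", err)
            return
        }
        for _, link := range links {
            fmt.Println(link)
        }
    }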

Requirements

  • Go: A minimum version of Go 1.23.0 is required to build or install the web crawler. You can download the latest version from the official Go website at https://go.dev/dl/.
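
You can check which version of Go is installed on your machine with:

go version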

Build the application

Clone this repository to your local machine.

git clone https://github.com/dananglin/web-crawler.git
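
Then change into the repository's directory.

cd web-crawler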

Next, build the application using one of the methods below.

  • Build with go
    go build -o crawler .
    
  • Or build with mage if you have it installed.
    mage build
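
Either method produces a crawler binary in the current directory. If the crawler uses Go's standard flag package for the flags listed below (an assumption; this README does not confirm it), running the binary with -h will print a usage summary:

./crawler -h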
    

Run the application

Run the application, specifying the website that you want to crawl.

Format

./crawler [FLAGS] URL

Examples

  • Crawl the Crawler Test Site.
    ./crawler https://crawler-test.com
    
  • Crawl the site using 3 concurrent workers and generate a report of up to 100 pages.
    ./crawler --max-workers 3 --max-pages 100 https://crawler-test.com
    
  • Crawl the site and print out a CSV report.
    ./crawler --max-workers 3 --max-pages 100 --format csv https://crawler-test.com
    
  • Crawl the site and save the report to a CSV file.
    mkdir -p reports
    ./crawler --max-workers 3 --max-pages 100 --format csv --file reports/report.csv https://crawler-test.com
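
For context on the --format csv and --file flags, a CSV report in Go is typically written with the standard library's encoding/csv package. The sketch below assumes a hypothetical report shape (one row per page with internal and external link counts); it is illustrative only and not this project's actual code.

    package main

    import (
        "encoding/csv"
        "log"
        "os"
        "strconv"
    )

    // pageReport is a hypothetical record type used for illustration.
    type pageReport struct {
        url           string
        internalLinks int
        externalLinks int
    }

    // saveCSVReport writes a header row followed by one row per page.
    func saveCSVReport(filename string, pages []pageReport) error {
        file, err := os.Create(filename)
        if err != nil {
            return err
        }
        defer file.Close()

        writer := csv.NewWriter(file)
        defer writer.Flush()

        if err := writer.Write([]string{"url", "internal_links", "external_links"}); err != nil {
            return err
        }
        for _, p := range pages {
            record := []string{p.url, strconv.Itoa(p.internalLinks), strconv.Itoa(p.externalLinks)}
            if err := writer.Write(record); err != nil {
                return err
            }
        }
        return nil
    }

    func main() {
        // Sample data for illustration only.
        pages := []pageReport{{url: "https://crawler-test.com", internalLinks: 42, externalLinks: 7}}
        if err := saveCSVReport("report.csv", pages); err != nil {
            log.Fatal(err)
        }
    }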
    

Flags

You can configure the application with the following flags.

Name          Description                                                           Default
max-workers   The maximum number of concurrent workers.                             2
max-pages     The maximum number of pages discovered before stopping the crawl.     10
format        The format of the generated report.                                   text
              Currently supports text and csv.
file          The file to save the generated report to.                             (empty)
              Leave this empty to print the report to the screen instead.
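
For reference, the table above maps naturally onto Go's standard flag package (which accepts both -flag and --flag forms). The following is a minimal sketch of how a CLI could declare these flags with the defaults shown; it is illustrative and not necessarily how this project parses its arguments.

    package main

    import (
        "flag"
        "fmt"
    )

    func main() {
        // Declare the flags with the defaults from the table above.
        maxWorkers := flag.Int("max-workers", 2, "The maximum number of concurrent workers.")
        maxPages := flag.Int("max-pages", 10, "The maximum number of pages discovered before stopping the crawl.")
        format := flag.String("format", "text", "The format of the generated report (text or csv).")
        file := flag.String("file", "", "The file to save the generated report to (empty prints to the screen).")
        flag.Parse()

        // The URL to crawl is the remaining positional argument.
        if flag.NArg() != 1 {
            fmt.Println("usage: crawler [FLAGS] URL")
            return
        }
        url := flag.Arg(0)

        // Placeholder: a real program would start the crawl here.
        fmt.Println(url, *maxWorkers, *maxPages, *format, *file)
    }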