

The NightCrawler is a site crawling/spider tool that gathers links at a given domain by walking through the whole site and generating a simple sitemap.

It's a single-threaded script that walks every page it gets.

The script sticks to the URL provided and does not dive into subdomains of the given domain, even if it encounters an internal redirect like ->

Possible enhancements

- Use generators to lower the memory footprint and gain a bit more speed
- Add matchers and sitemap generators for additional sitemap flavours (images, videos, etc.)
- Check Content-Type and exclude files that are not HTML
- Make a preliminary HEAD request to distinguish between text and binary files
- More tests (the included tests cover only the most critical classes)
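
The same-host rule described above, and the Content-Type check listed among the possible enhancements, could look roughly like the sketch below. The helper names `is_same_host` and `is_html` are hypothetical and not taken from the NightCrawler source. The sketch assumes Python and uses only the standard library, so it illustrates the idea rather than the script's actual implementation.

```python
from urllib.parse import urljoin, urlsplit


def is_same_host(base_url: str, link: str) -> bool:
    """Return True if `link` resolves to exactly the same host as `base_url`."""
    # Resolve relative links (e.g. "/about") against the base URL first.
    absolute = urljoin(base_url, link)
    base_host = urlsplit(base_url).hostname
    link_host = urlsplit(absolute).hostname
    # Exact match only: "blog.example.com" is not treated as "example.com".
    # Links without a host (e.g. "mailto:") are rejected as well.
    return link_host is not None and link_host == base_host


def is_html(content_type: str) -> bool:
    """Return True if a Content-Type header value denotes an HTML document."""
    # Drop parameters such as "; charset=utf-8" and compare the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")
```

A crawler loop could call `is_same_host` on every extracted link before queuing it, and call `is_html` on the `Content-Type` header of a HEAD response before downloading the body.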
