Gnu York

IT-Services

gnuCrawl&Map - more than a sitemap generator

gnuCrawl&Map is free software for web developers. It helps you create XML sitemaps for search engines: the program visits every page of your website and collects the links it finds. You can also use it to search for specific content within a website or to find 404 error pages and dead links. In addition, it can create an overview of the meta information (page titles, meta descriptions, meta keywords, robots directives, etc.) of all pages of your website.
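To illustrate the link discovery described above, here is a minimal sketch in Python that uses only the standard library's html.parser. It is not gnuCrawl&Map's own code (the application is a Java program); the class name PageScanner and the sample page are hypothetical, and fetching pages over HTTP, following links recursively and detecting 404 responses are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collects links, the page title and <meta name=...> values from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # absolute URLs taken from <a href="...">
        self.meta = {}    # lower-cased meta name -> content
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative links against the URL of the current page
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Usage with an inline sample page (a real crawler would fetch it over HTTP)
sample = """<html><head><title>Home</title>
<meta name="description" content="Example page">
<meta name="robots" content="noindex, nofollow">
</head><body><a href="/about.html">About</a>
<a href="https://example.org/">Elsewhere</a></body></html>"""

scanner = PageScanner("https://example.com/index.html")
scanner.feed(sample)
print(scanner.title)  # Home
print(scanner.links)  # ['https://example.com/about.html', 'https://example.org/']
print(scanner.meta)   # {'description': 'Example page', 'robots': 'noindex, nofollow'}
```

A crawler built on this idea would keep a queue of URLs, fetch each one, feed the response to a scanner like this and put newly found links on the same host back into the queue.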

Key features

  • create XML sitemaps for search engines
  • create a CSV file that contains the meta information of all fetched URLs
  • run the application on any operating system for which a Java Runtime Environment is available
    (if Java is not installed on your machine, you can download it here)
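Both output formats can be sketched in Python as an illustration only. The sitemap follows the sitemaps.org 0.9 protocol (urlset, url, loc, lastmod, changefreq, priority); the helper names build_sitemap and build_meta_csv, the CSV column names and the sample data are assumptions, not the application's actual output format.

```python
import csv
import io
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML; each entry is a dict with 'loc' and optional
    'lastmod', 'changefreq' and 'priority' keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for key in ("loc", "lastmod", "changefreq", "priority"):
            if key in entry:
                # ElementTree escapes characters such as '&' automatically
                ET.SubElement(url, key).text = str(entry[key])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_meta_csv(rows, fields=("url", "title", "description", "keywords", "robots")):
    """Return CSV text with a header row and one row per fetched URL;
    missing columns are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data
pages = [
    {"loc": "https://example.com/", "lastmod": "2024-01-15",
     "changefreq": "weekly", "priority": "1.0"},
    {"loc": "https://example.com/about.html", "priority": "0.5"},
]
print(build_sitemap(pages))
print(build_meta_csv([{"url": "https://example.com/", "title": "Home"}]))
```

Optional tags are simply omitted when a page has no value for them, which the sitemap protocol allows; only loc is required.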

Options and parameters

You can ...
  • set up different sitemap parameters (lastmod, changefreq, priority)
  • use a proxy server (HTTP or SOCKS proxy, with or without authentication)
  • choose whether your sitemap should contain "www." URLs or non-"www." URLs (duplicates are removed for SEO reasons)
  • choose whether the software respects robots directives ("nofollow" and "noindex" tags, robots.txt files)
  • set a maximum number of links to be fetched and added to the sitemap
  • choose which file types are downloaded and crawled, and whether downloads (files) are added to the sitemap
  • set up filters to fetch only URLs and/or content that contain, or do not contain, user-defined terms
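Two of these options, the "www." preference with duplicate removal and the robots checks, can be sketched in Python as follows. The helper names normalize_host, dedupe and meta_allows are hypothetical and the host rewriting is deliberately simplified; the robots.txt part uses the standard library's urllib.robotparser, which is not necessarily how gnuCrawl&Map handles it.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def normalize_host(url, prefer_www):
    """Rewrite the host to the preferred "www." or non-"www." form.
    Simplified: assumes the crawled site lives on its bare domain."""
    parts = urlsplit(url)
    host = parts.netloc
    if prefer_www and not host.startswith("www."):
        host = "www." + host
    elif not prefer_www and host.startswith("www."):
        host = host[len("www."):]
    return urlunsplit(parts._replace(netloc=host))


def dedupe(urls, prefer_www=True):
    """Normalize every URL and keep only the first occurrence of each."""
    seen, result = set(), []
    for url in urls:
        norm = normalize_host(url, prefer_www)
        if norm not in seen:
            seen.add(norm)
            result.append(norm)
    return result


def meta_allows(robots_content, directive):
    """False if a robots meta value such as "noindex, nofollow" forbids the
    directive ("index" or "follow"); "none" forbids both."""
    tokens = {t.strip().lower() for t in robots_content.split(",")}
    return ("no" + directive) not in tokens and "none" not in tokens


# robots.txt rules parsed from memory instead of being fetched from the site
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

print(dedupe(["https://example.com/a", "https://www.example.com/a"]))
print(robots.can_fetch("*", "https://example.com/private/report.html"))
print(meta_allows("noindex, nofollow", "follow"))
```

Normalizing before comparing is what makes "example.com/a" and "www.example.com/a" count as one page, so only one of them ends up in the sitemap.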

Download

Here you can download gnuCrawl&Map as an executable Java file (.jar).

Download gnuCrawl&Map 0.9 beta


Some general information

Please note:
  • As already stated, you need Java to run the application. Java can be downloaded here.
  • The application is fairly stable. Nevertheless, it is a beta version and may contain some errors. If you find a bug or just want to give me some feedback, please contact me via the contact form or the #gnuyork IRC channel.