Documentation

Documentation to build on the Passmarked platform

Web Crawler

Passmarked was built as a tool that users can use to test pages against a list of open source rules. Pages must be fetched from the target web servers. The program that does this for us is called a "bot". It handles all the fetching, which includes web pages and all their referenced assets.

These bots will always use the following user-agent:

Passmarked/1.0 (compatible; Mozilla/5.0; +http://passmarked.com/docs/agent)

The above-mentioned user-agent string might appear in your web server logs; it simply means that someone used Passmarked to run either a single-page or recursive check of your website.
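For example, in a web server access log using the common "combined" format, a request from the Passmarked bot might look something like the following (the IP address, timestamp, path and response details are purely illustrative):

93.184.216.34 - - [12/Mar/2016:06:25:24 +0000] "GET /about HTTP/1.1" 200 5124 "-" "Passmarked/1.0 (compatible; Mozilla/5.0; +http://passmarked.com/docs/agent)"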

Robots.txt

For single-page requests Passmarked explicitly does not check the robots.txt file of a website, as these requests are always made deliberately by a person. It is similar to a person pointing a browser at your page and downloading it.

Automated processes, such as checking an entire website, do obey the website's robots.txt. This means that site-wide checks can be blocked using robots.txt, and on these recursive crawls individual pages may be blocked as well, as the sketch below illustrates.
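As a rough sketch of what such a check involves (this is a minimal illustration, not Passmarked's actual implementation; the host name, agent handling and paths are assumptions for the example), a crawler can download robots.txt and look for a rule group naming its agent before fetching any pages:

// Minimal sketch of a pre-crawl robots.txt check (illustrative only,
// not Passmarked's actual code). Fetches /robots.txt over HTTPS and
// reports whether the given agent is disallowed from a given path.
const https = require('https');

function fetchRobots(host) {
  return new Promise((resolve, reject) => {
    https.get({ host: host, path: '/robots.txt' }, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

function isBlocked(robotsTxt, agent, path) {
  let inAgentGroup = false;
  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.split('#')[0].trim().toLowerCase();
    if (line.startsWith('user-agent:')) {
      inAgentGroup = line.slice('user-agent:'.length).trim() === agent;
    } else if (inAgentGroup && line.startsWith('disallow:')) {
      // Simple prefix matching; wildcard patterns such as /admin/* are ignored.
      const rule = line.slice('disallow:'.length).trim();
      if (rule !== '' && path.startsWith(rule)) return true;
    }
  }
  return false;
}

// Example: would a recursive crawl of example.com starting at "/" be blocked?
fetchRobots('example.com')
  .then((txt) => console.log(isBlocked(txt, 'passmarked', '/')))
  .catch((err) => console.error(err));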

Blocking Passmarked

The Passmarked user agent, shown above, can be blocked site-wide in robots.txt using:

User-agent: passmarked
Disallow: /

Blocking recursive reports

Recursive reports happen when a user has configured the website and triggers a report, either manually or on a schedule run by Passmarked itself. Anyone is able to add any website, so it is important for us to ensure that website owners can limit any unneeded load our systems may present.

When a recursive report is about to start, the website's robots.txt file is checked. If this file indicates that the entry point for the walk through the website is blocked, the report will not start and fails immediately. The same applies if the entire website has been blocked to the Passmarked bot.
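For example, assuming a recursive report configured to start at https://example.com/blog/ (a hypothetical entry point), the following robots.txt would make the report fail immediately, even though the rest of the website remains open to the bot:

User-agent: passmarked
Disallow: /blog/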

Users will not be able to add websites that explicitly disallow Passmarked, and will be notified after a few days that these websites can no longer be crawled.

When checking the robots.txt file, Passmarked expects to see an explicit rule for our agent; if none is found, the website is checked. This is important, as the goal is to be able to parse the robots.txt in our checks as well.
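This means a generic rule that blocks all crawlers does not, on its own, stop Passmarked; only a group that explicitly names the agent does. For example:

# Does not block Passmarked: no explicit rule for its agent
User-agent: *
Disallow: /

# Blocks Passmarked explicitly
User-agent: passmarked
Disallow: /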

Blocking Specific Pages

Specific pages can be blocked in a recursive report. These pages will not be checked, nor will they be included in the final report for the website.

To block a page, something like the following could be used:

User-agent: passmarked
Disallow: /admin/
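
Multiple pages or sections can be blocked by listing additional disallow lines under the same group, for example (the paths are illustrative):

User-agent: passmarked
Disallow: /admin/
Disallow: /staging/
Disallow: /private/report.html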