Documentation for crwlr / robots-txt (v0.1)

Attention: You're currently viewing the documentation for v0.1 of the robots-txt package.
This is not the latest version of the package.

Getting Started

The robots-txt package provides a parser for the Robots Exclusion Standard/Protocol. You can use it in crawler or scraper programs to parse robots.txt files and check whether your crawler's user agent is allowed to load certain paths.

Requirements

Requires PHP version 7.4 or above.

Installation

Install the latest version with:

composer require crwlr/robots-txt

Usage

use Crwlr\RobotsTxt\RobotsTxt;

$robotsTxtContent = file_get_contents('https://www.crwlr.software/robots.txt');
$robotsTxt = RobotsTxt::parse($robotsTxtContent);

$robotsTxt->isAllowed('/packages', 'MyBotName');
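The example above passes the result of file_get_contents() straight to the parser. Since file_get_contents() returns false when the file can't be loaded, a small guard may be useful. The following is only a sketch: loadRobotsTxtContent() is a hypothetical helper and not part of the robots-txt package, and it assumes that the parser expects a string.

```php
<?php

/**
 * Hypothetical helper (not part of crwlr/robots-txt).
 *
 * Loads the content of a robots.txt file from a URL or file path and
 * returns null instead of false when loading fails.
 */
function loadRobotsTxtContent(string $location): ?string
{
    // Suppress the PHP warning; failure is reported via the return value.
    $content = @file_get_contents($location);

    return $content === false ? null : $content;
}
```

A caller could then skip parsing when null is returned and decide for itself how to treat an unreachable robots.txt file.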

You can also check an absolute URL.
Note, however, that the library cannot verify that the host of your absolute URL matches the host the robots.txt file was loaded from. It only receives the content of the file, so it doesn't know where that content came from.

$robotsTxt->isAllowed('https://www.crwlr.software/packages', 'MyBotName');
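Because the library can't compare hosts, you may want to do that check yourself before asking it about an absolute URL. The following is a minimal sketch using PHP's built-in parse_url(). The function hasSameHost() is a hypothetical helper and not part of the robots-txt package, and it assumes that you still know the URL the robots.txt file was loaded from. It compares host names only and ignores scheme and port.

```php
<?php

/**
 * Hypothetical helper (not part of crwlr/robots-txt).
 *
 * Returns true when $url is an absolute URL whose host equals the host
 * of $robotsTxtUrl. Only the host name is compared (case-insensitively).
 */
function hasSameHost(string $url, string $robotsTxtUrl): bool
{
    $urlHost = parse_url($url, PHP_URL_HOST);
    $robotsTxtHost = parse_url($robotsTxtUrl, PHP_URL_HOST);

    // parse_url() yields null when there is no host (e.g. a relative path)
    // and false for seriously malformed URLs.
    if (!is_string($urlHost) || !is_string($robotsTxtHost)) {
        return false;
    }

    return strtolower($urlHost) === strtolower($robotsTxtHost);
}

$robotsTxtUrl = 'https://www.crwlr.software/robots.txt';

$sameHost = hasSameHost('https://www.crwlr.software/packages', $robotsTxtUrl);  // true
$otherHost = hasSameHost('https://www.example.com/packages', $robotsTxtUrl);    // false
$relativePath = hasSameHost('/packages', $robotsTxtUrl);                        // false, no host
```

With a check like this, you would only call isAllowed() with an absolute URL when hasSameHost() returns true. Relative paths such as '/packages' have no host to compare and can be passed to isAllowed() directly, as in the first example.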