Packages

crwlr / crawler

The main package of this collection. It provides a framework and many ready-to-use building blocks, so-called steps, that you can combine to build your own web crawlers and scrapers.

docs (v3.5) source

crwlr / url

The Swiss Army knife for URLs. Parses URLs into components (scheme, host, domain, path, ...). You can access and modify URL components, compare components of different URLs, and resolve relative URLs to absolute ones. It also supports internationalized domain names.

docs (v2.1) source
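
To make the ideas of splitting a URL into components, resolving a relative URL, and handling internationalized domain names concrete, here is a minimal sketch using only Python's standard library. It is not the crwlr/url API (that package is PHP, see its docs); the function names are made up for illustration. The registrable "domain" component is left out because deriving it requires the public suffix list, which the standard library does not include.

```python
from urllib.parse import urljoin, urlsplit


def url_components(url: str) -> dict:
    """Split a URL into its basic components (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


def resolve(base: str, relative: str) -> str:
    """Resolve a relative reference against an absolute base URL."""
    return urljoin(base, relative)


def idn_to_ascii(host: str) -> str:
    """Convert an internationalized host name to its ASCII (punycode) form."""
    return host.encode("idna").decode("ascii")
```

For example, `resolve("https://www.example.com/foo/bar", "../baz")` yields `https://www.example.com/baz`.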

crwlr / query-string

This library provides a convenient API to create, access, and manipulate query strings as used in HTTP GET requests (as part of the URL) or POST requests (as part of the body).

docs (v1.0) source
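
As a rough illustration of what converting between a query string and structured data involves, the following sketch uses Python's standard library. It does not show the crwlr/query-string API (a PHP package); the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def query_to_dict(query: str) -> dict:
    """Parse a query string into a dict that maps each key to a list of values."""
    return parse_qs(query, keep_blank_values=True)


def dict_to_query(params: dict) -> str:
    """Build a query string from a dict; list values become repeated keys."""
    return urlencode(params, doseq=True)
```

Every value comes back as a list because a key may legitimately appear more than once in a query string.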

crwlr / robots-txt

Use this library within crawler and scraper programs to parse robots.txt files and check if your crawler user-agent is allowed to load certain paths.

docs (v1.1) source
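
The check described above, whether a given user agent may load a given path, can be sketched with Python's built-in `urllib.robotparser`. This is only meant to show the concept; it is not the crwlr/robots-txt API, and the bot names and rules used in the usage note are invented.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text content of a robots.txt file (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)
```

Given one rule group for `*` that disallows `/private/` and another for `MyBot` that disallows `/admin/`, `MyBot` follows only its own group, so it may load `/private/` paths but not `/admin/` paths.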

crwlr / schema-org

This library finds schema.org structured data in JSON-LD format in HTML documents and converts it to instances of PHP classes representing those schema.org objects.

docs (v0.1) source
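
Finding JSON-LD in an HTML document comes down to collecting the contents of `<script type="application/ld+json">` elements and decoding them as JSON. The sketch below shows that step with Python's standard library only. It is not the crwlr/schema-org API, the class and function names are hypothetical, and it returns plain dictionaries rather than typed schema.org objects.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the decoded contents of application/ld+json script elements."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def extract_json_ld(html: str) -> list:
    """Return all JSON-LD blocks found in html as decoded Python objects."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.items
```

Mapping the decoded data onto typed objects, as the library does, would be a separate step on top of this.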

crwlr / html-2-text

This easy-to-use package helps you convert HTML to well-formatted plain text.

docs (v0.1) source

Crawler Extension Packages

crwlr / crawler-ext-browser

This extension package for the crwlr/crawler library lets you use a headless browser for advanced functionality beyond loading pages and getting the rendered HTML.

docs (v2.2) source