Blog

Crwlr Recipes: How to Scan any Website for schema.org Structured Data Objects

2023-11-16

This is the first article in our "Crwlr Recipes" series, a collection of thoroughly explained code examples for specific crawling and scraping use cases. It describes how to crawl an entire website and extract schema.org structured data objects from all of its pages with just a few lines of code.

10 good Reasons to use the crwlr Library

2023-02-08

I'm very proud to announce that version 1.0 of the crawler package has finally been released. This article gives you an overview of why you should use this library for your web crawling and scraping jobs.

What's new in crwlr / crawler v0.6?

2022-10-03

Version 0.6 is probably the biggest update so far, with a lot of new features and steps, ranging from crawling whole websites and sitemaps to extracting metadata and schema.org structured data from HTML. Here is an overview of everything that's new.

What's new in crwlr / crawler v0.5?

2022-09-03

We're already at v0.5 of the crawler package, and this version comes with a lot of new features and improvements. Here's a quick overview of what's new.

Dealing with HTTP (Url) Query Strings in PHP

2022-06-02

There is a new package in town called query-string. It lets you create, access, and manipulate query strings for HTTP requests in a very convenient way. Here's a quick overview of what you can do with it and how it can be used via the url package.

What's new in crwlr / crawler v0.4

2022-05-10

Last Friday, version 0.4 of the crawler package was released with some pretty useful improvements. Read about what shipped in this new minor update.

What's new in crwlr / crawler v0.2 and v0.3

2022-04-30

There are already two new 0.x versions of the crawler package. Here is a quick summary of what's new in versions 0.2 and 0.3.

Release of crwlr / crawler v0.1.0

2022-04-18

After months of hard work, today I'm finally releasing the first version (v0.1.0) of the crwlr / crawler package. Here is some information on what it is, its current state, and its current and future features.

Prevent Homograph Attacks using the crwlr / url Package

2022-01-19

Homograph attacks use internationalized domain names (IDNs) to create malicious links whose domains look like those of trusted organizations. You can use the crwlr Url class to detect and monitor URLs containing IDNs in your users' input.

Why I start crwlr.software

2018-04-15

This is a short introduction to what crwlr.software is, what it will become, and why you may like it.
