Why I'm starting crwlr.software

2018-04-15

First, let me introduce myself: my name is Christian Olear, but everybody calls me Otsch, and I'm a web developer based in Linz, Upper Austria. A large part of my day job is building scraping and crawling software.

What you can expect here

PHP has many built-in features that help with these kinds of tasks, and there are also plenty of libraries out there. So this project is not about reinventing the wheel, but about making things easier, fixing some flaws and, most importantly, providing one collection of libraries that covers all your scraping and crawling needs and works together like a charm.

The first package

The first package, the url package, is already released. As its name suggests, you can parse URLs with it: you can access all components of a URL separately and modify them. What most libraries of this kind don't provide is the ability to access the parts of the host (registrable domain, public suffix, subdomain) separately (thanks to Mozilla's Public Suffix List). Another neat feature is the ability to resolve relative URLs to absolute ones. You can read more about it in the documentation, and maybe I'll write a blog post about it in the near future.
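To give you a rough idea of what that looks like in practice, here is a minimal sketch. The method names used below (Url::parse(), host(), domain(), subdomain(), domainSuffix(), resolve()) are assumptions based on the features described above and may differ from the actual API, so please check the documentation for the details.

```php
<?php
// Minimal sketch, assuming the url package exposes a Url::parse() factory
// and accessor methods as named below – see the documentation for the real API.

require 'vendor/autoload.php';

use Crwlr\Url\Url;

$url = Url::parse('https://blog.example.com/packages/url?page=2');

// Access the parts of the host separately.
echo $url->host();          // blog.example.com
echo $url->domain();        // example.com (registrable domain)
echo $url->subdomain();     // blog
echo $url->domainSuffix();  // com (public suffix)

// Resolve a relative URL against this one.
echo $url->resolve('../contact');  // https://blog.example.com/contact
```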

I don't want to reveal too much about upcoming packages. Let's just say the next one will be about loading stuff from the interwebz 😉.

Contribute

So far I've been working alone on this project, but of course, as I'm open sourcing it, I would love to get contributions from you dev folks 😊. So if you find a bug or would like to see a new feature, feel free to file an issue or open a pull request on GitHub. There is no contribution guide yet; I'll add one at a later stage.