Documentation for crwlr / url (v1.0)

Attention: You're currently viewing the documentation for v1.0 of the url package.
This is not the latest version of the package.

Resolving Relative Urls

When you scrape urls from a website, you will come across relative urls like /path/to/page, ../path/to/page, ?param=value, #anchor and the like. This package makes it a breeze to resolve them to absolute urls, using the url of the page where they were found as the base.

use Crwlr\Url\Url;

// The url of the page on which the relative links were found.
$documentUrl = Url::parse('https://www.example.com/foo/bar/baz');

$relativeLinks = [
    '/path/to/page',
    '../path/to/page',
    '?param=value',
    '#anchor'
];

$absoluteLinks = array_map(function($relativeLink) use ($documentUrl) {
    return $documentUrl->resolve($relativeLink)->toString();
}, $relativeLinks);

var_dump($absoluteLinks);

Output

array(4) {
  [0]=>
  string(36) "https://www.example.com/path/to/page"
  [1]=>
  string(40) "https://www.example.com/foo/path/to/page"
  [2]=>
  string(47) "https://www.example.com/foo/bar/baz?param=value"
  [3]=>
  string(42) "https://www.example.com/foo/bar/baz#anchor"
}

If you pass an absolute url to resolve(), it simply returns that absolute url.
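
The absolute-url case can be tried with a minimal sketch like the one below. It assumes the package was installed via Composer (hence the autoload line), and the link in $absoluteLink is a made-up example value, not something taken from the package docs.

```php
<?php

require 'vendor/autoload.php'; // assumption: crwlr/url was installed via Composer

use Crwlr\Url\Url;

// The url of the page on which the link was found.
$documentUrl = Url::parse('https://www.example.com/foo/bar/baz');

// Hypothetical absolute link scraped from that page.
$absoluteLink = 'https://www.example.org/some/other/page';

// According to the note above, resolve() should hand back the absolute url
// that was passed in rather than combining it with the document url.
var_dump($documentUrl->resolve($absoluteLink)->toString());
```

This means a scraper can run every link it finds through resolve() without first checking whether the link is relative or absolute.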