Documentation for crwlr / url (v2.1)

Resolving Relative Urls

When you scrape urls from a website you will come across relative urls like /path/to/page, ../path/to/page, ?param=value, #anchor and alike. This package makes it a breeze to resolve these urls to absolute ones with the url of the page where they have been found on.

$documentUrl = Url::parse('https://www.example.com/foo/bar/baz');

$relativeLinks = [
    '/path/to/page',
    '../path/to/page',
    '?param=value',
    '#anchor'
];

$absoluteLinks = array_map(function($relativeLink) use ($documentUrl) {
    return $documentUrl->resolve($relativeLink)->toString();
}, $relativeLinks);

var_dump($absoluteLinks);

Output

array(4) {
  [0]=>
  string(36) "https://www.example.com/path/to/page"
  [1]=>
  string(40) "https://www.example.com/foo/path/to/page"
  [2]=>
  string(47) "https://www.example.com/foo/bar/baz?param=value"
  [3]=>
  string(42) "https://www.example.com/foo/bar/baz#anchor"
}

If you pass an absolute url to resolve() it will just return that absolute url.