Documentation for crwlr / crawler (v0.1)

Attention: You're currently viewing the documentation for v0.1 of the crawler package.
This is not the latest version of the package.

CSV Steps

The Csv step can be created with two different methods, which differ in the input they expect.
Csv::parseString() expects a string (or a RespondedRequest from an Http step).
Csv::parseFile() expects a path to a file on your local filesystem, which it reads line by line. This way it should be possible to read very large CSV files without using too much memory.
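
To illustrate why reading line by line keeps memory usage low, here is a minimal sketch in plain PHP. It is not the package's implementation: readCsvRows() is a hypothetical helper, and the temporary file with its two rows is made up for the demo. The generator hands out one parsed row at a time, so only the current line has to be held in memory.

```php
<?php

/**
 * Hypothetical helper, NOT part of the crawler package: yields one parsed
 * CSV row at a time instead of loading the whole file.
 *
 * @return Generator<int, array<int, string|null>>
 */
function readCsvRows(string $path): Generator
{
    $handle = fopen($path, 'r');

    if ($handle === false) {
        throw new RuntimeException("Unable to open file: {$path}");
    }

    try {
        // fgetcsv() reads and parses exactly one line per call.
        while (($row = fgetcsv($handle, null, ',', '"', '\\')) !== false) {
            yield $row;
        }
    } finally {
        fclose($handle);
    }
}

// Demo data (made up): a small temporary CSV file with two rows.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "1,Foo,https://example.com/foo\n2,Bar,https://example.com/bar\n");

$count = 0;

foreach (readCsvRows($path) as $row) {
    $count++; // process each row here instead of collecting them all
}

unlink($path);

echo $count . PHP_EOL;
```

Collecting all rows in an array would defeat the purpose, which is why the loop handles each row as it arrives.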

Both methods have the same arguments. The first one is an array providing a column mapping (explained further below).

Csv::parseString(['id', 'name', 'homepage']);

Csv::parseFile(['id', 'name', 'homepage']);

The second, optional argument tells the step to skip the first line (for example when it contains column headings).

Csv::parseString(['id', 'name', 'homepage'], true);

There is also a dedicated method for this, which makes the code more readable:

Csv::parseString(['id', 'name', 'homepage'])->skipFirstLine();

Column mapping

The column mapping is an array of property names in the order of the columns in the CSV. In the example above, the step takes the first 3 columns, and in the output they have the keys id, name and homepage.

If you want to skip columns, you can use numerical keys in the array that match the CSV column indexes, starting at 0:

// 123,Christian,Olear,"https://www.otsch.codes",m

Csv::parseFile([1 => 'firstname', 3 => 'website', 4 => 'gender']);

Or use null values for the columns to skip. For the same example as above:

Csv::parseFile([null, 'firstname', null, 'website', 'gender']);
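
The two mapping styles above should lead to the same result. The following plain PHP sketch shows the idea using the example line from above. It is an illustration only: mapCsvLine() is a hypothetical helper and not the package's code, and it assumes that a null entry simply means "ignore this column".

```php
<?php

/**
 * Hypothetical helper illustrating the mapping idea; NOT the package's code.
 *
 * @param array<int, string|null> $mapping column index => property name (null = skip)
 * @return array<string, string|null>
 */
function mapCsvLine(string $line, array $mapping): array
{
    $values = str_getcsv($line, ',', '"', '\\');
    $mapped = [];

    foreach ($mapping as $index => $property) {
        if ($property === null) {
            continue; // null means: ignore this column
        }

        // Columns missing from the line end up as null.
        $mapped[$property] = $values[$index] ?? null;
    }

    return $mapped;
}

$line = '123,Christian,Olear,"https://www.otsch.codes",m';

// Numerical keys: only the listed column indexes are used.
$withKeys = mapCsvLine($line, [1 => 'firstname', 3 => 'website', 4 => 'gender']);

// Null values: every column has a position, unwanted ones are null.
$withNulls = mapCsvLine($line, [null, 'firstname', null, 'website', 'gender']);

print_r($withKeys);
var_dump($withKeys === $withNulls); // bool(true)
```

Both calls yield the keys firstname, website and gender, with the values Christian, https://www.otsch.codes and m.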

Separator, Enclosure and Escape Characters

There are also methods to change the separator, enclosure and escape characters that the step should use.

Csv::parseFile(['username', 'firstname', 'surname'])
    ->separator('|')
    ->enclosure('/')
    ->escape('%');

As the example shows, these methods can be chained, because each of them returns the step instance.
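
To get a feeling for what these three characters do, here is a small sketch using PHP's built-in str_getcsv() with the same values as in the example above. It assumes that the step's options behave like the corresponding arguments of PHP's native CSV functions, which this page does not confirm. The input line is made up for the demo.

```php
<?php

// Plain PHP, not the crawler package. Made-up line using | as separator
// and / as enclosure, so a | inside the enclosure belongs to the value.
$line = 'jdoe|/Mary|Ann/|Doe';

// Arguments: separator, enclosure, escape character.
$fields = str_getcsv($line, '|', '/', '%');

print_r($fields);
```

The second field comes out as Mary|Ann, because the separator inside the enclosure characters is treated as part of the value.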