Documentation for crwlr / html-2-text (v0.1)

Configuration Options

Indentation Size

Certain elements, such as ul/ol, blockquote and dl, are indented in the converted text. Consider this example:

use Crwlr\Html2Text\Html2Text;

$html = <<<HTML
<ul>
    <li>item</li>
    <li>item
        <ul>
            <li>another item</li>
            <li>and one more
                <ul>
                    <li>third level</li>
                </ul>
            </li>
        </ul>
    </li>
</ul>

<blockquote>Some text<br>Lorem ipsum...</blockquote>
HTML;

$text = Html2Text::convert($html);

This converts to the following text:

* item
* item
  * another item
  * and one more
    * third level

  Some text
  Lorem ipsum...

The default indentation size is two spaces. To change it, pass the desired number of spaces as the second argument to the Html2Text::convert() method:

// Same $html as above

$text = Html2Text::convert($html, 6);

This converts the HTML to:

* item
* item
      * another item
      * and one more
            * third level

      Some text
      Lorem ipsum...

Skip Elements

By default, the library skips certain elements, including <head>, <script>, <style>, <canvas>, <svg>, <img>, <video>, as well as comments <!-- ... --> and empty text nodes.

To skip additional elements, such as <p>, use the skipElement() method of the Html2Text class:

use Crwlr\Html2Text\Html2Text;

$converter = new Html2Text();

$converter->skipElement('p');

When you work with an instance of the Html2Text class, as shown above, call the convertHtmlToText() method to run the conversion. It is the non-static counterpart of the static Html2Text::convert():

$text = $converter->convertHtmlToText($html);

You can also explicitly choose not to skip specific elements, including ones that are skipped by default, using the dontSkipElement() method:

$converter->dontSkipElement('video');

This can be useful when you've built a custom node converter for the <video> tag.
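
To make the interplay of the default skip list, skipElement() and dontSkipElement() concrete, here is a minimal model that runs without the library. SkipListSketch and its isSkipped() method are hypothetical names invented for this illustration. The class is not the library's implementation; it only mirrors the behavior described in this section, using the default tags listed above.

```php
<?php

// Hypothetical model, NOT the library's code: it only mirrors the behavior
// described in this section (defaults, skipElement(), dontSkipElement()).
final class SkipListSketch
{
    /** @var array<string, true> tag name => true */
    private array $skipped = [];

    public function __construct()
    {
        // Default tags as listed in this documentation. Comments and empty
        // text nodes are not elements, so this model leaves them out.
        foreach (['head', 'script', 'style', 'canvas', 'svg', 'img', 'video'] as $tag) {
            $this->skipped[$tag] = true;
        }
    }

    // Add a tag to the set of skipped elements.
    public function skipElement(string $tag): void
    {
        $this->skipped[$tag] = true;
    }

    // Remove a tag from the set, whether it was a default or added later.
    public function dontSkipElement(string $tag): void
    {
        unset($this->skipped[$tag]);
    }

    // Hypothetical query method, used here only to inspect the state.
    public function isSkipped(string $tag): bool
    {
        return isset($this->skipped[$tag]);
    }
}

$skipList = new SkipListSketch();
$skipList->skipElement('p');
$skipList->dontSkipElement('video');

var_dump($skipList->isSkipped('p'));      // bool(true): added explicitly
var_dump($skipList->isSkipped('video'));  // bool(false): default removed
var_dump($skipList->isSkipped('script')); // bool(true): untouched default
```

Modelling the skip list as a set keyed by tag name keeps both operations idempotent: skipping a tag twice or un-skipping a tag that was never skipped has no further effect.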