Manipulate HTML Using DOMDocument In PHP


27th July 2020 3 mins read


Sometimes you need to alter HTML stored in the database before displaying it to the user, or you want to work with XML from PHP. This is where PHP's DOMDocument comes in.


NOTE: No external libraries are needed. DOMDocument belongs to PHP's DOM extension, which is enabled by default, and it is one of the language's hidden gems.


We will cover the following topics:


  1. The problem that forced me to use DOMDocument
  2. DOMDocument Basics
  3. Implementation To My Project

1) The Problem


I ran into the problem while writing the code that serves my existing blog as an AMP version.


My articles' code snippets, images, iframes, and other HTML elements were stored in the database. Some of these elements are not directly compatible with AMP HTML and have to be replaced. The common tags I had to replace are as follows:


img -> amp-img

iframe -> amp-iframe


It was not only elements: a few attributes of HTML elements were not supported either.


I had earlier worked on WYSIWYG editors, which is how I stumbled upon PHP's hidden gem, i.e., DOMDocument.


2) DOMDocument Basics


A DOMDocument object is similar to the JavaScript DOM. If you have worked with JavaScript DOM manipulation before, it will feel familiar.


DOMDocument Initialization (DOMDocument())


You can start playing around with PHP's DOMDocument by initializing it as follows:

$dom = new \DOMDocument();

Loading HTML or XML Content (loadHTML / loadXML)


Now we have to load the HTML or XML document we would like to play around with.

$dom->loadHTML($htmlCode); /** For HTML Parsing */
$dom->loadXML($xmlCode); /** For XML Parsing */
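
As a rough sketch of what loading looks like end to end, the snippet below parses a small HTML string and serializes it back. The sample markup in $htmlCode is a made-up stand-in for real content. The libxml_use_internal_errors() call is an optional addition, not part of the original steps: it collects parser warnings (for example about tags the parser does not recognize) instead of printing them.

```php
<?php
// Hypothetical sample markup standing in for a stored article body.
$htmlCode = '<p>Hello <b>world</b></p>';

// Collect parser warnings instead of emitting them as PHP warnings.
libxml_use_internal_errors(true);

$dom = new \DOMDocument();
$loaded = $dom->loadHTML($htmlCode); // true on success

// Read whatever the parser reported, then clear the buffer.
$warnings = libxml_get_errors();
libxml_clear_errors();

// saveHTML() turns the document back into a string. For a fragment like
// this one the parser adds a doctype plus <html> and <body> wrappers.
$output = $dom->saveHTML();
```
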


Since the HTML for my articles is stored in the database, I load it as follows:


Loading My Article Body Code

$post = Post::with(['user'])->where(['slug' => $slug])->first();

$dom = new \DOMDocument();
$dom->loadHTML($post->body);

Get Elements By Tag Name (getElementsByTagName)


My code snippets live inside pre HTML tags, so I can get all the pre elements as follows:

$codeSnippets = $dom->getElementsByTagName('pre');
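
To make the return value concrete, here is a small self-contained sketch. The markup is invented for illustration. It shows that getElementsByTagName returns a DOMNodeList, which exposes a length property and can be walked with foreach.

```php
<?php
// Hypothetical markup with two code snippets and one paragraph.
$html = '<div><pre>first</pre><p>text</p><pre>second</pre></div>';

$dom = new \DOMDocument();
$dom->loadHTML($html);

// A DOMNodeList holding every <pre> element in document order.
$codeSnippets = $dom->getElementsByTagName('pre');

$texts = [];
foreach ($codeSnippets as $code) {
    // textContent is the text inside the element.
    $texts[] = $code->textContent;
}
```
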

Removing Attributes Of The Above Code Snippets (removeAttribute)


Since the HTML spellcheck attribute is not supported by AMP HTML, we need to remove it when rendering the AMP view page.


$post = Post::with(['user'])->where(['slug' => $slug])->first();

$dom = new \DOMDocument();
$dom->loadHTML($post->body);

$codeSnippets = $dom->getElementsByTagName('pre');

foreach ($codeSnippets as $code) {
    $code->removeAttribute('spellcheck');
}



Similarly, we can remove other attributes from different elements as follows:

$contentEditables = $dom->getElementsByTagName('div');
foreach ($contentEditables as $contentEditable) {
    $contentEditable->removeAttribute('contenteditable');
}
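
The two loops above can be tried without a database. The sketch below uses an invented HTML string in place of $post->body and serializes the result, so the effect of removeAttribute is visible in the output.

```php
<?php
// Hypothetical stand-in for $post->body.
$html = '<div contenteditable="true"><pre spellcheck="false">echo 1;</pre></div>';

$dom = new \DOMDocument();
$dom->loadHTML($html);

// Strip the attributes that AMP HTML does not accept.
foreach ($dom->getElementsByTagName('pre') as $pre) {
    $pre->removeAttribute('spellcheck');
}
foreach ($dom->getElementsByTagName('div') as $div) {
    $div->removeAttribute('contenteditable');
}

// Serialize the whole document to inspect the cleaned markup.
$result = $dom->saveHTML();
```
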

Setting Attribute Of HTML Elements (setAttribute)


Just as we can remove an attribute, we can set one as follows:

/** Set the Image Attribute */
$image->setAttribute('test', 'testvalue');

Get The Attribute Value Of HTML Element (getAttribute)


If we need a value from an element, we can read it with getAttribute as follows:


/** Get the Image Attribute */
$imageSrc = $image->getAttribute('src');
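
The snippets above assume $image already holds an element. A minimal sketch of where it might come from, using an invented img tag, is shown below. It also illustrates that getAttribute returns an empty string for a missing attribute, so hasAttribute is the safer check for presence.

```php
<?php
$dom = new \DOMDocument();
// Hypothetical image markup for illustration.
$dom->loadHTML('<p><img src="/images/cat.png" alt="A cat"></p>');

// item(0) returns the first matching element.
$image = $dom->getElementsByTagName('img')->item(0);

// Add (or overwrite) an attribute.
$image->setAttribute('test', 'testvalue');

// Read attributes back.
$imageSrc = $image->getAttribute('src');
$missing  = $image->getAttribute('title');  // empty string, not null
$hasTitle = $image->hasAttribute('title');
```
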

Get Current Element Parent Node (parentNode)


If we need the parent node of the current element, we can fetch it as follows:


$parentNode = $image->parentNode;

Replacing Child Nodes (replaceChild)


We can replace an existing child node of the parent element as follows:


$parentNode->replaceChild($newElement, $oldElement);
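
Here is a small runnable sketch of parentNode and replaceChild working together. The markup and the variable values are made up for the example. Note the argument order: the new node comes first, the node being replaced second.

```php
<?php
$dom = new \DOMDocument();
// Hypothetical markup: a paragraph wrapping one <b> element.
$dom->loadHTML('<p id="wrapper"><b>old</b></p>');

$oldElement = $dom->getElementsByTagName('b')->item(0);
$parentNode = $oldElement->parentNode; // the <p> element

// New nodes must be created by the document that will hold them.
$newElement = $dom->createElement('strong', 'new');

// replaceChild() returns the node that was taken out.
$replaced = $parentNode->replaceChild($newElement, $oldElement);

// Passing a node to saveHTML() serializes only that node.
$result = $dom->saveHTML($parentNode);
```
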

3) Implementation To My Project


Now that our basics are solid, let's apply them to the project.


Full Implementation Demo

$post = Post::with(['user'])->where(['slug' => $slug])->first();

$dom = new \DOMDocument();
$dom->loadHTML($post->body);

/** Get all the images */
$images = $dom->getElementsByTagName('img');

/** Store the image count up front, because the list shrinks as images are replaced */
$imagesLength = $images->length;

for ($i = 0; $i < $imagesLength; $i++) {
    $image = $images->item(0);

    /** Get the Image Attributes */
    $imageSrc = $image->getAttribute('src');
    $imageAlt = $image->getAttribute('alt');

    /** Get the parent node of the image so that the image can be swapped for the new amp-img */
    $parentNode = $image->parentNode;
    
    /** Creating new element */
    $ampImage = $image->ownerDocument->createElement('amp-img');

    $ampImage->setAttribute('src', $imageSrc);
    $ampImage->setAttribute('width', '400');
    $ampImage->setAttribute('height', '300');
    $ampImage->setAttribute('layout', 'responsive');
    $ampImage->setAttribute('alt', $imageAlt);

    /** Replace child element */
    $parentNode->replaceChild($ampImage, $image);
}


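Every call used here was covered in step 2. The one detail worth pointing out is item(0): the list returned by getElementsByTagName is live, so each replaced img drops out of it and the next one moves to index 0.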
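
The demo above stops before turning the document back into a string. The sketch below is one possible way to finish the job, not the exact code from my project: convertImagesToAmp is a hypothetical helper name, the input string is invented, and the fixed 400x300 size simply mirrors the demo. It returns only the children of body, so the doctype and html/body wrappers added by the parser are left out.

```php
<?php
/**
 * Hypothetical helper: replaces every <img> with an <amp-img>
 * and returns the converted markup as a string.
 */
function convertImagesToAmp(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects an empty string
    }

    // Keep parser warnings out of the page output.
    libxml_use_internal_errors(true);
    $dom = new \DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    // The list is live, so keep replacing the first entry until none remain.
    $images = $dom->getElementsByTagName('img');
    while ($images->length > 0) {
        $image = $images->item(0);

        $ampImage = $dom->createElement('amp-img');
        $ampImage->setAttribute('src', $image->getAttribute('src'));
        $ampImage->setAttribute('width', '400');   // assumed fixed size
        $ampImage->setAttribute('height', '300');  // assumed fixed size
        $ampImage->setAttribute('layout', 'responsive');
        $ampImage->setAttribute('alt', $image->getAttribute('alt'));

        $image->parentNode->replaceChild($ampImage, $image);
    }

    // Serialize only what sits inside <body>.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $output = '';
    foreach ($body->childNodes as $child) {
        $output .= $dom->saveHTML($child);
    }
    return $output;
}

// Usage with made-up markup.
$ampHtml = convertImagesToAmp(
    '<p><img src="a.png" alt="A"> and <img src="b.png" alt="B"></p>'
);
```

In a view, the returned string would be echoed in place of the raw article body.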


Conclusion


If you are new to AMP then don't forget to check out the following articles




AUTHOR

Channaveer Hakari

I am a full-stack developer working at WifiDabba India Pvt Ltd. I started this blog so that I can share my knowledge and enhance my skills with constant learning.

Never stop learning. If you stop learning, you stop growing