How can I match this HTML with regex?
I need to convert
$text = "i'm here <i>and</i> <a href='http://example.com'>this is my site</a>";
to
$text = "i'm here and this is my site (http://example.com)";
There could be multiple links in the text.
All HTML tags are to be removed and the href value from <a> tags needs to be added like above.
What would be an efficient way to solve this with regex? Any code snippet would be great.
Started: September 16, 2011 Latest Activity: September 16, 2011 php regex
3 Answers
It's also very easy to do with 'simplehtmldom'
include('simple_html_dom.php');
# parse, rewrite every link, and echo
$html = str_get_html("i'm here <i>and</i> <a href='http://example.com'>this is my site</a>");
# loop so that multiple links are handled, not just the first one
foreach ($html->find('a') as $a) {
    $a->outertext = "{$a->innertext} ({$a->href})";
}

echo strip_tags($html);

That produces the output you want for your test case.

Posted: MacOS
In: September 16, 2011

The DOM solution:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//a[@href]') as $node) {
    $textNode = new DOMText(sprintf('%s (%s)',
        $node->nodeValue, $node->getAttribute('href')));
    $node->parentNode->replaceChild($textNode, $node);
}
echo strip_tags($dom->saveHTML());
and the same without XPath:
$dom = new DOMDocument;
$dom->loadHTML($html);
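// Copy the live node list first; replacing nodes while iterating it directly can skip links.
foreach(iterator_to_array($dom->getElementsByTagName('a')) as $node) {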
    if($node->hasAttribute('href')) {
        $textNode = new DOMText(sprintf('%s (%s)',
            $node->nodeValue, $node->getAttribute('href')));
        $node->parentNode->replaceChild($textNode, $node);
    }
}
echo strip_tags($dom->saveHTML());

All it does is load the HTML into a DOMDocument instance. The first version uses an XPath expression, which is kinda like SQL for XML, to get all links with an href attribute. It then creates a text node from each link's text content (nodeValue) and its href attribute and replaces the link with it. The second version does the same with just the DOM API and no XPath.
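
To make the steps above concrete for text with several links, here is a minimal sketch that wraps the XPath version in a function. The function name links_to_text is hypothetical (it is not from the answer), and reading textContent in place of the strip_tags() call is a swapped-in choice, assuming the input is a small, mostly ASCII HTML fragment.

```php
<?php
// Hypothetical helper, for illustration only.
function links_to_text(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Silence parser notices about fragments that are not full documents.
    $dom->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);
    $xpath = new DOMXPath($dom);
    // XPath results are a static list, so replacing nodes inside the loop is safe.
    foreach ($xpath->query('//a[@href]') as $a) {
        $label = sprintf('%s (%s)', $a->textContent, $a->getAttribute('href'));
        $a->parentNode->replaceChild($dom->createTextNode($label), $a);
    }
    // textContent of the root element is the text with all remaining tags dropped.
    return trim($dom->documentElement->textContent);
}

echo links_to_text("<a href='http://a.example'>A</a> and <a href='http://b.example'>B</a>");
```

Reading textContent avoids re-serializing the document and stripping tags from the string afterwards, but it also returns the text of elements such as script or title if the input contains them.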

Yes, it's a few lines more than a regex, but it is clean and easy to understand, and it won't give you any headaches when you need to add additional logic.

Posted: codeberg
In: September 16, 2011

Regex does make your life easy here. Just match each link together with its URL (the href may use single or double quotes).
$text = preg_replace('/<a\s[^>]*href=(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is', '$3 ($2)', $text);
$text = strip_tags($text);
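
Since the question mentions multiple links, each link has to be paired with its own URL. Here is a rough sketch of one way to do that with preg_replace_callback. The function name links_to_text_regex and the pattern are assumptions for illustration, and a regex like this will not cope with every form of HTML (unquoted attributes, a ">" inside an attribute value, and so on).

```php
<?php
// Hypothetical helper, for illustration only.
function links_to_text_regex(string $text): string
{
    // Capture the quote character (1), the URL (2) and the link body (3).
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>~is';
    $text = preg_replace_callback($pattern, function (array $m): string {
        // Keep the link body and append its own URL in parentheses.
        return $m[3] . ' (' . $m[2] . ')';
    }, $text);
    // Remove whatever tags are left, such as <i> or links without an href.
    return strip_tags($text);
}

echo links_to_text_regex("i'm here <i>and</i> <a href='http://example.com'>this is my site</a>");
```

The callback form leaves room for per-link logic later, for example skipping links whose href is empty.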

Posted: xtremex
In: September 16, 2011
