Filter HTML Contents

The HtmlDocument class from the HtmlAgilityPack library can be used to filter the contents of an HTML document.

This script creates a new HtmlDocument variable, loads the HTML code into it (obtained from a text file or the web, for example), selects the nodes to be parsed with an XPath expression, and converts the selection into a list so that the output can be printed.

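using System.Linq;
using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(htmlOutput); // htmlOutput holds the HTML source as a string
HtmlNodeCollection selectedHtmlNodes = doc.DocumentNode.SelectNodes("//html/body");
// SelectNodes can return null when no node matches, so fall back to an empty sequence
var nodeList = (selectedHtmlNodes ?? Enumerable.Empty<HtmlNode>()).ToList();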
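
The steps described above (load, select, convert to a list, print) can be put together as a small console program. This is only a minimal sketch: the class name HtmlFilterSketch, the helper Filter, the command-line handling, the sample markup and the XPath expression "//a[@href]" are assumptions made for illustration and are not part of the original script.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlFilterSketch
{
    // Hypothetical helper: returns the nodes matching an XPath expression
    // as a list, or an empty list when nothing matches.
    public static List<HtmlNode> Filter(HtmlDocument doc, string xpath)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? new List<HtmlNode>() : nodes.ToList();
    }

    public static void Main(string[] args)
    {
        HtmlDocument doc;
        if (args.Length > 0 && args[0].StartsWith("http"))
        {
            // Source is a URL: let HtmlWeb download and parse the page.
            doc = new HtmlWeb().Load(args[0]);
        }
        else if (args.Length > 0)
        {
            // Source is a path: parse the local file.
            doc = new HtmlDocument();
            doc.Load(args[0]);
        }
        else
        {
            // No argument: parse a made-up inline sample.
            doc = new HtmlDocument();
            doc.LoadHtml("<html><body><a href='/a'>A</a><a href='/b'>B</a></body></html>");
        }

        // Print the target and the text of every link that has an href attribute.
        foreach (HtmlNode link in Filter(doc, "//a[@href]"))
        {
            Console.WriteLine(link.GetAttributeValue("href", "") + " -> " + link.InnerText);
        }
    }
}
```

Returning an empty list instead of null keeps the calling loop free of null checks, which is a design choice of this sketch rather than behavior of the library.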

“InnerText” - the text content of a node only, with the tags stripped

“InnerHtml” - the HTML code between the opening and closing tags of a node
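
To make the difference between the two properties concrete, the following sketch reads both from the same node. The class name InnerTextDemo, the helper SelectFirst and the sample markup are assumptions chosen for this illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class InnerTextDemo
{
    // Hypothetical helper: returns the first node matching the XPath
    // expression, or null when nothing matches.
    public static HtmlNode SelectFirst(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode(xpath);
    }

    public static void Main()
    {
        // Sample markup made up for this illustration.
        string html = "<html><body><p>Hello <b>world</b></p></body></html>";
        HtmlNode p = SelectFirst(html, "//body/p");
        if (p == null)
        {
            Console.WriteLine("No matching node");
            return;
        }

        Console.WriteLine(p.InnerText); // Hello world
        Console.WriteLine(p.InnerHtml); // Hello <b>world</b>
    }
}
```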
