27 January 2017 @ 06:58 pm
StackOverflowException in AngleSharp.Parser  
The PostJobFree crawler found a web page that causes a fatal crash in the AngleSharp parser:
using AngleSharp.Parser.Html;
// LoadUrlContent: helper (not shown) that returns the page's HTML as a string.
string pageHtml = LoadUrlContent("http://onestop.fiu.edu/financial-aid/loans/");
var parser = new HtmlParser();
var document = parser.Parse(pageHtml);
document.QuerySelectorAll("a"); // Fatal crash: "An unhandled exception of type 'System.StackOverflowException' occurred in AngleSharp.dll".

We cannot catch that exception: the .NET runtime terminates the process on a stack overflow, so the whole PostJobFreeService Windows service goes down and has to restart.
That is very frustrating.

In our development environment the crash is not always reproducible.
When we run the code above in a test, it just works.
But when we run the same code under the Visual Studio debugger, it crashes with 'System.StackOverflowException'.

The AngleSharp library maintainers noticed that the problematic page contains a lot of consecutive "<content /><content /><content /><content />" tags.

Obviously, that is no excuse for crashing. Hopefully their latest build will fix the problem.

Originally posted at: http://dennisgorelik.dreamwidth.org/122694.html
СБ (sab123) on January 28th, 2017 01:20 am (UTC)
Perhaps the configured stack size is different in different environments? This looks like one of those "HTML/XML bombs" where a small original document expands into a very large parsed document. If I remember right, entity definitions are one of the ways to do it.
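
The stack-size hypothesis can be tried out in isolation. Below is a minimal sketch, assuming the recursion is merely deep rather than infinite: it runs a recursive function on a dedicated thread whose stack size is set explicitly through the Thread(ThreadStart, int) constructor. Depth and RunWithStack are hypothetical names made up for this illustration, and Depth only stands in for a recursive parser; none of this is AngleSharp code.

```csharp
using System;
using System.Threading;

static class StackSizeDemo
{
    // Hypothetical stand-in for a recursive parser: one stack frame per nesting level.
    static int Depth(int levels)
    {
        return levels == 0 ? 0 : 1 + Depth(levels - 1);
    }

    // Runs the given work on a new thread with an explicitly chosen stack size.
    public static T RunWithStack<T>(Func<T> work, int stackBytes)
    {
        T result = default(T);
        var thread = new Thread(() => result = work(), stackBytes);
        thread.Start();
        thread.Join(); // wait for the worker to finish before reading the result
        return result;
    }

    static void Main()
    {
        // 100,000 levels would probably overflow a default-sized stack;
        // a 64 MB stack leaves plenty of room for them.
        int depth = RunWithStack(() => Depth(100000), 64 * 1024 * 1024);
        Console.WriteLine(depth);
    }
}
```

A bigger stack only raises the ceiling: a sufficiently deep document would still kill the process, so this is a stopgap rather than a fix.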
Dennis Gorelik (dennisgorelik) on January 29th, 2017 12:36 am (UTC)
They fixed it:

... hopefully.
I still need to install Visual Studio 2015 to build the solution and test if it actually works.

There are many lessons learned:
1) Use a loop instead of recursion.
2) "Young" libraries have ugly bugs.
3) It's good when people care about their project.
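
Lesson 1 can be sketched as follows. This is only an illustration of the general technique, not AngleSharp's actual code or its fix: Node and FindAll are hypothetical names invented here. The tree walk keeps its pending nodes in a heap-allocated Stack<Node>, so the nesting depth it can handle is limited by available memory rather than by the thread's call stack.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal tree node; not AngleSharp's DOM type.
class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();
    public Node(string name) { Name = name; }
}

static class IterativeWalk
{
    // Finds all nodes with the given name using an explicit stack instead of recursion.
    public static List<Node> FindAll(Node root, string name)
    {
        var found = new List<Node>();
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (current.Name == name) found.Add(current);
            // Push children in reverse so they are popped in document order.
            for (int i = current.Children.Count - 1; i >= 0; i--)
                pending.Push(current.Children[i]);
        }
        return found;
    }

    static void Main()
    {
        // Build a chain 200,000 levels deep with a single "a" node at the bottom.
        var root = new Node("content");
        Node tip = root;
        for (int i = 0; i < 200000; i++)
        {
            var child = new Node("content");
            tip.Children.Add(child);
            tip = child;
        }
        tip.Children.Add(new Node("a"));

        Console.WriteLine(FindAll(root, "a").Count); // prints 1
    }
}
```

A recursive version of the same walk would need one call-stack frame per nesting level and would most likely overflow on a chain this deep.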

Edited at 2017-01-29 12:36 am (UTC)