The last couple of days I’ve been farting around with how to get at a rendering engine so I could take screenshots of web pages. My first attempts were based on the Gecko engine used in Mozilla browsers, and I didn’t have much luck with Mono or the C# wrappers around the Gtk/Gecko libraries.
Browsing around some more, I found some source code for the Microsoft rendering engine (aka Trident) as encapsulated in the older-style SHDOCVW/MSHTML libraries. Then I quickly found that as of .NET 2.0 there was a new control, System.Windows.Forms.WebBrowser, that let you embed a browser in your application with the same semantics as the old style, but which was easier to use. I remember back in the day trying to get the first IE ActiveX controls onto some dumb VB form or another and thinking “how goddamned fragile can they make this thing?” Anyway, that new class led me to this code, which works great: see the screenshot below of this blog, taken at 1024×768. Good enough to get started.
But all along I’d been doodling with the WebBrowser control class and never saw its neat DrawToBitmap() method in IntelliSense. Hm. Strange. I thought maybe it had been deprecated in .NET 3.5, so I fiddled around with changing the target framework, but no: it has been hidden from IntelliSense since at least Visual Studio 2005. I found out why when I F1-ed the method:
This method supports the .NET Framework infrastructure and is not intended to be used directly from your code.
… and yet it works. I’m actually surprised it even compiles.
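For reference, here is a minimal sketch of the DrawToBitmap() approach. It is not the code linked above; it assumes a .NET project that references System.Windows.Forms and System.Drawing, and the ViewportShot.Capture helper name is hypothetical. Because WebBrowser wraps an ActiveX control, the sketch runs it on its own STA thread and pumps messages with Application.DoEvents() until the page reports that it has loaded, then captures only the visible viewport.

```csharp
using System;
using System.Drawing;
using System.Threading;
using System.Windows.Forms;

static class ViewportShot
{
    // Hypothetical helper: renders url in an off-screen WebBrowser of the given
    // size and returns just the visible viewport as a Bitmap (caller disposes it).
    public static Bitmap Capture(string url, int width, int height)
    {
        Bitmap result = null;
        Exception error = null;

        // WebBrowser wraps an ActiveX control, so it has to live on an STA thread.
        var worker = new Thread(() =>
        {
            try
            {
                using (var browser = new WebBrowser())
                {
                    browser.ScrollBarsEnabled = false;
                    browser.ScriptErrorsSuppressed = true;
                    browser.Size = new Size(width, height);
                    browser.Navigate(url);

                    // Pump Windows messages until the page reports it has loaded.
                    // No timeout here: a page that never completes will hang the loop.
                    while (browser.ReadyState != WebBrowserReadyState.Complete)
                        Application.DoEvents();

                    var bmp = new Bitmap(width, height);
                    try
                    {
                        // Hidden from IntelliSense, but still public, so it compiles and runs.
                        browser.DrawToBitmap(bmp, new Rectangle(0, 0, width, height));
                    }
                    catch
                    {
                        bmp.Dispose();
                        throw;
                    }
                    result = bmp;
                }
            }
            catch (Exception ex) { error = ex; }
        });
        worker.SetApartmentState(ApartmentState.STA);
        worker.Start();
        worker.Join();

        if (error != null)
            throw new InvalidOperationException("Capture failed for " + url, error);
        return result;
    }
}
```

Something like ViewportShot.Capture("http://example.com/", 1024, 768).Save("screenshot.png", ImageFormat.Png), with System.Drawing.Imaging imported and the Bitmap wrapped in a using block, should write a PNG of the 1024×768 viewport.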
Poking around even further, I found a more “nativist” approach from Alan Dean that brings back the mshtml, GDI32, and USER32 libs. mshtml can be referenced directly as c:\program files\microsoft.net\primary interop assemblies\microsoft.mshtml.dll, but the other two require the DllImport voodoo from the bad old days of yore. So, I now have the resources to do what I wanted to do in the first place, which was... what again? I seem to remember looking at the Amazon Associates Program, seeing a reference to the Alexa Site Thumbnail service, and thinking “how hard can that be?”
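To give a flavour of that DllImport side, here is a sketch that is not Alan Dean's code. It swaps in user32's PrintWindow, which asks a window to paint itself into a device context; the NativeCapture class name is hypothetical. Because the device context is borrowed from GDI+ via Graphics.GetHdc(), this particular variant needs no GDI32 imports. Whether PrintWindow captures a given control's content (child windows included) can vary, so treat it as a starting point.

```csharp
using System;
using System.Drawing;
using System.Runtime.InteropServices;
using System.Windows.Forms;

static class NativeCapture
{
    // Win32 signature: BOOL PrintWindow(HWND hwnd, HDC hdcBlt, UINT nFlags);
    [DllImport("user32.dll")]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool PrintWindow(IntPtr hwnd, IntPtr hdcBlt, uint nFlags);

    // Asks the native window behind control to paint itself into a new Bitmap.
    public static Bitmap Capture(Control control)
    {
        var bmp = new Bitmap(control.Width, control.Height);
        try
        {
            using (Graphics g = Graphics.FromImage(bmp))
            {
                IntPtr hdc = g.GetHdc();   // borrow a GDI device context from GDI+
                try
                {
                    // nFlags = 0 asks for the whole window, not just the client area.
                    if (!PrintWindow(control.Handle, hdc, 0))
                        throw new InvalidOperationException("PrintWindow failed.");
                }
                finally
                {
                    g.ReleaseHdc(hdc);     // always hand the DC back to GDI+
                }
            }
            return bmp;
        }
        catch
        {
            bmp.Dispose();                 // don't leak the bitmap on failure
            throw;
        }
    }
}
```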
I think the Windows code I’ve found so far could be improved in two significant ways:
- ensure you can do full-page snapshots if necessary, not just the current viewport.
- return the PNG straight from an .ashx handler’s response stream, without necessarily saving the image to disk first. Of course you could still cache the image and serve it up from the .ashx, I suppose, saving some network traffic and rendering time.
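Both ideas can be sketched together as an ASP.NET IHttpHandler. This is a minimal sketch, assuming the web project references both System.Web and System.Windows.Forms; the Thumbnail class name, the url query parameter, the 1024×768 starting viewport, the 10000-pixel height cap and the 10-minute cache lifetime are all hypothetical choices. For the full-page shot it grows the control to document.body’s scroll rectangle before calling DrawToBitmap(). It encodes into a MemoryStream first because GDI+’s PNG encoder needs a seekable stream, which Response.OutputStream is not, and it keeps recent results in HttpRuntime.Cache.

```csharp
// Thumbnail.ashx would begin with: <%@ WebHandler Language="C#" Class="Thumbnail" %>
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Threading;
using System.Web;
using System.Web.Caching;
using System.Windows.Forms;

// Hypothetical handler, called as Thumbnail.ashx?url=http://example.com/
public class Thumbnail : IHttpHandler
{
    const int ViewportWidth = 1024;   // arbitrary choices for this sketch
    const int ViewportHeight = 768;
    const int MaxHeight = 10000;      // guard against absurdly tall pages

    public bool IsReusable { get { return true; } }   // no per-request state

    public void ProcessRequest(HttpContext context)
    {
        string url = context.Request.QueryString["url"];
        if (string.IsNullOrEmpty(url))
        {
            context.Response.StatusCode = 400;
            return;
        }

        // Serve a cached PNG if this URL was rendered recently.
        string key = "thumb:" + url;
        byte[] png = HttpRuntime.Cache[key] as byte[];
        if (png == null)
        {
            png = RenderFullPage(url);
            HttpRuntime.Cache.Insert(key, png, null,
                DateTime.UtcNow.AddMinutes(10), Cache.NoSlidingExpiration);
        }

        context.Response.ContentType = "image/png";
        context.Response.BinaryWrite(png);
    }

    // Renders the whole document, not just the viewport, and returns PNG bytes.
    public static byte[] RenderFullPage(string url)
    {
        byte[] png = null;
        Exception error = null;
        var worker = new Thread(() =>
        {
            try
            {
                using (var browser = new WebBrowser())
                {
                    browser.ScrollBarsEnabled = false;
                    browser.ScriptErrorsSuppressed = true;
                    browser.Size = new Size(ViewportWidth, ViewportHeight);
                    browser.Navigate(url);
                    while (browser.ReadyState != WebBrowserReadyState.Complete)
                        Application.DoEvents();

                    // Grow the control to the document's full scrollable size.
                    if (browser.Document != null && browser.Document.Body != null)
                    {
                        Rectangle page = browser.Document.Body.ScrollRectangle;
                        browser.Size = new Size(
                            Math.Max(page.Width, ViewportWidth),
                            Math.Min(Math.Max(page.Height, ViewportHeight), MaxHeight));
                    }

                    using (var bmp = new Bitmap(browser.Width, browser.Height))
                    using (var buffer = new MemoryStream())
                    {
                        browser.DrawToBitmap(bmp, new Rectangle(Point.Empty, browser.Size));
                        // Encode into memory, not straight into Response.OutputStream.
                        bmp.Save(buffer, ImageFormat.Png);
                        png = buffer.ToArray();
                    }
                }
            }
            catch (Exception ex) { error = ex; }
        });
        worker.SetApartmentState(ApartmentState.STA);   // WebBrowser requires STA
        worker.Start();
        worker.Join();
        if (error != null)
            throw new InvalidOperationException("Rendering failed for " + url, error);
        return png;
    }
}
```

As written, it will fetch any URL a caller passes in, so in practice you would want to restrict which sites it agrees to render.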
It’s been an interesting exploration!