When I delivered the LINQ presentation at the RONUA meeting in April, I was asked how LINQ performs on big data sets. To answer that, I decided to test LINQ to XML against a 100+MB file. I decided to extract three different sets of data from this XML file:
- 1 set representing about 0.5MB of the XML file,
- 1 set representing about 10MB of the XML file, and
- 1 set representing about 80MB of the XML file
Of course, I designed some data structures to map onto the data from the XML file and ran three queries against this file, each projecting instances of those data structures. The result was that all three queries took about the same time to execute and generate my internal objects. Each time, the entire file was re-parsed. The time for each query was about 3.5 seconds. Thus, I can draw two conclusions:
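To give a concrete idea of the kind of query involved, here is a minimal sketch of a LINQ to XML projection with timing. The file name, element names, and the `Customer` type are all illustrative assumptions, not the actual structures from my test:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;

// Hypothetical type mapping onto elements of the XML file.
class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

class Program
{
    static void Main()
    {
        var watch = Stopwatch.StartNew();

        // XDocument.Load re-parses the entire file on every run,
        // which is why each of the three queries paid the same cost.
        XDocument doc = XDocument.Load("data.xml");

        // Project matching elements into strongly typed objects.
        var customers =
            (from c in doc.Descendants("customer")
             select new Customer
             {
                 Name = (string)c.Element("name"),
                 City = (string)c.Element("city")
             }).ToList();

        watch.Stop();
        Console.WriteLine("{0} objects in {1:F1} s",
            customers.Count, watch.Elapsed.TotalSeconds);
    }
}
```

Note that the `ToList()` call is what forces the query to actually execute; without it, LINQ's deferred execution means no parsing work is measured.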
- LINQ performs very well: it took less than 12 seconds total to extract about 90% of the data from a 100+MB file; that is several times faster than what I get in C++ for parsing the same file, not to mention that the code is many times simpler;
- there wasn't much difference between extracting 0.5MB and roughly 100 times that amount;
I am quite confident that LINQ to SQL performs as well as LINQ to XML. If I find a really big database, I will run a similar test against it.