NPOI 2.0 - Automatic Format/Type Inference for Excel 2003/2007

This is the 1st post of a series of posts about NPOI 2.0.

NPOI is jumping from version 1.2.5 directly to 2.0. A big reason for this is that it now supports Office 2007 formats (.xlsx and .docx). Given this fact, using NPOI 2.0 will seem a little bit more complicated than the previous version because NPOI now provides multiple namespaces, including HSSF (Excel 2003), XSSF (Excel 2007) and XWPF (Word 2007).

To be able to automatically identify Excel formats when reading a file from the file system and to avoid you having to infer the file type yourself, NPOI provides a very convenient class NPOI.SS.WorkbookFactory.

public class WorkbookFactory
{
    public static WorkbookFactory Create(POIFSFileSystem fs){ ... }
    public static WorkbookFactory Create(OPCPackage pkg){ ... }
    public static IWorkbook Create(Stream inputStream){ ... }
    public static IFormulaEvaluator CreateFormulaEvaluator(IWorkbook workbook){ ... }
}

The difference between the first two Create methods introduced under POIFSFileSystem and OPCPackage is that POIFSFileSystem reader library uses OLE2 format and OPCPackage uses OOXML format (commonly known as ActiveX Document Format). These two libraries are used to read both Excel 2003 (.xls) and Excel 2007 (.xlsx) formats respectively and because they’re the underlying libraries, they’re not limited to reading .xls and .xlsx formats. You can also read Thumb.db using POIFSFileSystem file format for example. You can download a sample project here.

Since you already know the difference between POIFSFileSystem and OPCPackage, you should understand what these two methods do. You are required to know what kind of document you are opening and then pass the file system to WorkbookFactory that in turn can automatically store a HSSFWorkbook or XSSFWorkbook depending on the file type. As these two classes implement IWorkbook interface, in most cases you do not need to care about what class instance it returns unless you use some advanced or specific Excel 2007 feature.

The third one is the most important part to introduce today – it does automatic inference, as long as you pass a Stream object it will know/infer whether it's .xls or .xlsx. Finally it returns the appropriate workbook instance.

The last one returns an IFormulaEvaluator that describes a formula object. Here a similar inference principle applies. Both HSSF and XSSF have formula calculation/evaluation classes, namely HSSFFormulaEvaluator and XSSFFormulaEvaluator.

OK we covered a little bit of code above but it was only abstractions, let’s see some real code:

using System;
using NPOI.SS.UserModel;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("missing argument: Excel file path (both 2003 and 2007 are supported)");
return; } using (FileStream fs = File.OpenRead(args[0])) { IWorkbook wb = WorkbookFactory.Create(fs);
Console.WriteLine("Value of Cell B2: " + wb.GetSheetAt(0).GetRow(1).GetCell(1)); }
Console.Read(); } }

This program assumes that the workbook file with a sheet exists on the file system and that the first sheet has a value in cell B2 or an error will be raised. The following is a screenshot of the workbook in question.

Excel Workbook with a single Sheet and value 5 in cell B2Figure 1 - Excel Workbook with a single Sheet and value 5 in Cell B2

Command line statement and test parameters were as follows:

Excel 2007: WorkbookFactoryDemo.exe demo2007.xlsx

Excel 2003: WorkbookFactoryDemo.exe demo2003.xls

The result should be equal the value of cell B2 = 5.

If we analyze the above program it seems we are not accessing anything from XSSF and HSSF namespaces, but in fact the wb is certainly an instance of HSSFWorkbook or XSSFWorkbook, that is, the real instance type/format is transparent to the user. This is where the IWorkbook interface shines.

For now you can study the source code here. Many NPOI 2.0 new features are not documented yet. So keep an eye on this blog for the next installment on this series.

NPOI 2.0 series of posts scheduled

NPOI logoWhat’s NPOI you may ask. So here it goes:

NPOI is the .NET version of POI Java project. POI is an open source project which can help you read/write Office 2003/2007 files.

NPOI advantages
a. It's totally free to use
b. Cover most Excel features (cell styles, data formats, formulas and so on)
c. Supports .xls, .xlsx, .docx
d. Designed to be interface-oriented (in NPOI.SS namespace)
e. Supports not only export but also import
f. .NET 2.0 based even for xlsx and docx (but it also supports .NET 4.0)
g. Successful use cases all over the world
h. Great amount of basic and to the point samples
i. Professional service support (even 24*7) directly from NPOI team (not free)

My dear friend Tony Qu from China (the guy behind NPOI development) approached me and asked me to help him share NPOI 2.0 news with the English speakers. I’m always glad to help spread the word about this amazing open source project that has served me in many situations.

The most seen and shared post of this humble blog as of now has almost 9000 shares and was published on 7/02/2009 – oh God that’s exactly 4.5 years ago… its title is Creating Excel spreadsheets .XLS and .XLSX in C#. This huge amount of shares shows that a tool that does what NPOI does is highly demanded in the market. No doubt!

You know: when I wrote that post I had no idea it would be so useful to so many people. My intention with this blog is really to share useful things with the world giving back a bit of what I find/learn/improve from the great minds spread throughout this big planet earth. This is a continuous\infinite process for sure.

OK dude: that’s enough of food for thought. Let me show you what’s going on with NPOI. Here you go…

NPOI 2.0 release
The release is planned for this January. We agreed to publish posts discussing major features/enhancements that’s arriving.

Here they are:

1 - WorkbookFactory can help make HSSF and XSSF transparent for users

2 - Excel 2007 and Word 2007 support

3 - Cell, Row, Sheet copying and Workbook merging

4 - Converting Excel XLS documents to HTML format

5 - isRightToLeft and setRightToLeft in XSSF and HSSF

6 - Diagonal line in cells for both Excel 2003 and Excel 2007

7 - Excel 2007 chart support

8 - New excel functions added in formula calculation

9 - NPOI tag added to files generated with NPOI

10 - XSSFWorkbook implements IEnumerator

Keep an eye on this blog for the next posts.

NPOI source code 
https://github.com/tonyqus/npoi

NPOI NuGet package
https://www.nuget.org/packages/NPOI/2.0.6