Showing posts with label XLS. Show all posts
Showing posts with label XLS. Show all posts

NPOI 2.0 - Converting Excel XLS documents to HTML format

This is the 4th post of a series of posts about NPOI 2.0.

This time we’re gonna see how easy it is to grab a .XLS file and then have it converted to HTML code.

Note: in NPOI 2.0.6 only XLS format is supported. XLSX support is coming in a future release (maybe 2.1) because it still needs more testing.

Enough said Winking smile, let’s get to the code:

using NPOI.HSSF.Converter;
using NPOI.HSSF.UserModel;
using System;
using System.IO;

namespace NPOI.Examples.ConvertExcelToHtml
{
    class Program
    {
        static void Main(string[] args)
        {
            HSSFWorkbook workbook;
            // Excel file to convert
            string fileName = "19599-1.xls";
            fileName = Path.Combine(Environment.CurrentDirectory, fileName);
            workbook = ExcelToHtmlUtils.LoadXls(fileName);


            ExcelToHtmlConverter excelToHtmlConverter = new ExcelToHtmlConverter();

            // Set output parameters
            excelToHtmlConverter.OutputColumnHeaders = false;
            excelToHtmlConverter.OutputHiddenColumns = true;
            excelToHtmlConverter.OutputHiddenRows = true;
            excelToHtmlConverter.OutputLeadingSpacesAsNonBreaking = false;
            excelToHtmlConverter.OutputRowNumbers = true;
            excelToHtmlConverter.UseDivsToSpan = true;

            // Process the Excel file
            excelToHtmlConverter.ProcessWorkbook(workbook);

            // Output the HTML file
            excelToHtmlConverter.Document.Save(Path.ChangeExtension(fileName, "html"));
        }
    }
}

The code above was taken from ConvertExcelToHtml sample project.

Here’s the sample spreadsheet open in Excel for Mac that was used as input:

Figure 1 - Excel spreadsheet to be converted to HTML code
Figure 1 - Excel spreadsheet to be converted to HTML code

This is the HTML generated:

Figure 2 - HTML output code generated from the conversion
Figure 2 - HTML output code generated from the conversion

Note that I used a screenshot above but it depicts the .html file (check the browser’s address bar).

Enjoy and have fun!

NPOI 2.0 - Automatic Format/Type Inference for Excel 2003/2007

This is the 1st post of a series of posts about NPOI 2.0.

NPOI is jumping from version 1.2.5 directly to 2.0. A big reason for this is that it now supports Office 2007 formats (.xlsx and .docx). Given this fact, using NPOI 2.0 will seem a little bit more complicated than the previous version because NPOI now provides multiple namespaces, including HSSF (Excel 2003), XSSF (Excel 2007) and XWPF (Word 2007).

To be able to automatically identify Excel formats when reading a file from the file system and to avoid you having to infer the file type yourself, NPOI provides a very convenient class NPOI.SS.WorkbookFactory.

public class WorkbookFactory
{
    public static WorkbookFactory Create(POIFSFileSystem fs){ ... }
    public static WorkbookFactory Create(OPCPackage pkg){ ... }
    public static IWorkbook Create(Stream inputStream){ ... }
    public static IFormulaEvaluator CreateFormulaEvaluator(IWorkbook workbook){ ... }
}

The difference between the first two Create methods introduced under POIFSFileSystem and OPCPackage is that POIFSFileSystem reader library uses OLE2 format and OPCPackage uses OOXML format (commonly known as ActiveX Document Format). These two libraries are used to read both Excel 2003 (.xls) and Excel 2007 (.xlsx) formats respectively and because they’re the underlying libraries, they’re not limited to reading .xls and .xlsx formats. You can also read Thumb.db using POIFSFileSystem file format for example. You can download a sample project here.

Since you already know the difference between POIFSFileSystem and OPCPackage, you should understand what these two methods do. You are required to know what kind of document you are opening and then pass the file system to WorkbookFactory that in turn can automatically store a HSSFWorkbook or XSSFWorkbook depending on the file type. As these two classes implement IWorkbook interface, in most cases you do not need to care about what class instance it returns unless you use some advanced or specific Excel 2007 feature.

The third one is the most important part to introduce today – it does automatic inference, as long as you pass a Stream object it will know/infer whether it's .xls or .xlsx. Finally it returns the appropriate workbook instance.

The last one returns an IFormulaEvaluator that describes a formula object. Here a similar inference principle applies. Both HSSF and XSSF have formula calculation/evaluation classes, namely HSSFFormulaEvaluator and XSSFFormulaEvaluator.

OK we covered a little bit of code above but it was only abstractions, let’s see some real code:

using System;
using NPOI.SS.UserModel;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("missing argument: Excel file path (both 2003 and 2007 are supported)");
return; } using (FileStream fs = File.OpenRead(args[0])) { IWorkbook wb = WorkbookFactory.Create(fs);
Console.WriteLine("Value of Cell B2: " + wb.GetSheetAt(0).GetRow(1).GetCell(1)); }
Console.Read(); } }

This program assumes that the workbook file with a sheet exists on the file system and that the first sheet has a value in cell B2 or an error will be raised. The following is a screenshot of the workbook in question.

Excel Workbook with a single Sheet and value 5 in cell B2Figure 1 - Excel Workbook with a single Sheet and value 5 in Cell B2

Command line statement and test parameters were as follows:

Excel 2007: WorkbookFactoryDemo.exe demo2007.xlsx

Excel 2003: WorkbookFactoryDemo.exe demo2003.xls

The result should be equal the value of cell B2 = 5.

If we analyze the above program it seems we are not accessing anything from XSSF and HSSF namespaces, but in fact the wb is certainly an instance of HSSFWorkbook or XSSFWorkbook, that is, the real instance type/format is transparent to the user. This is where the IWorkbook interface shines.

For now you can study the source code here. Many NPOI 2.0 new features are not documented yet. So keep an eye on this blog for the next installment on this series.

NPOI with Excel Table and dynamic Chart

A reader of the blog called Zip wrote a comment on the post Creating Excel spreadsheets .XLS and .XLSX in C#.

This is an excerpt from Zip’s comment:

if I add rows using NPOI in C#, rows added under the table won't be automatically included in the table, and my chart is not updated the way I would like it to be.
How can I work around this problem?

I tried to simulate the problem with a simple spreadsheet and I was getting the same problem stated by Zip, that is, if I added one row just beneath the last row in the table, such added row wasn’t included in Excel’s data table and consequently the chart bound to the table wasn’t updated to reflect the new data.

To workaround this problem, let’s consider the following spreadsheet shown in Figure 1:

NPOI with Excel Table and dynamic Chart 
Figure 1 - NPOI with Excel Table and dynamic Chart

As you see we have a simple Excel data table with a players column that represents the name arguments of the chart, 4 columns for the months that form the category labels arguments (X axis) and the values arguments for the months going from Jan through Apr (Y axis).

Using NPOI to insert a new row in the table shown above we do the following:

// Creating a new row... 0 is the first row for NPOI.
HSSFRow row = sheet.CreateRow(5); // Row 6 in Excel
// Creating new cells in the row... 0 is the first column for NPOI.
row.CreateCell(1).SetCellValue("Eve Paradise"); // Column B
row.CreateCell(2).SetCellValue(4); // Column C
row.CreateCell(3).SetCellValue(3); // Column D
row.CreateCell(4).SetCellValue(2); // Column E
row.CreateCell(5).SetCellValue(1); // Column F 

The result is shown in Figure 2:

NPOI with Excel Table and dynamic Chart - Adding a new row
Figure 2 - NPOI with Excel Table and dynamic Chart - Adding a new row

Figure 2 shows us the problem stated by Zip in his comment. The new row we just added wasn’t included in the table. The chart that is linked to the table won’t update because it isn’t aware of the new row.

How to workaround this problem? That’s the question!

After playing with this case for 4 hours I’ve found a way of doing what Zip asks for.

Here’s how I did it:

Expand your Excel data table to row 10. I expanded only 4 rows just to show you how to workaround NPOI’s current limitation.

To expand your table, click in the minuscule handle in the lower-right corner of the cell occupying the lower-right corner of the table. This handle gives you a way to expand the table. Usually, it’s easier just to add data and let Excel expand the table - what doesn’t work with NPOI. But if you want to add several new rows or columns all at once, the handle is a good way to do it.

After expanding your table save the spreadsheet. It’ll be the template spreadsheet used to create new spreadsheets.

Figure 3 shows how the above spreadsheet looks like when the table is expanded to row 10:

NPOI with Excel Table and dynamic Chart - Expanding the Table
Figure 3 - NPOI with Excel Table and dynamic Chart - Expanding the Table

We can see that row 6 added using NPOI is now part of the table because we expanded the table. The chart now shows the new data but we got a new problem: the chart shows empty (blank series) that are the reflection of the the empty rows we have on the data table - take a look at the chart’s legend for example and you’ll see squares that represent nothing.

How to get over this? Well, we just need to filter the data in the table as shown in

Figure 4:

NPOI with Excel Table and dynamic Chart - Filtering Data (blank series)
Figure 4 - NPOI with Excel Table and dynamic Chart - Filtering Data (blank series)

Filter out players removing the blank rows by unchecking (Blanks) circled in red in Figure 4. Doing so the chart will reflect the change showing only the filtered data as you see in Figure 5:

NPOI with Excel Table and dynamic Chart - Filtered Data (no empty rows)
Figure 5 - NPOI with Excel Table and dynamic Chart - Filtered Data (no empty rows)

Now we have an Excel data table that is filtered (take a look at the funnel symbol) in the Player column. Other difference is that the rows that contain data are marked in blue. Although we have only 4 rows of data being displayed, our table has indeed 8 rows of data because we expanded it. The other 4 rows are hidden because they were filtered for not having any data yet.

Positioning the mouse cursor within the Excel data table, I’ll add a Total Row (option circled in red) in the table so that I can summarize data the way I want for each column as shown in Figure 6:

NPOI with Excel Table and dynamic Chart - Adding Total Row
Figure 6 - NPOI with Excel Table and dynamic Chart - Adding Total Row

With this Excel template spreadsheet we can now use NPOI to fill our sheet with more 4 rows of data. Let’s do it. This is the code I used:

HSSFRow row7 = sheet.CreateRow(6);

row7.CreateCell(1).SetCellValue("David Goliath");
row7.CreateCell(2).SetCellValue(7);
row7.CreateCell(3).SetCellValue(7);
row7.CreateCell(4).SetCellValue(7);
row7.CreateCell(5).SetCellValue(7);

HSSFRow row8 = sheet.CreateRow(7);

row8.CreateCell(2).SetCellValue("Moses of Egypt");
row8.CreateCell(3).SetCellValue(8);
row8.CreateCell(4).SetCellValue(8);
row8.CreateCell(5).SetCellValue(8);
row8.CreateCell(6).SetCellValue(8);

HSSFRow row9 = sheet.CreateRow(8);

row9.CreateCell(1).SetCellValue("David Shepherd");
row9.CreateCell(2).SetCellValue(9);
row9.CreateCell(3).SetCellValue(9);
row9.CreateCell(4).SetCellValue(9);
row9.CreateCell(5).SetCellValue(9);

HSSFRow row10 = sheet.CreateRow(9);

row10.CreateCell(2).SetCellValue("Jesus of Nazareth");
row10.CreateCell(3).SetCellValue(10);
row10.CreateCell(4).SetCellValue(10);
row10.CreateCell(5).SetCellValue(10);
row10.CreateCell(6).SetCellValue(10);
// Forcing formula recalculation so that the Total Row gets updated
sheet.ForceFormulaRecalculation = true;

After filling the spreadsheet we get the result shown in Figure 7:

NPOI with Excel Table and dynamic Chart - Chart updated automatically/dynamically
Figure 7 - NPOI with Excel Table and dynamic Chart - Chart updated automatically/dynamically

This is the workaround! :o)

The rows added with NPOI now are part of the table and are shown in the chart.

As a last hint: remember to expand your Excel data table to the number of rows you think your spreadsheet will store so that the rows added with NPOI get included in the table and the chart gets updated.

Again this is a good proof of what free software as is the case of NPOI can make for us. Even when dealing with more elaborated concepts as is the case of Excel tables and charts NPOI makes it easy to get the job done.

I wish that the next version of NPOI does what Zip wants automatically, that is, recognize rows added under the last row of an Excel table. At least we could have a parameter to let the user define if s/he wants the row to make part of the table or not.

Hope you enjoy this post.

Visual Studio 2008 C# ASP.NET MVC Web Application
You can get the Microsoft Visual Studio Project at:

http://leniel.googlepages.com/NPOIExcelTableChartMvcProject.zip

To try out the code you can use the free Microsoft Visual Web Developer 2008 Express Edition that you can get at: http://www.microsoft.com/express/vwd/Default.aspx

Creating Excel spreadsheets .XLS and .XLSX in C#

Interesting discussion at StackOverflow:
Create Excel (.XLS and .XLSX) file from C#

If you want to see how to combine NPOI + Excel Table and Chart,
take a look at the post titled NPOI with Excel Table and dynamic Chart.

NPOI 2.0 series of posts scheduled

Recently I had to implement some code to create an Excel spreadsheet/report using C#.

The task was: given an Excel spreadsheet template - a .XLS file (with formulas, pivot tables, macros, etc) I had to fill some data in one of the sheets of the spreadsheet and send this modified spreadsheet back to the user requesting such an operation (Excel report).

The following attests the need for Excel nowadays:

Excel has long been recognized as the de facto standard when it comes to presenting management reports. The unique combination of great calculation engine, excellent charting facilities, pivot tables and the possibility to perform “what if” analysis, make it the “must have” business intelligence tool.
by John Tunnicliffe

I had a great time while studying the possible ways of doing what the task asks for.

It appears to be a simple task at first but as the time passes by you get to know that this is not the case, well, till the moment this blog post was written at least, I think. :-)

Firstly I tried automating Excel through COM automation, but as the application was meant to be used in a web context it is not recommended to use automation. Why? Because COM automation for Excel is not thread safe, that is, EXCEL.EXE was not constructed to be used by concurrent users accessing the same process, in this case EXCEL.EXE; besides, Microsoft Excel must be installed on the server what is not always possible.

For more details on why you shouldn’t use Excel on the server, read this article on Microsoft Help and Support site: Considerations for server-side Automation of Office. The key part is this:

Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment.

I’ve just experienced the above. EXCEL.EXE insisted in being alive in task manager even after processing the data and being closed in code. Each call to process a spreadsheet opens an EXCEL.EXE process on the server. With such EXCEL.EXE processes don’t being closed as they should you get lots of those processes on memory which could overload the server.

Do not use COM automation if you are developing server-side code.

ExcelPackageAfter struggling with COM automation I finally got to know ExcelPackage  which works with Office Open Document Format (OOXML). It can read an .XLSX (Microsoft Excel 2007) template and create another .XLSX file based on such template, giving you lots of possibilities to work with the template copy.

I’ve gotten really happy because I had found a way of doing what the task was asking for but with a minor detail: ExcelPackage works only with .XLSX file format letting the .XLS (Microsoft Excel 2003) format out of the game. Well it turned out to be a big impediment because the client (software buyer) wouldn’t allow us to install the famous Microsoft Office Compatibility Pack for Word, Excel, and PowerPoint 2007 File Formats on user machines so that users could open and save OOXML file formats even using Microsoft Office 2003 suite.

Discovering ExcelPackage was good but it didn’t do the trick that implies the use of an .XLS template.

NPOII got back Googling again and strived to find an open source library that would allow me to do what I wanted. After some time I finally discovered NPOI.  Wow, it could read the .XLS template and generate the end result I wanted. Great. I downloaded NPOI 1.2.1 for .NET 2.0 binaries immediately and started playing with it to see what it could really do.

OK, after this short story, I’ll show you how to use both open source projects (ExcelPackage and NPOI).

I’ve created a new ASP.NET MVC project as can be seen in this picture:

 Excel Writer Solution Explorer

In the Content folder I’ve placed the template spreadsheets.
In the Libs folder I’ve placed the DLLs necessary to use both ExcelPackage and NPOI open source projects.

The controller that interests us is the one called ExcelWriterController:

Excel Writer Controller

The methods that handle the creation of the spreadsheet are: ExcelPackageCreate and NPOICreate.

For each controller action (method) there’s a corresponding view that renders the UI to the user. Those views are the ones shown inside the ExcelWriter folder: ExcelPackage.aspx and NPOI.aspx.

This is the Home Page of the Excel Writer MVC Application - take a look at the tabs (ExcelPackage and NPOI) that lead you to the View pages:

Excel Writer Home Page

Each view has a button which when clicked calls the corresponding action method on the ExcelWriterController.

This is the NPOI view page:

Excel Writer NPOI View Page

I’ll play with a simple spreadsheet I filled with the data I got from Excel’s blog post titled Formula to Access a List of Values Interspersed with Zeros or Blanks.

Let’s see the code that goes into the ExcelPackageCreate method:

/// <summary>
/// Creates a new Excel spreadsheet based on a template using the ExcelPackage library.
/// A new file is created on the server based on a template.
/// </summary>
/// <returns>Excel report</returns>
[AcceptVerbs(HttpVerbs.Post)]
public ActionResult ExcelPackageCreate()
{
    try
    {
        FileInfo template = new FileInfo(Server.MapPath(@"\Content\ExcelPackageTemplate.xlsx"));

        FileInfo newFile = new FileInfo(Server.MapPath(@"\Content\ExcelPackageNewFile.xlsx"));

        // Using the template to create the newFile...
        using(ExcelPackage excelPackage = new ExcelPackage(newFile, template))
        {
            // Getting the complete workbook...
            ExcelWorkbook myWorkbook = excelPackage.Workbook;

            // Getting the worksheet by its name...
            ExcelWorksheet myWorksheet = myWorkbook.Worksheets["Sheet1"];

            // Setting the value 77 at row 5 column 1...
            myWorksheet.Cell(5, 1).Value = 77.ToString();

            // Saving the change...
            excelPackage.Save();
        }

        TempData["Message"] = "Excel report created successfully!";

        return RedirectToAction("ExcelPackage");
    }
    catch(Exception ex)
    {
        TempData["Message"] = "Oops! Something went wrong.";

        return RedirectToAction("ExcelPackage");
    }
}

Let’s see the code that goes into the NPOICreate method:

/// <summary>
/// Creates a new Excel spreadsheet based on a template using the NPOI library.
/// The template is changed in memory and a copy of it is sent to
/// the user computer through a file stream.
/// </summary>
/// <returns>Excel report</returns>
[AcceptVerbs(HttpVerbs.Post)]
public ActionResult NPOICreate()
{
    try
    {
        // Opening the Excel template...
        FileStream fs =
            new FileStream(Server.MapPath(@"\Content\NPOITemplate.xls"), FileMode.Open, FileAccess.Read);

        // Getting the complete workbook...
        HSSFWorkbook templateWorkbook = new HSSFWorkbook(fs, true);

        // Getting the worksheet by its name...
        HSSFSheet sheet = templateWorkbook.GetSheet("Sheet1");

        // Getting the row... 0 is the first row.
        HSSFRow dataRow = sheet.GetRow(4);

        // Setting the value 77 at row 5 column 1
        dataRow.GetCell(0).SetCellValue(77);

        // Forcing formula recalculation...
        sheet.ForceFormulaRecalculation = true;

        MemoryStream ms = new MemoryStream();

        // Writing the workbook content to the FileStream...
        templateWorkbook.Write(ms);

        TempData["Message"] = "Excel report created successfully!";

        // Sending the server processed data back to the user computer...
        return File(ms.ToArray(), "application/vnd.ms-excel", "NPOINewFile.xls");
    }
    catch(Exception ex)
    {
        TempData["Message"] = "Oops! Something went wrong.";

        return RedirectToAction("NPOI");
    }
}

One drawback of the ExcelPackage library is that it must create a file on the server. There are some modifications to the library that enables you to create the template copy on memory and send it to the user as the NPOI library does. Take a look at the ExcelPackage’s discussion page at CodePlex and specifically this thread: Why create Excel spreadsheets on the server?

The great thing about NPOI of course is that it enables you to work with the template in code and then send a copy of the spreadsheet directly to the user. The template remains intact and the user receives a modified copy of the template which contains the data processed by the application.

With NPOI when you click on the Create Excel report button you get the download dialog window:

Excel Writer NPOI Download dialog

With ExcelPackage you’d have to get the path of the file created on the server, in this case \Content\ExcelPackageNewFile.xlsx and then send that file to the user. This is an extra step and adds an additional burden to the server. I didn’t implement it and so I let this as an exercise to you.

Well, the spreadsheet included in this simple project has only formulas but you can for sure have an Excel template with lots of formulas, pivot tables, macros, etc. This gives you the power of Excel in code in a clean fashion.

Hope this helps shed some light on this topic!

Note
Using the open source libraries presented in this post you won’t need Microsoft Excel installed on the server.

Updated on 9/22/2010

There’s now EPPlus that extends ExcelPackage.

EPPlus is a .net library that reads and writes Excel 2007 files using the Open Office XML format (XLSX).

EPPlus supports ranges, cell styling, charts, pictures, shapes, named ranges, autofilters and a lot of other stuff.

Updated on 6/2/2010

I’m working on an ASP.NET project that uses .NET Framework 1.1. In such a case I needed to use NPOI 1.2.1 for .NET 1.1.

The code you’ll use with NPOI 1.2.1 for .NET 1.1 is the same presented on this post. Just pay attention to include this using directive inside your C# code:

using NPOI.HSSF.UserModel

Updated on 1/3/2010

If you like NPOI, help spread the word about it voting up on this ad:

Help make NPOI even more Awesome!

Visual Studio 2008 C# ASP.NET MVC Web Application
You can get the Microsoft Visual Studio Project at:

http://sites.google.com/site/leniel/blog/ExcelWriterMvcProject.zip

To try out the code you can use the free Microsoft Visual Web Developer 2008 Express Edition that you can get at: http://www.microsoft.com/express/vwd/Default.aspx

References
ExcelPackage: Office Open XML Format file creation
http://excelpackage.codeplex.com/

ExcelPackage binaries download
http://excelpackage.codeplex.com/Release/ProjectReleases.aspx

NPOI
http://npoi.codeplex.com/

NPOI 1.2.1 for .NET 1.1 
http://npoi.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=33203

NPOI 1.2.1 for .NET 2.0
http://npoi.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=19351

NPOI samples
http://npoi.codeplex.com/Release/ProjectReleases.aspx?ReleaseId=19351#DownloadId=70100