jxls - Reading XLS files

Overview

jXLS provides jxls-reader module to read XLS files and populate Java beans with spreadsheet data.

Usage Details

To use jXLS to parse excel files and populate your Java objects with read data you have to construct XLSReader object first. The easiest way to do this is to use a special XML configuration file. We will use departmentdata.xls file with department data to demonstrate this method

Construction of XLSReader using XML config file

Mapping between spreadsheet cells and Java objects is configured in XML file. Mapping file structure is rather straightforward. Let's take a look at xml mapping file for 'Sheet1' of our departmentdata.xls sample file

                    <?xml version="1.0" encoding="ISO-8859-1"?>
                    <workbook>
                        <worksheet name="Sheet1">
                            <section startRow="0" endRow="6">
                                <mapping cell="B1">department.name</mapping>
                                <mapping cell="A4">department.chief.name</mapping>
                                <mapping cell="B4">department.chief.age</mapping>
                                <mapping cell="D4">department.chief.payment</mapping>
                                <mapping row="3" col="4">department.chief.bonus</mapping>
                            </section>
                            <loop startRow="7" endRow="7" items="department.staff" var="employee" varType="net.sf.jxls.reader.sample.Employee">
                                <section startRow="7" endRow="7">
                                    <mapping row="7" col="0">employee.name</mapping>
                                    <mapping row="7" col="1">employee.age</mapping>
                                    <mapping row="7" col="3">employee.payment</mapping>
                                    <mapping row="7" col="4">employee.bonus</mapping>
                                </section>
                                <loopbreakcondition>
                                    <rowcheck offset="0">
                                        <cellcheck offset="0">Employee Payment Totals:</cellcheck>
                                    </rowcheck>
                                </loopbreakcondition>
                            </loop>
                        </worksheet>
                    </workbook>

As we can see the root element of xml file is workbook and it can contain any number of child worksheet elements. worksheet tag should contain name attribute indicating the name of excel worksheet which it describes (Sheet1 in our case).

worksheet element can contain any number of section and loop child elements.

section element represents a simple block of spreadsheet cells. The first and the last rows of the block are specified with startRow and endRow attributes

In the current version you have to specify sections for the whole excel sheet so that it is completely broken down into sections. It means that even if you are not going to read for example the first rows of the sheet you need to create an empty section (a section without mappings but with startRow/endRow attributes) so that these rows are reflected in XML file. The unnecessary rows will be skipped and all other sections will be read as required.

Mapping of XLS cells onto Java bean properties is defined using mapping tag which looks like following

                        <mapping cell="B1">department.name</mapping>

You also can use cell attribute to specify mapped cell and the body of the tag for a full property name to populate from this cell. By full property name we mean bean name concatenated with property name like department.name or department.chief.payment . Another option to specify mapped cell is to use cell row and column numbers (zero-based)

                    <mapping row="3" col="4">department.chief.bonus</mapping>

This defines mapping for E4 cell and maps it to department.chief.bonus property.

loop element defines loop (repetitive) block of excel rows. It should contain startRow and endRow attributes to specify start and end row of this repetitive block. items attribute names collection which should be populated with loop block data as it is known in our bean context map. var attribute specifies how to refer to each collection item during iteration in the inner sections. varType attribute defines full Java class name for collection item.

                    <loop startRow="7" endRow="7" items="department.staff" var="employee" varType="net.sf.jxls.reader.sample.Employee">

loop element can contain any number of inner section and loop elements and have to contain loopbreakcondition definition. This describes break condition to stop loop iteration. In our sample it is as simple as specifying that next row after employees data must contain "Employee Payment Totals:" string in the first cell

                    <loopbreakcondition>
                        <rowcheck offset="0">
                            <cellcheck offset="0">Employee Payment Totals:</cellcheck>
                        </rowcheck>
                    </loopbreakcondition>

This is all you need to know to create XML mapping configuration file. Next is a simple sample of code which uses ReaderBuilder class to apply XML mapping file to departmentdata.xls to construct XLSReader class and read XLS data populating corresponding Java beans with XLS data

                    InputStream inputXML = new BufferedInputStream(getClass().getResourceAsStream(xmlConfig));
                    XLSReader mainReader = ReaderBuilder.buildFromXML( inputXML );
                    InputStream inputXLS = new BufferedInputStream(getClass().getResourceAsStream(dataXLS));
                    Department department = new Department();
                    Department hrDepartment = new Department();
                    List departments = new ArrayList();
                    Map beans = new HashMap();
                    beans.put("department", department);
                    beans.put("hrDepartment", hrDepartment);
                    beans.put("departments", departments);
                    XLSReadStatus readStatus = mainReader.read( inputXLS, beans);

Sheet mapping by index

As of version 1.0.2 jxls-reader supports mapping of sheets by index. This can be convenient in case you do not know the names of the sheets. In this case the mapping file will look like this

                <?xml version="1.0" encoding="ISO-8859-1"?>
                <workbook>
                <worksheet idx="0">
                <section startRow="0" endRow="6">
                <mapping cell="B1">department.name</mapping>
                <mapping cell="A4">department.chief.name</mapping>
                <mapping cell="B4">department.chief.age</mapping>
                <mapping cell="D4">department.chief.payment</mapping>
                <mapping row="3" col="4">department.chief.bonus</mapping>
                </section>
                <loop startRow="7" endRow="7" items="department.staff" var="employee" varType="net.sf.jxls.reader.sample.Employee">
                <section startRow="7" endRow="7">
                <mapping row="7" col="0">employee.name</mapping>
                <mapping row="7" col="1">employee.age</mapping>
                <mapping row="7" col="3">employee.payment</mapping>
                <mapping row="7" col="4">employee.bonus</mapping>
                </section>
                <loopbreakcondition>
                <rowcheck offset="0">
                <cellcheck offset="0">Employee Payment Totals:</cellcheck>
                </rowcheck>
                </loopbreakcondition>
                </loop>
                </worksheet>
                </workbook>

So instead of using name attribute for worksheet tag we now can use idx attribute to specify sheet index. The sheet index is zero based.

Error Processing

By default jXLS throws XLSDataReadException if it fails to read some cell value and stops further processing.

You can override this behaviour and allow to skip errors and continue processing with setSkipErrors(true) method of ReaderConfig class. You get an instance of ReaderConfig class using ReaderConfig.getInstance() method.

                ReaderConfig.getInstance().setSkipErrors( true );

All the error messages are stored in XLSReadStatus object which is returned from read method. XLSDataReadException also contains a reference to this object. You can analyse XLSReadMessage objects after getting them from XLSReadStatus using getReadMessages() method.

Conversion Mechanism

jXLS integrates with Jakarta BeanUtils conversion utilities to perform actual conversion from Excel cell values into bean properties. See more about this in ConvertUtils. jXLS uses standard converters for primitive types provided by org.apache.commons.beanutils.converters package. For java.util.Date type we provide a custom DateConverter which uses POI utility methods to convert from Excel date representation into Date class.

BeanUtils Converters for primitive types return a default value when a conversion error occurs. jXLS overrides this behaviour in ReaderConfig class registering these classes to throw a ConversionException. You can use setUseDefaultValuesForPrimitiveTypes(true) method of ReaderConfig class if you prefer these classes to use a default value. For example

                ReaderConfig.getInstance().setUseDefaultValuesForPrimitiveTypes( true );

To define your own converters from an excel cell into your own property type you should implement Converter interface and register your converter with ConvertUtils class.

Overview

Reference

Samples

Project Documentation

Overview

Usage Details

Construction of XLSReader using XML config file

Sheet mapping by index

Error Processing

Conversion Mechanism