Reading Word files in Java with Apache POI

Apache POI is an excellent library for reading and manipulating MS Office files.

Today we'll see how to read Word files (DOC and DOCX).

I mention both formats because they require different parts of the library and different classes.

If you use Maven, add these dependencies to your pom.xml:

        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi</artifactId>
            <version>3.17</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>3.17</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>3.17</version>
        </dependency>

After building the project, run the program with the example code below:

import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class Main {

    public static void main(String[] args) {
        try {
            // .doc (legacy binary format): HWPF classes, shipped in poi-scratchpad
            try (FileInputStream in = new FileInputStream("test.doc");
                 HWPFDocument doc = new HWPFDocument(in)) {
                WordExtractor wex = new WordExtractor(doc);
                System.out.println(wex.getText());
            }

            // .docx (OOXML format): XWPF classes, shipped in poi-ooxml
            try (FileInputStream in = new FileInputStream("test.docx");
                 XWPFDocument docx = new XWPFDocument(in)) {
                XWPFWordExtractor we = new XWPFWordExtractor(docx);
                System.out.println(we.getText());
            }
        } catch (IOException ex) {
            // Covers both a missing file and a file that cannot be read
            System.out.println(ex.getMessage());
        }
    }

}

As you can see, we used HWPFDocument for DOC files and XWPFDocument for DOCX files.

This basic code simply reads the specified file and prints its text content.
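Since the two formats need different classes, a program that receives an arbitrary file has to decide which one to use. Here is a minimal sketch of one way to do that, using only the standard library and looking at the first bytes of the file: DOC files start with the OLE2 compound file signature, while DOCX files are ZIP archives. The class name WordFormatSniffer and the Format enum are made up for this example and are not part of Apache POI. Keep in mind the check is coarse: other OLE2 files (such as XLS) and any ZIP archive would match as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI
class WordFormatSniffer {

    enum Format { DOC, DOCX, UNKNOWN }

    // OLE2 compound file signature (legacy .doc, but also .xls and .ppt)
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature (.docx, but also any other ZIP archive)
    private static final byte[] ZIP = { 0x50, 0x4B, 0x03, 0x04 };

    // Classifies a file from its leading bytes
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2)) return Format.DOC;
        if (startsWith(header, ZIP)) return Format.DOCX;
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the file and classifies them
    static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[8];
            int n = 0;
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) break; // file shorter than 8 bytes
                n += r;
            }
            return detect(Arrays.copyOf(header, n));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Prints the detected format for each path given on the command line
        for (String arg : args) {
            System.out.println(arg + " -> " + detect(Paths.get(arg)));
        }
    }
}
```

With the result in hand you could pick HWPFDocument for Format.DOC and XWPFDocument for Format.DOCX, and report an error for anything else.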

If you want to extract metadata and similar information, Apache Tika is the recommended tool (covered in an earlier post).

Enjoy!

