使用TextMiniing和Apache POI获得Word文件内容，…

2008-02-23 09:34:41来源：互联网阅读 ()

/*
* Created on 2005/07/18
* 使用tm-extractors-0.4.jar
*/
package com.nova.colimas.common.doc;
import Java.io.FileInputStream;
import java.io.FileOutputStream;
import org.textmining.text.extraction.WordExtractor;

/**
* Deal with ms-word 2000/xp files.
* @author tyrone
*
*/
public class WordProcess extends DocProcess {
public static String run(String filename){
WordExtractor extractor=null;
String text=null;
try{
FileInputStream in = new FileInputStream (filename);
extractor = new WordExtractor();
text=extractor.extractText(in);
}catch(Exception ex){
//log
return null;
}
return text;
}
public static void main(String[] args){
try{
FileOutputStream out=new FileOutputStream("result.txt");
out.write(WordProcess.run(args[0]).getBytes());
out.flush();
out.close();
}catch(Exception ex){
System.out.println(ex.toString());
}
}
}

上一篇： JBoss 文档（三）——JBoss和JMS
下一篇： JNI完全手册 (收藏)

标签：

版权申明：本站文章部分自网络，如有侵权，请联系：west999com@outlook.com
特别注意：本站所有转载文章言论不代表本站观点，本站所提供的摄影照片，插画，设计作品，如需使用，请与原作者联系，版权归原作者所有