Project Description

Texterize is a text and metadata extraction tool and library for quickly getting the text content of a file. It currently supports formats such as PDF, Excel, PowerPoint, Word, RTF, WordPerfect, MP3, Ogg, and all OpenDocument formats. Output is either text or XML. Texterize is designed to work with Unicode input and output, and the default output character set is UTF-8. It also has a recursive mode, so whole directories (or whole filesystems) can be converted to text. The recursion also descends into archive and compressed files such as zip, tar, and gz.
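
The recursive mode described above can be illustrated with a short sketch. This is not Texterize's code or interface; it is a minimal Python illustration of the idea, using only the standard library. The function names `iter_leaf_files` and `walk_tree` and the "outer!inner" path notation are assumptions made for this example. It walks a directory, descends through gz, tar, and zip layers, and yields the raw bytes of every file that is not itself an archive. Format-specific text extraction is left out, as is handling of corrupt or encrypted archives.

```python
import gzip
import io
import os
import tarfile
import zipfile
import zlib


def iter_leaf_files(name, data):
    """Yield (display_path, bytes) for each non-archive file inside `data`.

    Nested archive members are shown as "outer!inner" (a notation chosen
    for this sketch only).
    """
    # gzip layer: decompress and look again at what was inside.
    if data[:2] == b"\x1f\x8b":
        try:
            inner = gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            inner = None  # gzip magic but not valid gzip: fall through
        if inner is not None:
            yield from iter_leaf_files(name, inner)
            return

    # tar is checked before zip: a tar that merely contains a zip member
    # can otherwise be mistaken for a zip file.
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tf:
            members = [(m.name, tf.extractfile(m).read())
                       for m in tf.getmembers() if m.isfile()]
    except tarfile.TarError:
        members = None  # not an uncompressed tar archive
    if members is not None:
        for member_name, member_data in members:
            yield from iter_leaf_files(f"{name}!{member_name}", member_data)
        return

    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from iter_leaf_files(
                        f"{name}!{info.filename}", zf.read(info))
        return

    yield name, data  # not an archive: a leaf file


def walk_tree(root):
    """Yield (display_path, bytes) for every leaf file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as fh:
                yield from iter_leaf_files(path, fh.read())
```

One design point worth noting: OpenDocument and OOXML files are themselves zip containers, so this sketch would descend into them as generic archives. A real extraction tool would presumably need to recognize such document formats before falling back to plain archive recursion.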

System Requirements

System requirements are not defined.

Information regarding project releases and project resources. Note that the information here is quoted from the Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2009-10-05 18:13
0.1.3

Support was added for MS Write and the KOffice formats. Simple text extraction is supported from AmiPro (.sam files), OOXML, and dBase. Compiling now works with external objdir and glib versions 2.0, 2.2, 2.4, and 2.6. PDF support is now optional. Bugfixes were made to tarfile extraction.

2008-02-03 13:54
0.1.2

Tags: Major bugfixes
Many crashes found through fuzzing were fixed. Some major PDF bugs were fixed (including a font parser bug introduced in 0.1.1). The configure script was improved (no more forced CFLAGS).

Project Resources