Metadata-Version: 1.0
Name: hachoir-parser
Version: 1.0
Summary: Package of Hachoir parsers used to open binary files
Home-page: http://hachoir.org/wiki/hachoir-parser
Author: Hachoir team (see AUTHORS file)
Author-email: UNKNOWN
License: GNU GPL v2
Download-URL: http://hachoir.org/wiki/hachoir-parser
Description: hachoir-parser is a package of the most common file format parsers,
written for the Hachoir framework. Not all parsers are complete: some are very
good and others are poor (for example, they only parse the first level of the
tree).

A perfect parser has no "raw" field: with a perfect parser you are able to know
the meaning of *each* bit. Some good (but not perfect ;-)) parsers:

* Matroska video
* Microsoft RIFF (AVI video, WAV audio, CDA file)
* PNG picture
* TAR and ZIP archive

Website: http://hachoir.org/wiki/hachoir-parser

What's new in hachoir-parser 1.0?
=================================

Changes:

* OLE2: Support files larger than 6 MB (support many DIFAT blocks)
* OLE2: Add createContentSize() to guess content size
* LNK: Improve parser (now able to parse the whole file)
* EXE PE: Add more subsystem names
* PYC: Support Python 2.5c2
* Fix many spelling mistakes

Minor changes:

* PYC: Fix long integer parser (negative number), add (disabled) code to
  disassemble bytecode, use self.code_info to avoid replacing self.info
* OLE2: Add ".msi" file extension
* OLE2: Fix to support documents generated on Mac
* EXIF: set max IFD entry count to 1000 (instead of 200)
* EXIF: don't limit BYTE/UNDEFINED IFD entry count
* EXIF: add "User comment" tag
* GIF: fix image and screen description
* bzip2: catch decompressor error to be able to read trailing data
* Fix file extensions of AIFF
* Windows GUID uses new TimestampUUID60 field type
* RIFF: convert class constant names to upper case
* Fix RIFF: don't replace self.info method
* ISO9660: Write parser for terminator content

What's new in hachoir-parser 0.10?
==================================

New parsers:

* Microsoft Archive parser (.mar)
* Microsoft Windows animated icon (.ani): based on RIFF file format
* Microsoft's HTML Help (.chm)
* Windows Shortcut (.lnk)
* X11 Portable Compiled Font (pcf)
* Adobe Portable Document Format (PDF)

Major changes:

* Convert many constants to Unicode
* Set charset to ISO-8859-1 for many strings with no charset. Examples:
  filename in gzip, strings in ID3v1
* MIME type is now in Unicode
* Timestamps are stored as datetime.datetime() objects
* Add MAC48_Address and NIC24 parsers
* Add IEEE 24-bit Organizationally Unique Identifiers list

Changes:

* Disable QueryParser fallback feature
* QueryParser accepts "class" tag
* Split Parser into HachoirParser and Parser classes
* OLE2:

  * Rewrite most of the code using SeekableFieldSet
  * Support FAT block chain
  * Able to parse fragmented streams
  * Add parser for component object and document summary

* MKV: add method to convert date value to datetime.datetime() object
* OGG: validate() checks magic string
* Write PascalStringWin32 class
* Add Win32 LANGUAGE_ID dictionary
* Rewrite GUID class using RFC 4122:

  * Supports different GUID format versions
  * Able to read timestamp
  * Able to read network address

* iTunesDB: support sort index type and playlist
* BMP: move code to parse image data into a separate function, so code can be
  reused; fix magic regex (reserved may be non-nul)
* EXIF/TIFF: reject IFD entry with more than 300 values
* MPEG audio:

  * Frame.isValid() also checks sync field
  * Add getNbChannel() method
  * findSynchronizeBits() uses stronger validation to avoid false positives
  * validate() checks first field name and not just whether the stream starts
    with bytes "ID3"

* RIFF: text: truncate to nul byte and use ISO-8859-1 charset
* JPEG: reject invalid component id or quantization index (instead of using a
  warning message)
* JPEG: support all sorts of start of scan (especially progressive JPEG)
* JPEG: add magic string of JPEG starting with Adobe chunk
* Photoshop metadata: add parser for version information
* PNG: add method to get number of bits per pixel and don't format timestamp
  value
* PNG: support transparency color
* TTF: reject chunk with more than 300 names
* EXE: reject PE program with more than 50 sections
* EXE resource:

  * PE_Resource now uses SeekableFieldSet
  * Parse file flags
  * Read file subtype (for driver or font)
  * Reject header with more than 300 entries
  * Stop parser at depth 5
  * Write version information parser for NE program

Minor changes:

* GIF: replace image marker warning with a parser error
* IPTC: use charset UTF-8 and not ISO-8859-15
* CAB: validate() rejects files with more than 30 folders; fix misuse of
  seekBit()
* AU: fix end padding size

What's new in hachoir-parser 0.9?
=================================

New parsers:

* ACE, CAB, RAR, MOD, S3M, XM, PSD, Torrent, TTF, PDF, NE, MPEG TS

Changes:

* Add unique identifier and category to each parser
* Use tags to choose the right parser
* Create ParserList and QueryParser classes
* Support magic string as regex ('magic_regex')

Improved parsers:

* 7-zip: parse most headers, except start and signature headers
* ZIP: support files without file size, support 64-bit structures
* Ogg: support "video" chunk and add function to get last page

What's new in hachoir-parser 0.8.1?
===================================

New features:

* Rewrite setup.py: uses distutils by default (instead of setuptools),
  doesn't depend on hachoir-core
* ICO parser: fixes to support cursors
* Parser uses new HACHOIR_ERRORS constant

Bugfixes:

* gzip: fix magic string
* XCF: remove useless exceptions
* RIFF: fix fourcc handler (when fourcc is a string and not Unicode)
* FAT: catch ValueError when using string index() method
* ASF: don't create empty fields; validate() checks header minimum size
* EXE: validate() checks size_mod_512 in MSDOS header, add method to compute
  content size of MSDOS executable (not PE)

What's new in hachoir-parser 0.8?
=================================

New parsers:

* 7-zip archive
* Aldus Placeable Metafile (APM), variant of WMF
* Audio Interchange File Format (AIFF)
* Audio Interchange File Format Compressed (AIFC)
* Linux swap file
* LucasArts Font
* New Technology File System (NTFS)
* Microsoft Enhanced Metafile (EMF)
* Microsoft Windows Metafile (WMF)
* Musical Instrument Digital Interface (MIDI) audio file parser
* Real Audio (.ra)
* Real Media (.rm)
* Truevision Targa Graphic (TGA) picture

New features:

* Add method to compute real content size
* Add magic string to find file start
* Add method to get file extension (file name suffix)
* Add method to choose the best MIME type
* Much better file validation, sometimes using arbitrary limits to detect
  invalid files. Examples: 50 MB maximum SWF file size, 6000 pixels maximum
  GIF picture width, etc.

Changes:

* Lazy decompression for bzip2 and gzip parsers
* ZIP: add more MIME types and file extensions
* EXE: better PE detection
* Set constant names to upper case
* Always use a tuple for common file extensions
* Bitmap: add padding to pixels if needed, fix size of pixels field
* Tcpdump: display ARP layer info (if any) and reject file if link type is
  unknown

What's new in hachoir-parser 0.7?
=================================

New parsers:

* AMF metadata, used in Flash video
* Flash animation (SWF)
* Flash video (FLV)
* Java class
* Ogg/Vorbis (audio)
* Ogg/Theora (video)
* Reiser file system version 3

Important parser improvements:

* bzip2 and gzip parsers are able to decompress files
* JPEG picture:

  * Parse quantization table and restart interval
  * Write stronger validate method

* GIF picture: support image comment, graphic control and Netscape 2.0
  extension
* ID3v1: support ID3 version 1.1 and 1.1b (track number and genre)
* MPEG audio:

  * Better file validation (fewer false positives), don't allow padding
    between frames anymore
  * Fix computation of frame size: now works with MPEG version 2 and 2.5

* RIFF: parse AVI and ODML headers
* Tcpdump: add parser for Unicast (layer 2)

Other parser improvements:

* Photoshop metadata: fix header, "reserved" is a string, not four nul bytes
* Bitmap: support version 4
* PNG: add background color parser
* Sun/NeXT audio: add more codec descriptions
* Matroska video container: add ISO 639-2 language names
* EXT2 file system: use bits for file mode (instead of 16-bit integer)

Developer changes:

* Split run_testcase.py in three: download_testcase.py, run_testcase.py for
  hachoir-parser and run_testcase.py for hachoir-metadata
* Update for hachoir-core 0.7:

  * Use NullBits/NullBytes for nul padding
  * Rename _createDescription() to createDescription()
  * Rename _createValue() to createValue()
  * Create function parseStream() to parse a stream
  * Palette is now PaletteRGB and is based on UserVector class
  * New Parser class based on the simple Parser class from hachoir-core

What's new in hachoir-parser 0.6?
=================================

News of version 0.6.2:

* Fix Microsoft Office parser: misuse of new array() function
* Fix SECT.display attribute (convert integer to string)

News of version 0.6.1:

* Fix EXIF parser: SubFile import was missing

News of version 0.6:

* hachoir-parser is now a separate component, so it's easier to release new
  versions and write small bugfixes
* New parsers:

  * 3DO model (by Cyril Zorin)
  * Abstract Syntax Notation One (ASN.1)
  * MPEG video
  * Spider-Man video (by Mike Melanson)
  * Tcpdump: Ethernet, IPv4, ARP, ICMP, TCP, UDP
  * TIFF image
  * ZSNES save (by Jason Gorski)

* Better parsers:

  * MPEG audio: support padding between frames, better file validation, and
    guess whether the bit rate is constant (CBR) or variable (VBR)
  * Python PYC: rewritten from scratch, now supports Python 1.5 to 2.5
  * ID3v2: support picture in v2.3.0, safer charset code

* Many small bugfixes in ID3, MPEG audio and other parsers

Since hachoir-core 0.6 is able to "autofix" more bugs, hachoir-parser 0.6 is
even more robust.
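Several MPEG audio entries above (frame size computation for MPEG version 2 and 2.5, guessing CBR or VBR) depend on the standard frame-length formula from the MPEG audio header. The following is a minimal Python sketch, not hachoir-parser's own code: `frame_length()` and `guess_bitrate_mode()` are hypothetical helpers, and the CBR/VBR guess simply checks whether every sampled frame has the same bitrate, which may differ from the heuristic hachoir actually uses.

```python
def frame_length(version, layer, bitrate, sample_rate, padding):
    """Return the length in bytes of one MPEG audio frame, header included.

    version: 1 for MPEG-1, 2 for MPEG-2, 2.5 for MPEG-2.5
    layer: 1, 2 or 3
    bitrate: bits per second (e.g. 128000)
    sample_rate: samples per second (e.g. 44100)
    padding: padding bit from the frame header (0 or 1)
    """
    if layer == 1:
        # Layer I: 384 samples per frame, padding slot is 4 bytes
        return (12 * bitrate // sample_rate + padding) * 4
    if layer == 3 and version != 1:
        # MPEG-2/2.5 Layer III: 576 samples per frame, 576 / 8 = 72
        return 72 * bitrate // sample_rate + padding
    # Layer II, and Layer III of MPEG-1: 1152 samples per frame, 1152 / 8 = 144
    return 144 * bitrate // sample_rate + padding


def guess_bitrate_mode(bitrates):
    """Hypothetical heuristic: CBR if all sampled frames share one bitrate."""
    if not bitrates:
        raise ValueError("need at least one frame")
    return "CBR" if len(set(bitrates)) == 1 else "VBR"
```

For example, `frame_length(1, 3, 128000, 44100, 0)` gives 417 bytes (418 with the padding bit set). Knowing the frame length lets a parser jump straight to the next header and check its sync bits, which is one way to reduce false positives.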
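The "safer charset code" for ID3v2 concerns the encoding byte that starts each ID3v2 text frame. A minimal Python sketch, assuming the encoding values defined by the ID3v2 specifications (0 = ISO-8859-1 and 1 = UTF-16 with BOM in v2.3; v2.4 adds 2 = UTF-16BE and 3 = UTF-8). `decode_text_frame()` is a hypothetical helper, not part of hachoir-parser; it rejects unknown encoding bytes instead of guessing.

```python
# Encoding byte -> Python codec name (ID3v2.3 defines 0 and 1; ID3v2.4 adds 2 and 3)
ID3V2_ENCODINGS = {
    0: "iso-8859-1",
    1: "utf-16",      # Python's utf-16 codec reads and consumes the BOM
    2: "utf-16-be",
    3: "utf-8",
}


def decode_text_frame(payload: bytes) -> str:
    """Decode the body of an ID3v2 text frame (encoding byte + text)."""
    if not payload:
        raise ValueError("empty text frame")
    encoding = ID3V2_ENCODINGS.get(payload[0])
    if encoding is None:
        # Reject unknown encoding bytes rather than guessing a charset
        raise ValueError("unknown text encoding byte: %d" % payload[0])
    # Invalid byte sequences raise UnicodeDecodeError (a ValueError subclass);
    # trailing nul terminators are stripped after decoding
    return payload[1:].decode(encoding).rstrip("\x00")
```

Raising on bad input keeps the caller in control: a parser can catch `ValueError` and fall back to exposing the frame as raw bytes.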
Parser list
===========

Archive
-------

* 7zip: Compressed archive in 7z format
* ace: ACE archive
* bzip2: bzip2 archive
* cab: Microsoft Cabinet archive
* gzip: gzip archive
* mar: Microsoft Archive
* rar: Roshal archive (RAR)
* rpm: RPM package
* tar: TAR archive
* unix_archive: Unix archive
* zip: ZIP archive

Audio
-----

* aiff: Audio Interchange File Format (AIFF)
* fasttracker2: FastTracker2 module
* itunesdb: iPod iTunesDB file
* midi: MIDI audio
* mod: Uncompressed amiga module
* mpeg_audio: MPEG audio version 1, 2, 2.5
* ptm: PolyTracker module (v1.17)
* real_audio: Real audio (.ra)
* s3m: ScreamTracker3 module
* sun_next_snd: Sun/NeXT audio

Container
---------

* asn1: Abstract Syntax Notation One (ASN.1)
* matroska: Matroska multimedia container
* ogg: Ogg multimedia container
* ogg_stream: Ogg logical stream
* real_media: RealMedia (rm) Container File
* riff: Microsoft RIFF container
* swf: Macromedia Flash data

File System
-----------

* ext2: EXT2/EXT3 file system
* fat12: FAT12 filesystem
* fat16: FAT16 filesystem
* fat32: FAT32 filesystem
* iso9660: ISO 9660 file system
* linux_swap: Linux swap file
* msdos_harddrive: MS-DOS hard drive with Master Boot Record (MBR)
* ntfs: NTFS file system
* reiserfs: ReiserFS file system

Game
----

* lucasarts_font: LucasArts Font
* spiderman_video: The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV video
* zsnes: ZSNES Save State File (only version 143)

Image
-----

* bmp: Microsoft bitmap (BMP) picture
* gif: GIF picture
* ico: Microsoft Windows icon or cursor
* jpeg: JPEG picture
* pcx: PC Paintbrush (PCX) picture
* png: Portable Network Graphics (PNG) picture
* psd: Photoshop (PSD) picture
* targa: Truevision Targa Graphic (TGA)
* tiff: TIFF picture
* wmf: Microsoft Windows Metafile (WMF)
* xcf: Gimp (XCF) picture

Misc
----

* 3do: renderdroid 3d model.
* 3ds: 3D Studio Max model
* chm: Microsoft's HTML Help (.chm)
* lnk: Windows Shortcut (.lnk)
* ole2: Microsoft Office document
* pcf: X11 Portable Compiled Font (pcf)
* pdf: Portable Document Format (PDF) document
* tcpdump: Tcpdump file (network)
* torrent: Torrent metainfo file
* ttf: TrueType font

Program
-------

* elf: ELF Unix/BSD program/library
* exe: Microsoft Windows Portable Executable
* java_class: Compiled Java class
* python: Compiled Python script (.pyc/.pyo files)

Video
-----

* asf: Advanced Streaming Format (ASF), used for WMV (video) and WMA (audio)
* flv: Macromedia Flash video
* mov: Apple QuickTime movie
* mpeg_ts: MPEG-2 Transport Stream
* mpeg_video: MPEG video, version 1 or 2

Total: 70 parsers
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console :: Curses
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: English
Classifier: Programming Language :: Python