GST RIFF parser module


Intro

Microsoft's RIFF format is based on Electronic Arts' IFF (the same
ancestor as Apple's AIFF), with modifications, most notably little-endian
byte order.  Duh.  What'd you expect?  Even so, it can barely be called a
format, as it's just a tagging method.

A RIFF file is built from chunks, the file itself being a single chunk
that contains a number of smaller chunks.  A chunk consists of a 4-byte
Chunk ID, a 4-byte little-endian Chunk Size (which doesn't include the ID
and size fields themselves), and a payload.  The main file chunk has an ID
of 'RIFF', a size that is 8 bytes (id+size) less than the file size,
followed by a 4-byte 'form' ID, then any number of chunks.
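
The chunk header described above can be decoded with a few lines of C.
This is only a sketch to make the layout concrete: ChunkHeader, read_le32()
and read_chunk_header() are made-up names for illustration and are not part
of libgstriff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type, not part of libgstriff. */
typedef struct {
  char     id[5];   /* four ID bytes plus a terminating NUL */
  uint32_t size;    /* payload size, excluding the 8 header bytes */
} ChunkHeader;

/* RIFF stores sizes little-endian; assemble the value byte by byte so
 * the result does not depend on the host's endianness. */
static uint32_t read_le32(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 8-byte header at the start of buf.
 * Returns 0 on success, -1 if fewer than 8 bytes are available. */
static int read_chunk_header(const uint8_t *buf, size_t len, ChunkHeader *out)
{
  assert(buf != NULL && out != NULL);
  if (len < 8)
    return -1;
  memcpy(out->id, buf, 4);
  out->id[4] = '\0';
  out->size = read_le32(buf + 4);
  return 0;
}
```

For the outermost chunk, id comes back as "RIFF" and size + 8 should equal
the file size.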

In the case of WAVE files, for instance, the file looks like this:

vvvv    vvvvvvvv                    vvvv			(verbatim)
RIFFsizeWAVEfmt sizewave format infodatasizewaveform.......
|   |   |   |   |   |   |   |   |   |   |   |
0   4   8   12  16  20  24  28  32  36  40  44
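
To make the offsets in the diagram concrete, here is a rough sketch that
walks the sub-chunks of a RIFF file held entirely in memory and records
where each chunk header starts.  list_chunk_offsets() is a hypothetical
helper, not a libgstriff call.  It also assumes the usual RIFF rule that an
odd-sized payload is followed by one pad byte, which the description above
doesn't go into.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value. */
static uint32_t le32(const uint8_t *p)
{
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Store the offset of each sub-chunk header in offsets[] (at most max
 * entries).  Returns the number of chunks seen, or -1 if the buffer
 * does not start with a RIFF header. */
static int list_chunk_offsets(const uint8_t *buf, size_t len,
                              size_t *offsets, int max)
{
  assert(buf != NULL && offsets != NULL);
  if (len < 12 || memcmp(buf, "RIFF", 4) != 0)
    return -1;

  size_t pos = 12;              /* skip 'RIFF', size and form ID */
  int n = 0;
  while (n < max && len - pos >= 8) {
    uint32_t size = le32(buf + pos + 4);
    offsets[n++] = pos;
    /* header + payload + pad byte for odd sizes (assumed, see above) */
    uint64_t step = 8 + (uint64_t)size + (size & 1u);
    if (step > len - pos)
      break;                    /* chunk runs past the data we have */
    pos += (size_t)step;
  }
  return n;
}
```

Fed a WAVE file shaped like the one in the diagram (16 bytes of format
info), it reports the fmt chunk at offset 12 and the data chunk at 36.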

Rather simple, but a pain to parse in a streaming manner.  This small
library should help with that, in the context of the GStreamer library.


Using libgstriff

All operations center around a GstRiff structure, which must be created
before anything can happen.  gst_riff_new() will return a pointer to a
valid GstRiff structure.

Since this library was designed for streaming, the most fundamental
operation is to feed the library a new buffer of type GstBuffer.  To do
this, call gst_riff_next_buffer(riff,buffer).  The library will attempt to
gather all the information it can about the RIFF structure from the
buffer.

Since RIFF chunk sizes are 32-bit values, a single chunk can run up to 4GB
(2^32 bytes), so you'll very likely not want to call into the RIFF library
for every buffer passing through your code.  To support this, the library
exports a variable containing the offset of the next expected chunk,
equivalent to the end of the current chunk.  This can be obtained with
gst_riff_get_nextlikely(riff).
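
The arithmetic behind such a "next likely" offset is simple enough to
sketch.  The function below is an illustration only, not the library's
implementation: next_likely_offset() is a made-up name, and rounding odd
payload sizes up by one pad byte is an assumption taken from the RIFF
format in general, not from this library.

```c
#include <stdint.h>

/* Offset at which the next chunk header is expected, given the offset
 * of the current chunk's header and its payload size.  The result is
 * 64-bit so that a chunk near the 4GB limit cannot wrap around. */
static uint64_t next_likely_offset(uint64_t chunk_offset, uint32_t chunk_size)
{
  return chunk_offset
         + 8                      /* 4-byte ID + 4-byte size */
         + (uint64_t)chunk_size   /* payload */
         + (chunk_size & 1u);     /* pad byte after odd payloads (assumed) */
}
```

For a fmt chunk at offset 12 with a 16-byte payload this gives 36, which is
where the data chunk starts in the WAVE layout shown earlier.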

As the parser finds chunks, it will add them to an internal list, which is
accessible via several library calls.  The entire list is available by
calling gst_riff_get_chunk_list(riff), which returns a GList * filled with
GstRiffChunk structures.  Each GstRiffChunk contains the id, offset, and
size of that chunk.  You can search for a specific chunk by name with
gst_riff_get_chunk(riff,"name"), where "name" is a Four Character Code
(fourcc) equivalent to the id.  To convert from fourcc to numeric id
(gulong) and back, use gst_riff_fourcc_to_id("name") and
gst_riff_id_to_fourcc(id).
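
For readers who haven't met fourccs before, the conversion is just byte
packing.  The sketch below shows one plausible way to do it; the function
names are deliberately not the gst_riff_* ones, and the byte order (first
character in the low byte, matching a little-endian read of the file) is an
assumption, not something this document specifies for the library.

```c
#include <stdint.h>

/* Pack the first four characters of fourcc into a 32-bit id, first
 * character in the lowest byte (assumed order, see above). */
static uint32_t fourcc_to_id(const char *fourcc)
{
  const unsigned char *c = (const unsigned char *)fourcc;
  return (uint32_t)c[0] | ((uint32_t)c[1] << 8) |
         ((uint32_t)c[2] << 16) | ((uint32_t)c[3] << 24);
}

/* Unpack an id into a NUL-terminated four character string. */
static void id_to_fourcc(uint32_t id, char out[5])
{
  for (int i = 0; i < 4; i++)
    out[i] = (char)((id >> (8 * i)) & 0xff);
  out[4] = '\0';
}
```

Note that chunk names shorter than four characters are padded with spaces,
so the WAVE format chunk is "fmt " rather than "fmt".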


Example usage

Consider the case of 'The Microsoft Sound.wav'.  Yeah, sorry, hives
breaking out everywhere, but it's a decent example, because it has at
least 3 chunks, not the normal two of a WAVE file.  The first is the fmt
chunk, then the data, then a 'DISP' chunk, and after that it seems to
meander into oblivion, though running strings on the file makes it obvious
there's a 'LIST' type there with lots of data about the sound (e.g. that
it was produced by Brian Eno).

You'd start by creating the parser, obviously.  At that point it's waiting
for a buffer that contains the info necessary to validate that it's a
RIFF, and get it started parsing the file.  That is done by calling
gst_riff_next_buffer() with the first buffer.  It will find the RIFF
header, the WAVE form, and start by parsing the fmt chunk.  If you passed
it a large enough buffer, it will also find and parse the data chunk.

At this point you can query the parser for the fmt and data chunks,
retrieving info necessary to configure the sound card and start sending
data to it.  You can also query the nextlikely pointer, and find that it's
way off in the distance (some 131KB).  At this point the parser won't
learn anything by being told of each subsequent buffer, so your code
should lay off.

When the buffer comes around that has the nextlikely offset lying in the
middle of it, you'll want to send that buffer to the parser so it can pick
up the next chunk.  At this point you can ask for the most recent chunk
with gst_riff_get_chunk_number(riff,0), and decide what to do next.  You
can determine how many chunks were found in each parser cycle from the
return value of gst_riff_next_buffer().
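
The "lay off until nextlikely comes around" decision boils down to a range
check on each buffer.  Here is a minimal sketch of that check, assuming the
caller tracks each buffer's starting offset and length in the stream;
buffer_contains() and first_buffer_to_parse() are hypothetical helpers, not
libgstriff calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when stream offset target lies inside the buffer covering
 * [buf_offset, buf_offset + buf_len). */
static bool buffer_contains(uint64_t buf_offset, uint64_t buf_len,
                            uint64_t target)
{
  return target >= buf_offset && target - buf_offset < buf_len;
}

/* For a stream cut into equal buf_len-byte buffers starting at offset 0,
 * return the index of the first buffer that has to go back to the
 * parser.  Every buffer before it can bypass the parser entirely. */
static uint64_t first_buffer_to_parse(uint64_t buf_len, uint64_t next_likely)
{
  assert(buf_len > 0);
  uint64_t index = 0;
  while (!buffer_contains(index * buf_len, buf_len, next_likely))
    index++;
  return index;
}
```

With 4096-byte buffers and a nextlikely offset of 131072 (numbers picked
purely for illustration), buffers 0 through 31 skip the parser and buffer
32 is the one to hand over.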
