[Clam-devel] Re: Attacking your core goal
David García Garzón
dgarcia at iua.upf.edu
Mon Jul 14 03:35:27 PDT 2008
I'm moving the discussion to the list. Context: Jun is planning the steps up
to the final term. Topics: Aggregator Python scripts, MusicBrainz, Qt
interfaces.
On Friday 11 July 2008, JunJun wrote:
> David,
> Hi,
> The aggregator.py need some script as input. I failed in finding such a
> script as a reference. What is that script like?
Just take a look at AggregatorTest.py for examples.
It is a pity that the class is not fully covered by tests, because an example
of a complete execution would also be very clarifying.
A script consists of several lines, each specifying a command (always 'copy'),
the index of the origin XML file (you may aggregate several sources), the
scope and name of the descriptor you want to take from the origin, and the
scope and name of the descriptor you want to create on the target.
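As a rough sketch of that format (not the actual Aggregator code), a script
could be parsed along these lines. The whitespace-separated fields, the
Scope::Name notation, and the names CopyCommand, split_descriptor and
parse_script are all assumptions made for illustration; AggregatorTest.py
remains the reference for the real syntax.

```python
from typing import List, NamedTuple, Tuple


class CopyCommand(NamedTuple):
    """One line of an aggregation script (hypothetical representation)."""
    source_index: int   # which origin XML file to read from
    source_scope: str
    source_name: str
    target_scope: str
    target_name: str


def split_descriptor(token: str) -> Tuple[str, str]:
    """Split an assumed 'Scope::Name' token into (scope, name)."""
    scope, separator, name = token.partition("::")
    if not separator or not scope or not name:
        raise ValueError("expected Scope::Name, got %r" % token)
    return scope, name


def parse_script(text: str) -> List[CopyCommand]:
    """Parse lines of the assumed form 'copy <index> <source> <target>'."""
    commands = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4 or fields[0] != "copy":
            raise ValueError(
                "line %d: expected 'copy <index> <source> <target>'" % line_number
            )
        source_scope, source_name = split_descriptor(fields[2])
        target_scope, target_name = split_descriptor(fields[3])
        commands.append(
            CopyCommand(int(fields[1]), source_scope, source_name,
                        target_scope, target_name)
        )
    return commands


# Made-up example: take Artist from source 1 unchanged, and rename
# Frame::Energy from source 2 to Frame::Loudness on the target.
example = """
copy 1 Song::Artist Song::Artist
copy 2 Frame::Energy Frame::Loudness
"""
commands = parse_script(example)
```

Keeping the parsed commands as plain tuples makes it easy to test the script
handling separately from the XML copying itself.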
> I planned several milestones of the core target. For the clarity, I just
> take concrete instances in the steps:
>
> milestone-1:
> build a new extractor, who calls a public web service, say musicBrainz.
> Below is an example result that is fetched from musicBrainz:
> <?xml version="1.0" encoding="UTF-8"?>
> <metadata xmlns="http://musicbrainz.org/ns/mmd-1.0#"
>           xmlns:ext="http://example.org/ext-9.1#">
>   <track id="d6118046-407d-4e06-a1ba-49c399a4c42f">
>     <title>Silent All These Years</title>
>     <duration>253466</duration>
>     <ext:annotation>This is a <em>very</em> nice song.</ext:annotation>
>   </track>
> </metadata>
It seems like you have given MusicBrainz a lot of weight, yet it is very much
a side goal, unless you have a reason I am missing. You already have a lot of
extractors to play with in order to get the aggregator working. Indeed, you
could even use the aggregator to disaggregate existing ones and do the tests
by joining them back. So my advice is to go for the aggregation itself and the
interface to control it. Once we have the core project done, adding
MusicBrainz would be a very good closure for your project.
In summary, MusicBrainz is interesting, and it is not that hard to get, but
let's give more priority to the core.
> (double-quick Todo: figure out what the python-musicbrainz2 is about.)
It is an abstraction of the web service, so I guess that it hides in some way
the XML communication with the server. I also hope it deals with the
fingerprint computation.
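Whenever you get to it, the sample response you quoted can be read with the
Python standard library alone, so nothing here assumes anything about the
python-musicbrainz2 API. This is only a sketch: read_track and the dictionary
keys are names made up for illustration, and the XML is the example from your
mail, not a live query.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample response; 'mmd' is just a local prefix.
MMD_NS = {"mmd": "http://musicbrainz.org/ns/mmd-1.0#"}

SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<metadata xmlns="http://musicbrainz.org/ns/mmd-1.0#"
          xmlns:ext="http://example.org/ext-9.1#">
  <track id="d6118046-407d-4e06-a1ba-49c399a4c42f">
    <title>Silent All These Years</title>
    <duration>253466</duration>
    <ext:annotation>This is a <em>very</em> nice song.</ext:annotation>
  </track>
</metadata>
"""


def read_track(xml_bytes):
    """Return id, title and duration of the first track, or None if absent."""
    root = ET.fromstring(xml_bytes)
    track = root.find("mmd:track", MMD_NS)
    if track is None:
        return None
    duration_text = track.findtext("mmd:duration", namespaces=MMD_NS)
    return {
        "id": track.get("id"),
        "title": track.findtext("mmd:title", namespaces=MMD_NS),
        # Kept as the integer given in the response; None if the tag is missing.
        "duration": int(duration_text) if duration_text is not None else None,
    }


track = read_track(SAMPLE_RESPONSE)
```

The resulting dictionary would be the input for your milestone-3, where the
values get wrapped as a CLAM pool according to the schema.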
> milestone-2:
> A related mapping schema, e.g. MusicBrainzDescriptors.sc.
>
> milestone-3:
> wrap the result of the extractor as CLAM pool xml, according to the schema
> above.
>
> milestone-4:
> Here the files in ./scripts should be taken advantage of:
> Aggregate the new schema with an existing schema. For instance,
> MusicBrainzDescriptors.sc will be aggregated with CLAMDescriptors.sc
> Aggregate the new pool with the existing pool.
> Test on the annotator.
>
> (The below is pending)
> A graphical interface to build merging script.
> There is no training session about the graphical interface, right?
Yes, and that's another reason to deal with it now, while I am still around.
In any case, most people in the CLAM devel community, and most GSoC students,
are proficient in Qt, so you will be able to get help and advice from them.
Just another reason to get on IRC and post to the mailing list ;-)
So, regarding the planning, I would suggest the following:
First of all, I would create a new extractor that does the aggregation of some
fixed sources. Such a script will obey the same command line that existing
extractors use [1] and will spot any pending problems (or not).
Having such an extractor will let us figure out which information varies when
changing the sources, and how to feed that variable information from the
Annotator to the script (extra parameters, config files...). For now it can be
constant in the Annotator while already being an input parameter for the
script.
Then, once we know which information changes between different aggregations,
let's design an interface to configure them instead of the current 'Extractor'
field of the Project, and also provide a means to store such configuration in
the project.
When storing, we are writing into an aggregated pool, but the original pools
won't be written, so we need a write-back path that currently doesn't exist.
This will cover the minimum core part. Then we should stop again and
prioritize the following aspects. Given the current timing, I will be very
happy if you cover just 2 or 3 of them, and happier if you end up doing more :-)
- Adding extractors (i.e. MusicBrainz, I like the steps you wrote for it)
- Doing a second iteration on the configuration interface (just having it
  working is not the best, sure)
- Writing an upload script as an example (MusicBrainz? Boca?)
- Being able to configure parameters on the extractors with a configuration
  file
- A per-descriptor read-only flag in order to control which descriptors can be
  modified (avoiding the write-back if not supported).
- A per-descriptor modified flag in order to control which descriptors must be
  saved (avoiding the write-back if not needed).
- Addressing building a description from a blank sheet, i.e. what to do when
  MusicBrainz has not found the song, or when you don't have an extractor for
  it and want to generate it by hand.
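To make the two flags more concrete, here is a toy sketch of how they could
drive the write-back. It is not CLAM code: AggregatedPool, its methods and the
descriptor names are invented for illustration, and the real pools are of
course not Python dictionaries.

```python
from collections import defaultdict


class AggregatedPool:
    """Toy aggregated pool tracking per-descriptor flags (illustration only)."""

    def __init__(self):
        self._values = {}
        self._source = {}        # descriptor name -> index of the origin pool
        self._read_only = set()  # descriptors whose origin cannot be written
        self._modified = set()   # descriptors edited since loading

    def load(self, name, value, source_index, read_only=False):
        """Register a descriptor copied from one of the origin pools."""
        self._values[name] = value
        self._source[name] = source_index
        if read_only:
            self._read_only.add(name)

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        """Edit a descriptor, refusing read-only ones."""
        if name in self._read_only:
            raise PermissionError("%s is read-only" % name)
        if self._values[name] != value:
            self._values[name] = value
            self._modified.add(name)

    def pending_write_back(self):
        """Group modified descriptors by the origin pool that must be saved."""
        pending = defaultdict(list)
        for name in sorted(self._modified):
            pending[self._source[name]].append(name)
        return dict(pending)


pool = AggregatedPool()
pool.load("Song::Artist", "unknown", source_index=0)
pool.load("Song::Danceability", 0.7, source_index=1, read_only=True)
pool.set("Song::Artist", "Some Artist")
pending = pool.pending_write_back()  # only source 0 needs saving
```

Sources that do not appear in the result need no write-back at all, which is
the point of the modified flag.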
> One more question: the Project.GetExtractor(), where is it?
Project is a DynamicType. I thought that we had seen them before; if not, you
have been lucky to work with CLAM code for two months without having to deal
with them :-) They are just a kind of Component that may or may not have a
given attribute. Attributes are declared with macros that conveniently expand
into code for getters, setters, an interface to add and remove them, XML
storage... 'Extractor' is an attribute and 'GetExtractor' is the generated
getter.
BTW, I read in your blog that you got Sebastian's danceability algorithm
working. Congratulations. Of course we would like it in CLAM :-) The
algorithm gives greater numbers for less danceable excerpts, so it is not a
bug in your code that it is inverted. Really confusing, I agree.
Regards.
--
David García Garzón
(Work) dgarcia at iua dot upf anotherdot es
http://www.iua.upf.edu/~dgarcia