MetaDirectory: Creating searchable meta directory structures from XML metadata

So what can you do with your movie XML metadata?

I've had this idea for some time, and now that I've started studying XSLT I needed something concrete to practice its features on.

The idea is that when files have some kind of XML metadata, I can automatically create multiple directory structures that all lead to the same file.

For instance, if the metadata has a list of actors, I could go to the directory Actors, open any actor's directory there, and it would show all the movies that actor has made. I could go further into a Year subdirectory and select only the movies that actor made in a particular year.

And it should be highly configurable.
XSLT solves that problem. It isn't the easiest of languages but it's very powerful for this kind of use.

This script creates directory structures and places shortcuts to the original file at the end of the directory tree. The original file can be of any type as long as it has an accompanying XML metadata file. If your document is "My document.doc" then the XML file should be named "My document.doc.xml".
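The naming rule can be sketched in a few lines of Python. This is not the code from MetaDirectory.py, only a minimal illustration. The function name find_documents_with_metadata is made up for this example, and it assumes the metadata file sits next to the document.

```python
from pathlib import Path


def find_documents_with_metadata(source_dir):
    """Return (document, metadata) pairs where '<document>.xml' sits next to the document."""
    pairs = []
    for xml_path in sorted(Path(source_dir).rglob("*.xml")):
        # "My document.doc.xml" describes "My document.doc": drop the final ".xml".
        document = xml_path.with_suffix("")
        if document.is_file():
            pairs.append((document, xml_path))
    return pairs
```

An XML file without a matching document is skipped, so stray XML files in the source tree do no harm.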

These examples use only movie files, and the XML schema is from ImdbPY. See my earlier blog post on fetching these XML files from Imdb.

The script only creates new directories; it never deletes them. If your metadata changes, the only way to properly update the directory structure is to rebuild it from scratch. And of course you have to run the script again after the metadata changes.

The script has four parameters:
  1. TargetDir        Directory where your metadata directory structure is written
  2. XSLT File        Name of the XSLT file used for formatting the directory structure
  3. SourceDir        Source directory for the document files
  4. XML Static file  Static XML file that is used with the media XML file for XSLT formatting
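
As a rough sketch, the four parameters could be read like this in Python. The real script may handle its arguments differently. The argument names target_dir, xslt_file, source_dir and static_xml are my own labels for this example, and only the order is taken from the list above.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional parameters in the order listed above."""
    parser = argparse.ArgumentParser(
        description="Create a meta directory structure from XML metadata")
    parser.add_argument("target_dir", help="where the meta directory structure is written")
    parser.add_argument("xslt_file", help="XSLT file that formats the directory structure")
    parser.add_argument("source_dir", help="source directory for the document files")
    parser.add_argument("static_xml", help="static XML file used with the media XML file")
    return parser.parse_args(argv)
```

Calling parse_args(["Meta", "MovieDB.xslt", "Movies", "StaticData.XML"]) gives an object whose target_dir is "Meta" and whose static_xml is "StaticData.XML".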

Here's the script and a single executable file (compiled with pyInstaller).


MetaDirectory.py
MetaDirectory.exe


As you can see, the script is quite straightforward. The magic happens in the XSLT file.

If you have several source directory trees, simply run MetaDirectory once for each and use the same target directory. You can also create different XSLT files and Static XML files for each source directory tree. For instance, you can combine music, music videos and movies into the same target directories, so a first-level directory like Favorite Artists/Elvis Presley contains all the music and all the movies of Elvis Presley.



The XML Static file can be used for inserting static XML structures. For instance, say you have a directory where all the movies are awful. You can put this into your Static file:
<own_movierating>Awful</own_movierating>
Another use case is actors. If you put all the actors of all the movies in one directory, there will be a lot of actors. You usually want to list only the lead actors or a top 100. In your static file you can create a list of all the actors you find special, and in your meta directory you can create a directory that contains only those actors.

Here's an example of a Static XML file:
StaticData.XML

The original document file name is also available in the XML at /data/DocumentFileName. You can use it if you want to name the shortcuts after the original file name.
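
To picture what the XSLT sees, here is a small Python sketch that puts the three pieces under one /data root. It only illustrates the idea and is not the code from the script. The function name build_data_document is made up, and I assume here that the static file has a wrapper root element whose children are copied directly under /data.

```python
import xml.etree.ElementTree as ET


def build_data_document(media_xml, static_xml, document_file_name):
    """Combine media XML, static XML and the file name under one <data> root."""
    data = ET.Element("data")
    # The media metadata, e.g. <movie>...</movie>, becomes /data/movie.
    data.append(ET.fromstring(media_xml))
    # Assumption: the static file has a wrapper root; its children go under /data.
    for child in ET.fromstring(static_xml):
        data.append(child)
    # The original file name becomes /data/DocumentFileName.
    ET.SubElement(data, "DocumentFileName").text = document_file_name
    return data
```

With this layout a static element like own_movierating can be selected as /data/own_movierating next to /data/movie.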

And the most important file is the XSLT file.
MovieDB.xslt



This is only an example file. It is not meant to be a complete XSLT. There are only a couple of use cases that you can copy and paste into your own XSLT file. You don't have to know how XSLT works.

There are some examples of the ImdbPY XML schema at the start of the file that you can use. All elements are under the root element /data.

The XSLT file transforms the XML metadata into a list of all the possible directory names. The list starts with the <items> tag.

<xsl:for-each select="/data/movie/genres/item">
    <item>
        <xsl:copy-of select="/data/movie/kind/text()"/>/[Genre]/<xsl:copy-of select="text()"/>/[Year]/<xsl:copy-of select="/data/movie/year/text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>
    </item>
</xsl:for-each>
This first section selects all the genres of the movie from the XML file and loops over them. For each row it also gets the media type (kind) and the year the movie was made. The last part is always the name of the document, which in this case is the name of the movie.

<xsl:for-each select="/data/movie/director/person">
    <item>
        <xsl:copy-of select="/data/movie/kind/text()"/>/[Director]/<xsl:copy-of select="name/text()"/>/[Year]/<xsl:copy-of select="/data/movie/year/text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>
    </item>
</xsl:for-each>

<xsl:for-each select="/data/movie/director/person">
    <item>
        <xsl:copy-of select="/data/movie/kind/text()"/>/[Director]/<xsl:copy-of select="name/text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>
    </item>
</xsl:for-each>

This is similar to the first one, but in this case there is one root directory [Director] into which two XSLT queries insert files. The first one creates the directory structure
movie/[Director]/*/[Year]/*/
and the second one
movie/[Director]/*/
So when you open the directory for a particular director, it lists all the movies AND also has subdirectories where you can select the particular year the movie was made.

<xsl:for-each select="/data/movie/cast/person">
    <xsl:variable name="actor" select="name/text()"/>
    <xsl:for-each select="/data/movie/genres/item">
        <item>
            <xsl:copy-of select="/data/movie/kind/text()"/>/[Actor, All]/<xsl:copy-of select="$actor"/>/[Genre]/<xsl:copy-of select="text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>
        </item>
    </xsl:for-each>
</xsl:for-each>


This one is a little more complex. It lists all the actors, and inside each actor's directory there are also directories for the different genres. This has to be done with two for-each loops because one movie has many actors and many genres. So the result of this query is the Cartesian product of the movie's actors and the movie's genres.

First we loop over the actors and put the name of the actor in a variable. Otherwise we cannot access the actor name in the inner loop. Then we loop over all the genres, and when we print the row we use the variable $actor.
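
If the nested loops are hard to picture, the same Cartesian product can be written in Python. This is only an illustration of what the XSLT above produces, and the function name actor_genre_paths is made up for this example.

```python
from itertools import product


def actor_genre_paths(kind, title, actors, genres):
    """Build one path per (actor, genre) pair, like the nested for-each loops."""
    return [
        f"{kind}/[Actor, All]/{actor}/[Genre]/{genre}/{title}"
        for actor, genre in product(actors, genres)
    ]
```

A movie with 10 actors and 3 genres gives 30 rows, so this kind of query grows quickly.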

<xsl:for-each select="/data/movie/cast/person">
    <xsl:variable name="actor" select="name/text()"/>
    <xsl:for-each select="/data/top_actors/actor[name=$actor]">
        <item>
            <xsl:copy-of select="/data/movie/kind/text()"/>/[Actor,Top 100]/<xsl:copy-of select="$actor"/>/[Year]/<xsl:copy-of select="/data/movie/year/text()"/>/<xsl:copy-of select="/data/movie/title/text()"/>
        </item>
    </xsl:for-each>
</xsl:for-each>


This is an example of how static data can be used to filter out rows. First I loop over all the actors and save the actor's name in the variable actor. Then I loop over all the top_actors from my static data file and select only those where the name of the top actor is the same as in the variable actor.
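
The filter is easier to see in Python. Again this is only a sketch of what the query does and not code from the script. The function name top_actor_paths is made up, and it assumes each name appears only once in the static list.

```python
def top_actor_paths(kind, year, title, cast, top_actors):
    """Keep only cast members that also appear in the static top actor list."""
    top = set(top_actors)
    return [
        f"{kind}/[Actor,Top 100]/{actor}/[Year]/{year}/{title}"
        for actor in cast
        if actor in top
    ]
```

A movie with no top actors in its cast produces no rows at all, so it does not show up under [Actor,Top 100].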

<item>
    <xsl:copy-of select="/data/movie/kind/text()"/>/<xsl:copy-of select="substring(/data/movie/title/text(),1,1)"/>/<xsl:copy-of select="/data/movie/title/text()"/>
</item>


This does not loop over anything, because you need for-each loops only for items that occur many times for one movie. This only takes the initial from the movie title using substring, so it creates directories A, B, C and so on, which makes it easier to search by movie title. It should also be fairly easy to remove the leading A's or The's.
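
Removing the leading articles is not in the example XSLT, but the idea can be sketched in Python. The function name index_letter is made up for this example, and handling "An" as well as "A" and "The" is my own addition.

```python
import re

# Leading English article followed by whitespace, e.g. "The " in "The Matrix".
_LEADING_ARTICLE = re.compile(r"^(a|an|the)\s+", re.IGNORECASE)


def index_letter(title):
    """Return the first character of the title, ignoring a leading article."""
    title = title.strip()
    stripped = _LEADING_ARTICLE.sub("", title)
    # Fall back to the full title if nothing is left after removing the article.
    return (stripped or title)[:1]
```

Because the pattern needs whitespace after the article, a title like "Them!" still goes under T and "Alien" still goes under A.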

The last part of each item is always the file name of the shortcut. It does not have to be the movie title. It only has to be unique. You could, for instance, combine the year and the title in the file name.
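
Here is a minimal Python sketch of how an item can be split into directories and a shortcut name. It is not the code from MetaDirectory.py. The function names split_item and create_directories are made up, the output is assumed to look like <items><item>...</item></items>, and the shortcut creation itself is left out.

```python
import os
import xml.etree.ElementTree as ET


def split_item(item_text):
    """Split 'dir/dir/.../name' into (directory parts, shortcut name)."""
    parts = [part.strip() for part in item_text.split("/") if part.strip()]
    if not parts:
        return None
    return parts[:-1], parts[-1]


def create_directories(target_dir, items_xml):
    """Create the directory of every <item>; return (directory, shortcut name) pairs."""
    results = []
    for item in ET.fromstring(items_xml).iter("item"):
        split = split_item(item.text or "")
        if split is None:
            continue  # empty item, nothing to create
        directories, shortcut_name = split
        directory = os.path.join(target_dir, *directories)
        os.makedirs(directory, exist_ok=True)  # only creates, never deletes
        results.append((directory, shortcut_name))
    return results
```

Note that a title containing "/" would be split into an extra directory in this sketch, so such characters would need to be replaced first.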

One way the script could be expanded is by reading Windows file metadata, like that of Word and Excel documents, directly from the files using the Windows API. Then it would not be necessary to create separate XML files for them.

This script uses XSLT 2.0 and there are tons of capabilities in it. These examples only scratch the surface.

Edit:
I updated the script to work properly with Unicode characters. For instance, the shortcut creation had to be done via an alternate COM interface for it to work with Unicode paths.

The files are also on GitHub:
https://github.com/MarkoMarjamaa/MetaDirectory
