Overview

Given the flexibility to extend metadata aspects in the media library and to use them in skins, we are currently planning how to integrate more sources.

Tasks:

  1. Analysis of existing scraper solutions
  2. New requirements for metadata extractors and import
  3. Definition of additional sources (online databases like IMDb)
  4. Definition of the metadata needed to match each source's information
  5. Implementing metadata extractors (here: online scrapers)
  6. Adding a plugin with models that expose the new data as properties for use in skins
  7. Extending skins to show and use the new metadata

1 Analysis of existing scraper solutions

The MediaPortal 1 plugin "Moving Pictures" provides a very powerful scraper system. Below is a short description of its process. Although the system probably can't be used directly, it shows some common problems we need to consider:

  1. Scan for Files. We check the import paths and compare file hashes against items already in the system. At this point, some simple regular expressions automatically group multipart content together (think CD1, CD2).
     
  2. Parse Local Data. We try to parse various bits of information from the file, such as (but not limited to) the movie name, release year and IMDb ID. Often some of this information is in the filename. We also use regular expressions to look for IMDb IDs in any text files in the same folder. With the exception of an IMDb ID, this data is considered untrusted and is only used as input for the data provider system. We call this the movie signature.
     
  3. Search for Possible Matches. At this point our scripting engine comes into play. A script has several actions that can be executed (similar to Ant targets). The SEARCH action takes various bits of input (the movie signature) and returns a list of possible matches. The script is generally fairly dumb: it makes no decision about which match is the right one. Each result contains only limited data, sometimes just the title and the release date.
    Generally, multiple scripts run in this step. For example, if I have both the IMDb script and the themoviedb.org script enabled, I get possible matches from both.
     
  4. Attempt to Auto-Approve. We then compare the data in each possible match with the movie signature created in step 2. We use the Levenshtein distance to figure out how similar the title is to what we think the title should be. We give higher priority to possible matches that are within a year of the expected release date. There is a lot more you can do, but basically we evaluate and rank each possible match, and if the best match scores above a certain threshold, the system approves it automatically.
     
  5. Allow User to Pick a Match. If none of the possible matches scored high enough to be auto-approved, the user must select the best match. The possible matches are sorted by score, so the best matches are at the top. At this point the user also has the option to search again with modified "movie signature" information, e.g. by adjusting the title or year to improve the results.
     
  6. Grab Detailed Info. At this point we have a match we are happy with, either via auto-approval or user approval. We then go back to the original scraper and call the DETAILS action, which returns full information about the movie, including genre, summary, etc. It is also possible to get details from multiple sources. For example, if I retrieve my movie data from themoviedb.org and it populates the IMDb field, the IMDb script (if enabled) fills in any data that themoviedb.org is missing.
     
  7. Grab Artwork. Covers and backdrops are grabbed in two separate steps. Most of the time a completely separate script deals with covers (although this is not always the case). The point is that the user can prioritize cover scrapers separately from metadata scrapers, and the same applies to backdrop scrapers. Artwork scrapers receive the full set of currently known metadata as input and return the URL of the requested artwork; Moving Pictures handles the download. There are a few rules that check the resolution and dimensions of the image, but in general artwork scrapers are trusted to return correct results.
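The multipart grouping from step 1 can be illustrated with a single regular expression. The sketch below is not Moving Pictures' code; `PART_PATTERN` and `group_multipart` are hypothetical names, and the pattern only covers a few common part markers.

```python
import os
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical pattern: a part marker such as "CD1", "cd 2", "Part3" or "disc-1"
# at the very end of the file name (before the extension).
PART_PATTERN = re.compile(r"[ ._-]*(?:cd|dvd|part|disc|disk)[ ._-]*(\d+)$", re.IGNORECASE)


def group_multipart(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names that differ only in a trailing part marker.

    Returns a mapping from the shared base name to its files, ordered by part
    number. Files without a part marker form a group of their own.
    """
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for name in filenames:
        stem, _ext = os.path.splitext(name)
        match = PART_PATTERN.search(stem)
        if match:
            base, part = stem[: match.start()], int(match.group(1))
        else:
            base, part = stem, 0
        groups[base].append((part, name))
    return {base: [name for _, name in sorted(parts)] for base, parts in groups.items()}


print(group_multipart(["Movie CD2.avi", "Movie CD1.avi", "Other.mkv"]))
# {'Movie': ['Movie CD1.avi', 'Movie CD2.avi'], 'Other': ['Other.mkv']}
```

Grouping on the base name alone is deliberately naive; a real scanner would also require the parts to live in the same folder.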
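Step 2 (building the movie signature from the file name) might look like the following minimal sketch. `MovieSignature`, `parse_signature` and both patterns are assumptions for illustration; as described above, only the IMDb ID would be treated as trusted.

```python
import re
from dataclasses import dataclass
from typing import Optional

# IMDb title IDs look like "tt0133093" (assumed: "tt" followed by 7 or 8 digits).
IMDB_ID = re.compile(r"\btt\d{7,8}\b")
# A plausible release year, preceded by a separator and followed by a separator or the end.
YEAR = re.compile(r"[(\[ ._-]((?:19|20)\d{2})(?=[)\] ._-]|$)")


@dataclass
class MovieSignature:
    title: str
    year: Optional[int] = None
    imdb_id: Optional[str] = None  # the only value treated as trusted


def parse_signature(stem: str) -> MovieSignature:
    """Derive an (untrusted) movie signature from a file name without extension."""
    imdb = IMDB_ID.search(stem)
    text = IMDB_ID.sub(" ", stem)
    years = list(YEAR.finditer(text))
    # Use the last year-like token, so "Blade.Runner.2049.2017" keeps "2049" in the title.
    year_match = years[-1] if years else None
    raw_title = text[: year_match.start()] if year_match else text
    # Turn dots/underscores into spaces, collapse whitespace, trim stray dashes.
    title = " ".join(re.sub(r"[._]+", " ", raw_title).split()).strip(" -")
    return MovieSignature(
        title=title,
        year=int(year_match.group(1)) if year_match else None,
        imdb_id=imdb.group(0) if imdb else None,
    )


print(parse_signature("The.Matrix.1999.720p"))
```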
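Steps 3 to 5 can be sketched as follows: collect candidates from every enabled script, score them by normalized Levenshtein similarity plus a bonus for a release year within one year of the signature, and auto-approve only above a threshold. The `SearchScript` protocol, `StaticScript`, `YEAR_BONUS` and `AUTO_APPROVE_THRESHOLD` are hypothetical; the actual weights used by Moving Pictures are not given here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Protocol, Tuple


@dataclass
class Candidate:
    title: str
    year: Optional[int]
    source: str


class SearchScript(Protocol):
    """Hypothetical interface: the SEARCH action of a scraper script."""

    def search(self, title: str, year: Optional[int]) -> List[Candidate]: ...


@dataclass
class StaticScript:
    """Stand-in script that returns canned results (illustration only)."""

    results: List[Candidate]

    def search(self, title: str, year: Optional[int]) -> List[Candidate]:
        return list(self.results)


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


YEAR_BONUS = 0.1              # hypothetical weight
AUTO_APPROVE_THRESHOLD = 0.9  # hypothetical threshold


def score(title: str, year: Optional[int], cand: Candidate) -> float:
    a, b = title.lower(), cand.title.lower()
    # Normalize to 0..1, where 1.0 means identical titles.
    similarity = 1.0 - levenshtein(a, b) / (max(len(a), len(b)) or 1)
    if year is not None and cand.year is not None and abs(year - cand.year) <= 1:
        similarity += YEAR_BONUS
    return similarity


def search_and_rank(
    title: str, year: Optional[int], scripts: Iterable[SearchScript]
) -> Tuple[Optional[Candidate], List[Candidate]]:
    # Step 3: every enabled script contributes candidates; none of them decides.
    candidates = [c for script in scripts for c in script.search(title, year)]
    # Step 4: rank by score, best first; approve automatically above the threshold.
    ranked = sorted(candidates, key=lambda c: score(title, year, c), reverse=True)
    if ranked and score(title, year, ranked[0]) >= AUTO_APPROVE_THRESHOLD:
        return ranked[0], ranked
    # Step 5: nothing approved; the ranked list is presented to the user.
    return None, ranked


tmdb = StaticScript([Candidate("The Matrix Reloaded", 2003, "themoviedb.org"),
                     Candidate("The Matrix", 1999, "themoviedb.org")])
imdb = StaticScript([Candidate("The Matrix", 1999, "IMDb")])
approved, ranked = search_and_rank("The Matrix", 1999, [tmdb, imdb])
print(approved)
```

Because the ranked list is returned even when nothing is approved, the same function can feed the "pick a match" dialog of step 5.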
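The gap filling in step 6 amounts to a priority-ordered merge. This sketch assumes the DETAILS results of the enabled sources (e.g. themoviedb.org first, then IMDb looked up via the IMDb ID) have already been fetched as plain dictionaries; `merge_details` and the sample values are illustrative only.

```python
from typing import Any, Dict, Iterable


def _is_empty(value: Any) -> bool:
    """Treat None, empty strings and empty lists as missing data."""
    return value is None or value == "" or value == []


def merge_details(results_by_priority: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge DETAILS results, highest-priority source first.

    Each field keeps the first non-empty value it receives, so lower-priority
    sources only fill the gaps left by higher-priority ones.
    """
    merged: Dict[str, Any] = {}
    for details in results_by_priority:
        for key, value in details.items():
            if _is_empty(merged.get(key)) and not _is_empty(value):
                merged[key] = value
    return merged


# Illustrative values only.
tmdb_details = {"title": "Alien", "imdb_id": "tt0078748", "summary": ""}
imdb_details = {"title": "Alien (1979)", "summary": "(summary from IMDb)", "genres": ["Horror"]}
movie = merge_details([tmdb_details, imdb_details])
print(movie)
```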

2 New requirements for metadata extractors and import

Currently, media items are imported by running all metadata extractors (MDEs) in extended mode to extract as much information as possible in a single run. This can hurt performance and leaves no room for user interaction (e.g. selecting the best of multiple matches, see step 5 above).

Multi-pass importer runs

The import should be split into multiple passes (at least two):

  1. Quick import of basic information (like file names and metadata that is extracted directly from the file)
  2. Extended import, which also allows the use of internet scrapers

User interactions

If online scrapers return multiple matches, such media items could be put into a queue ("user choice pending"). Each scraper plugin (open question) should provide a GUI to do the mapping and then finish the import.
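A minimal sketch of the two-pass import with a "user choice pending" queue, assuming a scraper is simply a function that returns zero or more candidate aspect dictionaries. `MediaItem`, `quick_pass`, `extended_pass`, `resolve` and the status strings are hypothetical and not MediaPortal 2 APIs.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MediaItem:
    path: str
    aspects: Dict[str, object] = field(default_factory=dict)
    status: str = "new"


# Hypothetical scraper type: returns zero or more candidate aspect dictionaries.
Scraper = Callable[[MediaItem], List[Dict[str, object]]]
PendingQueue = List[Tuple[MediaItem, List[Dict[str, object]]]]


def quick_pass(paths: List[str]) -> List[MediaItem]:
    """Pass 1: only information available locally (here just the file name)."""
    items = []
    for path in paths:
        item = MediaItem(path)
        item.aspects["title"] = os.path.splitext(os.path.basename(path))[0]
        item.status = "basic"
        items.append(item)
    return items


def extended_pass(items: List[MediaItem], scraper: Scraper, pending: PendingQueue) -> None:
    """Pass 2: online lookups; ambiguous results are queued for the user."""
    for item in items:
        matches = scraper(item)
        if len(matches) == 1:
            item.aspects.update(matches[0])
            item.status = "complete"
        elif matches:
            item.status = "user choice pending"
            pending.append((item, matches))
        # No match: the item keeps the basic information from pass 1.


def resolve(pending: PendingQueue, index: int, choice: int) -> MediaItem:
    """Called from a (hypothetical) scraper GUI once the user has picked a match."""
    item, matches = pending.pop(index)
    item.aspects.update(matches[choice])
    item.status = "complete"
    return item


# Illustrative run with a fake scraper returning canned results.
def fake_scraper(item: MediaItem) -> List[Dict[str, object]]:
    canned = {"Matrix": [{"year": 1999}], "Alien": [{"year": 1979}, {"year": 2003}]}
    return canned.get(str(item.aspects["title"]), [])


pending: PendingQueue = []
items = quick_pass(["/movies/Matrix.mkv", "/movies/Alien.mkv", "/movies/Holiday.mkv"])
extended_pass(items, fake_scraper, pending)
print([i.status for i in items])  # ['complete', 'user choice pending', 'basic']
```

Items without a match keep their pass-1 data, so the library stays usable while the queue waits for the user.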
     

Priority for metadata extractors

Different MDEs can provide different "quality levels" of information: e.g. the MovieMetadata extractor provides only the file name as "Title", while MP1 recording information can provide the program title, series/episode number, episode name and recording date.
MDEs for the same category (e.g. videos) therefore need a priority to control their execution order.
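One way to honor extractor priorities, assuming a plain integer per MDE where a higher value means richer information: run the MDEs of a category in ascending priority and let later non-empty results overwrite earlier ones. `MetadataExtractor`, `run_extractors` and the two sample extractors are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetadataExtractor:
    name: str
    category: str  # e.g. "Video", "Audio"
    priority: int  # hypothetical: higher = richer source, runs later and wins
    extract: Callable[[str], Dict[str, object]]


def run_extractors(path: str, category: str, extractors: List[MetadataExtractor]) -> Dict[str, object]:
    """Run all MDEs of one category in ascending priority; later values overwrite earlier ones."""
    aspects: Dict[str, object] = {}
    applicable = sorted((e for e in extractors if e.category == category), key=lambda e: e.priority)
    for mde in applicable:
        # Ignore None so a rich extractor does not erase data it simply lacks.
        aspects.update({k: v for k, v in mde.extract(path).items() if v is not None})
    return aspects


# Illustrative extractors (names and returned values are made up).
file_name_mde = MetadataExtractor("FileName", "Video", 0, lambda p: {"title": p.rsplit("/", 1)[-1]})
recording_mde = MetadataExtractor(
    "Mp1Recording", "Video", 10,
    lambda p: {"title": "Program title", "episode_name": "Episode name", "recording_date": None},
)
print(run_extractors("/rec/program_20130101.ts", "Video", [recording_mde, file_name_mde]))
# {'title': 'Program title', 'episode_name': 'Episode name'}
```

Running the highest priority first and letting lower priorities only fill gaps yields the same final values; that order has the advantage that expensive low-priority MDEs could be skipped once all fields are filled.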
     
     

3 Definition of additional sources

MediaPortal 1 TvEngine recordings (local source)

The MediaPortal 1 TvEngine stores recording information in an XML file alongside the .ts file (channel, long program description, recording date, description, ...). The information is stored as simple XML tags and is easy to read and import.
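Reading the recording XML could be as simple as the following sketch using Python's `xml.etree.ElementTree`. The file layout (Matroska-style `SimpleTag` elements with `name`/`value` children) and the tag names in `FIELD_MAP` are assumptions and must be checked against a real TvEngine recording.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed layout and illustrative tag names; verify against a real recording's XML file.
SAMPLE = """<tags>
  <tag>
    <SimpleTag><name>TITLE</name><value>Some program</value></SimpleTag>
    <SimpleTag><name>CHANNEL_NAME</name><value>Some channel</value></SimpleTag>
    <SimpleTag><name>COMMENT</name><value>Long program description</value></SimpleTag>
  </tag>
</tags>"""

# Hypothetical mapping from tag names to media library aspect attributes.
FIELD_MAP = {"TITLE": "title", "CHANNEL_NAME": "channel", "COMMENT": "description"}


def read_recording_info(xml_text: str) -> Dict[str, str]:
    """Collect name/value pairs and map the known ones to aspect attributes."""
    root = ET.fromstring(xml_text)
    info: Dict[str, str] = {}
    for simple_tag in root.iter("SimpleTag"):
        name = (simple_tag.findtext("name") or "").strip()
        value = (simple_tag.findtext("value") or "").strip()
        if name in FIELD_MAP and value:
            info[FIELD_MAP[name]] = value
    return info


print(read_recording_info(SAMPLE))
```

To read the file next to a recording, one would pass e.g. `Path(ts_path).with_suffix(".xml").read_text(encoding="utf-8")`, assuming the XML file shares the .ts file's base name.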
     

Additional file-based metadata

For example, MKV files can contain various metadata (title, chapters, even covers). These should be extracted as well (via a new plugin, or by using more of the MediaInfo features?).
Other files, such as .nfo files, can also provide metadata.
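For .nfo files, a simple first step is the same IMDb ID check that Moving Pictures applies to text files (step 2 above). A sketch, assuming IDs of the usual form "tt" plus 7 or 8 digits; `imdb_id_from_text` and `find_imdb_id` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Optional

IMDB_ID = re.compile(r"\btt\d{7,8}\b")


def imdb_id_from_text(text: str) -> Optional[str]:
    """Return the first IMDb-style ID in the text, if any."""
    match = IMDB_ID.search(text)
    return match.group(0) if match else None


def find_imdb_id(folder: Path) -> Optional[str]:
    """Return the first IMDb ID found in the folder's .nfo files (checked in name order)."""
    for nfo in sorted(folder.glob("*.nfo")):
        found = imdb_id_from_text(nfo.read_text(encoding="utf-8", errors="ignore"))
        if found:
            return found
    return None


print(imdb_id_from_text("https://www.imdb.com/title/tt0133093/"))  # tt0133093
```

On Linux, `Path.glob` matching is case-sensitive, so a file named `MOVIE.NFO` would need an additional pattern there.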

   

 
