The most important part of the config file is the template.

A site has one or more templates. The templates tell WebEPG how to find the data on the website.

The template contains the HTML tags that are used to find the data in the whole page. In most web browsers, the HTML source of a page can be viewed by right-clicking the page and choosing the option to view the page source.

All HTML tags are supported, including comments (<!-- -->).

Parser tags are special tags that mark where the interesting data is. There are currently three types: <#xxxx>, <*xxxx> and <Zxx>.

A simple template would look like this:

<tr>
<td><#START></td>
<td><#TITLE></td>
<td></td>
</tr>

The parser searches for this pattern in the HTML source and reports the number of times it finds it.

When you ask it to parse a particular occurrence, it gets the text from the HTML source located where the <#START> and <#TITLE> tags are.
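To make the matching idea concrete, here is a minimal Python sketch that turns the simple template above into a regular expression, counts its occurrences in a page, and reads the values of one occurrence. It is an illustration under stated assumptions, not WebEPG's own parser: the function name `template_to_regex` is hypothetical, each <#NAME> tag is assumed to capture the text up to the next HTML tag, and whitespace between tags is assumed not to matter.

```python
import re

# <#NAME> parser tags or ordinary HTML tags; other text in the template is ignored
TOKEN = re.compile(r"<#(\w+)>|<[^>]+>")


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Hypothetical sketch: build a regex from a simple template."""
    parts = []
    for m in TOKEN.finditer(template):
        if m.group(1):   # <#NAME>: capture the text up to the next tag
            parts.append(f"(?P<{m.group(1)}>[^<]*)")
        else:            # ordinary HTML tag: must appear literally
            parts.append(re.escape(m.group(0)))
    # whitespace between tags is treated as insignificant
    return re.compile(r"\s*".join(parts))


template = """<tr>
<td><#START></td>
<td><#TITLE></td>
<td></td>
</tr>"""

html = """<table>
<tr><td>20:00</td><td>News</td><td></td></tr>
<tr><td>20:15</td><td>Weather</td><td></td></tr>
</table>"""

matches = list(template_to_regex(template).finditer(html))
print(len(matches))                  # 2 occurrences
second = matches[1]                  # parse the second occurrence
print(second.group("START").strip(), second.group("TITLE").strip())  # 20:15 Weather
```

Counting the matches corresponds to the number of occurrences the parser reports, and picking one match corresponds to parsing a particular occurrence.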

For this template, the tag is "#START" or "#TITLE", and the value is the text located in the HTML source at that position. Characters can be placed in front of and behind the <#> tags to remove part of the text.

So "-<#START>." searches for the '-' and the '.' and passes the text between them as the value string to the SetElement method. The delimiter is not limited to a single character: the whole string between the WebEPG tag (#) and the neighbouring HTML tag is used. Because there is sometimes too much text between two tags, the exact search strings can also be given explicitly with the syntax <#TAGNAME:front,back>, where front and back are search strings (either can be empty). If no search strings or characters are given, the value extends to the next tag. Further parsing can be done in an IParserData object; to use it, create a new class that implements this interface.

Such a class performs extra parsing of the element values, for example trimming spaces and other junk from strings.
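The front/back rule can be sketched as follows. This is a hypothetical Python illustration, not the WebEPG code: `TAG_SPEC` and `extract_between` are invented names, the first occurrence of each search string is assumed to be used, an empty string is assumed to mean "from the start" or "to the end", and the final `strip()` stands in for the kind of trimming an IParserData class might do.

```python
import re
from typing import Optional

# Hypothetical pattern for <#NAME> or <#NAME:front,back> parser tags
TAG_SPEC = re.compile(r"<#(\w+)(?::([^,>]*),([^>]*))?>")


def extract_between(text: str, front: str, back: str) -> Optional[str]:
    """Return the text between the first `front` and the following `back`.

    Assumption: an empty `front` means "from the start" and an empty `back`
    means "to the end". Returns None when a non-empty search string is missing.
    """
    start = 0
    if front:
        i = text.find(front)
        if i < 0:
            return None
        start = i + len(front)
    end = len(text)
    if back:
        j = text.find(back, start)
        if j < 0:
            return None
        end = j
    # Stand-in for extra parsing such as trimming spaces
    return text[start:end].strip()


spec = TAG_SPEC.fullmatch("<#START:-,.>")
name, front, back = spec.group(1), spec.group(2), spec.group(3)
value = extract_between("Start - 20:15. Ends 21:00", front, back)
print(name, value)                                # START 20:15
print(extract_between("News (repeat)", "", "("))  # News
```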

Template Tags and Parsing

The Tags variable tells the parser which HTML tags are of interest; all other tags are ignored in both the template and the page source. Each HTML tag is identified by the first character of its name. See Grabber file for more details.

Example:

<table>
<tr>
<td><img><a href><#START></a></td>
<td><#TITLE></td>
<td></td>
</tr>
</table>

So in this example, if "T" is used, all the table tags (table, tr and td, i.e. all tags starting with the letter T) are used for parsing, and the IMG and A HREF tags are ignored. If one of these is required for parsing, then "TA", "TI", or "TIA" for both can be used; the order does not matter. This means the real HTML source can contain other tags and the parser still matches it, because it simply ignores them. It is sometimes useful to include in the template all the tags that appear on the page, even those not used for parsing. This gives a more complete picture of the page source layout, and the set of tags used for parsing can then be changed quickly without editing the template.
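The following Python sketch shows one way to read the Tags filter: an HTML tag is kept only when the first letter of its name appears in the Tags string, and all other tags are dropped from both the template and the page before matching. The function `filter_tags` is hypothetical, and the case-insensitive comparison and the rule that text between tags is always kept are assumptions.

```python
import re

# An opening or closing HTML tag; group 1 is the tag name.
# Parser tags such as <#START> do not match because '#' is not a letter.
HTML_TAG = re.compile(r"</?([A-Za-z][\w-]*)[^>]*>")


def filter_tags(html: str, tags: str) -> str:
    """Keep an HTML tag only if the first letter of its name is in `tags`.

    Hypothetical sketch: the comparison is case-insensitive, the order of the
    letters does not matter, and text between tags is always kept.
    """
    wanted = set(tags.upper())

    def keep_or_drop(m: "re.Match[str]") -> str:
        return m.group(0) if m.group(1)[0].upper() in wanted else ""

    return HTML_TAG.sub(keep_or_drop, html)


template = "<td><img><a href><#START></a></td>"
source = '<td><img src="logo.png"><a href="/show/1">20:00</a></td>'

print(filter_tags(template, "T"))   # <td><#START></td>
print(filter_tags(source, "T"))     # <td>20:00</td>
print(filter_tags(source, "TA"))    # <td><a href="/show/1">20:00</a></td>
```

With "T", the filtered template and the filtered source share the same tag sequence (td, then /td), which is what lets the template match a page that contains extra tags.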

Generally it is best to use as few tags as possible while still making the template unique to the data. Using too many tags means that small changes to the source web page may require template changes. Tags that define structure, such as table tags, are good choices because the structure rarely changes.

Dynamic Templates

The <Z> tag

This tag is used to build a template for a variable structure and to handle optional information. Some websites add extra information by changing the HTML structure (for example, by adding extra table rows).

This tag allows regex code to be used.

A <z> tag must have a matching end tag </z>. Together they mark the start and end of the area that is considered optional.

Example:

<tr> <td><#START></td> <td><#TITLE></td> <z(><td><#DESCRIPTION></td></z)?> </tr>

In this example the simple regex ( )? is used to indicate that this part is optional.

In regex, ( )? is the same as ( ){0,1}, meaning 0 or 1 times. At the moment the system has problems with any count greater than 1, because it causes an imbalance between the template and the source.

Other regex constructs have not been tested. Any valid regex is accepted, but whether it parses correctly is another question.

For more details on regex try this site: http://www.regular-expressions.info/
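The effect of the <z> tag can be sketched by copying the text inside <z(> and </z)?> straight into a regular expression, so that the enclosed cells become an optional group. This Python example is a hypothetical illustration, not WebEPG's implementation; the tokenizer and `template_to_regex` are assumptions, and whitespace between tags is treated as insignificant.

```python
import re

# <#NAME> parser tags, <z...>/</z...> regex fragments, or ordinary HTML tags
TOKEN = re.compile(r"<#(\w+)>|</?z([^>]*)>|<[^>]+>")


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Hypothetical sketch: build a regex from a template with <z> tags."""
    parts = []
    for m in TOKEN.finditer(template):
        if m.group(1):                   # <#NAME>: capture text up to the next tag
            parts.append(f"(?P<{m.group(1)}>[^<]*)")
        elif m.group(2) is not None:     # <z(> / </z)?>: copy the regex text as-is
            parts.append(m.group(2))
        else:                            # ordinary HTML tag: must appear literally
            parts.append(re.escape(m.group(0)))
    # whitespace between tokens is treated as insignificant
    return re.compile(r"\s*".join(parts))


template = ("<tr> <td><#START></td> <td><#TITLE></td> "
            "<z(><td><#DESCRIPTION></td></z)?> </tr>")
html = ("<tr><td>20:00</td><td>News</td><td>Daily headlines</td></tr>\n"
        "<tr><td>20:15</td><td>Weather</td></tr>")

pattern = template_to_regex(template)
descriptions = [m.group("DESCRIPTION") for m in pattern.finditer(html)]
print(descriptions)  # ['Daily headlines', None]
```

In the second row the optional group simply does not match, so DESCRIPTION is None instead of the whole row being rejected.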

<*MATCH> and <*VALUE>

There are currently only two <*> tags: <*MATCH> and <*VALUE>. These tags must be used in pairs. They also require an additional list in which each entry has a match value and a field value, both strings. The list is given as Match child elements of a MatchList element. The match attribute of a Match element contains the text that the <*MATCH> tag searches for in the HTML code. The field attribute specifies the field in which the data found by the <*VALUE> tag is stored.

Example

Consider the following HTML code, from which we want to extract the actors' names:

<table>
<tr> <td> Cast: </td> <td> John Doe, Jane Dadeo </td> <td> </td> </tr>
<tr> <td> </td> <td> Cast: </td> <td> Joe Shmoe, Grace Goe </td> </tr>
</table>

Normally we would use something like: <td> </td> <td> <#ACTORS> </td> <td> </td>

But this would not work here, because on the second row <#ACTORS> would get the value "Cast:".

Instead, we want to match the text "Cast:" but take the value from the next <td> tag. This is where <*MATCH>, <*VALUE> and <MatchList> come in handy.

Add the following tags to your TemplateText:

<z(> <td><*MATCH></td> <td><*VALUE></td> </z)?> 
<z(> <td><*MATCH></td> <td><*VALUE></td> </z)?>

And be sure to add a MatchList:

<MatchList>
    <Match field="#ACTORS" match="Cast:" />
</MatchList>

In this case the parser will try to match the text located by the <*MATCH> tag with the list of match strings and then store the text located by the following <*VALUE> tag into the corresponding field.
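The MatchList lookup can be sketched in Python as below. To stay short, the sketch does not run the template engine; it assumes that each <*MATCH>/<*VALUE> pair corresponds to two adjacent <td> cells of a row, so it pairs every cell with the one after it. The helper names `load_match_list` and `parse_row` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

MATCH_LIST_XML = """<MatchList>
    <Match field="#ACTORS" match="Cast:" />
</MatchList>"""

HTML = """<table>
<tr> <td> Cast: </td> <td> John Doe, Jane Dadeo </td> <td> </td> </tr>
<tr> <td> </td> <td> Cast: </td> <td> Joe Shmoe, Grace Goe </td> </tr>
</table>"""


def load_match_list(xml_text: str) -> dict:
    """Map each Match element's match text to its field name."""
    root = ET.fromstring(xml_text)
    return {m.get("match"): m.get("field") for m in root.iter("Match")}


def parse_row(row_html: str, match_list: dict) -> dict:
    """Assumption: a <*MATCH>/<*VALUE> pair is two adjacent <td> cells."""
    cells = [c.strip() for c in re.findall(r"<td>(.*?)</td>", row_html)]
    fields = {}
    for match_text, value_text in zip(cells, cells[1:]):
        field = match_list.get(match_text)
        if field is not None:           # text at <*MATCH> is in the MatchList
            fields[field] = value_text  # store text at <*VALUE> in that field
    return fields


match_list = load_match_list(MATCH_LIST_XML)
rows = re.findall(r"<tr>(.*?)</tr>", HTML)
results = [parse_row(row, match_list) for row in rows]
print(results)
# [{'#ACTORS': 'John Doe, Jane Dadeo'}, {'#ACTORS': 'Joe Shmoe, Grace Goe'}]
```

Both rows yield the actors' names, even though "Cast:" sits in a different column in each row.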
