WebEPG Grabber

<?xml version="1.0" encoding="utf-8"?>

Standard xml file header

<Grabber>

<Info language="" availableDays="" timezone="" version="" />

Attribute | Value(s) | Description
language | e.g. "ru" |
availableDays | e.g. "7" |
timezone | |
version | e.g. "2.0" |

<Channels>

All the channels for this site

<Channel id="" siteId="" />

The information for each channel

Attribute | Value(s) | Description
Required
id | See channel ID section | The ID of the channel in the channel db
siteId | | The identifier for the channel on the site

</Channels>

End of Channels section
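
As an illustration, a Channels section for a site carrying two channels might look like the sketch below. The id and siteId values are invented for this example; real ids follow the channel ID rules, and real siteId values are whatever the site itself uses to identify a channel.

```xml
<Channels>
  <!-- Hypothetical values: neither the ids nor the siteIds refer to a real site -->
  <Channel id="channel-one@example.com" siteId="101" />
  <Channel id="channel-two@example.com" siteId="102" />
</Channels>
```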

<Listing type="">

Attribute | Value(s) | Description
Required
type | Html, Xml, Data | The type of the listing

<Site url="" post="" external="" encoding="" delay="" user-agent=""/>

Attribute | Value(s) | Description
Required
url | |
external | true/false | Use an external browser (IE) to download page data. This will load certain JavaScript sections.
Optional
post | |
encoding | | Normally auto-detected.
delay | | Time in milliseconds to wait between HTTP requests.
user-agent | | 1.2.0 beta: optional User-Agent string to be used in each HTTP request. If not specified, the default will be used.

In the url and post attributes, the following tags are used by WebEPG to insert the required data for each channel, date, etc.:

Tag | Description
[ID] | Site channel ID, from the ChannelList section
[LIST_OFFSET] | Offset position in a list longer than one page. It starts at 0 and is increased by MaxCount for each following page. Used together with MaxCount: if the number of listings on a page is less than MaxCount, WebEPG stops looking for further pages.
[PAGE_OFFSET] | Same as LIST_OFFSET, but 1 (not MaxCount) is added for each new page
[DAY_OFFSET] | Offset of the day from today (0). Use OffsetStart to change the start.
[YYYY] | Year
[MONTH] | Month, full name (e.g. January)
[MM] | Month with leading 0
[_M] | Month without leading 0
[WEEKDAY] | Day of week, full name (e.g. Monday). Weekday names can be changed by including a WeekDayNames section in Search.
[DAY_OF_WEEK] | Day of week as a number. 0=Sunday, 6=Saturday. Specifying startOffset in Search shifts the first day of the week by the same number of days. E.g. when startOffset=2, 0=Friday and 6=Thursday.
[DD] | Day with leading 0
[_D] | Day without leading 0
[EPOCH_TIME] | Number of seconds since 1/1/1970 8:00:00 AM
[EPOCH_DATE] | Number of days since 1/1/1970 8:00:00 AM
[DAY_NAME] | A string naming the day, for example: today, tomorrow, etc. Requires a DayNames section.

Note: Replace all '&' with "&amp;"
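
To make the substitution concrete, here is a minimal sketch of a Site element that combines several of the tags above. The host www.example.com and its query parameters are placeholders and not a real listing site; only the attribute names and the bracketed tags come from this page.

```xml
<!-- Hypothetical site: the host and the "channel" and "date" parameters are placeholders -->
<Site url="http://www.example.com/guide?channel=[ID]&amp;date=[YYYY]-[MM]-[DD]"
      post=""
      external="false"
      encoding=""
      delay="500" />
```

Under these assumptions, a channel with siteId 101 grabbed for 13 January 2008 would be requested with channel=101 and date=2008-01-13, with a 500 millisecond pause between requests. Note that the '&' separating the two query parameters is written as "&amp;" so that the file stays valid XML.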

<Search startOffset="" maxlistings="" startPage="" endPage="" language="" weekday="" />

Attribute | Value(s) | Description
Optional
startOffset | | For [DAY_OFFSET]
maxlistings | | For [LIST_OFFSET] and [PAGE_OFFSET] (page offset added in 2.4.6.0)
listStart | | For [LIST_OFFSET]
startPage | | For [PAGE_OFFSET]
endPage | | For [PAGE_OFFSET]
language | | Language to use for [WEEKDAY]. Must be a specific country/language, not a neutral language group. For example "es-ES", not just "es".
weekday | dddd, ddd | Format for weekday (long, short)
<DayNames>
<Day>value</Day>

The name of the day to be used with [DAY_NAME] tag in url.

</DayNames>

End of DayNames

<WeekDayNames>

Optional section to redefine weekday names. If present these will be used instead of the weekday format specified above.

<WeekDay>value</WeekDay>

The name of each day to be used with the [WEEKDAY] tag in the url. The first day is Sunday by default, but this can be shifted by setting startOffset. Increasing startOffset shifts the days backwards. E.g. when startOffset=1, the first day is Saturday.

</WeekDayNames>

End of weekday names section.

</Search>

End of Search
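
For a site that uses its own weekday names in its URLs, a Search section could be sketched as follows. The Spanish day names are only an example of "text as the site expects it"; the order shown assumes startOffset is 0, so the first entry stands for Sunday.

```xml
<Search startOffset="0">
  <WeekDayNames>
    <!-- Seven names in order; with startOffset="0" the first one is Sunday -->
    <WeekDay>domingo</WeekDay>
    <WeekDay>lunes</WeekDay>
    <WeekDay>martes</WeekDay>
    <WeekDay>miercoles</WeekDay>
    <WeekDay>jueves</WeekDay>
    <WeekDay>viernes</WeekDay>
    <WeekDay>sabado</WeekDay>
  </WeekDayNames>
</Search>
```

With this in place, a [WEEKDAY] tag in the url would be replaced by one of these names instead of a name produced from the language and weekday attributes.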

<Html>

Must match listing type.

<Template name="" start="" end="">

Attribute | Value(s) | Description
Required
name | string | The template name; must be a default template.
Optional
start | search string | String to search for the start of the listing area.
end | search string | String to search for the end of the listing area.
<SectionTemplate tags="">

Attribute | Value(s) | Description
Required
tags | HTML tag letters | The first letter of each HTML tag to be used for matching. Letters must be in upper case. Multiple tags are given in one string, e.g. "TSD".

Some common tags:

Letter | Tag(s)
T | All table tags: <table>, <tr>, <td>, <th>, etc.
D | <div>
S | <span>
P | <p>
H | <h1>, <h2>, etc.
I | <img>
A | <a>

Although the first letter is not unique for every different HTML tag, it is generally good enough to build a unique template for finding data on the page.

<TemplateText>

The template is the HTML tags and data fields that make up the program listing. It can be made up of any HTML tags; however, ONLY those listed in the tags attribute of the SectionTemplate will be used for matching. The others will be ignored. Only the first letters of tags are used for matching! For example, the template <SPAN class="class1"> will match any <SPAN> tag, not only those with class="class1". However, it is useful to write more self-descriptive template text, not just the shortest possible.

Within the template, special tags are used by WebEPG to locate the required data.

Tag | Description
Required
#START or #STARTXMLTV | Program start time
  Possible START time formats:
  * hh:MM am/pm
  * HH:MM
  * HH.MM
  * HHhMM
  STARTXMLTV format: 20080113011500
#TITLE | Program title
Optional
#END | Program end time
#ENDXMLTV | Program end time in XMLTV format
#DESCRIPTION | Program description text
#DAY | Program day (required if not part of the page lookup)
#MONTH |
#SUBTITLE | Program subtitle or series episode name
#GENRE | Program genre
#EPISODE | Episode number
#SEASON | Season number
#ACTORS | Actors
</TemplateText>

End of the TemplateText
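
As a rough sketch, a template for a listing laid out as one table row per program might look like the block below. Several things here are assumptions rather than facts from this page: the name "default", the start and end strings, the row layout, and the choice to write the HTML inside TemplateText with escaped entities (&lt; and &gt;) so that the grabber file stays well-formed XML. The data tags are written in the <#FIELD> form used elsewhere on this page.

```xml
<Template name="default" start="Programme schedule" end="End of schedule">
  <!-- tags="T": only table tags (table, tr, td, th) take part in matching -->
  <SectionTemplate tags="T">
    <TemplateText>
      &lt;tr&gt;
        &lt;td&gt;&lt;#START&gt;&lt;/td&gt;
        &lt;td&gt;&lt;#TITLE&gt;&lt;/td&gt;
        &lt;td&gt;&lt;#DESCRIPTION&gt;&lt;/td&gt;
      &lt;/tr&gt;
    </TemplateText>
  </SectionTemplate>
</Template>
```

Each row of the page between the start and end strings that has three cells would then supply a start time, a title and a description for one program.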

<MatchList>
<Match field="" match="" />

Attribute | Value(s) | Description
Required
field | #FIELD | The field to store the data
match | match string | String to match
</MatchList>

End of MatchList

</SectionTemplate>

End of the SectionTemplate

</Template>

End of the Template

<DataPreference>
<Preference template="" title="" subtitle="" genre="" description="" />

Attribute | Value(s) | Description
Required
template | template name | The name of the template
title | 0-3 | Preference of this value
subtitle | 0-3 | Preference of this value
genre | 0-3 | Preference of this value
description | 0-3 | Preference of this value
</DataPreference>
<Sublinks>

Sublinks are linked pages that contain extra data, that may not be provided on the main listing page. Optional.

<Sublink search="" template="">

Attribute | Value(s) | Description
Required
search | search string | String to identify the correct <A Href> tag for this sublink
template | template name | Name of the template to use for this sublink. Must match a template name.
<Link url="" post="" external="" encoding="" user-agent=""/>

Optional; only required if the URL is different from (cannot be built from) the main site URL. (See Site URL for details.)

1.2.0 beta: You can specify an optional User-Agent string to be used in each HTTP request using the user-agent attribute. If not specified, the default will be used. Note that a user-agent specified in <Site> will NOT be propagated here.

For Javascript URLs.

Example needed.

</Sublink>

End of this Sublink
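
A minimal Sublinks section could be sketched like this. The search string "details" and the template name "Details" are hypothetical; the sketch assumes a template with that name is defined elsewhere in the Html section and that the optional Link element is not needed because the sublink URL can be taken from the matching <A Href> tag.

```xml
<Sublinks>
  <!-- search: text identifying the link to follow; template: must match a template name -->
  <Sublink search="details" template="Details">
    <!-- No Link element: it is optional and is left out in this sketch -->
  </Sublink>
</Sublinks>
```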

</Sublinks>

End of the Sublinks section

<Searches>
<Search match="" field="" remove="" />

Attribute | Value(s) | Description
Required
match | regex search | Regex to find data
field | #Field name | Name of the field used to store the data
remove | true/false | Remove data from store. Stops data being added to other fields.

This command searches the whole section of the source page that matches the template (all tags, their attributes and values). It finds the text matching the given regular expression and stores it in the given field. If remove is set to true, the whole matched text is cut out of the source page, so it will not be part of the output from template parsing. This can also be used to remove undesired parts of descriptions, titles, etc. More than one search can be used; however, only the latest match will be used.

Same fields as for TemplateText are allowed.

Example:

<Search match="\([0-9]{1,3}[,][0-9]{0,3}\)" field="#EPISODE" remove="true" />
<Search match="\([0-9]{1,3}\)" field="#EPISODE" remove="true" />
<Search match="\([0-9]{1,3}[/][0-9]{0,3}\)" field="#EPISODE" remove="true" />

Together, these searches find an episode number in any of the forms (N), (N1,N2) or (N/Count). The matched episode number is removed from the source.

</Searches>

End of Searches section

<DateTime>
<Month>value</Month>

Used for matching the <#MONTH> tag in a template. Only required if the <#MONTH> tag is used in a template.

Value is the text as found on the site. There must be 12 months in the correct order (Jan-Dec).

</DateTime>

End of DateTime
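
For instance, if a site prints abbreviated Spanish month names, the DateTime section might be filled in as below. The abbreviations are only an illustration; the values must be copied exactly as they appear on the site being grabbed.

```xml
<DateTime>
  <!-- Twelve entries, January first, December last -->
  <Month>ene</Month>
  <Month>feb</Month>
  <Month>mar</Month>
  <Month>abr</Month>
  <Month>may</Month>
  <Month>jun</Month>
  <Month>jul</Month>
  <Month>ago</Month>
  <Month>sep</Month>
  <Month>oct</Month>
  <Month>nov</Month>
  <Month>dic</Month>
</DateTime>
```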

</Html>

End of the Html section

<Xml>

Must match listing type.

<Data>

Must match listing type.

</Listing>

End of Listing section

<Actions>

<Modify channel="" field="" search="" action="">value</Modify>

Attribute | Value(s) | Description
Required
channel | * or channel id | The channel on which the modify will be performed (* = all channels)
field | field to modify |
search | search string |
action | Replace/Remove |
value | string to replace | Only required for the Replace action.

</Actions>

End of the Actions section
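
The two actions can be sketched as follows. The channel id, the search strings and the replacement text are invented for this example, and writing the field as "#TITLE" or "#DESCRIPTION" assumes that fields are named here in the same way as in TemplateText.

```xml
<Actions>
  <!-- Remove: no value is needed; applied to all channels -->
  <Modify channel="*" field="#TITLE" search="Repeat: " action="Remove" />
  <!-- Replace: the element text is the replacement; applied to one hypothetical channel -->
  <Modify channel="channel-one@example.com" field="#DESCRIPTION" search="N/A" action="Replace">No description available</Modify>
</Actions>
```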

</Grabber>

End of the grabber config
