searxng/searx/engines/vimeo.py

## Vimeo (Videos)
#
# @website     https://vimeo.com/
# @provide-api yes (http://developer.vimeo.com/api),
#              they have a maximum count of queries/hour
#
# @using-api   no (TODO, rewrite to api)
# @results     HTML (using search portal)
# @stable      no (HTML can change)
# @parse       url, title, publishedDate, thumbnail, embedded
#
# @todo        rewrite to api
# @todo        set content-parameter with correct data

from urllib import urlencode
from HTMLParser import HTMLParser
from lxml import html
from searx.engines.xpath import extract_text
from dateutil import parser

# engine dependent config
categories = ['videos']
paging = True

# search-url
base_url = 'https://vimeo.com'
search_url = base_url + '/search/page:{pageno}?{query}'

# specific xpath variables
url_xpath = './a/@href'
content_xpath = './a/img/@src'
title_xpath = './a/div[@class="data"]/p[@class="title"]/text()'
results_xpath = '//div[@id="browse_content"]/ol/li'
publishedDate_xpath = './/p[@class="meta"]//attribute::datetime'

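# iframe template for the embedded player; {videoid} is filled with the
# result's href, which is expected to start with '/'
embedded_url = '<iframe data-src="//player.vimeo.com/video{videoid}" ' +\
    'width="540" height="304" frameborder="0" ' +\
    'webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>'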


# do search-request
def request(query, params):
    params['url'] = search_url.format(pageno=params['pageno'],
                                      query=urlencode({'q': query}))

    # TODO: is this placeholder __utma (Google Analytics) cookie required?
    params['cookies']['__utma'] =\
        '00000000.000#0000000.0000000000.0000000000.0000000000.0'

    return params


# get response from search-request
def response(resp):
    results = []

    dom = html.fromstring(resp.text)

    p = HTMLParser()

    # parse results
    for result in dom.xpath(results_xpath):
        videoid = result.xpath(url_xpath)[0]
        url = base_url + videoid
        title = p.unescape(extract_text(result.xpath(title_xpath)))
        thumbnail = extract_text(result.xpath(content_xpath)[0])
        publishedDate = parser.parse(extract_text(
            result.xpath(publishedDate_xpath)[0]))
        embedded = embedded_url.format(videoid=videoid)

        # append result
        results.append({'url': url,
                        'title': title,
                        'content': '',
                        'template': 'videos.html',
                        'publishedDate': publishedDate,
                        'embedded': embedded,
                        'thumbnail': thumbnail})

    # return results
    return results