Summaries

Web Crawling + Similarity Search

19 November 2018, 17:00 Bruno Emanuel Da Graça Martins

  • Preferential Crawlers
    • PageRank to Prioritize Crawls
    • Focused Crawlers and Context-Focused Crawlers
    • Topical Crawlers
    • Adaptive Crawlers
  • Crawler Ethics, Conflicts and the Robot Exclusion Protocol
  • Introduction to Similarity Search and Locality Sensitive Hashing
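
As an illustration of the Robot Exclusion Protocol topic listed above, the following is a minimal sketch of how a crawler might check a URL against robots.txt rules before fetching it. It uses Python's standard `urllib.robotparser`; the rules, the `ExampleBot` agent name, and the `example.com` URLs are made up for the example and do not come from the lecture.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, supplied inline so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the parsed rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.com/public/page.html"))
print(allowed("http://example.com/private/page.html"))
```

A real crawler would normally download each host's robots.txt (for example with `set_url` and `read`) and cache the parsed rules per host rather than hard-coding them.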


Lab 06: Link Analysis

16 November 2018, 11:00 Danielle Caled Vieira

Pen and Paper Exercises:

  • PageRank
  • HITS algorithm
Support for the Course Project


Lab 06: Link Analysis

16 November 2018, 09:30 Danielle Caled Vieira

Pen and Paper Exercises:

  • PageRank
  • HITS algorithm
Support for the Course Project


Web Crawling

16 November 2018, 08:00 Bruno Emanuel Da Graça Martins

  • Motivation and Taxonomy of Crawlers 
  • Basic Crawlers and Implementation Issues
    • Content Fetching
    • Parsing HTML and Other Formats
    • Relative vs. Absolute URLs and URL Canonicalization
    • Spider Traps
    • Implementing the Local Page Repository
    • Concurrency
  • Universal Crawlers
    • Performance and Scalability
    • Coverage vs. Freshness


Link Analysis and Web Retrieval

15 November 2018, 15:30 João Miguel Cordeiro Monteiro

  • Python exercises on PageRank and Web Information Retrieval
  • Pen and paper exercises using the PageRank and HITS algorithms
  • Support for the Course Project
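
For readers who want a feel for the Python PageRank exercises mentioned above, the following is a minimal power-iteration sketch. It is not the lab's actual solution: the function name, the toy graph, the damping factor of 0.85, and the uniform treatment of dangling pages are all assumptions made for illustration, and every link target is assumed to appear as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Approximate PageRank for `graph`, a dict mapping page -> list of linked pages."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread uniformly over all pages.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each page passes its rank in equal shares to the pages it links to.
        for v in nodes:
            for w in graph[v]:
                new_rank[w] += damping * rank[v] / len(graph[v])
        rank = new_rank
    return rank


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(toy_graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, page C collects rank from both A and B, so it ends up ranked above B.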