Summaries

Web Crawling

14 November 2019, 15:30 Bruno Emanuel Da Graça Martins

  • Motivation and taxonomy of crawlers
  • Basic crawlers and implementation issues
    • BFS vs. DFS traversal
    • Fetching the contents
    • Parsing HTML and other formats
    • Relative vs. absolute URLs
    • URL canonicalization
    • Avoiding spider traps
    • The page repository
    • Concurrent crawlers
  • Universal crawlers
    • Performance and scalability
    • Crawling policies
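
As a rough illustration of several of the topics above (BFS traversal, resolving relative URLs, URL canonicalization, and a depth limit as one simple guard against spider traps), here is a minimal sketch. It is not the lecture's own code. The names `canonicalize`, `crawl`, `get_links`, `max_pages`, and `max_depth` are hypothetical, and the "web" is a small in-memory dictionary rather than real HTTP fetching. The canonicalization rules shown (lowercase scheme and host, drop default ports and fragments, empty path becomes `/`) are only a common subset.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def canonicalize(base_url, href):
    """Resolve href against base_url, then apply a few common normalizations."""
    parts = urlsplit(urljoin(base_url, href))
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # note: this drops any user:password part
    port = parts.port
    # Keep the port only when it is not the default one for the scheme.
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is dropped: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))

def crawl(seed, get_links, max_pages=100, max_depth=5):
    """Breadth-first crawl; get_links(url) returns the raw hrefs found on a page."""
    start = canonicalize(seed, "")
    frontier = deque([(start, 0)])
    seen = {start}
    order = []
    while frontier and len(order) < max_pages:
        url, depth = frontier.popleft()  # frontier.pop() here would give DFS instead
        order.append(url)
        if depth == max_depth:
            continue  # do not expand further: crude protection against spider traps
        for href in get_links(url):
            link = canonicalize(url, href)
            if link not in seen:
                seen.add(link)
                frontier.append((link, depth + 1))
    return order

# Toy link graph standing in for fetched and parsed pages (assumed data).
toy_web = {
    "http://example.com/": ["a.html", "/b.html"],
    "http://example.com/a.html": ["b.html#x", "c.html"],
    "http://example.com/b.html": ["HTTP://EXAMPLE.COM:80/"],
    "http://example.com/c.html": [],
}
visit_order = crawl("http://example.com", lambda url: toy_web.get(url, []))
```

In this sketch the `seen` set is keyed on canonical URLs, so `b.html#x` and `HTTP://EXAMPLE.COM:80/` are recognized as pages already queued rather than crawled twice.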


Learning to Rank

14 November 2019, 14:00 João Miguel Cordeiro Monteiro

  • Python exercises on Learning to Rank
  • Pointwise L2R approaches using Logistic Regression
  • Pen and paper exercise using the perceptron ranking algorithm
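
For orientation, a pointwise approach with logistic regression might look like the sketch below. It is an assumption-laden illustration, not the exercise material: the two features, the training data, and the function names `train_logistic` and `rank` are all hypothetical, and the model is trained with plain stochastic gradient descent in pure Python rather than with any particular library.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with SGD; each (query, document) pair is one example."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of relevance
            err = p - yi                     # gradient of the log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def rank(docs, w, b):
    """Sort document ids by decreasing model score (pointwise ranking)."""
    def score(x):
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return sorted(docs, key=lambda d: score(docs[d]), reverse=True)

# Hypothetical features per document: [text-match score, link-based score].
X_train = [[0.9, 0.8], [0.8, 0.3], [0.2, 0.7], [0.1, 0.1]]
y_train = [1, 1, 0, 0]  # 1 = relevant, 0 = not relevant
w, b = train_logistic(X_train, y_train)

candidates = {"d1": [0.85, 0.5], "d2": [0.15, 0.5], "d3": [0.5, 0.5]}
ranking = rank(candidates, w, b)
```

The point of the pointwise formulation is that each document is scored independently of the others, and the ranking is simply the sort order of those scores.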


Web Retrieval and Link Analysis (cont.)

8 November 2019, 12:30 Bruno Emanuel Da Graça Martins

  • The HITS Algorithm
  • Web Spamming
NOTE: Class taught by Dr. Miguel Won.
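
As a reminder of how HITS works, the sketch below iterates the usual mutually recursive updates: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with normalization after each round. The function name `hits`, the fixed iteration count, and the tiny example graph are assumptions made for illustration.

```python
def hits(graph, iterations=50):
    """graph maps each node to the list of nodes it links to."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages linking in.
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        # Hub update: sum the authority scores of all pages linked to.
        hub = {n: sum(auth[v] for v in graph.get(n, [])) for n in nodes}
        # Normalize both vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = sum(s * s for s in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Pages "a" and "b" both link to "c" (assumed toy graph).
hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

On this toy graph "c" ends up as the only authority, while "a" and "b" share the hub role equally.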

