Summaries
Web Crawling + Similarity Search
19 November 2018, 17:00 • Bruno Emanuel Da Graça Martins
- Preferential Crawlers
- PageRank to Prioritize Crawls
- Focused Crawlers and Context-Focused Crawlers
- Topical Crawlers
- Adaptive Crawlers
- Crawler Ethics, Conflicts and the Robot Exclusion Protocol
- Introduction to Similarity Search and Locality Sensitive Hashing
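As an illustration of the Robot Exclusion Protocol topic, here is a minimal sketch of how a polite crawler might check a URL before fetching it. It uses Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleBot` user agent, and the `example.org` URLs are assumptions made up for the example, not material from the lecture.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is an assumed user-agent name; it falls under the "*" rules.
allowed = parser.can_fetch("ExampleBot", "https://example.org/index.html")
blocked = parser.can_fetch("ExampleBot", "https://example.org/private/data.html")
delay = parser.crawl_delay("ExampleBot")

print(allowed, blocked, delay)
```

A real crawler would typically fetch `/robots.txt` once per host, cache the parsed rules, and wait at least the crawl delay between requests to that host.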
Lab 06: Link Analysis
16 November 2018, 11:00 • Danielle Caled Vieira
Pen and Paper Exercises:
- PageRank
- HITS algorithm
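To check pen-and-paper PageRank results, a small power-iteration sketch such as the following may help. The three-page graph, the damping factor of 0.85, and the function name `pagerank` are assumptions chosen for illustration; the exercise sheet may use a different graph or formulation.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """Power iteration over a dict mapping each page to its out-links.

    Assumes every link target also appears as a key in `links`.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[u] for u in nodes if not links[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in links[u]:
                new[v] += damping * rank[u] / len(links[u])
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(graph)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

In this graph C ends up ranked highest, followed by A and then B, and the scores sum to 1.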
Lab 06: Link Analysis
16 November 2018, 09:30 • Danielle Caled Vieira
Pen and Paper Exercises:
- PageRank
- HITS algorithm
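For the HITS exercises, the hub and authority updates can be sketched as below. The tiny graph, the fixed iteration count, and the L2 normalization are assumptions for illustration; other normalizations (for example, dividing by the sum) are also common.

```python
import math


def hits(links, iterations=50):
    """Iterate HITS hub/authority updates with L2 normalization.

    `links` maps each page to its out-links; every target must also be a key.
    """
    nodes = sorted(links)
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {u: 0.0 for u in nodes}
        for u in nodes:
            for v in links[u]:
                auth[v] += hub[u]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {u: a / norm for u, a in auth.items()}
        # Hub score: sum of authority scores of the pages linked to.
        hub = {u: sum(auth[v] for v in links[u]) for u in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth


# Hypothetical graph: A and B both link to C; C has no out-links.
web = {"A": ["C"], "B": ["C"], "C": []}
hub_scores, auth_scores = hits(web)
print(hub_scores)
print(auth_scores)
```

Here C is the only authority, while A and B share the hub score equally.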
Web Crawling
16 November 2018, 08:00 • Bruno Emanuel Da Graça Martins
- Motivation and Taxonomy of Crawlers
- Basic Crawlers and Implementation Issues
- Content Fetching
- Parsing HTML and Other Formats
- Relative vs. Absolute URLs and URL Canonicalization
- Spider Traps
- Implementing the Local Page Repository
- Concurrency
- Universal Crawlers
- Performance and Scalability
- Coverage vs. Freshness
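The topic of relative vs. absolute URLs and canonicalization can be illustrated with a minimal sketch based on Python's standard `urllib.parse`. The helper name `canonicalize` and the particular rules applied (lowercase scheme and host, drop default ports and fragments, use `/` for an empty path) are assumptions; real crawlers usually apply a longer, site-aware rule set.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(base_url: str, href: str) -> str:
    """Resolve `href` against `base_url` and normalize the result.

    Note: this sketch discards any user:password part of the URL.
    """
    absolute = urljoin(base_url, href)      # relative -> absolute
    parts = urlsplit(absolute)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()   # hostname is case-insensitive
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host                       # drop the default port
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"                # empty path means the root
    # The fragment is never sent to the server, so it is dropped.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


print(canonicalize("HTTP://Example.ORG:80/a/b.html", "../c.html#top"))
print(canonicalize("https://example.org", ""))
```

Storing only canonical forms in the frontier and the page repository keeps the crawler from fetching the same page under several spellings of its URL.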
Link Analysis and Web Retrieval
15 November 2018, 15:30 • João Miguel Cordeiro Monteiro
- Python exercises on PageRank and Web Information Retrieval
- Pen and paper exercises on the PageRank and HITS algorithms
- Support for the Course Project