Summaries
Web Crawling
14 November 2019, 15:30 • Bruno Emanuel Da Graça Martins
- Motivation and taxonomy of crawlers
- Basic crawlers and implementation issues
- BFS vs. DFS traversal
- Fetching the contents
- Parsing HTML and other formats
- Relative vs. absolute URLs
- URL canonicalization
- Avoiding spider traps
- The page repository
- Concurrent crawlers
- Universal crawlers
- Performance and scalability
- Crawling policies
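
Several of the topics above (BFS traversal, relative vs. absolute URLs, URL canonicalization) can be illustrated together in a minimal sketch. It is not taken from the lecture: the names `canonicalize`, `bfs_crawl`, and the `fetch_links` callback are hypothetical, and the normalization rules shown (lowercase host, drop default port and fragment, empty path becomes "/") are only a common subset of what a real crawler would apply. Fetching is abstracted behind the callback so that the example runs without network access.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def canonicalize(url):
    """Return a normalized form of an absolute URL (illustrative rules only).

    Lowercases scheme and host, drops the default port and the fragment,
    and turns an empty path into "/". Any userinfo part is discarded.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))

def bfs_crawl(seed, fetch_links, max_pages=100):
    """Visit pages breadth-first, starting from `seed`.

    `fetch_links(url)` is a hypothetical callback returning the raw href
    values found on the page at `url` (relative or absolute).
    """
    seed = canonicalize(seed)
    frontier = deque([seed])   # FIFO queue gives BFS; a stack would give DFS
    seen = {seed}              # canonical URLs already queued
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for href in fetch_links(url):
            # Resolve relative links against the current page, then normalize.
            link = canonicalize(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order
```

The `seen` set of canonical URLs together with the `max_pages` cap is also the simplest guard against revisiting pages; it does not by itself protect against spider traps that generate endless distinct URLs.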
Learning to Rank
14 November 2019, 14:00 • João Miguel Cordeiro Monteiro
- Python exercises on Learning to Rank
- Pointwise L2R approaches using Logistic Regression
- Pen and paper exercise using the perceptron ranking algorithm
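
As a rough illustration of the pointwise approach mentioned above, the sketch below trains a logistic regression model on binary relevance labels and ranks documents by predicted relevance probability. It is an assumption-laden toy, not the exercise from the class: the function names, the two features (a retrieval score and a title-match flag), and the training data are all invented for illustration, and plain batch gradient descent stands in for whatever solver the exercises used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(x, w, b):
    """Predicted probability that the (query, document) pair is relevant."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights by batch gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = score(xi, w, b) - yi      # derivative of the loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rank(docs, w, b):
    """Sort (doc_id, features) pairs by decreasing predicted relevance."""
    scored = sorted(((score(x, w, b), doc_id) for doc_id, x in docs), reverse=True)
    return [doc_id for _, doc_id in scored]

# Invented training data: [retrieval score, title-match flag] -> relevant (1) or not (0)
X = [[0.9, 1], [0.8, 0], [0.2, 0], [0.1, 1], [0.7, 1], [0.3, 0]]
y = [1, 1, 0, 0, 1, 0]
w, b = train_logistic(X, y)
ranking = rank([("d1", [0.1, 0]), ("d2", [0.9, 0]), ("d3", [0.5, 0])], w, b)
```

The pointwise formulation treats each document independently, so the ranking is just a sort by the classifier's output; pairwise and listwise methods instead optimize over document pairs or whole result lists.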
Web Retrieval and Link Analysis (cont.)
8 November 2019, 12:30 • Bruno Emanuel Da Graça Martins
- The HITS Algorithm
- Web Spamming
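
The HITS algorithm listed above can be sketched as a short iterative computation: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with both vectors normalized each round. The code below is a minimal sketch under assumptions: the function name `hits`, the adjacency-dict input format, the fixed iteration count, and the choice of L2 normalization are illustrative and not taken from the lecture.

```python
def hits(graph, iterations=50):
    """Compute hub and authority scores.

    `graph` maps each node to the list of nodes it links to.
    Returns two dicts: (hub, authority).
    """
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum hub scores over incoming links.
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        # Hub update: sum the new authority scores over outgoing links.
        hub = {n: sum(auth[v] for v in graph.get(n, [])) for n in nodes}
        # Normalize both vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = sum(s * s for s in scores.values()) ** 0.5 or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth

# Tiny example: pages "a" and "b" both link to "c".
hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

In this example "c" ends up as the only authority, while "a" and "b" share equal hub scores.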