Reuter, Timo: Event-based stream classification framework – a supervised clustering approach for social media applications. 2015
Contents
Title Page
Abstract
Table of Contents
List of Figures
List of Tables
I Introduction
1 Introduction
1.1 Motivating Use Cases
1.2 Goal and Challenges
1.2.1 Clustering of Large Datasets
1.2.2 Clustering of Continuous Data Streams
1.2.3 Classifying of Concept Drifting Time Series Data
1.2.4 Noisy Data
1.3 Research Contributions of this Dissertation
1.4 Structure and Outline of this Thesis
2 Fundamentals of This Work
2.1 From the Categorization Idea to Event Clustering: Definition and Development
2.1.1 Categorization in Philosophy — The Classical View
2.1.2 Categorization in Cognitive Psychology — The Prototype View
2.1.3 Event Clustering Characterization
2.2 Characterization of an Event
2.2.1 Event Definition in Philosophy
2.2.2 Event Definition in Cognition and Psychology
2.2.3 Events in Recent Literature of Machine Learning and Information Retrieval
2.2.4 Discussion and Definition
3 Foundations and Related Work
3.1 Classification
3.2 Clustering
3.3 Distance Functions
3.4 Knowledge-based Clustering
3.5 Large-scale Processing and Scalability
3.5.1 Task-based Techniques
3.5.2 Data-based Techniques
3.5.3 Candidate Retrieval
3.5.4 Stream Data
3.6 New Event Detection
3.6.1 Statistical Approaches
3.6.2 Unsupervised Approaches
3.6.3 Supervised Approaches
3.7 Event Identification and Detection
4 Event Clustering Dataset
4.1 Creation and Collection of the Dataset
4.1.1 Fetching of Metadata
4.1.2 Fetching of Uploader Information
4.1.3 Fetching of Picture Files
4.2 Labeling of the Data — Creation of the Gold Standard
4.2.1 Usage of Social Event Calendars for Data Labeling
4.2.2 Fetching of Event Information from Upcoming and Last.fm
4.2.3 Labeling Process
4.3 Dataset Statistics
4.3.1 Data Quality
4.3.2 License Constraints
4.3.3 Data Point Distribution
4.3.4 Dataset Representation Format and Schema
4.4 Applications of the Dataset
4.4.1 MediaEval 2013
4.4.2 Further Applications
4.4.3 Evaluation Proposal for Comparability
II Supervised Single-Pass Clustering with the Event-based Stream Classification Framework
5 System Description of the Stream Classification Framework for a Single-Pass Setting
5.1 Problem Statement
5.2 Overview of the Clustering Framework
5.3 Candidate Retrieval Strategies
5.3.1 Measurements for Performance and Effectiveness
5.3.2 Candidate Retrieval Strategies
5.4 Pairwise Feature Extraction
5.4.1 Temporal Features
5.4.2 Geographical Features
5.4.3 Textual Features
5.4.4 Document-Event Similarity Vector
5.5 Scoring and Ranking — Learning Similarity Functions
5.5.1 Problem Formulation using a Support Vector Machine
5.5.2 Problem Formulation as a Decision Tree Classification Problem
5.6 New Event Detection
6 Experimental Setup and Results of the Supervised Single-Pass Classification
6.1 Definition of Evaluation Measures
6.2 Optimizing Candidate Retrieval
6.2.1 Experimental Settings
6.2.2 Results
6.2.3 Conclusion
6.3 Learning Similarity Functions
6.3.1 Experimental Settings
6.3.2 Results
6.3.3 Conclusion
6.4 New Event Detection
6.4.1 Experimental Settings
6.4.2 Results
6.4.3 Conclusion
6.5 Framework as a Whole — Results and Comparison
6.5.1 Training and Optimization of the System Parts
6.5.2 Baselines
6.5.3 Overall System Performance
6.6 Conclusions
III Multi-pass Stream Clustering
7 System Description of the Stream Classification Framework for a Multi-Pass Setting
7.1 Problem Statement
7.2 System Overview
7.3 Multi-pass Requirements and Challenges
7.3.1 Number of Passes
7.3.2 Influence on Framework Settings
7.4 Multi-pass Strategies
8 Experimental Setup and Results of Supervised Multi-Pass Clustering
8.1 Analysis of First-Pass Strategies
8.2 Gold Standard Preparation for the Second Pass
8.2.1 Quality Issues in the Preparation Process
8.2.2 Creation of the Gold Standard for the Second Pass
8.3 Optimization of the Classification Framework Steps for the Second Pass
8.3.1 Candidate Retrieval
8.3.2 Features for Similarity Function Learning and New Event Detection
8.4 Clustering Framework in Two-Pass Mode — Optimization
8.4.1 Exhaustive Search for Optimal Features in Scoring, Ranking, and New Event Detection
8.4.2 Results of the Exhaustive Search
8.4.3 Optimization of Candidate Retrieval Strategy
8.5 Results of the Clustering Framework used in Two-Pass Mode
8.6 Conclusions
IV Concluding Remarks
9 Remarks and Comparison of Clustering Approaches
9.1 Prerequisites for Event Clustering
9.2 Reflection on Multi-Pass Clustering in a Stream-based Setting
9.3 Comparison with Other Approaches
10 Conclusion
V Appendix
Glossary
Acronyms
Bibliography