ABSTRACT
We estimate that nearly one third of news articles contain references to future events. While this information can prove crucial to understanding news stories and how events will develop for a given topic, there is currently no easy way to access it. We propose a new task, which we call ranking related news predictions, to address the problem of retrieving and ranking sentences that contain mentions of future events. In this paper, we formally define this task and propose a learning-to-rank approach based on four classes of features: term similarity, entity-based similarity, topic similarity, and temporal similarity. Through extensive evaluations using a corpus of 1.8 million news articles and 6,000 manually judged relevance pairs, we show that our approach is able to retrieve a significant number of relevant predictions related to a given topic.
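To make the idea of learning to rank over the four feature classes concrete, here is a minimal sketch. It is not the paper's method: it assumes each candidate sentence already comes with one precomputed similarity score per class, and it swaps in a simple pairwise perceptron as the learner. The names `Candidate`, `train_pairwise`, and `rank`, as well as all feature values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# The four feature classes named in the abstract; one score per class (assumed).
FEATURES = ("term", "entity", "topic", "temporal")


@dataclass
class Candidate:
    sentence: str
    features: Tuple[float, float, float, float]  # same order as FEATURES


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear scoring function: weighted sum of the similarity features."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(
    pairs: List[Tuple[Sequence[float], Sequence[float]]],
    epochs: int = 10,
    lr: float = 0.1,
) -> List[float]:
    """Pairwise perceptron (an assumption, not the paper's learner).

    Each pair is (features of the more relevant sentence,
    features of the less relevant sentence).
    """
    weights = [0.0] * len(FEATURES)
    for _ in range(epochs):
        for better, worse in pairs:
            # Update only when the pair is ordered incorrectly (or tied).
            if score(weights, better) <= score(weights, worse):
                weights = [
                    w + lr * (b - c) for w, b, c in zip(weights, better, worse)
                ]
    return weights


def rank(weights: Sequence[float], candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidate predictions by descending model score."""
    return sorted(candidates, key=lambda c: score(weights, c.features), reverse=True)


# Toy training pairs with made-up feature values.
pairs = [
    ((0.8, 0.6, 0.7, 0.9), (0.2, 0.1, 0.3, 0.1)),
    ((0.5, 0.9, 0.4, 0.8), (0.6, 0.2, 0.3, 0.0)),
]
weights = train_pairwise(pairs)

candidates = [
    Candidate("Unrelated sentence.", (0.2, 0.1, 0.3, 0.1)),
    Candidate("Prediction about the queried topic.", (0.8, 0.6, 0.7, 0.9)),
]
ranked = rank(weights, candidates)
```

In this toy setup the learner only adjusts the weights when a judged pair is ordered incorrectly, so the resulting weights reflect which feature classes separate relevant from non-relevant sentences in the training pairs.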