ABSTRACT
Isolated silos of scientific research and the growing challenge of information overload limit awareness across the literature and hinder innovation. Algorithmic curation and recommendation, which often prioritize relevance, can further reinforce these informational “filter bubbles.” In response, we describe Bridger, a system for facilitating discovery of scholars and their work. We construct a faceted representation of authors with information gleaned from their papers and inferred author personas, and use it to develop an approach that locates commonalities and contrasts between scientists to balance relevance and novelty. In studies with computer science researchers, this approach helps users discover authors considered useful for generating novel research directions. We also demonstrate an approach for displaying information about authors, boosting the ability to understand the work of new, unfamiliar scholars. Our analysis reveals that Bridger connects authors who have different citation profiles and publish in different venues, raising the prospect of bridging diverse scientific communities.
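The idea of balancing relevance and novelty through commonalities and contrasts can be illustrated with a minimal sketch. It is not the system's actual method: the facet names (`methods`, `tasks`), the vector representation, and the scoring rule (similarity on one facet minus similarity on another) are all assumptions made for illustration. Under those assumptions, a candidate author ranks highly when they resemble the query author on one facet while differing on the other.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_bridging_authors(query, candidates, similar_facet, different_facet):
    """Rank candidate authors by (similarity on one facet) - (similarity on another).

    `query` maps facet name -> vector; `candidates` maps author name -> such a dict.
    The facet names and this scoring rule are hypothetical, chosen only to
    illustrate trading off commonality against contrast.
    """
    scored = []
    for name, facets in candidates.items():
        sim = cosine(query[similar_facet], facets[similar_facet])
        diff = cosine(query[different_facet], facets[different_facet])
        scored.append((sim - diff, name))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


if __name__ == "__main__":
    # Toy two-dimensional facet vectors, invented for this example.
    query = {"methods": [1.0, 0.0], "tasks": [1.0, 0.0]}
    candidates = {
        "author_a": {"methods": [1.0, 0.0], "tasks": [1.0, 0.0]},  # same on both facets
        "author_b": {"methods": [1.0, 0.0], "tasks": [0.0, 1.0]},  # same methods, new tasks
        "author_c": {"methods": [0.0, 1.0], "tasks": [0.0, 1.0]},  # different on both facets
    }
    print(rank_bridging_authors(query, candidates, "methods", "tasks"))
```

In this toy setup the author who shares methods but works on different tasks is ranked first, while the near-duplicate and the unrelated author both score lower. A real system would need learned facet representations and a tuned trade-off rather than a plain difference.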
Index Terms
- Bursting Scientific Filter Bubbles: Boosting Innovation via Novel Author Discovery