CIRSS research activities were very well represented at this year's American Society for Information Science and Technology (ASIS&T) annual meeting in New Orleans, LA, October 9-12, 2011. CIRSS faculty, affiliates, staff, and students presented five papers (including this year's Best Paper winner), three panels, two posters, and a workshop keynote address.
Disciplinary Reach: Investigating the Impact of Dataset Reuse in the Earth Sciences
In the realm of scholarly communication, scientific datasets are becoming more widely recognized for their scholarly and reuse value. However, despite the investment in maintaining and storing research data for long-term access, there is no clear strategy or metric for determining the reuse of research datasets. This study proposes a novel approach to tracking use and measuring the impact of publicly accessible datasets in scholarly publications through disciplinary reach: the number of unique journals, and of related subject categorizations, in which articles are published. Using affiliated publications, that is, the works a dataset's creator or curator identifies as related to the dataset, the principles underlying the bibliometric technique of citation analysis are leveraged and applied. Preliminary results show that affiliated publications appear primarily in physical science and multidisciplinary journals, indicating that these Earth science datasets may have an impact on a number of different research areas. Continued refinement of these approaches, measures, and the study design will broaden our understanding of the reuse potential of scientific data and their influence on advancing scholarship.
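The core measure can be sketched as a simple count over affiliated-publication records. The field names and sample records below are illustrative assumptions, not the study's actual schema or data:

```python
def disciplinary_reach(affiliated_pubs):
    """Compute a simple disciplinary-reach measure for one dataset:
    the number of unique journals, and of unique subject categories,
    across its affiliated publications."""
    journals = {pub["journal"] for pub in affiliated_pubs}
    categories = set()
    for pub in affiliated_pubs:
        categories.update(pub["subject_categories"])
    return {"unique_journals": len(journals),
            "unique_categories": len(categories)}

# Hypothetical affiliated-publication records for one dataset
pubs = [
    {"journal": "J. Geophys. Res.", "subject_categories": {"Geosciences"}},
    {"journal": "Nature", "subject_categories": {"Multidisciplinary"}},
    {"journal": "J. Geophys. Res.",
     "subject_categories": {"Geosciences", "Oceanography"}},
]
print(disciplinary_reach(pubs))  # {'unique_journals': 2, 'unique_categories': 3}
```

A dataset whose affiliated publications span many journals and categories would score high on reach, matching the abstract's intuition that such datasets influence multiple research areas.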
Best Paper Winner: Building Topic Models in a Federated Digital Library Through Selective Document Exclusion
Miles Efron, Peter Organisciak and Katrina Fenlon
Building topic models over federated digital collections presents numerous challenges due to metadata inconsistencies. The quality of topical metadata is difficult to ascertain, and it is often interspersed with irrelevant administrative metadata. In this study, we propose a way to improve topic modeling in large collections by identifying documents that convey only weak topical information. These documents are ignored when training topic models; their topical associations are instead inferred after model training. A method is outlined for identifying weakly topical documents by defining runs of similar documents in a collection. In a preliminary evaluation using a corpus from the Institute of Museum and Library Services Digital Collections and Content aggregation, results show an increase in coherence among words in topics. In showing this, we demonstrate that it may be beneficial to induce topic models from less, but higher-quality, data.
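The exclusion step can be illustrated with a toy sketch: flag documents that sit inside long runs of near-identical records, on the assumption that such runs carry boilerplate rather than topical signal. The paper defines runs via its own similarity model; the Jaccard measure, threshold, and sample records here are stand-in assumptions:

```python
def similarity(a, b):
    """Jaccard similarity between the token sets of two records."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def weakly_topical(docs, sim_threshold=0.7, min_run=3):
    """Return indices of documents inside runs of >= min_run
    consecutive near-identical records; these convey weak topical
    information and can be excluded from topic-model training."""
    exclude, run = set(), [0]
    for i in range(1, len(docs)):
        if similarity(docs[i - 1], docs[i]) >= sim_threshold:
            run.append(i)
        else:
            if len(run) >= min_run:
                exclude.update(run)
            run = [i]
    if len(run) >= min_run:
        exclude.update(run)
    return exclude

docs = [
    "scanned page image from atlas volume 1",
    "scanned page image from atlas volume 2",
    "scanned page image from atlas volume 3",
    "oral history interview about river flooding",
    "photograph of downtown storefronts 1920",
]
training_set = [d for i, d in enumerate(docs) if i not in weakly_topical(docs)]
```

Here the three near-duplicate atlas records are excluded, leaving the two topically distinctive records for training; in the paper's approach, the excluded documents' topical associations are then inferred after the model is trained.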
The Analytic Potential of Scientific Data: Understanding Re-use Value
Carole L. Palmer, Nicholas M. Weber and Melissa H. Cragin
While problems related to the curation and preservation of scientific data are receiving considerable attention from the information science and digital repository communities, relatively little progress has been made on approaches for evaluating the value of data to inform investment in acquisition, curation, and preservation. Adapting Birger Hjørland's concept of the "epistemological potential" of documents, we assert that analytic potential, or the value of data for analysis beyond its original use, should guide development of data collections for repositories aimed at supporting research. Three key aspects of the analytic potential of data are identified and discussed: potential user communities, preservation readiness, and fit for purpose. Based on evidence from research conducted through the Data Conservancy initiative, we demonstrate how the analytic potential of data can be determined and applied to build large-scale data collections suited for grand challenge science.
A Framework for Applying the Concept of Significant Properties to Datasets
Simone Sacchi, Karen M. Wickett, Allen H. Renear and David S. Dubin
The concept of significant properties, the properties that must be identified and preserved in any successful digital object preservation effort, is now common in data curation. Although this notion has clearly demonstrated its usefulness in cultural heritage domains, its application to the preservation of scientific datasets is not as well developed. One obstacle to this application is that the familiar preservation models are not sufficiently explicit to identify the relevant entities, properties, and relationships involved in dataset preservation. We present a logic-based formal framework of dataset concepts that provides the levels of abstraction necessary to identify significant properties and correctly assign them to their appropriate entities. A unique feature of this model is that it recognizes that a typed symbol structure is a requirement distinctive to datasets, but not to other information objects.
Are Collections Sets?
Karen M. Wickett, Allen H. Renear and Jonathan Furner
The concept of a collection plays key roles in library, museum, and archival practice, and is arguably fundamental to information organization systems in general. Locating collections concepts in a reasonably robust ontology should have a number of practical advantages, including revealing inferencing opportunities on the one hand, and supporting consistency and coherence in system design on the other. However, although practices involving collections have been studied empirically, there has been surprisingly little attention given to the formal analysis of the concept itself, or of related notions like collection membership. With this paper we hope to convene that discussion, beginning with the question: Are collections sets? We consider in detail the substantial arguments against collections being a kind of set, but recognize that at least one version of that claim, one based on considerations from Guarino and Welty's ontology evaluation rules, cannot be ruled out. We recognize, though, that ontology decisions, whether practical or theoretical, ultimately come down to weighing competing considerations rather than decisive formal arguments. Any conclusions must therefore await the development of alternative theories in subsequent papers. We invite the information science community to join us in this effort.
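One way to make the central tension precise: in axiomatic set theory, identity is fixed entirely by membership (extensionality), whereas curated collections appear to survive gains and losses of members over time. A hedged first-order sketch, using predicates of our own choosing rather than the paper's formalism:

```latex
% Extensionality: sets with the same members are identical.
\forall x\,\forall y\,\bigl(\mathit{Set}(x) \wedge \mathit{Set}(y)
  \rightarrow \bigl(x = y \leftrightarrow \forall z\,(z \in x \leftrightarrow z \in y)\bigr)\bigr)

% A collection can lose a member yet remain the same collection.
\exists c\,\exists d\,\exists t_1\,\exists t_2\,\bigl(\mathit{Collection}(c)
  \wedge \mathit{MemberAt}(d, c, t_1) \wedge \neg\mathit{MemberAt}(d, c, t_2)\bigr)
```

If the second claim holds under a straightforward reading of membership, collections cannot simply be sets; yet time-indexed reconstructions (a collection as a function from times to sets, say) remain open, which is why extensionality alone does not settle the question.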
Shaking it Up: Embracing New Methods for Publishing, Finding, Discussing, and Measuring Our Research Output
Alex Garnett, Kim Holmberg, Christina Pikas, Heather Piwowar, Jason Priem and Nicholas Weber
The scholarly communication ecosystem is changing. Scholars produce and publish a wider range of products than ever before, and scholars and others increasingly interact with these diverse products in new ways within the online ecosystem. The widespread availability of research products and interaction paths is informing new methods for finding, discussing, measuring, and rewarding diverse types of research output. Some research fields are adopting these new methods faster than supporting tools, processes, and policies can keep up. In other fields the changes are happening very slowly, perhaps at the expense of accelerated progress and impact. We have assembled a panel of information science researchers who both study and implement many of these new ways of doing research. Together with attendees of the session (you!), we will consider several new methods of scholarly communication, highlight some of their strengths and drawbacks, and discuss how they play out today in the field of information science.

The session will itself follow a nontraditional format. We will begin with an out-of-your-seat and into-the-action icebreaker to capture audience-driven opinions on several fundamental issues behind these changes. Panelists will then briefly highlight several of the new approaches, including motivation, evidence of benefit (or lack thereof), and how each new method does or could make a difference in information science research. We encourage audience members to document their thoughts on these points during the panelist presentations. Audience notes will be summarized in a poster within the Interactive Showcase later in the conference. We hope this panel will inspire conversation about the ways these new approaches may impact how we study scholarly communication, as well as how we participate in scholarly communication ourselves.
Sharing Data: Practices, Barriers, and Incentives
Carol Tenopir, Jeffrey van der Hoeven, Carole L. Palmer, Jim Malone and Priyanki Sinha
This panel brings together researchers who have conducted surveys on current data sharing practices and scientists' perceptions of them, addressing findings from the PARSE.Insight survey, the DataONE survey, Data Conservancy/University of Illinois and Purdue interviews, and a survey and interviews of scientists in the southeastern US conducted for the USGS. The panel analyzes the findings of these surveys and interviews and discusses the advantages of data sharing. It addresses the varying degrees of data sharing and data hoarding among respondents, touches on the concerns of those who are reluctant to share data, and considers the role the development of cyberinfrastructure will play in future data sharing. The surveys and in-depth interviews discussed in this panel will help information scientists and system designers understand scientists' current practices, barriers to data sharing, and future needs. Inculcating a culture of data sharing and curation requires first understanding the motivations and concerns of the scientists who collect and use research data.
The Janus Panels: Looking Back in Order to Look Forward
Robert Williams and Kathryn LaBarre
This panel session consists of two parts: "I remember ADI/ASIS/ASIS&T" and "What I want ASIS&T to be in 2037." In the first part, selected ASIS&T members who have held membership for at least 25 years will briefly talk about their favorite or most memorable moments in ASIS&T. Former presidents and major award winners meeting the 25-year membership criterion will be given precedence for the short presentations. The following members with 25+ years of membership have agreed to present in this part: Samantha Hastings, Trudi Hahn, Toni Carbo, Ralf Shaw, Michael Buckland, and Chuck Ben-Ami Lipetz. In the second part, selected ASIS&T members who have been members for less than 5 years will briefly tell us what they want ASIS&T to be like, to do, and to represent when it turns 100 in 2037.
WORKSHOP KEYNOTE ADDRESS:
Interdisciplinary information work: Concepts and practices
Keynote address for workshop Where Your World Meets Mine: Information Used Across Domains (SIG USE)
Semi-automated Collection Evaluation for Large-scale Aggregations
Katrina Fenlon, Peter Organisciak, Jacob Jett and Miles Efron
Library and museum digital collections are increasingly aggregated at various levels. Large-scale aggregations, often characterized by heterogeneous or messy metadata, pose unique and growing challenges to aggregation administrators, not only in facilitating end-user discovery and access, but in performing basic administrative and curatorial tasks in a scalable way, such as finding messy data and determining the overall topical landscape of the aggregation. This poster describes early findings on using statistical text analysis techniques to improve the scalability of an aggregation development workflow for a large-scale aggregation. These techniques hold great promise for automating historically labor-intensive evaluative aspects of aggregation development, and they form the basis for the development of an aggregator's dashboard. The aggregator's dashboard is planned as a statistical text-analysis-driven tool for supporting large-scale aggregation development and maintenance through multifaceted, automatic visualization of an aggregation's metadata quality and topical coverage. The dashboard will support principled yet scalable aggregation development.
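Two of the dashboard's tasks, surfacing messy records and summarizing topical coverage, can be sketched with simple statistics over metadata records. The required-field schema, thresholds, and sample records below are illustrative assumptions, not the project's actual design:

```python
from collections import Counter

REQUIRED_FIELDS = ("title", "subject", "description")  # assumed schema

def completeness(record):
    """Fraction of required fields that are present and non-empty,
    a crude signal for flagging messy records."""
    return sum(bool(record.get(f)) for f in REQUIRED_FIELDS) / len(REQUIRED_FIELDS)

def topical_landscape(records, top_n=3):
    """Most frequent subject terms across the aggregation, a rough
    proxy for its overall topical coverage."""
    counts = Counter(t for r in records for t in r.get("subject", []))
    return counts.most_common(top_n)

records = [
    {"title": "Prairie survey", "subject": ["agriculture"],
     "description": "field notes"},
    {"title": "", "subject": [], "description": ""},  # messy record
    {"title": "Farm ledger", "subject": ["agriculture", "economics"],
     "description": "1890s account book"},
]
flagged = [r for r in records if completeness(r) < 0.5]
```

A dashboard built on measures like these could visualize completeness distributions and top subject terms per contributing collection, letting administrators spot problem areas without record-by-record review.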
Shaken and Stirred: ASIS&T 2011 Attendee Reactions to Shaking It Up: Embracing New Methods for Publishing, Finding, Discussing and Measuring Our Research Output
Alex Garnett, University of Victoria; Kim Holmberg, Åbo Akademi University; Christina Pikas, University of Maryland; Heather Piwowar, National Evolutionary Synthesis Center; Jason Priem, University of North Carolina; and Nicholas Weber, University of Illinois
What does the Information Science community think about new, open methods for publishing, finding, discussing, and measuring our research output? This poster will summarize audience participation in, and reaction to, an ASIS&T 2011 panel discussing these issues. Reaction data will consist of several Likert-scale and open-ended responses. The data will be collected only a day or two before the poster is displayed; classification and visualization will be done openly to produce a rapid summary of the data. The tight timeline and attendees-as-data-source will heighten the relevance of these exploratory results. Likert-scale response distributions will be displayed in dot plots to facilitate additional write-on-the-poster contributions from viewers, further increasing engagement. Through this process we hope to raise awareness of these new open methods, discuss their strengths and weaknesses for the Information Science community, experiment with new methods for face-to-face group scholarly communication, and build community.