Scientists have access to more information in electronic form than
at any other time in history. Despite this increased access, scientists report
information overload because the quantity of information far exceeds human
processing capacity. This talk will draw examples from health care and the
environment to demonstrate how informatics can help to address these grand
challenges. Specifically, the talk will introduce the Claim Framework that
reflects how authors across the sciences communicate findings in empirical
studies. The Framework captures different levels of evidence by
differentiating between explicit and implicit claims, and by capturing underspecified
claims such as correlations, comparisons, and observations. The
results from 29 full-text articles show that authors report fewer than 7.84% of
scientific claims in an abstract, thus revealing the urgent need for text mining
systems to consider the full text of an article rather than just the abstract. The
results also show that authors typically report explicit claims (77.12%) rather
than observations (9.23%), correlations (5.39%), comparisons (5.11%), or
implicit claims (2.7%). Informed by the initial manual annotations, we
introduce an automated approach that uses syntax and semantics to identify
explicit claims and measure the degree to which each feature
contributes to the overall precision and recall. Results show that a
combination of semantics and syntax is required to achieve the best system
performance.
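
As a rough illustration of how precision and recall could be compared across feature sets, the following minimal sketch scores two hypothetical system outputs against manually annotated explicit claims. The function name, the sentence identifiers, and the feature-set labels are all assumptions made for this example; they are not the system or the data described in the talk.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted vs. gold claim sentence ids."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: ids of sentences annotated as explicit claims.
gold_claims = {1, 2, 3, 4}

# Hypothetical outputs from two feature configurations (illustrative only).
system_outputs = {
    "syntax only": {1, 2, 5},
    "syntax + semantics": {1, 2, 3, 5},
}

for name, predicted in system_outputs.items():
    p, r = precision_recall(predicted, gold_claims)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Scoring each feature configuration against the same annotated set is one simple way to see how much a feature adds; the numbers in this toy example carry no empirical meaning.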