Scientific Paper Analysis Web Portal

What's New

CORD-19 paper analysis

COVID-19-related Large-scale Knowledge Graph Induction
An AI system that automatically analyzes a large collection of COVID-19-related academic papers, together with 25M abstracts of other biomedical papers, to substantially augment the existing UMLS knowledge base.
Citation Network of COVID-19-related Scholarly Articles
Visual interface for cluster analysis of COVID-19-related papers based on citation information
Relation Extraction on COVID-19
An application of an open information extraction tool to the COVID database. It retrieves entities and the relations between them from the COVID database.

Tools and data

Overview

The purpose of this research is to construct an integrated environment for analyzing scientific papers and supporting researchers. Specifically, we aim to realize paper retrieval from various perspectives, an idea-support system that presents the relations between papers and the knowledge extracted from them, knowledge base completion, inference of new knowledge, and so on. To achieve these goals, we develop elemental technologies and tools, evaluate and improve them in real applications, and combine them into an integrated system usable across various domains.

People

Research Group Leader

Yuji Matsumoto (Riken AIP)

Main Research Collaborators

Ken Satoh (NII)
Kentaro Inui (Tohoku University/Riken AIP)
Akiko Aizawa (NII)
Yoshimasa Tsuruoka (University of Tokyo)
Junichiro Mori (University of Tokyo)
Yoshinobu Kano (Shizuoka University)
Hiroyuki Shindo (Nara Institute of Science and Technology)

Secretary

Yuko Kitagawa (Nara Institute of Science and Technology)

Research Groups

G0: Knowledge extraction from scholarly documents
G1: Legal text processing and legal document retrieval
G2: Evidence mining in scientific documents
G3: Document analysis / Resource construction
G4: Text summarization / Discovery of indirect associations
G5: Citation analysis: Detecting research trends in academic fields
G6: Brain map construction
G7: Table and figure analysis in scholarly documents

Tools and Data

Tools

SideNoter
A Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation
PDFNLT
Tools for Natural Language Text-aware PDF structure analysis
Termlink
A prototype system for cross-language paper recommendation based on technical term extraction and linking
PDFAnno
Linguistic Annotation and Visualization Tool for PDF Documents
NeuroTextMining (temporarily suspended)
An integrated search system for neuroscience
Augmented FACTA+
Tool for exploring a COVID-19 related large-scale knowledge graph built from biomedical scientific papers

Data

AASC: ACL Anthology Sentence Corpus
A corpus of 2,339,195 natural language sentences extracted from the ACL Anthology
ACL poster corpus
A corpus of academic presentation posters in which contents and styles are disentangled
RANIS - Relational representation of context-dependent roles on information science papers
A corpus of research abstracts in the information science domain with typed entity and relation annotations
Coreference annotated corpus
Coreference-annotated Corpus of ACL Anthology papers in CoNLL format (coreference between head noun phrases)
MWE-Aware English Dependency Corpus
A corpus in which each multiword expression (MWE) is treated as a single syntactic unit. The data is built on OntoNotes Release 5.0 (LDC2013T19).
- Version 1.0 (LDC2017T01)
- Version 2.0 (LDC2017T16)