What's New
CORD-19 paper analysis
-
COVID-19-related Large-scale Knowledge Graph Induction
An AI system that automatically analyzes a large collection of academic papers related to COVID-19, together with 25M abstracts of other biomedical papers, to substantially augment the existing UMLS knowledge base.
-
Citation Network of COVID-19-related Scholarly Articles
Visual interface for cluster analysis of COVID-19-related papers based on citation information
-
Relation Extraction on COVID-19
An application of an open information extraction tool to the COVID database. It retrieves entities and the relations between them.
Overview
The purpose of this research is to construct an integrated environment for analyzing scientific papers and supporting researchers. Specifically, we aim to realize paper retrieval from various perspectives, an idea-support system that presents the relations between papers and the extracted knowledge, knowledge base completion, inference of new knowledge, and related functions. To achieve these, we develop elemental technologies and tools, evaluate and improve them in real applications, and combine them into a single system usable across various domains.
People
Research Group Leader
- Yuji Matsumoto (Riken AIP)
Main Research Collaborators
- Ken Satoh (NII)
- Kentaro Inui (Tohoku University/Riken AIP)
- Akiko Aizawa (NII)
- Yoshimasa Tsuruoka (University of Tokyo)
- Junichiro Mori (University of Tokyo)
- Yoshinobu Kano (Shizuoka University)
- Hiroyuki Shindo (Nara Institute of Science and Technology)
Secretary
- Yuko Kitagawa (Nara Institute of Science and Technology)
Research Groups
- G0: Knowledge extraction from scholarly documents
- G1: Legal text processing and legal document retrieval
- G2: Evidence mining in scientific documents
- G3: Document analysis / Resource construction
- G4: Text summarization / Discovery of indirect associations
- G5: Citation analysis: Detecting research trends in academic fields
- G6: Brain map construction
- G7: Table and figure analysis in scholarly documents
Tools and Data
Tools
-
SideNoter
A Scholarly Paper Browsing System based on PDF Restructuring and Text Annotation
-
PDFNLT
Tools for Natural Language Text aware PDF structure analysis
-
Termlink
A prototype system for cross-language paper recommendation based on technical term extraction and linking
-
PDFAnno
Linguistic Annotation and Visualization Tool for PDF Documents
-
NeuroTextMining (temporarily suspended)
An integrated search system for neuroscience
-
Augmented FACTA+
Tool for exploring a COVID-19-related large-scale knowledge graph built from biomedical scientific papers
Data
-
AASC: ACL Anthology Sentence Corpus
A corpus of 2,339,195 natural language sentences extracted from the ACL Anthology
-
ACL poster corpus
A corpus of academic presentation posters in which contents and styles are disentangled
-
RANIS - Relational representation of context-dependent roles on information science papers
A corpus of research abstracts in the information science domain with typed entity and relation annotations
-
Coreference annotated corpus
A coreference-annotated corpus of ACL Anthology papers in CoNLL format (coreference between head noun phrases)
-
MWE-Aware English Dependency Corpus
A corpus in which each multiword expression (MWE) is treated as a single syntactic unit. The data is built on OntoNotes Release 5.0 (LDC2013T19)