Keyword extraction by entropy difference between the intrinsic and extrinsic mode
We propose a new metric to evaluate and rank the relevance of words in a text. The method uses the Shannon entropy difference between the intrinsic and extrinsic mode. It rests on the observation that relevant words reflect the author’s writing intention, i.e., their occurrences are modulated by the author’s purpose, while irrelevant words are distributed randomly in the text. Using The Origin of Species by Charles Darwin as a representative text sample, we demonstrate the performance of our detector and compare it to previous proposals. Since no reference corpus (an author’s collected works: writings, books, papers, etc.) is needed, our approach is especially suitable for single documents for which no a priori information is available.
- Zhen Yang
- Weitong Chen
- Hanchen Li
- Chaoyang Li
- Ning Lu
- Longbo Zhang
- Youjun E
- We propose a new metric to evaluate and rank the relevance of words in a text.
- The metric uses the Shannon entropy difference between the intrinsic and extrinsic mode.
- We believe that this work is a new result in keyword extraction and ranking.
- Our approach is especially suitable for single documents of which there is no a priori information available.
- Yang Z, Lei J, Fan K, Lai Y. “Keyword Extraction by Entropy Difference Between the Intrinsic and Extrinsic Mode.” Physica A: Statistical Mechanics and its Applications 392(19): 4523-4531, 2013.
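The idea summarized above can be illustrated with a minimal Python sketch. It is not the published implementation, and several details are assumptions made only for illustration: the gaps between consecutive occurrences of a word are split at their mean, with gaps at or below the mean treated as the intrinsic mode and gaps above it as the extrinsic mode; the score is the plain difference of the two Shannon entropies, with no normalization or boundary handling. The function names (`shannon_entropy`, `entropy_difference`, `rank_words`) and the `min_count` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    if not values:
        return 0.0
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_difference(positions):
    """Entropy of intrinsic gaps minus entropy of extrinsic gaps for one word.

    Assumption: gaps <= mean gap form the intrinsic mode (inside clusters),
    gaps > mean gap form the extrinsic mode (between clusters).
    """
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return 0.0
    mean_gap = sum(gaps) / len(gaps)
    intrinsic = [g for g in gaps if g <= mean_gap]
    extrinsic = [g for g in gaps if g > mean_gap]
    return shannon_entropy(intrinsic) - shannon_entropy(extrinsic)


def rank_words(tokens, min_count=3):
    """Score every word that occurs at least `min_count` times; highest first."""
    positions = defaultdict(list)
    for index, token in enumerate(tokens):
        positions[token].append(index)
    scores = {
        word: entropy_difference(pos)
        for word, pos in positions.items()
        if len(pos) >= min_count
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_words(text.lower().split())` on a single document returns `(word, score)` pairs without any reference corpus. A perfectly evenly spaced word scores zero in this sketch, because all of its gaps fall into one mode with a single value. Whether this simplified score reproduces the ranking reported in the paper has not been verified.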
Code & Toolbox
- Online Demo 1!
- Online Demo 2!
- Github page
- Codeproject page
- SPROUT toolbox, developed by Prof. Zhiqiang Cai, which uses our algorithm to extract keywords from a target corpus and then uses those keywords to find additional articles on wiki to expand the corpus.
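- Zhang Longbo. Keyword Detection Algorithm Based on Multi-Scale Partitioning. Master’s thesis, Beijing University of Technology, 2014.
- Zhang Longbo. Keyword Detection System Based on Multi-Scale Partitioning. Computer software copyright (registration No. 2014SRBJ0226), 2014.
- Both add a multi-scale analysis method on top of the YANG’13 algorithm.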