Evaluating the impact of citations of articles based on knowledge flow patterns hidden in the citations

Authors: Mingyang Wang ^aff001; Jiaqi Zhang ^aff001; Shijia Jiao ^aff001; Tianyu Zhang ^aff001
Authors place of work: College of Information and Computer Engineering, Northeast Forestry University, Harbin, People’s Republic of China ^aff001
Published in the journal: PLoS ONE 14(11)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0225276

Summary

The effective evaluation of the impact of a scholarly article is a significant endeavor and has therefore garnered considerable attention. From the perspective of knowledge flow, this paper extracted various knowledge flow patterns concealed in articles' citations to describe the citation impact of the articles. First, the intensity characteristic of knowledge flow was investigated to distinguish the different citation vitality of articles. Second, the knowledge diffusion capacity was examined to differentiate the scope of articles' influence on the academic environment. Finally, the knowledge transfer capacity was discussed to investigate the degree to which articles support follow-up research. Experimental results show that articles that received more citations recently have a higher knowledge flow intensity. Articles also have different impacts on the academic environment and different supporting effects on follow-up research, reflecting differences in their knowledge diffusion and knowledge transfer capabilities. Compared with the single quantitative index of citation frequency, these knowledge flow patterns allow a more detailed exploration of the citation value of articles. By integrating the three knowledge flow patterns to examine the total citation impact of articles, we found that articles exhibit distinct citation impact even if they were published in the same field and in the same year and have similar citation frequencies.

Keywords:

Citation analysis – Bibliometrics – Astronomy – deep learning – Entropy – Scientific publishing – Astrophysics – Information entropy

Introduction

The effective evaluation of the impact of a scholarly article is an important research topic, as academic promotion and research grant decisions usually ascribe significant weight to the impact of an individual's publication record [1–9].

Researchers have proposed a variety of bibliometric performance indicators to measure the impact of a single scholarly publication or a set of publications. These indicators can be classified into two main categories. The first category is citation-based indicators, among which the citation count is the most widely used index of the scientific impact of an article [10–12]. Many academic search engines, such as Google Scholar, Microsoft Academic Search, and CiteSeerX, have adopted citation-based analyses and provide users with related services (e.g., citation counts, the h-index, and citation graphs) [13]. For publications from the same year and field, the number of citations is considered to indicate the impact of the articles on the advancement of the field. To measure the impact of scientists, colleges, and research institutes, metrics such as the h-index [14–23], the g-index [24, 25], and the R-index [26] have been used. These indexes take the citation counts of publications as the basis for assessing impact because citation counts are assumed to be a valid measure of it.
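Because the h-index and g-index are computed directly from a list of per-paper citation counts, the following minimal Python sketch illustrates both definitions for a single author. The function names are hypothetical, and this g-index variant does not pad the list with zero-citation papers, which some definitions permit.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers have at least g**2 citations in total.

    Assumption: the list is not padded with zero-citation papers, so g <= len(citations).
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


papers = [10, 8, 5, 4, 3]  # hypothetical citation counts of one author's papers
print(h_index(papers))  # 4
print(g_index(papers))  # 5
```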

The second category is based on the topological structure of a citation network. PageRank is a link analysis algorithm used by Google to rank web pages by their importance. With articles as nodes and citations as edges, a citation network is similar to a web graph. In recent years, various ranking algorithms based on the topological structure of citation networks have been used to assess the impact of scholarly articles [27–40]. These studies usually consider factors such as the importance of the citing article, heterogeneous scholarly networks, publication time, self-citation, and missing citations. To account for these factors, different weights are assigned to the edges of the citation network so that different citation relationships among articles are evaluated differently.
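To make the citation-network analogy concrete, the following is a minimal sketch of generic weighted PageRank computed by power iteration. The edge weights are placeholders for factors such as those listed above; the function name, the damping factor of 0.85, and the fixed iteration count are assumptions, and this is not a reproduction of any specific algorithm in [27–40].

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank on a citation graph by power iteration.

    edges: dict mapping citing article -> {cited article: edge weight}.
    Rank flows from a citing article to the articles it cites,
    split in proportion to the edge weights.
    """
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0  # rank held by articles that cite nothing in the graph
        for v in nodes:
            targets = edges.get(v, {})
            total = sum(targets.values())
            if total == 0:
                dangling += rank[v]
                continue
            for u, w in targets.items():
                new_rank[u] += damping * rank[v] * w / total
        # redistribute dangling rank uniformly so scores keep summing to 1
        for v in nodes:
            new_rank[v] += damping * dangling / n
        rank = new_rank
    return rank


edges = {"A": {"C": 1.0}, "B": {"C": 1.0, "A": 0.5}}  # toy weighted citations
scores = pagerank(edges)
print(max(scores, key=scores.get))  # C
```

In the toy graph, article C is cited by both A and B, so it receives the highest score.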

Although the aforementioned studies suggest that citations may exhibit different levels of importance, they do not explicitly reveal what these levels are or how to detect and use them to evaluate the impact of an article. In one of our previous works, we carried out a preliminary study on detecting various hidden patterns in citations [41], in which the indices of citation intensity, citation width, and citation depth were proposed to distinguish unequal intensities and contributions in citations. In fact, every citation is accompanied by knowledge flow: knowledge is propagated from the cited article to the citing one when the citation occurs [42]. Articles can therefore show different knowledge flow patterns even if they have the same number of citations.

First, articles differ in knowledge flow intensity. Since citation behavior represents knowledge flow between articles, the knowledge flow intensity of an article can be measured by whether it has a strong ability to be cited continuously. This reflects whether an article has the vitality to gain further citation opportunities, and articles with strong citation vitality are likely to have high academic influence. Therefore, the knowledge flow intensity of articles should be considered when determining their citation impact. Second, articles differ in knowledge diffusion capability. Each academic article has multiple attributes, such as its publishing journal, its subject, and the organizations and countries of its authors. When a citation occurs, knowledge diffuses from the attribute space of the cited article to the attribute space of the citing article. If an article is cited by a larger number of different countries, institutions, disciplines, journals, and so on, its knowledge has influenced the scientific research of more academic entities. This reflects the scope of the influence of the cited article's knowledge on the academic environment, so the knowledge diffusion ability of an article should be considered when its citation impact is evaluated. Finally, articles differ in knowledge transfer capability. The knowledge transfer capability of an article is defined as the extent to which the article, through citations, supports subsequent scientific research. If an article is cited by more high-quality publications and has greater content similarity with them, it can be considered to have a high ability to transfer knowledge. The capacity of knowledge transfer should therefore also be considered when measuring the citation impact of an article.

Therefore, in this paper, we extract the three knowledge flow patterns mentioned above to evaluate the citation impact of an article. The remainder of this paper is organized as follows: The “Methodology” section describes in detail how academic impact is measured using the knowledge flow patterns. This is followed by the “Data” section, which describes the data used in the experiments. In the “Experimental results and discussion” section, the experimental results are presented. The final section summarizes the overall discussion and concludes the paper.

Methodology

Knowledge flow intensity

The intensity of knowledge flow reflects whether articles have greater vitality to obtain more citations, which cannot be captured effectively by analyzing the total citation frequency. One of our previous studies discussed which articles have a stronger ability to be cited consistently [43]. Using a series of time windows, we explored the correlation between citation frequency under different time windows and the future citation ability of articles, and found that neither highly cited articles nor newly published articles necessarily have stronger future citation ability. Articles with strong sustained citation ability are those that have been cited more frequently over the past two years, regardless of their total citation count and publication date [43]. In this paper, we adopt a similar approach, constructing independent and continuous time windows to explore the knowledge flow intensity of articles in different time periods and thus determine which articles currently have greater knowledge flow intensity.

Independent time windows: Taking a year as a unit, a series of independent time windows was established. A time parameter T indexes the independent time windows, counting back from 2015: T = 1 denotes 2015, T = 2 denotes 2014, and so on.
Continuous time windows: Taking a year as a unit, we established a series of continuous time windows, each 1 year longer than the previous one. A time parameter τ represents the size of the continuous time window: τ = 1 denotes 2015 (the most recent year before 2016), τ = 2 denotes the recent 2 years (2015 and 2014), and so on.
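The two kinds of time window can be illustrated with a short sketch that reads citations from a per-year mapping. The function names and the dictionary input format are hypothetical; only the window definitions (T = 1 for 2015, τ = 2 for 2014 and 2015) come from the text.

```python
def independent_window(citations_by_year, T, last_year=2015):
    """Citations in the single year indexed by T (T = 1 -> 2015, T = 2 -> 2014, ...)."""
    return citations_by_year.get(last_year - T + 1, 0)


def continuous_window(citations_by_year, tau, last_year=2015):
    """Citations in the tau most recent years (tau = 1 -> 2015, tau = 2 -> 2014-2015, ...)."""
    return sum(citations_by_year.get(last_year - k, 0) for k in range(tau))


cites = {2012: 3, 2013: 4, 2014: 7, 2015: 9}  # hypothetical per-year citation counts
print([independent_window(cites, T) for T in range(1, 5)])   # [9, 7, 4, 3]
print([continuous_window(cites, tau) for tau in range(1, 5)])  # [9, 16, 20, 23]
```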

Following the above methods for dividing independent and continuous time windows, this paper split the total citation counts of articles into the citation counts obtained in different time windows. Moreover, the citation counts of articles in 2016 were used to represent the ability of the articles to be continuously cited in the future. By exploring the correlation between citation counts under different time windows and the future citation ability of articles, this paper aims to distinguish how well citations obtained in different time windows represent the sustained citation behavior of articles. Pearson's correlation coefficient is used to quantify this relationship and is calculated as follows:

$$r_i = \frac{\sum_{j} (X_{ij} - \bar{X}_i)(Y_j - \bar{Y})}{\sqrt{\sum_{j} (X_{ij} - \bar{X}_i)^2}\,\sqrt{\sum_{j} (Y_j - \bar{Y})^2}}$$

where r_i is Pearson's correlation coefficient between the articles' future citation counts and their past citations obtained in the ith time window; X_ij denotes the past citations of article j in the ith time window; X̄_i is the average citations in the ith time window over all articles; Y_j denotes the future citation count of article j, obtained in 2016; and Ȳ is the average future citation count of all articles in 2016.

The correlation coefficients calculated under different time windows reveal how much the citations in each time window contribute to helping articles attract new citations. The larger the correlation coefficient, the more the citations in that time window contribute to future citations, and the higher the weight that should be given to the citations an article obtained in that window when determining its overall knowledge flow intensity. This paper therefore uses the correlation coefficients as the weights of the citation frequencies obtained in the corresponding time windows. The total knowledge flow intensity of an article is then obtained as the weighted sum of its citation frequencies over all time windows, which reflects the influence of its citations in terms of knowledge flow intensity.
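The weighting scheme described above can be sketched as follows, assuming each article's citations have already been split into independent time windows. The function names and toy data are hypothetical; the sketch uses the Pearson coefficients r_i as weights exactly as computed, so a window with a negative correlation would reduce the intensity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError if either sequence is constant


def knowledge_flow_intensity(window_citations, future_citations):
    """window_citations[j][i]: citations of article j in independent window T = i + 1.
    future_citations[j]: citations of article j in 2016.
    Returns the weights r_i and each article's weighted citation sum."""
    n_windows = len(window_citations[0])
    weights = [
        pearson([row[i] for row in window_citations], future_citations)
        for i in range(n_windows)
    ]
    intensity = [sum(r * x for r, x in zip(weights, row)) for row in window_citations]
    return weights, intensity


# hypothetical data: columns are T = 1 (2015) and T = 2 (2014)
windows = [[10, 3], [6, 4], [2, 2], [1, 1]]
future = [12, 7, 3, 1]
weights, intensity = knowledge_flow_intensity(windows, future)
```

In this toy data, the 2015 window correlates more strongly with 2016 citations than the 2014 window does, so it receives the larger weight.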

Knowledge diffusion capacity

Citation activities should not be expressed solely as numbers; from the perspective of knowledge diffusion, they should also be viewed as distribution patterns over a wide space. Every article can be seen as a carrier of knowledge, and every citation between articles involves the diffusion of knowledge from the cited article to the citing one. In our earlier work on detecting the typical features influencing the citation impact of an article, we found that a wider citation distribution across subjects, journals, countries, and institutions had a greater influence on increasing the citation impact of articles [41, 44–45]. Thus, this work constructed the feature space F from these four features to describe the knowledge diffusion process among articles:

$$F = (\text{Subject Category}, \text{Journal}, \text{Country}, \text{Institution})$$

Detecting the citation distribution patterns of cited articles in this feature space can provide important information about the scope of their influence on the scientific environment. A wider citation distribution of an article in feature space F, i.e., a larger scope of influence, indicates a stronger knowledge diffusion capacity.

To analyze the knowledge diffusion capacity of articles, mutual information is introduced to calculate the dependency of the total citation counts of articles on each feature dimension of F. Mutual information is a statistical measure of the interaction among variables and can capture both linear and nonlinear dependencies [46–47]. The mutual information between variables X_i and Y is defined by the following equation:

$$I(X_i; Y) = \sum_{x \in X_i} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$

where X_i denotes the citation distribution in the ith dimension of feature space F and Y denotes the total citations of articles; p(X_i) and p(Y) are the marginal probability distributions (probability density functions in the continuous case), and p(X_i, Y) is the joint probability distribution. Mutual information is nonnegative, I(X_i; Y) ≥ 0: a value of 0 indicates that the variables are independent, and larger values indicate stronger dependency (normalized variants of mutual information lie in [0, 1], with 1 indicating the highest dependency). The dependency indicates how much the knowledge diffusion in each feature dimension contributes to an article's total citations, and it is used as the weight of the citations in that dimension. The knowledge diffusion capacity of each article is then quantified as the sum of its weighted citations over all feature dimensions of F.
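A minimal sketch of the mutual-information weighting follows. It assumes that X_i is, for each article, a single count in dimension i of F (for example, the number of distinct citing countries), and it discretizes counts into equal-width bins before estimating probabilities empirically; the binning scheme, the bin count, and the function names are assumptions not specified in the text.

```python
import math
from collections import Counter


def discretize(values, bins=3):
    """Map numeric values to equal-width bin labels 0..bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in nats, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def diffusion_capacity(features, total_citations, bins=3):
    """features[j][i]: citations of article j in dimension i of F.
    Returns the mutual-information weights and each article's weighted sum."""
    y = discretize(total_citations, bins)
    weights = [
        mutual_information(discretize([row[i] for row in features], bins), y)
        for i in range(len(features[0]))
    ]
    capacity = [sum(w * x for w, x in zip(weights, row)) for row in features]
    return weights, capacity


# hypothetical data; columns follow F = (Subject Category, Journal, Country, Institution)
features = [[12, 30, 8, 40], [5, 10, 3, 12], [20, 45, 15, 70], [2, 4, 1, 5]]
totals = [60, 20, 110, 8]
weights, capacity = diffusion_capacity(features, totals)
```

With only a handful of articles the empirical estimate is very coarse; in practice the weights would be estimated over the whole data set.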

Knowledge transfer capacity

The knowledge transfer capacity represents the extent to which the cited article supports subsequent research. This extent is hard to analyze from the number of citations alone. This paper presents a deep learning-based method that calculates the content similarity between cited articles and their highly cited citing articles to quantify the support of cited articles for subsequent research.

Deep learning has been successful in several fields because of its strong feature learning and modeling ability [48–49]. Distributed representations in deep learning have proven highly effective in capturing the semantics of words, phrases, and sentences, benefiting natural language processing applications such as sentiment analysis [50], syntactic parsing [51–54], text summarization [55], and others [56–59]. This research explores the use of distributed document representations to calculate the content similarity between articles. We use the Doc2vec method, which builds a distributed vector representation at the document level in an unsupervised manner [60].

To quantify the knowledge transfer capacity of an article, collecting all of its citing articles is time-consuming and unnecessary. Because highly cited papers (HCPs) can loosely represent high-quality research findings, the highly cited citing papers (HCCPs) of an article were extracted to examine the number of high-quality research findings generated with the support of that article. We then calculated the content similarity between the article and each of its HCCPs to evaluate the knowledge transfer capacity of the article. The underlying assumption is that if article A is cited by more HCCPs and has larger content similarity with them, article A has greater knowledge transfer capability in subsequent studies than articles with fewer HCCPs or lower similarity with their HCCPs.

Suppose there are N papers in a corpus comprising all cited articles and the HCCPs citing them, and we want to learn distributed document vectors such that each paper is mapped to a fixed-dimensional vector. The Doc2vec method has two models: the Distributed Memory Model of Paragraph Vectors (PV-DM) and the Distributed Bag of Words version of Paragraph Vector (PV-DBOW) [60]. In our experiment, as in Le and Mikolov's work [60], each document vector is the combination of two vectors, one learned by PV-DM and one learned by PV-DBOW. The learned document vectors have 50 dimensions in both PV-DM and PV-DBOW, so each paper is mapped to a distributed vector with 100 dimensions.

Let p_i and Hc_ij denote the document vector representations of the ith article and of the jth HCCP citing it, respectively. The content similarity between the ith article and its jth HCCP is calculated as the cosine similarity:

$$c(i,j) = \frac{p_i \cdot Hc_{ij}}{\lVert p_i \rVert \, \lVert Hc_{ij} \rVert}$$

For the ith article, the knowledge transfer capacity is calculated as the sum of c(i,j) over j = 1, 2, …, m, where m is the number of HCCPs citing it.
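The similarity step can be sketched as follows, assuming the 50-dimensional PV-DM and PV-DBOW vectors have already been trained (for example, with gensim's Doc2Vec using dm=1 and dm=0 respectively and vector_size=50; this tooling is an assumption, not a detail given in the text). The helper names are hypothetical, and the short vectors in the usage example are toy stand-ins for the 100-dimensional representations.

```python
import math


def concat(pv_dm, pv_dbow):
    """Document vector: PV-DM vector followed by PV-DBOW vector (50 + 50 = 100 dims in the text)."""
    return list(pv_dm) + list(pv_dbow)


def cosine(u, v):
    """Cosine similarity c(i, j) between two document vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def knowledge_transfer(article_vec, hccp_vecs):
    """Sum of c(i, j) over the m HCCPs citing article i."""
    return sum(cosine(article_vec, h) for h in hccp_vecs)


p = concat([1.0, 0.0], [0.0, 1.0])  # toy 2 + 2 dimensions instead of 50 + 50
hccps = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
print(knowledge_transfer(p, hccps))  # 1.0
```

Summing rather than averaging the similarities means that both the number of HCCPs and their content similarity raise the score, matching the two conditions stated in the text.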

Thus, an article can be regarded as having a high supporting value for follow-up research if it has more HCCPs and larger content similarity with them.

Evaluating articles’ citation impact from the above three knowledge flow patterns

After the three knowledge flow patterns are quantified, the key question is how to integrate them into a holistic assessment of articles' citation impact. In this paper, the entropy weight method was used to weight the three patterns and achieve a universal analysis of articles' citation impact. The method is considered objective because the weighting factors depend on the values of the indices rather than on subjective human assessment [61]. It is derived from Shannon entropy, which was first proposed as a quantitative measure of uncertainty in an information system. The main steps are as follows:

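Step 1: Initialization of the matrix. Assume that there are m articles to be evaluated in terms of n indices; in this paper, n = 3, corresponding to the three knowledge flow patterns described above. The initial matrix is established as follows:

$$X = (x_{ij})_{m \times n} = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix}$$

where x_ij is the value of the jth index for the ith article.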

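Step 2: Normalization of the matrix. To make the indices comparable despite their different units and value ranges, all indices are normalized as

$$r_{ij} = \text{min}_{\text{new}} + \frac{x_{ij} - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}} \left(\text{max}_{\text{new}} - \text{min}_{\text{new}}\right)$$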

where [min_new, max_new] is the new value range for all the indices, which is set as [min_new, max_new] = [0.001,0.999].

Step 3: Calculation of the weighting coefficients. The information entropy of each index is calculated by

$$E_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}$$

where E_j is the information entropy of the jth index and $p_{ij} = r_{ij} / \sum_{i=1}^{m} r_{ij}$.

Based on the information entropy E_j, the weighting coefficient of each index is calculated by

$$w_j = \frac{1 - E_j}{\sum_{j=1}^{n} (1 - E_j)}$$

where $\sum_{j=1}^{n} w_j = 1$ and $0 \le w_j \le 1$. From the perspective of information entropy, 1 - E_j indicates the degree of divergence among articles under the jth index. Hence, an index that produces larger divergence among articles, i.e., has a greater capacity to discriminate between articles, receives a larger weighting coefficient.

Step 4: Calculation of the universal evaluation value of articles' citation impact:

$$CI_i = \sum_{j=1}^{n} w_j \, r_{ij}$$

Following the steps discussed above, each article can get its universal citation impact CI_i by integrating the three knowledge flow patterns.
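Steps 1–4 can be sketched in plain Python as follows. The function name is hypothetical, and the min-max normalization in Step 2 assumes that all three indices are benefit-type (larger is better); at least two articles and at least one non-constant index are required.

```python
import math


def entropy_weight_scores(x, new_min=0.001, new_max=0.999):
    """x[i][j]: value of index j (knowledge flow pattern) for article i (Step 1).
    Returns (weights w_j, universal citation impact CI_i)."""
    m, n = len(x), len(x[0])
    # Step 2: min-max normalize each index into [new_min, new_max];
    # new_min > 0 keeps every r_ij positive so ln(p_ij) is defined below
    r = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [x[i][j] for i in range(m)]
        lo, hi = min(col), max(col)
        for i in range(m):
            scaled = (x[i][j] - lo) / (hi - lo) if hi > lo else 0.0
            r[i][j] = new_min + scaled * (new_max - new_min)
    # Step 3: entropy E_j of each index, then weights proportional to 1 - E_j
    k = 1.0 / math.log(m)
    entropy = []
    for j in range(n):
        total = sum(r[i][j] for i in range(m))
        p = [r[i][j] / total for i in range(m)]
        entropy.append(-k * sum(pij * math.log(pij) for pij in p))
    divergence = [1.0 - e for e in entropy]
    weights = [d / sum(divergence) for d in divergence]
    # Step 4: CI_i as the weighted sum of the normalized indices
    scores = [sum(w * r[i][j] for j, w in enumerate(weights)) for i in range(m)]
    return weights, scores


# toy matrix: rows are articles; columns are intensity, diffusion, transfer
x = [[120.0, 35.0, 2.1], [80.0, 50.0, 0.4], [95.0, 20.0, 1.3], [60.0, 42.0, 0.9]]
weights, scores = entropy_weight_scores(x)
```

An index on which all articles score the same has entropy 1 and therefore receives (numerically) zero weight, which is the discriminating behavior described in Step 3.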

Data

In this paper, two data sets are used to verify the above methods. Data set 1 is mainly used to analyze the three knowledge flow patterns, and Data set 2 is used to evaluate the universal citation impact of articles.

Data set 1

We selected the field of “Astronomy and Astrophysics” for our experiments because its publications are well covered by journal publication databases [62] and because it is relatively well separated from other fields [63]. There are 62 journals in the field of “Astronomy and Astrophysics” in the 2015 Journal Citation Reports (JCR) of the Science Citation Index (SCI). Of these journals, only 28 published articles in 1985. We collected the 7408 articles published in these 28 journals in 1985, together with their citation data for 1985–2016. The detailed information on the 28 journals and the number of articles collected from each journal is listed in Table 1. This data set is used to discuss the three knowledge flow patterns hidden behind articles' citation activities.

**Tab. 1. Journals and their articles used for our experiments.**

To investigate the intensity properties of knowledge flow, the citation data of these articles were divided into two time periods: 1) The first period spans from each article's publication year to 2015; the citation data collected in this period were used to model the articles' citation behavior in different time windows. 2) The second period consists of the citation data of 2016, which were used to determine the sustained citation ability of articles in the near future.

When analyzing the knowledge diffusion capacity of each article, the citation distribution data of its citing environment in feature space F = (Subject Category, Journal, Country, Institution) were gathered using the “Analyze Results” tool provided by the web version of SCI.

For the knowledge transfer capacity of an article, a citing paper was regarded as high quality (i.e., selected as an HCCP) if the number of citations it received was at least ten times the mean citation rate of all citing articles. This selection method is similar to the one applied by Aksnes [64] in the study of HCPs by Norwegian authors. There were 237,458 citing articles with 11,707,971 citations, giving an average citation rate of 49.305. Thus, the citing articles that had been cited at least 493 times were taken as the HCCPs. All the cited articles and their HCCPs were then downloaded and transformed into distributed vector representations by the Doc2vec method to calculate content similarity.
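The HCCP threshold arithmetic can be checked with a small sketch; the function names are hypothetical, and the citation counts in the dictionary example are made up.

```python
def hccp_threshold(total_citations, n_citing, factor=10):
    """Citation cutoff: factor times the mean citation rate of all citing articles."""
    return factor * total_citations / n_citing


def select_hccps(citing_counts, threshold):
    """Return ids of citing papers whose citation count reaches the threshold."""
    return [pid for pid, c in citing_counts.items() if c >= threshold]


t = hccp_threshold(11_707_971, 237_458)
print(round(t, 2))  # 493.05
print(select_hccps({"p1": 1200, "p2": 310, "p3": 650}, t))  # ['p1', 'p3']
```

With the reported totals, the threshold evaluates to about 493.05, which the text reports as a cutoff of 493 citations.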

To better understand the different capabilities of articles in knowledge diffusion and knowledge transfer, we divided the collected articles into three categories according to their total citations up to the statistical year.

HCPs: The HCPs in the field of “Astronomy and Astrophysics” are selected using the same criterion as for the HCCPs, i.e., a threshold of ten times the mean citation rate.
Medium-cited papers (MCPs): An article is considered medium cited if the number of citations it received is in the range of 1–10 times the mean citation rate in the field of “Astronomy and Astrophysics.”
Low-cited papers (LCPs): The remaining publications, which received fewer citations than the mean citation rate.
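The three-way classification can be expressed as a small function. The function name and the mean citation rate used in the example are hypothetical, since the field's mean citation rate is not given in this section; an article with exactly ten times the mean is treated as an HCP, following the "at least ten times" criterion.

```python
def classify(citations, field_mean):
    """HCP: >= 10x the field mean; MCP: 1x to 10x; LCP: below the mean."""
    if citations >= 10 * field_mean:
        return "HCP"
    if citations >= field_mean:
        return "MCP"
    return "LCP"


mean_rate = 40.0  # hypothetical field mean, not taken from the data set
print([classify(c, mean_rate) for c in (520, 400, 120, 40, 15)])
# ['HCP', 'HCP', 'MCP', 'MCP', 'LCP']
```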

Table 1 lists the number of articles classified into HCPs, MCPs, and LCPs of each journal.

It should be mentioned that the criterion of ten times the average citation frequency for selecting HCPs is independent of the subject area; it was chosen mainly to ensure that an appropriate number of articles are selected as HCPs. Under this criterion, only 76 of the 7408 articles published in the field of Astronomy and Astrophysics in 1985 were selected as HCPs, accounting for about 1% of the papers published in that year, which is consistent with the criterion for selecting HCPs in Web of Science. In Web of Science, papers that receive enough citations to place them in the top 1% of the same subject area and publication year are classified as highly cited papers.

Data set 2

One of the key journals in the field of “Astronomy and Astrophysics,” the Astrophysical Journal, was used in the experiment examining articles' citation impact. Four highly cited articles published in the Astrophysical Journal in 1985 were extracted, and their universal citation impact was compared by considering the knowledge flow patterns. These four articles were divided into two pairs, and the two articles in each pair have the same total number of citations. We chose papers from the same field, published in the same year and with the same citation frequency, to clearly reveal that papers meeting the same impact standard in traditional citation analysis can differ greatly in citation impact. The selection method is similar to that of Yu et al. [65], who chose four papers with similar publication times and the same total number of citations to compare their citation characteristics. All the citation data required were collected as described in “Data set 1.” The detailed information on these four articles is listed in Table 2.

**Tab. 2. Detailed information about four articles.**

Experimental results and discussion

Experimental results for the intensity characteristics of knowledge flow

Figs 1 and 2 show the correlation between articles' sustained citation performance and their citation frequency under independent and continuous time windows, respectively; the subgraphs depict the division of the time windows. Citations obtained in different time windows have different influences on articles' future citation possibilities. The correlation coefficient r_i in Fig 1 decreases continually and exponentially as the time window moves further into the past, indicating that citations obtained in recent years have more impact on articles' future citation performance. Fig 2 shows the time period in which citations have the greatest influence on the future citation behavior of the articles. The correlation coefficient in Fig 2 peaks at τ = 2 and then decreases exponentially as the time window is enlarged, showing that the citations in the past 2 years have the most impact on articles' future citation performance. The subgraph in the upper right corner of Fig 2 depicts the dependence of articles' future citations on the number of citations obtained in the past 2 years. There is a clear linear dependence, which suggests that articles that obtained a large number of citations in the past 2 years have a greater chance of being cited again in the near future.

**Fig. 1. Correlation between articles’ future citation performance and their past citations obtained in different independent time windows.**

**Fig. 2. Correlation between articles’ future citation performance and their past citations obtained in different increasing time windows.**

Figs 1 and 2 show that articles have different knowledge flow intensities in different time periods. Articles with a large number of citations in recent years have higher knowledge flow intensity to attract new citations. When evaluating articles' universal citation impact, these differences in knowledge flow intensity, represented by the different correlation coefficients in different time windows, are used to weight the past citations to obtain an overall value for the knowledge flow intensity of an article.

Experimental results for knowledge diffusion capacities

Fig 3(A)–3(D) shows the citation distribution of three types of articles (HCPs, MCPs, and LCPs) over institutions, subject categories, journals, and countries, respectively. To better understand the knowledge diffusion characteristics of articles, this paper explored how the knowledge diffusion ability of different types of articles evolves over time. The subgraph in Fig 3(A) shows the classification of the time windows. The three types of articles showed distinct distributions for each of the four features. The HCPs received the most citations across institutions, subject categories, journals, and countries in each time window, followed by the MCPs, while the LCPs received the fewest. Therefore, articles differ in the scope of their influence on the academic environment, even though they were published in the same field and in the same year. The papers that eventually became highly cited exhibited the best knowledge diffusion performance both in the early post-publication period and at later stages of their life cycle. Such a dominant citation distribution in feature space F gives these papers better visibility, suggesting that attention from a wider range of fields brings more citations in later years and further helps the articles become highly cited. As a result, the range of an article's influence on the academic environment reflects the diffusion characteristics of its citation impact, and it is incorporated as one dimension in our scheme to calculate the universal citation impact of articles.

Experimental results for knowledge transfer capacities

Table 3 shows the indicators used to evaluate articles' knowledge transfer capacities. The average number of HCCPs for the HCPs is much larger than that for the MCPs and LCPs, meaning that more high-quality research outputs are generated in the citing environments of the HCPs. Admittedly, these high-quality outputs do not necessarily depend directly on one HCP, and there are various reasons for citing an article. However, a large number of high-quality research outputs in the citing environment can, to some extent, indicate the high citation impact of the HCPs on subsequent research.

**Tab. 3. Indicators to evaluate knowledge transfer capacities for three kinds of articles.**

The last column of Table 3 lists the average content similarity between each kind of cited article and its HCCPs. As shown in Table 3, the average content similarity for HCPs is 0.327, which is higher than that for MCPs (0.079) and LCPs (0.01). This indicates that, in addition to having far more HCCPs on average, HCPs have the highest content similarity with high-quality research outputs.

Therefore, articles show clearly different knowledge flow characteristics through citation behavior, which cannot be detected from citation counts alone. These knowledge flow patterns not only reveal the differing intensity, scope, and depth of articles' influence but also contribute to the different citation life cycles of the articles. In the next experiment, we incorporated articles' knowledge flow patterns into the evaluation of citation impact and verified the necessity of doing so using the article data in “Data set 2.”

Experimental results for evaluating articles’ citation impact

The universal citation impact of the two pairs of highly cited articles in Table 2 was examined by integrating the knowledge flow patterns.

For the intensity characteristic of knowledge flow, the citations obtained in the independent time window T = i were weighted by the correlation coefficient r_i of the same time window, and the weighted citations over all independent time windows were summed to quantify each article's total knowledge flow intensity. For the knowledge diffusion capacity, mutual information was used to calculate the dependency of the papers' total citation counts on each feature dimension of F, which serves as the weight of that dimension; the weighted citations over all feature dimensions were then summed to quantify the knowledge diffusion capacity of each article. For the knowledge transfer capacity, we collected the HCCPs of each article in Table 2 and used the Doc2vec deep learning model to calculate the content similarity between each article and its corresponding HCCPs. After the knowledge flow patterns were quantified, the entropy weight method was used to weight each pattern and calculate the universal citation impact of each article.

Table 4 lists the citation distribution of the four articles in each feature dimension, together with their final quantified knowledge diffusion capacities. The results show that the articles differ considerably both in their knowledge diffusion in each dimension of F and in their overall knowledge diffusion capacity, even when they have the same or similar citation counts. These differences reflect the different scopes of the articles’ influence on the academic environment.

**Tab. 4. Knowledge diffusion capacities for four articles.**
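The diffusion capacities in Table 4 weight an article's citations in each feature dimension of F by the mutual information between that dimension and total citation counts. The sketch below assumes that, across a corpus, the per-dimension and total citation counts have been discretized into levels (for example low/high) and that a plug-in mutual information estimate is used. The discretization, the dimension names dim_a and dim_b, and all numbers are hypothetical rather than the paper's settings.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def mutual_information(xs: Sequence[int], ys: Sequence[int]) -> float:
    """Plug-in estimate of I(X; Y) in nats for two aligned discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed pairs: p(x,y) * ln(p(x,y) / (p(x) * p(y))), using counts.
    return sum((nxy / n) * math.log(nxy * n / (px[x] * py[y]))
               for (x, y), nxy in joint.items())


def diffusion_capacity(article_citations: Dict[str, float],
                       weights: Dict[str, float]) -> float:
    """Citations in each feature dimension weighted by that dimension's MI weight."""
    return sum(w * article_citations.get(dim, 0.0) for dim, w in weights.items())


# Hypothetical corpus of four articles, discretized to levels (0 = low, 1 = high).
total_level = [0, 0, 1, 1]
dimension_levels = {
    "dim_a": [0, 0, 1, 1],  # hypothetical dimension that moves with total citations
    "dim_b": [0, 1, 0, 1],  # hypothetical dimension unrelated to total citations
}
weights = {dim: mutual_information(levels, total_level)
           for dim, levels in dimension_levels.items()}
print(weights)  # dim_a ≈ 0.693 (ln 2), dim_b = 0.0
print(diffusion_capacity({"dim_a": 40, "dim_b": 25}, weights))  # ≈ 27.73
```

A dimension whose citation level tracks total citations gets a large weight, while one that is independent of them contributes nothing, so two articles with equal citation totals can still differ in diffusion capacity.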

Table 5 shows the universal citation impact of the articles. After the information from the knowledge flow patterns is integrated, the articles in each pair show different universal citation impact, especially those in the first pair, HCP-1 and HCP-2. HCP-1 has a greater citation impact than HCP-2 because it outperforms HCP-2 in all three aspects of knowledge flow. Meanwhile, HCP-4 shows a slightly higher citation impact than HCP-2, although HCP-4 has a much lower citation count. There are eight HCCPs in the citing environment of HCP-4, far more than the three in HCP-2’s citing environment, which gives HCP-4 a better knowledge transfer capacity than HCP-2. In the other two knowledge flow patterns, the two articles are almost equal.

To observe their citation performance in recent years, we also examined the articles’ citation distributions. Fig 4 shows the citation distribution of the articles from 1985 to 2015. HCP-1, HCP-3, and HCP-4 all show strong citation vitality in recent years, especially HCP-4, whose citations have grown vigorously since 2008. By contrast, HCP-2 lacks citation momentum. As shown in the subgraph in Fig 2, articles that received more citations in recent years have a higher knowledge flow intensity and are therefore more likely to attract citations in the future. Therefore, HCP-4 is more active than HCP-2 in attracting new citations.

**Fig. 4. The citation distribution of four articles from their publication year 1985 to 2015.**

**Tab. 5. Universal citation impact for the four articles.**
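The universal citation impact in Table 5 combines the three knowledge flow patterns with the entropy weight method. The sketch below shows one common variant of that method, assuming min-max normalization of each indicator, that larger values are better for all three patterns, and the convention 0 · ln 0 = 0. The paper's exact normalization may differ, and the indicator values are invented for illustration.

```python
import math
from typing import List, Sequence


def min_max_normalize(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each column (indicator) to [0, 1]; constant columns become 0."""
    columns = []
    for col in zip(*matrix):
        lo, hi = min(col), max(col)
        span = hi - lo
        columns.append([(x - lo) / span if span > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*columns)]


def entropy_weights(z: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight method on a normalized matrix (rows = articles)."""
    m = len(z)
    if m < 2:
        raise ValueError("need at least two articles")
    k = 1.0 / math.log(m)
    divergences = []
    for col in zip(*z):
        total = sum(col)
        if total == 0:
            divergences.append(0.0)  # an indicator with no spread carries no information
            continue
        # e_j = -k * sum p_ij ln p_ij, skipping zero entries (0 * ln 0 := 0)
        entropy = -k * sum((x / total) * math.log(x / total) for x in col if x > 0)
        divergences.append(1.0 - entropy)  # degree of divergence d_j = 1 - e_j
    s = sum(divergences)
    n = len(divergences)
    return [d / s for d in divergences] if s > 0 else [1.0 / n] * n


def universal_impact(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Entropy-weighted sum of normalized indicators for each article."""
    z = min_max_normalize(matrix)
    w = entropy_weights(z)
    return [sum(wj * zij for wj, zij in zip(w, row)) for row in z]


# Hypothetical indicator values per article:
# [knowledge flow intensity, knowledge diffusion capacity, knowledge transfer capacity]
articles = [
    [3.0, 8.0, 0.4],
    [1.0, 3.0, 0.1],
    [2.0, 5.0, 0.3],
]
print(entropy_weights(min_max_normalize(articles)))
print(universal_impact(articles))
```

Indicators whose normalized values are spread more unevenly across articles (lower entropy) receive larger weights. A decision-maker who prefers an expert-chosen weighting could replace the output of entropy_weights with their own weight vector.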

Thus, articles can show clearly different citation impact even when their total citation counts are the same or similar, and similar citation impact even when their citation counts differ widely. These results indicate that citation impact analysis should incorporate the different knowledge flow patterns hidden behind articles’ citation activities.

Conclusions

In this paper, the different knowledge flow patterns hidden behind citation counts were identified to describe the impact of articles. The purpose was to highlight that an article’s citation count should not be seen merely as a single number; the process of knowledge flow behind that number should also be considered. Following this idea, three knowledge flow patterns (the intensity properties of knowledge flow, the knowledge diffusion capacity, and the knowledge transfer capacity) were identified and incorporated to examine the citation impact of an article. This paper collected articles from the field of “Astronomy and Astrophysics,” analyzed their performance in the three knowledge flow patterns, and accordingly discussed the necessity of incorporating these patterns into citation impact assessment.

The experimental results showed that articles present different knowledge flow intensities in different time windows. Articles with more citations in recent years have greater knowledge flow intensity and are therefore more likely to be cited in the future. HCPs received more citations in all four feature dimensions of knowledge diffusion in all time periods. Their broader distribution across these features is strong evidence of the large scope of their influence on the academic environment, as well as of greater visibility that can generate more citations in the future. HCPs also tended to be cited by more high-quality research outputs, and given their high content similarity with these outputs, HCPs are more active in knowledge transfer. Thus, articles show different intensity, diffusion, and transfer characteristics of knowledge flow. These differences contribute to different citation trajectories, even for articles published in the same subject area and in the same year. It is therefore necessary to incorporate the three patterns when analyzing the universal citation impact of articles.

To detect whether articles with the same citation count still have the same citation impact once the three knowledge flow patterns are considered, four highly cited articles in the Astrophysical Journal were collected and divided into two pairs, with the articles in each pair having the same citation count. The entropy weight method was used to weight the three knowledge flow patterns for a holistic evaluation of the articles’ citation impact. The experimental results show that articles can exhibit clearly different citation impact despite having the same citation count, and similar citation impact despite large differences in total citation count. Hence, knowledge flow patterns should be considered when analyzing articles’ citation impact.

In conclusion, examining citation impact from the perspective of knowledge flow offers a new way to evaluate the value of publications. Evaluation has traditionally focused on a publication’s accumulated influence, measured with citation-count-based assessments. However, as discussed in this paper, citation activities should not be reduced to a single number; they should be described by various citation properties from the perspective of knowledge flow. Publications with the same citation count can perform differently under different knowledge flow patterns. This perspective can be helpful in evaluating publications and useful for decision-makers assessing the academic performance of researchers or institutions, and it can also be valuable in steering research policy and in hiring or promotion decisions. The three aspects and the weights proposed in this paper are not fixed across applications: decision-makers can select or emphasize different aspects of knowledge flow as required and formulate a reasonable weighting scheme for the actual situation, so that the evaluation rests on a realistic view rather than on citation counts alone.

Although this work uses journal papers as examples to discuss the citation impact assessment of academic achievements, its ideas and methods are also applicable to other types of academic output, such as conference papers. For conference papers, the total number of citations is likewise the accumulation of citations over the citation life. Citations obtained in different periods also have different knowledge flow intensities, which affect the papers’ ability to attract new citations in the future. In addition, through citation behavior, conference papers also influence other academic entities, resulting in different knowledge diffusion and knowledge transfer capacities. Therefore, conference papers also differ in the knowledge flow characteristics of their citation behavior. By synthesizing these characteristics with the entropy weight method, a comprehensive evaluation of the citation impact of conference papers can also be obtained.

Although this study has revealed some interesting phenomena that may serve as a reference for building scientific and effective mechanisms for evaluating academic papers, it still has limitations. The proposed method cannot be used directly to compare the influence of academic papers across disciplines. Citation characteristics differ greatly between disciplines, which leads to significant differences in the three dimensions of knowledge flow. The method therefore cannot be used to distinguish the citation impact of articles from different disciplines until an appropriate technique is found to measure the differences introduced by field-specific features and to use those differences to normalize for them. In future research, we will explore methods for normalizing field-specific characteristics to correct the bias in evaluating the influence of academic papers from different fields.

Supporting information

S1 File [xlsx]
Article data and the corresponding citation data in .

References

1. Abbott A. Italy introduces performance-related funding. Nature. 2009;460(7255):559–559. doi: 10.1038/460559b

2. Bai X, Liu H, Zhang F, Ning Z, Kong X, Lee I et al. An Overview on Evaluating and Predicting Scholarly Article Impact. Information. 2017;8(3):73.

3. Bai X, Zhang F, Lee I. Predicting the citations of scholarly paper. Journal of Informetrics. 2019;13(1):407–418.

4. Cai L, Tian J, Liu J, Bai X, Lee I, Kong X et al. Scholarly impact assessment: a survey of citation weighting solutions. Scientometrics. 2019;118(2):453–478.

5. Chi PS, Gorraiz J, Glänzel W. Comparing capture, usage and citation indicators: an altmetric analysis of journal papers in chemistry disciplines. Scientometrics. 2019;120(3):1461–1473.

6. Glänzel W, Heeffer S, Thijs B. A model for publication and citation statistics of individual authors. In 15th International Conference of the International-Society-for-Scientometrics-and-Informetrics on Scientometrics and Informetrics. Istanbul, Turkey. 2015:942–952.

7. King D. The scientific impact of nations. Nature. 2004;430(6997):311–316. doi: 10.1038/430311a 15254529

8. Petersen AM, Stanley HE, Succi S. Statistical regularities in the rank-citation profile of scientists. Scientific Reports. 2011;1:181.

9. Zhang F, Bai X, Lee I. Author Impact: Evaluations, Predictions, and Challenges. IEEE Access. 2019;7:38657–38669.

10. Subelj L, Fiala D, Bajec M. Network-based statistical comparison of citation topology of bibliographic databases. Scientific Reports. 2014;4:6496. doi: 10.1038/srep06496 25263231

11. Ravenscroft J, Liakata M, Clare A, Duma D. Measuring scientific impact beyond academia: An assessment of existing impact metrics and proposed improvements. PLOS ONE. 2017;12(3):e0173152. doi: 10.1371/journal.pone.0173152 28278243

12. De Groote S, Shultz M, Smalheiser N. Examining the Impact of the National Institutes of Health Public Access Policy on the Citation Rates of Journal Articles. PLOS ONE. 2015;10(10):e0139951. doi: 10.1371/journal.pone.0139951 26448551

13. Wan X, Liu F. Are all literature citations equally important? Automatic citation strength estimation and its applications. Journal of the Association for Information Science and Technology. 2014;65(9):1929–1938.

14. Hirsch J. An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences. 2005;102(46):16569–16572.

15. Braun T, Glänzel W, Schubert A. A Hirsch-type index for journals. Scientometrics. 2006;69(1):169–173.

16. Bertoli-Barsotti L, Lando T. The h-index as an almost-exact function of some basic statistics. Scientometrics. 2017;113(2):1209–1228. doi: 10.1007/s11192-017-2508-6 29081557

17. Chi PS, Glänzel W. Do usage and scientific collaboration associate with citation impact? In 21st International Conference on Science and Technology Indicators. Valencia, Spain. 2016:1223–1228.

18. Chi PS, Glänzel W. An empirical investigation of the associations among usage, scientific collaboration and citation impact. Scientometrics. 2017;112(1):403–412.

19. Chi PS, Glänzel W. Comparison of citation and usage indicators in research assessment in scientific disciplines and journals. Scientometrics. 2018;116(1):537–554.

20. Gao C, Wang Z, Li X, Zhang Z, Zeng W. PR-Index: Using the h-Index and PageRank for Determining True Impact. PLOS ONE. 2016;11(9):e0161755. doi: 10.1371/journal.pone.0161755 27627767

21. Glänzel W, Thijs B. The role of baseline granularity for benchmarking citation impact. The case of CSS profiles. Scientometrics. 2018;116(1):521–536.

22. Wang L, Thijs B, Glänzel W. Characteristics of international collaboration in sport sciences publications and its influence on citation impact. Scientometrics. 2015;105(2):843–862.

23. Zhang C. A novel triangle mapping technique to study the h-index based citation distribution. Scientific Reports. 2013;3(1).

24. Egghe L. Theory and practise of the g-index. Scientometrics. 2006;69(1):131–152.

25. Lee JY. A proposal on modified g-index for evaluating research performance. Scientometrics. 2017;34(3):209–228.

26. Jin B, Liang L, Rousseau R, Egghe L. The R- and AR-indices: Complementing the h-index. Chinese Science Bulletin. 2007;52(6):855–863.

27. Bai X, Xia F, Lee I, Zhang J, Ning Z. Identifying Anomalous Citations for Objective Evaluation of Scholarly Article Impact. PLOS ONE. 2016;11(9):e0162364. doi: 10.1371/journal.pone.0162364 27606817

28. Bai X, Zhang F, Hou J, Lee I, Kong X, Tolba A et al. Quantifying the impact of scholarly papers based on higher-order weighted citations. PLOS ONE. 2018;13(3):e0193192. doi: 10.1371/journal.pone.0193192 29596426

29. Chen C. Predictive effects of structural variation on citation counts. Journal of the American Society for Information Science and Technology. 2011;63(3):431–449.

30. Li J, Willett P. ArticleRank: a PageRank-based alternative to numbers of citations for analysing citation networks. Aslib Proceedings. 2009;61(6):605–618.

31. Li Z, Peng QK, Liu C. Two citation-based indicators to measure latent referential value of papers. Scientometrics. 2016;108(3):1299–1313.

32. Yan E, Ding Y, Sugimoto C. P-Rank: An indicator measuring prestige in heterogeneous scholarly networks. Journal of the American Society for Information Science and Technology. 2011;62(3):467–477.

33. Singh A, Shubhankar K, Pudi V. An efficient algorithm for ranking research papers based on citation network. In 3rd Conference on Data Mining and Optimization (DMO), Putrajaya, Malaysia, 2011 (pp. 88–95). IEEE.

34. Walker D, Xie H, Yan K, Maslov S. Ranking scientific publications using a model of network traffic. Journal of Statistical Mechanics: Theory and Experiment. 2007;2007(06):P06010–P06010.

35. Su C, Pan Y, Zhen Y, Ma Z, Yuan J, Guo H et al. PrestigeRank: A new evaluation method for papers and journals. Journal of Informetrics. 2011;5(1):1–13.

36. Ma N, Guan J, Zhao Y. Bringing PageRank to the citation analysis. Information Processing & Management. 2008;44(2):800–810.

37. Qiao H, Wang Y, Liang Y. A value evaluation method for papers based on improved PageRank algorithm. In 2nd International Conference on Computer Science and Network Technology (ICCSNT), Changchun, China, 2012 (pp. 2201–2205). IEEE.

38. Senanayake U, Piraveenan M, Zomaya A. The Pagerank-Index: Going beyond Citation Counts in Quantifying Scientific Impact of Researchers. PLOS ONE. 2015;10(8):e0134794. doi: 10.1371/journal.pone.0134794 26288312

39. Spitz A, Horvát E. Measuring Long-Term Impact Based on Network Centrality: Unraveling Cinematic Citations. PLoS ONE. 2014;9(10):e108857. doi: 10.1371/journal.pone.0108857 25295877

40. Wang J, Thijs B, Glänzel W. Interdisciplinarity and Impact: Distinct Effects of Variety, Balance, and Disparity. PLOS ONE. 2015;10(5):e0127298. doi: 10.1371/journal.pone.0127298 26001108

41. Wang M, Ren J, Li S, Chen G. Quantifying a Paper’s Academic Impact by Distinguishing the Unequal Intensities and Contributions of Citations. IEEE Access. 2019;7:96198–96214.

42. Kwon S, Solomon G, Youtie J, Porter A. A measure of knowledge flow between specific fields: Implications of interdisciplinarity for impact and funding. PLOS ONE. 2017;12(10):e0185583. doi: 10.1371/journal.pone.0185583 29016631

43. Wang M, Li S, Chen G. Detecting latent referential articles based on their vitality performance in the latest 2 years. Scientometrics. 2017;112(3):1557–1571.

44. Wang M, Yu G, An S, Yu D. Discovery of factors influencing citation impact based on a soft fuzzy rough set model. Scientometrics. 2012;93(3):635–644.

45. Wang M, Yu G, Xu J, He H, Yu D, An S. Development a case-based classifier for predicting highly cited papers. Journal of Informetrics. 2012;6(4):586–599.

46. Lee J, Kim DW. Feature selection for multi-label classification using multivariate mutual information. Pattern Recogn. Lett. 2013;34(3):349–357.

47. Han M, Ren W, Liu X. Joint mutual information-based input variable selection for multivariate time series modeling. Engineering Applications of Artificial Intelligence. 2015;37:250–257.

48. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst. 2013;26:3111–3119.

49. Yu J, Zhang B, Kuang Z, Lin D, Fan J. iPrivacy: Image Privacy Protection by Identifying Sensitive Objects via Deep Multi-Task Learning. IEEE Transactions on Information Forensics and Security. 2017;12(5):1005–1016.

50. Socher R, Chen DQ, Manning CD, Ng AY. Reasoning with neural tensor networks for knowledge base completion. Adv Neural Inf Process Syst. 2013:926–934.

51. Socher R, Perelygin A, Wu JY, Chuang J. Recursive deep models for semantic compositionality over a sentiment treebank. Proceedings of the Conference on Empirical Methods in Natural Language Processing, Washington, USA. 2013:1631–1642.

52. Zhang WN, Ming ZY, Zhang Y, Nie LQ, Liu T, Chua TS. The use of dependency relation graph to enhance the term weighting in question retrieval. COLING. 2012:3105–3120.

53. Zhang WN, Ming ZY, Zhang Y, Liu T, Chua TS. Capturing the semantics of key phrases using multiple languages for question retrieval. IEEE Transactions on Knowledge and Data Engineering. 2016;28(4):888–900.

54. Zhang WN, Zhang Y, Liu T. A topic inference based translation model for question retrieval in community-based question answering services. Chinese Journal of Computers. 2015;38(2):313–321.

55. Zhang C, Zhang L, Wang CJ, Xie JY. Text summarization based on sentence selection with semantic representation. IEEE 26th International Conference on Tools with Artificial Intelligence (ICTAI), Limassol, Cyprus. 2014:584–590.

56. Zhang WN, Liu T, Yin QY, Zhang Y. Neural recovery machine for Chinese dropped pronoun. Frontiers of Computer Science. 2019;13(5):1023–1033.

57. Zhang WN, Zhu QF, Wang YF, Zhao YY, Liu T. Neural personalized response generation as domain adaptation. World Wide Web-internet and Web Information Systems. 2019;22(4):1427–1446.

58. Liu T, Zhang WN, Zhang Y. SocialRobot: a big data-driven humanoid intelligent system in social media services. Multimedia Systems. 2016;22(1):17–27.

59. Yin QY, Zhang WN, Zhang Y, Liu T. A joint model for ellipsis identification and recovery. Journal of Computer Research and Development. 2015;52(11):2460–2467.

60. Le QV, Mikolov T. Distributed representations of sentences and documents. Proceedings of the 31st International Conference on Machine Learning; Beijing, China. 2014:1188–1196.

61. Huang S, Chang J, Leng G, Huang Q. Integrated index for drought assessment based on variable fuzzy set theory: A case study in the Yellow River basin, China. Journal of Hydrology. 2015;527:608–618.

62. Moed HF. Citation analysis in research evaluation. Dordrecht: Springer; 2005. p. 130.

63. Gläser J, Glänzel W, Scharnhorst A. Same data—different results? Towards a comparative approach to the identification of thematic structures in science. Scientometrics. 2017;111(2):981–998.

64. Aksnes D. Characteristics of highly cited papers. Research Evaluation. 2003;12(3):159–170.

65. Yu G, Yu T, Wang L. Assessing influence of scientific articles based on feature spaces of citations. International Conference on Management Science and Engineering (ICMSE 2016). Switzerland, Alten. 2016:41–48.