On the correlation between bibliometric indicators and rankings of conferences and researchers



Keywords: scientometrics, ranking science, informetrics

Abstract

This paper aims to find correlations between bibliometric indicators traditionally used in research evaluation (e.g., citation counts, h-index, g-index) and the perceived reputation of researchers and conferences. The empirical results show that while these indicators are, among other objective criteria, essential features in the reputation of conferences, they are not discriminant features in the reputation of individual researchers.

Introduction

Most methodologies for assessing research performance nowadays are largely based on the evaluation of bibliometric indicators. These range from very simple citation counts to sophisticated indexes such as the H-index [4] or the G-index [3]. Although these indicators have been widely used recently, there has also been criticism of the bias they introduce into publication behaviour. For example, some authors argue that the H-index favors publishing in bigger scientific domains over smaller ones [7,9]. Additionally, Laloe and Mosseri [7] suggest that in order to maximize metrics such as the H-index and G-index, authors should focus on mainstream research topics rather than on more revolutionary work, even though the latter may bring more benefits in the long term. Such behaviour, which maximizes short-term performance with respect to currently used metrics, may have a negative impact on research in the long term.
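For concreteness, the two indexes mentioned above can be computed from an author's per-paper citation counts as follows (a minimal sketch; this g-index variant is capped at the number of papers, and real implementations differ on ties and zero-padding):

```python
def h_index(citations):
    """H-index [4]: the largest h such that h papers have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

def g_index(citations):
    """G-index [3]: the largest g such that the top g papers together
    have at least g^2 citations (capped at the number of papers)."""
    cites = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cites, start=1):
        total += c
        if total >= i * i:
            g = i
    return g
```

For an author with citation counts [10, 8, 5, 4, 3] this yields h = 4 and g = 5; since the g-index also credits very highly cited papers, it is always at least as large as the h-index on the same data.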

More recently, Jensen et al. [6] used a dataset of more than 600 CNRS scientists to test the power of several indicators, such as the number of publications, the number of citations, and the H-index, to predict scientific promotions. Their experiments showed that none of the analyzed indicators recovered more than half of the promotions correctly.

With respect to the reputation of conferences, Zhuang et al. [12] conducted an empirical study to reverse-engineer rankings of conferences from objectively measurable criteria. Although the paper focused on conference rankings, we can interpret the ranking of a conference as an indicator of its perceived reputation. However, the study was very limited, since it only analyzed the number of articles published at conferences, while many more bibliometric indicators, such as the average number of citations a paper receives or the acceptance rate, are available and used in research evaluation.

Motivated by these examples, we have conducted experiments to study the correlation between bibliometric indicators and the reputation of scientists and conferences. Our preliminary results for researchers closely resemble those of Jensen et al. In the following sections we briefly describe the experiments and preliminary results, as the seed for future work on proposing better methods to assess reputation in science systematically.

Related Work


Bibliometric indicators are currently extensively used in research performance evaluation. They range from very simple citation counts to sophisticated indexes such as the H-index [4] or the G-index [3]. Although these indicators have been widely used in recent years, their potential problems have also been pointed out. For example, some authors argue that the H-index favors publishing in bigger scientific domains over smaller ones [7,9]. Laloe and Mosseri [7] suggest that in order to maximize metrics such as the H-index and G-index, authors should focus on mainstream research topics rather than on more revolutionary work, which has impact in the long term. The results of Shi et al. [9] show that crossing-community, or bridging, citation patterns are high-risk and high-reward, since such patterns are characteristic of both low- and high-impact papers. The same authors conclude that the citation networks of recently published papers are trending toward more bridging and interdisciplinary forms. For conferences this implies that more interdisciplinary conferences should have a higher potential for high impact.

One of the early steps in the automated evaluation of scientific venues was the work of Garfield [13], who proposed a measure for ranking journals called the Impact Factor. The initial version approximated the average number of citations within a year to the articles a journal published during the two preceding years. Based on this early work, a variety of impact factors have been proposed, prominently exploiting the number of citations per article. These approaches measure the popularity of articles but not their prestige. The latter is usually measured by scores similar to PageRank [14], which has been adapted to citation networks in order to rank scientific publications [15,16]. Liu et al. [17] extended the reach of PageRank from pure citation networks to co-authorship networks for ranking scientists. Zhou et al. [18] confirmed through empirical findings that the Impact Factor captures popularity while the PageRank score captures reputation. Sidiropoulos and Manolopoulos [19] presented one of the first approaches to the automated ranking of collections of articles, including conference proceedings, based on analyzing citation networks. The main shortcoming of that paper is that the rankings were not validated against rankings constructed manually by a set of experts in the field. Jensen et al. [6] identified that bibliometric indicators predict promotions of researchers better than random assignment, the best predictor being the H-index [4] followed by the number of published papers; the study analyzed the promotions of about 600 CNRS scientists. Our results confirm that the same principles apply to conferences as well, though the better predictor there is the acceptance rate. Hamermesh and Pfann [20] identified that the number of published papers generally has little impact on reputation, though it helps a scholar change jobs and raises salaries.
We agree with the criticism of Adler et al. [21] that the meaning of bibliometric indicators such as the H-index, when used for evaluating researchers' performance and the impact of publications and venues, is not well understood even though the intuition behind them is clear. Thus any automatically computed ranking based on a simplified model without empirical validation should be used with care, especially when we aim at quantifying intangible properties such as reputation with the help of quantifiable features. More specifically, some articles are highly cited for reasons other than high quality, and some research groups are not reputable despite the volume of their publications or citations. Moed and Visser [17] analyzed the rank correlation between peer ratings and bibliometric indicators of research groups. They found that the bibliometric indicator with the highest rank correlation with the quality peer ratings of the Netherlands academic Computer Science groups is the number of articles in the Expanded WoS database. The authors propose that this can also be interpreted as evidence that the extent to which groups published in refereed international journals and in important conference proceedings (ACM, LNCS, IEEE) has been an important criterion of research quality for the Review Committee.

Reputation in Science

We compare bibliometric indicators traditionally used in research evaluation (e.g., citation counts, h-index, g-index) with the perceived reputation of researchers and conferences, setting the basis for future work on estimating reputation. In the case of conferences, we also study to what extent the acceptance rate can discriminate conference rankings and thereby determine perceived reputation. To perform this comparison, we used three main sources of reputation information:

1. publicly available rankings of conferences that result from subjective selection processes (e.g. evaluation by committees),
2. online surveys asking researchers to rate their peers, and
3. results of contests for researchers' positions that are decided by committees of peers.

Preliminary results of our experiment show only small positive or negative correlations between these rankings, allowing us to conclude that further research is needed along this line, with the goal of obtaining more results and finding better indicators that are closer to perceived reputation. Table 1 below summarizes the correlations between the reputation of researchers and bibliometric indicators, showing that these correlations fall below the range usually considered significant. Details of these results are given in the following section.

For conference rankings, however, we found that acceptance rate and bibliometric indicators are important determinants of rank. We also found that top-tier conferences can be identified with relatively high accuracy from acceptance rates and bibliometric indicators, whereas these same features fail to discriminate between mid-tier and bottom-tier conferences. The former finding indicates that acceptance rate and bibliometric indicators are key determinants of conference reputation in Computer Science.

Experimental Results

Reputation of Researchers

Table 1. Correlation between average rating and bibliometric indicators

Metric            Source          p-value     Kendall tau coefficient
# Publications    DBLP            0.003978     0.3495518
Citations/Paper   RESEVAL         0.002994     0.3597175
H-Index           Palsberg List   0.3119       0.1239786
# Citations       RESEVAL         0.8586       0.02158305
Most Read Pub.    ReaderMeter     0.5328       0.07594937
HR-Index          ReaderMeter     0.8813       0.01863025
GR-Index          ReaderMeter     0.9762       0.003718918
G-Index           RESEVAL         0.7105      -0.04508583
# Bookmarks       ReaderMeter     0.6669      -0.05211244
H-Index           RESEVAL         0.9408      -0.009066217
# Publications    ReaderMeter     0.3575      -0.1115124
# Publications    RESEVAL         0.01978     -0.2826311

As can be seen in the table, none of the metrics achieved a correlation higher than 0.5, the level required to consider the correlation significant. It is also important to note, as a drawback, that due to the lack of data for some metrics the p-values indicate low significance.
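For reference, the tau coefficient reported in Table 1 can be computed from two paired rankings roughly as follows (a minimal tau-a sketch that ignores ties and omits the p-value computation, which a statistics package such as `scipy.stats.kendalltau` would provide):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs.
    x and y are paired lists of scores, e.g. average ratings and a
    bibliometric indicator for the same set of researchers."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:          # pair ordered the same way in both lists
            concordant += 1
        elif s < 0:        # pair ordered oppositely
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Two identically ordered lists give tau = 1.0 and two reversed lists give tau = -1.0, which frames the magnitudes in Table 1 (all below 0.36) as weak associations.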

On the other hand, Table 2 shows the percentage of cases in which the winner of a particular contest within the Italian recruitment process has a lower indicator than the loser (W < L), a higher indicator than the loser (W > L), or the same indicator (W = L). From these results we see that the best predictor is the H-Index, but its effectiveness is below 50%.

Table 2. Bibliometric Indicators performance to forecast Italian Contest results
             H-Index        Citation Count    Cited Publications

W < L    47.1% (98)     56.2% (117)       50.5% (105)
W > L    38.9% (81)     39.4% (98)        47.6% (99)
W = L    13.9% (29)     4.33% (9)         1.92% (4)
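The breakdown in Table 2 can be reproduced from winner/loser indicator pairs along these lines (a sketch on made-up pairs, not the actual contest data):

```python
def contest_breakdown(pairs):
    """Given (winner_value, loser_value) indicator pairs from contests,
    return the share of cases where the winner's indicator is lower
    than, higher than, or equal to the loser's."""
    n = len(pairs)
    lower = sum(1 for w, l in pairs if w < l)
    higher = sum(1 for w, l in pairs if w > l)
    equal = sum(1 for w, l in pairs if w == l)
    return {"W < L": lower / n, "W > L": higher / n, "W = L": equal / n}

# Illustrative h-index pairs only; e.g. (3, 5) means the contest winner
# had h-index 3 and the loser h-index 5.
example = contest_breakdown([(3, 5), (7, 2), (4, 4), (1, 6)])
```

An indicator that perfectly predicted contest outcomes would put 100% of the mass on "W > L"; the roughly even split in Table 2 is what the text means by effectiveness below 50%.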

Reputation of Conferences

The second thread of experiments explored rankings of computer science conferences with respect to the same bibliometric indicators applied to researchers, recalculated to cover all the publications of a particular conference.

First of all, we wanted to understand whether there is a consensus on the rankings of major computer science conferences. We analyzed the correlation between ERA 2010 [2] and the ranking published on Sourav Bhowmick's homepage [1] (multiple versions of this ranking circulate on the Web, while its real origin and ranking methodology are unknown). Table 3 summarizes the correlation between these two rankings. Although the correlation is rather average in general, for some research fields, such as Databases, it is larger. Generally, ERA 2010 ranks conferences higher than the list available on Sourav's Web page [1]. We therefore chose the ERA 2010 ranking for further experiments, to learn whether the citations-per-paper metric is a good predictor of conference rankings. From these results it turns out that a bibliometric indicator such as citations per paper is a proper indicator of perceived reputation. However, it should be noted that the predictive power of bibliometric indicators also depends on the domain of the conference venue.

Table 3. Correlation between conference rankings

Area                                              Correlation
Whole data                                        0.555
Databases                                         0.857
Artificial Intelligence and Related Subjects      0.648
Hardware and Architecture                         0.509
Applications and Media                            0.575
System Technology                                 0.667
Programming Languages and Software Engineering    0.461
Algorithms and Theory                             0.450
Biomedical                                        0.535
Miscellaneous                                     0.109

The next step was to use machine learning to reverse-engineer the conference reputation metric from the given rankings. Since we wanted to extract human-interpretable models from the selected set of features, we used decision tree learning methods and experimented with several algorithms.
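As an illustration of this approach, a single-feature decision stump (OneR-style) over acceptance rate can be sketched as follows. The toy data and the top-tier label "A" are illustrative assumptions, not the paper's dataset; the actual experiments used richer decision tree learners over multiple features.

```python
def learn_stump(conferences):
    """Learn the acceptance-rate threshold that best separates
    top-tier ('A') conferences from the rest.
    `conferences` is a list of (acceptance_rate, tier) pairs."""
    best_threshold, best_correct = None, -1
    for t in sorted({rate for rate, _ in conferences}):
        # predict 'A' iff acceptance rate <= t; count correct predictions
        correct = sum(1 for rate, tier in conferences
                      if (tier == "A") == (rate <= t))
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold, best_correct / len(conferences)

# Hypothetical conferences: (acceptance rate, ERA-style tier)
toy = [(0.15, "A"), (0.18, "A"), (0.30, "B"), (0.40, "B"), (0.55, "C")]
```

On this toy data the stump learns a single human-readable rule of the kind the paper is after ("acceptance rate below a threshold implies top tier"), which is exactly why interpretable tree learners were chosen over black-box models.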

The experimental results confirm that acceptance rate is generally the major objective criterion for identifying a conference's rank in Computer Science. More specifically, a classifier learned to rank a conference using its acceptance rate alone provides an f-measure of 0.72 in the case of ERA 2010 and 0.48 in the case of Rank X. This contrasts with the significantly lower f-measure of 0.48 for both Rank X and ERA 2010 when only bibliometric indicators are used. Although the acceptance rate of a conference is the best predictor of its rank, some increase in f-measure can be achieved by using acceptance rate and bibliometric indicators together: f-measures of 0.75 and 0.55 can be achieved for ERA 2010 and Rank X, respectively. These numbers clearly show a strong reflection of bibliometric indicators and acceptance rates in conference rankings.

Data Analysis

For conference rankings we used the following data sources:

1. Rank X: http://www3.ntu.edu.sg/home/assourav/crank.htm (mirrored with some modifications at http://dsl.serc.iisc.ernet.in/publications/CS_ConfRank.htm) - a list of Computer Science conferences containing (at the time of retrieval in October 2010) 527 entries, giving conference acronyms, names, rankings and the subdiscipline of Computer Science;
2. ERA 2010: http://www.arc.gov.au/era/era_2010.htm - the ERA 2010 ranking of conferences and journals compiled by the Australian Research Council. The listed Computer Science conferences are ranked into three tiers (from top to bottom): A, B, and C. These lists are the result of a consultation across all Computer Science departments in Australia: researchers propose that a conference be classified as A, B or C, and these proposals are sent to a committee which approves the tier of a conference based on majority consensus, so in a way the ranking is based on a voting procedure. More details on the ERA ranking have been presented by Vanclay [12], together with some criticism of its journal rankings.

While the first ranking can be seen as a sort of ad hoc community-driven ranking with no published evaluation criteria, the second is a national one with well-documented ranking guidelines.

Acceptance rates and other features were extracted in October 2010 from the following sources:

- http://wwwhome.cs.utwente.nl/~apers/rates.html - database conference statistics from Peter Apers' stats page;
- http://www.cs.wisc.edu/~markhill/AcceptanceRates_and_PCs.xls - architecture conference statistics for conferences such as ISCA, Micro, HPCA and ASPLOS by Prichard, Scopel, Hill, Sohi, and Wood;
- http://people.engr.ncsu.edu/txie/seconferences.htm - software engineering conference statistics by Tao Xie;
- http://www.cs.ucsb.edu/~almeroth/conf/stats/ - networking conference statistics by Kevin C. Almeroth;
- http://web.cs.wpi.edu/~gogo/hive/AcceptanceRates/ - statistics for conferences in graphics/interaction/vision by Rob Lindeman;
- http://faculty.cs.tamu.edu/guofei/sec_conf_stat.htm - computer security conference statistics by Guofei Gu.

For retrieving bibliometric data, such as the number of papers published in conference proceedings and the overall number of citations to conference papers, we used Microsoft Academic Search (http://academic.research.microsoft.com/). Altogether we retrieved data for 2511 Computer Science conferences.

To measure to what extent objective criteria are reflected in conference rankings, we reverse-engineered the selected rankings using machine learning methods, with the available objective measures as features. From that perspective the weighted-average f-measure, used for measuring the performance of learned classifiers in machine learning, is the objective measure we chose to identify to what extent certain features are reflected in the rankings. The f-measure is the harmonic mean of precision and recall and captures both measures in a compact manner. We experimented with several learning algorithms, using the following combinations of features over the 6 datasets we compiled:

- conference statistics (average number of submissions over time, average number of accepted papers over time, average acceptance rate over time) plus rankings (both ERA 2010 and Rank X);
- bibliometric indicators (the overall number of articles, citations, and citations per article) plus conference ranking only (both ERA 2010 and Rank X);
- conference statistics together with bibliometric indicators (both ERA 2010 and Rank X).

We extracted classification rules with the following learning methods: ZeroR, IB1, J48, LADTree, BFTree, NaiveBayes, NaiveBayesMultinomial, NaiveBayesUpdateable, OneR, RandomForest, and RandomTree. Generally the best learning method was RandomTree, giving classification precision of up to 75%. The latter can be interpreted as the degree to which a highly reputable conference can be identified from its acceptance rate and bibliometric indicators alone.
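The weighted-average f-measure used above can be sketched as follows (a minimal stdlib-only implementation; machine learning toolkits provide equivalent metrics out of the box):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Weighted-average F-measure: per-class F1 (harmonic mean of
    precision and recall), averaged with weights proportional to each
    class's frequency in the true labels."""
    support = Counter(y_true)
    total = 0.0
    for c in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        total += f1 * support[c] / len(y_true)
    return total
```

For example, with true tiers ["A", "A", "B", "B"] and predictions ["A", "B", "B", "B"], the per-class F1 scores (2/3 for A, 4/5 for B) average to 11/15, about 0.73, which is roughly the level the learned classifiers reached for ERA 2010.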

Discussion and Conclusions

In this paper we presented our results on reverse-engineering the perceived reputation of conferences, with the aim of revealing to what extent existing conference rankings reflect objective criteria, specifically submission and acceptance statistics and bibliometric indicators. We used conference rankings as a proxy for perceived reputation and used machine learning to discover rules that identify conference rankings in terms of bibliometric indicators and acceptance rates. It turns out that the acceptance rate of a conference is generally the best predictor of its reputation for top-tier conferences. However, a combination of acceptance rates and bibliometric indicators, more specifically the number of citations to articles in conference proceedings and the citations-per-article count, gives even better results for identifying top-tier conferences in both the community-driven and the national ranking.

We also found empirical evidence that acceptance rates and bibliometric indicators are good features for separating top-tier conferences from the rest, whereas these features are of little help in distinguishing middle-tier from bottom-tier conferences. This might indicate that other, intangible features or subjective opinions explain the rankings of conferences that are not top-tier. Another explanation could be that perceived reputation simply divides conferences into top-tier and other conferences.

A recent study of the major database conferences and journals shows that many citations reach back at least five years [22]. Thus citation statistics take time to accumulate, and we probably have to address this aspect in our future studies. As future work we would like to run the experiments with a wider array of features, such as conference location, season, etc. Our current intuition is that in this way a better classifier for distinguishing middle-tier from bottom-tier conferences could be built. Additionally, we would like to learn more about the dynamics of conferences in order to predict the perceived reputation of newly established conferences.

Our experiments also revealed that while the mentioned indicators are essential features, among other objective criteria, in the reputation of conferences, they are not discriminant features in the reputation of individual researchers. Namely, preliminary results of our experiment show only small positive or negative correlations between the reputation of researchers and bibliometric indicators. This allows us to conclude that further research is needed to find better indicators that are closer to the perceived reputation of individual researchers.



1. S. Bhowmick. Computer science conference rankings. http://www3.ntu.edu.sg/home/assourav/crank.htm.

2. Australian Research Council. ERA - Excellence in Research for Australia initiative. http://www.arc.gov.au/era/era_2010.htm.

3. L. Egghe. An improvement of the h-index: the g-index. ISSI Newsletter, 2(1):8-9, 2006.

4. J. E. Hirsch. An index to quantify an individual's scientific research output. Proceedings of the National Academy of Sciences, 102(46):16569-16572, 2005.

5. J. Palsberg. The h index for computer science. http://www.cs.ucla.edu/~palsberg/h-number.html.

6. P. Jensen, J.-B. Rouquier, Y. Croissant. Testing bibliometric indicators by their prediction of scientists' promotions. Scientometrics 78 (3) (2009) 467-479.

7. F. Laloe, R. Mosseri. Bibliometric evaluation of individual researchers: not even right... not even wrong! Europhysics News 40 (5) (2009) 26-29.

8. MIUR - Ministero dell'Istruzione, dell'Università e della Ricerca. Il sito pubblico delle valutazioni comparative per il reclutamento dei professori e dei ricercatori universitari. http://reclutamento.miur.it/index.html.

9. X. Shi, J. Leskovec, and D.A. McFarland. Citing for High Impact. Arxiv preprint arXiv:1004.3351, 2010.

10. http://project.liquidpub.org/groupcomparison/

11. http://www.informatik.uni-trier.de/~ley/db/indices/atree/prolific/index.html

12. J. K. Vanclay. An evaluation of the Australian Research Council's journal ranking. Journal of Informetrics 5 (2) (2011) 265-274.

13. E. Garfield, Citation analysis as a tool in journal evaluation, American Association for the Advancement of Science, 1972.

14. L. Page, S. Brin, R. Motwani, T. Winograd, The PageRank citation ranking: bringing order to the web, Tech. rep., Stanford Digital Library Technologies Project (1998).

15. N. Ma, J. Guan, Y. Zhao, Bringing PageRank to the citation analysis, Information Processing & Management 44 (2) (2008) 800-810.

16. P. Chen, H. Xie, S. Maslov, S. Redner, Finding scientific gems with Google's PageRank algorithm, Journal of Informetrics 1 (1) (2007) 8-15.

17. X. Liu, J. Bollen, M. L. Nelson, H. Van de Sompel, Co-authorship networks in the digital library research community, Information Processing & Management 41 (6) (2005) 1462-1480.

18. D. Zhou, S. A. Orshanskiy, H. Zha, C. L. Giles, Co-ranking authors and documents in a heterogeneous network, in: Seventh IEEE International Conference on Data Mining, ICDM 2007, IEEE, 2008, pp. 739-744.

19. A. Sidiropoulos, Y. Manolopoulos, A new perspective to automatically rank scientific conferences using digital libraries, Information Processing & Management 41 (2) (2005) 289-312.

20. D. S. Hamermesh, G. A. Pfann, Markets for reputation: evidence on quality and quantity in academe, SSRN eLibrary. URL http://ssrn.com/paper=1533208

21. R. Adler, J. Ewing, P. Taylor, Citation statistics, Tech. rep., Joint IMU/ICIAM/IMS Committee on Quantitative Assessment of Research (2008). URL http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf

22. E. Rahm, A. Thor, Citation analysis of database publications, ACM Sigmod Record 34 (4) (2005) 48-53.



Moderator(s): Gloria Origgi , Judith Simon , Jordi Sabater-Mir 
  • Perceived reputation, objective reputation and local biases (5 contributions)
    Gloria Origgi, Nov 23 2010 09:02 UTC
    I find the data you presented extremely interesting, but I am a bit puzzled about the general principles we may extrapolate from them. First of all, we all know very well that there are discrepancies between the perceived reputation of a researcher and his or her exposure in the citation indexes (see for example Hamermesh and Pfann, 2009). But the discrepancy may be generated for good reasons or for bad ones. Among the good reasons there may be a general perception of the contribution of a researcher or a colleague to our intellectual life that goes beyond the strict assessment of his or her measurable achievements. Among the bad reasons there may be national biases in the recruitment process and in career promotions that have to do with local social norms that would be unacceptable by global standards. This is the case, for example, of Italy and France, in which researchers' publications are practically not considered for careers, and local connections and recommendations matter much more. I hope that the evidence of the discrepancies between objective and subjective reputations can help us to go beyond the pure objective criteria and also get rid of the biases and injustices that block careers in many countries, not only in Southern Europe.
    • Perceived reputation vs bibliometric indices (4 replies)
      Peep Kungas, Nov 24 2010 08:40 UTC
      I completely agree that there are discrepancies between the perceived reputation of a researcher and his/her academic performance in terms of bibliometric indices such as the H-index. The thread of research reported in this article was partly initiated by the fact that the major bibliometric indicators, such as the H-index, the number of articles, etc., are widely used for measuring academic performance without any empirical evidence that they really suit the purpose.

      In this light our results clearly show that bibliometric indicators are not universally suitable. Another question is how to measure reputation. In this article we used the competition data, which in principle gave us a partial ranking of researchers through competition results. The main assumption was that there is a strong correlation between this ranking and reputation. However, the relation between the two needs further analysis.

      I like your distinction between subjective and objective criteria, though in practice the objective criteria seem to be narrowed down to bibliometric indicators, and this might be a clear threat to R&D in the longer term. For instance, Laloe and Mosseri [7] suggest that in order to maximize metrics such as the H-index and G-index, authors should focus on mainstream research topics rather than on more revolutionary work. Thus, by favoring pure bibliometric indicators as performance metrics, we might end up biasing scientific exploration towards what I would like to call "the local maxima of the research question space", and this has a negative impact in the longer term.

      Regarding the subjective criteria, we did not find a good data source for acquiring such data and thus dropped the idea. We probably have to return to this thread, and any help in getting this data is highly appreciated.
      • Subjective criteria (3 replies)
        Gloria Origgi, Nov 24 2010 15:37 UTC
        I am aware of two qualitative works on subjective criteria that may be relevant for you, even if they are based on qualitative sociological analysis. One is the recent book by Harvard sociologist Michèle Lamont, "How Professors Think" (Harvard UP, 2009), in which she analyses 12 panels of experts in the humanities and social sciences and extrapolates subjective criteria for decision-making in each discipline.

        Another is by Veronica Boix-Mansilla (psychologist at the Harvard Center for Education, Project Zero) on the assessment of symptoms of quality in the case of innovative and multidisciplinary work: "Assessing Expert Interdisciplinary Work at the Frontier", in G. Laudel, G. Origgi (2006), "Assessing Interdisciplinary Research", Journal of Research Evaluation, vol. 15.
        • Don't base anything on Italian competitions! (1 reply)
          Mario Paolucci, Dec 6 2010 09:16 UTC
          I totally agree with Gloria - and I thank her for the pointers to these interesting publications.

          when you write "with the goal of having a more results and finding better indicators that are closer to perceived reputation."

          you make an implicit value judgment - you seem to assume that the perceived reputation is "right". You have to demonstrate this. Especially in Italy, your test case, there are known cases of full professorships granted to people with an h-index of 3. The common opinion in Italy, though, is that the h-index reflects reality much better than the tenure decisions, not the other way around.

          Peep hits the main issue with the comment on objective criteria. But one cannot just deduce that "bibliometric indicators are not universally suitable", especially because this will be used against their application at large. What one should say is that indicators ARE reliable when applied with some corrective measures, and these corrective measures are what we should be looking for.
          • Subjective Criteria (no reply)
            Cristhian Parra Trepowski, Dec 10 2010 17:09 UTC
            You are right. When we wrote the abstract we were actually too focused on the issue of finding the correlation, in order to see whether there was room for exploration. For this reason we stated first that the goal was to find new measures that better approximate reputation.

            Later on we refined our goal, since we understood that what is really interesting here is to understand how objective and subjective criteria are actually used to run assessments. This could be expressed as understanding the nature of reputation in research, and having models and algorithms that describe this reputation in terms of a mix of objective and subjective criteria.

            The results of the correlation give us some motivation to explore this topic further, until we reach a proper understanding of how reputation is actually computed in the realm of research.
        • Subjective Criteria (no reply)
          Cristhian Parra Trepowski, Dec 10 2010 17:04 UTC
          Thanks Gloria for the pointers to these relevant works.

          Indeed, the ultimate goal of this work is actually to understand the nature of reputation in research.
          In particular, we have been focusing our work on the domain of Computer Science, but this could be extended to a broader set of domains.

          The works you mentioned are extremely relevant, because the ultimate goal is to derive what these criteria are in the heads of evaluators and how they are actually used when evaluators assess.

          Ultimately, having this understanding could help us design better methods of assessment that are closer to the way evaluation actually works in people's heads. Intuitively, we call this "reputation", but maybe in the future we will see that this goes beyond the concepts of reputation and trust.
  • Bibliometric indicators and acceptance rates as good predictors of perceived reputation and quality of conferences (no contribution)
    Peep Kungas, Nov 23 2010 17:52 UTC
    Our recent findings indicate a strong correlation between the bibliometric indicators of conferences and their rankings. This tendency seems particularly prevalent in national rankings such as the ERA 2010 ranking used by Australian institutions.

    Given the preceding, the philosophical question is whether current conference/workshop rankings are explicitly based on bibliometric indicators and acceptance rates, or whether these metrics are simply natural predictors of the perceived reputation of such forums. Are there any findings available supporting either of these two claims?
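The kind of correlation check discussed here could be run, for example, as a Spearman rank correlation between a conference's bibliometric indicator and its position in a ranking tier; a minimal sketch, where the indicator values and rank tiers below are invented purely for illustration:

```python
# Spearman rank correlation between a bibliometric indicator and a
# ranking position, using only the standard library.

def ranks(values):
    """Rank each value (1 = smallest); ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over the run of tied values
        avg_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical conferences: citations per paper vs. ranking tier (1 = best).
citations_per_paper = [12.4, 9.8, 7.1, 4.3, 2.0]
rank_tier           = [1,    2,   2,   3,   4]

rho = spearman(citations_per_paper, rank_tier)
# rho close to -1 here: more citations go with a numerically lower
# (i.e. better) tier, which is what a "strong correlation" would look like.
```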
  • Perceived reputation in different subfields (1 contribution)
    Martina Franzen, Dec 9 2010 11:22 UTC
    I would like to learn more about the methodology of the online survey to measure perceived reputation. To me it seems very difficult to measure, just because it is difficult to decide, who are actually the peers? Could you add some information on your sampling (discipline, subfield, area of research - together with the reference list that is not yet displayed)? Furthermore, I think the ratings of reputation are based a lot on acquaintances and social networks. My question is how do the perceptions of scientists' reputation correlate among scientists working in the same area of research?
    • Method (no reply)
      Cristhian Parra Trepowski, Dec 10 2010 17:32 UTC
      Indeed, it is very difficult to collect reputation opinions and then to measure actual reputation.

      I will explain this information better in the abstract.

      In the meantime, I can explain it here.

      - who are actually the peers?
      By peers we mean researchers rating researchers.

      - Could you add some information on your sampling (discipline, subfield, area of research - together with the reference list that is not yet displayed)?

      As for the survey, the candidates are taken from here: http://www.cs.ucla.edu/~palsberg/h-number.html
      We created different surveys for different communities. You can check them here: http://reseval.org/survey

      All communities are in the domain of computer science and the candidates for each community are calculated as follows:
      * the 20 researchers "closest" to the XXXX community, based on the co-authorship distance computed from DBLP data, and
      * another 20 researchers selected at random
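The co-authorship distance used for this selection can be read as shortest-path length in a co-authorship graph. A minimal sketch with breadth-first search; the toy graph below is invented for illustration and is not DBLP data:

```python
from collections import deque

# Toy co-authorship graph: an edge means the two researchers share a paper.
coauthors = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}

def coauthor_distance(graph, source, target):
    """BFS hop count from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# The 20 "closest" researchers to a seed author could then be taken as
# sorted(candidates, key=lambda r: coauthor_distance(coauthors, seed, r))[:20]
```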

      As for the Italian contests, the candidates are multidisciplinary, ranging from Computer Science to Economics or Biology. All the data was extracted from the Italian Recruitment Site.

      - how do the perceptions of scientists' reputation correlate among scientists working in the same area of research?

      This is the next step in our research. We first started by asking how bibliometrics are correlated with reputation. Then we would like to understand how the reputation changes from field to field.
  • Relating objective and subjective measures (no contribution)
    Darren L. Dahly, Dec 12 2010 10:17 UTC
    I really appreciate this line of inquiry. I think you will be preaching to the converted with this group, but it's nice to have some evidence that we should not be making important decisions based solely on citation metrics. Most of the people I work with understand this, but some do not.

    As others have noted there is room for both objective and subjective measures when evaluating a particular researcher, while focusing on one or the other will lead to problems. A researcher can obviously have many good qualities that are not captured in citation measures, but relying on purely subjective criteria opens the door for bias, politics, etc.

    There is no real surprise that the objective and subjective measures do not agree; nor do I think this is a problem. I would be more interested to see where and how the objective and subjective measures differ. I would also be interested to see the various citation measures compared to other objectively measured indicators that perhaps capture different aspects of research impact. A few that come to mind are: number of PhD students supervised, amount of research grant income, or number of different co-authors. It also seems that more complex analyses of citation networks will play some role in the future.

    Clearly the reason for evaluation is also important (in addition to the specific field). A decision to hire someone is going to be made on different (but overlapping) criteria than those used to award funding for a piece of research. To further refine the goals you set out to accomplish I think you will need to place them in a very specific context in order to set out some kind of theoretical basis to frame your specific questions.

    The question "What is reputation and what would I do with it?" is relevant. In some respects I think the strength of this line of questioning is not to find the best way of evaluation (because I don't think that exists in any kind of generalizable way), but rather to show the limitations of these metrics.