Rxivist indexes articles from bioRxiv, a free preprint server operated by Cold Spring Harbor Laboratory. This package is a client for the Rxivist API and can be used to access:

  • Metadata on bioRxiv articles indexed by Rxivist
  • Information about the authors of those articles
  • Usage statistics (e.g. the number of downloads for a specific paper)

Installation

To install the rxivistr package from CRAN, run:

install.packages("rxivistr")

…or install it from GitHub:

devtools::install_github("ikodvanj/rxivistr")

Using rxivistr

Load the package with the library() function:

library(rxivistr)

The package contains the following functions:

  • rxivist_search - retrieves articles matching a search description
  • article_details - retrieves data about a single paper and all of its authors
  • article_downloads - retrieves monthly download statistics for an article
  • authors_rank - retrieves the top 200 authors in the specified category
  • author - provides information about the specified author
  • category_list - retrieves a list of all categories
  • rxivist_stats - retrieves basic statistics about the number of articles indexed by Rxivist

The sections below show example calls and their output.

article_details

At the time of writing this vignette, the most downloaded article had the id 72514. The following call retrieves information about this article:

res <- article_details(72514)
dplyr::glimpse(res)
#> List of 11
#>  $ id          : chr "72514"
#>  $ doi         : chr "10.1101/2020.01.30.927871"
#>  $ first_posted: chr "2020-01-31"
#>  $ biorxiv_url : chr "https://www.biorxiv.org/content/10.1101/2020.01.30.927871v2"
#>  $ url         : chr "https://api.rxivist.org/v1/papers/72514"
#>  $ title       : chr "Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag"
#>  $ category    : chr "evolutionary-biology"
#>  $ abstract    : chr "This paper has been withdrawn by its authors. They intend to revise it in response to comments received from th"| __truncated__
#>  $ authors     :'data.frame':    9 obs. of  4 variables:
#>   ..$ id         : int [1:9] 580441 580442 580443 580444 580445 582554 295517 580447 580448
#>   ..$ name       : chr [1:9] "Prashant Pradhan" "Ashutosh Kumar Pandey" "Akhilesh Mishra" "Parul Gupta" ...
#>   ..$ institution: chr [1:9] "Acharya Narendra Dev College, University of Delhi" "Kusuma School of biological sciences, Indian institute of technology" "Kusuma School of biological sciences, Indian institute of technology" "Kusuma School of biological sciences, Indian institute of technology" ...
#>   ..$ orcid      : chr [1:9] NA NA NA "http://orcid.org/0000-0002-0190-8753" ...
#>  $ ranks       :List of 4
#>   ..$ alltime  :List of 4
#>   .. ..$ downloads: int 962296
#>   .. ..$ rank     : int 1
#>   .. ..$ out_of   : int 99794
#>   .. ..$ tie      : logi FALSE
#>   ..$ ytd      :List of 4
#>   .. ..$ downloads: int 962296
#>   .. ..$ rank     : int 1
#>   .. ..$ out_of   : int 99794
#>   .. ..$ tie      : logi FALSE
#>   ..$ lastmonth:List of 4
#>   .. ..$ downloads: int 14987
#>   .. ..$ rank     : int 3
#>   .. ..$ out_of   : int 99794
#>   .. ..$ tie      : logi FALSE
#>   ..$ category :List of 4
#>   .. ..$ downloads: int 962296
#>   .. ..$ rank     : int 1
#>   .. ..$ out_of   : int 5736
#>   .. ..$ tie      : logi FALSE
#>  $ publication : Named list()
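
Because article_details returns a nested list, individual fields can be reached by chaining `$`. The sketch below rebuilds a small part of the structure printed above by hand, so it runs without network access; `res_example` is a hypothetical stand-in for the real return value and contains only the first four authors and two of the ranking periods.

```r
# Hand-built stand-in for part of the list printed above (not a live API call).
res_example <- list(
  id = "72514",
  category = "evolutionary-biology",
  authors = data.frame(
    id = c(580441L, 580442L, 580443L, 580444L),
    name = c("Prashant Pradhan", "Ashutosh Kumar Pandey",
             "Akhilesh Mishra", "Parul Gupta"),
    stringsAsFactors = FALSE
  ),
  ranks = list(
    alltime   = list(downloads = 962296L, rank = 1L, out_of = 99794L, tie = FALSE),
    lastmonth = list(downloads = 14987L,  rank = 3L, out_of = 99794L, tie = FALSE)
  )
)

# Nested elements are reached by chaining `$`.
alltime_downloads <- res_example$ranks$alltime$downloads

# Collect one field (here: rank) across the ranking periods into a named vector.
rank_by_period <- vapply(res_example$ranks, function(p) p$rank, integer(1))

# The authors element is an ordinary data frame.
first_author <- res_example$authors$name[1]
```

With a real result from article_details, the same expressions should apply to `res` directly, assuming the structure matches the output shown above.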

article_downloads

To inspect the monthly downloads and views of an article, use the article_downloads function:

article_downloads(72514)
#>   month year downloads  views
#> 1     1 2020    564379  93404
#> 2     2 2020     96925 105222
#> 3     3 2020    135194 138613
#> 4     4 2020    101153 144663
#> 5     5 2020     34235  45337
#> 6     6 2020     15423  17637
#> 7     7 2020     10178  17688
#> 8     8 2020      4809  10117
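
The returned data frame can be summarised with base R. The sketch below copies the values printed above into a local data frame (`downloads_example` is a hypothetical name, used so the code runs offline) and computes a running total, the overall total, and the month with the most downloads.

```r
# Monthly figures copied from the output above (not a live API call).
downloads_example <- data.frame(
  month = 1:8,
  year = 2020L,
  downloads = c(564379L, 96925L, 135194L, 101153L, 34235L, 15423L, 10178L, 4809L),
  views = c(93404L, 105222L, 138613L, 144663L, 45337L, 17637L, 17688L, 10117L)
)

# Running total of downloads over time.
downloads_example$cumulative <- cumsum(downloads_example$downloads)

# Total downloads across all listed months.
total_downloads <- sum(downloads_example$downloads)

# Row (month and year) with the most downloads.
peak <- downloads_example[which.max(downloads_example$downloads), c("month", "year")]
```

For this article the monthly downloads add up to 962296, the same all-time figure that article_details reports for id 72514.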

authors_rank

To retrieve the top 200 authors by number of article downloads, use authors_rank:

res <- authors_rank()
dplyr::glimpse(res)
#> Rows: 200
#> Columns: 5
#> $ id        <int> 295517, 580448, 580441, 580442, 580443, 580444, 580447, 582…
#> $ name      <chr> "James Gomes", "Bishwajit Kundu", "Prashant Pradhan", "Ashu…
#> $ rank      <int> 1, 2, 2, 2, 2, 2, 2, 8, 8, 10, 11, 12, 13, 14, 15, 16, 17, …
#> $ downloads <int> 963907, 963589, 963589, 963589, 963589, 963589, 963589, 962…
#> $ tie       <lgl> FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALS…
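
In the output above, authors with equal download counts share a rank, and the next rank skips ahead (1, 2, 2, ..., 8). That pattern is consistent with "minimum" tie ranking. The sketch below reproduces the rank and tie columns for the first seven download counts, which are the only ones fully visible in the output; whether Rxivist computes its ranks this way is an assumption, not something the API output confirms.

```r
# First seven download counts from the output above (later values are truncated there).
downloads <- c(963907L, 963589L, 963589L, 963589L, 963589L, 963589L, 963589L)

# "min" ties: tied values all receive the smallest rank of their group.
computed_rank <- rank(-downloads, ties.method = "min")

# An entry counts as a tie when its download count occurs more than once.
computed_tie <- downloads %in% downloads[duplicated(downloads)]
```

The computed values match the first seven entries of the rank and tie columns shown above.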

author

The author function retrieves more information about a specific author:

author(295517)
#> $id
#> [1] 295517
#> 
#> $name
#> [1] "James Gomes"
#> 
#> $institution
#> [1] "Kusuma School of biological sciences, Indian institute of technology"
#> 
#> $orcid
#> NULL
#> 
#> $emails
#> [1] "jgomes@bioschool.iitd.ac.in" "jgomes.bioschool@gmail.com" 
#> 
#> $articles
#>      id                       doi
#> 1 72514 10.1101/2020.01.30.927871
#> 2 83161 10.1101/2020.05.07.082768
#> 3 32598            10.1101/414425
#>                                                   biorxiv_url
#> 1 https://www.biorxiv.org/content/10.1101/2020.01.30.927871v2
#> 2 https://www.biorxiv.org/content/10.1101/2020.05.07.082768v2
#> 3            https://www.biorxiv.org/content/10.1101/414425v1
#>                                       url
#> 1 https://api.rxivist.org/v1/papers/72514
#> 2 https://api.rxivist.org/v1/papers/83161
#> 3 https://api.rxivist.org/v1/papers/32598
#>                                                                                                                               title
#> 1                                        Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag
#> 2 Mutation landscape of SARS-CoV-2 reveals five mutually exclusive clusters of leading and trailing single nucleotide substitutions
#> 3                                                                Immune differentiation regulator p100 tunes NF-κB responses to TNF
#>               category ranks.alltime.downloads ranks.alltime.rank
#> 1 evolutionary-biology                  962296                  1
#> 2             genomics                    1293               8032
#> 3      systems-biology                     318              53888
#>   ranks.alltime.out_of ranks.alltime.tie ranks.ytd.downloads ranks.ytd.rank
#> 1                99794             FALSE              962296              1
#> 2                99794             FALSE                1293           1304
#> 3                99794             FALSE                  55          81139
#>   ranks.ytd.out_of ranks.ytd.tie ranks.lastmonth.downloads ranks.lastmonth.rank
#> 1            99794         FALSE                     14987                    3
#> 2            99794         FALSE                       419                  784
#> 3            99794         FALSE                         8                76347
#>   ranks.lastmonth.out_of ranks.lastmonth.tie ranks.category.downloads
#> 1                  99794               FALSE                   962296
#> 2                  99794               FALSE                     1293
#> 3                  99794               FALSE                      318
#>   ranks.category.rank ranks.category.out_of ranks.category.tie
#> 1                   1                  5736              FALSE
#> 2                1216                  5955              FALSE
#> 3                1569                  2425              FALSE
#> 
#> $ranks
#>   downloads  rank out_of   tie             category
#> 1    963907     1 422246 FALSE              alltime
#> 2    962296     1  20691  TRUE evolutionary-biology
#> 3      1293 14583  42763  TRUE             genomics
#> 4       318  8777  11761  TRUE      systems-biology

category_list

This function returns a list of all categories into which articles are classified:

category_list()
#> $results
#>  [1] "animal-behavior-and-cognition"         
#>  [2] "biochemistry"                          
#>  [3] "bioengineering"                        
#>  [4] "bioinformatics"                        
#>  [5] "biophysics"                            
#>  [6] "cancer-biology"                        
#>  [7] "cell-biology"                          
#>  [8] "clinical-trials"                       
#>  [9] "developmental-biology"                 
#> [10] "ecology"                               
#> [11] "epidemiology"                          
#> [12] "evolutionary-biology"                  
#> [13] "genetics"                              
#> [14] "genomics"                              
#> [15] "immunology"                            
#> [16] "microbiology"                          
#> [17] "molecular-biology"                     
#> [18] "neuroscience"                          
#> [19] "paleontology"                          
#> [20] "pathology"                             
#> [21] "pharmacology-and-toxicology"           
#> [22] "physiology"                            
#> [23] "plant-biology"                         
#> [24] "scientific-communication-and-education"
#> [25] "synthetic-biology"                     
#> [26] "systems-biology"                       
#> [27] "zoology"
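
Category names use a lower-case, hyphenated form, so it can help to normalise and check user input before passing it to other functions. The sketch below uses an excerpt of the categories printed above; `normalise_category` is a hypothetical helper and not part of rxivistr.

```r
# Excerpt of the category vector printed above (not a live API call).
known_categories <- c("bioinformatics", "evolutionary-biology", "genomics",
                      "neuroscience", "systems-biology")

# Hypothetical helper: convert free text to the hyphenated lower-case form
# used by the categories above, then check it against the known values.
normalise_category <- function(x, known) {
  candidate <- gsub("[[:space:]]+", "-", tolower(trimws(x)))
  if (!candidate %in% known) {
    stop("Unknown category: ", candidate)
  }
  candidate
}

category <- normalise_category("Systems Biology", known_categories)
```

In practice the known values would come from the results element returned by category_list() rather than from a hand-written vector.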

rxivist_stats

This function returns basic statistics about the articles and authors indexed by Rxivist:

res <- rxivist_stats()
dplyr::glimpse(res)
#> List of 8
#>  $ papers_indexed   : int 99794
#>  $ authors_indexed  : int 422246
#>  $ missing_abstract : int 1
#>  $ missing_date     : int 0
#>  $ outdated_count   :List of 28
#>   ..$ animal-behavior-and-cognition         : int 1436
#>   ..$ biochemistry                          : int 3132
#>   ..$ bioengineering                        : int 2086
#>   ..$ bioinformatics                        : int 8553
#>   ..$ biophysics                            : int 4021
#>   ..$ cancer-biology                        : int 3287
#>   ..$ cell-biology                          : int 4769
#>   ..$ clinical-trials                       : int 99
#>   ..$ developmental-biology                 : int 2741
#>   ..$ ecology                               : int 4036
#>   ..$ epidemiology                          : int 1554
#>   ..$ evolutionary-biology                  : int 5552
#>   ..$ genetics                              : int 4718
#>   ..$ genomics                              : int 5821
#>   ..$ immunology                            : int 2775
#>   ..$ microbiology                          : int 8102
#>   ..$ molecular-biology                     : int 3172
#>   ..$ neuroscience                          : int 16230
#>   ..$ paleontology                          : int 125
#>   ..$ pathology                             : int 536
#>   ..$ pharmacology-and-toxicology           : int 879
#>   ..$ physiology                            : int 1287
#>   ..$ plant-biology                         : int 2863
#>   ..$ scientific-communication-and-education: int 627
#>   ..$ synthetic-biology                     : int 872
#>   ..$ systems-biology                       : int 2365
#>   ..$ zoology                               : int 486
#>   ..$ null                                  : int 49
#>  $ missing_authors  : int 63
#>  $ missing_category : int 4932
#>  $ authors_no_papers: int 420
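
The result is a list that mixes single counts with a nested list of per-category counts. The sketch below copies a few values from the output above into a local list (`stats_example` is a hypothetical name, used so the code runs offline), flattens the per-category counts into a sorted vector, and derives one percentage. The meaning of outdated_count is not described here, so the code only reshapes it and makes no claim about what it measures.

```r
# Selected values copied from the output above (not a live API call).
stats_example <- list(
  papers_indexed = 99794L,
  missing_category = 4932L,
  outdated_count = list(
    bioinformatics = 8553L,
    microbiology = 8102L,
    neuroscience = 16230L,
    paleontology = 125L
  )
)

# Flatten the named list of per-category counts into a named integer vector.
counts <- unlist(stats_example$outdated_count)

# Largest counts first.
counts_sorted <- sort(counts, decreasing = TRUE)

# Percentage of indexed papers without a category.
pct_missing_category <-
  100 * stats_example$missing_category / stats_example$papers_indexed
```

With these figures, a little under 5% of the indexed papers have no category assigned.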