Overview

Row

Workflow

Packages

Row

Datasets

  1. Untidy version of DSjobtracker (DSraw)

  2. World cities

          city  city_ascii      lat       lng       country iso2 iso3
1        Tokyo       Tokyo  35.6850  139.7514         Japan   JP  JPN
2     New York    New York  40.6943  -73.9249 United States   US  USA
3  Mexico City Mexico City  19.4424  -99.1310        Mexico   MX  MEX
4       Mumbai      Mumbai  19.0170   72.8570         India   IN  IND
5    São Paulo   Sao Paulo -23.5587  -46.6250        Brazil   BR  BRA
6        Delhi       Delhi  28.6700   77.2300         India   IN  IND
7     Shanghai    Shanghai  31.2165  121.4365         China   CN  CHN
8      Kolkata     Kolkata  22.4950   88.3247         India   IN  IND
9  Los Angeles Los Angeles  34.1139 -118.4068 United States   US  USA
10       Dhaka       Dhaka  23.7231   90.4086    Bangladesh   BD  BGD
          admin_name capital population         id
1               Tōkyō primary   35676000 1392685764
2           New York           19354922 1840034016
3   Ciudad de México primary   19028000 1484247881
4         Mahārāshtra   admin   18978000 1356226629
5           São Paulo   admin   18845000 1076532519
6              Delhi   admin   15926000 1356872604
7           Shanghai   admin   14987000 1156073548
8        West Bengal   admin   14787000 1356060520
9         California           12815475 1840020491
10             Dhaka primary   12797394 1050529279

Untidy Version

Column

Untidy version of the dataset

# A tibble: 6 x 152
     ID Consultant DateRetrieved DatePublished Job_title Company     R   SAS
  <dbl> <chr>      <chr>         <chr>         <chr>     <chr>   <dbl> <dbl>
1     1 Thiyanga   05/08/2020    <NA>          <NA>      <NA>        1     1
2     2 Jayani     07/08/2020    31/07/2020    Junior D~ Dialog~     1     0
3     3 Jayani     07/08/2020    06/08/20      Engineer~ London~     0     0
4     4 Jayani     07/08/2020    24/07/2020    CI-Stati~ E.D. B~     1     1
5     5 Jayani     07/08/2020    24/07/2020    DA-Data ~ E.D. B~     0     1
6     6 Jayani     07/08/2020    13/08/2020    Data Sci~ Emirat~     1     0
# ... with 144 more variables: SPSS <dbl>, Python <dbl>, MAtlab <dbl>,
#   Scala <dbl>, `C#` <dbl>, `MS Word` <dbl>, `Ms Excel` <dbl>, `OLE/DB` <dbl>,
#   `Ms Access` <dbl>, `Ms PowerPoint` <dbl>, Spreadsheets <dbl>,
#   Data_visualization <dbl>, Presentation_Skills <dbl>, Communication <dbl>,
#   BigData <dbl>, Data_warehouse <dbl>, cloud_storage <dbl>,
#   Google_Cloud <dbl>, AWS <dbl>, Machine_Learning <dbl>, `Deep
#   Learning` <dbl>, Computer_vision <dbl>, Java <dbl>, `C++` <dbl>, C <dbl>,
#   `Linux/Unix` <dbl>, SQL <dbl>, NoSQL <dbl>, RDBMS <dbl>, Oracle <dbl>,
#   MySQL <dbl>, PHP <dbl>, Flash_Actionscript <dbl>, SPL <dbl>,
#   web_design_and_development_tools <dbl>, Wordpress <dbl>, AI <dbl>,
#   `Natural_Language_Processing(NLP)` <dbl>, `Microsoft Power BI` <dbl>,
#   Google_Analytics <dbl>, graphics_and_design_skills <dbl>,
#   Data_marketing <dbl>, SEO <dbl>, Content_Management <dbl>, Tableau <dbl>,
#   D3 <dbl>, Alteryx <dbl>, KNIME <dbl>, Spotfire <dbl>, Spark <dbl>,
#   S3 <dbl>, Redshift <dbl>, DigitalOcean <dbl>, Javascript <dbl>,
#   Kafka <dbl>, Storm <dbl>, Bash <dbl>, Hadoop <dbl>, Data_Pipelines <dbl>,
#   MPP_Platforms <dbl>, Qlik <dbl>, Pig <dbl>, Hive <dbl>, Tensorflow <dbl>,
#   `Map/Reduce` <dbl>, Impala <dbl>, Solr <dbl>, Teradata <dbl>,
#   MongoDB <dbl>, Elasticsearch <dbl>, YOLO <dbl>, `agile execution` <dbl>,
#   Data_management <dbl>, pyspark <dbl>, Data_mining <dbl>,
#   Data_science <dbl>, Web_Analytic_tools <dbl>, IOT <dbl>,
#   Numerical_Analysis <dbl>, Economic <dbl>, Finance_Knowledge <dbl>,
#   Investment_Knowledge <dbl>, Problem_Solving <dbl>, Korean_language <dbl>,
#   `Bash\\Linux Scripting` <dbl>, Knowledge_in <chr>, Experience <chr>,
#   City <chr>, Location <chr>, Educational_qualifications <chr>, Salary <chr>,
#   Team_Handling <dbl>, Debtor_reconcilation <dbl>, Payroll_management <dbl>,
#   Bayesian <dbl>, Optimization <dbl>, `Bahasa Malaysia` <dbl>, `English
#   proficiency` <chr>, URL <chr>, Search_Term <chr>, ...

Column

Glimpse of untidy data

Observations: 551
Variables: 152
$ ID                                 <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1...
$ Consultant                         <chr> "Thiyanga", "Jayani", "Jayani", ...
$ DateRetrieved                      <chr> "05/08/2020", "07/08/2020", "07/...
$ DatePublished                      <chr> NA, "31/07/2020", "06/08/20", "2...
$ Job_title                          <chr> NA, "Junior Data Scientist", "En...
$ Company                            <chr> NA, "Dialog Axiata PLC", "London...
$ R                                  <dbl> 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0,...
$ SAS                                <dbl> 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,...
$ SPSS                               <dbl> NA, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0...
$ Python                             <dbl> 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0,...
$ MAtlab                             <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ Scala                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ `C#`                               <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ `MS Word`                          <dbl> NA, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0...
$ `Ms Excel`                         <dbl> NA, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0...
$ `OLE/DB`                           <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ `Ms Access`                        <dbl> NA, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
$ `Ms PowerPoint`                    <dbl> NA, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0...
$ Spreadsheets                       <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Data_visualization                 <dbl> NA, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0...
$ Presentation_Skills                <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Communication                      <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ BigData                            <dbl> NA, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1...
$ Data_warehouse                     <dbl> NA, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ cloud_storage                      <dbl> NA, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Google_Cloud                       <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ AWS                                <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Machine_Learning                   <dbl> NA, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1...
$ `Deep Learning`                    <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Computer_vision                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
$ Java                               <dbl> NA, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1...
$ `C++`                              <dbl> NA, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
$ C                                  <dbl> NA, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
$ `Linux/Unix`                       <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ SQL                                <dbl> 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1,...
$ NoSQL                              <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ RDBMS                              <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Oracle                             <dbl> NA, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
$ MySQL                              <dbl> NA, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1...
$ PHP                                <dbl> NA, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0...
$ Flash_Actionscript                 <dbl> NA, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0...
$ SPL                                <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ web_design_and_development_tools   <dbl> NA, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0...
$ Wordpress                          <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ AI                                 <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ `Natural_Language_Processing(NLP)` <dbl> NA, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0...
$ `Microsoft Power BI`               <dbl> NA, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0...
$ Google_Analytics                   <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ graphics_and_design_skills         <dbl> NA, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0...
$ Data_marketing                     <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ SEO                                <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Content_Management                 <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Tableau                            <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
$ D3                                 <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
$ Alteryx                            <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ KNIME                              <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Spotfire                           <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Spark                              <dbl> NA, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0...
$ S3                                 <dbl> NA, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
$ Redshift                           <dbl> NA, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
$ DigitalOcean                       <dbl> NA, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
$ Javascript                         <dbl> NA, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
$ Kafka                              <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Storm                              <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Bash                               <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Hadoop                             <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1...
$ Data_Pipelines                     <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ MPP_Platforms                      <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Qlik                               <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Pig                                <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Hive                               <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1...
$ Tensorflow                         <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ `Map/Reduce`                       <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Impala                             <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Solr                               <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Teradata                           <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ MongoDB                            <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Elasticsearch                      <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ YOLO                               <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ `agile execution`                  <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1...
$ Data_management                    <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ pyspark                            <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Data_mining                        <dbl> NA, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0...
$ Data_science                       <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
$ Web_Analytic_tools                 <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ IOT                                <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Numerical_Analysis                 <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Economic                           <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Finance_Knowledge                  <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Investment_Knowledge               <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Problem_Solving                    <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Korean_language                    <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ `Bash\\Linux Scripting`            <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Knowledge_in                       <chr> NA, "Not_define", "Elasticsearch...
$ Experience                         <chr> "4+", "2-3", "1-2", "2+", "Not_d...
$ City                               <chr> NA, "Colombo", "Colombo", "Colom...
$ Location                           <chr> "NY", "SL", "SL", "SL", "SL", "M...
$ Educational_qualifications         <chr> NA, "Degree in Engineering / IT ...
$ Salary                             <chr> NA, "Not_define", "Not_define", ...
$ Team_Handling                      <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Debtor_reconcilation               <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Payroll_management                 <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Bayesian                           <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Optimization                       <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ `Bahasa Malaysia`                  <dbl> NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ `English proficiency`              <chr> NA, "Not_define", "Not_define", ...
$ URL                                <chr> NA, "https://www.google.com/sear...
$ Search_Term                        <chr> NA, "Data Analysis Jobs in Sri L...
$ X109                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X110                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X111                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X112                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X113                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X114                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X115                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X116                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X117                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X118                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X119                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X120                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X121                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X122                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X123                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X124                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X125                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X126                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X127                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X128                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X129                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X130                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X131                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X132                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X133                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X134                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X135                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X136                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X137                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X138                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X139                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X140                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X141                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X142                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X143                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X144                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X145                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X146                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X147                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X148                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X149                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X150                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X151                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
$ X152                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, ...
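
The glimpse ends with a run of logical columns, X109 to X152, whose visible values are all NA. A minimal base R sketch of dropping columns that contain nothing but missing values follows. The toy data frame and the helper name `drop_empty_cols` are illustrative assumptions, not the author's code, and the sketch does not confirm that those columns are empty in every row of the real data.

```r
# Toy stand-in for the raw data: two real columns and two empty ones.
raw <- data.frame(
  ID   = c(1, 2, 3),
  R    = c(1, NA, 0),
  X109 = c(NA, NA, NA),
  X110 = c(NA, NA, NA)
)

# Keep only the columns that hold at least one non-missing value.
drop_empty_cols <- function(df) {
  keep <- vapply(df, function(col) !all(is.na(col)), logical(1))
  df[, keep, drop = FALSE]
}

cleaned <- drop_empty_cols(raw)
names(cleaned)
```

A column with some missing values, such as `R` here, is kept; only columns that are missing throughout are removed.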

Data wrangling workflow

Column

Data wrangling process step by step

Column

Untidy columns

  1. ID

  2. Consultant - To extract complete cases

  3. Software columns - R, Python, SAS, etc.

  4. Job_title

  5. Experience

  6. Location

  7. Educational_qualifications

Column {data-width=350}

ID
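
The two glimpses show ID stored as `<dbl>` in the untidy data and as `<int>` in the tidy data, and the tidy preview lists rows in ID order even though their row names (1, 320, 321, ...) are not sequential. One way to get that result, sketched on hypothetical toy values:

```r
# Toy data with IDs stored as doubles and rows out of order.
raw <- data.frame(
  ID = c(3, 1, 2),
  Consultant = c("C", "A", "B"),
  stringsAsFactors = FALSE
)

# Store the identifier as an integer and sort the rows by it.
raw$ID <- as.integer(raw$ID)
tidy <- raw[order(raw$ID), ]
tidy$ID
```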

Consultant
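
The column list describes Consultant as the column used to extract complete cases, and the glimpses report 551 observations before tidying and 435 after. A sketch of that kind of filter follows, assuming a row counts as complete when Consultant is present and non-empty. That rule is a guess; the toy data is hypothetical.

```r
raw <- data.frame(
  ID = 1:4,
  Consultant = c("Thiyanga", NA, "Jayani", ""),
  stringsAsFactors = FALSE
)

# Keep rows whose Consultant is neither missing nor an empty string.
has_consultant <- !is.na(raw$Consultant) & nzchar(raw$Consultant)
complete_rows <- raw[has_consultant, ]
nrow(complete_rows)
```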

Software
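
Comparing the two previews suggests two changes to the software indicator columns. Missing values become 0 (SPSS is NA for ID 1 in the untidy glimpse and 0 in the tidy one), and column names lose their spaces and symbols (`MS Word` becomes `MS_Word`, `OLE/DB` becomes `OLE_DB`, `C#` becomes `C.`, `C++` becomes `C..`). The base R sketch below reproduces those names on toy data. Whether the author used `make.names()` is an assumption.

```r
# Toy indicator columns with the original, syntactically awkward names.
raw <- data.frame(
  `MS Word` = c(NA, 0, 1),
  `C#`      = c(NA, 0, 0),
  `C++`     = c(NA, 1, 0),
  `OLE/DB`  = c(NA, 0, 0),
  check.names = FALSE
)

# Replace spaces and slashes with underscores, then let make.names()
# turn the remaining invalid characters into dots.
clean <- gsub("[ /]", "_", names(raw))
names(raw) <- make.names(clean)

# Treat a missing indicator as "skill not mentioned".
raw[is.na(raw)] <- 0
names(raw)
```

Recoding NA to 0 is only sensible for 0/1 indicator columns like these; free-text columns should keep their missing values.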

Job_title
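
The tidy preview adds Job_Titles, a lower-case copy of Job_title with commas, hyphens and slashes replaced by spaces, and Job_Category. The cleaning step below reproduces the displayed Job_Titles values. The keyword rules for Job_Category are hypothetical, chosen only to match the rows shown.

```r
job_title <- c(
  NA,
  "Junior Data Scientist",
  "Engineer, Analytics & Data Science",
  "CI-Statistical Analyst/Business Analyst-CMB"
)

# Lower-case the title and replace commas, slashes and hyphens with spaces.
job_titles <- gsub("[,/-]", " ", tolower(job_title))

# Hypothetical keyword rules; anything unmatched (including NA) is "Unimportant".
job_category <- ifelse(
  grepl("data scien", job_titles), "Data Science",
  ifelse(grepl("analyst", job_titles), "Data Analyst", "Unimportant")
)

data.frame(Job_Titles = job_titles, Job_Category = job_category)
```

Because each comma is swapped for a space rather than removed, "Engineer, Analytics" keeps a double space, which is also what the tidy preview shows.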

Experience
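
Experience arrives as free text such as "4+", "2-3" and "Not_define", and the tidy data gains Minimum_Years_of_experience and Experience_Category. The sketch below treats "Not_define" as missing and takes the leading number as the minimum. This rule is an assumption: the tidy preview shows NA for some rows (for example "4+" and "5-7"), so the author's rule evidently differs.

```r
experience <- c("4+", "2-3", "1-2", "Not_define", "5-7")

# Recode the placeholder text to a real missing value.
experience[experience == "Not_define"] <- NA

# Take the leading run of digits, where there is one, as the minimum years.
has_number <- grepl("^[0-9]+", experience)
min_years <- rep(NA_integer_, length(experience))
min_years[has_number] <- as.integer(
  sub("^([0-9]+).*", "\\1", experience[has_number])
)
min_years
```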

Location
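
In the previews, the Location value "SL" becomes "LK" and a Job_Country column appears (United States, Sri Lanka, Malaysia). The sketch below shows one way to derive it, using a small lookup modelled on the `iso2` and `country` columns of the world cities data. The lookup rows, the "SL" to "LK" correction, and the handling of "NY" as a United States location are assumptions based only on the rows shown.

```r
location <- c("NY", "SL", "SL", "Malaysia")

# Tiny stand-in for the iso2/country columns of the world cities data.
countries <- data.frame(
  iso2    = c("US", "LK", "MY"),
  country = c("United States", "Sri Lanka", "Malaysia"),
  stringsAsFactors = FALSE
)

# "SL" is not the ISO code for Sri Lanka; the tidy data uses "LK".
location[location == "SL"] <- "LK"

# Match on the ISO code first, then fall back to the country name.
job_country <- countries$country[match(location, countries$iso2)]
by_name <- countries$country[match(location, countries$country)]
job_country[is.na(job_country)] <- by_name[is.na(job_country)]

# Special case seen in the preview: "NY" is a United States location.
job_country[location == "NY"] <- "United States"
job_country
```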

Educational_qualifications
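
The tidy data summarises the free-text qualifications in Edu_Category, with the values Some Degree, Min_Bsc and Min_Master visible in the preview. The keyword rules below are hypothetical, chosen only so that the displayed rows come out as shown. The input strings are shortened versions of the ones in the preview.

```r
education <- c(
  NA,
  "Degree in Statistics / Mathematics / Computer Science.",
  "Undergraduate degree in statistics, mathematics or engineering",
  "Bachelor's in Information Management, Information Technology",
  "Master's or PHD in Statistics, Mathematics, Computer Science"
)

# Match on lower-cased text, checking the highest qualification first.
edu <- tolower(education)
edu_category <- ifelse(
  is.na(edu), NA_character_,
  ifelse(grepl("master", edu), "Min_Master",
  ifelse(grepl("bachelor|bsc", edu), "Min_Bsc",
  ifelse(grepl("degree", edu), "Some Degree", NA_character_)))
)
edu_category
```

Checking "master" before "bachelor" and "degree" matters, because a posting that asks for a master's degree would otherwise fall into the most generic category.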

Tidy Version

Column

Tidy version of the dataset

    ID Consultant DateRetrieved DatePublished
1    1   Thiyanga    05/08/2020          <NA>
320  2     Jayani    07/08/2020    31/07/2020
321  3     Jayani    07/08/2020      06/08/20
322  4     Jayani    07/08/2020    24/07/2020
95   5     Jayani    07/08/2020    24/07/2020
244  6     Jayani    07/08/2020    13/08/2020
                                      Job_title
1                                          <NA>
320                       Junior Data Scientist
321          Engineer, Analytics & Data Science
322 CI-Statistical Analyst/Business Analyst-CMB
95                          DA-Data Analyst-CMB
244                              Data Scientist
                                               Company R SAS SPSS Python MAtlab
1                                                 <NA> 1   1    0      1      1
320                                  Dialog Axiata PLC 1   0    0      1      0
321                 London Stock Exchange Group plc3.1 0   0    0      1      0
322                               E.D. Bullard Company 1   1    1      0      0
95                                E.D. Bullard Company 0   1    1      0      0
244 Emirates Center for Strategic Studies and Research 1   0    0      1      0
    Scala C. MS_Word Ms_Excel OLE_DB Ms_Access Ms_PowerPoint Spreadsheets
1       0  0       0        0      0         0             0            0
320     0  0       0        0      0         0             0            0
321     0  0       0        0      0         0             0            0
322     0  0       0        0      0         0             0            0
95      0  0       1        1      0         1             1            0
244     0  0       0        0      0         0             0            0
    Data_visualization Presentation_Skills Communication BigData Data_warehouse
1                    0                   0             0       0              0
320                  1                   0             0       1              1
321                  1                   0             0       1              0
322                  0                   0             0       0              0
95                   0                   0             0       0              0
244                  0                   0             0       0              0
    cloud_storage Google_Cloud AWS Machine_Learning Deep_Learning
1               0            0   0                0             0
320             1            0   0                1             0
321             0            0   0                1             0
322             0            0   0                0             0
95              0            0   0                0             0
244             0            0   0                1             0
    Computer_vision Java C.. C Linux_Unix SQL NoSQL RDBMS Oracle MySQL PHP
1                 0    0   0 0          0   1     0     0      0     0   0
320               0    0   0 0          0   0     0     0      0     0   0
321               0    0   0 0          0   0     0     0      0     0   0
322               0    0   0 0          0   1     0     0      0     0   0
95                0    0   0 0          0   1     0     0      1     1   0
244               0    1   1 1          0   1     0     0      0     0   0
    Flash_Actionscript SPL web_design_and_development_tools Wordpress AI
1                    0   0                                0         0  0
320                  0   0                                0         0  0
321                  0   0                                0         0  0
322                  0   0                                0         0  0
95                   0   0                                0         0  0
244                  0   0                                0         0  0
    Natural_Language_Processing.NLP. Microsoft_Power_BI Google_Analytics
1                                  0                  0                0
320                                0                  0                0
321                                1                  1                0
322                                0                  0                0
95                                 0                  0                0
244                                0                  0                0
    graphics_and_design_skills Data_marketing SEO Content_Management Tableau D3
1                            0              0   0                  0       0  0
320                          0              0   0                  0       0  0
321                          0              0   0                  0       0  0
322                          0              0   0                  0       0  0
95                           0              0   0                  0       0  0
244                          0              0   0                  0       0  0
    Alteryx KNIME Spotfire Spark S3 Redshift DigitalOcean Javascript Kafka
1         0     0        0     0  0        0            0          0     0
320       0     0        0     0  0        0            0          0     0
321       0     0        0     0  0        0            0          0     0
322       0     0        0     0  0        0            0          0     0
95        0     0        0     0  0        0            0          0     0
244       0     0        0     1  1        1            1          1     0
    Storm Bash Hadoop Data_Pipelines MPP_Platforms Qlik Pig Hive Tensorflow
1       0    0      0              0             0    0   0    0          0
320     0    0      0              0             0    0   0    0          0
321     0    0      0              0             0    0   0    0          0
322     0    0      0              0             0    0   0    0          0
95      0    0      0              0             0    0   0    0          0
244     0    0      0              0             0    0   0    0          0
    Map_Reduce Impala Solr Teradata MongoDB Elasticsearch YOLO agile_execution
1            0      0    0        0       0             0    0               0
320          0      0    0        0       0             0    0               0
321          0      0    0        0       0             0    0               0
322          0      0    0        0       0             0    0               0
95           0      0    0        0       0             0    0               0
244          0      0    0        0       0             0    0               0
    Data_management pyspark Data_mining Data_science Web_Analytic_tools IOT
1                 0       0           0            0                  0   0
320               0       0           0            0                  0   0
321               0       0           0            0                  0   0
322               0       0           0            0                  0   0
95                0       0           0            0                  0   0
244               0       0           1            0                  0   0
    Numerical_Analysis Economic Finance_Knowledge Investment_Knowledge
1                    0        0                 0                    0
320                  0        0                 0                    0
321                  0        0                 0                    0
322                  0        0                 0                    0
95                   0        0                 0                    0
244                  0        0                 0                    0
    Problem_Solving Korean_language Bash_Linux_Scripting
1                 0               0                    0
320               0               0                    0
321               0               0                    0
322               0               0                    0
95                0               0                    0
244               0               0                    0
                       Knowledge_in Experience         City Location
1                              <NA>         4+         <NA>       NY
320                            <NA>        2-3      Colombo       LK
321 Elasticsearch, Logstash, Kibana        1-2      Colombo       LK
322                            <NA>         2+      Colombo       LK
95                             <NA>       <NA>      Colombo       LK
244                            <NA>        5-7 Kuala Lumpur Malaysia
                                                                                              Educational_qualifications
1                                                                                                                   <NA>
320 Degree in Engineering / IT or specialized in Computer Science / Statistics from a recognized university or institute
321                                                               Degree in Statistics / Mathematics / Computer Science.
322                                                       Undergraduate degree in statistics, mathematics or engineering
95   Bachelor's in Information Management, Information Technology, Computing, Mathematics, Statistics, or related fields
244                           Master's or PHD in Statistics, Mathematics, Computer Science or another quantitative field
    Salary Team_Handling Debtor_reconcilation Payroll_management Bayesian
1     <NA>             0                    0                  0        0
320   <NA>             0                    0                  0        0
321   <NA>             0                    0                  0        0
322   <NA>             0                    0                  0        0
95    <NA>             0                    0                  0        0
244   <NA>             0                    0                  0        0
    Optimization Bahasa_Malaysia English_proficiency
1              0               0                <NA>
320            0               0                <NA>
321            0               0                <NA>
322            0               0                <NA>
95             0               0                <NA>
244            0               0                <NA>
    URL
1   <NA>
320 https://www.google.com/search?sxsrf=ALeKk00MUun1FouYtWJYm7L0o3wlM5pWbA:1596811359019&source=hp&ei=XmgtX9XyO-G_8QOttrSQAg&q=latest+jobs+for+data+scientist&oq=Latest+Jobs+for+data+scie&gs_lcp=CgZwc3ktYWIQAxgAMggIIRAWEB0QHjIICCEQFhAdEB4yCAghEBYQHRAeMggIIRAWEB0QHjIICCEQFhAdEB4yCAghEBYQHRAeMggIIRAWEB0QHjIICCEQFhAdEB4yCAghEBYQHRAeMggIIRAWEB0QHjoHCCMQ6gIQJzoECCMQJzoICAAQkQIQiwM6CAgAELEDEIMBOggILhCxAxCDAToFCAAQsQM6DggAELEDEIMBEJECEIsDOgsIABCxAxCDARCLAzoHCAAQAxCLAzoICC4QsQMQiwM6CAgAELEDEIsDOgUIABCLAzoCCAA6BggAEBYQHjoFCCEQoAFQ4RhY87gBYJ_IAWgCcAB4AIABwgOIAfQwkgEKMC4xNC43LjQuM5gBAKABAaoBB2d3cy13aXqwAQq4AQI&sclient=psy-ab&ibp=htl;jobs&sa=X&ved=2ahUKEwi7iIn-qYnrAhXS7XMBHR2PCx8Qp4wCMAB6BAgLEAE#fpstate=tldetail&htivrt=jobs&htiq=latest+jobs+for+data+scientist&htidocid=G184piKqa2o_fj-gAAAAAA%3D%3D&sxsrf=ALeKk00mvUvmmBGPtIAJqR8AKbUqgn_goA:1596811391427
321 https://www.glassdoor.com/Job/sri-lanka-statistics-jobs-SRCH_IL.0,9_IN45_KO10,20.htm
322 https://www.glassdoor.com/Job/sri-lanka-statistics-jobs-SRCH_IL.0,9_IN45_KO10,20.htm
95  https://www.glassdoor.com/Job/sri-lanka-statistics-jobs-SRCH_IL.0,9_IN45_KO10,20.htm
244 https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=false&clickSource=searchBtn&typedKeyword=&locT=N&locId=170&jobType=&context=Jobs&sc.keyword=statistics&dropdown=0
                        Search_Term                                  Job_Titles
1                              <NA>                                        <NA>
320 Data Analysis Jobs in Sri Lanka                       junior data scientist
321 Data Analysis Jobs in Sri Lanka          engineer  analytics & data science
322 Data Analysis Jobs in Sri Lanka ci statistical analyst business analyst cmb
95  Data Analysis Jobs in Sri Lanka                         da data analyst cmb
244 Statistics top jobs in Malaysia                              data scientist
    Job_Category Minimum_Years_of_experience Experience_Category   Job_Country
1    Unimportant                          NA   Two or less years United States
320 Data Science                           2   Two or less years     Sri Lanka
321 Data Science                           1   Two or less years     Sri Lanka
322 Data Analyst                           2   Two or less years     Sri Lanka
95  Data Analyst                          NA   Two or less years     Sri Lanka
244 Data Science                          NA   Two or less years      Malaysia
    Edu_Category
1           <NA>
320  Some Degree
321  Some Degree
322  Some Degree
95       Min_Bsc
244   Min_Master

Column

Glimpse of Tidy data

Observations: 435
Variables: 114
$ ID                               <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,...
$ Consultant                       <chr> "Thiyanga", "Jayani", "Jayani", "J...
$ DateRetrieved                    <chr> "05/08/2020", "07/08/2020", "07/08...
$ DatePublished                    <chr> NA, "31/07/2020", "06/08/20", "24/...
$ Job_title                        <chr> NA, "Junior Data Scientist", "Engi...
$ Company                          <chr> NA, "Dialog Axiata PLC", "London S...
$ R                                <dbl> 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0...
$ SAS                              <dbl> 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0...
$ SPSS                             <dbl> 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0...
$ Python                           <dbl> 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0...
$ MAtlab                           <dbl> 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Scala                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ C.                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ MS_Word                          <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1...
$ Ms_Excel                         <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1...
$ OLE_DB                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Ms_Access                        <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1...
$ Ms_PowerPoint                    <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0...
$ Spreadsheets                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Data_visualization               <dbl> 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0...
$ Presentation_Skills              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Communication                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ BigData                          <dbl> 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0...
$ Data_warehouse                   <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ cloud_storage                    <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Google_Cloud                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ AWS                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Machine_Learning                 <dbl> 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0...
$ Deep_Learning                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Computer_vision                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Java                             <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0...
$ C..                              <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
$ C                                <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
$ Linux_Unix                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ SQL                              <dbl> 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 0...
$ NoSQL                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ RDBMS                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Oracle                           <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0...
$ MySQL                            <dbl> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0...
$ PHP                              <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
$ Flash_Actionscript               <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
$ SPL                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ web_design_and_development_tools <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
$ Wordpress                        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ AI                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Natural_Language_Processing.NLP. <dbl> 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0...
$ Microsoft_Power_BI               <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Google_Analytics                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ graphics_and_design_skills       <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0...
$ Data_marketing                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ SEO                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Content_Management               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Tableau                          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
$ D3                               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0...
$ Alteryx                          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ KNIME                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Spotfire                         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Spark                            <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0...
$ S3                               <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
$ Redshift                         <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
$ DigitalOcean                     <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
$ Javascript                       <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
$ Kafka                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Storm                            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Bash                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Hadoop                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
$ Data_Pipelines                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ MPP_Platforms                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Qlik                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Pig                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Hive                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
$ Tensorflow                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Map_Reduce                       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Impala                           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Solr                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Teradata                         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ MongoDB                          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Elasticsearch                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ YOLO                             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ agile_execution                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0...
$ Data_management                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ pyspark                          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Data_mining                      <dbl> 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0...
$ Data_science                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0...
$ Web_Analytic_tools               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ IOT                              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Numerical_Analysis               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Economic                         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Finance_Knowledge                <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Investment_Knowledge             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Problem_Solving                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Korean_language                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Bash_Linux_Scripting             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Knowledge_in                     <chr> NA, NA, "Elasticsearch, Logstash, ...
$ Experience                       <chr> "4+", "2-3", "1-2", "2+", NA, "5-7...
$ City                             <chr> NA, "Colombo", "Colombo", "Colombo...
$ Location                         <chr> "NY", "LK", "LK", "LK", "LK", "Mal...
$ Educational_qualifications       <chr> NA, "Degree in Engineering / IT or...
$ Salary                           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA...
$ Team_Handling                    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Debtor_reconcilation             <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Payroll_management               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Bayesian                         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Optimization                     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ Bahasa_Malaysia                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ English_proficiency              <chr> NA, NA, NA, NA, NA, NA, "1", NA, N...
$ URL                              <chr> NA, "https://www.google.com/search...
$ Search_Term                      <chr> NA, "Data Analysis Jobs in Sri Lan...
$ Job_Titles                       <chr> NA, "junior data scientist", "engi...
$ Job_Category                     <chr> "Unimportant", "Data Science", "Da...
$ Minimum_Years_of_experience      <dbl> NA, 2, 1, 2, NA, NA, NA, NA, 1, NA...
$ Experience_Category              <chr> "Two or less years", "Two or less ...
$ Job_Country                      <chr> "United States", "Sri Lanka", "Sri...
$ Edu_Category                     <chr> NA, "Some Degree", "Some Degree", ...
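
The 0/1 columns in the glimpse are indicator variables, one per skill, so a column sum gives the number of postings that mention that skill. A minimal base R sketch, assuming a toy data frame `skills` (a hypothetical name) that copies only the first six values of `R`, `SAS`, `SPSS`, and `Python` from the glimpse above, not the full 435-row dataset:

```r
# Toy data frame: first six values of four indicator columns, copied from
# the glimpse output. `skills` is a hypothetical name used for illustration.
skills <- data.frame(
  R      = c(1, 1, 0, 1, 0, 1),
  SAS    = c(1, 0, 0, 1, 1, 0),
  SPSS   = c(0, 0, 0, 1, 1, 0),
  Python = c(1, 1, 1, 0, 0, 1)
)

# Each column is 1 when a posting asks for the skill, so the column sum
# is the number of postings that mention it.
skill_counts <- colSums(skills, na.rm = TRUE)

# Share of postings mentioning each skill.
skill_share <- skill_counts / nrow(skills)

print(sort(skill_counts, decreasing = TRUE))
```

On the full data the same call would first need to be restricted to the numeric indicator columns, since columns such as `Experience`, `City`, and `English_proficiency` are stored as character.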

About

Author:

Jayani Lakshika Piyadi Gamage

Link to the Git repository:

https://github.com/thiyangt/DSjobtrackerDataWrangling

Data

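# Install devtools first if it is not already available
# install.packages("devtools")
# Install DSjobtracker from GitHub, then load it
devtools::install_github("thiyangt/DSjobtracker")
library(DSjobtracker)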
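
Once the package is installed, the datasets it ships can be listed and previewed. The following is a hedged sketch: `DSraw` is the untidy dataset named under Datasets earlier in this dashboard, and no other dataset names are assumed, so the code lists them instead of hard-coding any. The `requireNamespace()` guard is an addition, so the snippet does nothing harmful when the package is missing.

```r
# Check that DSjobtracker is installed before touching its data.
has_pkg <- requireNamespace("DSjobtracker", quietly = TRUE)

if (has_pkg) {
  # List every dataset shipped with the package.
  print(data(package = "DSjobtracker")$results[, c("Item", "Title")])

  # Load the untidy dataset named under Datasets and preview it.
  data("DSraw", package = "DSjobtracker")
  print(dim(DSraw))
  print(head(DSraw[, 1:6]))
} else {
  message("DSjobtracker is not installed; see the install commands above.")
}
```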