Getting started with the cansim package

2019-10-15

About

The cansim package provides R bindings to Statistics Canada’s main socioeconomic time series database, previously known as CANSIM, the name still used throughout this package and elsewhere. Data can be accessed by table number, by vector, or by table number and coordinate. The package accepts both old CANSIM and new (NDM) table catalogue numbers.
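The two catalogue-number formats are easy to tell apart by shape: old CANSIM numbers consist of three digits, a dash and four digits, while new NDM numbers look like 14-10-0293, as used throughout this vignette. The sketch below classifies a number with regular expressions. `table_number_kind()` is a hypothetical helper for illustration only, not part of the cansim package, "123-4567" is a made-up number with the old shape, and the optional two-digit suffix on NDM numbers (as in 14-10-0293-01) is an assumption about the format.

```r
# Hypothetical helper (not part of cansim): classify a catalogue number
# as an old CANSIM number or a new NDM number by its shape.
table_number_kind <- function(x) {
  x <- trimws(x)
  if (grepl("^[0-9]{3}-[0-9]{4}$", x)) {
    "old"      # three digits, dash, four digits
  } else if (grepl("^[0-9]{2}-[0-9]{2}-[0-9]{4}(-[0-9]{2})?$", x)) {
    "new"      # e.g. 14-10-0293, optionally with a two-digit suffix (assumed)
  } else {
    "unknown"
  }
}

table_number_kind("14-10-0293")  # "new"
table_number_kind("123-4567")    # "old" (made-up number)
```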

Installing cansim

The cansim package is available on CRAN and can be installed directly using the default package installation process:

install.packages("cansim")

Alternatively, the latest development version of the package can be installed from GitHub using the remotes or devtools package:

# install.packages("remotes")
remotes::install_github("mountainmath/cansim")

library(cansim)

Usage

If you know the catalogue number of the table you are interested in, use get_cansim to download the entire table.

data <- get_cansim("14-10-0293")
#> Accessing CANSIM NDM product 14-10-0293 from Statistics Canada
#> Parsing data
#> Folding in metadata
head(data)
#> # A tibble: 6 x 21
#>   REF_DATE GEO   DGUID `Labour force c… Statistics UOM   UOM_ID
#>   <chr>    <chr> <chr> <chr>            <chr>      <chr> <chr> 
#> 1 2001-03  Cana… 2016… Population       Estimate   Pers… 249   
#> 2 2001-03  Cana… 2016… Labour force     Estimate   Pers… 249   
#> 3 2001-03  Cana… 2016… Labour force     Standard … Pers… 249   
#> 4 2001-03  Cana… 2016… Labour force     Standard … Pers… 249   
#> 5 2001-03  Cana… 2016… Employment       Estimate   Pers… 249   
#> 6 2001-03  Cana… 2016… Employment       Standard … Pers… 249   
#> # … with 14 more variables: SCALAR_FACTOR <chr>, SCALAR_ID <chr>,
#> #   VECTOR <chr>, COORDINATE <chr>, VALUE <dbl>, STATUS <chr>,
#> #   SYMBOL <chr>, TERMINATED <chr>, DECIMALS <chr>, GeoUID <chr>,
#> #   `Classification Code for Labour force characteristics` <chr>,
#> #   `Hierarchy for Labour force characteristics` <chr>, `Classification
#> #   Code for Statistics` <chr>, `Hierarchy for Statistics` <chr>

By default, tables retrieved by the package come in the original format provided by Statistics Canada. It is often convenient to convert them into a cleaner data object and to use the included metadata to rescale values according to their scaling or unit variables; this makes it easier to work with the data directly and avoids unnecessary data manipulation. For example, values may be reported in millions, with the scale recorded only in the separate SCALAR_FACTOR column rather than in the numbers themselves. The built-in convenience function normalize_cansim_values looks up the appropriate scaling unit and transforms the raw values into absolute values. As the output below shows, it also drops the SCALAR_FACTOR and SCALAR_ID columns and adds a Date column. If using dplyr, applying normalization is as straightforward as piping the data through normalize_cansim_values.

library(dplyr)
data <- get_cansim("14-10-0293") %>% 
  normalize_cansim_values()
#> Reading CANSIM NDM product 14-10-0293 from cache.
head(data)
#> # A tibble: 6 x 20
#>   REF_DATE GEO   DGUID `Labour force c… Statistics UOM   UOM_ID VECTOR
#>   <chr>    <chr> <chr> <chr>            <chr>      <chr> <chr>  <chr> 
#> 1 2001-03  Cana… 2016… Population       Estimate   Pers… 249    v9141…
#> 2 2001-03  Cana… 2016… Labour force     Estimate   Pers… 249    v9141…
#> 3 2001-03  Cana… 2016… Labour force     Standard … Pers… 249    v1018…
#> 4 2001-03  Cana… 2016… Labour force     Standard … Pers… 249    v1018…
#> 5 2001-03  Cana… 2016… Employment       Estimate   Pers… 249    v9141…
#> 6 2001-03  Cana… 2016… Employment       Standard … Pers… 249    v1018…
#> # … with 12 more variables: COORDINATE <chr>, VALUE <dbl>, STATUS <chr>,
#> #   SYMBOL <chr>, TERMINATED <chr>, DECIMALS <chr>, GeoUID <chr>,
#> #   `Classification Code for Labour force characteristics` <chr>,
#> #   `Hierarchy for Labour force characteristics` <chr>, `Classification
#> #   Code for Statistics` <chr>, `Hierarchy for Statistics` <chr>,
#> #   Date <date>
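Conceptually, the scaling step multiplies each raw VALUE by the power of ten encoded in its scalar code. The base-R sketch below illustrates that idea on a three-row toy data frame with placeholder values. It assumes SCALAR_ID holds the base-10 exponent of the reporting unit (0 for units, 3 for thousands, 6 for millions); `scale_values()` is a hypothetical helper, not the package's implementation, which also does more (for example, adding the Date column seen above).

```r
# Toy rows mimicking the raw columns; values are placeholders, not real data.
raw <- data.frame(
  VALUE = c(1.5, 250, 3),
  SCALAR_FACTOR = c("millions", "thousands", "units"),
  SCALAR_ID = c("6", "3", "0"),   # stored as text, as in the raw table
  stringsAsFactors = FALSE
)

# Hypothetical helper. Assumption: SCALAR_ID is the base-10 exponent
# of the unit in which VALUE is reported.
scale_values <- function(df) {
  df$VALUE <- df$VALUE * 10^as.numeric(df$SCALAR_ID)
  # Once values are absolute, the scalar columns carry no extra information
  df[, setdiff(names(df), c("SCALAR_FACTOR", "SCALAR_ID")), drop = FALSE]
}

scaled <- scale_values(raw)
scaled$VALUE  # 1500000, 250000, 3
```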

Looking at an overview of a table's contents is a common first step. The get_cansim_table_overview(table_number) function prints the table title, its reference period and frequency, and the members of each dimension.

get_cansim_table_overview("14-10-0293")
#> Labour force characteristics by economic region, three-month moving average, unadjusted for seasonality, last 5 months
#> CANSIM Table 14-10-0293
#> Start Reference Period: 2001-03-01, End Reference Period: 2019-09-01, Frequency: Monthly
#> 
#> Column Geography (76)
#> Canada, Newfoundland and Labrador, Avalon Peninsula, Newfoundland and Labrador, South Coast-Burin Peninsula and Notre Dame-Central Bonavista Bay, Newfoundland and Labrador, West Coast-Northern Peninsula-Labrador, Newfoundland and Labrador, Prince Edward Island, Nova Scotia, Cape Breton, Nova Scotia, North Shore, Nova Scotia, Annapolis Valley, Nova Scotia, ...
#> 
#> Column Labour force characteristics (10)
#> Population, Labour force, Employment, Full-time employment, Part-time employment, Unemployment, Not in labour force, Unemployment rate, Participation rate, Employment rate
#> 
#> Column Statistics (3)
#> Estimate, Standard error of estimate, Standard error of year-over-year change
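The member names listed in the overview are the values to filter on once the table has been downloaded. The base-R sketch below keeps only the rows that match a chosen member in each dimension. `select_members()` is a hypothetical helper, and the toy data frame only mimics the column layout of table 14-10-0293 with placeholder values. With dplyr, the same selection on the downloaded table is a single `filter()` call on GEO, `Labour force characteristics` and Statistics.

```r
# Hypothetical helper: keep rows whose dimension columns match the
# requested members (a named list: column name -> allowed members).
select_members <- function(df, selection) {
  keep <- rep(TRUE, nrow(df))
  for (col in names(selection)) {
    keep <- keep & df[[col]] %in% selection[[col]]
  }
  df[keep, , drop = FALSE]
}

# Toy rows with the same dimension columns as 14-10-0293;
# VALUE holds placeholders, not real estimates.
toy <- data.frame(
  GEO = c("Canada", "Canada", "Nova Scotia"),
  "Labour force characteristics" = c("Unemployment rate", "Employment",
                                     "Unemployment rate"),
  Statistics = c("Estimate", "Estimate", "Estimate"),
  VALUE = c(1, 2, 3),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

canada_ur <- select_members(toy, list(
  GEO = "Canada",
  "Labour force characteristics" = "Unemployment rate",
  Statistics = "Estimate"
))
nrow(canada_ur)  # 1
```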

When the table number is unknown, you can browse the available tables with list_cansim_tables() or search them by survey name, keyword or title with search_cansim_tables().

search_cansim_tables("housing price indexes")
#> Your CANSIM table overview data is 51 days old.
#> Consider setting options(cansim.cache_path="your cache path")
#> in your .Rprofile and refreshing the table via list_cansim_tables(refresh=TRUE).
#> The table won't be able to be refreshed if options(cansim.cache_path="your cache path") is not set.
#> # A tibble: 10 x 21
#>    title title_en title_fr keywords_en keywords keywords_fr notes notes_en
#>    <chr> <chr>    <chr>    <chr>       <chr>    <chr>       <chr> <chr>   
#>  1 Buil… Buildin… Indices… constructi… constru… constructi… <p>B… <p>Buil…
#>  2 Expe… Experim… Indice … constructi… constru… constructi… <p>E… <p>Expe…
#>  3 New … New hou… Indice … constructi… constru… constructi… <p>N… <p>New …
#>  4 Pric… Price i… Indices… constructi… constru… constructi… <p>T… <p>This…
#>  5 New … New hou… Indices… constructi… constru… constructi… This… This ta…
#>  6 Pric… Price i… Indices… constructi… constru… constructi… This… This ta…
#>  7 New … New hou… Indice … constructi… constru… constructi… <p>N… <p>New …
#>  8 Non-… Non-res… Indice … constructi… constru… constructi… <p>A… <p>Arch…
#>  9 New … New hou… Indices… constructi… constru… constructi… This… This ta…
#> 10 New … New hou… Indice … constructi… constru… constructi… Mont… Monthly…
#> # … with 13 more variables: notes_fr <chr>, state <chr>, subject <chr>,
#> #   date_published <chr>, frequency <chr>, revision_id <chr>,
#> #   time_period_coverage_start <chr>, time_period_coverage_end <chr>,
#> #   metadata_created <chr>, metadata_modified <chr>, url_en <chr>,
#> #   url_fr <chr>, cansim_table_number <chr>
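The messages above recommend setting a cache directory so that the table overview data can be stored and refreshed. A minimal sketch follows; the directory name is a placeholder assumption (any writable location will do), and creating the directory up front is a precaution rather than a documented requirement. Placing the `options()` line in .Rprofile makes the setting persist across sessions.

```r
# Placeholder location; choose any writable directory.
cache_dir <- file.path(path.expand("~"), "cansim_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Option name as given in the package message above.
options(cansim.cache_path = cache_dir)
getOption("cansim.cache_path")
```

With the option set, `list_cansim_tables(refresh = TRUE)` refreshes the cached table list, as the message describes.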

Individual series in Statistics Canada data tables can also be accessed by their vector numbers. This is especially useful when building reports around specific indicators. For convenience, get_cansim_vector accepts a named vector of vector numbers; the returned data frame then includes a label column holding the name given to each series.

get_cansim_vector(c("Metro Van Apartment Construction Price Index" = "v44176267",
                    "Metro Van CPI" = "v41692930"),
                  start_time = "2015-05-01",
                  end_time = "2015-08-01")
#> # A tibble: 5 x 10
#>   DECIMALS VALUE REF_DATE releaseTime SYMBOL frequencyCode SCALAR_ID
#>      <int> <dbl> <chr>    <chr>        <int>         <int>     <int>
#> 1        1  122. 2015-05… 2018-06-12…      0             6         0
#> 2        1  122. 2015-06… 2018-06-12…      0             6         0
#> 3        1  122. 2015-07… 2018-06-12…      0             6         0
#> 4        1  123. 2015-08… 2018-06-12…      0             6         0
#> 5        1  153  2015-07… 2015-11-10…      0             9         0
#> # … with 3 more variables: COORDINATE <chr>, VECTOR <chr>, label <chr>
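Because each row carries its label, the long result can be reshaped into one column per named series, which is convenient for reports. The base-R sketch below applies `reshape()` to a toy data frame with the same REF_DATE, label and VALUE columns; the values are placeholders, not real index figures. Dates on which a series has no observation (for example, when a quarterly series is combined with a monthly one) come out as NA.

```r
# Toy long-format result; VALUE holds placeholders, not real index values.
vectors_long <- data.frame(
  REF_DATE = c("2015-05-01", "2015-06-01", "2015-07-01", "2015-07-01"),
  label = c("Metro Van CPI", "Metro Van CPI", "Metro Van CPI",
            "Metro Van Apartment Construction Price Index"),
  VALUE = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# One row per REF_DATE and one "VALUE.<label>" column per named series
vectors_wide <- reshape(vectors_long, idvar = "REF_DATE", timevar = "label",
                        v.names = "VALUE", direction = "wide")
vectors_wide
```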

License

The code in this package is licensed under the MIT license. The bundled table metadata in Sysdata.R, as well as all Statistics Canada data retrieved using this package, are made available under the Statistics Canada Open Licence Agreement, a copy of which is included in the R folder. The key terms of the Statistics Canada Open Licence Agreement read:

Subject to this agreement, Statistics Canada grants you a worldwide, royalty-free, non-exclusive licence to:
 
  - use, reproduce, publish, freely distribute, or sell the Information;
  - use, reproduce, publish, freely distribute, or sell Value-added Products; and,
  - sublicence any or all such rights, under terms consistent with this agreement.

In doing any of the above, you shall:
 
  - reproduce the Information accurately;
  - not use the Information in a way that suggests that Statistics Canada endorses you or your use of the Information;
  - not misrepresent the Information or its source;
  - use the Information in a manner that does not breach or infringe any applicable laws;
  - not merge or link the Information with any other databases for the purpose of attempting to identify an individual person, business or organization; and
  - not present the Information in such a manner that gives the appearance that you may have received, or had access to, information held by Statistics Canada about any identifiable individual person, business or organization.

Attribution

Subject to the Statistics Canada Open Licence Agreement, licensed products using Statistics Canada data should employ the following acknowledgement of source:

Acknowledgment of Source

(a) You shall include and maintain the following notice on all licensed rights of the Information:

  - Source: Statistics Canada, name of product, reference date. Reproduced and distributed on an "as is" basis with the permission of Statistics Canada.
 
(b) Where any Information is contained within a Value-added Product, you shall include on such Value-added Product the following notice:

  - Adapted from Statistics Canada, name of product, reference date. This does not constitute an endorsement by Statistics Canada of this product.