5 Data Manipulation - Real Data Sets
5.1 Data Sci Real Data
read
Print all of the data from the large-cities data set.
code
#lang data-sci
(define data (data-set large-cities))
data
5.2 Data Sci Real Data
read
Print the first five rows of the large-cities data set.
code
#lang data-sci
(define data (data-set large-cities))
(take data 5)
5.3 Data Sci Real Data
read
Print all but the first five rows of the large-cities data set.
code
#lang data-sci
(define data (data-set large-cities))
(drop data 5)
5.4 Data Sci Real Data
read
Print the second five rows (rows 6 through 10) of the large-cities data set.
code
#lang data-sci
(define data (data-set large-cities))
(define all-but-first-5 (drop data 5))
(take all-but-first-5 5)
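The drop-then-take pattern above generalizes to any window of rows. The following is a minimal sketch in plain `#lang racket`, not part of the data-sci library: `sample-rows` is made-up stand-in data and `rows-between` is a hypothetical helper name.

```racket
#lang racket

;; Hypothetical stand-in rows shaped like (city country population).
;; These are NOT values from the real large-cities data set.
(define sample-rows
  '(("A" "X" 1) ("B" "X" 2) ("C" "Y" 3) ("D" "Y" 4)
    ("E" "Z" 5) ("F" "Z" 6) ("G" "Z" 7)))

;; rows-between (hypothetical helper): skip the first `start` rows,
;; then keep the next `n` rows.
(define (rows-between rows start n)
  (take (drop rows start) n))

(rows-between sample-rows 2 3)
;; => '(("C" "Y" 3) ("D" "Y" 4) ("E" "Z" 5))
```

Note that `take` and `drop` raise an error when the list is shorter than the requested position, so a helper like this assumes the window fits inside the data.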
5.5 Data Sci Real Data
read
Print only the city names in the large-cities data set.
code
#lang data-sci
(define data (data-set large-cities))
(define city-names (map first data))
city-names
5.6 Data Sci Real Data
read
Print only the first five city names in the large-cities data set.
code
#lang data-sci
(define data (data-set large-cities))
(define city-names (map first data))
(take city-names 5)
5.7 Data Sci Real Data
read
Print the first five city names in the large-cities data set in reverse order.
code
#lang data-sci
(define data (data-set large-cities))
(define city-names (map first data))
(define first-5-city-names (take city-names 5))
(reverse first-5-city-names)
5.8 Data Sci Real Data
read
Sort the data alphabetically by city name and print it.
code
#lang data-sci
(define data (data-set large-cities))
(define sorted-data (sort data string<? #:key first))
sorted-data
5.9 Data Sci Real Data
read
Print only the populations from the large-cities data set.
code
#lang data-sci
(define data (data-set large-cities))
(define populations (map third data))
populations
5.10 Data Sci Real Data
read
Display a bar chart that compares the population of all cities in the large-cities data set.
code
#lang data-sci
(define the-data (data-set large-cities))
(define cities-only (map first the-data))
(define population-only (map third the-data))
(define cities-with-population (zip cities-only population-only))
(plot-pict (discrete-histogram cities-with-population #:invert? #t)
           #:height 700
           #:width 700)
5.11 Data Sci Real Data
read
Display a bar chart that shows how many cities each country has in the large-cities data set.
code
#lang data-sci
(define the-data (data-set large-cities))
; The country is the second column of each row.
(define country-names (sort (map second the-data) string<?))
(define unique-countries (remove-duplicates country-names))
; Count how many rows mention the country s.
(define (count-times s)
  (count (lambda (x) (equal? s x)) country-names))
(define counted (map count-times unique-countries))
; Pair each country with its count for plotting.
(define country-times (zip unique-countries counted))
(plot-pict (discrete-histogram country-times #:invert? #t)
           #:height 700
           #:width 700)
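The counting step above scans the whole list once per unique country. As an alternative sketch, the same tally can be built with `group-by` from `racket/list`. This uses plain `#lang racket` rather than data-sci; `sample-countries` is made-up data and `count-by-value` is a hypothetical helper name.

```racket
#lang racket

;; Hypothetical stand-in for the country column (not real data).
(define sample-countries
  '("China" "India" "China" "Brazil" "India" "China"))

;; count-by-value (hypothetical helper): sort the items, group equal
;; values together, and pair each value with the size of its group.
(define (count-by-value items)
  (for/list ([group (group-by identity (sort items string<?))])
    (list (first group) (length group))))

(count-by-value sample-countries)
;; => '(("Brazil" 1) ("China" 3) ("India" 2))
```

Each result is a two-element list of label and count. Whether that matches the exact shape `zip` produces in data-sci is an assumption, so check before passing it to `discrete-histogram`.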
5.12 Data Sci Real Data
read
Print the first ten rows of the titanic data set.
code
#lang data-sci
(define the-data (data-set titanic))
(define first-10-rows (take the-data 10))
first-10-rows
5.13 Data Sci Real Data
read
Print only the names of everyone in the titanic data set.
code
#lang data-sci
(define the-data (data-set titanic))
(define names (map titanic-row-name the-data))
names
5.14 Data Sci Real Data
read
Print the number of people under the age of 10 in the titanic data set.
code
#lang data-sci
(define the-data (data-set titanic))
(define (is-under-10? row)
  (> 10 (titanic-row-age row)))
(define under-10 (filter is-under-10? the-data))
(length under-10)
5.15 Data Sci Real Data
read
Print the names of the people under the age of 30 in the titanic data set.
code
#lang data-sci
(define the-data (data-set titanic))
(define (is-under-30? row)
  (> 30 (titanic-row-age row)))
(define under-30 (filter is-under-30? the-data))
(define names (map titanic-row-name under-30))
names
5.16 Data Sci Real Data
read
Print the average age of everyone over thirty in the titanic data set.
code
#lang data-sci
(define the-data (data-set titanic))
(define (is-over-30? row)
  (< 30 (titanic-row-age row)))
(define over-30 (filter is-over-30? the-data))
(define ages (map titanic-row-age over-30))
(define average (mean ages))
average
5.17 Data Sci Real Data
read
Make a bar chart comparing the average age of the people who survived with the average age of the people who died in the titanic data set.
code
#lang data-sci
(define the-data (data-set titanic))
(define (filter-ages-by n)
  (map titanic-row-age
       (filter (lambda (x) (= n (first x))) the-data)))
(define lived-average (mean (filter-ages-by 1)))
(define died-average (mean (filter-ages-by 0)))
(plot-pict #:title "Average Age by Survival"
           (discrete-histogram
            (list (vector 'alive-people lived-average)
                  (vector 'dead-people died-average))))
5.18 Data Sci Real Data
read
Make a histogram of the ages and names of everyone under 10 in the titanic data set. Sort the data by age so the bars appear in order.
code
#lang data-sci
(define the-data (sort (data-set titanic) < #:key titanic-row-age))
(define (is-under-10? row)
  (> 10 (titanic-row-age row)))
(define under-10 (filter is-under-10? the-data))
(define names (map titanic-row-name under-10))
(define ages (map titanic-row-age under-10))
(plot (discrete-histogram #:invert? #t (zip names ages))
      #:height 1000
      #:width 700)
5.19 Data Sci Real Data
read
Make a histogram of the survival rate by age in the titanic data set.
code
#lang data-sci
; Sort by age so that rows with the same age are adjacent.
(define the-data (sort (data-set titanic) < #:key titanic-row-age))
(define ages (map titanic-row-age the-data))
(define unique-ages (remove-duplicates ages))
(define status (map titanic-row-survived? the-data))
; Number of passengers at each age.
(define (counted-ages a)
  (count (lambda (x) (equal? a x)) ages))
(define age-frequency (map counted-ages unique-ages))
; position tracks how far into the sorted status list we have read.
(define position 0)
; Count the survivors among the next `to` passengers, then advance.
(define (count-alive to)
  (define sub-list (take (drop status position) to))
  (set! position (+ position to))
  (count (lambda (x) (equal? #t x)) sub-list))
(define status-per-age (map count-alive age-frequency))
; Survival rate as a whole-number percentage.
(define (get-rate s n)
  (truncate (* 100 (exact->inexact (/ s n)))))
(define rate-by-age (map get-rate status-per-age age-frequency))
(define surviving-rate-by-age (zip unique-ages rate-by-age))
(plot-pict (discrete-histogram surviving-rate-by-age #:invert? #t)
           #:height 1200
           #:width 700
           #:y-label "Age"
           #:x-label "Surviving Rate")
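The solution above depends on a mutable `position` counter and on `map` visiting the age groups in order. As an alternative sketch without mutation, the rows can be grouped by age first and each group's rate computed independently. This uses plain `#lang racket`; `sample-rows` is made-up data in an assumed `(age survived?)` shape, and `survival-rate-by-age` is a hypothetical helper, not part of data-sci.

```racket
#lang racket

;; Hypothetical stand-in rows shaped like (age survived?).
;; These are NOT values from the real titanic data set.
(define sample-rows
  '((2 #t) (30 #f) (2 #f) (30 #f) (45 #t) (30 #t)))

;; survival-rate-by-age (hypothetical helper): sort rows by age,
;; group rows that share an age, and compute the truncated
;; percentage of survivors within each group.
(define (survival-rate-by-age rows)
  (for/list ([group (group-by first (sort rows < #:key first))])
    (define survivors (count second group))
    (list (first (first group))
          (truncate (* 100.0 (/ survivors (length group)))))))

(survival-rate-by-age sample-rows)
;; => '((2 50.0) (30 33.0) (45 100.0))
```

With the real data set, the accessors `titanic-row-age` and `titanic-row-survived?` would presumably take the place of `first` and `second`; that substitution is an assumption about how the rows are structured.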