5 Data Manipulation - Real Data Sets

5.1 Data Sci Real Data

read
Print all of the data from the large-cities data set.
code
#lang data-sci
 
(define data (data-set large-cities))
 
data

5.2 Data Sci Real Data

read
Print the first five rows of the large-cities data set.
code
#lang data-sci
 
(define data (data-set large-cities))
 
(take data 5)

5.3 Data Sci Real Data

read
Print all but the first five rows of the large-cities data set.
code
#lang data-sci
 
(define data (data-set large-cities))
 
(drop data 5)

5.4 Data Sci Real Data

read
Print the second five rows (rows 6 through 10) of the large-cities data set.
code
#lang data-sci
 
(define data (data-set large-cities))
 
(define all-but-first-5 (drop data 5))
 
(take all-but-first-5 5)
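
The drop-then-take pattern above generalizes to any window of rows. A minimal sketch in plain `#lang racket`, assuming rows are ordinary lists; `sample-rows` and `slice` are hypothetical names and the values are placeholders, not taken from large-cities:

```racket
#lang racket

;; Placeholder rows shaped like (name country population).
;; These values are made up for illustration only.
(define sample-rows
  '(("A" "X" 1) ("B" "X" 2) ("C" "Y" 3) ("D" "Y" 4)
    ("E" "Z" 5) ("F" "Z" 6) ("G" "Z" 7)))

;; slice: skip the first `start` rows, then keep the next `n` rows.
;; Both take and drop raise an error if the list is too short.
(define (slice rows start n)
  (take (drop rows start) n))

(slice sample-rows 2 3)
;; => '(("C" "Y" 3) ("D" "Y" 4) ("E" "Z" 5))
```

With this helper, the second five rows would be `(slice data 5 5)`.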

5.5 Data Sci Real Data

read
Print only the city names in the large-cities data set.
code
#lang data-sci
 
(define data (data-set large-cities))
 
(define city-names (map first data))
 
city-names

5.6 Data Sci Real Data

read
Print only the first five city names in the large-cities data set.
code
#lang data-sci
 
(define data (data-set large-cities))
 
(define city-names (map first data))
 
(take city-names 5)

5.7 Data Sci Real Data

read
Print the first five city names in the large-cities data set in reverse order.
code
#lang data-sci
 
(define data (data-set large-cities))
 
(define city-names (map first data))
 
(define first-5-city-names (take city-names 5))
 
(reverse first-5-city-names)

5.8 Data Sci Real Data

read
Sort the data alphabetically by city name and print it.
code
#lang data-sci
 
(define data (data-set large-cities))
 
(define sorted-data
   (sort data string<? #:key first))
 
sorted-data
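
The `#:key` argument picks the field to compare, and the comparison function decides the order. A small sketch in plain `#lang racket`, assuming each row is a list with the name first and the population third, as the code on this page suggests; the rows below are placeholders:

```racket
#lang racket

;; Placeholder rows shaped like (name country population).
(define sample-rows
  '(("Beta" "X" 20) ("Alpha" "Y" 30) ("Gamma" "X" 10)))

;; Compare the first field of each row as strings: alphabetical order.
(define by-name
  (sort sample-rows string<? #:key first))

;; Compare the third field of each row as numbers: largest first.
(define by-population-desc
  (sort sample-rows > #:key third))

(map first by-name)            ; => '("Alpha" "Beta" "Gamma")
(map third by-population-desc) ; => '(30 20 10)
```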

5.9 Data Sci Real Data

read
Print only the populations from the large-cities data set.
code
#lang data-sci
 
(define data (data-set large-cities))
 
(define populations (map third data))
 
populations

5.10 Data Sci Real Data

read
Display a bar chart that compares the populations of all cities in the large-cities data set.
code
#lang data-sci
 
(define the-data (data-set large-cities))
 
(define cities-only (map first the-data))
 
(define population-only (map third the-data))
 
(define cities-with-population
   (zip cities-only population-only))
 
(plot-pict
  (discrete-histogram
   cities-with-population
   #:invert? #t)
  #:height 700
  #:width 700)
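
The chart needs one label and one number per bar. The sketch below shows one way to build that pairing in plain `#lang racket`. It assumes `zip` pairs two lists position by position, which this page does not confirm; `pair-up` is a hypothetical stand-in, not the data-sci function:

```racket
#lang racket

;; Placeholder labels and values, made up for illustration.
(define sample-names '("A" "B" "C"))
(define sample-values '(3 1 2))

;; pair-up: combine the i-th label with the i-th number into a vector.
;; map raises an error if the two lists differ in length.
(define (pair-up labels numbers)
  (map vector labels numbers))

(pair-up sample-names sample-values)
;; => a list of three vectors: #("A" 3), #("B" 1), #("C" 2)
```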

5.11 Data Sci Real Data

read
Display a bar chart that shows how many cities each country has in the large-cities data set.
code
#lang data-sci
 
(define the-data (data-set large-cities))
 
(define country-names
   (sort (map second the-data) string<?))
 
(define unique-countries
   (remove-duplicates country-names))
 
(define (count-times s)
   (count
    (lambda (x) (equal? s x))
    country-names))
 
(define counted
   (map count-times unique-countries))
 
(define country-times
   (zip unique-countries counted))
 
(plot-pict
  (discrete-histogram country-times #:invert? #t)
  #:height 700
  #:width 700)
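
The code above counts each country with a separate pass over the list. As an alternative sketch in plain `#lang racket`, `group-by` from `racket/list` can collect equal values in one step. `count-by-value` is a hypothetical helper and the country codes are placeholders:

```racket
#lang racket

;; Placeholder country column, made up for illustration.
(define sample-countries '("X" "Y" "X" "Z" "X" "Y"))

;; count-by-value: sort the strings, group equal ones together,
;; then report each distinct value with the size of its group.
(define (count-by-value xs)
  (for/list ([group (group-by values (sort xs string<?))])
    (vector (first group) (length group))))

(count-by-value sample-countries)
;; => a list of three vectors: #("X" 3), #("Y" 2), #("Z" 1)
```

The result has one label/count vector per country, the same shape as the `(vector 'alive-people lived-average)` entries passed to `discrete-histogram` in 5.17.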

5.12 Data Sci Real Data

read
Print the first ten rows of the titanic data set.
code
#lang data-sci
 
(define the-data (data-set titanic))
 
(define first-10-rows (take the-data 10))
 
first-10-rows

5.13 Data Sci Real Data

read
Print only the names of everyone in the titanic data set.
code
#lang data-sci
 
(define the-data (data-set titanic))
 
(define names (map titanic-row-name the-data))
 
names

5.14 Data Sci Real Data

read
Print the number of people under the age of 10 in the titanic data set.
code
#lang data-sci
 
(define the-data (data-set titanic))
 
(define (is-under-10? row)
   (> 10 (titanic-row-age row)))
 
(define under-10 (filter is-under-10? the-data))
 
(length under-10)

5.15 Data Sci Real Data

read
Print the names of the people under the age of 30 in the titanic data set.
code
#lang data-sci
 
(define the-data (data-set titanic))
 
(define (is-under-30? row)
   (> 30 (titanic-row-age row)))
 
(define under-30 (filter is-under-30? the-data))
 
(define names (map titanic-row-name under-30))
 
names

5.16 Data Sci Real Data

read
Print the average age of everyone over thirty in the titanic data set.
code
#lang data-sci
 
(define the-data (data-set titanic))
 
(define (is-over-30? row)
   (< 30 (titanic-row-age row)))
 
(define over-30 (filter is-over-30? the-data))
 
(define ages (map titanic-row-age over-30))
 
(define average (mean ages))
 
average
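
The filter-then-average steps can be tried without the data set. A minimal sketch in plain `#lang racket`: `average` is a hand-written stand-in for `mean` (it is not the data-sci function), and the ages are placeholders. Note that `(< 30 age)` is strict, so an age of exactly 30 is left out:

```racket
#lang racket

;; Placeholder ages, made up for illustration.
(define sample-ages '(12 31 40 30 49 8))

;; average: sum divided by count; returns #f for an empty list
;; instead of dividing by zero.
(define (average xs)
  (if (empty? xs)
      #f
      (/ (apply + xs) (length xs))))

;; Keep only ages strictly greater than 30.
(define over-30
  (filter (lambda (age) (< 30 age)) sample-ages))

over-30           ; => '(31 40 49)
(average over-30) ; => 40
```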

5.17 Data Sci Real Data

read
Make a bar chart that compares the average age of the people who lived with the average age of the people who died.
code
#lang data-sci
 
(define the-data (data-set titanic))
 
(define (filter-ages-by n)
   (map
    titanic-row-age
    (filter
     (lambda (x) (= n (first x)))
     the-data)))
 
(define lived-average (mean (filter-ages-by 1)))
 
(define died-average (mean (filter-ages-by 0)))
 
(plot-pict
  #:title "Average Age by Survival"
  (discrete-histogram
   (list
    (vector 'alive-people lived-average)
    (vector 'dead-people died-average))))

5.18 Data Sci Real Data

read
Make a histogram of the ages and names of everyone under 10 in the titanic data set. Sort the data by age so the bars appear in order.
code
#lang data-sci
 
(define the-data
   (sort
    (data-set titanic)
    <
    #:key titanic-row-age))
 
(define (is-under-10? row)
   (> 10 (titanic-row-age row)))
 
(define under-10 (filter is-under-10? the-data))
 
(define names (map titanic-row-name under-10))
 
(define ages (map titanic-row-age under-10))
 
(plot
  (discrete-histogram
   #:invert? #t
   (zip names ages))
  #:height 1000
  #:width 700)

5.19 Data Sci Real Data

read
Make a histogram of the survival rate by age.
code
#lang data-sci
 
(define the-data
   (sort
    (data-set titanic)
    <
    #:key titanic-row-age))
 
(define ages (map titanic-row-age the-data))
 
(define unique-ages (remove-duplicates ages))
 
(define status
   (map titanic-row-survived? the-data))
 
(define (counted-ages a)
   (count (lambda (x) (equal? a x)) ages))
 
(define age-frequency
   (map counted-ages unique-ages))
 
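;; the-data is sorted by age, so the entries of status for one age
;; sit next to each other. position marks where the next age starts.
(define position 0)
 
;; Count the survivors among the next `to` entries of status, then
;; move position past them. Calls must happen in age order.
(define (count-alive to)
   (define sub-list
     (take (drop status position) to))
   (set! position (+ position to))
   (count (lambda (x) (equal? #t x)) sub-list))
 
;; This relies on map visiting age-frequency from first to last,
;; so each count lines up with the matching entry of unique-ages.
(define status-per-age
   (map count-alive age-frequency))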
 
(define (get-rate s n)
   (truncate (* 100 (exact->inexact (/ s n)))))
 
(define rate-by-age
   (map get-rate status-per-age age-frequency))
 
(define surviving-rate-by-age
   (zip unique-ages rate-by-age))
 
(plot-pict
  (discrete-histogram
   surviving-rate-by-age
   #:invert? #t)
  #:height 1200
  #:width 700
  #:y-label "Age"
  #:x-label "Surviving Rate")
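
The code above walks the sorted list with a mutable `position` counter. The same rate calculation can be sketched without mutation in plain `#lang racket` by grouping rows by age. This is an alternative technique, not the page's method. The `(age survived?)` pairs are placeholders standing in for titanic rows, and `survival-rate-by-age` is a hypothetical name:

```racket
#lang racket

;; Placeholder (age survived?) pairs, made up for illustration.
(define sample
  '((2 #t) (2 #f) (30 #f) (30 #f) (2 #t) (45 #t)))

;; survival-rate-by-age: sort by age, group rows that share an age,
;; then compute the truncated percentage of survivors in each group.
(define (survival-rate-by-age rows)
  (for/list ([group (group-by first (sort rows < #:key first))])
    ;; count keeps the rows whose second field is not #f
    (define survivors (count second group))
    (vector (first (first group))
            (truncate
             (* 100 (exact->inexact (/ survivors (length group))))))))

(survival-rate-by-age sample)
;; => a list of three vectors: #(2 66.0), #(30 0.0), #(45 100.0)
```

Each group carries its own rows, so no shared counter is needed and the result does not depend on the order in which groups are processed.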