
SQL on GPU

MapD describes its product as the first open-source SQL engine to harness GPU computing for analytics. Designed for maximum performance, the MapD SQL engine dynamically compiles SQL to run across multiple GPUs and CPUs, which in effect makes it a massively parallel database server.

Browser

Log into one of the tail nodes, preferably petaltail, swallowtail or cottontail2, with X11 forwarding enabled. From there, point a browser at port 9092 of node n37 and create a dashboard. Reach the node over the “data” subnet (10.10, interface eth1).

  1. Click New Dashboard.
  2. Click Add Chart.
  3. Click SCATTER.
  4. Click Add Data Source.
  5. Choose the flights_2008_7M table as the data source.
  6. Click X Axis +Add Measure.
  7. Choose depdelay. (departure delay)
  8. Click Y Axis +Add Measure.
  9. Choose arrdelay. (arrival delay)

The scatter plot covers 7 million records and renders amazingly fast, in under 50 ms. Other chart types include pie, bar, stacked bar and bubble charts, as well as point maps and choropleth maps. There is also a SQL editor. I do not even observe a blip in GPU utilization. Each of these K20 nodes still has 7 drive bays available, which would allow for a large database storage platform if we had a MapD project.
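Before launching a browser from a tail node, it can help to confirm that the web interface on n37 is answering. The following is a minimal sketch in Python, assuming only the host name and port given above; the helper name port_open is made up for illustration and is not part of MapD.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call from a tail node (host and port taken from the text above):
#   port_open("n37", 9092)
```

A True result only shows that something is listening on the port; it does not verify that the dashboard itself is healthy.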

Interactive

[root@n37 ~]# export MAPD_USER=mapd
[root@n37 ~]# export MAPD_GROUP=mapd
[root@n37 ~]# export MAPD_STORAGE=/var/lib/mapd
[root@n37 ~]# export MAPD_PATH=/opt/mapd
[root@n37 ~]# cd $MAPD_PATH
[root@n37 mapd]# ./bin/mapdql
Password: (HyperInteractive)
User mapd connected to database mapd
mapdql>

mapdql> \t ^flight.*
flights_2008_7M

mapdql> SELECT origin_city AS "Origin", dest_city AS "Destination", 
- AVG(airtime) AS "Average Airtime" FROM flights_2008_7M 
- WHERE distance < 175 GROUP BY origin_city, dest_city;
...
Portland|North Bend|46.12162162162162
Medford|North Bend|28
Covington|Huntington|24.98076923076923
mapdql> 
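
The query results above come back as pipe-delimited text with no header row. If the output is captured to a file or a variable, it can be post-processed with a few lines of Python. This is a rough sketch, assuming three columns as in the query above and no "|" characters inside the city names; the function name parse_mapdql_rows is hypothetical.

```python
import csv
import io

# Sample rows copied from the mapdql session above (pipe-delimited, no header).
SAMPLE_OUTPUT = """\
Portland|North Bend|46.12162162162162
Medford|North Bend|28
Covington|Huntington|24.98076923076923
"""


def parse_mapdql_rows(text):
    """Turn pipe-delimited output into (origin, destination, avg_airtime) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="|"):
        if len(fields) != 3:
            continue  # skip blank lines, prompts and status messages
        origin, dest, airtime = fields
        try:
            rows.append((origin, dest, float(airtime)))
        except ValueError:
            continue  # third field was not numeric, so not a data row
    return rows


rows = parse_mapdql_rows(SAMPLE_OUTPUT)
# Pick the city pair with the smallest average airtime among the sample rows.
shortest = min(rows, key=lambda r: r[2])
print(shortest)
```

Lines that do not split into exactly three fields, such as the mapdql> prompt, are skipped rather than raising an error.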

Demos

Load sample

/opt/mapd/insert_sample_data

Enter dataset number to download, or 'q' to quit:
 #     Dataset                   Rows    Table Name             File Name
 1)    Flights (2008)            7M      flights_2008_7M        flights_2008_7M.tar.gz
 2)    Flights (2008)            10k     flights_2008_10k       flights_2008_10k.tar.gz
 3)    NYC Tree Census (2015)    683k    nyc_trees_2015_683k    nyc_trees_2015_683k.tar.gz
1 <enter>
/opt/mapd/sample_datasets /opt/mapd
- downloading and extracting flights_2008_7M.tar.gz
--2018-08-27 10:00:53--  https://data.mapd.com/flights_2008_7M.tar.gz
Resolving data.mapd.com (data.mapd.com)... 72.28.97.165
Connecting to data.mapd.com (data.mapd.com)|72.28.97.165|:443... connected.


flights_2008_7M/
flights_2008_7M/flights_2008_7M.csv
flights_2008_7M/flights_2008_7M.sql
/opt/mapd
- adding schema
User mapd connected to database mapd
User mapd disconnected from database mapd
- inserting file: /opt/mapd/sample_datasets/flights_2008_7M/flights_2008_7M.csv
User mapd connected to database mapd
Result
Loaded: 7009728 recs, Rejected: 0 recs in 24.642000 secs
User mapd disconnected from database mapd
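
The "Loaded" line in the transcript above gives enough information to estimate the load throughput. As a small sketch, the Python below parses that summary line and divides records by seconds; the regular expression assumes the exact wording shown in this transcript, and the function name load_rate is made up for illustration.

```python
import re

# Summary line copied from the loader output above.
SUMMARY = "Loaded: 7009728 recs, Rejected: 0 recs in 24.642000 secs"

# Assumes the wording shown in the transcript; other versions may differ.
PATTERN = re.compile(r"Loaded: (\d+) recs, Rejected: (\d+) recs in ([\d.]+) secs")


def load_rate(line):
    """Return (loaded, rejected, records_per_second) parsed from a summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a load summary line: %r" % line)
    loaded = int(match.group(1))
    rejected = int(match.group(2))
    seconds = float(match.group(3))
    return loaded, rejected, loaded / seconds


loaded, rejected, rate = load_rate(SUMMARY)
print("%d records loaded at about %.0f records per second" % (loaded, rate))
```

For the run above this works out to roughly 284 thousand records per second, with no rejected records.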



cluster/174.txt · Last modified: 2018/08/28 08:38 by hmeij07