\\
**[[cluster:0|Back]]**
==== SQL on GPU ====
MapD describes its product as the first open source SQL engine to harness GPU computing for analytics. Designed for maximum performance, the MapD SQL engine dynamically compiles SQL queries to run across multiple GPUs and CPUs, in effect a massively parallel database server.
* http://www.mapd.com
** Browser **
Log into one of the tail nodes, preferably petaltail, swallowtail or cottontail2, with X11 forwarding enabled. Open a browser, go to port 9092 of node ''n37'' and create a dashboard. Use the "data" subnet (10.10) on eth1.
* http://n37-eth1:9092
- Click New Dashboard.
- Click Add Chart.
- Click SCATTER.
- Click Add Data Source.
- Choose the flights_2008_7M table as the data source.
- Click X Axis +Add Measure.
- Choose depdelay. (departure delay)
- Click Y Axis +Add Measure.
- Choose arrdelay. (arrival delay)
7 million records, amazingly fast: under 50 ms (milliseconds). Besides scatter plots there are pie, bar, stacked bar and bubble charts, point maps and choropleth maps, and a SQL editor. I do not even observe a blip in GPU utilization. Each of these K20 nodes still has 7 drive bays available, which would allow for a large database storage platform if we had a MapD project.
{{ :cluster:dep-arr-scatterplot.png?nolink&400 |}}
** Interactive **
* https://www.mapd.com/docs/latest/3_mapdql.html
* can be scripted using mapdql
  [root@n37 ~]# export MAPD_USER=mapd
  [root@n37 ~]# export MAPD_GROUP=mapd
  [root@n37 ~]# export MAPD_STORAGE=/var/lib/mapd
  [root@n37 ~]# export MAPD_PATH=/opt/mapd
  [root@n37 ~]# cd $MAPD_PATH
  [root@n37 mapd]# ./bin/mapdql
  Password: (HyperInteractive)
  User mapd connected to database mapd
  mapdql>
  mapdql> \t ^flight.*
  flights_2008_7M
  mapdql> SELECT origin_city AS "Origin", dest_city AS "Destination",
  - AVG(airtime) AS "Average Airtime" FROM flights_2008_7M
  - WHERE distance < 175 GROUP BY origin_city, dest_city;
  ...
  Portland|North Bend|46.12162162162162
  Medford|North Bend|28
  Covington|Huntington|24.98076923076923
  mapdql>
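The interactive query above can also be run non-interactively. The following is a minimal sketch, not a tested recipe for this cluster. It assumes that ''mapdql'' reads SQL from standard input and accepts the database name as a positional argument plus ''-u'' and ''-p'' flags. The file name in ''QUERY_FILE'' and the ''MAPD_PASSWORD'' variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: run the short-flight airtime query through mapdql without a prompt.
# Assumptions (not confirmed by this page): mapdql reads SQL on stdin and
# takes the database name positionally, with -u (user) and -p (password).
MAPD_PATH="${MAPD_PATH:-/opt/mapd}"
QUERY_FILE="${QUERY_FILE:-/tmp/short_flights.sql}"   # hypothetical file name

# Save the same query used in the interactive session to a file.
cat > "$QUERY_FILE" <<'EOF'
SELECT origin_city AS "Origin", dest_city AS "Destination",
       AVG(airtime) AS "Average Airtime"
FROM flights_2008_7M
WHERE distance < 175
GROUP BY origin_city, dest_city;
EOF

# Call mapdql only where it is installed (for example on n37).
if [ -x "$MAPD_PATH/bin/mapdql" ]; then
    # MAPD_PASSWORD is a hypothetical variable holding the mapd user's password.
    "$MAPD_PATH/bin/mapdql" mapd -u mapd -p "$MAPD_PASSWORD" < "$QUERY_FILE"
else
    echo "mapdql not found under $MAPD_PATH; query saved to $QUERY_FILE"
fi
```

Keeping the SQL in its own file means the same query can be pasted into the browser's SQL editor or rerun later from cron or a job script.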
** Demos **
* https://www.mapd.com/demos/
** Load sample **
  /opt/mapd/insert_sample_data
  Enter dataset number to download, or 'q' to quit:
  #     Dataset                   Rows    Table Name             File Name
  1)    Flights (2008)            7M      flights_2008_7M        flights_2008_7M.tar.gz
  2)    Flights (2008)            10k     flights_2008_10k       flights_2008_10k.tar.gz
  3)    NYC Tree Census (2015)    683k    nyc_trees_2015_683k    nyc_trees_2015_683k.tar.gz
  1
  /opt/mapd/sample_datasets /opt/mapd
  - downloading and extracting flights_2008_7M.tar.gz
  --2018-08-27 10:00:53-- https://data.mapd.com/flights_2008_7M.tar.gz
  Resolving data.mapd.com (data.mapd.com)... 72.28.97.165
  Connecting to data.mapd.com (data.mapd.com)|72.28.97.165|:443... connected.
  flights_2008_7M/
  flights_2008_7M/flights_2008_7M.csv
  flights_2008_7M/flights_2008_7M.sql
  /opt/mapd
  - adding schema
  User mapd connected to database mapd
  User mapd disconnected from database mapd
  - inserting file: /opt/mapd/sample_datasets/flights_2008_7M/flights_2008_7M.csv
  User mapd connected to database mapd
  Result
  Loaded: 7009728 recs, Rejected: 0 recs in 24.642000 secs
  User mapd disconnected from database mapd
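After a load like the one above, a quick sanity check is to count the rows in the new table and compare against the "Loaded" figure. This is a hedged sketch under the same assumptions about ''mapdql'' as elsewhere (SQL on standard input, positional database name, ''-u'' and ''-p'' flags). ''MAPD_PASSWORD'' is a hypothetical variable, and the script only prints the result for comparison by eye because the exact output layout of ''mapdql'' is not assumed here.

```shell
#!/bin/sh
# Sketch: check that the table holds as many rows as the loader reported.
# Assumption: mapdql reads SQL on stdin; -u/-p flags as in the MapD docs.
MAPD_PATH="${MAPD_PATH:-/opt/mapd}"
EXPECTED_ROWS=7009728                      # "Loaded: 7009728 recs" from the load log
COUNT_SQL='SELECT COUNT(*) FROM flights_2008_7M;'

echo "Expected row count: $EXPECTED_ROWS"
if [ -x "$MAPD_PATH/bin/mapdql" ]; then
    # MAPD_PASSWORD is a hypothetical variable holding the mapd user's password.
    echo "$COUNT_SQL" | "$MAPD_PATH/bin/mapdql" mapd -u mapd -p "$MAPD_PASSWORD"
else
    echo "mapdql not found under $MAPD_PATH; run this query by hand: $COUNT_SQL"
fi
```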
* https://www.mapd.com/docs/latest/4_centos7-yum-gpu-ce-recipe.html