Build Pro-Looking Charts the Easy Way! Python Data Visualization

Ultimate Python
May 26, 2021
4 min read

Updated: May 30, 2021

Charts are not just about looks; they help you understand your data better. This post covers 6 chart types, packed with techniques. Learn with Ultimate Python.

https://www.youtube.com/watch?v=c3Lh3ZnZMBw

Creating charts

Charts do more than display data; they also help you understand it better. Data visualization is therefore another important skill for making sense of data.

Learn more about Matplotlib

Installing the tools

We will mainly use the matplotlib library to create charts, with numpy to generate sample data and seaborn to adjust the chart style for a nicer look.

In [1]:
!pip install matplotlib

Successfully installed cycler-0.10.0 kiwisolver-1.3.1 matplotlib-3.4.2 numpy-1.20.3 pillow-8.2.0

In [2]:
!pip install numpy

Requirement already satisfied: numpy in /srv/conda/envs/notebook/lib/python3.7/site-packages (1.20.3)

In [3]:
!pip install seaborn

Successfully installed pandas-1.2.4 scipy-1.6.3 seaborn-0.11.1

Importing the tools

Once everything is installed, we use the import statement to bring the tools in: matplotlib.pyplot, numpy, and seaborn.

In [4]:
import matplotlib.pyplot as plt

In [5]:
import numpy as np

In [6]:
import seaborn as sns

Matplotlib

A library that provides commands for creating many kinds of charts, with full control over appearance, colors, content, and other details. It is a basic, widely used tool for displaying data.

Learn more about Matplotlib

Creating a line chart

Use plt.plot(), which takes x values and y values; in this example both are arrays. matplotlib pairs up the values at matching positions to form coordinates and plots them. For example, the first point is formed from the first value of x and the first value of y, giving the coordinate (x, y).

Data used

Line charts suit continuous data, such as the relationship between x and y = x**2.

In [7]:
x = np.linspace(1,10,10)
y = x**2
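
To make the pairing concrete, here is a small sketch that lists the coordinates plt.plot(x, y) would connect. The variable name `points` is ours for illustration and does not appear in the original notebook.

```python
import numpy as np

# Same sample data as in the cell above: 10 evenly spaced values from 1 to 10.
x = np.linspace(1, 10, 10)
y = x**2

# plt.plot(x, y) pairs values by position: (x[0], y[0]), (x[1], y[1]), ...
points = [(float(a), float(b)) for a, b in zip(x, y)]
print(points[:3])
```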

Usage

To build a chart, issue the plotting commands you need and finish with plt.show(), which processes everything related to the figure and renders the chart accordingly.

Plotting y = x

In [8]:
plt.plot(x,x)
plt.show()

Plotting multiple datasets

Each plt.plot() call draws one dataset. You can plot as many datasets as you like; just issue the calls one after another. For example, plotting y = x and y = x**2 on the same chart looks like this:

In [9]:
plt.plot(x,x)
plt.plot(x,x**2)
plt.show()

Differentiating each dataset

A chart that contains several datasets can be confusing to read. We can tell the datasets apart by changing the color, line style, and marker of each one.

This is done with a format string made up of the symbols for each setting. For example, a black line uses 'k', a solid line uses '-', and a diamond marker uses 'D'; combined they become 'k-D', which is passed as a parameter to plt.plot() like this:

Learn more about differentiating datasets

In [10]:
plt.plot(x,x,'k-D')
plt.plot(x,x**2)
plt.show()
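
As a rough illustration, the format string is shorthand for three keyword arguments. The sketch below draws one line each way and compares the resulting settings; the names `short` and `explicit` are ours, and the Axes-based calls (ax.plot) are used here only so the line objects are easy to inspect.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1, 10, 10)
fig, ax = plt.subplots()

# Format-string shorthand: color 'k', line style '-', marker 'D'.
(short,) = ax.plot(x, x, "k-D")
# The same styling spelled out with keyword arguments.
(explicit,) = ax.plot(x, x**2, color="k", linestyle="-", marker="D")

print(short.get_color(), short.get_linestyle(), short.get_marker())
print(explicit.get_color(), explicit.get_linestyle(), explicit.get_marker())
# In a script, call plt.show() to display the figure.
```

The keyword form is longer but can be easier to read when a chart has many lines.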

Labeling the x and y axes

To make the axis values easier to understand, we can attach labels to the x and y axes with .xlabel() and .ylabel(), each of which takes a string holding the name you want.

In [11]:
plt.plot(x,x,'k-D')
plt.plot(x,x**2)
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Naming each dataset

We can display several datasets on one chart at the same time, which may cause confusion about which data is which. Giving each dataset a name, or label, on top of its color, line style, and marker makes the chart easier to understand.

label

To name a dataset, pass label as a parameter to plt.plot(). Labels are displayed through the legend, which needs its own command to appear.

legend

The legend is a box that lists the name of each dataset, identified by its color, line style, and marker. It appears only when plt.legend() is called.

• Legend position

The position can be adjusted with the loc= parameter followed by one of these strings: best, upper right, upper left, lower left, lower right, right, center left, center right, lower center, upper center, center

If no position is given, the default is best, which lets matplotlib choose where the legend should go.

In [12]:
plt.plot(x,x,'k-D',label='y = x')
plt.plot(x,x**2,label='y = x**2')
plt.legend(loc='best')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
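
The sketch below shows two details of the label and legend behavior described above: a line without a label gets no legend entry, and loc= can pin the legend to a fixed corner. The third line (2 * x) and the variable names are ours, added only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(1, 10, 10)
fig, ax = plt.subplots()

ax.plot(x, x, "k-D", label="y = x")
ax.plot(x, x**2, label="y = x**2")
ax.plot(x, 2 * x)  # no label, so this line gets no legend entry

# Place the legend explicitly instead of relying on 'best'.
legend = ax.legend(loc="upper left")
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
# In a script, call plt.show() to display the figure.
```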

Adding a chart title

Use plt.title(), which takes a string holding the title; the fontsize= parameter sets the title's size.

In [13]:
plt.title("Graphs showing y = x and y = x**2",fontsize=15)
plt.plot(x,x,'k-D',label='y = x')
plt.plot(x,x**2,label='y = x**2')
plt.legend(loc='best')
plt.xlabel('x')
plt.ylabel('y')
plt.show()

Bar charts

Use .bar() for vertical bar charts and .barh() for horizontal ones. Both take x and y as their main parameters and can be combined with chart titles, per-dataset styling, dataset labels, and so on.

Data used

Bar charts suit comparisons of discrete data, such as sales for each product category.

In [14]:
x = ['shirts','pants','shorts','shoes']
y = [1000,1200,800,1800]

Vertical bar chart

In [15]:
plt.bar(x,y)
plt.title('Sales by category')
plt.ylabel('sales revenue')
plt.xlabel('item type')
plt.show()

Horizontal bar chart

In [16]:
plt.barh(x,y)
plt.title('Sales by category')
plt.ylabel('item type')
plt.xlabel('sales revenue')
plt.show()
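
To see how the two commands relate, the sketch below draws the same data both ways and reads the bar sizes back: with .bar() the values become bar heights, with .barh() they become bar widths. The side-by-side layout and the variable names are ours.

```python
import matplotlib.pyplot as plt

categories = ["shirts", "pants", "shorts", "shoes"]
revenue = [1000, 1200, 800, 1800]

fig, (ax_v, ax_h) = plt.subplots(ncols=2, figsize=(8, 3))

# Vertical bars: each value becomes a bar height.
vertical = ax_v.bar(categories, revenue)
# Horizontal bars: the same values become bar widths.
horizontal = ax_h.barh(categories, revenue)

heights = [float(bar.get_height()) for bar in vertical]
widths = [float(bar.get_width()) for bar in horizontal]
print(heights)
print(widths)
# In a script, call plt.show() to display the figure.
```

This is also why the axis labels swap between the two cells above: the value axis is y for .bar() and x for .barh().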

Histogram

A histogram shows how frequently values occur and suits large datasets. It divides the data into bins, counts how many values fall into each bin, and draws the counts as a chart that resembles a bar chart.

Data used

We will generate 100 data points at random with np.random.normal(), which draws samples from a normal distribution.

In [17]:
x = np.random.normal(size=100)

Creating a histogram

Use .hist(), which takes x and the number of bins (the number of bars you want). plt.hist splits the data into that many groups and counts the values in each bin.

In [18]:
plt.title('Data Distribution')
plt.hist(x,bins=20)
plt.xlabel('value')
plt.ylabel('frequency')
plt.show()
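
The binning step can be inspected without drawing anything. This sketch uses np.histogram, which plt.hist relies on for its counting; the seeded generator is our choice so that the numbers repeat between runs.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded generator (our choice) so results repeat
x = rng.normal(size=100)

# Split the data range into 20 equal-width bins and count the values in each.
counts, edges = np.histogram(x, bins=20)

print(len(edges))    # 20 bins need 21 edges
print(counts.sum())  # every data point falls into exactly one bin
```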

Scatter Plot

Scatter plots are commonly used to see the relationship between the x and y values, or within a dataset itself. Create one with .scatter(), which takes x and y.

In [19]:
plt.scatter(np.linspace(1,100,100),np.random.normal(size=100))
plt.show()
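
The cell above plots pure noise, so no pattern shows up. As a contrast, here is a sketch with made-up data that follows a linear trend (y = 2x plus noise, an assumption of ours) so that the relationship is visible in the scatter plot and in the correlation coefficient.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.linspace(1, 100, 100)
y = 2 * x + rng.normal(scale=10, size=100)  # hypothetical linear trend plus noise

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x")
ax.set_ylabel("y")

# Pearson correlation: close to 1 for a strong increasing linear relationship.
r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))
# In a script, call plt.show() to display the figure.
```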

Box Plot

A box plot shows how data is spread out. It is based on quartiles: the data is sorted from smallest to largest and split into 4 equal parts by three cut points, Q1, Q2 (the median), and Q3. The box in the middle spans Q1 to Q3, with an orange line marking the median. The length of the box is the IQR (interquartile range, Q3 - Q1). Whiskers extend from the box toward Q1 - (1.5 x IQR) on the low side and Q3 + (1.5 x IQR) on the high side, stopping at the most extreme data point within those limits. Any data point beyond the limits is called an outlier.

Data used

Box plots suit large datasets whose spread we want to examine. In this example we generate 100 numbers drawn at random from a normal distribution.

In [20]:
x = np.random.normal(size=100)

Using boxplot

Use .boxplot(), passing the values whose spread you want to examine.

In [21]:
plt.boxplot(x)
plt.xlabel('100 data points from Normal Distribution')
plt.ylabel('data values')
plt.show()
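
The numbers behind a box plot can be worked out by hand. The sketch below computes the quartiles, the IQR, the 1.5 x IQR limits, and the outliers with numpy. It assumes the default whisker factor of 1.5; the seed and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded (our choice) so results repeat
x = rng.normal(size=100)

# Quartiles: the three cut points that split the sorted data into four parts.
q1, median, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1

# Limits beyond which a point counts as an outlier.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points still inside the limits.
inside = x[(x >= lower_limit) & (x <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = x[(x < lower_limit) | (x > upper_limit)]

print(f"Q1={q1:.2f} median={median:.2f} Q3={q3:.2f} IQR={iqr:.2f}")
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}, outliers: {len(outliers)}")
```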

Creating multiple charts

Create a drawing area divided into sections, one per chart, then use the chart commands covered above to draw into each section.

Creating the working area

Use plt.subplots(), where nrows= sets the number of rows, ncols= sets the number of charts per row, and figsize= sets the overall figure size in inches as a tuple of (width, height).

For example, to make room for 6 charts in 3 rows of 2, with a size of 8 x 8, we unpack the result of .subplots() into the variables fig and axes.

In [22]:
fig, axes = plt.subplots(nrows=3,ncols=2,figsize=(8,8))
plt.tight_layout()

Adding charts to the area

Use the axes variable followed by [row, column] to pick the position of the chart you want to fill; positions start at 0.

For example, to put a line chart at position [0,0], the top left, index axes and call .plot() (which creates a line chart). We also call plt.tight_layout() to adjust the spacing automatically so that text does not overlap.

In [23]:
fig, axes = plt.subplots(nrows=3,ncols=2,figsize=(8,8))

axes[0,0].set_title("Graphs showing y = x and y = x**2")
axes[0,0].plot(np.linspace(1,10,10),np.linspace(1,10,10),'k-D',label='y = x')
axes[0,0].plot(np.linspace(1,10,10),np.linspace(1,10,10)**2,label='y = x**2')
axes[0,0].legend(loc='best')
axes[0,0].set_xlabel('x')
axes[0,0].set_ylabel('y')

plt.tight_layout()
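
A quick way to check the indexing is to look at the axes object itself. In this sketch (variable names ours), axes is a 3 x 2 numpy array holding one Axes object per cell of the grid.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(8, 8))

print(axes.shape)  # (3, 2): rows first, then columns

top_left = axes[0, 0]      # first row, first column
bottom_right = axes[2, 1]  # last row, last column

# Methods on an Axes mirror the plt.* commands, with set_ prefixes for labels.
top_left.set_title("top left")
bottom_right.set_title("bottom right")
fig.tight_layout()
```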

Repeat until every chart is filled in

In [24]:
fig, axes = plt.subplots(nrows=3,ncols=2,figsize=(8,8))

axes[0,0].set_title("Graphs showing y = x and y = x**2")
axes[0,0].plot(np.linspace(1,10,10),np.linspace(1,10,10),'k-D',label='y = x')
axes[0,0].plot(np.linspace(1,10,10),np.linspace(1,10,10)**2,label='y = x**2')
axes[0,0].legend(loc='best')
axes[0,0].set_xlabel('x')
axes[0,0].set_ylabel('y')

axes[0,1].set_title('Sales by category')
axes[0,1].bar(['shirts','pants','shorts','shoes'],[1000,1200,800,1800])
axes[0,1].set_ylabel('sales revenue')
axes[0,1].set_xlabel('item type')

axes[1,0].set_title('Sales by category')
axes[1,0].barh(['shirts','pants','shorts','shoes'],[1000,1200,800,1800])
axes[1,0].set_ylabel('item type')
axes[1,0].set_xlabel('sales revenue')

axes[1,1].set_title('Data Distribution')
axes[1,1].hist(np.random.normal(size=100),bins=20)
axes[1,1].set_xlabel('value')
axes[1,1].set_ylabel('frequency')

axes[2,0].set_title('Scatter Plot')
axes[2,0].scatter(np.linspace(1,100,100),np.random.normal(size=100))
axes[2,0].set_xlabel('x')
axes[2,0].set_ylabel('Values')

axes[2,1].set_title('Boxplot')
axes[2,1].boxplot(np.random.normal(size=100))
axes[2,1].set_xlabel('100 Randomly Generated')
axes[2,1].set_ylabel('Values')

plt.tight_layout()
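
Filling six positions by hand gets repetitive. One possible alternative, sketched here as our own variation rather than the approach used in this post, is to pair each title with a small drawing function and loop over axes.flat. The shortened titles and the `charts` list are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
categories = ["shirts", "pants", "shorts", "shoes"]
revenue = [1000, 1200, 800, 1800]
line_x = np.linspace(1, 10, 10)

# One (title, drawing function) pair per chart; each function receives an Axes.
charts = [
    ("Line", lambda ax: ax.plot(line_x, line_x**2)),
    ("Bar", lambda ax: ax.bar(categories, revenue)),
    ("Horizontal bar", lambda ax: ax.barh(categories, revenue)),
    ("Histogram", lambda ax: ax.hist(rng.normal(size=100), bins=20)),
    ("Scatter", lambda ax: ax.scatter(np.linspace(1, 100, 100), rng.normal(size=100))),
    ("Boxplot", lambda ax: ax.boxplot(rng.normal(size=100))),
]

fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(8, 8))

# axes.flat walks the grid row by row: [0,0], [0,1], [1,0], ...
for ax, (title, draw) in zip(axes.flat, charts):
    draw(ax)
    ax.set_title(title)

fig.tight_layout()
titles = [ax.get_title() for ax in axes.flat]
print(titles)
```

The explicit version above is easier to follow when each chart needs its own labels; the loop pays off mainly when the charts share most of their setup.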

Changing the style

This can be done with ready-made styles. seaborn is one of the libraries used for styling, as well as for plotting charts of various kinds, and we will use it to change how the charts look.

We call sns.set_style() and set the style= parameter; the basic options are white, dark, whitegrid, darkgrid, and ticks. It can be used like this:

Learn about seaborn

In [25]:
import seaborn as sns
sns.set_style(style='whitegrid')

fig, axes = plt.subplots(nrows=3,ncols=2,figsize=(8,8))

axes[0,0].set_title("Graphs showing y = x and y = x**2")
axes[0,0].plot(np.linspace(1,10,10),np.linspace(1,10,10),'k-D',label='y = x')
axes[0,0].plot(np.linspace(1,10,10),np.linspace(1,10,10)**2,label='y = x**2')
axes[0,0].legend(loc='best')
axes[0,0].set_xlabel('x')
axes[0,0].set_ylabel('y')

axes[0,1].set_title('Sales by category')
axes[0,1].bar(['shirts','pants','shorts','shoes'],[1000,1200,800,1800])
axes[0,1].set_ylabel('sales revenue')
axes[0,1].set_xlabel('item type')

axes[1,0].set_title('Sales by category')
axes[1,0].barh(['shirts','pants','shorts','shoes'],[1000,1200,800,1800])
axes[1,0].set_ylabel('item type')
axes[1,0].set_xlabel('sales revenue')

axes[1,1].set_title('Data Distribution')
axes[1,1].hist(np.random.normal(size=100),bins=20)
axes[1,1].set_xlabel('value')
axes[1,1].set_ylabel('frequency')

axes[2,0].set_title('Scatter Plot')
axes[2,0].scatter(np.linspace(1,100,100),np.random.normal(size=100))
axes[2,0].set_xlabel('x')
axes[2,0].set_ylabel('Values')

axes[2,1].set_title('Boxplot')
axes[2,1].boxplot(np.random.normal(size=100))
axes[2,1].set_xlabel('100 Randomly Generated')
axes[2,1].set_ylabel('Values')

plt.tight_layout()
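
sns.set_style() changes the style for every chart drawn afterwards. If only one figure should be restyled, seaborn's axes_style() can be used as a context manager instead. The sketch below is a small illustration with made-up data; it also prints one of the underlying matplotlib settings to show what a style name controls.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# axes_style returns the matplotlib settings behind a named seaborn style.
whitegrid_has_grid = sns.axes_style("whitegrid")["axes.grid"]
white_has_grid = sns.axes_style("white")["axes.grid"]
print(whitegrid_has_grid, white_has_grid)

# As a context manager, the style applies only to figures created inside the block.
with sns.axes_style("darkgrid"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
    ax.set_title("darkgrid for this figure only")
# In a script, call plt.show() to display the figure.
```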

Done!
