📊 [DataCamp] Intermediate Python: Basic plots with Matplotlib, pandas Series & DataFrame (loc, iloc)

2020. 9. 1. 19:42 · Learning archive/Data Science

📌 Gapminder World Map

 

 

📌 How to use Matplotlib

The plot function tells Python what to plot and how to plot it!
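A minimal sketch of a line plot, assuming the non-interactive Agg backend so it runs without a display. The year/population values are rough, illustrative world-population figures in billions, not data taken from the course:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: lets the sketch run without a display
import matplotlib.pyplot as plt

# Illustrative, approximate world population (billions) for a few years
year = [1950, 1970, 1990, 2010]
pop = [2.52, 3.69, 5.26, 6.97]

# plt.plot(x, y): WHAT to draw (pop against year) and HOW (a connected line chart)
lines = plt.plot(year, pop)

# In an interactive session, plt.show() would open a window with the chart
```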

 

📌 Scatter plot: doesn't connect the dots, so it is a more honest way to show the data
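A sketch of the same kind of data as a scatter plot (illustrative values, Agg backend assumed). `plt.scatter` draws one marker per (x, y) pair and no connecting line, so it does not suggest values between the observed points:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Illustrative, approximate world population (billions)
year = [1950, 1970, 1990, 2010]
pop = [2.52, 3.69, 5.26, 6.97]

# One marker per observation; nothing is interpolated between them
points = plt.scatter(year, pop)
```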

 

 

📌 Practice: My first graph

 

 

 

 

Histogram

 

 

 

e.g. population pyramid:

A famous histogram example: the population of the baby-boom generation

 

 

Histogram practice

 

 

 Build a histogram (3): compare
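One way to compare bin settings, sketched with hypothetical data: draw a histogram, clear the figure with `plt.clf()`, then draw another. In an interactive session you would call `plt.show()` before each `plt.clf()` to see the first chart:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only
ages = [3, 7, 15, 22, 24, 29, 31, 38, 45, 61]

# Coarse view: 5 bins
counts_5, _, _ = plt.hist(ages, bins=5)
plt.clf()  # clear the current figure so the next histogram starts fresh

# Finer view of the same data: 20 bins
counts_20, _, _ = plt.hist(ages, bins=20)
```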

 

 

 

 

Customization: how to customize plots

 

 

 

 

Different plot types and many customizations (colors, shapes, etc.): the right choice depends on the data and the story you want to tell
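A sketch of a few common customizations passed as keyword arguments to `plt.scatter` (`s` for marker size, `c` for color, `alpha` for transparency), plus a log-scaled x-axis. The three data points are hypothetical:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Hypothetical data for three countries
gdp = [1000, 5000, 40000]        # GDP per capita
life_exp = [55.0, 68.0, 80.0]    # life expectancy
pop = [10, 50, 20]               # population in millions

# s: marker size (scaled by population), c: color, alpha: 0 = invisible, 1 = opaque
points = plt.scatter(gdp, life_exp, s=[p * 2 for p in pop], c="green", alpha=0.8)

# A log scale spreads out values that span several orders of magnitude
plt.xscale("log")
```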

 

 

 

 

 

+ Add axis labels

 Call the axis-label functions (plt.xlabel(), plt.ylabel()) before calling plt.show().
Labels give readers more information.
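A sketch with axis labels and a title added before the plot would be shown (Agg backend assumed; the values are illustrative, approximate world-population figures). Calls made after `plt.show()` has returned typically go to a new, empty figure, which is why the labels come first:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.52, 3.69, 5.26, 6.97]  # illustrative, approximate (billions)

plt.plot(year, pop)

# Customize BEFORE showing the plot
plt.xlabel("Year")
plt.ylabel("Population (billions)")
plt.title("World Population Projections")

ax = plt.gca()  # the axes everything above was drawn on
# plt.show()    # uncomment in an interactive session
```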

 

 

 

Adding more data to each axis

 

 

 

Labels 

 

New Python type: dictionaries (so useful!)

Previous approach: make two lists and connect them by index. Not convenient, not intuitive.

 

 

Dictionary์˜ ๋ฐฉ์‹ 

pop = [30, 2, 39]
countries = ["afg", "alba", "algeria"]

...

world = {"afg": 30, "alba": 2, "algeria": 39}
world["alba"]

# intuitive, more efficient (faster lookup by key)
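To make the contrast concrete, here is the two-list lookup next to the dictionary lookup, using the toy values above. `dict(zip(...))` is one standard way to build the dictionary straight from the two lists:

```python
pop = [30, 2, 39]
countries = ["afg", "alba", "algeria"]

# List approach: find the position of the key, then index the other list
ind_alba = countries.index("alba")
alba_from_lists = pop[ind_alba]

# Dictionary approach: look the value up directly by its key
world = {"afg": 30, "alba": 2, "algeria": 39}
alba_from_dict = world["alba"]

# Pair the two lists element by element to build the same dictionary
world_from_lists = dict(zip(countries, pop))

print(alba_from_lists, alba_from_dict)  # 2 2
```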

 

 

Dictionary manipulation (1): adding keys
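A short sketch of adding a key to the toy `world` dictionary; the `"andorra"` entry and its value are illustrative. Assigning to a key that does not exist yet adds it, and `in` checks whether a key is present:

```python
world = {"afg": 30, "alba": 2, "algeria": 39}

# Add a new key-value pair (illustrative value)
world["andorra"] = 0.08

# Check that the key is now in the dictionary
print("andorra" in world)  # True
```

Keys are unique, so assigning to a key that already exists overwrites its value instead of adding a second entry.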

 

 

 

 

📌 Pandas, Part 1

 

Tabular dataset examples

row = observations (observational units)

col = variables

 

 

 

📌 pandas = high-level data manipulation tool (built on NumPy)

 

 

 

 

#1 Build it manually from a dictionary, using pd.DataFrame(dictname)
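A sketch of building a DataFrame from a dictionary, using a small BRICS table as illustrative data (area in million km², approximate). Each key becomes a column label and each list holds that column's values; `dict_name` stands in for the `dictname` placeholder:

```python
import pandas as pd

dict_name = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million km², approximate
}

# Keys -> column labels; rows get the default labels 0, 1, 2, ...
brics = pd.DataFrame(dict_name)
print(brics)
```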

 

#2 DataFrame from a CSV file

 

- CSV stands for comma-separated values; a CSV file is a very convenient format for storing and sharing data.
- Because a CSV file stores a table of numbers or strings as plain text, many different programs can store, transmit, and process it.
- Because data storage (the CSV file) is kept separate from data processing (the Python script), the same processing steps can be applied more easily to a different dataset.

index_col=0

sets the first column as the row labels
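A self-contained sketch of what `index_col=0` changes. The CSV is held in a string and read through `io.StringIO` so no file is needed; with a real file you would pass its path instead (e.g. a hypothetical `brics.csv`). The table is illustrative:

```python
import io

import pandas as pd

# The first column holds row labels and has an empty header
csv_text = """,country,capital,area
BR,Brazil,Brasilia,8.516
RU,Russia,Moscow,17.10
IN,India,New Delhi,3.286
CH,China,Beijing,9.597
SA,South Africa,Pretoria,1.221
"""

# Without index_col: the label column is read as ordinary data ("Unnamed: 0")
no_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0: the first column becomes the row labels
brics = pd.read_csv(io.StringIO(csv_text), index_col=0)
```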

 

 

 

Practice

 


 

 

Build a DataFrame from a dictionary, then add labels to the rows
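A sketch of that pattern: build the DataFrame from a dictionary, then replace the default 0, 1, 2, ... row labels by assigning a list to `.index`. The data and the country-code labels are illustrative:

```python
import pandas as pd

brics = pd.DataFrame({
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
})

# One label per row, in row order
row_labels = ["BR", "RU", "IN", "CH", "SA"]
brics.index = row_labels
print(brics)
```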

 

CSV to DataFrame (1)

 

Import pandas, load the CSV file, and set index_col=0

 

 

Pandas, Part 2

 

📌 Index and select data! First, with []

In the video, you saw that you can index and select Pandas DataFrames in many different ways. The simplest, but not the most powerful way, is to use square brackets.
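A sketch of square-bracket column access on a small illustrative BRICS table: single brackets return a pandas Series, double brackets (a list with one label) return a one-column DataFrame:

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

col_series = brics["country"]   # single brackets -> Series
col_frame = brics[["country"]]  # double brackets -> DataFrame with one column
```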

 

 

 

Two columns also work with [] (pass a list of column labels)
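Selecting two columns the same way, by passing a list of column labels inside `[]` (illustrative data):

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million km², approximate
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A list of labels selects those columns, in that order, as a DataFrame
two_cols = brics[["country", "capital"]]
```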

 

 

Row access with []
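Rows can be selected with a slice inside `[]`. A sketch on illustrative data: the integer slice is positional, and the end position is excluded:

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Positions 1, 2 and 3 (position 4 is not included)
rows = brics[1:4]
```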

 

But [] has limited functionality

 

 

Let's use pandas loc and iloc

 

 

 

loc (label-based) 

 

 

Row access with loc
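A sketch of row access with `.loc` on an illustrative table labelled by country code. One label returns the row as a Series; a list of labels returns a DataFrame:

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

ru_series = brics.loc["RU"]               # one label -> Series
ru_frame = brics.loc[["RU"]]              # list with one label -> DataFrame
several = brics.loc[["RU", "IN", "CH"]]   # several rows -> DataFrame
```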

 

 

The difference here: you can extend your selection with a comma, adding the columns you want after the rows
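A sketch of extending a `.loc` selection with a comma: rows first, then columns. A bare `:` selects every row (illustrative data):

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million km², approximate
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Rows RU, IN, CH; columns country and capital
subset = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]

# All rows, two columns
all_rows = brics.loc[:, ["country", "capital"]]
```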

 

 

 

 

 

 

loc (label-based)

 

Row access with iloc: uses integer positions instead of labels
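The same kinds of selection with `.iloc`, which takes integer positions (counting from 0) rather than labels (illustrative data):

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million km², approximate
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

second_row = brics.iloc[1]                # position 1 -> the RU row, as a Series
several = brics.iloc[[1, 2, 3]]           # rows at positions 1, 2, 3
subset = brics.iloc[[1, 2, 3], [0, 1]]    # same rows, first two columns
```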

 

 

loc vs iloc 
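A sketch contrasting the two on illustrative data: the same cells can be reached by label (`.loc`) or by position (`.iloc`), but slicing differs, since `.loc` includes the end label while `.iloc` excludes the end position:

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],  # million km², approximate
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Same cells, two ways
by_label = brics.loc[["RU", "IN"], ["capital"]]
by_position = brics.iloc[[1, 2], [1]]

# Slicing: loc includes the end label, iloc excludes the end position
loc_slice = brics.loc["RU":"CH"]   # RU, IN, CH
iloc_slice = brics.iloc[1:3]       # positions 1 and 2 -> RU, IN
```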

 

Practice! 

 

 

 

 

 

 

Source

datacamp.com