A Discussion on Visualization (with Python)

In this article I would like to share some of my personal understanding and practice of data visualization in Python, for those who, like myself, have just started exploring this field. The article walks through the whole process of data analysis, from data cleaning, to modeling, and finally to visualization, and I will try to present this process as clearly as possible.

I think the key to visualization is to define what you want to illustrate and how you want to illustrate it. Adjusting the whats and the hows can produce totally different results. This idea runs throughout the entire article.

This article consists of three sections.

  • Section I: Why we need visualization and how it can help us;
  • Section II: Visualization by itself may not be enough; it is better served after model analysis;
  • Section III: Visualization can be cool, but shouldn't be excessive.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf


%matplotlib inline


Data

To help with examples, two sets of economic data are used in this article: World Development Indicators (December 2019) by the World Bank, and the provincial GDP per capita of China as reported in its national and provincial statistical yearbooks (published from 1985 to 2010).

Section I

Why we need visualization and how it can help us

Once, when I was talking about visualization with someone, I was told that people don't need "pretty" graphs to tell them what is already in the numbers.

Well, I can't agree with that. First, I think "pretty" is not an excess but a nice feature. Second, the facts do lie in the numbers, but sometimes you cannot really see them without help.

A basic question: what can we know from the numbers, and how?

When comparing simple numbers like 3 apples and 5 bananas, or the KPIs of a small team, we don't need tools such as visualization to help interpret them. We simply compare them directly and save time. But what if the numbers are beyond our daily awareness? What if there is more than one variable per observation to compare?

For example, let's take a look at the agriculture, industry, and services outputs in 2010 of the G20 countries (minus the European Union).

# World Development Indicators data
wdi = pd.read_csv('Data/WDIData.csv')

wdi.head(3)

[Output: the first three rows of the WDI dataframe]
g20 = ['AUS', 'IND', 'ARG', 'FRA', 'CHN', 
       'CAN', 'RUS', 'BRA', 'DEU', 'IDN', 
       'SAU', 'ZAF', 'MEX', 'ITA', 'JPN', 
       'USA', 'TUR', 'GBR', 'KOR']


outputs = ['NV.AGR.TOTL.KD', 'NV.IND.TOTL.KD', 'NV.SRV.TOTL.KD']
cols = ['Agriculture', 'Industry', 'Services']


t20 = '2010'


# Extracting data 
data20 = pd.DataFrame(index=g20, columns=cols).sort_index()


for g in g20:
    for o, col in enumerate(cols):
        # Combine boolean masks with & instead of chained indexing, and assign with .loc
        mask = (wdi['Country Code'] == g) & (wdi['Indicator Code'] == outputs[o])
        data20.loc[g, col] = wdi.loc[mask, t20].iloc[0]

[Output: the data20 table of 2010 agriculture, industry, and services outputs for the G20 countries]

What can we tell from this data? It's obvious that the industry and services outputs of the USA were larger than those of most other countries by an order of magnitude. China had a larger agriculture output than the USA. Indonesia's industrial output was almost equal to its services output.

What else?

Limited by the way our brains work, we can only compare two numbers at a time. This means that an ordinary person can hardly grasp the general picture from a dataset of 19 countries and 3 variables.

This is where visualization kicks in: it lets us get the big picture at a single glance.

First, let's compare the output composition, that is, the percentage of GDP contributed by agriculture, industry, and services. Conventionally this can be done with a series of pie charts.

# Add a blank row for easier alignment of charts 
data20.loc['blank'] = 0

# Setting up the colors, this is important 
colors20 = ['#4FD4DB', '#5EA1A2', '#C6C7E3']

# Setting up how many charts to be shown in each row and column 
nrows21 = 4
ncols21 = round(len(data20.index)/nrows21)


fig21, ax21 = plt.subplots(nrows=nrows21, ncols=ncols21, figsize = (20, 16))
fig21.patch.set_facecolor('xkcd:white')


for i in range(nrows21):
    for j in range(ncols21):
        
        row = data20.iloc[i*ncols21+j]
        
        # The padding 'blank' row sums to zero and cannot be normalized into a pie
        if row.sum() == 0:
            ax21[i][j].set_axis_off()
            continue
        
        ax21[i][j].pie(
            row, 
            colors=colors20, explode=[0, 0, 0], shadow=False, 
            startangle=-270, counterclock=False, 
            wedgeprops={'width':0.6, 'edgecolor':'white', 'linewidth':1}, 
            autopct='%1.2f%%', pctdistance=0.7, 
            labeldistance=1.2, 
            textprops={'size':16}
        )
        
        ax21[i][j].set_ylabel(data20.index.tolist()[i*ncols21+j], fontsize=18, labelpad=0.1)


# Build the legend from proxy patches so it does not depend on the (empty) last subplot
from matplotlib.patches import Patch
handles21 = [Patch(color=c) for c in colors20]
fig21.legend(handles21, cols, loc='lower right', fontsize=18)
plt.show()
[Figure: pie charts of the 2010 output composition of each G20 country]

Voila! Now we can see the big picture and check what we would like to know. For example, it is often assumed that the higher the percentage of GDP contributed by services, the wealthier the country. We can check this much more easily in the pie charts than in the plain numbers.
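As a numeric cross-check of that pattern, the services share behind each pie can also be computed directly from the data20 frame built above. This is a minimal sketch; data20 and cols come from the earlier cells.

# Rank the G20 countries by the share of services in total output (sketch)
shares = data20.drop('blank').astype(float)
shares['Services share'] = shares['Services'] / shares[cols].sum(axis=1)

print(shares['Services share'].sort_values(ascending=False).round(3))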

However, when looking at the pie charts, we are still confined to comparing only two items at a time.

Is it possible to compare all of them at the same time? Yes, it is. And this brings up a key point about "how": choose the way you show it.

In this case, a Nightingale rose chart can be employed to deliver a more striking result. Note that the rose chart is only suitable for a limited number of observations.

data22 = data20.drop('blank')
# Pie charts can generate proportions automatically, but other charts cannot 
data22['sum'] = data22[cols[0]] + data22[cols[1]] + data22[cols[2]]

# Setting up the circle 
N22 = len(data22.index)


theta22 = np.linspace(start=0.0, stop=2*np.pi, num=N22, endpoint=False)


angels22 = [n/float(N22)*2*np.pi for n in range(N22)]
angels22.append(angels22[0])  # Close the circle by repeating the first angle


# For this kind of graph, tick labels sit between tick points rather than on them
witdh22 = (2*np.pi)/N22
angels22_2 = [n+witdh22/2 for n in angels22]

# Setting up the petals of the rose 
base_height22 = 2
layer22 = 3  # How many layers of petals 


radii22 = []  # Radius of each layer  
bottom22 = []
data22['base_bottom'] = [0.5]*N22


for i in range(layer22):
    radii22.append(base_height22*data22.iloc[:, i]/data22['sum'])
    
    if i == 0:
        pass
    else:
        data22['base_bottom'] += radii22[-2]
    
    bottom22.append(data22['base_bottom'].tolist())
    
radii22 = [r.tolist() for r in radii22]


fig22, ax22 = plt.subplots(figsize=(10, 10), subplot_kw={'projection':'polar'})
fig22.patch.set_facecolor('xkcd:white')


for r, b, c in zip(radii22, bottom22, colors20):
    ax22.bar(x=theta22, height=r, bottom=b, width=witdh22, color=c, lw=1, edgecolor='white', alpha=1)


ax22.set_xticklabels([])
ax22.set_yticklabels([])
ax22.set_yticks([])
ax22.set_xticks(angels22_2)
ax22.tick_params(axis='x', which='both', grid_linestyle='--',grid_linewidth=0.2, grid_alpha=0.5)
ax22.set_axis_off()


for n in range(N22):
    ax22.text(x=angels22[n], y=bottom22[-1][n]+radii22[-1][n]+0.2, 
              s=data22.index.tolist()[n], 
              ha='center', va='center', fontsize=14
              )


ax22.set_title('Economic Output Composition of G20 Countries in ' + t20, fontsize=20, pad=40)


plt.legend(cols, loc='lower right', bbox_to_anchor=(1.05, -0.1), fontsize=14)
plt.tight_layout()
plt.show()

[Figure: Nightingale rose chart, "Economic Output Composition of G20 Countries in 2010"]

Next, let's visit another key aspect of "how": choose the angle from which you present the data.

In the previous example, Argentina looks wealthier than India, because services make up a larger part of its GDP. Is that true?

This time we illustrate the stacked data rather than the proportional data, in the same type of chart.

data23 = data20.drop('blank')


N23 = len(data23.index)


theta23 = np.linspace(start=0.0, stop=2*np.pi, num=N23, endpoint=False)


angels23 = [n/float(N23)*2*np.pi for n in range(N23)]
angels23.append(angels23[0])  # Close the circle by repeating the first angle


# For this kind of graph, tick labels sit between tick points rather than on them
witdh23 = (2*np.pi)/N23
angels23_2 = [n+witdh23/2 for n in angels23]

base_height23 = 10**13
layer23 = 3  # How many layers of petals 


radii23 = []  # Radius of each layer  
bottom23 = []
data23['base_bottom'] = [0.2]*N23


for i in range(layer23):
    radii23.append(data23.iloc[:, i]/base_height23)
    
    if i == 0:
        pass
    else:
        data23['base_bottom'] += radii23[-2]
    
    bottom23.append( data23['base_bottom'].tolist())
    
radii23 = [r.tolist() for r in radii23]

fig23, ax23 = plt.subplots(figsize=(16, 16), subplot_kw={'projection':'polar'})
fig23.patch.set_facecolor('xkcd:white')


for r, b, c in zip(radii23, bottom23, colors20):
    ax23.bar(x=theta23, height=r, bottom=b, width=witdh23, color=c, lw=0.5, edgecolor='white')


ax23.set_xticklabels([])
ax23.set_yticklabels([])
ax23.set_yticks([])
ax23.set_xticks(angels23_2)
ax23.tick_params(axis='x', which='both', grid_linestyle='--',grid_linewidth=0.2, grid_alpha=0.5)
ax23.set_axis_off()


for n in range(N23):
    ax23.text(x=angels23[n], y=bottom23[-1][n]+radii23[-1][n]+0.1, 
              s=data23.index.tolist()[n], 
              ha='center', va='center', fontsize=14)


ax23.text(x=0, y=0, s='G20\nOutputs', ha='center', va='center', fontsize=14)


ax23.set_title('Economic Outputs of G20 Countries in ' + t20, fontsize=20, pad=-170)


plt.legend(cols, loc='lower left', bbox_to_anchor=(0.1, 0.1), fontsize=16)
plt.show()

 
[Figure: rose chart, "Economic Outputs of G20 Countries in 2010"]

It turns out Argentina's total output was roughly one fifth of India's, despite its larger proportion of services.

Obviously, rose charts are more striking when presenting stacked data rather than proportional data.

However, this graph has an obvious problem: the agriculture part can barely be seen. This indicates that the rose chart may be less suitable when the values of the variables differ drastically. It also implies that finding better whats and hows is the everlasting theme of visualization.

Section II

Visualization by itself may not be enough; it is better served after model analysis

Sometimes the numbers themselves cannot tell us anything useful. But that does not necessarily mean there is no information hidden in them.

When visualizing the raw data does not produce a reasonable illustration, we can visualize the modeled results of the original data instead, as this can be a better "what".

For instance, when studying the exogenous and endogenous economic growth capacities of China from 1978 to 2010, whether a province has access to the coastline was chosen as the proxy for the exogenous factors, because the coastal provinces were successively selected as special economic zones by the central government.

Consider a dataset with GDP per capita as the dependent variable and two explanatory dummy variables: southeast provinces and coastal provinces. Does it make sense to visualize this data directly?

Probably not.

t_start = 1978
t_end = 2010


t = [t for t in range(t_start, t_end+1)]
tt = [str(t) for t in t]



# The China statistical data consist of two parts with different structures
GpC = pd.read_excel('Data/China GDP per Capita.xlsx', index_col='Province')
dummy = pd.read_excel('Data/China Prinvince Series.xlsx', index_col='Province')

province = GpC.index.tolist()

index = np.arange(len(t)*len(province)).tolist()
variables = ['year', 'province', 'G', 'SE', 'CL']

data = pd.DataFrame(index=index, columns=variables)

# Reshape the two sources into one long table with one row per province-year
idx = 0
for year in t:
    for i in province:
        data.loc[idx, 'year'] = year
        data.loc[idx, 'province'] = i
        data.loc[idx, 'G'] = GpC.loc[i, year]
        for a in variables[3:]:
            data.loc[idx, a] = dummy.loc[i, a]
        idx += 1

[Output: the first rows of the assembled province-year data table]

There are three difficulties in visualizing this dataset directly:

  • It is both chronological and categorical;
  • It has both numerical and Boolean values;
  • It has more than one dummy variable.

These make it hard to visualize the whole picture in one place. We could make one huge infographic or many smaller graphs, but that still would not help verify the factors affecting growth capacities.

In this case a regression model can help.

The model compares growth among the northwest, southeast, and coastal provinces, and investigates the advantage granted to the coastal provinces by the central government's policy. Please see the appendix for the details of the rationale.

G = a0 + a1·SE + a2·CL + u

The variables stand for:

  • G: GDP per capita of each province in a given year;
  • SE: the dummy variable of southeast provinces;
  • CL: the dummy variable of coastal provinces.

The estimated coefficients (a) are:

  • a0: the mean GDP per capita of the northwest provinces;
  • a1: how much the mean GDP per capita of the southeast provinces exceeds that of the northwest provinces;
  • a2: how much the mean GDP per capita of the coastal provinces exceeds that of the northwest provinces.

u stands for the estimation residual.
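To make the coefficient interpretation concrete, here is a minimal sanity-check sketch. It reuses the data frame built earlier, picks 1995 as an arbitrary sample year, and assumes SE and CL are coded as mutually exclusive 0/1 dummies (as the grouping in the plots below suggests); under that assumption, the OLS coefficients should line up with the simple group means.

# Sanity check: dummy-variable coefficients vs. group means (sketch; 1995 is arbitrary)
sample = data[data['year'] == 1995][['G', 'SE', 'CL']].astype(
    {'G': float, 'SE': int, 'CL': int})

check = smf.ols('G ~ SE + CL', data=sample).fit()
group_means = sample.groupby(['SE', 'CL'])['G'].mean()

print(check.params)   # Intercept = a0, SE = a1, CL = a2
print(group_means)    # a0 should match the (SE=0, CL=0) group mean;
                      # a1 the (SE=1, CL=0) mean minus a0; a2 the (SE=0, CL=1) mean minus a0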

The key in this modeling is that it is not a single shot. Since the GDP per capita figures are in current prices and not adjusted for inflation, the model was estimated separately on the cross-section of provinces for each year. Note that in this case we are not building a robust model, but testing the robustness of the model against each year's data.

# The target data to be collected are the estimated coefficients, the p values, and the r squared 

a = [[], [], []]
p = [[], [], []]
R2 = []


for year in t:
    modeldata = data[data['year'] == year][variables[1:]].set_index('province')
    
    model1 = smf.ols('G ~ SE + CL', data=modeldata.astype(float))
    result1 = model1.fit()
    
    for i in range(len(a)):
        a[i].append(result1.params.tolist()[i])
        p[i].append(result1.pvalues.tolist()[i])
    R2.append(result1.rsquared)


matrix = pd.DataFrame(
    {
        'Year': t, 
        'a0': a[0], 
        'p0': p[0], 
        'a1': a[1], 
        'p1': p[1], 
        'a2': a[2], 
        'p2': p[2],
        'R2': R2
    }
)
[Output: the matrix table of yearly coefficients, p values, and R squared]

The p values test the null hypothesis that an estimated coefficient equals 0, i.e., that the corresponding variable has no explanatory power over the dependent variable. The smaller the p value, the more significant the independent variable is in the model.

R2 stands for the overall explanatory power of the model: the higher, the stronger.
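Before turning to charts, a small follow-up sketch can already flag the years in which a2 clears the usual significance thresholds (the sig2_5 and sig2_10 column names are my own additions, not part of the original matrix):

# Flag the years in which the coastal coefficient a2 is significant (sketch)
matrix['sig2_5'] = matrix['p2'] < 0.05    # significant at the 5% level
matrix['sig2_10'] = matrix['p2'] < 0.10   # significant at the 10% level

print(matrix[['Year', 'p2', 'sig2_5', 'sig2_10', 'R2']])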

Once again comes the question: what can we learn from these numbers without being dazzled? And again, this is where visualization kicks in.

Time to define the "what"! Let's first look at how the estimated coefficients are doing.

fig11, ax11 = plt.subplots(figsize=(12, 6))
fig11.patch.set_facecolor('xkcd:white')


a_NW = matrix['a0'].tolist()
a_SE = (matrix['a0']+matrix['a1']).tolist()
a_CL = (matrix['a0']+matrix['a2']).tolist()


ax11.plot(t, a_NW, label='Northwest')
ax11.plot(t, a_SE, label='Southeast')
ax11.plot(t, a_CL, label='Coastal')


ax11.legend(['Northwest', 'Southeast', 'Coastal'], fontsize=14)
ax11.grid(axis='y', ls='--')
ax11.set_ylabel('GDP per Capita (Yuan)', fontsize=14, labelpad=10)


ax11.set_title('GDP per Capita of China Provinces (1978 ~ 2010)', fontsize=20)


plt.xlim(t[0], t[-1]+1)
plt.ylim(0, 50000)
plt.xticks(np.arange(t[0], t[-1]+1, 1), rotation=90)


plt.show()

[Figure: "GDP per Capita of China Provinces (1978 ~ 2010)"]

Nothing special.

Although the mean GDP per capita of the southeast provinces appears higher than that of the northwest provinces, a1 never showed any significance throughout 1978 to 2010, so the two groups were never statistically different from each other. The coastal provinces did behave differently, but aside from the slope of the curve, I'd say this is one of those graphs you see people draw every day.

From the graph above, illustrating the estimated coefficients, we can't really find anything interesting. That probably means we have to change the angle, so let's look at the p values of the coastal dummy instead.

fig12, (ax121, ax122) = plt.subplots(ncols=1, nrows=2, figsize=(12, 8))
fig12.patch.set_facecolor('xkcd:white')


ax121.plot(t, matrix['p2'].tolist(), label='The p-values of a2')
ax121.xaxis.set_ticks(np.arange(t[0], t[-1]+1, 1))
ax121.yaxis.set_ticks(np.arange(0, 0.14, 0.01))
ax121.tick_params(axis='x', rotation=90)
ax121.grid(ls='--', alpha=0.3)
ax121.legend(fontsize=14)
ax121.set_ylabel('p-values', fontsize=14, labelpad=10)


ax122.plot(t, matrix['R2'].tolist(), label='R-squared', color='red')
ax122.xaxis.set_ticks(np.arange(t[0], t[-1]+1, 1))
ax122.tick_params(axis='x', rotation=90)
ax122.grid(ls='--', alpha=0.3)
ax122.legend(fontsize=14)
ax122.set_ylabel('R-squared', fontsize=14, labelpad=10)


plt.show()

[Figure: the p-values of a2 (top) and the R-squared (bottom), 1978 ~ 2010]

Here comes the hidden gem: in these curves we can see the traces of historical events.

a2 was not always significant, meaning that the mean GDP per capita of the coastal provinces was not always significantly higher than that of the benchmark (northwest) provinces. As the figure illustrates, before 1985 it was generally insignificant. Only in 1981, when the p value was 0.0883, was it marginally significant; 1981 was the second year since the special economic zones were established in Guangdong and Fujian Provinces. In 1985 the p value reached 0.0801, and it kept decreasing throughout the rest of the period. This coincides with the coastal development areas being founded in all the mainland coastal provinces in 1984.

It also coincides with the increase in R2 since 1985: as the coastline was opened to foreign trade, the explanatory power of the model increased. Furthermore, from 1989 the p value went through another rapid decrease until 1994, which can be explained by Hainan Province being established and designated as the largest special economic zone in 1988.

The p value of a2 reached its lowest point, 0.0043, in 1994. It then began to increase slightly but remained significant (below 0.020). This may indicate that, although the initial opening-up policies were still significantly benefiting growth in the coastal provinces, other factors began to influence growth more strongly in the other parts of China. The R2 tells the same story.
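As an optional follow-up, the policy years mentioned above (1980 for the first SEZs, 1984 for the coastal development areas, 1988 for Hainan) can be marked directly on the p-value curve. This is a standalone sketch rather than part of the original analysis; the figure names fig13 and ax13 are my own.

# Mark the policy years discussed above on the p-value curve (standalone sketch)
fig13, ax13 = plt.subplots(figsize=(12, 4))
fig13.patch.set_facecolor('xkcd:white')

ax13.plot(t, matrix['p2'].tolist(), label='The p-values of a2')

for year, event in [(1980, 'First SEZs'), (1984, 'Coastal development areas'), (1988, 'Hainan SEZ')]:
    ax13.axvline(year, ls='--', lw=0.8, color='grey')
    ax13.text(year+0.2, ax13.get_ylim()[1]*0.95, event, rotation=90, va='top', fontsize=10)

ax13.set_ylabel('p-values', fontsize=14, labelpad=10)
ax13.legend(fontsize=14)
plt.show()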

Section III

Visualization can be cool, but shouldn't be excessive

Some may argue about the graphs of the p values and R squared: these are not visualizations, just simple charts that could be made in Excel!

That is true. But that is what visualization is meant to be in the first place. After all, data visualization is a vehicle for data and information, which means its most important role is to present information clearly.

Clarity comes before style. In Section I, I said that "pretty" is a nice feature. I meant that it is a feature of a visualization, not the visualization itself. Below I'd like to share a failed example of mine, which made me realize that there should be a balance between the effects added and the information that is supposed to be delivered.

The example was to illustrate the annual growth rate of agriculture, industry, and services of China from 1978 to 2010. I wanted to show these three categories side by side but didn't want to do a conventional bar plot. I thought, why don't I let the bars move away from each other a little bit, and make them kind of transparent?

From the coding perspective, these effects can certainly be realized. But from the visualization perspective, this was a failure. Aside from the color selection, I'm afraid the result shown below is even more dazzling than plain numbers. In a word: this is no art.

ServiceG_CHN = wdi[(wdi['Country Code'] == 'CHN') & (wdi['Indicator Code'] == 'NV.SRV.TOTL.KD.ZG')][tt].fillna(0).iloc[0].tolist()

AgricultureG_CHN = wdi[(wdi['Country Code'] == 'CHN') & (wdi['Indicator Code'] == 'NV.AGR.TOTL.KD.ZG')][tt].fillna(0).iloc[0].tolist()

IndustryG_CHN = wdi[(wdi['Country Code'] == 'CHN') & (wdi['Indicator Code'] == 'NV.IND.TOTL.KD.ZG')][tt].fillna(0).iloc[0].tolist()

cat = ['Agriculture', 'Industry', 'Services']
colors3 = ['#4FD4DB', '#5EA1A2', '#C6C7E3']


fig31, ax31 = plt.subplots(figsize=(14, 6))
fig31.patch.set_facecolor('xkcd:white')


cmap = plt.get_cmap('plasma')
color31 = cmap(np.arange(20, 500, 90))


# Offset positions so the three bar series sit beside (and slightly overlap) each other
t03_1 = [x-0.15 for x in t]
t03_2 = [x+0.15 for x in t]


t03 = [t03_1, t, t03_2]
var_03 = [AgricultureG_CHN, IndustryG_CHN, ServiceG_CHN]
# label31=['Service', 'Agriculture', 'Industry']


for i in range(len(t03)):
    ax31.bar(t03[i], var_03[i], width=0.5, 
             edgecolor=color31[i], facecolor='white', 
             align='center', 
             alpha=0.8, lw=1, 
             label=cat[i]
             )


ax31.set_xticks(t)
ax31.tick_params(axis='x', rotation=90)
ax31.set_ylabel('Growth Rate', fontsize=14, labelpad=10)
ax31.grid(axis='y', ls='--', alpha=0.5)
ax31.legend()


ax31.set_title('Economic Outputs Growth of China (1978 ~ 2010)', fontsize=20)


ax31.spines['top'].set_visible(False)
ax31.spines['right'].set_visible(False)
ax31.spines['left'].set_visible(False)


plt.show()

[Figure: "Economic Outputs Growth of China (1978 ~ 2010)", offset translucent bars]

If I could choose again, I would go for a simple stacked bar plot.

fig32, ax32 = plt.subplots(figsize=(14, 6))
fig32.patch.set_facecolor('xkcd:white')


height = [AgricultureG_CHN, IndustryG_CHN, ServiceG_CHN]


b0 = [0]*len(AgricultureG_CHN)
b1 = [a if a > 0 else 0 for a in AgricultureG_CHN]
b2 = [sum(b) for b in zip(b1, IndustryG_CHN)]
bottom = [b0, b1, b2]


for h, b, c, i in zip(height, bottom, colors3, cat):
    ax32.bar(x=t, height=h, bottom=b, color=c, lw=1, edgecolor='white', alpha=1, label=i)


ax32.set_xticks(t)
ax32.tick_params(axis='x', rotation=90)
ax32.set_ylabel('Growth Rate', fontsize=14, labelpad=10)
ax32.grid(axis='y', ls='--', alpha=0.5)
ax32.legend()


ax32.set_title('Economic Outputs Growth of China (1978 ~ 2010)', fontsize=20)


ax32.spines['top'].set_visible(False)
ax32.spines['right'].set_visible(False)
ax32.spines['left'].set_visible(False)


plt.show()

[Figure: "Economic Outputs Growth of China (1978 ~ 2010)", stacked bars]

And again, what if we change the angle? Is there a better way to present the trend in the economy?

Agriculture_CHN = wdi[(wdi['Country Code'] == 'CHN') & (wdi['Indicator Code'] == 'NV.AGR.TOTL.KD')][tt].fillna(0).iloc[0].tolist()

Industry_CHN = wdi[(wdi['Country Code'] == 'CHN') & (wdi['Indicator Code'] == 'NV.IND.TOTL.KD')][tt].fillna(0).iloc[0].tolist()

Service_CHN = wdi[(wdi['Country Code'] == 'CHN') & (wdi['Indicator Code'] == 'NV.SRV.TOTL.KD')][tt].fillna(0).iloc[0].tolist()

SUM = [sum(a) for a in zip(Agriculture_CHN, Industry_CHN, Service_CHN)]
AC = [a/b for (a, b) in zip(Agriculture_CHN, SUM)]
IC = [a/b for (a, b) in zip(Industry_CHN, SUM)]
SC = [a/b for (a, b) in zip(Service_CHN, SUM)]

fig33, ax33 = plt.subplots(figsize=(12, 6))
fig33.patch.set_facecolor('xkcd:white')

ax33.stackplot(t, AC, IC, SC, 
               baseline='zero', 
               edgecolor='white', lw=1.5, 
               colors=colors3, 
               labels=cat
               )


ax33.set_xticks(t)
ax33.tick_params(axis='x', rotation=90)
ax33.set_ylabel('Economic Component', fontsize=14, labelpad=10)
ax33.legend(loc='upper right', bbox_to_anchor=(1.12, 0.95))


ax33.set_title('Economic Component of China (1978 ~ 2010)', fontsize=20)


ax33.spines['top'].set_visible(False)
ax33.spines['right'].set_visible(False)
ax33.spines['left'].set_visible(False)


plt.show()

[Figure: "Economic Component of China (1978 ~ 2010)", stacked area chart]

Conclusion

Thanks for reading this far! This article took me more than a month to put together. It is not only something to share but also a summary of what I have learned in the past months.

As discussed above, carefully choosing what and how to visualize determines whether we can excavate the information hidden in the data. I'd say this is data mining on a (much) smaller scale.

Appendix

The model analyzed in this article was inspired by the 2005 paper by Acemoglu et al., The Rise of Europe: Atlantic Trade, Institutional Change, and Economic Growth, which discusses the relationship between Atlantic trade and growth in European countries.

They argued that in Western Europe, the countries with access to Atlantic trade grew faster than those without, and that the countries which focused on Atlantic trade grew faster than those which neglected it. Most importantly, they argued that the initial political institutions of the Western European countries were essential to their incentives to invest in Atlantic trade.

There are similarities between China since 1978 and Europe since 1500. First, just as Europe is conventionally divided into Western and Eastern Europe, China is demographically and geographically divided into the Southeast and the Northwest. In Europe, only the countries of Western Europe had access to Atlantic trade; in China, all the Special Economic Zones (SEZs) and Coastal Development Areas (CDAs) are in the Southeast. Second, not all countries in Western Europe took part in Atlantic trade, and only the Atlantic traders grew rapidly during 1500~1800. Similarly, China granted only 4 cities in two provinces SEZ status in 1980, and 14 cities in 10 provinces CDA status in 1984. Because of these similarities, growth in China since 1978 can be studied with a method similar to the one used for the rise of Europe.

However, there is a significant difference between China since 1978 and Europe since 1500: in Europe, the influence of political institutions on Atlantic trade was endogenous, whereas in China it was exogenous.

