Seaborn ships with several ready-made datasets that are well suited to learning and experimenting with data visualization. They come from a wide range of domains, including statistics, psychology, neuroscience, transport, and biology, and give you a practical playground for exploring Seaborn's capabilities. Each one is loaded with sns.load_dataset(), which downloads the data from the online seaborn-data repository the first time it is used, so an internet connection is required. Below is a description of each dataset, why it is useful, and a short plotting example:
1. Anscombe
This dataset combines four small datasets, labeled I, II, III, and IV (Anscombe's quartet). Their summary statistics, such as mean, variance, and correlation, are nearly identical, yet their distributions look very different when plotted. It is a famous demonstration of why data should be visualized rather than judged only by summary statistics such as mean and variance.
Key Features: x and y values for each of the four datasets, identified by the dataset column.
import seaborn as sns
import matplotlib.pyplot as plt
# Load the Anscombe dataset
df = sns.load_dataset('anscombe')
# Scatter plot to visualize the data
sns.relplot(x="x", y="y", hue="dataset", kind="scatter", data=df)
plt.title("Anscombe Dataset Visualization")
plt.show()
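The claim that the four groups share nearly identical statistics is easy to check numerically. The sketch below uses pandas and, so that it runs without network access, rebuilds the quartet from Anscombe's published values in the same long format that sns.load_dataset('anscombe') returns (columns dataset, x, y). The helper names build_anscombe and summary_stats are hypothetical, chosen for this example.

```python
import pandas as pd

# Anscombe's published values; groups I-III share the same x values
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def build_anscombe():
    """Hypothetical helper: long-format frame with columns dataset, x, y."""
    frames = [
        pd.DataFrame({"dataset": name, "x": xs, "y": ys})
        for name, (xs, ys) in QUARTET.items()
    ]
    return pd.concat(frames, ignore_index=True)


def summary_stats(df):
    """Hypothetical helper: per-group mean, sample variance and x-y correlation."""
    grouped = df.groupby("dataset")
    stats = grouped.agg(
        x_mean=("x", "mean"), x_var=("x", "var"),
        y_mean=("y", "mean"), y_var=("y", "var"),
    )
    # Pearson correlation between x and y within each group
    stats["xy_corr"] = pd.Series({name: g["x"].corr(g["y"]) for name, g in grouped})
    return stats.round(3)


print(summary_stats(build_anscombe()))
```

With an internet connection, summary_stats(sns.load_dataset('anscombe')) should produce essentially the same table: every group has an x mean of 9, an x variance of 11, a y mean of about 7.50, and an x-y correlation of about 0.816, even though the scatter plots look nothing alike.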
2. Attention
In psychology, the attention dataset is used to analyze how the attention condition (divided or focused) and the number of possible solutions affect the scores that subjects achieve.
Key Features: Variables for the subject, attention condition, number of solutions, and score.
# Load the Attention dataset
df = sns.load_dataset('attention')
# Box plot to explore group-level differences
sns.boxplot(x="attention", y="score", data=df)
plt.title("Attention Dataset Visualization")
plt.show()
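The box plot shows the score distribution for each attention condition. To see the cell means of the full attention by solutions design, a pivot table is a natural companion. This is a minimal pandas sketch that assumes the columns attention, solutions, and score from seaborn's copy of the data; mean_score_table is a hypothetical helper, and the toy frame is made up for illustration.

```python
import pandas as pd


def mean_score_table(df):
    """Hypothetical helper: mean score for each attention x solutions cell.

    Assumes the columns 'attention', 'solutions' and 'score'.
    """
    return df.pivot_table(
        index="attention", columns="solutions", values="score", aggfunc="mean"
    )


# Tiny made-up frame with the same columns, for illustration only
toy = pd.DataFrame({
    "subject": [1, 1, 2, 2],
    "attention": ["divided", "divided", "focused", "focused"],
    "solutions": [1, 2, 1, 2],
    "score": [2.0, 4.0, 6.0, 8.0],
})
print(mean_score_table(toy))
```

On the real data you would call mean_score_table(sns.load_dataset('attention')) instead of passing the toy frame.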
3. Car Crashes
This dataset provides per-state statistics on drivers involved in fatal car crashes in the United States, covering factors such as speeding and alcohol impairment along with car insurance premiums and losses.
Key Features: Mostly numerical columns (total, speeding, alcohol, ins_premium, ins_losses, and others) plus a categorical state abbreviation (abbrev).
# Load the Car Crashes dataset
df = sns.load_dataset('car_crashes')
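# Pair plot to explore relationships between the numeric columns
g = sns.pairplot(df)
# pairplot creates a grid of axes, so title the whole figure rather than one panel
g.figure.suptitle("Car Crashes Dataset Visualization", y=1.02)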
plt.show()
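A pair plot of this many columns is dense. One way to rank which factors move together with the overall crash figure is to correlate every numeric column with total. The sketch assumes seaborn's column names (total, abbrev, and so on); factor_correlations is a hypothetical helper, and the toy values are invented so the result is easy to verify.

```python
import pandas as pd


def factor_correlations(df, target="total"):
    """Hypothetical helper: Pearson correlation of each numeric column with `target`.

    Non-numeric columns (such as the state abbreviation) are dropped first.
    """
    numeric = df.select_dtypes(include="number")
    return numeric.corr()[target].drop(target).sort_values(ascending=False)


toy = pd.DataFrame({
    "total": [10.0, 20.0, 30.0, 40.0],
    "alcohol": [3.0, 6.0, 9.0, 12.0],             # rises with total
    "ins_premium": [900.0, 800.0, 700.0, 600.0],  # falls with total
    "abbrev": ["AA", "BB", "CC", "DD"],           # made-up state labels
})
print(factor_correlations(toy))
```

Calling factor_correlations(sns.load_dataset('car_crashes')) applies the same ranking to the real data.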
4. Diamonds
The diamonds dataset describes almost 54,000 diamonds, recording each one's price, carat weight, cut, color, and clarity, along with its physical dimensions.
Key Features: It is frequently used for regression analysis and for examining how price varies with carat and quality grades.
# Load the Diamonds dataset
df = sns.load_dataset('diamonds')
# Histogram to explore price distribution
sns.histplot(data=df, x="price", hue="cut", multiple="stack")
plt.title("Diamonds Dataset Visualization")
plt.show()
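Price rises steeply and unevenly with carat, so one common regression approach is to fit a straight line to log(price) against log(carat), which amounts to fitting a power law. The sketch below does this with np.polyfit; fit_log_log is a hypothetical helper, and the demo uses synthetic data generated from a known power law rather than the real diamonds, so the recovered exponent can be checked.

```python
import numpy as np
import pandas as pd


def fit_log_log(df):
    """Hypothetical helper: least-squares fit of log(price) = a * log(carat) + b.

    Returns (a, b). A straight line in log-log space corresponds to the
    power law price = exp(b) * carat ** a.
    """
    a, b = np.polyfit(np.log(df["carat"]), np.log(df["price"]), deg=1)
    return a, b


# Synthetic data that follows price = 4000 * carat**1.7 exactly (not real diamonds)
carat = np.array([0.3, 0.5, 1.0, 1.5, 2.0])
toy = pd.DataFrame({"carat": carat, "price": 4000 * carat ** 1.7})
slope, intercept = fit_log_log(toy)
print(f"exponent ~ {slope:.2f}, scale ~ {np.exp(intercept):.0f}")
```

On the real data, fit_log_log(sns.load_dataset('diamonds')) returns the fitted exponent and intercept; the values depend on the data, so none are quoted here.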
5. Dots
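This dataset comes from a motion-perception experiment: it records neural firing rates over time while subjects judged the direction of motion in a field of randomly moving dots, with the motion coherence (how many dots move together) varied from trial to trial.
Key Features: Variables include align, choice, time, coherence, and firing_rate.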
# Load the Dots dataset
df = sns.load_dataset('dots')
# Line plot of firing rate against motion coherence
sns.lineplot(x="coherence", y="firing_rate", hue="align", data=df)
plt.title("Dots Dataset Visualization")
plt.show()
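Because each coherence level appears at many time points, sns.lineplot averages the repeated firing_rate values (its default estimator is the mean) and draws a confidence band around them. The sketch below computes the same averages directly with pandas, which helps when you want the numbers behind the line. mean_rate_by_coherence is a hypothetical helper; the toy frame, including its 'dots'/'sacc' alignment labels, is made up for illustration.

```python
import pandas as pd


def mean_rate_by_coherence(df):
    """Hypothetical helper: average firing rate per coherence level, one column per alignment.

    Averages over every remaining row (all time points and choices).
    """
    return (
        df.groupby(["coherence", "align"])["firing_rate"]
        .mean()
        .unstack("align")
    )


# Made-up rows with the same columns as the real dataset
toy = pd.DataFrame({
    "align": ["dots", "dots", "sacc", "dots"],
    "coherence": [0.0, 0.0, 0.0, 12.8],
    "firing_rate": [30.0, 40.0, 50.0, 60.0],
})
print(mean_rate_by_coherence(toy))
```

Combinations that never occur, such as coherence 12.8 with 'sacc' in the toy frame, show up as NaN in the resulting table.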
6. Exercise
The exercise dataset records the pulse rates of individuals measured at several time points during different kinds of exercise (rest, walking, and running), under two different diets.
Key Features: Exercise type (kind), measurement time (time), diet, and pulse, along with a subject identifier (id).
# Load the Exercise dataset
df = sns.load_dataset('exercise')
# Bar plot to compare pulse rates
sns.barplot(x="diet", y="pulse", hue="kind", data=df)
plt.title("Exercise Dataset Visualization")
plt.show()
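The bar plot pools all measurement times together. Since pulse was recorded at several time points, a pivot table shows how the average changes over time for each kind of exercise. This is a minimal sketch assuming the kind, time, and pulse columns; pulse_by_kind_and_time is a hypothetical helper and the toy data are invented.

```python
import pandas as pd


def pulse_by_kind_and_time(df):
    """Hypothetical helper: mean pulse for each exercise kind (rows) at each time point (columns)."""
    # observed=True keeps only category combinations that actually occur
    return df.pivot_table(index="kind", columns="time", values="pulse",
                          aggfunc="mean", observed=True)


# Invented measurements, for illustration only
toy = pd.DataFrame({
    "kind": ["rest", "rest", "running", "running"],
    "time": ["1 min", "30 min", "1 min", "30 min"],
    "pulse": [85, 87, 95, 130],
})
print(pulse_by_kind_and_time(toy))
```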
7. Flights
The flights dataset contains the monthly number of airline passengers from 1949 to 1960.
Key Features: A monthly time series with year, month, and passengers columns.
# Load the Flights dataset
df = sns.load_dataset('flights')
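# Reshape into a month x year grid (pandas 2.0+ requires keyword arguments for pivot)
df_pivot = df.pivot(index="month", columns="year", values="passengers")
# Heatmap to show passenger trends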
sns.heatmap(df_pivot, annot=True, fmt="d", cmap="Blues")
plt.title("Flights Dataset Visualization")
plt.show()
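The heatmap treats month and year as a grid. For time-series work, it is often easier to turn the data into a Series indexed by real dates, after which pandas tools such as rolling windows apply directly. The sketch assumes the month column holds three-letter English month names such as 'Jan', as in seaborn's copy; to_monthly_series and rolling_trend are hypothetical helpers, and the toy passenger counts are made up.

```python
import pandas as pd


def to_monthly_series(df):
    """Hypothetical helper: passenger counts as a Series indexed by month-start dates.

    Assumes 'month' holds three-letter English month names such as 'Jan'.
    """
    labels = df["year"].astype(str) + "-" + df["month"].astype(str)
    dates = pd.DatetimeIndex(pd.to_datetime(labels, format="%Y-%b"))
    return pd.Series(df["passengers"].to_numpy(), index=dates).sort_index()


def rolling_trend(series, window=12):
    """Hypothetical helper: rolling mean over `window` months, smoothing the yearly cycle."""
    return series.rolling(window).mean()


# Made-up counts for 1949-1950, not the real passenger numbers
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
toy = pd.DataFrame({
    "year": [1949] * 12 + [1950] * 12,
    "month": months * 2,
    "passengers": list(range(100, 124)),
})
series = to_monthly_series(toy)
trend = rolling_trend(series)
print(trend.dropna())
```

The first eleven values of the rolling mean are NaN because a full 12-month window is not yet available.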
8. Penguins
The penguins dataset contains body measurements of three penguin species (Adelie, Chinstrap, and Gentoo) collected on islands in the Palmer Archipelago, Antarctica.
Key Features: Species, island, bill length and depth, flipper length, body mass, and sex; a few rows contain missing values.
# Load the Penguins dataset
df = sns.load_dataset('penguins')
# Scatter plot for flipper length vs bill depth
sns.scatterplot(data=df, x="flipper_length_mm", y="bill_depth_mm", hue="species")
plt.title("Penguins Dataset Visualization")
plt.show()
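Seaborn's plotting functions do not draw points that have missing values, but when you summarize the data yourself you have to handle them explicitly. The sketch below drops incomplete rows and reports a per-species count and mean for each measurement. species_summary is a hypothetical helper and the toy measurements are invented.

```python
import pandas as pd

MEASURES = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]


def species_summary(df):
    """Hypothetical helper: per-species count of complete rows and mean measurements."""
    # Drop rows missing any of the measurement columns
    complete = df.dropna(subset=MEASURES)
    summary = complete.groupby("species")[MEASURES].mean()
    summary.insert(0, "n", complete.groupby("species").size())
    return summary.round(1)


# Invented measurements; one Gentoo row has a missing bill length
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "bill_length_mm": [38.0, 40.0, 47.0, None],
    "bill_depth_mm": [18.0, 19.0, 15.0, 14.0],
    "flipper_length_mm": [190.0, 194.0, 215.0, 220.0],
    "body_mass_g": [3700.0, 3900.0, 5000.0, 5200.0],
})
print(species_summary(toy))
```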
9. Titanic
The Titanic dataset contains information about the passengers aboard the Titanic, including whether each passenger survived.
Key Features: Age, sex, passenger class, and survival status.
# Load the Titanic dataset
df = sns.load_dataset('titanic')
# Bar plot to visualize survival rates
sns.barplot(x="class", y="survived", hue="sex", data=df)
plt.title("Titanic Dataset Visualization")
plt.show()
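Because survived is coded as 0 or 1, its mean within a group is the survival rate, which is what sns.barplot draws by default (the mean estimator, plus an error bar). The sketch below computes the same numbers as a table. survival_table is a hypothetical helper, and the toy rows are invented rather than taken from the real passenger list.

```python
import pandas as pd


def survival_table(df):
    """Hypothetical helper: survival rate (mean of the 0/1 'survived' column) by class and sex."""
    return df.pivot_table(index="class", columns="sex", values="survived",
                          aggfunc="mean", observed=True)


# Invented rows, for illustration only
toy = pd.DataFrame({
    "class": ["First", "First", "First", "Third", "Third"],
    "sex": ["female", "female", "male", "male", "male"],
    "survived": [1, 1, 0, 0, 1],
})
print(survival_table(toy))
```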