Water Quality Prediction with Python

nanyue 2024-12-08 17:05:38 Technical Articles

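Introduction

Water quality prediction is an important task in environmental monitoring: if we can predict water quality, we can act early to keep water supplies safe. This article shows how to use Python and machine learning to build a water quality prediction model, framed here as classifying whether a water sample is potable (safe to drink).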

Environment Setup

First, install the required Python libraries:

pip install pandas numpy scikit-learn matplotlib seaborn

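Data Preparation

We'll use a public water quality dataset, the water_potability.csv file loaded below, which you can download from Kaggle.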

import pandas as pd

# Load the data
data = pd.read_csv('water_potability.csv')
# Show the first few rows
print(data.head())
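Before cleaning the data, it helps to check how many values are missing per column and how the two Potability classes are distributed. The sketch below runs those checks on a tiny hypothetical DataFrame (made-up values; the column names ph and Sulfate are assumed to mirror the Kaggle file). On the real data you would call the same methods on `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for water_potability.csv; all values are made up
data = pd.DataFrame({
    "ph": [7.0, np.nan, 6.5, 8.1, np.nan],
    "Sulfate": [330.0, 310.0, np.nan, 355.0, 300.0],
    "Potability": [0, 1, 0, 1, 0],
})

# Number of missing values in each column
missing = data.isnull().sum()
print(missing)

# Share of each class in the target column
class_share = data["Potability"].value_counts(normalize=True)
print(class_share)

# How many rows would survive dropna()?
rows_kept = len(data.dropna())
print(f"rows kept after dropna: {rows_kept} of {len(data)}")  # rows kept after dropna: 2 of 5
```

If the classes turn out to be clearly imbalanced, accuracy alone can be misleading, which is worth keeping in mind at the evaluation step.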

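Data Preprocessing

Preprocessing is an important step in any machine learning workflow. Here we handle missing values and standardize the features. Note that dropna() discards every row that has any missing value, which can remove a sizeable share of the data; imputing the missing values is a common alternative. Standardization is not strictly required for a random forest, because tree splits are unaffected by feature scale, but it matters for scale-sensitive models such as logistic regression.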

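# Handle missing values: drop every row that contains at least one NaN
data = data.dropna()

# Standardize the features (every column except the target 'Potability')
# Note: fitting the scaler on the full dataset before the train/test split lets
# test-set statistics leak into training; for a strict evaluation, fit it on the training split only.
from sklearn.preprocessing import StandardScaler

feature_cols = data.drop('Potability', axis=1).columns
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data[feature_cols])

# Convert back to a DataFrame and re-attach the target column
# (.values avoids index misalignment, since dropna() left gaps in the index)
data_scaled = pd.DataFrame(data_scaled, columns=feature_cols)
data_scaled['Potability'] = data['Potability'].values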
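The preprocessing above fits the scaler on the full dataset before the train/test split, so statistics from the test rows leak into the transformation used for training. A minimal leak-free sketch splits first and fits all preprocessing on the training rows only. It also swaps dropna() for median imputation with scikit-learn's SimpleImputer, so rows with gaps are kept; that choice is an assumption, not part of the original tutorial. The data is synthetic and the column names f1 to f3 are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data: 100 samples, 3 features, roughly 10% of values missing
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])
X = X.mask(rng.random(X.shape) < 0.1)
y = pd.Series(rng.integers(0, 2, size=100))

# 1. Split first, so the test rows cannot influence any fitted statistic
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Learn the medians and the mean/std from the training rows only
imputer = SimpleImputer(strategy="median")
scaler = StandardScaler()
X_train_prep = scaler.fit_transform(imputer.fit_transform(X_train))

# 3. Apply the training statistics unchanged to the test rows
X_test_prep = scaler.transform(imputer.transform(X_test))

print(X_train_prep.mean(axis=0).round(6))  # approximately 0 for each column
```

The test rows are only ever passed to `transform`, never to `fit` or `fit_transform`; that is the whole difference from the version above.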

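Feature Selection

Choosing the right features can have a large effect on model performance. For simplicity, we use all available features here.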

features = data_scaled.drop('Potability', axis=1)
target = data_scaled['Potability']

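Train/Test Split

Split the data into a training set (80%) and a test set (20%); fixing random_state makes the split reproducible.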

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)
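If the two Potability classes are not evenly represented, a purely random split can leave the test set with a somewhat different class ratio than the training set. Passing `stratify` to `train_test_split` keeps the ratio the same in both splits. The sketch below uses hypothetical labels with a 60/40 ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels: 60 samples of class 0, 40 of class 1
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 60 + [1] * 40)

# stratify=y preserves the class proportions in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("class-1 share in train:", y_train.mean())  # 0.4
print("class-1 share in test:", y_test.mean())    # 0.4
```

For the real data the change is a single argument: `stratify=target` in the call above.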

Model Training

We'll use a random forest classifier for prediction.

from sklearn.ensemble import RandomForestClassifier

# Create the model: 100 trees, fixed seed for reproducible results
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model on the training set
model.fit(X_train, y_train)
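Bundling the preprocessing and the classifier into a scikit-learn pipeline lets cross-validation re-fit the imputer and scaler on each training fold, so the held-out fold never influences them, and averaging over several folds gives a steadier estimate than a single train/test split. The sketch uses synthetic data with randomly removed values, not the Kaggle file; the median imputation step is an assumption, not part of the original tutorial.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data (not the Kaggle file): 200 samples, 4 features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # label depends on the first two features
X[rng.random(X.shape) < 0.1] = np.nan     # remove roughly 10% of the entries

# Imputer, then scaler, then classifier; cross_val_score re-fits the whole chain per training fold
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)

# 5-fold cross-validated accuracy
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("accuracy per fold:", scores.round(3))
print(f"mean accuracy: {scores.mean():.3f}")
```

The same pipeline object can also be fitted on X_train and used with predict on X_test, replacing the separate scaler and model steps.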

Model Evaluation

Evaluate the model's performance on the test set.

from sklearn.metrics import accuracy_score, classification_report

# Predict labels for the test set
y_pred = model.predict(X_test)

# Evaluate: overall accuracy plus per-class precision, recall, and F1
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f'Accuracy: {accuracy:.3f}')
print('Classification Report:')
print(report)
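Accuracy alone does not show which kind of mistake the model makes. A confusion matrix separates them: assuming 1 means potable in the Potability column, a false positive is a non-potable sample classified as safe, which is arguably the costlier error. The labels below are hypothetical.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predictions (illustrative only)
y_test = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]

# Rows are true classes, columns are predicted classes (label order 0, 1)
cm = confusion_matrix(y_test, y_pred)
print(cm)

# For binary labels, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")  # TN=3 FP=1 FN=1 TP=3
```

On the real model, pass the tutorial's y_test and y_pred instead of the hypothetical lists.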

Visualizing the Results

Finally, we can visualize the feature importances learned by the model.

import matplotlib.pyplot as plt
import seaborn as sns

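# Impurity-based feature importances from the trained random forest
feature_importances = model.feature_importances_
feature_names = features.columns  # new name, so the `features` DataFrame is not overwritten

# Plot one horizontal bar per feature
plt.figure(figsize=(10, 6))
sns.barplot(x=feature_importances, y=feature_names)
plt.xlabel('Importance')
plt.ylabel('Features')
plt.title('Feature Importance')
plt.show()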
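`model.feature_importances_` comes back in column order, so the bar chart is easier to read once the values are sorted. A minimal sketch pairs each importance with its column name in a pandas Series and sorts it. The data is synthetic, with one hypothetical informative column (`signal`) and two pure-noise columns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic data: only "signal" determines the label
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "signal": rng.normal(size=300),
    "noise_a": rng.normal(size=300),
    "noise_b": rng.normal(size=300),
})
y = (X["signal"] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Pair each importance with its column name and sort, most important first
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

The sorted Series can be passed straight to the plot, e.g. `sns.barplot(x=importances.values, y=importances.index)`. Impurity-based importances can favour features with many distinct values; scikit-learn's `sklearn.inspection.permutation_importance` is a common cross-check.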

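Summary

With these steps, we have built a simple water quality prediction model. To improve its performance, you can experiment with other models and feature sets. I hope this tutorial helps!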
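When trying different models, a fair comparison scores every candidate on the same cross-validation folds. The sketch compares a scaled logistic regression against two random forest variants, one using `class_weight="balanced"` to up-weight the rarer class. The data is synthetic with a deliberately non-linear label rule, so the ranking it prints says nothing about the real water dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data with a non-linear (XOR-like) label rule
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "random_forest_balanced": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=42
    ),
}

# Identical shuffled, stratified folds for every model so the scores are comparable
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {name: cross_val_score(est, X, y, cv=cv).mean() for name, est in candidates.items()}

# Print the candidates from best to worst mean accuracy
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```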
