open terminal
conda create <name>
to activate this environment,use
conda activate <name>
to deactivate
conda deactivate
to put it into jupyter kernel
sudo python -m ipykernel install --name <name>
pandas合并两个csv文件
pandas中,string以object的形式出现
merge,join,concat和append,其中merge和join适用于横向的合并(axis = 1),concat和append则更适用于纵向的合并(axis = 0)。
sklearn
in high-dimensional spaces, data can more easily be separated linearly and the simplicity of classifiers such as naive Bayes and linear SVMs might lead to better generalization than is achieved by other classifiers.
预处理Encoding categorical features
sklearn.preprocessing.LabelEncoder
LabelEncoder的输入是一维,比如1d ndarray
OrdinalEncoder的输入是二维,比如 DataFrame
特征缩放
CountVectorizer中包括文本预处理,标记化和停用词过滤功能
###