MindMap Gallery KNN algorithm
If most of the K most similar samples of a sample in the feature space belong to a certain category, then the sample also belongs to this category. Let's take a look at the selection of K values and kd tree knowledge.
Edited at 2023-03-30 20:41:06Avatar 3 centers on the Sully family, showcasing the internal rift caused by the sacrifice of their eldest son, and their alliance with other tribes on Pandora against the external conflict of the Ashbringers, who adhere to the philosophy of fire and are allied with humans. It explores the grand themes of family, faith, and survival.
This article discusses the Easter eggs and homages in Zootopia 2 that you may have discovered. The main content includes: character and archetype Easter eggs, cinematic universe crossover Easter eggs, animal ecology and behavior references, symbol and metaphor Easter eggs, social satire and brand allusions, and emotional storylines and sequel foreshadowing.
[Zootopia Character Relationship Chart] The idealistic rabbit police officer Judy and the cynical fox conman Nick form a charmingly contrasting duo, rising from street hustlers to become Zootopia police officers!
Avatar 3 centers on the Sully family, showcasing the internal rift caused by the sacrifice of their eldest son, and their alliance with other tribes on Pandora against the external conflict of the Ashbringers, who adhere to the philosophy of fire and are allied with humans. It explores the grand themes of family, faith, and survival.
This article discusses the Easter eggs and homages in Zootopia 2 that you may have discovered. The main content includes: character and archetype Easter eggs, cinematic universe crossover Easter eggs, animal ecology and behavior references, symbol and metaphor Easter eggs, social satire and brand allusions, and emotional storylines and sequel foreshadowing.
[Zootopia Character Relationship Chart] The idealistic rabbit police officer Judy and the cynical fox conman Nick form a charmingly contrasting duo, rising from street hustlers to become Zootopia police officers!
KNN algorithm
definition
If most of the K most similar samples of a sample in the feature space belong to a certain category, then the sample also belongs to this category.
API
KNeighborsClassifier(n_neighbors=5,algorithm="auto")
n_neighbors represents the selection of K value
algorithm indicates whether the nearest neighbor method, brutal search method, or auto is used
distance
Euclidean distance
manhattan distance
Chebyshev distance
Minkowski distance
Normalized Euclidean distance
cosine distance
Hamming distance
Jaccard distance
Mahalanobis distance
Selection of K value
The k value is too small: easily affected by outliers - overfitting
The k value is too large: it suffers from the problem of sample balancing-underfitting
Approximation error (focusing on the training set) - overfitting - performs well on the training set, but performs poorly on the test set
Estimation error (focusing on the test set) - smaller, indicating better prediction of unknown data
kd tree
Construction of kd tree
nearest neighbor search
Search within this domain
To search across other domains
Case: Prediction of flower species--
Obtaining data sklearn.datasets
Acquisition of small data: sklearn.datasets.load_* (local)
Acquisition of big data: sklearn.datasets.fetch_* (network)
Introduction to data set return values: The return value is batch (dictionary)
data: characteristic data array
target: tag (target) array
DESCR: data description
feature_names: feature names
target_names: tag names
Data visualization: seaborn(x=,y=,data=,hue="target value",fit_reg=False does not perform data fitting)
Data set division: sklearn.model_selection.train_test_split (x=feature value, y=target value, test_size=0.2 test set size, random_size=22 random number seed)
x_train,x_test,y_train,y_test are return values
feature engineering
Feature preprocessing
Normalized
Definition: Map original data to a certain range (0, 1)
Api
sklearn.preprocessing.MinMaxScaler(feature_range=(0,1))
fit_transformer(X) converts to an array of the same shape
Summary: The maximum and minimum values are very susceptible to outliers, so this method is less robust and is only suitable for traditional accurate small data scenarios.
standardization
Definition: Transform the original data into a range with a mean of 0 and a standard deviation of 1
API
StandardScaler()
fit_trasformer()
Summary: Outliers have less impact on the results and are suitable for modern noisy data.
Summarize
Simple and effective, low retraining cost, suitable for cross-class samples, and suitable for automatic classification of large samples
Lazy learning, category scoring is not standardized, is not good at unbalanced samples, and requires a large amount of calculation
machine learning
Get dataset
Basic data processing
Data splitting
feature engineering
Feature preprocessing
Normalized
standardization
machine learning
Model evaluation
Cross-validation
Definition: Divide the training set into a validation set and a test set, leaving the test set unchanged. Divide the training set into several equal parts, which is called several-fold cross-validation.
Features: Cannot improve test accuracy, just to provide more accurate prediction results
grid search
For some hyperparameters of the model, it is impossible to determine how big to choose. When the model training effect is the best, grid search is generally used to test the performance of different hyperparameters and return the optimal hyperparameter selection.
API
GridSearchCV(estimator=cross-validation model name, param_grid=number of hyperparameters, cv=several-fold cross-validation)
return value
best_estimator: best model
best_params_: Parameter combination for best results
best_score_: best prediction result
cv_results_: prediction results of the overall model