luoq08@gmail.com OR hzluoqiang@corp.netease.com
Not included
Why is Python so popular in machine learning?
pros:
cons:
parallelism: use multiprocessing
dynamic typing (tests needed)
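A minimal sketch of the multiprocessing workaround, assuming the work is CPU-bound and splits into independent chunks. The function names and the sum-of-squares workload are hypothetical stand-ins, not part of the original material.

```python
from multiprocessing import Pool


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """CPU-bound work for one chunk (a stand-in for real feature code)."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, processes=4, chunk_size=1000):
    """Fan chunks out to worker processes and combine the partial sums.

    Call this from under an `if __name__ == '__main__':` guard, because
    some platforms start workers by re-importing the main module.
    """
    with Pool(processes=processes) as pool:
        return sum(pool.map(sum_of_squares, chunked(items, chunk_size)))
```

Each worker is a separate process, so arguments and results are pickled between processes. Chunks that are too small can spend more time on that overhead than on the work itself.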
Install anaconda
conda install xxx
pip install yyy
start jupyter
jupyter notebook
access the notebook server from a browser; an SSH tunnel may be useful for a remote machine
import libraries
import numpy as np
import scipy.sparse as sp
import pandas as pd
from sklearn.linear_model import LogisticRegression
import xgboost as xgb
import gensim
import nltk
from sklearn.feature_extraction import Counte
import matplotlib.pyplot as plt
import seaborn
seaborn.set()
%matplotlib inline
The SciPy Stack: Scientific Computing Tools for Python
caution:
more:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(penalty='l1', solver='liblinear')
model.fit(X_train, y_train)
model.predict(X_test)
model.predict_proba(X_test)
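A self-contained sketch of the same fit/predict workflow, assuming synthetic data from `make_classification` in place of a real `X_train`/`X_test`. The data set sizes and `random_state` values are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for a real data set.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# L1 regularisation is selected with `penalty`; the liblinear solver supports it.
model = LogisticRegression(penalty='l1', solver='liblinear')
model.fit(X_train, y_train)

labels = model.predict(X_test)       # hard class labels, shape (n_test,)
proba = model.predict_proba(X_test)  # class probabilities, shape (n_test, 2)
print(labels[:5])
print(proba[:5])
```

`predict` returns one label per row, while `predict_proba` returns one column per class with each row summing to 1.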
| ``model.predict`` | ``model.transform`` |
|---|---|
| Classification | Preprocessing |
| Regression | Dimensionality Reduction |
| Clustering | Feature Extraction |
| | Feature Selection |
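The split in the table can be sketched as follows: transformers return a new feature matrix, predictors return one output per sample. The particular estimators (`StandardScaler`, `PCA`, `LogisticRegression`) and the synthetic data are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=100, n_features=8, random_state=0)

# Transformers: fit learns parameters, transform returns a new feature matrix.
X_scaled = StandardScaler().fit_transform(X)             # preprocessing
X_reduced = PCA(n_components=3).fit_transform(X_scaled)  # dimensionality reduction

# Predictors: fit learns a model, predict returns one output per sample.
clf = LogisticRegression().fit(X_reduced, y)
y_pred = clf.predict(X_reduced)

print(X_reduced.shape, y_pred.shape)
```

Because every step shares the `fit` interface, steps like these can also be chained with `sklearn.pipeline.Pipeline`.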
function work(){
  pv --rate -i 5 \
    | csvcut -c 'images_array_1,images_array_2' | csvjson --stream \
    | parallel --gnu -k --pipe -N 20 --jobs 16 python -m feature.image_feature
}
# Generate image feature for training data set and testing data set
cat data/data_files/image_itemPairs_train.csv | work > data/data_files/image_feature_train.csv
cat data/data_files/image_itemPairs_test.csv | work > data/data_files/image_feature_test.csv
feature/image_feature.py:
if __name__ == '__main__':
    import sys
    for line in sys.stdin:
        line = line.rstrip()
        # do something with line
        ...
        print(result)
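One way the skeleton above could be filled in, as a sketch only. It assumes each stdin line is a JSON object with the two `images_array_*` fields holding comma-separated ids, and the "feature" (a pair of image counts) is a hypothetical placeholder for the real image feature, which the original does not show.

```python
import io
import json
import sys


def split_ids(value):
    """Turn a comma-separated id string (or None) into a list of ids."""
    if not value:
        return []
    return [part.strip() for part in str(value).split(',') if part.strip()]


def process_line(line):
    """Hypothetical feature: the number of images on each side of a pair."""
    record = json.loads(line)
    ids_1 = split_ids(record.get('images_array_1'))
    ids_2 = split_ids(record.get('images_array_2'))
    return '{},{}'.format(len(ids_1), len(ids_2))


def run(stream_in, stream_out):
    """Emit one output line per non-empty input line, preserving order."""
    for line in stream_in:
        line = line.rstrip()
        if line:
            stream_out.write(process_line(line) + '\n')


def main():
    """Entry point when the module is used as a filter in a shell pipeline."""
    run(sys.stdin, sys.stdout)


# Demo with in-memory streams instead of real stdin/stdout.
demo_in = io.StringIO(
    '{"images_array_1": "1, 2, 3", "images_array_2": "4"}\n'
    '{"images_array_1": null, "images_array_2": "5, 6"}\n')
demo_out = io.StringIO()
run(demo_in, demo_out)
print(demo_out.getvalue(), end='')
```

Writing exactly one output line per input line matters here: it lets `parallel -k` keep the output rows aligned with the input rows.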
Data-Driven Documents
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
web crawling
html parsing: pyquery, lxml, BeautifulSoup
chrome: SelectorGadget, Network panel
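A minimal parsing sketch with BeautifulSoup, assuming the page has already been downloaded. The HTML snippet, class names, and field names are hypothetical, and the CSS selectors are the kind of thing SelectorGadget helps discover.

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for a downloaded page (hypothetical markup).
html = """
<html><body>
  <div class="item"><a href="/ad/1">Red bike</a><span class="price">120</span></div>
  <div class="item"><a href="/ad/2">Blue chair</a><span class="price">35</span></div>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# Select each listing with a CSS selector, then pull out its fields.
items = []
for node in soup.select('div.item'):
    link = node.select_one('a')
    price = node.select_one('span.price')
    items.append({
        'title': link.get_text(strip=True),
        'url': link['href'],
        'price': int(price.get_text(strip=True)),
    })

print(items)
```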