Convolutional hypercolumns in Python

If you have been following some of the machine learning news, you certainly saw the work done by Ryan Dahl on Automatic Colorization (Hacker News comments, reddit comments). This amazing work uses pixel hypercolumn information extracted from the VGG-16 network in order to colorize images. Samim also used the network to process Black & White video frames and produced the amazing video below:

https://www.youtube.com/watch?v=_MJU8VK2PI4

Colorizing black & white movies with neural networks (video by Samim, network by Ryan)

But how do these hypercolumns work? How do you extract them to use on this kind of pixel classification problem? The main idea of this post is to use the VGG-16 pre-trained network together with Keras and Scikit-Learn in order to extract the pixel hypercolumns and take a superficial look at the information present in them. I wrote this because I haven't found anything in Python that does this, and it may be really useful for others working on pixel classification, segmentation, etc.

Hypercolumns

Many algorithms using features from CNNs (Convolutional Neural Networks) usually use the last FC (fully-connected) layer features in order to extract information about a certain input. However, the information in the last FC layer may be too coarse spatially to allow precise localization (due to sequences of maxpooling, etc.); on the other hand, the first layers may be spatially precise but will lack semantic information. To get the best of both worlds, the authors of the hypercolumn paper define the hypercolumn of a pixel as the vector of activations of all CNN units “above” that pixel.

Hypercolumn Extraction (from Hypercolumns for Object Segmentation and Fine-grained Localization)

The first step in the extraction of the hypercolumns is to feed the image into the CNN (Convolutional Neural Network) and extract the feature map activations for each location of the image. The tricky part is that when the feature maps are smaller than the input image, for instance after a pooling operation, the authors of the paper then do a bilinear upsampling of the feature map in order to keep the feature maps at the same size as the input. There are also issues with the FC (fully-connected) layers, because you can’t isolate units semantically tied only to one pixel of the image, so the FC activations are seen as 1×1 feature maps, which means that all locations share the same information regarding the FC part of the hypercolumn. All these activations are then concatenated to create the hypercolumn. For instance, if we take the VGG-16 architecture and use only the first 2 convolutional layers after the max pooling operations, we will have a hypercolumn with the size of:

64 filters (first conv layer before pooling)

+

128 filters (second conv layer before pooling) = 192 features

This means that each pixel of the image will have a 192-dimension hypercolumn vector. This hypercolumn is really interesting because it will contain information about the first layers (where we have a lot of spatial information but little semantics) and also information about the final layers (with little spatial information and lots of semantics). Thus this hypercolumn will certainly help in a lot of pixel classification tasks such as the automatic colorization mentioned earlier, because each location's hypercolumn carries the information about what this pixel semantically and spatially represents. This is also very helpful on segmentation tasks (you can see more about that in the original paper introducing the hypercolumn concept).
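To make the idea concrete, here is a minimal, self-contained sketch of that construction, with randomly generated arrays standing in for the real VGG-16 activations (the sizes mimic the two conv layers above), just to show the upsampling and stacking:

import numpy as np
import scipy as sp
import scipy.misc

input_size = (224, 224)
# toy stand-ins for the real activations: 64 maps at the input size and
# 128 maps at half the resolution (as if they came after one pooling step)
conv1_maps = np.random.rand(64, 224, 224)
conv2_maps = np.random.rand(128, 112, 112)

hypercolumns = []
for fmap in list(conv1_maps) + list(conv2_maps):
    # bilinear upsampling keeps every map aligned with the input pixels
    upscaled = sp.misc.imresize(fmap, size=input_size, mode="F", interp="bilinear")
    hypercolumns.append(upscaled)

hypercolumns = np.asarray(hypercolumns)  # shape (192, 224, 224)
print(hypercolumns[:, 100, 100].shape)   # the 192-dim hypercolumn of pixel (100, 100)

This is essentially what the extract_hypercolumn() function later in this post does, except that it takes the activations from the real network instead of random arrays.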

Everything sounds cool, but how do we extract the hypercolumns in practice?

VGG-16

Before being able to extract the hypercolumns, we’ll set up the VGG-16 pre-trained network, because, you know, the price of a good GPU (let alone several of them) here in Brazil is very expensive and I don’t want to sell my kidney to buy a GPU.

VGG16 network architecture (by Zhicheng Yan et al.)

To set up the pre-trained VGG-16 network on Keras, you’ll need to download the weights file from here (the vgg16_weights.h5 file, approximately 500MB) and then set up the architecture and load the downloaded weights using Keras (more information about the weights file and architecture here):

from matplotlib import pyplot as plt

import theano
import cv2
import numpy as np
import scipy as sp

from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.convolutional import ZeroPadding2D
from keras.optimizers import SGD

from sklearn.manifold import TSNE
from sklearn import manifold
from sklearn import cluster
from sklearn.preprocessing import StandardScaler


def VGG_16(weights_path=None):
    model = Sequential()

    # Block 1: two 64-filter conv layers
    model.add(ZeroPadding2D((1,1), input_shape=(3,224,224)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), stride=(2,2)))

    # Block 2: two 128-filter conv layers
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), stride=(2,2)))

    # Block 3: three 256-filter conv layers
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), stride=(2,2)))

    # Block 4: three 512-filter conv layers
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), stride=(2,2)))

    # Block 5: three 512-filter conv layers
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, 3, 3, activation='relu'))
    model.add(MaxPooling2D((2,2), stride=(2,2)))

    # Fully-connected classifier
    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)

    return model

As you can see, this is very simple code to declare the VGG16 architecture and load the pre-trained weights (together with the Python imports for the required packages). After that, we’ll compile the Keras model:

model = VGG_16('vgg16_weights.h5')
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='categorical_crossentropy')

Now let’s test the network using an image:

im_original = cv2.resize(cv2.imread('madruga.jpg'), (224, 224))
im = im_original.transpose((2,0,1))
im = np.expand_dims(im, axis=0)

im_converted = cv2.cvtColor(im_original, cv2.COLOR_BGR2RGB)
plt.imshow(im_converted)

Image used

As we can see, we loaded the image and fixed the axes; now we can feed the image into VGG-16 to get the predictions:

out = model.predict(im)
plt.plot(out.ravel())

Predictions

As you can see, these are the final activations of the softmax layer; the class with the highest activation is the “jersey, T-shirt, tee shirt” category.
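If you want to check which class indexes those activations correspond to, a minimal sketch is below (mapping the indexes to human-readable ImageNet labels requires the class index / synset file, which isn’t included here):

# Sketch: indexes of the 5 highest softmax activations (ImageNet class indexes)
top5 = out[0].argsort()[-5:][::-1]
print(top5)
print(out[0][top5])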

Extracting arbitrary feature maps

Now, to extract the feature map activations, we’ll need to be able to extract feature maps from arbitrary convolutional layers of the network. We can do that by compiling a Theano function using the get_output() method of Keras, as in the example below:

get_feature = theano.function([model.layers[0].input],
                              model.layers[3].get_output(train=False),
                              allow_input_downcast=False)
feat = get_feature(im)
plt.imshow(feat[0][2])

Feature map

In the example above, I’m compiling a Theano function to get the feature maps of layer 3 (a convolutional layer) and then showing only the third feature map. Here we can see the intensity of the activations. If we get the feature maps of the activations from the final layers, we can see that the extracted features are more abstract, like eyes, etc. Look at this example below, from the convolutional layer at index 15:

get_feature = theano.function([model.layers[0].input],
                              model.layers[15].get_output(train=False),
                              allow_input_downcast=False)
feat = get_feature(im)
plt.imshow(feat[0][13])

More semantic feature maps.

As you can see, this second feature map is extracting more abstract features. And you can also note that the image seems more stretched when compared with the feature map we saw earlier; that is because the first feature map has 224×224 size while this one has 56×56, due to the downscaling operations of the layers before the convolutional layer, and that is why we lose a lot of spatial information.
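If you want to verify those sizes yourself, a quick sanity-check sketch (using the same get_output() approach as above, and assuming the layer indexes of the VGG_16 model defined earlier) is:

# Sketch: the spatial size of the feature maps shrinks as we go deeper.
get_feature = theano.function([model.layers[0].input],
                              model.layers[3].get_output(train=False),
                              allow_input_downcast=False)
print(get_feature(im).shape)   # expected: (1, 64, 224, 224)

get_feature = theano.function([model.layers[0].input],
                              model.layers[15].get_output(train=False),
                              allow_input_downcast=False)
print(get_feature(im).shape)   # expected: (1, 256, 56, 56)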

Extracting hypercolumns

Now finally let’s extract the hypercolumns for an arbitrary set of layers. To do that, we will define a function to extract these hypercolumns:

def extract_hypercolumn(model, layer_indexes, instance):
    layers = [model.layers[li].get_output(train=False) for li in layer_indexes]
    get_feature = theano.function([model.layers[0].input], layers,
                                  allow_input_downcast=False)
    feature_maps = get_feature(instance)
    hypercolumns = []
    for convmap in feature_maps:
        for fmap in convmap[0]:
            # bilinear upsampling of each feature map to the input size
            upscaled = sp.misc.imresize(fmap, size=(224, 224),
                                        mode="F", interp='bilinear')
            hypercolumns.append(upscaled)
    return np.asarray(hypercolumns)

As we can see, this function expects three parameters: the model itself, a list of layer indexes that will be used to extract the hypercolumn features, and an image instance that will be used to extract the hypercolumns. Let’s now test the hypercolumn extraction for the first 2 convolutional layers:

layers_extract = [3, 8]
hc = extract_hypercolumn(model, layers_extract, im)

And that’s it, we extracted the hypercolumn vectors for each pixel. The shape of this “hc” variable is (192L, 224L, 224L), which means that we have a 192-dimensional hypercolumn for each one of the 224×224 pixels (a total of 50176 pixels, each with 192 hypercolumn features).
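If you want to look at the hypercolumn of a single pixel, a minimal sketch (using the “hc” variable from above) is:

# Sketch: the 192-dimensional hypercolumn vector of the pixel at row 112, col 112
y, x = 112, 112
pixel_hypercolumn = hc[:, y, x]
print(pixel_hypercolumn.shape)  # (192,)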

Let’s plot the average of the hypercolumns activations for each pixel:

ave = np.average(hc.transpose(1, 2, 0), axis=2)
plt.imshow(ave)
Hypercolumn average for layers 3 and 8.

As you can see, those first hypercolumn activations all look like edge detectors; let’s see how these hypercolumns look for the layers 22 and 29:

layers_extract = [22, 29]
hc = extract_hypercolumn(model, layers_extract, im)
ave = np.average(hc.transpose(1, 2, 0), axis=2)
plt.imshow(ave)
Hypercolumn average for the layers 22 and 29.

As we can see now, the features are really more abstract and semantically interesting but with spatial information a little fuzzy.

Remember that you can extract the hypercolumns using all the initial layers and also the final layers, including the FC layers. Here I’m extracting them separately to show how they differ in the visualization plots.
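As a rough sketch of how the FC part could be appended (this is an illustration of the idea described earlier, not the approach the post actually uses; layer index 32 assumes the first Dense(4096) layer of the VGG_16 model defined above, and replicating 4096 values over 224×224 locations is quite memory hungry):

# Sketch (assumption): treat the first FC layer's activations as a 1x1 feature
# map and replicate it over every pixel location, as described earlier.
get_fc = theano.function([model.layers[0].input],
                         model.layers[32].get_output(train=False),
                         allow_input_downcast=False)
fc_feat = get_fc(im)  # shape (1, 4096)
fc_maps = np.tile(fc_feat[0][:, np.newaxis, np.newaxis], (1, 224, 224))  # (4096, 224, 224)
hc_with_fc = np.concatenate([hc, fc_maps], axis=0)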

Simple hypercolumn pixel clustering

Now you can do a lot of things: you can use these hypercolumns to classify pixels for some task, to do automatic pixel colorization, segmentation, etc. What I’m going to do here, just as an experiment, is to use the hypercolumns (from the VGG-16 layers 3, 8, 15, 22, 29) and then cluster them using KMeans with 2 clusters:
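This intermediate step isn’t shown explicitly in the post, but before clustering we need the hypercolumns for those five layers; a minimal sketch, using the extract_hypercolumn() function defined earlier:

# Assumption: rebuild 'hc' with all five convolutional layers before clustering
layers_extract = [3, 8, 15, 22, 29]
hc = extract_hypercolumn(model, layers_extract, im)  # shape (1472, 224, 224)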

m = hc.transpose(1, 2, 0).reshape(50176, -1)
kmeans = cluster.KMeans(n_clusters=2, max_iter=300, n_jobs=5, precompute_distances=True)
cluster_labels = kmeans.fit_predict(m)

imcluster = np.zeros((224, 224))
imcluster = imcluster.reshape((224*224,))
imcluster = cluster_labels

plt.imshow(imcluster.reshape(224, 224), cmap="hot")
KMeans clustering using hypercolumns.

Now you can imagine how useful hypercolumns can be for tasks like keypoint extraction, segmentation, etc. It is a very elegant, simple and useful concept.

I hope you liked it !

– Christian S. Perone

Cite this article as: Christian S. Perone, "Convolutional hypercolumns in Python," in Terra Incognita, 11/01/2016, //www.cpetem.com/2016/01/convolutional-hypercolumns-in-python/

Real time Drone object tracking using Python and OpenCV

After flying this past weekend (together with Gabriel and Leandro) with Gabriel’s drone (which is a handmade APM 2.6 based quadcopter) in our town (Porto Alegre, Brasil), I decided to implement a tracker for objects using OpenCV and Python and check how the results would be using simple and fast methods like Meanshift. The result was very impressive and I believe that there is plenty of room for optimization, but the algorithm is now able to run in real time using Python with good results, with a Full HD resolution of 1920×1080 at 30 fps.

Here is the video of the flight that was piloted by Gabriel:

See it in Full HD for more details.

The algorithm can be described as follows, and it is very simple (less than 50 lines of Python) and straightforward:

  • A ROI (Region of Interest) is defined, in this case the building that I want to track
  • The normalized histogram and back-projection are calculated
  • The Meanshift algorithm is used to track the ROI

The entire code for the tracking is described below:

import numpy as np
import cv2


def run_main():
    cap = cv2.VideoCapture('upabove.mp4')

    # Read the first frame of the video
    ret, frame = cap.read()

    # Set the ROI (Region of Interest). Actually, this is a
    # rectangle of the building that we're tracking
    c, r, w, h = 900, 650, 70, 70
    track_window = (c, r, w, h)

    # Create mask and normalized histogram
    roi = frame[r:r+h, c:c+w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv_roi, np.array((0., 30., 32.)), np.array((180., 255., 255.)))
    roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
    cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)
    term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 80, 1)

    while True:
        ret, frame = cap.read()
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

        ret, track_window = cv2.meanShift(dst, track_window, term_crit)

        x, y, w, h = track_window
        cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)
        cv2.putText(frame, 'Tracked', (x-25, y-10), cv2.FONT_HERSHEY_SIMPLEX,
                    1, (255, 255, 255), 2, cv2.CV_AA)

        cv2.imshow('Tracking', frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    run_main()

I hope you liked it !