FCOS Core Code Reading Notes

fcos_core/modeling/rpn/fcos/fcos.py

This file mainly defines the FCOS network head, including its three losses: loss_cls, loss_reg, and loss_centerness.

[Figure: FCOS architecture]

One of the key functions is compute_locations:

def compute_locations(self, features):
    locations = []
    for level, feature in enumerate(features):
        h, w = feature.size()[-2:]
        locations_per_level = self.compute_locations_per_level(
            h, w, self.fpn_strides[level],
            feature.device
        )
        locations.append(locations_per_level)
    return locations

def compute_locations_per_level(self, h, w, stride, device):
    shifts_x = torch.arange(
        0, w * stride, step=stride,
        dtype=torch.float32, device=device
    )
    shifts_y = torch.arange(
        0, h * stride, step=stride,
        dtype=torch.float32, device=device
    )
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    shift_x = shift_x.reshape(-1)
    shift_y = shift_y.reshape(-1)
    locations = torch.stack((shift_x, shift_y), dim=1) + stride // 2
    return locations

For each of the five FPN feature maps P3, P4, P5, P6, and P7, this computes where every feature-map point maps to on the input image, i.e. it generates a 2D grid (meshgrid) of coordinates.

The + stride // 2 when building locations compensates for the rounding (floor) introduced by downsampling, so that each mapped point lands as close as possible to the center of the receptive field of location (x, y): a point (x, y) on a feature map with stride s maps back to (s // 2 + x * s, s // 2 + y * s) on the input image. With stride 8, for example, the first location becomes (4, 4) instead of (0, 0).

The final locations is a list containing, for each of the 5 feature levels, the image-plane coordinates of all points on that level.
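As a quick sanity check, here is a minimal standalone sketch of the same logic; the fpn_strides values and the 800×1024 input size are assumptions chosen for the demo (the real P6/P7 sizes come from further convolutions, not exact division), and only torch is required:

import torch

def compute_locations_per_level(h, w, stride, device="cpu"):
    # Same logic as the FCOS method above, written as a free function.
    shifts_x = torch.arange(0, w * stride, step=stride,
                            dtype=torch.float32, device=device)
    shifts_y = torch.arange(0, h * stride, step=stride,
                            dtype=torch.float32, device=device)
    shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)
    return torch.stack((shift_x.reshape(-1), shift_y.reshape(-1)),
                       dim=1) + stride // 2

# Simulated P3..P7 feature-map sizes for an assumed 800x1024 input.
fpn_strides = [8, 16, 32, 64, 128]
sizes = [(800 // s, 1024 // s) for s in fpn_strides]

locations = [compute_locations_per_level(h, w, s)
             for (h, w), s in zip(sizes, fpn_strides)]

for s, loc in zip(fpn_strides, locations):
    # Each level yields an (h*w, 2) tensor of image-plane coordinates.
    print(s, tuple(loc.shape), loc[0].tolist())

For stride 8 this prints a (12800, 2) tensor whose first location is (4.0, 4.0), matching the + stride // 2 offset described above.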


How to use 10,582 trainaug images on DeeplabV3 code?

You know what I mean if you have experience training segmentation models on the Pascal VOC dataset. The dataset only provides 1464 pixel-level image annotations for training, but every paper uses 10,582 images for training, a split usually called trainaug. The additional annotations come from SBD, but their annotation format differs from Pascal VOC's. Fortunately someone has already made a converted version, which is SegmentationClassAug.

The DeeplabV3 code does not ship the SBD annotations, for understandable reasons, so I wrote a simple script to handle this.

To use the 10,582 trainaug images with the DeeplabV3 code, just follow these steps:

1. Create a script named convert_voc2012_aug.sh.

#!/bin/bash
# Exit immediately if a command exits with a non-zero status.
set -e

CURRENT_DIR=$(pwd)
WORK_DIR="./pascal_voc_seg"
mkdir -p ${WORK_DIR}

cd ${WORK_DIR}
tar -xf "../VOCtrainval_11-May-2012.tar"
cp "../trainaug.txt" "./VOCdevkit/VOC2012/ImageSets/Segmentation"
unzip "../SegmentationClassAug.zip" -d "./VOCdevkit/VOC2012"
rm -r "./VOCdevkit/VOC2012/__MACOSX"

cd ${CURRENT_DIR}

# Root path for PASCAL VOC 2012 dataset.
PASCAL_ROOT="${WORK_DIR}/VOCdevkit/VOC2012"

# Remove the colormap in the ground truth annotations.
SEG_FOLDER="${PASCAL_ROOT}/SegmentationClassAug"
SEMANTIC_SEG_FOLDER="${PASCAL_ROOT}/SegmentationClassAugRaw"

echo "Removing the color map in ground truth annotations..."
python ./remove_gt_colormap.py \
  --original_gt_folder="${SEG_FOLDER}" \
  --output_dir="${SEMANTIC_SEG_FOLDER}"

# Build TFRecords of the dataset.
# First, create output directory for storing TFRecords.
OUTPUT_DIR="${WORK_DIR}/tfrecord"
mkdir -p "${OUTPUT_DIR}"

IMAGE_FOLDER="${PASCAL_ROOT}/JPEGImages"
LIST_FOLDER="${PASCAL_ROOT}/ImageSets/Segmentation"

echo "Converting PASCAL VOC 2012 dataset..."
python ./build_voc2012_data.py \
  --image_folder="${IMAGE_FOLDER}" \
  --semantic_segmentation_folder="${SEMANTIC_SEG_FOLDER}" \
  --list_folder="${LIST_FOLDER}" \
  --image_format="jpg" \
  --output_dir="${OUTPUT_DIR}"

2. Create a text file named trainaug.txt listing the names of the 10,582 trainaug images, one per line (a sketch for generating it follows).
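If you want to generate trainaug.txt yourself rather than downloading it, the split is the union of the VOC 2012 train list and the SBD train/val lists, minus the VOC 2012 val images. Below is a minimal sketch; the benchmark_RELEASE path for the SBD list files is an assumption based on the standard SBD release layout, so adapt both paths to your setup:

# Hypothetical sketch: trainaug = (VOC train ∪ SBD train ∪ SBD val) − VOC val.
def read_list(path):
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}

voc_sets = "pascal_voc_seg/VOCdevkit/VOC2012/ImageSets/Segmentation"
sbd_sets = "benchmark_RELEASE/dataset"  # assumed SBD location

train = (read_list(f"{voc_sets}/train.txt")
         | read_list(f"{sbd_sets}/train.txt")
         | read_list(f"{sbd_sets}/val.txt"))
val = read_list(f"{voc_sets}/val.txt")

trainaug = sorted(train - val)
assert len(trainaug) == 10582, len(trainaug)

with open("trainaug.txt", "w") as f:
    f.write("\n".join(trainaug) + "\n")

The assert documents the expected size: 12,031 unique annotated images minus the 1,449 val images leaves exactly 10,582.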

3. Download the Pascal VOC dataset and the SegmentationClassAug annotations.

4. Put all of them (‘convert_voc2012_aug.sh’, ‘trainaug.txt’, ‘VOCtrainval_11-May-2012.tar’, ‘SegmentationClassAug.zip’) into the research/deeplab/datasets folder.

5. In research/deeplab/datasets, give convert_voc2012_aug.sh execute permission (chmod +x convert_voc2012_aug.sh) and run it.

6. Change the code in research/deeplab/datasets/segmentation_dataset.py from:

_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'trainval': 2913,
        'val': 1449,
    },
    num_classes=21,
    ignore_label=255,
)

to:

_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'trainaug': 10582,
        'trainval': 2913,
        'val': 1449,
    },
    num_classes=21,
    ignore_label=255,
)

7. Don’t forget to change the train_split parameter in research/deeplab/train.py to trainaug.

Solving High Memory Usage and Disk IO Caused by Datasets

Disk IO and memory usage on my Ubuntu machine were surprisingly high; I couldn't stand it any longer and decided it had to be fixed.

The top memory consumers were chrome and gvfsd-metadata: the former's 1.6 GB is unavoidable, but the latter was somehow using 1.8 GB. According to Wikipedia:

gvfsd-metadata is a daemon acting as a write serialiser to the internal gvfs metadata storage. It is autostarted by GIO clients when they make metadata changes. Read operations are done by client-side GIO code directly, and don’t require the daemon to be running. The gvfs metadata capabilities are used by the GNOME Files file manager, for example.

I still don't fully understand what it is, but it was clearly the culprit. The likely cause is that I had browsed folders of several large datasets, such as Pascal VOC, ShapeNet, and Pascal3D (it would have been even worse with ImageNet). Judging from this post, it can also cause 100% CPU usage. The thread gives a temporary fix:

rm -rf ~/.local/share/gvfs-metadata
pkill gvfsd-metadata

If the problem keeps coming back, simply remove gvfsd-metadata's execute permission as described here: sudo chmod -x /usr/lib/gvfs/gvfsd-metadata

Besides that, iowait was around 27% even though all I had opened was sublime-text. The cause was again the datasets: several sublime_text --crawl processes were constantly reading the disk to index files. Since these datasets consist mostly of images, it suffices to follow this post and add to the sublime-text settings:

"folder_exclude_patterns": [".svn", ".git", ".hg", "CVS", "node_modules/*"],
"binary_file_patterns": ["*.mat","*.jpg", "*.jpeg", "*.png", "*.gif", "*.ttf", "*.tga", "*.dds", "*.ico", "*.eot", "*.pdf", "*.swf", "*.jar", "*.zip"],

and the problem is solved.

This still doesn't help with datasets like ShapeNet, though, so it's best to avoid opening folders that contain datasets with sublime-text at all (a need that is actually not uncommon, since symlinks make it convenient).

How to make linemod and KinectV2 work with ROS Indigo?

I’m using Ubuntu 14.04.5 with ROS Indigo, and I want to make ork work with linemod, a fairly simple need. But when packages are not well maintained (especially in ROS), you sometimes have to investigate the problem yourself and even contribute code to the project…

Installing ork by following the installation guide is very simple; don’t forget to install couchdb. Building from source is the only choice now, since you have to modify the code to make it work as you wish.

In my case, the tabletop method works well with KinectV1 and even with KinectV2 (though only the hd resolution config worked). However, linemod caused a huge memory leak, nearly 1 GB/s, and it did not publish the /recognized_object_array topic. At first I thought the problem was in linemod, but it turned out to be in ork_renderer. A thread in the ORK Google Group said:

Currently LINEMOD uses ork_renderer for its training phase. ork_renderer uses either GLUT or osmesa to generate synthetic images of the training data. It seems that the ork_renderer in your computer is linked to osmesa.

Fortunately we can just change CMakeLists.txt to use GLUT: change option(USE_GLUT "Use GLUT instead of OSMesa" OFF) to option(USE_GLUT "Use GLUT instead of OSMesa" ON).

Update: I now use the version from JimmyDaSilva instead of the official wg-perception one.

But the current linemod version still has a problem related to assimp_devel; it seems the developer is working on it, so you have to revert linemod to the previous version (35aebd).

So I created a repo here to make the whole thing work. When linemod is training, it shows an assimp window, which in my case does not contain anything; that is not a serious problem, and linemod works anyway with KinectV1. It does not work with KinectV2, though, because KinectV2 has an unusual resolution, causing an OpenCV error in linearMemoryPyramid. Fortunately, once again an awesome guy has worked it out, and he also fixed many other issues. I need KinectV2 in my work, so I followed his modifications and successfully made it work on KinectV2 with QHD resolution. If you want to use SD resolution, set T={2,4} in linemod_detect.cpp and renderer_width: 512, renderer_height: 424 in training.ork, as JimmyDaSilva said.

This ork repo integrates all of these changes, just to make it easier to work with ork, perhaps for my future self.

Tips:

  • If you have trained with linemod before, you'd better delete the whole object_recognition database in CouchDB.
  • Using coke.stl is simpler; using coke.obj with a texture gives better results.
  • When training linemod, make sure you are in the folder that contains the obj, mtl, and image files.

A Multi-System, Multi-Head, Single-Tail Work Environment

I always have some odd requirements for my work environment. For example, since a lot of software only runs on Windows, I need a Windows machine; yet the software I use for work runs on Ubuntu, and for other reasons you might even need a Mac as well. On top of that, a single screen is clearly not enough: dual monitors are a must, and three might actually be what it takes, but with a laptop added my desk can barely hold them all. And the lab is not going to give me that many monitors anyway...

Moreover, I want to share one mouse and keyboard across all screens and all machines! And share the clipboard too, so text can be copied on one system and pasted directly on another!

And since I sometimes use a laptop with its own built-in screen, I need to sometimes run dual monitors off the desktop, and sometimes pair the laptop's screen with a single monitor for a dual-screen setup!

After some tinkering, all of the odd requirements above were met. You could call this a multi-system, multi-head, single-tail work environment. Five years ago, this article by B哥 first introduced me to the idea of multi-head computing.

Synergy is a mouse-and-keyboard sharing program that works over the LAN. Yet even though it is open-source software, its downloads are paid! Charging for open-source software is not a bad thing in itself, but the source is still open, so compiling it yourself or using someone else's build doesn't seem to break any rules? I have serious doubts about this business model... Besides, Synergy nightly builds can be found online, which means you can in fact download the latest version for free (though it may not be stable).

I tried many versions of Synergy and found them not very stable, and there are compatibility problems when the server and client run different versions. In the end I settled on an old version running as a daemon / Windows service; the GUI was probably the culprit, but I was too lazy to verify whether the latest version would run stably this way.
