python - Kinect の実世界の座標処理アルゴリズムをベクトル化して高速化する

Question

私は最近、pylibfreenect2 を使用して Linux で Kinect V2 を使い始めました。

最初に深度フレームデータを散布図で表示できたとき、どの深度ピクセルも正しい位置にないように見えてがっかりしました。

部屋の側面図 (天井が湾曲していることに注意してください)。

私はいくつかの調査を行い、変換を行うためにいくつかの単純なトリガーが含まれていることに気付きました。

テストするために、列、行、および深度ピクセル強度を受け入れ、そのピクセルの実際の位置を返す pylibfreenect2 の事前に作成された関数から始めました。

X, Y, Z = registration.getPointXYZ(undistorted, row, col)

これは、位置を修正する際に驚くほどうまく機能します。

getPointXYZ()またはgetPointXYZRGB( )を使用する唯一の欠点は、一度に 1 つのピクセルしか処理できないことです。次のようにネストされた for ループを使用する必要があるため、Python では時間がかかる場合があります。

n_rows = d.shape[0]
n_columns = d.shape[1]
out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
for row in range(n_rows):
    for col in range(n_columns):
        X, Y, Z = registration.getPointXYZ(undistorted, row, col)
        out[row * n_columns + col] = np.array([Z, X, -Y])

getPointXYZ() が座標を計算する方法をよりよく理解しようとしました。私の知る限りでは、これは OpenKinect-for-Processing 関数のdepthToPointCloudPos()に似ています。libfreenect2 のバージョンには、内部でさらに多くのことが行われているのではないかと思いますが。

その gitHub ソースコードを例として使用して、私自身の実験のために Python で書き直そうとしたところ、次のようになりました。

#camera information based on the Kinect v2 hardware
CameraParams = {
  "cx":254.878,
  "cy":205.395,
  "fx":365.456,
  "fy":365.456,
  "k1":0.0905474,
  "k2":-0.26819,
  "k3":0.0950862,
  "p1":0.0,
  "p2":0.0,
}

def depthToPointCloudPos(x_d, y_d, z, scale = 1000):
    #calculate the xyz camera position based on the depth data    
    x = (x_d - CameraParams['cx']) * z / CameraParams['fx']
    y = (y_d - CameraParams['cy']) * z / CameraParams['fy']

    return x/scale, y/scale, z/scale

これは、従来の getPointXYZ とカスタム関数の比較です。

彼らは非常に似ています。ただし、明らかな違いがあります。左の比較は、フラットな天井のエッジがよりまっすぐで、正弦波の形状を示しています。追加の数学が関係していると思われます。

私の機能と libfreenect2 の getPointXYZ との違いについて誰かがアイデアを持っているかどうか聞いてみたいと思います。

しかし、私がここに投稿した主な理由は、各要素をループする代わりに、上記の関数をベクトル化して配列全体で動作させることについて質問することです。

上記から学んだことを適用して、depthToPointCloudPos のベクトル化された代替と思われる関数を作成することができました。

[編集]

この機能をさらに効率的にする手助けをしてくれた Benjamin に感謝します!

def depthMatrixToPointCloudPos(z, scale=1000):
    #bacically this is a vectorized version of depthToPointCloudPos()
    C, R = np.indices(z.shape)

    R = np.subtract(R, CameraParams['cx'])
    R = np.multiply(R, z)
    R = np.divide(R, CameraParams['fx'] * scale)

    C = np.subtract(C, CameraParams['cy'])
    C = np.multiply(C, z)
    C = np.divide(C, CameraParams['fy'] * scale)

    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))

これは機能し、前の関数 depthToPointCloudPos() と同じポイントクラウドの結果を生成します。唯一の違いは、処理速度が 1 fps から 5 ～ 10 fps になったことです (WhooHoo!)。これにより、Python がすべての計算を行うことによるボトルネックが解消されると思います。そのため、半現実世界の座標が計算され、散布図が再びスムーズに実行されるようになりました。

深度フレームから 3D 座標を取得するための効率的な関数ができたので、このアプローチを適用して、カラーカメラデータを深度ピクセルにマッピングしたいと思います。ただし、それを行うためにどのような数学や変数が関係しているのかはわかりません.Googleで計算する方法についてはあまり言及されていません.

別の方法として、libfreenect2 を使用して、getPointXYZRGB を使用して色を深度ピクセルにマッピングすることができました。

#Format undistorted and regisered data to real-world coordinates with mapped colors (dont forget color=out_col in setData)
n_rows = d.shape[0]
n_columns = d.shape[1]
out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
colors = np.zeros((d.shape[0] * d.shape[1], 3), dtype=np.float64)
for row in range(n_rows):
    for col in range(n_columns):
        X, Y, Z, B, G, R = registration.getPointXYZRGB(undistorted, registered, row, col)
        out[row * n_columns + col] = np.array([X, Y, Z])
        colors[row * n_columns + col] = np.divide([R, G, B], 255)
sp2.setData(pos=np.array(out, dtype=np.float64), color=colors, size=2)

ポイントクラウドと色付きの頂点を生成します (非常に遅い <1Fps):

要約すると、私の2つの質問は基本的に次のとおりです。

depthToPointCloudPos()関数 (およびベクトル化された実装)から返される実際の 3D 座標データが、libfreenect2 の getPointXYZ() によって返されるデータにより似ているようにするには、どのような追加手順が必要ですか?
また、自分のアプリケーションで深さから色へのレジストレーションマップを生成する (おそらくベクトル化された) 方法を作成するには、何が必要でしょうか? これは解決済みですので、アップデートを参照してください。

[アップデート]

登録されたフレームを使用して、カラーデータを各ピクセルにマッピングすることができました。これは非常に単純で、setData() を呼び出す前に次の行を追加するだけで済みました。

colors = registered.asarray(np.uint8)
colors = np.divide(colors, 255)
colors = colors.reshape(colors.shape[0] * colors.shape[1], 4 )
colors = colors[:, :3:] #BGRA to BGR (slices out the alpha channel)  
colors = colors[...,::-1] #BGR to RGB

これにより、Python はカラーデータをすばやく処理し、スムーズな結果を得ることができます。以下の機能例にそれらを更新/追加しました。

Pythonでリアルタイムに実行される色登録による実世界の座標処理!

(GIF画像の解像度を大幅に落としています)

[アップデート]

アプリケーションでもう少し時間を費やした後、散布図の視覚的な品質を向上させ、この例/質問をより直感的にすることを期待して、いくつかの追加パラメーターを追加し、それらの値を調整しました。

最も重要なことは、頂点を不透明に設定したことです。

sp2 = gl.GLScatterPlotItem(pos=pos)
sp2.setGLOptions('opaque') # Ensures not to allow vertexes located behinde other vertexes to be seen.

その後、サーフェスに非常に接近してズームすると、隣接する頂点間の距離が拡大し、表示されるすべてのものがほとんど空のスペースになることに気付きました。これは、頂点のポイントサイズが変更されていないことが原因の 1 つです。

色付きの頂点でいっぱいの「ズームに適した」ビューポートの作成を支援するために、現在のズームレベルに基づいて頂点ポイントサイズを計算する次の行を追加しました (更新ごとに)。

# Calculate a dynamic vertex size based on window dimensions and camera's position - To become the "size" input for the scatterplot's setData() function.
v_rate = 8.0 # Rate that vertex sizes will increase as zoom level increases (adjust this to any desired value).
v_scale = np.float32(v_rate) / gl_widget.opts['distance'] # Vertex size increases as the camera is "zoomed" towards center of view.
v_offset = (gl_widget.geometry().width() / 1000)**2 # Vertex size is offset based on actual width of the viewport.
v_size = v_scale + v_offset

そして見よ：

(ここでも、GIF 画像の解像度が大幅に低下しています)

ポイントクラウドのスキニングほど良くはないかもしれませんが、実際に見ているものを理解しようとするときに物事を簡単にするのに役立つようです.

言及されたすべての変更は、機能例に含まれています。

[アップデート]

前の 2 つのアニメーションで見られるように、現実世界の座標の点群が、グリッド軸と比較して歪んだ方向を持っていることは明らかです。これは、私が実際の Kinect の実際の向きを補正していなかったからです!

したがって、各頂点の新しい (回転およびオフセット) 座標を計算する追加のベクトル化された三角関数を実装しました。これにより、実空間での Kinect の実際の位置を基準にして、それらが正しく方向付けられます。また、傾斜する三脚を使用する場合に必要です (リアルタイムフィードバック用に INU またはジャイロ/加速度計の出力を接続するためにも使用できます)。

def applyCameraMatrixOrientation(pt):
    # Kinect Sensor Orientation Compensation
    # bacically this is a vectorized version of applyCameraOrientation()
    # uses same trig to rotate a vertex around a gimbal.
    def rotatePoints(ax1, ax2, deg):
        # math to rotate vertexes around a center point on a plane.
        hyp = np.sqrt(pt[:, ax1] ** 2 + pt[:, ax2] ** 2) # Get the length of the hypotenuse of the real-world coordinate from center of rotation, this is the radius!
        d_tan = np.arctan2(pt[:, ax2], pt[:, ax1]) # Calculate the vertexes current angle (returns radians that go from -180 to 180)

        cur_angle = np.degrees(d_tan) % 360 # Convert radians to degrees and use modulo to adjust range from 0 to 360.
        new_angle = np.radians((cur_angle + deg) % 360) # The new angle (in radians) of the vertexes after being rotated by the value of deg.

        pt[:, ax1] = hyp * np.cos(new_angle) # Calculate the rotated coordinate for this axis.
        pt[:, ax2] = hyp * np.sin(new_angle) # Calculate the rotated coordinate for this axis.

    #rotatePoints(1, 2, CameraPosition['roll']) #rotate on the Y&Z plane # Disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used)
    rotatePoints(0, 2, CameraPosition['elevation']) #rotate on the X&Z plane
    rotatePoints(0, 1, CameraPosition['azimuth']) #rotate on the X&Y plane

    # Apply offsets for height and linear position of the sensor (from viewport's center)
    pt[:] += np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])



    return pt

注意:rotatePoints() は「仰角」と「方位角」に対してのみ呼び出されます。これは、ほとんどの三脚がロールをサポートしておらず、CPU サイクルを節約するためにデフォルトで無効になっているためです。何か派手なことをするつもりなら、遠慮なくコメントを外してください!!

この画像ではグリッドの床が水平になっていますが、左側のポイントクラウドはそれに合わせていないことに注意してください。

Kinect の向きを設定するパラメータ:

CameraPosition = {
    "x": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "y": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "z": 1.7, # height in meters of actual kinect sensor from the floor.
    "roll": 0, # angle in degrees of sensor's roll (used for INU input - trig function for this is commented out by default).
    "azimuth": 0, # sensor's yaw angle in degrees.
    "elevation": -15, # sensor's pitch angle in degrees.
}

センサーの実際の位置と向きに応じてこれらを更新する必要があります。

最も重要な 2 つのパラメーターは、シータ (仰角) 角度と床からの高さです。使用したのは単純な巻尺と目盛り付きの目だけですが、いつかエンコーダーまたは INU データをフィードして、これらのパラメーターをリアルタイムで更新するつもりです (センサーが動き回るにつれて)。

ここでも、すべての変更が機能例に反映されています。

誰かがこの例の改善に成功した場合、または物事をよりコンパクトにする方法について提案がある場合は、詳細を説明するコメントを残していただければ幸いです。

このプロジェクトの完全に機能する例を次に示します。

#! /usr/bin/python

#--------------------------------#
# Kinect v2 point cloud visualization using a Numpy based 
# real-world coordinate processing algorithm and OpenGL.
#--------------------------------#

import sys
import numpy as np

from pyqtgraph.Qt import QtCore, QtGui
import pyqtgraph.opengl as gl

from pylibfreenect2 import Freenect2, SyncMultiFrameListener
from pylibfreenect2 import FrameType, Registration, Frame, libfreenect2

fn = Freenect2()
num_devices = fn.enumerateDevices()
if num_devices == 0:
    print("No device connected!")
    sys.exit(1)

serial = fn.getDeviceSerialNumber(0)
device = fn.openDevice(serial)

types = 0
types |= FrameType.Color
types |= (FrameType.Ir | FrameType.Depth)
listener = SyncMultiFrameListener(types)

# Register listeners
device.setColorFrameListener(listener)
device.setIrAndDepthFrameListener(listener)

device.start()

# NOTE: must be called after device.start()
registration = Registration(device.getIrCameraParams(),
                            device.getColorCameraParams())

undistorted = Frame(512, 424, 4)
registered = Frame(512, 424, 4)


#QT app
app = QtGui.QApplication([])
gl_widget = gl.GLViewWidget()
gl_widget.show()
gl_grid = gl.GLGridItem()
gl_widget.addItem(gl_grid)

#initialize some points data
pos = np.zeros((1,3))

sp2 = gl.GLScatterPlotItem(pos=pos)
sp2.setGLOptions('opaque') # Ensures not to allow vertexes located behinde other vertexes to be seen.

gl_widget.addItem(sp2)

# Kinects's intrinsic parameters based on v2 hardware (estimated).
CameraParams = {
  "cx":254.878,
  "cy":205.395,
  "fx":365.456,
  "fy":365.456,
  "k1":0.0905474,
  "k2":-0.26819,
  "k3":0.0950862,
  "p1":0.0,
  "p2":0.0,
}

def depthToPointCloudPos(x_d, y_d, z, scale=1000):
    # This runs in Python slowly as it is required to be called from within a loop, but it is a more intuitive example than it's vertorized alternative (Purly for example)
    # calculate the real-world xyz vertex coordinate from the raw depth data (one vertex at a time).
    x = (x_d - CameraParams['cx']) * z / CameraParams['fx']
    y = (y_d - CameraParams['cy']) * z / CameraParams['fy']

    return x / scale, y / scale, z / scale

def depthMatrixToPointCloudPos(z, scale=1000):
    # bacically this is a vectorized version of depthToPointCloudPos()
    # calculate the real-world xyz vertex coordinates from the raw depth data matrix.
    C, R = np.indices(z.shape)

    R = np.subtract(R, CameraParams['cx'])
    R = np.multiply(R, z)
    R = np.divide(R, CameraParams['fx'] * scale)

    C = np.subtract(C, CameraParams['cy'])
    C = np.multiply(C, z)
    C = np.divide(C, CameraParams['fy'] * scale)

    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))

# Kinect's physical orientation in the real world.
CameraPosition = {
    "x": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "y": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "z": 1.7, # height in meters of actual kinect sensor from the floor.
    "roll": 0, # angle in degrees of sensor's roll (used for INU input - trig function for this is commented out by default).
    "azimuth": 0, # sensor's yaw angle in degrees.
    "elevation": -15, # sensor's pitch angle in degrees.
}

def applyCameraOrientation(pt):
    # Kinect Sensor Orientation Compensation
    # This runs slowly in Python as it is required to be called within a loop, but it is a more intuitive example than it's vertorized alternative (Purly for example)
    # use trig to rotate a vertex around a gimbal.
    def rotatePoints(ax1, ax2, deg):
        # math to rotate vertexes around a center point on a plane.
        hyp = np.sqrt(pt[ax1] ** 2 + pt[ax2] ** 2) # Get the length of the hypotenuse of the real-world coordinate from center of rotation, this is the radius!
        d_tan = np.arctan2(pt[ax2], pt[ax1]) # Calculate the vertexes current angle (returns radians that go from -180 to 180)

        cur_angle = np.degrees(d_tan) % 360 # Convert radians to degrees and use modulo to adjust range from 0 to 360.
        new_angle = np.radians((cur_angle + deg) % 360) # The new angle (in radians) of the vertexes after being rotated by the value of deg.

        pt[ax1] = hyp * np.cos(new_angle) # Calculate the rotated coordinate for this axis.
        pt[ax2] = hyp * np.sin(new_angle) # Calculate the rotated coordinate for this axis.

    #rotatePoints(0, 2, CameraPosition['roll']) #rotate on the Y&Z plane # Disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used)
    rotatePoints(1, 2, CameraPosition['elevation']) #rotate on the X&Z plane
    rotatePoints(0, 1, CameraPosition['azimuth']) #rotate on the X&Y plane

    # Apply offsets for height and linear position of the sensor (from viewport's center)
    pt[:] += np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])



    return pt

def applyCameraMatrixOrientation(pt):
    # Kinect Sensor Orientation Compensation
    # bacically this is a vectorized version of applyCameraOrientation()
    # uses same trig to rotate a vertex around a gimbal.
    def rotatePoints(ax1, ax2, deg):
        # math to rotate vertexes around a center point on a plane.
        hyp = np.sqrt(pt[:, ax1] ** 2 + pt[:, ax2] ** 2) # Get the length of the hypotenuse of the real-world coordinate from center of rotation, this is the radius!
        d_tan = np.arctan2(pt[:, ax2], pt[:, ax1]) # Calculate the vertexes current angle (returns radians that go from -180 to 180)

        cur_angle = np.degrees(d_tan) % 360 # Convert radians to degrees and use modulo to adjust range from 0 to 360.
        new_angle = np.radians((cur_angle + deg) % 360) # The new angle (in radians) of the vertexes after being rotated by the value of deg.

        pt[:, ax1] = hyp * np.cos(new_angle) # Calculate the rotated coordinate for this axis.
        pt[:, ax2] = hyp * np.sin(new_angle) # Calculate the rotated coordinate for this axis.

    #rotatePoints(1, 2, CameraPosition['roll']) #rotate on the Y&Z plane # Disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used)
    rotatePoints(0, 2, CameraPosition['elevation']) #rotate on the X&Z plane
    rotatePoints(0, 1, CameraPosition['azimuth']) #rotate on the X&Y

    # Apply offsets for height and linear position of the sensor (from viewport's center)
    pt[:] += np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])



    return pt


def update():
    colors = ((1.0, 1.0, 1.0, 1.0))

    frames = listener.waitForNewFrame()

    # Get the frames from the Kinect sensor
    ir = frames["ir"]
    color = frames["color"]
    depth = frames["depth"]

    d = depth.asarray() #the depth frame as an array (Needed only with non-vectorized functions)

    registration.apply(color, depth, undistorted, registered)

    # Format the color registration map - To become the "color" input for the scatterplot's setData() function.
    colors = registered.asarray(np.uint8)
    colors = np.divide(colors, 255) # values must be between 0.0 - 1.0
    colors = colors.reshape(colors.shape[0] * colors.shape[1], 4 ) # From: Rows X Cols X RGB -to- [[r,g,b],[r,g,b]...]
    colors = colors[:, :3:]  # remove alpha (fourth index) from BGRA to BGR
    colors = colors[...,::-1] #BGR to RGB

    # Calculate a dynamic vertex size based on window dimensions and camera's position - To become the "size" input for the scatterplot's setData() function.
    v_rate = 5.0 # Rate that vertex sizes will increase as zoom level increases (adjust this to any desired value).
    v_scale = np.float32(v_rate) / gl_widget.opts['distance'] # Vertex size increases as the camera is "zoomed" towards center of view.
    v_offset = (gl_widget.geometry().width() / 1000)**2 # Vertex size is offset based on actual width of the viewport.
    v_size = v_scale + v_offset

    # Calculate 3d coordinates (Note: five optional methods are shown - only one should be un-commented at any given time)

    """
    # Method 1 (No Processing) - Format raw depth data to be displayed
    m, n = d.shape
    R, C = np.mgrid[:m, :n]
    out = np.column_stack((d.ravel() / 4500, C.ravel()/m, (-R.ravel()/n)+1))
    """

    # Method 2 (Fastest) - Format and compute the real-world 3d coordinates using a fast vectorized algorithm - To become the "pos" input for the scatterplot's setData() function.
    out = depthMatrixToPointCloudPos(undistorted.asarray(np.float32))

    """
    # Method 3 - Format undistorted depth data to real-world coordinates
    n_rows, n_columns = d.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float32)
    for row in range(n_rows):
        for col in range(n_columns):
            z = undistorted.asarray(np.float32)[row][col]
            X, Y, Z = depthToPointCloudPos(row, col, z)
            out[row * n_columns + col] = np.array([Z, Y, -X])
    """

    """
    # Method 4 - Format undistorted depth data to real-world coordinates
    n_rows, n_columns = d.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
    for row in range(n_rows):
        for col in range(n_columns):
            X, Y, Z = registration.getPointXYZ(undistorted, row, col)
            out[row * n_columns + col] = np.array([Z, X, -Y])
    """

    """
    # Method 5 - Format undistorted and regisered data to real-world coordinates with mapped colors (dont forget color=colors in setData)
    n_rows, n_columns = d.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
    colors = np.zeros((d.shape[0] * d.shape[1], 3), dtype=np.float64)
    for row in range(n_rows):
        for col in range(n_columns):
            X, Y, Z, B, G, R = registration.getPointXYZRGB(undistorted, registered, row, col)
            out[row * n_columns + col] = np.array([Z, X, -Y])
            colors[row * n_columns + col] = np.divide([R, G, B], 255)
    """


    # Kinect sensor real-world orientation compensation.
    out = applyCameraMatrixOrientation(out)

    """
    # For demonstrating the non-vectorized orientation compensation function (slow)
    for i, pt in enumerate(out):
        out[i] = applyCameraOrientation(pt)
    """


    # Show the data in a scatter plot
    sp2.setData(pos=out, color=colors, size=v_size)

    # Lastly, release frames from memory.
    listener.release(frames)

t = QtCore.QTimer()
t.timeout.connect(update)
t.start(50)


## Start Qt event loop unless running in interactive mode.
if __name__ == '__main__':
    import sys
    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtGui.QApplication.instance().exec_()

device.stop()
device.close()

sys.exit(0)

python - Kinect の実世界の座標処理アルゴリズムをベクトル化して高速化する

1 に答える 1

Related

Reference