[Stock Trading Optimization with Machine Learning and Evolutionary Computation] No.04 Creating the Dataset

Purpose of this chapter

The goal of this chapter is to create a dataset with more features than the previous one.

Directory and file structure

work_share
├04_get_stock_price_ver2
  ├Dockerfile
  ├docker-compose.yml
  └src
    ├dataset (auto-generated)
    ├original_data_2010-01-01_2023-03-01_1d (auto-generated)
    ├time_cluster_result (auto-generated)
    ├get_stock_price.py
    ├make_dataset.py
    ├make_original_data.py
    ├make_time_cluster_dataset.py
    ├calculate_nearest_codes.py
    ├make_technical_data.py
    └stocks_code.xls (auto-generated)

In this chapter we write the following six programs and run them in order to build the dataset.

get_stock_price.py
make_original_data.py
make_time_cluster_dataset.py
calculate_nearest_codes.py
make_technical_data.py
make_dataset.py
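
The execution order above can be sketched as a small driver script. This is only a minimal sketch, not part of the original programs: the name run_pipeline and the injectable runner argument are hypothetical, and it assumes that each script runs without command-line arguments from the src directory.

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence

# Execution order as listed in this chapter.
SCRIPTS = [
    "get_stock_price.py",
    "make_original_data.py",
    "make_time_cluster_dataset.py",
    "calculate_nearest_codes.py",
    "make_technical_data.py",
    "make_dataset.py",
]


def run_pipeline(
    scripts: Sequence[str] = SCRIPTS,
    runner: Optional[Callable[[List[str]], object]] = None,
) -> List[str]:
    """Run each script in order and return the names that were executed.

    Assumption: every script takes no arguments and is started from
    the directory that contains it (/work/src inside the container).
    """
    if runner is None:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed earlier step.
        def runner(cmd: List[str]) -> object:
            return subprocess.run(cmd, check=True)

    executed = []
    for script in scripts:
        runner([sys.executable, script])
        executed.append(script)
    return executed
```

Calling run_pipeline() from inside src would run the six programs one after another and stop at the first one that fails.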

Dockerfile

FROM nvcr.io/nvidia/pytorch:22.04-py3
USER root

RUN apt-get update
RUN apt-get -y install locales && \
    localedef -f UTF-8 -i ja_JP ja_JP.UTF-8
ENV LANG ja_JP.UTF-8
ENV LANGUAGE ja_JP:ja
ENV LC_ALL ja_JP.UTF-8
ENV TERM xterm

ENV TZ=Asia/Tokyo
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

ENV PYTHONPATH "/root/src:$PYTHONPATH"

RUN apt-get update

RUN pip install --upgrade pip
RUN pip install --upgrade setuptools

RUN python -m pip install requests
RUN python -m pip install numpy
RUN python -m pip install pandas
RUN python -m pip install matplotlib
RUN python -m pip install scikit-learn
RUN python -m pip install optuna

RUN python -m pip install seaborn

RUN python -m pip install japanize-matplotlib
RUN python -m pip install lightgbm
RUN python -m pip install notebook
RUN python -m pip install tqdm

RUN python -m pip install pandas_datareader
RUN python -m pip install yfinance
RUN python -m pip install xlrd

RUN python -m pip install tslearn
RUN wget http://prdownloads.sourceforge.net/ta-lib/ta-lib-0.4.0-src.tar.gz && \
    tar -xzf ta-lib-0.4.0-src.tar.gz && \
    cd ta-lib/ && \
    ./configure --prefix=/usr && \
    make && \
    make install

RUN python -m pip install TA-Lib

ARG USERNAME=user
ARG GROUPNAME=user
ARG UID=1000
ARG GID=1000
ARG PASSWORD=user
RUN groupadd -g $GID $GROUPNAME && \
    useradd -m -s /bin/bash -u $UID -g $GID -G sudo $USERNAME && \
    echo $USERNAME:$PASSWORD | chpasswd

USER $USERNAME

docker-compose.yml

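This file is almost the same as the yml written in the previous chapter, except that the entry - ../01_get_stock_price/src/dataset_2018_2023:/work/dataset has been added. This entry mounts the dataset directory generated previously into the container.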

version: '3'
services:
  stock_predict_python:
    restart: always
    build: .
    container_name: 'python_stock_predict'
    working_dir: '/work/src'
    tty: true
    volumes:
      - ./src:/work/src
    ulimits:
      memlock: -1
      stack: -1
    shm_size: '10gb'
    deploy:
      resources:
          reservations:
              devices:
                - capabilities: [gpu]
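
To make the volume mapping concrete, the following sketch shows how a short-syntax entry of the form HOST:CONTAINER splits into a host path and a container path, with a relative host path taken relative to the directory of docker-compose.yml. The function resolve_volume and the host location COMPOSE_DIR are hypothetical names used only for illustration, and POSIX-style paths are assumed.

```python
import posixpath
from typing import Tuple


def resolve_volume(entry: str, compose_dir: str) -> Tuple[str, str]:
    """Split a short-syntax volume entry and resolve its host side.

    Assumption: POSIX-style paths and the 'HOST:CONTAINER[:MODE]' form.
    """
    parts = entry.split(":")
    host, container = parts[0], parts[1]
    # A relative host path is interpreted relative to the directory
    # that contains docker-compose.yml.
    if not posixpath.isabs(host):
        host = posixpath.normpath(posixpath.join(compose_dir, host))
    return host, container


# Hypothetical location of this chapter's directory on the host.
COMPOSE_DIR = "/home/user/work_share/04_get_stock_price_ver2"

print(resolve_volume("./src:/work/src", COMPOSE_DIR))
print(resolve_volume(
    "../01_get_stock_price/src/dataset_2018_2023:/work/dataset", COMPOSE_DIR
))
```

Under these assumptions, ./src is the src directory of this chapter, while the ../01_get_stock_price/... entry points into a sibling chapter directory under work_share.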

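Starting the container and entering the virtual environment

Run the following commands in the directory that contains the Dockerfile.

Build the virtual environment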

docker compose up -d --build

Enter the virtual environment

docker compose exec stock_predict_python bash
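
Once inside the container, it can be useful to confirm that the packages installed by the Dockerfile are importable before running the programs. The following is a minimal sketch: the helper missing_modules is a hypothetical name, and the list contains the import names of the pip packages (for example, scikit-learn is imported as sklearn and TA-Lib as talib).

```python
import importlib.util
from typing import Iterable, List

# Import names of the packages installed in the Dockerfile.
REQUIRED = [
    "requests", "numpy", "pandas", "matplotlib", "sklearn", "optuna",
    "seaborn", "japanize_matplotlib", "lightgbm", "notebook", "tqdm",
    "pandas_datareader", "yfinance", "xlrd", "tslearn", "talib",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    # find_spec returns None when a top-level module is not installed;
    # it does not import the module itself.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing modules:", missing if missing else "none")
```

If any name is reported as missing, rebuilding the image with docker compose up -d --build is the first thing to try.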

Shut down the virtual environment

docker compose down
