% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dfMaker.R
\name{dfMaker}
\alias{dfMaker}
\title{dfMaker Function}
\usage{
dfMaker(
  input.folder,
  config.path,
  output.file = NULL,
  output.path = NULL,
  no_save = FALSE,
  fast_scaling = TRUE,
  transformation_coords = c(1, 1, 5, 5)
)
}
\arguments{
\item{input.folder}{Path to the folder containing 'OpenPose' JSON files. The folder should contain only the JSON files to be processed; including other files may lead to unexpected behavior.}

\item{config.path}{Path to the configuration file used for extracting metadata from filenames, particularly when processing data from the UCLA NewsScape archive. If not provided, default settings are used, which may not extract NewsScape-specific metadata.}

\item{output.file}{Name of the output file. If \code{NULL} and the data contain a single unique \code{id}, an auto-generated name with a \code{.parquet} extension is used. If \code{NULL} and the data contain multiple unique \code{id} values, the function stops with an error asking for an explicit output file name. If \code{output.file} ends with \code{.csv}, the output is saved in CSV format; otherwise it is saved in Parquet format.}

\item{output.path}{Path to save the output file. If \code{NULL}, the file is saved in a default directory called \code{df_outputs} in the current working directory.}

\item{no_save}{Logical. If \code{TRUE}, the output is not saved to a file.}

\item{fast_scaling}{Logical. If \code{TRUE}, uses a simplified scaling transformation based only on the primary base vector \code{v.i}; the secondary base vector \code{v.j} is ignored. \strong{Warning:} When \code{fast_scaling} is \code{TRUE}, scaling is performed using pose keypoints only (\code{t_typepoint = 1}).}

\item{transformation_coords}{Numeric vector of length 4 specifying, in order, the following transformation parameters:
\describe{
\item{\strong{t_typepoint}}{Type of keypoints to use for transformation. Possible values:
\itemize{
\item \code{1}: Body keypoints (pose).
\item \code{2}: Face keypoints.
\item \code{3}: Left hand keypoints.
\item \code{4}: Right hand keypoints.
}}
\item{\strong{o_point}}{Index of the keypoint used as the origin in the new coordinate system.}
\item{\strong{i_point}}{Index of the keypoint that defines the primary base vector \code{v.i}.}
\item{\strong{j_point}}{Index of the keypoint that defines the secondary base vector \code{v.j}.
\itemize{
\item If \code{i_point == j_point}, \code{v.j} is calculated as a perpendicular vector to \code{v.i} (orthonormalization).
\item If \code{fast_scaling = TRUE}, \code{v.j} is not utilized, and \code{j_point} should be set to \code{NA}.
}}
}}
}
\value{
A data frame containing the processed keypoints data with the following columns:

\describe{
\item{\strong{id}}{\code{Character}: Identifier derived from the name of the file processed.}
\item{\strong{frame}}{\code{Numeric}: Frame number from which the data is extracted.}
\item{\strong{people_id}}{\code{Integer}: Identifier for each detected person in the frame.}
\item{\strong{type_points}}{\code{Character}: Type of keypoints (e.g., \code{"pose_keypoints"}, \code{"face_keypoints"}, \code{"hand_left_keypoints"}, \code{"hand_right_keypoints"}).}
\item{\strong{points}}{\code{Integer}: Index of the keypoint within its keypoint sequence.}
\item{\strong{x}}{\code{Numeric}: X-coordinate of the keypoint.}
\item{\strong{y}}{\code{Numeric}: Y-coordinate of the keypoint.}
\item{\strong{c}}{\code{Numeric}: Confidence score for the detected keypoint, ranging from 0 to 1.}
\item{\strong{nx}}{\code{Numeric}: Transformed X-coordinate in the new coordinate system.}
\item{\strong{ny}}{\code{Numeric}: Transformed Y-coordinate in the new coordinate system.}
}

Additional metadata columns may be included if extracted based on the configuration, such as \code{datetime}, \code{exp_search}, \code{country_code}, \code{network_code}, \code{program_name}, and \code{time_range}.
}
\description{
\code{dfMaker()} processes and organizes keypoints data generated by 'OpenPose' (Cao et al., 2019), compiling multiple JSON files into a structured data frame. It applies linear transformations to align and scale the keypoints within a custom coordinate system defined by the user. The function supports datasets containing pose keypoints alone or combined with face and hand keypoints.
}
\details{
The transformation applied by \code{dfMaker()} is specified by the \code{transformation_coords} parameter, which selects the keypoints that define the origin and the base vectors of the new coordinate system.

When \code{fast_scaling = FALSE}, the function uses two base vectors (\code{v.i} and \code{v.j}) defined by the keypoints specified in \code{i_point} and \code{j_point} to perform an affine transformation. If \code{i_point == j_point}, \code{v.j} is calculated as a perpendicular vector to \code{v.i}, resulting in an orthonormal basis.

When \code{fast_scaling = TRUE}, the transformation uses only the primary base vector \code{v.i}, and the scaling is simplified. The secondary vector \code{v.j} is not utilized, and \code{j_point} should be set to \code{NA}.
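
As a rough illustration of the two-vector case, the sketch below re-expresses a point in the coordinate system defined by an origin and two keypoints. It is not the package's internal code: the helper name \code{to_basis()}, its arguments, and the choice of rotating \code{v.i} by 90 degrees to obtain a perpendicular \code{v.j} are assumptions made for illustration.

\preformatted{
# Hypothetical helper (not part of the package): express point p in the
# coordinate system with origin o and base vectors v.i = i - o, v.j = j - o.
to_basis <- function(p, o, i, j = i) {
  v.i <- i - o
  if (all(i == j)) {
    # Assumed convention: v.j is v.i rotated by 90 degrees
    v.j <- c(-v.i[2], v.i[1])
  } else {
    v.j <- j - o
  }
  # Solve [v.i v.j] * c(nx, ny) = p - o for the new coordinates
  solve(cbind(v.i, v.j), p - o)
}

# The keypoint that defines v.i maps to (1, 0) in the new system
to_basis(p = c(4, 2), o = c(2, 2), i = c(4, 2))
}

Solving the linear system keeps the sketch valid for non-perpendicular base vectors as well, as long as \code{v.i} and \code{v.j} are not collinear.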

This function depends on the \code{arrow} package for efficient reading of JSON files and writing of Parquet files. Ensure that the \code{arrow} package is installed.

The function expects JSON files generated by 'OpenPose' (tested with version 1.7.0) with a specific structure. Variations in the 'OpenPose' configuration or version may affect the format of these files. Ensure that the JSON files conform to the expected structure for accurate processing.

Each row in the output data frame represents a single keypoint detected in a specific frame and associated with a specific person. The columns \code{nx} and \code{ny} provide the transformed coordinates based on the selected origin and linear transformation parameters. The \code{id} column links the keypoints to the corresponding input file from which they were extracted.

The data frame may contain missing values (\code{NA}) for keypoints that could not be reliably detected.
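
A common first step with the returned data frame is to drop unreliable keypoints. The sketch below uses a small hand-made data frame with the documented column names rather than real output; the values and the 0.5 confidence threshold are arbitrary assumptions.

\preformatted{
# Toy data with the documented column names (all values are made up)
toy <- data.frame(
  frame = c(0, 0, 1),
  people_id = c(0L, 0L, 0L),
  type_points = "pose_keypoints",
  points = c(0L, 1L, 0L),
  nx = c(0.1, NA, 0.3),
  ny = c(-0.2, NA, 0.0),
  c = c(0.9, 0.0, 0.4)
)

# Keep keypoints that have transformed coordinates and a confidence
# score of at least 0.5
reliable <- toy[!is.na(toy$nx) & !is.na(toy$ny) & toy$c >= 0.5, ]
nrow(reliable)
}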
}
\note{
When processing data from the UCLA NewsScape archive, the \code{config.path} parameter allows you to specify a configuration file that defines how to extract metadata from the filenames of the videos. The filenames in this archive have specific structures containing metadata such as date, time, country code, network code, program name, and time range.

\strong{Example of Configuration File (\code{config.json}):}

\preformatted{
{
    "extract_datetime": true,
    "extract_time": true,
    "extract_exp_search": true,
    "extract_country_code": true,
    "extract_network_code": true,
    "extract_program_name": true,
    "extract_time_range": true,
    "timezone": "America/Los_Angeles"
}
}

Ensure that your data filenames follow the standard NewsScape naming convention for accurate metadata extraction. If your data does not conform to this naming convention, you may need to adjust your filenames or modify the configuration accordingly.
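
A configuration file like the one above can also be written from R. The following is a minimal sketch that assumes the \code{jsonlite} package is installed; the key names are copied from the example configuration, and whether \code{dfMaker()} accepts other keys or a subset of them is not covered here.

\preformatted{
# Assumes the jsonlite package is installed; key names are copied from
# the example configuration above.
config <- list(
  extract_datetime = TRUE,
  extract_time = TRUE,
  extract_exp_search = TRUE,
  extract_country_code = TRUE,
  extract_network_code = TRUE,
  extract_program_name = TRUE,
  extract_time_range = TRUE,
  timezone = "America/Los_Angeles"
)

config.path <- file.path(tempdir(), "config.json")
jsonlite::write_json(config, config.path, auto_unbox = TRUE, pretty = TRUE)

# Read the file back to confirm that it round-trips
str(jsonlite::read_json(config.path))
}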

The \code{dfMaker()} function processes all keypoints provided by 'OpenPose', including pose, face, and hand keypoints. For the specific indices and descriptions of these keypoints, please refer to the 'OpenPose' documentation.
}
\section{R Version Requirements}{

This function uses the native pipe operator \verb{|>} introduced in R version 4.1.0. Therefore, R version 4.1.0 or higher is required.
}

\section{Error Handling}{

If a JSON file is empty or does not have the expected format, the function skips it and prints a message describing the issue. If \code{output.file} is \code{NULL} and multiple unique \code{id} values are found, the function stops with an error asking you to provide an explicit \code{output.file} name.
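
Because skipped files are reported only through messages, it can help to inspect the input folder before calling \code{dfMaker()}. The helper below is a hypothetical sketch and not part of the package. It flags files that lack a \code{.json} extension or are zero bytes in size; it does not validate the 'OpenPose' structure itself. The demonstration file names are made up.

\preformatted{
# Hypothetical helper (not part of the package)
check_input_folder <- function(input.folder) {
  files <- list.files(input.folder, full.names = TRUE)
  data.frame(
    file = basename(files),
    is_json = tolower(tools::file_ext(files)) == "json",
    is_empty = file.size(files) == 0,
    stringsAsFactors = FALSE
  )
}

# Demonstration on a temporary folder holding one empty JSON file and
# one stray text file
demo <- file.path(tempdir(), "openpose_demo")
dir.create(demo, showWarnings = FALSE)
invisible(file.create(file.path(demo, "frame_000000000000_keypoints.json")))
writeLines("notes", file.path(demo, "readme.txt"))
check_input_folder(demo)
}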
}

\examples{
# Example 1: Define paths to example data included with the package
input.folder <- system.file("extdata/eg/o1", package = "multimolang")
output.file <- "processed_data.csv"  # File name only; the directory comes from output.path
output.path <- tempdir()  # Use a temporary directory for writing output

# Run dfMaker() with example data
df <- dfMaker(
  input.folder = input.folder,
  output.file = output.file,
  output.path = output.path,
  no_save = FALSE,
  fast_scaling = TRUE,
  transformation_coords = c(1, 1, 5, 5)
)

# View the first few rows of the resulting data frame
head(df)

# Example 2: Using NewsScape data with a custom configuration file
# Define the configuration file path
config.path <- system.file("extdata/config_all_true.json", package = "multimolang")

# Run dfMaker with custom configuration
df <- dfMaker(
  input.folder = input.folder,
  config.path = config.path,
  output.file = output.file,
  output.path = output.path,
  no_save = FALSE,
  fast_scaling = TRUE,
  transformation_coords = c(1, 1, 5, 5)
)

# View the first few rows
head(df)

}
\references{
For more information about the UCLA NewsScape archive, visit:
\url{https://bigdatasocialscience.ucla.edu/newsscape/}

For a detailed description of the NewsScape infrastructure and its applications, refer to:
Uhrig, P. (2018). "NewsScape and the Distributed Little Red Hen Lab: A Digital Infrastructure for the Large-Scale Analysis of TV Broadcasts." In \emph{Anglistentag 2017 in Regensburg: Proceedings}, pp. 99–114.

The \code{arrow} R package is used for efficient reading of JSON files and writing of Parquet files. For more information, visit:
\url{https://cran.r-project.org/package=arrow}

'OpenPose' GitHub Repository:
\url{https://github.com/CMU-Perceptual-Computing-Lab/openpose}
}
