Data

continuiti.data

Data sets in continuiti. Every data set is a list of (x, u, y, v) tuples.

OperatorDataset(x, u, y, v, x_transform=None, u_transform=None, y_transform=None, v_transform=None)

Bases: OperatorDatasetBase

A dataset for operator training.

In operator training, one or more input functions are mapped onto an output function. To fulfill discretization invariance and domain independence, and to learn operators with physics-based losses, access to at least four different discretized spaces is necessary: the points at which the input function is sampled (x), the input function evaluated at these points (u), the discretization of the output domain (y), and the output of the operator evaluated at these points (v). Not all loss functions and/or operators need access to all of these attributes.

PARAMETER DESCRIPTION
x

Tensor of shape (num_observations, x_dim, num_sensors...) with sensor positions.

TYPE: Tensor

u

Tensor of shape (num_observations, u_dim, num_sensors...) with evaluations of the input functions at sensor positions.

TYPE: Tensor

y

Tensor of shape (num_observations, y_dim, num_evaluations...) with evaluation positions.

TYPE: Tensor

v

Tensor of shape (num_observations, v_dim, num_evaluations...) with the ground-truth values of the output functions at the evaluation positions.

TYPE: Tensor

ATTRIBUTE DESCRIPTION
shapes

Shape of all tensors.

transform

Transformations for each tensor.

Source code in src/continuiti/data/dataset.py
def __init__(
    self,
    x: torch.Tensor,
    u: torch.Tensor,
    y: torch.Tensor,
    v: torch.Tensor,
    x_transform: Optional[Transform] = None,
    u_transform: Optional[Transform] = None,
    y_transform: Optional[Transform] = None,
    v_transform: Optional[Transform] = None,
):
    assert all([t.ndim >= 3 for t in [x, u, y, v]]), "Wrong number of dimensions."
    assert (
        x.size(0) == u.size(0) == y.size(0) == v.size(0)
    ), "Inconsistent number of observations."

    # get dimensions and sizes
    x_dim, x_size = x.size(1), x.size()[2:]
    u_dim, u_size = u.size(1), u.size()[2:]
    y_dim, y_size = y.size(1), y.size()[2:]
    v_dim, v_size = v.size(1), v.size()[2:]

    assert x_size == u_size, "Inconsistent number of sensors."
    assert y_size == v_size, "Inconsistent number of evaluations."

    super().__init__()

    self.x = x
    self.u = u
    self.y = y
    self.v = v

    # used to initialize architectures
    self.shapes = OperatorShapes(
        x=TensorShape(dim=x_dim, size=x_size),
        u=TensorShape(dim=u_dim, size=u_size),
        y=TensorShape(dim=y_dim, size=y_size),
        v=TensorShape(dim=v_dim, size=v_size),
    )

    self.transform = {
        dim: tf
        for dim, tf in [
            ("x", x_transform),
            ("u", u_transform),
            ("y", y_transform),
            ("v", v_transform),
        ]
        if tf is not None
    }
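
For illustration, a minimal sketch of constructing a dataset from random tensors. The shapes follow the convention above; importing OperatorDataset directly from continuiti.data is an assumption based on this page's module name.

import torch

from continuiti.data import OperatorDataset  # import path assumed

# 16 observations, 1D positions, scalar functions, 32 sensors, 24 evaluation points
x = torch.rand(16, 1, 32)  # sensor positions
u = torch.rand(16, 1, 32)  # input functions evaluated at the sensors
y = torch.rand(16, 1, 24)  # evaluation positions
v = torch.rand(16, 1, 24)  # operator outputs at the evaluation positions

dataset = OperatorDataset(x, u, y, v)
print(dataset.shapes.u.dim, dataset.shapes.u.size)  # 1 torch.Size([32])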

__len__()

Return the number of samples.

RETURNS DESCRIPTION
int

Number of samples in the entire set.

Source code in src/continuiti/data/dataset.py
def __len__(self) -> int:
    """Return the number of samples.

    Returns:
        Number of samples in the entire set.
    """
    return self.x.size(0)

__getitem__(idx)

Retrieves the input-output pair at the specified index and applies transformations.

PARAMETER DESCRIPTION
idx

The index of the sample to retrieve.

TYPE: int

RETURNS DESCRIPTION
Tuple[Tensor, Tensor, Tensor, Tensor]

A tuple containing the three input tensors and the output tensor for the given index.

Source code in src/continuiti/data/dataset.py
def __getitem__(
    self,
    idx: int,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Retrieves the input-output pair at the specified index and applies transformations.

    Parameters:
        idx: The index of the sample to retrieve.

    Returns:
        A tuple containing the three input tensors and the output tensor for the given index.
    """
    return self._apply_transformations(
        self.x[idx], self.u[idx], self.y[idx], self.v[idx]
    )
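
Continuing the sketch above, indexing the dataset yields the per-observation (x, u, y, v) tuple with any configured transformations applied:

x_i, u_i, y_i, v_i = dataset[0]
print(len(dataset))  # 16
print(x_i.shape, v_i.shape)  # torch.Size([1, 32]) torch.Size([1, 24])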

split(dataset, split=0.5, seed=None)

Split data set into two parts.

PARAMETER DESCRIPTION
dataset

Data set to split.

split

Split fraction.

DEFAULT: 0.5

seed

Random seed for reproducible splits.

DEFAULT: None

Source code in src/continuiti/data/utility.py
def split(dataset, split=0.5, seed=None):
    """
    Split data set into two parts.

    Args:
        split: Split fraction.
    """
    assert 0 < split < 1, "Split fraction must be between 0 and 1."

    generator = torch.Generator()
    if seed is not None:
        generator.manual_seed(seed)

    size = len(dataset)
    split = int(size * split)

    return torch.utils.data.random_split(
        dataset,
        [split, size - split],
        generator=generator,
    )
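
A usage sketch, splitting the dataset from the sketch above into a training and a test part; importing split from continuiti.data is an assumption.

from continuiti.data import split  # import path assumed

train_dataset, test_dataset = split(dataset, split=0.75, seed=0)
print(len(train_dataset), len(test_dataset))  # 12 4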

dataset_loss(dataset, operator, loss_fn=None, device=None, batch_size=32)

Evaluate operator performance on data set.

PARAMETER DESCRIPTION
dataset

Data set.

operator

Operator.

loss_fn

Loss function. Default is MSELoss.

TYPE: Optional[Loss] DEFAULT: None

device

Device to evaluate on. Default is CPU.

TYPE: Optional[device] DEFAULT: None

batch_size

Batch size. Default is 32.

TYPE: int DEFAULT: 32

Source code in src/continuiti/data/utility.py
def dataset_loss(
    dataset,
    operator,
    loss_fn: Optional[Loss] = None,
    device: Optional[torch.device] = None,
    batch_size: int = 32,
):
    """Evaluate operator performance on data set.

    Args:
        dataset: Data set.
        operator: Operator.
        loss_fn: Loss function. Default is MSELoss.
        device: Device to evaluate on. Default is CPU.
        batch_size: Batch size. Default is 32.
    """
    loss_fn = loss_fn or MSELoss()
    device = device or torch.device("cpu")

    # Move operator to device
    operator.to(device)

    dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size)

    loss = 0
    for x, u, y, v in dataloader:
        x, u, y, v = x.to(device), u.to(device), y.to(device), v.to(device)
        loss += loss_fn(operator, x, u, y, v).item()

    return loss / len(dataloader)
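
A usage sketch with a toy stand-in operator. It assumes that the default MSELoss calls operator(x, u, y) and compares the result against v, and that dataset_loss is importable from continuiti.data; in practice, any (trained) continuiti operator would take the place of ZeroOperator.

import torch

from continuiti.data import dataset_loss  # import path assumed

class ZeroOperator(torch.nn.Module):
    """Toy operator that predicts zeros everywhere (illustration only)."""

    def forward(self, x, u, y):
        # v_dim = 1 to match the sketch dataset above
        return torch.zeros(y.size(0), 1, *y.size()[2:])

test_loss = dataset_loss(test_dataset, ZeroOperator())  # MSELoss on CPU, batch size 32
print(test_loss)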
