Cryptoasset Trading¶
The problem at hand is to train a model that reliably identifies arbitrage transactions for cryptoassets within a network of constant function market makers (CFMMs). That is, given a private valuation $p_i$ of the $i$-th cryptoasset, the task can be expressed analytically as
$$ \min_{(x,y)} - U(x,y)\ \ \text{s.t.} \ \ (x^j,y^j) \ \ \text{forms a valid trade within the $j$-th CFMM with reserves $r^j$},$$
where $r,p,x,y\in\mathbb{R}^n$ and the utility is
$$ U(x,y) = \sum_{j} \left< A^jp, A^j(y^j - x^j)\right>,$$
where $A^j$ maps $x^j$ and $y^j$ from global coordinates to local coordinates. (Note local coordinates may lie in a lower-dimensional space since not all CFMMs utilize all cryptoassets.)
As a simplified model of real-world phenomena, we introduce noise (e.g. due to front running) into the reserves $r$ to obtain noisy observable data $d$. This makes the problem inconsistent (below we note a simple mechanism for correcting for this in our toy problem). To account for noise, a "cost of risk" is learned and incorporated in a data-driven utility $U_\Theta$. This utility is optimized over a collection of training data to identify the optimal arbitrage trades. That is, we define an implicit model
$$ \mathcal{N}_\Theta(d) \triangleq (x_\Theta, y_\Theta) \triangleq \text{argmin}_{(x,y)} -U_\Theta(x,y) \ \ \text{s.t.} \ \ (x^j,y^j) \ \ \text{forms a valid trade within the $j$-th CFMM with reserves $d^j$}.$$
The weights $\Theta$ are tuned to solve the training problem
$$ \min_{\Theta} \mathbb{E}_{d\sim\mathcal{D}} \left[ - U_\Theta(x_\Theta, \tilde{y}_\Theta) \right],$$
where
$$ \tilde{y}_\Theta^j = y_\Theta^j - \tau_j p$$
and $\tau_j$ is chosen so $(x^j_\Theta, \tilde{y}^j_\Theta)$ forms a valid trade with the $j$-th CFMM that has reserves $r^j$. When $r^j = d^j$ and $U_\Theta = U$, we have $y_\Theta^j = \tilde{y}_\Theta^j$. However, the presence of noise forces these to differ.
Below we overview the mathematics for the optimization algorithm. Then we define a PyTorch network that uses Davis-Yin Splitting (DYS) to compute inferences. Training is conducted using Jacobian-Free Backprop (JFB), which maintains a low memory footprint by only backpropagating through the final DYS iteration.
This notebook accompanies the preprint Explainable AI via Learning to Optimize. In particular, see the appendices for more mathematical details about computations in this notebook.
We emphasize the code herein is meant to be illustrative and, thus, has not been optimized for speed.
import torch

# Fall back to CPU when no GPU is available
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
Levels of Abstraction in Math Formulation¶
We begin at the highest level with the arbitrage problem
$$ \min_{(x,y)} \sum_{j=1}^m \left< A^jp, A^j(x^j - y^j)\right>$$
subject to the constraints
$$ (x^j, y^j) = \text{valid transaction with $j$-th CFMM with data $d^j$.}$$
Here we assume each CFMM uses either a weighted arithmetic mean or a weighted geometric mean for its constant function. See our preprint for further details on the constraints, their decoupling, and derivations of the proximal mappings for these. In our setting, using the parameterization $\xi=(v,x,y,z)$, we use the functions
$$ f(\xi;d) \triangleq \delta_{\geq0}(x,y) + \delta_{\mathcal{M}}(v,z;d),$$
$$ g(\xi;d) \triangleq \delta_{\mathcal{R}}(v,x,y) + \delta_{\mathcal{H}}(z;d),$$
$$ h(\xi;d) \triangleq -U_\Theta(x,y; d).$$
The referenced indicator functions are given by
$$ \delta_{\mathcal{M}}(v,z;d) \triangleq \sum_{j\in\mathcal{I}_1} \delta_{\mathcal{P}_j}(v^j, z^j;d) + \sum_{j\in\mathcal{I}_2} \delta_{\mathcal{A}_j}(v^j;d),$$
$$ \delta_{\mathcal{H}}(z;d) \triangleq \sum_{j\in\mathcal{I}_1} \delta_{\mathcal{H}_j} (z^j;d),$$
$$ \delta_{\mathcal{R}}(v,x,y) \triangleq \sum_{j=1}^m \delta_{\mathcal{R}^j}(v^j,x^j,y^j),$$
where $\mathcal{I}_1$ is the set of all CFMM indices with weighted geometric means, $\mathcal{I}_2 = [m] \setminus \mathcal{I}_1$ is the remaining set of indices for weighted arithmetic mean CFMMs, and
$$ \mathcal{P}_j \triangleq \{(v^j,z^j) : z^j \leq \ln(v^j + d^j) \},$$
$$ \mathcal{A}_j \triangleq \{v^j : v^j+d^j \geq 0,\ \left<w^j,v^j\right> \geq \delta_j \left<w^j,d^j\right>\},$$
$$ \mathcal{H}_j \triangleq \{z^j : \left<w^j,z^j\right> = \ln(1+\delta_j) + \left<w^j, \ln(d^j)\right>\},$$
$$ \mathcal{R}_j \triangleq \{(v^j,x^j,y^j) : v^j = \Gamma^j A^j x^j - A^j y^j \}.$$
Above $\Gamma^j = \text{diag}(\gamma_j,\ldots,\gamma_j) \in \mathbb{R}^{n_j\times n_j}$ gives the trade fees within a CFMM and $A^j \in \mathbb{R}^{n_j\times n}$ is a binary matrix that maps global cryptoasset indices into the local coordinates of each CFMM.
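To make the coordinate maps concrete, here is a small illustrative sketch (made-up sizes, not from the preprint) of $A^j$ and $\Gamma^j$ for a CFMM trading global tokens 0 and 2 out of $n = 3$:
# Illustrative sketch: a CFMM holding global tokens 0 and 2 (n_j = 2, n = 3)
Aj = torch.tensor([[1.0, 0.0, 0.0],   # local coordinate 0 <- global token 0
                   [0.0, 0.0, 1.0]])  # local coordinate 1 <- global token 2
Gamma_j = 0.98 * torch.eye(2)         # trade fee gamma_j = 0.98 in local coords
x_global = torch.tensor([[1.5], [2.0], [0.5]])
print(torch.mm(Aj, x_global))         # local view of x: [[1.5], [0.5]]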
Below we provide a function for applying DYS to batched tensors (one column per sample).
Davis-Yin Splitting¶
From the above parameterizations, the arbitrage problem above is equivalent to the minimization problem
$$ \min_{\xi} f(\xi) + g(\xi) + h(\xi), $$
where $f$ and $g$ are proximable and $h$ is $L$-Lipschitz differentiable. Davis-Yin Splitting is applied with step size $\alpha \in (0,2/L)$ via the iteration
$$ \xi^{k+1} = \text{prox}_{\alpha f}(\zeta^k)$$ $$ \psi^{k+1} = \text{prox}_{\alpha g}(2\xi^{k+1}-\zeta^k - \alpha \nabla h(\xi^{k+1}))$$ $$ \zeta^{k+1} = \zeta^k + \psi^{k+1}-\xi^{k+1}$$
Here
$$ \lim_{k\rightarrow\infty} \xi^k = \xi^\star \in \text{argmin}_{\xi} f(\xi) + g(\xi) + h(\xi).$$
def apply_DYS(prox_f, prox_g, grad_h, zeta_init, alpha,
              fxd_pt_tol=1.0e-12, max_iters=int(1.0e3), verbose=False):
''' Apply Davis-Yin Splitting to minimize f + g + h.
Convergence is measured by fixed point residual.
Preconditions:
- Step size is positive
- Stopping tolerance is positive
- Initialization is a tensor
Postconditions:
- xi and zeta shapes match
Note: We are unable to directly verify whether alpha < 2 / L, with L the
Lipschitz constant of h.
'''
assert alpha > 0
assert fxd_pt_tol > 0
assert torch.is_tensor(zeta_init)
zeta = zeta_init.clone()
converge = False
iter = 0
while not converge and iter < max_iters:
xi = prox_f(zeta, alpha)
psi = prox_g(2.0 * xi - zeta - alpha * grad_h(xi), alpha)
zeta = zeta + psi - xi
fix_pt_res = torch.max(torch.norm(psi - xi, dim=1))
ref_norm = torch.mean(torch.norm(zeta, dim=1))
        converge = fix_pt_res <= fxd_pt_tol * ref_norm
iter += 1
if verbose and iter == max_iters:
print('DYS reached max iteration count.')
valid_shape = xi.shape[0] == zeta_init.shape[0]
valid_shape = xi.shape[1] == zeta_init.shape[1] and valid_shape
assert valid_shape
return prox_f(zeta, alpha), zeta
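As a quick usage sketch (an assumed toy problem, separate from the tests below), consider minimizing $|\xi| + \tfrac{1}{2}(\xi - 2)^2$ over $\xi \geq 0$, whose solution is $\xi = 1$:
# Toy usage sketch: f = indicator of {xi >= 0}, g = |xi|, h = (xi - 2)^2 / 2.
# Here grad h is 1-Lipschitz, so any alpha in (0, 2) is a valid step size.
prox_f_demo = lambda xi, a: torch.clamp(xi, min=0.0)
prox_g_demo = lambda xi, a: torch.sign(xi) * torch.clamp(torch.abs(xi) - a, min=0.0)
grad_h_demo = lambda xi: xi - 2.0
xi_demo, _ = apply_DYS(prox_f_demo, prox_g_demo, grad_h_demo,
                       torch.zeros(1, 1), alpha=0.9)
print(xi_demo)  # expect approximately 1.0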
Cryptoasset Trade Utility¶
The differentiable function $h$ gives the trade utility, i.e.
$$h(\xi;d) \triangleq -U_\Theta(x,y;d) = \underbrace{\sum_{j=1}^m \left< A^j p, A^j(x^j-y^j) \right>}_{\text{net value lost}} + \underbrace{\dfrac{\psi_\Theta(d)}{2}\sum_{j=1}^m \left\|W^j A^j (x^j - y^j) \right\|^2}_{\text{risk term}}, $$
Here the learned scaling $\psi_\Theta(d)$ is the output of a small network whose final activation is the sigmoid
$$ \sigma(z) = \dfrac{1}{1+\exp(-z)}, \ \ \text{with} \ \ \sigma'(z) = \sigma(z)\cdot(1-\sigma(z)),$$
so $\psi_\Theta(d)\in(0,1)$ is bounded and $h(\cdot\,;d)$ is Lipschitz differentiable in $(x,y)$,
where $\Theta$ are weights consisting of $W$ and parameters for $\psi_\Theta:\mathbb{R}^n\rightarrow \mathbb{R}$. The function $\psi_\Theta$ is continuous, but is not (necessarily) convex in $d$. The oracle for utility takes the form of a gradient. To this end, set
$$ W \triangleq \text{diag}(W^1,\ldots, W^m), \ \ \ A \triangleq \text{diag}(A^1,\ldots,A^m),$$
and
$$ S \triangleq \left[\begin{array}{c} \mathrm{I} \\ \vdots \\ \mathrm{I} \end{array}\right] \in \mathbb{R}^{mn\times n}.$$
Then, written in full,
$$ -U(x,y) = p^\top S^\top A^\top A (x-y) + \dfrac{\psi_\Theta(d)}{2} \| W A(x-y)\|^2,$$
which implies
$$ -\nabla_{x} U(x,y) = A^\top ASp + \psi_\Theta(d) A^\top W^\top W A(x-y).$$
Noting the flipped signs between $x$ and $y$, we see
$$ \nabla_{y} U(x,y) = - \nabla_{x} U(x,y).$$
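As a sanity check of these formulas, the following sketch (random single-block data, illustrative only) compares the hand-derived gradient against autograd:
# Compare the closed-form gradient with autograd on one CFMM block
torch.manual_seed(0)
q_chk, n_chk = 4, 6   # local and global coordinate sizes (made-up)
A_chk = torch.bernoulli(0.5 * torch.ones(q_chk, n_chk))
W_chk = torch.rand(q_chk, q_chk)
p_chk = torch.rand(n_chk, 1)
psi_chk = 0.7
x_chk = torch.rand(n_chk, 1, requires_grad=True)
y_chk = torch.rand(n_chk, 1)
Axy = torch.mm(A_chk, x_chk - y_chk)
neg_U = torch.mm(torch.mm(A_chk, p_chk).permute(1,0), Axy) \
        + 0.5 * psi_chk * torch.sum(torch.mm(W_chk, Axy) ** 2)
neg_U.backward()
WA_chk = torch.mm(W_chk, A_chk)
grad_closed = torch.mm(A_chk.permute(1,0), torch.mm(A_chk, p_chk)) \
    + psi_chk * torch.mm(torch.mm(WA_chk.permute(1,0), WA_chk),
                         x_chk.detach() - y_chk)
print(torch.norm(x_chk.grad - grad_closed))  # expect ~0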
import torch.nn as nn
activation = nn.Sigmoid()
leaky_relu = nn.LeakyReLU(0.01)
def grad_neg_U(x, y, p, W, A, psi, num_cfmms):
''' Compute gradient of -U(x,y) with respect to x.
Preconditions:
- Each input is a tensor (other than 'num_cfmms')
- Tensor shapes match
Postconditions:
- Output grad shape matches x shape
Note: The gradient with respect to y is obtained by flipping signs
'''
for val in [x, y, p, W, A]:
assert torch.is_tensor(val)
assert x.shape[0] == num_cfmms * p.shape[0]
assert x.shape[0] == y.shape[0]
assert x.shape[0] == A.shape[1]
assert W.shape[0] == A.shape[0]
assert W.shape[0] == W.shape[1]
AA = torch.mm(A.permute(1,0), A).to(device)
WA = torch.mm(W, A).to(device)
AWWA = torch.mm(WA.permute(1,0), WA).to(device)
Sp = torch.cat([p for _ in range(num_cfmms)], dim=0).to(device)
cost_term = torch.mm(AA, Sp)
risk_term = torch.mm(AWWA, x - y) * psi
grad = cost_term + risk_term
return grad.to(device)
def grad_h(xi, p, W, A, psi, num_cfmms):
''' Compute gradient of h(xi) = - U(x,y) where xi = (v, x, y, z).
Precondition:
- Inputs are tensors and a scalar
Postcondition:
- Output grad shape matches input xi shape
Note: The x and y components of gradient are negatives of each other.
The indexing of x and y blocks in the tuple xi is done using
dummy variables 'q' and 'm'. Also, the v and z blocks of the
gradient are automatically zero since there is no dependence in h
on these blocks.
'''
for val in [xi, p, W, A]:
assert torch.is_tensor(val)
q = A.shape[0]
m = A.shape[1]
grad = xi.clone()
grad[0:q, :] = torch.zeros(grad[0:q, :].shape)
grad[-q:, :] = torch.zeros(grad[0:q, :].shape)
grad[q:q+m, :] = grad_neg_U(xi[q:q+m, :], xi[q+m:q+2*m, :],p, W, A,
psi, num_cfmms)
grad[q+m:q+2*m, :] = - grad[q:q+m, :]
valid_shape = grad.shape[0] == xi.shape[0] and grad.shape[1] == xi.shape[1]
assert valid_shape
return grad
Geometric Mean Constraints¶
Next we look at blockwise-projections onto the hyperplanes
$$ \mathcal{H}_j = \left\lbrace z^j : \left<w^j,z^j\right> = \ln(1+\delta_j) + \left<w^j, \ln(d^j)\right>\right\rbrace, \ \ \ \text{for all $j\in[m]$.}$$
The formula for the $j$-th hyperplane projection is
$$ P_{\mathcal{H}_j}(z^j) = z^j - \dfrac{\left<w^j,z^j\right>-b(d^j)}{\|w^j\|^2} w^j,$$
where
$$ b(d^j) = \ln(1+\delta_j) + \left<w^j, \ln(d^j)\right>.$$
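A quick numerical check of this formula (illustrative values):
# Verify <w, P(z)> = b for a single hyperplane (made-up w, d, and delta = 0.02)
w_j = torch.tensor([[0.3], [0.7]])
d_j = torch.tensor([[5.0], [8.0]])
b_j = torch.log(torch.tensor(1.02)) + torch.mm(w_j.permute(1,0), torch.log(d_j))
z_j = torch.randn(2, 1)
P_z = z_j - (torch.mm(w_j.permute(1,0), z_j) - b_j) / (torch.norm(w_j) ** 2) * w_j
print(torch.mm(w_j.permute(1,0), P_z) - b_j)  # expect ~0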
Generalizing to all blocks gives the constraint
$$ \mathcal{H} = \mathcal{H}_1 \times \mathcal{H}_2 \times \cdots \times \mathcal{H}_m .$$
The projection on $\mathcal{H}$ is given by
$$ P_\mathcal{H}(z) = z - N(z) w, $$
where
$$ N(z) \triangleq \text{diag}\left( \mu_1(z) \mathrm{I}, \ldots, \mu_m(z) \mathrm{I} \right),$$
$$\delta_{jg} \triangleq \begin{cases} 1 & \text{if the $j$-th constraint is geometric}, \\ 0 & \text{otherwise,}\end{cases}$$
and
$$ \mu(z) \triangleq \text{diag}\left(\delta_{1g}\|w^1\|^{-2},\ldots,\delta_{mg}\|w^m\|^{-2}\right)\left[\text{diag}\left( (w^1)^\top,\ldots, (w^m)^\top \right) \cdot (z-\ln(d)) - \ln(1+\delta)\right]. $$
Note the inclusion of $\delta_{jg}$ means $\mathcal{H}$ is not actually a product of just the $\mathcal{H}_j$ sets, since some factors are simply all of $\mathbb{R}^{n_j}$.
def prox_H(z, w, d, delta, geo_indices, cfmm_type):
''' Project z onto Cartesian product of hyperplanes.
Inputs:
z - vector to project, z = (z1 z2 ... zj ... zm)
w - vector for scalar product, w = (w1 w2 ... wj ... wm)
d - data vector, d = (d1 d2 ... dj ... dm)
delta - tolerances for projection
geo_indices - list of indices for each CFMM
cfmm_type - list of booleans identifying which CFMMs are geometric
Output:
prox - projection of z onto H, the product of hyperplanes
Preconditions:
- All inputs are tensors or lists of tensors
        - Tensors w, z, and d have the same number of rows
Postconditions:
- Output prox has same shape as input z
'''
assert torch.is_tensor(z)
assert torch.is_tensor(w)
assert torch.is_tensor(d)
assert torch.is_tensor(delta)
assert isinstance(geo_indices, list)
assert isinstance(cfmm_type, list)
assert all([torch.is_tensor(entry) for _, entry in enumerate(cfmm_type)])
    assert z.shape[0] == w.shape[0] == d.shape[0]
zd = z - torch.log(d)
wzd = torch.cat([torch.mm(w[idx, :].permute(1,0), zd[idx, :])
for idx in geo_indices], dim=0)
wzd_d = wzd - torch.log(1.0 + delta)
w_inv = torch.cat([cfmm_type[j] * (torch.norm(w[idx, :]) ** - 2.0)
for j, idx in enumerate(geo_indices)], dim=0)
mu = torch.mm(torch.diag(w_inv), wzd_d)
Nw = torch.cat([mu[j] * w[idx, :] for j, idx in enumerate(geo_indices)],
dim=0)
prox = z - Nw
    assert z.shape == prox.shape
return prox
Halfspace Constraints¶
Next we look at projections on the set
$$ \mathcal{A}_j = \left\lbrace v : \left<w^j, v\right> \geq \delta_j \left<w^j, d^j\right> \right\rbrace.$$
The formula is
$$ P_{\mathcal{A}_j}(v) = v - \dfrac{\left[\left<w^j,v\right> - \delta_j\left<w^j, d^j\right>\right]_-}{\|w^j\|^2} w^j.$$
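And analogously for the halfspace projection (illustrative values):
# After projecting, <w, P(v)> >= delta * <w, d> holds (made-up values)
w_j = torch.tensor([[0.5], [0.5]])
d_j = torch.tensor([[4.0], [6.0]])
b_j = 0.03 * torch.mm(w_j.permute(1,0), d_j)   # delta_j = 0.03
v_j = torch.randn(2, 1)
neg_part = torch.clamp(torch.mm(w_j.permute(1,0), v_j) - b_j, max=0.0)
P_v = v_j - neg_part / (torch.norm(w_j) ** 2) * w_j
print(torch.mm(w_j.permute(1,0), P_v) - b_j)  # expect >= 0 (up to round-off)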
Generalizing to all blocks gives the constraint
$$ \mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2 \times \cdots \times \mathcal{A}_m .$$
The projection on $\mathcal{A}$ is given by
$$ P_{\mathcal{A}}(v) = v - N(v) w, $$
where
$$ N(v) \triangleq \text{diag}\left( \mu_1(v) \mathrm{I}, \ldots, \mu_m(v) \mathrm{I} \right),$$
$$\delta_{ja} \triangleq \begin{cases} 1 & \text{if the $j$-th constraint is arithmetic}, \\ 0 & \text{otherwise,}\end{cases}$$
and
$$ \mu(v) \triangleq \left[ \text{diag}\left(\delta_{1a}\|w^1\|^{-2},\ldots,\delta_{ma}\|w^m\|^{-2}\right) \cdot\text{diag}\left( (w^1)^\top,\ldots, (w^m)^\top \right) \cdot (v-\delta d)\right]_-. $$
Note the inclusion of $\delta_{ja}$ means $\mathcal{A}$ is not actually a product of just the $\mathcal{A}_j$ sets, since some factors are simply all of $\mathbb{R}^{n_j}$.
def prox_A(v, w, d, delta, geo_indices, cfmm_type):
''' Project v onto Cartesian product of halfspaces.
Inputs:
v - vector to project, v = (v1 v2 ... vj ... vm)
w - vector for scalar product, w = (w1 w2 ... wj ... wm)
d - data vector, d = (d1 d2 ... dj ... dm)
delta - tolerances for projection
geo_indices - list of indices for each CFMM
cfmm_type - list of booleans identifying which CFMMs are geometric
Output:
prox - projection of v onto A, the product of halfspaces
Note:
Here cfmm_type must be negated (i.e. use 1 - type) to get delta_{ja}
        since we are interested in arithmetic CFMM constraints, *not*
geometric CFMM constraints.
'''
assert torch.is_tensor(v)
assert torch.is_tensor(w)
assert torch.is_tensor(d)
assert torch.is_tensor(delta)
assert isinstance(geo_indices, list)
assert isinstance(cfmm_type, list)
assert all([torch.is_tensor(entry) for _, entry in enumerate(cfmm_type)])
    # Expand the per-CFMM tolerances delta blockwise to match the rows of d
    dd = torch.cat([delta[j] * d[idx, :] for j, idx in enumerate(geo_indices)],
                   dim=0).to(device)
    vd = v - dd
wvd = torch.cat([torch.mm(w[idx, :].permute(1,0), vd[idx, :])
for idx in geo_indices], dim=0).to(device)
w_inv = torch.cat([(1.0 - cfmm_type[j]) * (torch.norm(w[idx, :]) ** - 2.0)
for j, idx in enumerate(geo_indices)],
dim=0).to(device)
mu = torch.clamp(torch.mm(torch.diag(w_inv), wvd), max=0.0).to(device)
Nw = torch.cat([mu[j] * w[idx, :] for j, idx in enumerate(geo_indices)],
dim=0).to(device)
prox = v - Nw
return prox
Linear System Constraints¶
Next we consider the linear system $$ \mathcal{R}_j \triangleq \{(v^j,x^j,y^j) : v^j = \gamma_j A^j x^j - A^jy^j\}, \ \ \text{for all $j\in[m]$.}$$
Generalizing to all blocks gives the constraint
$$ \mathcal{R} \triangleq \mathcal{R_1}\times\mathcal{R_2}\times\cdots\times \mathcal{R_m} = \{ (v,x,y) : v = \Gamma Ax - Ay \},$$
where
$$ A \triangleq \text{diag}(A^1,\ldots, A^m) \ \ \text{and} \ \ \Gamma \triangleq \text{diag}(\gamma_1 \mathrm{I}, \ldots, \gamma_m \mathrm{I}). $$
Setting
$$ N \triangleq [\Gamma A \ \ -A] \ \ \text{and} \ \ M \triangleq (\mathrm{I}+N^\top N)^{-1}$$
yields
$$ \left[P_{R}(v,x,y)\right]_{(x,y)} = M\left[\begin{array}{c} x+A^\top \Gamma v \\ y - A^\top v\end{array}\right], $$
from which the $v$ component of the projection is obtained directly by matrix multiplication.
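For reference, this is the classical least-squares projection onto the graph of $N$. Writing $u = (x,y)$ and letting $(v_0, u_0)$ be the point to project,
$$ \min_{u} \ \|u - u_0\|^2 + \|Nu - v_0\|^2 \quad\Longrightarrow\quad (\mathrm{I} + N^\top N)\, u = u_0 + N^\top v_0,$$
so $u = M(u_0 + N^\top v_0)$, and expanding $N^\top v_0 = (A^\top \Gamma v_0,\ -A^\top v_0)$ recovers the displayed formula.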
def prox_R(v, x, y, A, Gamma):
    ''' Project (v, x, y) onto the constraint set v = Gamma * A * x - A * y.
    '''
    for val in [v, x, y, A, Gamma]:
        assert torch.is_tensor(val)
    assert v.shape[0] == A.shape[0]
    assert x.shape[0] == y.shape[0] == A.shape[1]
    N = torch.cat((torch.mm(Gamma, A), -A), dim=1).to(device)
    NN = torch.mm(N.permute(1,0), N).to(device)
    I = torch.eye(NN.shape[0]).to(device)
    M = torch.linalg.inv(I + NN).to(device)
AGv = torch.mm(A.permute(1,0), torch.mm(Gamma, v)).to(device)
Av = torch.mm(A.permute(1,0), v).to(device)
xy = torch.cat((x + AGv, y - Av), dim=0).to(device)
prox_xy = torch.mm(M, xy).to(device)
prox_x = prox_xy[:x.shape[0], :]
prox_y = prox_xy[x.shape[0]:, :]
GAx = torch.mm(Gamma, torch.mm(A, prox_x)).to(device)
Ay = torch.mm(A, prox_y).to(device)
prox_v = GAx - Ay
return prox_v, prox_x, prox_y
Logarithmic Constraints¶
Lastly, we consider the set
$$ \mathcal{P} = \{(v,z) : \ln(v+d) \geq z \}.$$
The projection $(\overline{v},\overline{z})$ of $(v,z)$ onto $\mathcal{P}$ is the identity on feasible points; otherwise it lies on the boundary $\overline{z} = \ln(\overline{v}+d)$, where the first-order optimality condition of the boundary least-squares problem gives
$$ 0 = (\overline{v} + d) \odot (\overline{v} - {v}) + \ln(\overline{v}+d) - z.$$
Since the relations are element-wise and separable, we compute the projection $(\overline{v}, \overline{z})$ using Newton iteration. Treating the right-hand side as $f(\hat{v})$ and differentiating $f:\mathbb{R}\rightarrow\mathbb{R}$ gives
$$ f'(\hat{v}) = 2\hat{v} + (d-v) + \dfrac{1}{\hat{v}+d}.$$
Then Newton iteration takes the form
$$ \hat{v}^{k+1} = \hat{v}^k - \dfrac{f(\hat{v}^k)}{ f'(\hat{v}^k)}.$$
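A scalar sketch of this Newton solve (illustrative values, using Python's math module rather than torch):
import math

# Project (v, z) = (0.0, 1.0) with d = 1.0 onto { ln(v + d) >= z };
# the point violates the constraint since ln(1) = 0 < 1
v0, z0, d0 = 0.0, 1.0, 1.0
vk = max(v0, -d0 + 0.5 * abs(d0))   # initialize safely inside v + d > 0
for _ in range(20):
    fk = (vk + d0) * (vk - v0) + math.log(vk + d0) - z0
    fpk = 2.0 * vk + (d0 - v0) + 1.0 / (vk + d0)
    vk = vk - fk / fpk
print(vk, math.log(vk + d0))        # the projected pair (v_bar, z_bar)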
def prox_P(v, z, d, newton_steps=10):
    ''' Project (v, z) onto constraint set { (v,z) : ln(v + d) >= z }.
    Only the entries that violate the constraint are updated; feasible
    entries are returned unchanged.
    '''
    assert torch.is_tensor(v)
    assert torch.is_tensor(z)
    assert torch.is_tensor(d)
    with torch.no_grad():
        # Entries violating ln(v + d) >= z, i.e. v + d < exp(z)
        indices = v + d < torch.exp(z)
vv = v[indices]
dd = d[indices]
pv = torch.max(vv, -dd + 0.5 * torch.abs(dd))
zz = z[indices]
for _ in range(newton_steps):
fv = (pv + dd) * (pv - vv) + torch.log(pv + dd) - zz
fpv = 2 * pv + (dd - vv) + ((pv + dd) ** -1.0)
pv = torch.max(pv - fv / fpv, -dd + 1.0e-6 * torch.abs(dd))
prox_v = v.clone()
prox_z = z.clone()
prox_v[indices] = pv.clone()
prox_z[indices] = torch.log(pv + dd)
return prox_v, prox_z
Projection onto sum of constraints¶
Next we consider the set $\mathcal{M}$, which is most easily expressed by using the sum of indicator functions, i.e.
$$\delta_{\mathcal{M}}(\xi) \triangleq \sum_{j\in\mathcal{I}_1} \delta_{\mathcal{P}_j}(v^j, z^j) + \sum_{j\in\mathcal{I}_2} \delta_{\mathcal{A}_j}(v^j).$$
From here on we use $\xi = (v,x,y,z)$. For implementation, $\xi$ is a single stacked tensor whose rows are ordered as the blocks $(v, x, y, z)$, with one column per sample in the batch.
def prox_M(xi, d, w, delta, geo_indices, cfmm_types):
    ''' Project xi = (v, x, y, z) onto M by applying prox_P (logarithmic
        constraints) and then prox_A (halfspace constraints) to the
        relevant blocks.
        Note: relies on the globally defined block-diagonal matrix A for
        the block sizes.
    '''
    assert torch.is_tensor(xi)
    assert torch.is_tensor(d)
    assert torch.is_tensor(w)
    q = A.shape[0]
    m = A.shape[1]
v_idx = range(q)
x_idx = range(q, q+m)
y_idx = range(q+m,q+2*m)
z_idx = range(q+2*m, xi.shape[0])
prox = xi.clone()
prox_v, prox_z = prox_P(xi[v_idx, :], xi[z_idx, :], d)
prox_v = prox_A(prox_v, w, d, delta, geo_indices, cfmm_types)
prox[v_idx, :] = prox_v
prox[z_idx, :] = prox_z
return prox
Proximal of $f$¶
Moving to the highest level of abstraction, we obtain
$$f(\xi) \triangleq \delta_{\geq0}(x,y) + \delta_{\mathcal{M}}(v,z).$$
Because the two terms of $f$ depend on different components of $\xi$, we may apply the proximals sequentially.
def prox_f_model(xi, d, w, delta, geo_indices, cfmm_types):
    ''' Compute the proximal of f at xi = (v, x, y, z).
    Note: We first project onto M, then threshold the x and y values;
    this is valid since the two terms of f act on disjoint blocks.
    '''
q = A.shape[0]
m = A.shape[1]
v_idx = range(q)
x_idx = range(q, q+m)
y_idx = range(q+m,q+2*m)
z_idx = range(q+2*m, xi.shape[0])
prox = prox_M(xi, d, w, delta, geo_indices, cfmm_types)
prox[x_idx, :] = torch.clamp(prox[x_idx, :], min=0.0).to(device)
prox[y_idx, :] = torch.clamp(prox[y_idx, :], min=0.0).to(device)
return prox
Proximal of $g$¶
The final function to compute proximals for is
$$g(\xi) \triangleq \delta_{\mathcal{R}}(v,x,y) + \delta_{\mathcal{H}}(z).$$
Since the components used by each of the two terms do not overlap, we can apply the proximal for each successively.
def prox_g_model(xi, d, w, delta, geo_indices, cfmm_types, A, Gamma):
    ''' Compute the proximal of g at xi = (v, x, y, z) by projecting
        (v, x, y) onto R and z onto H.
    '''
q = A.shape[0]
m = A.shape[1]
v_idx = range(q)
x_idx = range(q, q+m)
y_idx = range(q+m,q+2*m)
z_idx = range(q+2*m, xi.shape[0])
prox = torch.zeros(xi.shape).to(device)
v, x, y = prox_R(xi[v_idx, :], xi[x_idx, :], xi[y_idx, :], A, Gamma)
prox[v_idx, :] = v
prox[x_idx, :] = x
prox[y_idx, :] = y
prox[z_idx, :] = prox_H(xi[z_idx, :], w, d, delta, geo_indices, cfmm_types)
return prox
PyTests¶
Various tests are used to check aspects of the above functions (mimicking what one usually encodes as pytests in a .py file).
def get_block_diag(mats):
''' Create block diagonal matrix from list of tensors.
'''
assert isinstance(mats, list)
assert all([torch.is_tensor(entry) for _, entry in enumerate(mats)])
mat = mats[0].to(device)
for j, val in enumerate(mats):
if j > 0:
zeros_top = torch.zeros(mat.shape[0], val.shape[1]).to(device)
zeros_bot = torch.zeros(val.shape[0], mat.shape[1]).to(device)
mat_top = torch.cat((mat, zeros_top), dim=1).to(device)
mat_bot = torch.cat((zeros_bot, val.to(device)), dim=1).to(device)
mat = torch.cat((mat_top, mat_bot), dim=0).to(device)
return mat
def get_cfmm_indices(cfmm_sizes):
    ''' Build a list of index ranges, one per CFMM, from the block sizes.
    '''
n_cfmms = len(cfmm_sizes)
idx_lo = 0
cfmm_indices = []
for j in range(n_cfmms):
idx_hi = idx_lo + cfmm_sizes[j]
cfmm_indices.append(range(idx_lo,idx_hi))
idx_lo = idx_hi
n_size = sum(cfmm_sizes)
assert idx_hi == n_size
return cfmm_indices
import random
import numpy as np
torch.manual_seed(42)
random.seed(42)
def test_apply_DYS():
''' Apply DYS to toy problem and compare output to true solution.
Here we consider the 1D problem with
f(xi) = indicator for [1, 10], g(xi) = -6 * x, h(xi) = x^2 + 9.
Combining these gives the overall problem
min (x - 3)^2 s.t. 1 <= x <= 10,
and xi is initialized to 15.
'''
tol_err = 1.0e-3
def prox_f_toy(xi, scalar):
return torch.clamp(xi, min=1.0, max=10.0)
def prox_g_toy(xi, scalar):
return xi + 6.0 * scalar
def grad_h_toy(xi):
return 2.0 * xi
L = 2.0
alpha = 0.5 / L
xi_init = 15.0 * torch.ones(1, 1)
xi_opt = 3.0 * torch.ones(1, 1)
xi_sol, _ = apply_DYS(prox_f_toy, prox_g_toy, grad_h_toy, xi_init, alpha)
err = torch.norm(xi_sol - xi_opt)
assert err <= tol_err
def test_prox_H():
    ''' Check that the projection of a batch of vectors onto the product of
    hyperplanes is feasible.
    The proximal is feasible if the linear equation is satisfied (to
    numerical precision) for CFMMs defined using weighted geometric means.
    For the remaining arithmetic-mean CFMMs, the proximal should act as
    the identity.
    '''
n_batches = 20
n_cfmms = 10
tol_feas = 1.0e-6
cfmm_sizes = [random.randint(2, 10) for _ in range(n_cfmms)]
cfmm_indices = get_cfmm_indices(cfmm_sizes)
n_size = sum(cfmm_sizes)
v = torch.randn(n_size, n_batches)
w = torch.rand(n_size, 1)
d = torch.rand(n_size, 1)
delta = 0.05 * torch.rand(n_cfmms, 1)
cfmm_types = [torch.randint(2, (1,)) for _ in range(n_cfmms)]
prox = prox_H(v, w, d, delta, cfmm_indices, cfmm_types)
feasible = []
for j, idx in enumerate(cfmm_indices):
cfmm_geometric = bool(cfmm_types[j].numpy())
if cfmm_geometric:
wt = w[idx, :].permute(1,0)
b = torch.log(1 + delta[j]) + torch.mm(wt, torch.log(d[idx, :]))
res = torch.mm(wt, prox[idx, :]) - b
rel_res = torch.mean(res / torch.norm(prox[idx, :], dim=0))
feasible.append(bool(rel_res <= tol_feas))
else:
rel_res = torch.mean(torch.norm(prox[idx, :] - v[idx, :], dim=0))
feasible.append(bool(rel_res <= tol_feas))
assert all(feasible)
def test_prox_A():
    ''' Check that the projection of a batch of vectors onto the product of
    halfspaces is feasible.
    The proximal is feasible if the linear inequality is satisfied (to
    numerical precision) for CFMMs defined using weighted arithmetic means.
    For the remaining geometric-mean CFMMs, the proximal should act as
    the identity.
    '''
n_batches = 20
n_cfmms = 10
tol_feas = 1.0e-6
cfmm_sizes = [random.randint(2, 10) for _ in range(n_cfmms)]
cfmm_indices = get_cfmm_indices(cfmm_sizes)
n_size = sum(cfmm_sizes)
v = torch.randn(n_size, n_batches).to(device)
w = torch.rand(n_size, 1).to(device)
d = torch.rand(n_size, 1).to(device)
delta = 0.05 * torch.rand(n_cfmms, 1).to(device)
cfmm_types = [torch.randint(2, (1,)).to(device) for _ in range(n_cfmms)]
prox = prox_A(v, w, d, delta, cfmm_indices, cfmm_types)
feasible = []
for j, idx in enumerate(cfmm_indices):
cfmm_arithmetic = not bool(cfmm_types[j].to('cpu').numpy())
if cfmm_arithmetic:
wt = w[idx, :].permute(1,0)
b = delta[j] * torch.mm(wt, d[idx, :])
res = torch.clamp(torch.mm(wt, prox[idx, :]) - b, max=0.0)
rel_res = torch.mean(res / torch.norm(prox[idx, :], dim=0))
feasible.append(bool(rel_res <= tol_feas))
else:
rel_res = torch.mean(torch.norm(prox[idx, :] - v[idx, :], dim=0))
feasible.append(bool(rel_res <= tol_feas))
assert all(feasible)
def test_prox_R():
    ''' Check if projection onto {v = Gamma * A * x - A * y} is feasible.
This test requires the creation of a few mock CFMMs, and so matrices
A from the global coordinates to the local CFMM coordinates are needed.
First the number of tokens in each CFMM is randomly chosen between 2 and
the global size. Having this, a binary matrix A is constructed using
Bernoulli distributions (violating actual A structure, but close).
'''
tol_feas = 1.0e-6
n_batches = 20
n_cfmms = 10
n_tokens = 10
cfmm_sizes = [random.randint(2, n_tokens) for _ in range(n_cfmms)]
cfmm_indices = get_cfmm_indices(cfmm_sizes)
n_size = sum(cfmm_sizes)
B = [torch.rand(cfmm_sizes[j], n_tokens) for j in range(n_cfmms)]
A = [torch.bernoulli(B[j]).to(device) for j in range(n_cfmms)]
A = get_block_diag(A).to(device)
Gamma = [0.97 * torch.eye(cfmm_sizes[j]) for j in range(n_cfmms)]
Gamma = get_block_diag(Gamma).to(device)
v = torch.randn(n_size, n_batches).to(device)
x = torch.randn(n_cfmms * n_tokens, n_batches).to(device)
y = torch.randn(n_cfmms * n_tokens, n_batches).to(device)
prox_v, prox_x, prox_y = prox_R(v, x, y, A, Gamma)
GAx = torch.mm(Gamma, torch.mm(A, prox_x))
Ay = torch.mm(A,prox_y)
res = torch.norm(prox_v - (GAx - Ay), dim=0)
rel_res = torch.mean(res / torch.norm(prox_v, dim=0))
feasible = bool(rel_res <= tol_feas)
assert feasible
def test_prox_P():
    ''' Check whether the projection satisfies the optimality condition.
    On entries where the constraint was violated, the projection should
    satisfy (to tolerance) 0 = (pv + d)(pv - v) + log(pv + d) - z.
'''
n_batches = 40
n_size = 7
tol_feas = 1.0e-2
v = torch.randn(n_size, n_batches)
z = torch.randn(n_size, n_batches)
d = torch.randn(n_size, n_batches)
prox_v, prox_z = prox_P(v, z, d)
indices = v + d < torch.exp(z)
pv = prox_v[indices]
vv = v[indices]
zz = z[indices]
dd = d[indices]
res = (pv + dd) * (pv - vv) + torch.log(pv + dd) - zz
res = torch.mean(torch.abs(res))
feasible = bool(res <= tol_feas)
assert feasible
test_apply_DYS()
test_prox_A()
test_prox_H()
test_prox_P()
test_prox_R()
print('All Pytests pass.')
Pytorch Model¶
import torch.nn as nn
class CfmmTradeModel(nn.Module):
''' Implicit L2O model for trading cryptoassets.
'''
    def __init__(self, w, A, p, cfmm_types, delta, Gamma, geo_indices):
super(CfmmTradeModel, self).__init__()
self.A = A.to(device)
self.AA = torch.mm(self.A.permute(1, 0), self.A)
self.W = nn.Parameter(torch.rand(A.shape[0], A.shape[0]).to(device))
self.mat1 = nn.Parameter(torch.randn(A.shape[0], A.shape[0]).to(device))
self.mat2 = nn.Parameter(torch.randn(A.shape[0], A.shape[0]).to(device))
self.bias1 = nn.Parameter(torch.randn(A.shape[0],1).to(device))
self.bias2 = nn.Parameter(torch.randn(A.shape[0],1).to(device))
self.leaky_relu = nn.LeakyReLU(0.1)
self.tanh = nn.Tanh()
self.relu = nn.ReLU()
self.w = w.to(device)
        # Learn delta through an exponential parameterization (keeps it positive)
        self.delta = nn.Parameter(-3 * torch.ones(delta.shape).to(device))
self.L = None
self.psi_mean = 0
self.cfmm_types = cfmm_types
self.Gamma = Gamma.to(device)
self.p = p.to(device)
self.geo_indices = geo_indices
self.local_size = self.A.shape[0]
self.global_size = self.A.shape[1]
    def device(self) -> torch.device:
        return next(self.parameters()).data.device
    def psi(self, d):
        ''' Give the weight for the risk term in the utility.
        Note: A two-layer network with a tanh hidden activation and a
        sigmoid output, so psi(d) takes values in (0, 1); the bounded
        output keeps the risk term well-scaled.
        '''
sigmoid = nn.Sigmoid()
risk_weight = torch.mm(self.mat1, d) + self.bias1
risk_weight = self.tanh(torch.mm(self.mat2, risk_weight) + self.bias2)
return sigmoid(risk_weight).to(device)
    def zero_weights(self):
        ''' Zero out entries of W outside the per-CFMM diagonal blocks. '''
W_zero = torch.zeros(self.W.data.shape).to(device)
idx_lo = 0
for j, n_tokens_loc in enumerate(self.geo_indices):
idx_hi = idx_lo + len(n_tokens_loc)
W_zero[idx_lo:idx_hi, idx_lo:idx_hi] = self.W.data[idx_lo:idx_hi,
idx_lo:idx_hi]
idx_lo = idx_hi
self.W.data = W_zero
def forward(self, d, fxd_pt_tol=1.0e-3, max_iters=int(2.0e3),
verbose=False, return_obj=False, analytic=False):
        ''' Apply Davis-Yin Splitting to minimize f + g + h.
        Convergence is measured by fixed point residual.
        Note: Assumes input d has shape [reserves_size, batch_size].
        The value self.L below is an upper bound (not the exact value)
        of the Lipschitz constant of grad h.
        '''
assert torch.is_tensor(d)
self.zero_weights()
# local coordinate sizes
v = torch.zeros(self.local_size, d.shape[1]).to(device)
z = torch.zeros(self.local_size, d.shape[1]).to(device)
# global coordinate sizes
x = torch.zeros(self.global_size, d.shape[1]).to(device)
y = torch.zeros(self.global_size, d.shape[1]).to(device)
zeta = torch.cat((v, x, y, z), dim=0)
# delta = torch.exp(self.delta_scale * self.delta)
delta = torch.exp(self.delta)
def prox_f_m(xi, alpha=None):
return prox_f_model(xi, d, self.w, delta, self.geo_indices,
self.cfmm_types)
def prox_g_m(xi, alpha=None):
return prox_g_model(xi, d, self.w, delta, self.geo_indices,
self.cfmm_types, self.A,
self.Gamma)
        # With analytic=True the risk term is disabled, recovering the
        # noise-free utility; the learned self.psi(d) could be used here instead.
        psi = torch.ones(1).to(device) if not analytic else torch.zeros(1).to(device)
        W = torch.abs(self.W)
def grad_h_m(xi):
return grad_h(xi, self.p, W, self.A,
psi,
len(self.cfmm_types))
WA = torch.mm(W, self.A).to(device)
AWWA = torch.mm(WA.permute(1,0), WA).to(device)
        # Upper bound on the Lipschitz constant of grad h
        self.L = 1.0e-1 + torch.norm(AWWA) * torch.max(psi)
        alpha = 1.99 / self.L
        with torch.no_grad():
            # Run DYS to (near) convergence without tracking gradients (JFB)
xi, zeta = apply_DYS(prox_f_m, prox_g_m, grad_h_m, zeta, alpha,
fxd_pt_tol=fxd_pt_tol, max_iters=max_iters,
verbose=verbose)
# Attach gradients for final iterations
xi, _ = apply_DYS(prox_f_m, prox_g_m, grad_h_m, zeta, alpha,
fxd_pt_tol=fxd_pt_tol, max_iters=1,
verbose=verbose)
        q = self.A.shape[0]
        m = self.A.shape[1]
x_idx = range(q, q+m)
y_idx = range(q+m,q+2*m)
x = xi[x_idx, :]
y = xi[y_idx, :]
# Post conditions
tender_nonnegative = (x >= 0).to('cpu').numpy().all()
receive_nonnegative = (y >= 0).to('cpu').numpy().all()
assert tender_nonnegative
assert receive_nonnegative
        check_violation(x, y, d, self.w, self.A, self.Gamma, self.geo_indices,
                        self.cfmm_types, verbose=verbose,
                        tol_err=1.0e-1)
return x.permute(1,0), y.permute(1,0)
Toy Crypto Trade Problem¶
Check trade violation¶
This function is used to check whether a given trade is "valid".
def check_violation(x, y, d, w, A, Gamma, geo_indices, cfmm_type,
                    tol_err=1.0e-2, verbose=False):
    ''' Assert that the trade (x, y) does not decrease any CFMM's constant
        function value by more than tol_err (relative).
    '''
    assert torch.is_tensor(x)
assert torch.is_tensor(y)
assert torch.is_tensor(d)
assert torch.is_tensor(w)
assert torch.is_tensor(A)
assert torch.is_tensor(Gamma)
violation = []
num_samples = x.shape[1]
for s in range(num_samples):
for i, j in enumerate(geo_indices):
geometric = cfmm_type[i]
G = Gamma[j[0]:(1+j[-1]), j[0]:(1+j[-1])]
if geometric:
v = torch.log(d[j, :] + torch.mm(G, torch.mm(A[j, :], x)) - torch.mm(A[j, :], y))
new_val = torch.mm(w[j, :].permute(1,0), v).to('cpu').detach().numpy()[0][s]
ref_val = torch.mm(w[j, :].permute(1,0), torch.log(d[j, :])).to('cpu').detach().numpy()[0][s]
else:
v = d[j] + torch.mm(G, torch.mm(A[j, :], x)) - torch.mm(A[j, :], y)
new_val = torch.mm(w[j, :].permute(1,0), v).to('cpu').detach().numpy()[0][s]
ref_val = torch.mm(w[j, :].permute(1,0), d[j]).to('cpu').detach().numpy()[0][s]
err = ref_val - new_val
if verbose:
msg = 'Rel error in phi[{:d}](trade) = {:0.4f}%'
print(msg.format(i, 100.0 * err / ref_val))
assert err <= tol_err * ref_val
(First we include a snippet of code for printing trades)
def print_trade(x, y):
for j, _ in enumerate(x):
print('Trades with CFMM', j)
for i in range(x[j].shape[0]):
val = float(x[j][i].numpy())
print('Tendered {:.2f} of token {:d}.'.format(val, i))
for i in range(y[j].shape[0]):
val = float(y[j][i].numpy())
print('Received {:.2f} of token {:d}.'.format(val, i))
Toy Problem¶
The following snippet is used to show the implementation of the model (without training and without noise).
# Tolerances for each CFMM
deltas = 1.0e-2 * torch.ones(5,1).to(device)
cfmm_sizes = [3, 2, 2, 2, 2]
geo_indices = get_cfmm_indices(cfmm_sizes)
# Trade fee parameters in each CFMM
gammas = [0.98, 0.99, 0.98, 0.99, 0.98]
Gamma = [gamma * torch.eye(cfmm_sizes[j]).to(device)
for j, gamma in enumerate(gammas)]
Gamma = get_block_diag(Gamma).to(device)
# Weights for the modified means in each CFMM
w1 = torch.tensor([[0.2], [0.3], [0.5]]).to(device)
w2 = torch.tensor([[0.4], [0.6]]).to(device)
w3 = torch.tensor([[0.5], [0.5]]).to(device)
w4 = torch.tensor([[0.6], [0.4]]).to(device)
w5 = torch.tensor([[0.6], [0.4]]).to(device)
w = torch.cat((w1, w2, w3, w4, w5), dim=0).to(device)
# Label for which CFMMs uses geometric means (otherwise use arithmetic)
cfmm_types = [True, True, True, True, False]
cfmm_types = [float(cfmm_type) * torch.ones(1).to(device) for cfmm_type in cfmm_types]
# Transformations from global to local coordinates of each CFMM
A1 = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]).to(device)
A2 = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]).to(device)
A3 = torch.tensor([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]).to(device)
A4 = torch.tensor([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]).to(device)
A5 = torch.tensor([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]).to(device)
A = [A1, A2, A3, A4, A5]
A = get_block_diag(A).to(device)
print('A = ', A)
print('A.shape = ', A.shape)
# Define intrinsic value of assets
p = torch.tensor([[1.0], [1.0], [1.0]]).to(device)
# Trade model for CFMM network
the_model = CfmmTradeModel(w, A, p, cfmm_types, deltas, Gamma, geo_indices)
print('model loaded successfully')
# Sample data for reserves in CFMM network
d1 = torch.tensor([[5.0], [7.3], [12.5]]).to(device)
d2 = torch.tensor([[10.0], [4.7]]).to(device)
d3 = torch.tensor([[5.5], [7.4]]).to(device)
d4 = torch.tensor([[4.6], [14.9]]).to(device)
d5 = torch.tensor([[7.6], [5.4]]).to(device)
d = torch.cat((d1, d2, d3, d4, d5), dim=0).to(device)
# Optimal trade for provided data
x, y = the_model(d, verbose=True)
print('x = ', x)
print('y = ', y)
Price Generation¶
At a no-arbitrage state, the price ratio of two tokens matches the ratio of partial derivatives of the CFMM's constant function $\phi$ at the reserves:
$$ \dfrac{p_j}{p_k} = \dfrac{\nabla_j \phi(d)}{\nabla_k \phi(d)}.$$
For geometric CFMMs,
$$ \nabla_\ell \phi(d) = \dfrac{w_\ell}{d_\ell} \cdot \phi(d),$$
which implies
$$ \dfrac{p_j}{p_k} = \dfrac{w_j / d_j}{w_k / d_k} = \dfrac{w_j}{w_k}\cdot \dfrac{d_k}{d_j}.$$
Thus, for known prices $p_j$ and $p_k$, weights $w_j$ and $w_k$, and reserves $d_k$, we can find the amount of reserves for the $j$-th token; namely,
$$ d_j = \dfrac{w_j p_k d_k}{w_k p_j}.$$
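A small numeric instance of this formula (made-up prices and weights):
# Given p = (1.0, 2.0), w = (0.4, 0.6), and d_k = 10, solve for d_j
w_j, w_k = 0.4, 0.6
p_j, p_k = 1.0, 2.0
d_k = 10.0
d_j = (w_j * p_k * d_k) / (w_k * p_j)
print(d_j)                            # 13.33...
print((w_j / d_j) / (w_k / d_k))      # recovers p_j / p_k = 0.5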
import json
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.dataset import random_split
rel_token_vals_ave = torch.tensor([[1.0], [1.9], [0.5]])
res_ave = [50.0, 30.0, 23.5, 35.7, 46.9]
the_model.p = rel_token_vals_ave
def get_token_val_sample(rel_vals, var):
sample = torch.tensor([[(1.0 + var * torch.randn(1)) * val.numpy()]
for _, val in enumerate(rel_vals)])
return sample
def create_loader(A, n_tokens, cfmm_indices, train_size=5000, test_size=500,
train_batch_size=100, test_batch_size=500):
var_per = 1.0e-1
torch.manual_seed(2022)
d = []
batch_size = train_size + test_size
for j, idx in enumerate(cfmm_indices):
A_block = A[idx, j*n_tokens:(j+1)*n_tokens]
loc_rel_token_vals = torch.mm(A_block.to('cpu'), rel_token_vals_ave)
rel_token_vals_sample = loc_rel_token_vals
for k in range(batch_size):
new_sample = get_token_val_sample(loc_rel_token_vals, var_per)
if k > 0:
rel_token_vals_sample = torch.cat((rel_token_vals_sample, new_sample), dim=1)
else:
rel_token_vals_sample = new_sample
num_tokens = len(idx)
reserves = torch.zeros(num_tokens, batch_size)
for t in range(num_tokens):
res_samples = (1.0 + var_per * torch.randn(batch_size)) * res_ave[j]
reserves[t, :] = float(w[idx, :][t]) * rel_token_vals_sample[0, :]
reserves[t, :] *= res_samples
reserves[t, :] /= (float(w[idx, :][0]) * rel_token_vals_sample[t, :])
d.append(reserves)
d_ten = d[0]
for idx, _ in enumerate(d):
if idx > 0:
d_ten = torch.cat((d_ten, d[idx]), dim=0)
d_ten = d_ten.permute(1,0)
# --------------------------------------------------------------------------
# Analytic predictions with no noise
# --------------------------------------------------------------------------
# anal_model = CfmmTradeModel(w, A, p, cfmm_types, deltas, Gamma, geo_indices)
# anal_model.eval()
# with torch.no_grad():
# for j, val in enumerate(anal_model.W):
# anal_model.W[j][0]= torch.zeros(val[0].shape)
# x_true, y_true = anal_model(d_ten.permute(1,0).to(device),
# max_iters=int(1.0e4))
# x_true = x_true.detach()
# y_true = y_true.detach()
# --------------------------------------------------------------------------
# --------------------------------------------------------------------------
std = 1.0e-3 * torch.rand(d_ten.shape[0], 1)
std = std * torch.ones(d_ten.shape)
noise = torch.normal(mean=torch.zeros(d_ten.shape), std=std)
d_obs = (1.0 + noise) * d_ten
dataset = TensorDataset(d_obs.to(device), d_ten.to(device))
# dataset = TensorDataset(d_obs.to(device), d_ten.to(device), x_true.to(device), y_true.to(device))
train_data, test_data = random_split(dataset, [train_size,
test_size])
loader_train = DataLoader(dataset=train_data,
batch_size=train_batch_size, shuffle=True)
loader_test = DataLoader(dataset=test_data,
batch_size=test_batch_size, shuffle=False)
return loader_train, loader_test
def convert_to_tensor(x_list):
    x_ten = x_list[0]
for idx, val in enumerate(x_list):
if idx > 0:
x_ten = torch.cat((x_ten, val), dim=0)
x_ten = x_ten.permute(1,0)
return x_ten
Correct Trade Errors¶
Define
$$ \theta(\tau) \triangleq \phi(d + \Gamma A x - A(y - \tau p)) - \phi(d).$$
We seek $\tau$ with $\theta(\tau) = 0$ and apply Newton's method:
$$ \tau_{k+1} = \tau_k - \dfrac{ \theta(\tau_k)}{\theta'(\tau_k)}.$$
Let $z = d + \Gamma A x - A y$. For geometric CFMMs, we use
$$ \tilde{\phi}(v) \triangleq \left< w, \ln(v)\right>$$
so that
$$ \theta'(\tau) = \left< \nabla \tilde{\phi}(z + \tau A p),A p\right> = \left<\frac{w}{z+\tau A p}, A p\right>,$$
where the quotient is taken elementwise.
Meanwhile, for weighted arithmetic mean CFMMs,
$$ \tilde{\phi}(v) \triangleq \left< w, v\right>,$$
for which
$$ \theta'(\tau) = \left< \nabla \tilde{\phi}(z + \tau A p), Ap\right> = \left<w, Ap\right>.$$
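Since each $\theta$ is monotone increasing in $\tau$, a simple bisection is a robust alternative to the Newton updates below (a minimal sketch; `bisect_root` is a hypothetical helper, not from the original notebook):
def bisect_root(theta, tau_lo=0.0, tau_hi=1.0, iters=50):
    ''' Find tau with theta(tau) ~ 0 for monotone increasing theta,
        assuming theta(tau_lo) <= 0 <= theta(tau_hi).
    '''
    for _ in range(iters):
        tau_mid = 0.5 * (tau_lo + tau_hi)
        if theta(tau_mid) < 0.0:
            tau_lo = tau_mid
        else:
            tau_hi = tau_mid
    return 0.5 * (tau_lo + tau_hi)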
# TODO: replace the Newton updates below with the bisection method sketched above
def correct_trade(x, y, d, p, A, Gamma, ww, geo_indices, cfmm_type, n_tokens, newton_iters=200):
    y_correct = torch.zeros(y.shape).to(device)
for j, idx in enumerate(geo_indices):
        tau = 0.0
        for iter in range(newton_iters):
geometric = cfmm_type[j]
G = Gamma[idx[0]:(1+idx[-1]), idx[0]:(1+idx[-1])]
if geometric:
ytp = y[n_tokens*j:(j+1)*n_tokens, :] - tau * p
ytp = ytp.to(device)
Aytp = torch.mm(A[idx, n_tokens*j:(j+1)*n_tokens], ytp).to(device)
Ax = torch.mm(A[idx, :], x).to(device)
GAxAytp = (torch.mm(G, Ax) - Aytp).to(device)
d_block = d[idx, :].to(device)
v = torch.log(d_block + GAxAytp).to(device)
v = torch.clamp(v, min=1.0e-6).to(device)
phi_z = torch.mm(ww[idx, :].permute(1,0), v)[0]
phi_d = torch.mm(ww[idx, :].permute(1,0), torch.log(d[idx, :]))[0]
theta_z = phi_z - phi_d
dot_product = torch.mm(torch.mm(p.permute(1,0), A[idx, n_tokens*j:(j+1)*n_tokens].permute(1,0)), ww[idx, :] / v)
if iter < 20:
tau = tau - (0.05 / float(iter + 1)) * theta_z
else:
tau = tau - theta_z / (phi_z * dot_product)
else:
v = d[idx, :] + torch.mm(G, torch.mm(A[idx, :], x)) - torch.mm(A[idx, n_tokens*j:(j+1)*n_tokens], y[n_tokens*j:(j+1)*n_tokens, :] - tau * p)
theta_z = torch.mm(ww[idx, :].permute(1,0), v - d[idx, :])
if iter < 20:
tau = tau - (0.05 / float(iter + 1)) * theta_z
else:
tau = tau - theta_z / torch.mm(ww[idx, :].permute(1,0), torch.mm(A[idx, n_tokens*j:(j+1)*n_tokens], p))
y_block = y[n_tokens*j:(j+1)*n_tokens, :] - tau * p
y_correct[n_tokens*j:(j+1)*n_tokens, :] = y_block
    check_violation(x, y_correct, d, ww, A, Gamma, geo_indices, cfmm_type)
return x, y_correct
def compute_energy(d, ww, A, geo_indices, cfmm_type):
    ''' Compute the constant-function value of each CFMM at reserves d
        (one row per CFMM, one column per sample).
    '''
    energy = torch.zeros(len(geo_indices), d.shape[1]).to(device)
    for j, idx in enumerate(geo_indices):
        geometric = cfmm_type[j]
        if geometric:
            term = torch.mm(ww[idx, :].permute(1,0), torch.log(d[idx, :]))
            energy[j, :] += torch.exp(term)[0]
        else:
            energy[j, :] += torch.mm(ww[idx, :].permute(1,0), d[idx, :])[0]
    return energy
return energy
def get_violation(x, y, d, w, A, Gamma, geo_indices, cfmm_type):
relu = nn.ReLU()
energy_new = compute_energy(d + torch.mm(Gamma, torch.mm(A, x)) - torch.mm(A,y), w, A, geo_indices, cfmm_type)
energy_old = compute_energy(d, w, A, geo_indices, cfmm_type)
error = relu(energy_old - energy_new)
error_mean = torch.sum(torch.mean(error, dim=1))
error_rel = torch.sum(torch.mean(error, dim=1)) / torch.sum(torch.mean(energy_old))
return error_mean, error_rel
def get_list_from_ten(d, w):
d_list = []
idx_lo = 0
for j, _ in enumerate(w):
idx_hi = idx_lo + w[j].shape[0]
d_list.append(d[:, idx_lo:idx_hi].permute(1,0))
idx_lo = idx_hi
return d_list
def get_ten_from_list(d):
    d_ten = d[0]
    for j, val in enumerate(d):
        if j > 0:
            d_ten = torch.cat((d_ten, val), dim=0)
    return d_ten
def utility(x, y, p):
n_cfmms = int(x.shape[0] / p.shape[0])
obj = torch.zeros(1, x.shape[1]).to(device)
for j in range(n_cfmms):
yx = (y - x).to(device)
obj = obj + torch.mm(p.permute(1,0), yx[j*p.shape[0]:(j+1)*p.shape[0]])
return obj
Train Model¶
from prettytable import PrettyTable
n_tokens = 3
loader_train, loader_test = create_loader(A, n_tokens, geo_indices)
num_epochs = int(1.0e3)
learning_rate = 5.0e-3
optimizer = torch.optim.Adam(the_model.parameters(), lr=learning_rate)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5,
gamma=0.98)
training_msg = '[{:4d}] Constraint Error = {:6.2f}%'
training_msg += '| loss = {:8.1e} | pred util = {:8.1e} | fro-norm = {:2.2e} '
training_msg += '| psi = {:2.3e} | lr = {:2.2e}'
MSE = nn.MSELoss()
def model_params(net):
table = PrettyTable(["Network Component", "# Parameters"])
num_params = 0
for name, parameter in net.named_parameters():
if not parameter.requires_grad:
continue
table.add_row([name, parameter.numel()])
num_params += parameter.numel()
table.add_row(['TOTAL', num_params])
return table
print(model_params(the_model))
loss_ave = 0.0
obj_ave = 0.0
the_model.to(device)
for epoch in range(num_epochs):
for d_obs, d_true in loader_train:
optimizer.zero_grad()
the_model.train()
x, y = the_model(d_obs.permute(1,0).to(device), max_iters=int(1.0e3))
obj = torch.mean(utility(x.permute(1,0), y.permute(1,0), the_model.p.to(device)))
viol, rel_viol = get_violation(x.permute(1,0), y.permute(1,0),
d_true.permute(1,0), w, A, Gamma,
geo_indices, cfmm_types)
loss = 1.0e-1 * (viol ** 2) - obj
loss.backward()
optimizer.step()
loss_curr = loss.detach().item()
loss_ave *= 0.9
loss_ave += 0.1 * loss_curr if loss_ave != 0 else loss_curr
W = torch.abs(the_model.W.data.clone())
print(training_msg.format(epoch, 100 * rel_viol, loss_ave, obj,
torch.norm(W),
the_model.psi_mean,
optimizer.param_groups[0]['lr']
))
print('delta = ', the_model.delta)
lr_scheduler.step()
print(the_model.delta)
Print a heatmap of the learned weight matrix $W$.
from matplotlib import pyplot as plt
W = the_model.W.data.clone()
W = torch.abs(W)  # match the |W| used in the forward pass
plt.imshow(W.to('cpu').detach().numpy(), interpolation='nearest')
plt.colorbar()
plt.show()
print(the_model.W.data.shape)
The analytic model (no risk correction) is obtained by running the model with `analytic=True`, which zeros out the risk term; equivalently, one could load a copy of the model with the weights $W$ set to zero:
# anal_model = CfmmTradeModel(w, A, p, cfmm_types, deltas, Gamma, geo_indices)
# with torch.no_grad():
# for j, val in enumerate(anal_model.W):
# anal_model.W[j][0]= torch.zeros(val[0].shape)
Testing Results¶
for d_obs, d_true in loader_test:
the_model.eval()
x_l2o, y_l2o = the_model(d_obs.permute(1,0).to(device), max_iters=int(1e3))
x_anal, y_anal = the_model(d_obs.permute(1,0).to(device), max_iters=int(2e3),
analytic=True)
# x_anal, y_anal = anal_model(d_obs.permute(1,0).to(device), max_iters=int(1e3))
# d_true = get_list_from_ten(d_true, w)
# x_l2o_corr, y_l2o_corr = correct_trade(x_l2o.permute(1,0),
# y_l2o.permute(1,0),
# d_true.permute(1,0),
# the_model.p.to(device), A, Gamma,
# w, geo_indices, cfmm_types,
# n_tokens)
# x_anal_corr, y_anal_corr = correct_trade(x_anal.permute(1,0),
# y_anal.permute(1,0),
# d_true.permute(1,0),
# the_model.p.to(device), A, Gamma,
# w, geo_indices, cfmm_types,
# n_tokens)
Plot Data for Figure 10 in paper¶
def check_constraints(x, y, d, w, A, Gamma, geo_indices, cfmm_type):
    ''' Return per-CFMM, per-sample constraint violations (energy decrease). '''
    relu = nn.ReLU()
energy_new = compute_energy(d + torch.mm(Gamma, torch.mm(A, x)) - torch.mm(A,y), w, A, geo_indices, cfmm_type)
energy_old = compute_energy(d, w, A, geo_indices, cfmm_type)
constraint_viol = relu(energy_old - energy_new)
return constraint_viol
util_l2o = utility(x_l2o.to(device).permute(1,0), y_l2o.to(device).permute(1,0), the_model.p.to(device))
print('util_l2o = ', util_l2o)
constraint_viol = check_constraints(x_l2o.permute(1,0), y_l2o.permute(1,0),
d_true.permute(1,0), w, A, Gamma,
geo_indices, cfmm_types)
constraint_viol = torch.sum(constraint_viol, dim=0, keepdim=True)
util_l2o[constraint_viol > 0] = 0.0 * util_l2o[constraint_viol > 0]
print('util_l2o = ', torch.mean(util_l2o))
util_anal = utility(x_anal.permute(1,0), y_anal.permute(1,0), the_model.p.to(device))
constraint_viol = check_constraints(x_anal.permute(1,0), y_anal.permute(1,0),
d_true.permute(1,0), w, A, Gamma,
geo_indices, cfmm_types)
constraint_viol = torch.sum(constraint_viol, dim=0, keepdim=True)
print('pre_anal = ', util_anal)
util_anal[constraint_viol > 0] = 0.0 * util_anal[constraint_viol > 0]
print('util_anal = ', torch.mean(util_anal))