Cryptoasset Trading¶
The problem at hand is to train a model that reliably identifies arbitrage transactions for cryptoassets within a network of constant function market makers (CFMMs). That is, given a private valuation $p_i$ of the $i$-th cryptoasset, the task can be expressed analytically as
$$ \min_{(x,y)} - U(x,y)\ \ \text{s.t.} \ \ (x^j,y^j) \ \ \text{forms a valid trade within the $j$-th CFMM with reserves $r^j$},$$
where $r,p,x,y\in\mathbb{R}^n$ and the utility is
$$ U(x,y) = \sum_{j} \left< A^jp, A^j(y^j - x^j)\right>,$$
where $A^j$ maps $x^j$ and $y^j$ from global coordinates to local coordinates. (Note local coordinates may lie in a lower-dimensional space since not all CFMMs utilize all cryptoassets.)
As a simplified model of real-world phenomena, we introduce noise (e.g. due to front running) into the reserves $r$ to obtain noisy observable data $d$. This makes the problem inconsistent (below we note a simple mechanism for correcting for this in our toy problem). To account for noise, a "cost of risk" is learned and incorporated in a data-driven utility $U_\Theta$. This utility is optimized over a collection of training data to identify the optimal arbitrage trades. That is, we define an implicit model
$$ \mathcal{N}_\Theta(d) \triangleq (x_\Theta, y_\Theta) \triangleq \text{argmin}_{(x,y)} -U_\Theta(x,y) \ \ \text{s.t.} \ \ (x^j,y^j) \ \ \text{forms a valid trade within the $j$-th CFMM with reserves $d^j$}.$$
The weights $\Theta$ are tuned to solve the training problem
$$ \min_{\Theta} \mathbb{E}_{d\sim\mathcal{D}} \left[ - U_\Theta(x_\Theta, \tilde{y}_\Theta) \right],$$
where
$$ \tilde{y}_\Theta^j = y_\Theta^j - \tau_j p$$
and $\tau_j$ is chosen so $(x^j_\Theta, \tilde{y}^j_\Theta)$ forms a valid trade with the $j$-th CFMM that has reserves $r^j$. When $r^j = d^j$ and $U_\Theta = U$, we have $y_\Theta^j = \tilde{y}_\Theta^j$. However, the presence of noise forces these to differ.
Below we overview the mathematics for the optimization algorithm. Then we define a PyTorch network that uses Davis-Yin Splitting (DYS) to compute inferences. Training is conducted using Jacobian-Free Backprop (JFB), which maintains a low memory footprint by only backpropagating through the final DYS iteration.
This notebook accompanies the preprint Explainable AI via Learning to Optimize. In particular, see the appendices for more mathematical details about computations in this notebook.
We emphasize the code herein is meant to be illustrative and, thus, has not been optimized for speed.
import torch

# Fall back to CPU when no GPU is available
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
Levels of Abstraction in Math Formulation¶
We begin at the highest level with the arbitrage problem
$$ \min_{(x,y)} \sum_{j=1}^m \left< A^jp, A^j(x^j - y^j)\right>$$
subject to the constraints
$$ (x^j, y^j) = \text{valid transaction with $j$-th CFMM with data $d^j$.}$$
Here we assume each CFMM uses either a weighted arithmetic mean or a weighted geometric mean for its constant function. See our preprint for further details on the constraints, their decoupling, and derivations of the proximal mappings for these. In our setting, using the parameterization $\xi=(v,x,y,z)$, we use the functions
$$ f(\xi;d) \triangleq \delta_{\geq0}(x,y) + \delta_{\mathcal{M}}(v,z;d),$$
$$ g(\xi;d) \triangleq \delta_{\mathcal{R}}(v,x,y) + \delta_{\mathcal{H}}(z;d),$$
$$ h(\xi;d) \triangleq -U_\Theta(x,y; d).$$
The referenced indicator functions are given by
$$ \delta_{\mathcal{M}}(v,z;d) \triangleq \sum_{j\in\mathcal{I}_1} \delta_{\mathcal{P}_j}(v^j, z^j;d) + \sum_{j\in\mathcal{I}_2} \delta_{\mathcal{A}_j}(v^j;d),$$
$$ \delta_{\mathcal{H}}(z;d) \triangleq \sum_{j\in\mathcal{I}_1} \delta_{\mathcal{H}_j} (z^j;d),$$
$$ \delta_{\mathcal{R}}(v,x,y) \triangleq \sum_{j=1}^m \delta_{\mathcal{R}^j}(v^j,x^j,y^j),$$
where $\mathcal{I}_1$ is the set of all CFMM indices with weighted geometric means, $\mathcal{I}_2 = [m] \setminus \mathcal{I}_1$ is the remaining set of indices for weighted arithmetic mean CFMMs, and
$$ \mathcal{P}_j \triangleq \{(v^j,z^j) : z^j \leq \ln(v^j + d^j) \},$$
$$ \mathcal{A}_j \triangleq \{v^j : v^j+d^j \geq 0,\ \left<w^j,v^j\right> \geq \delta_j \left<w^j,d^j\right>\},$$
$$ \mathcal{H}_j \triangleq \{z^j : \left<w^j,z^j\right> = \ln(1+\delta_j) + \left<w^j, \ln(d^j)\right>\},$$
$$ \mathcal{R}_j \triangleq \{(v^j,x^j,y^j) : v^j = \Gamma^j A^j x^j - A^j y^j \}.$$
Above $\Gamma^j = \text{diag}(\gamma_j,\ldots,\gamma_j) \in \mathbb{R}^{n_j\times n_j}$ gives the trade fees within a CFMM and $A^j \in \mathbb{R}^{n_j\times n}$ is a binary matrix that maps global cryptoasset indices into the local coordinates of each CFMM.
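To make the coordinate maps concrete, here is a small illustrative sketch (made-up sizes, not from the preprint) of $A^j$ and $\Gamma^j$ for a CFMM trading global tokens 0 and 2 out of $n = 3$:
# Illustrative sketch: a CFMM holding global tokens 0 and 2 (n_j = 2, n = 3)
Aj = torch.tensor([[1.0, 0.0, 0.0],   # local coordinate 0 <- global token 0
                   [0.0, 0.0, 1.0]])  # local coordinate 1 <- global token 2
Gamma_j = 0.98 * torch.eye(2)         # trade fee gamma_j = 0.98 in local coords
x_global = torch.tensor([[1.5], [2.0], [0.5]])
print(torch.mm(Aj, x_global))         # local view of x: [[1.5], [0.5]]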
Below we provide a function for applying DYS to batched tensors (one column per sample).
Davis-Yin Splitting¶
From the above parameterizations, the arbitrage problem above is equivalent to the minimization problem
$$ \min_{\xi} f(\xi) + g(\xi) + h(\xi), $$
where $f$ and $g$ are proximable and $h$ is $L$-Lipschitz differentiable. Davis-Yin Splitting is applied with step size $\alpha \in (0,2/L)$ via the iteration
$$ \xi^{k+1} = \text{prox}_{\alpha f}(\zeta^k)$$ $$ \psi^{k+1} = \text{prox}_{\alpha g}(2\xi^{k+1}-\zeta^k - \alpha \nabla h(\xi^{k+1}))$$ $$ \zeta^{k+1} = \zeta^k + \psi^{k+1}-\xi^{k+1}$$
Here
$$ \lim_{k\rightarrow\infty} \xi^k = \xi^\star \in \text{argmin}_{\xi} f(\xi) + g(\xi) + h(\xi).$$
def apply_DYS(prox_f, prox_g, grad_h, zeta_init, alpha,
              fxd_pt_tol=1.0e-12, max_iters=int(1.0e3), verbose=False):
''' Apply Davis-Yin Splitting to minimize f + g + h.
Convergence is measured by fixed point residual.
Preconditions:
- Step size is positive
- Stopping tolerance is positive
- Initialization is a tensor
Postconditions:
- xi and zeta shapes match
Note: We are unable to directly verify whether alpha < 2 / L, with L the
Lipschitz constant of h.
'''
assert alpha > 0
assert fxd_pt_tol > 0
assert torch.is_tensor(zeta_init)
zeta = zeta_init.clone()
converge = False
iter = 0
while not converge and iter < max_iters:
xi = prox_f(zeta, alpha)
psi = prox_g(2.0 * xi - zeta - alpha * grad_h(xi), alpha)
zeta = zeta + psi - xi
fix_pt_res = torch.max(torch.norm(psi - xi, dim=1))
ref_norm = torch.mean(torch.norm(zeta, dim=1))
        converge = fix_pt_res <= fxd_pt_tol * ref_norm
iter += 1
if verbose and iter == max_iters:
print('DYS reached max iteration count.')
valid_shape = xi.shape[0] == zeta_init.shape[0]
valid_shape = xi.shape[1] == zeta_init.shape[1] and valid_shape
assert valid_shape
return prox_f(zeta, alpha), zeta
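As a quick usage sketch (an assumed toy problem, separate from the tests below), consider minimizing $|\xi| + \tfrac{1}{2}(\xi - 2)^2$ over $\xi \geq 0$, whose solution is $\xi = 1$:
# Toy usage sketch: f = indicator of {xi >= 0}, g = |xi|, h = (xi - 2)^2 / 2.
# Here grad h is 1-Lipschitz, so any alpha in (0, 2) is a valid step size.
prox_f_demo = lambda xi, a: torch.clamp(xi, min=0.0)
prox_g_demo = lambda xi, a: torch.sign(xi) * torch.clamp(torch.abs(xi) - a, min=0.0)
grad_h_demo = lambda xi: xi - 2.0
xi_demo, _ = apply_DYS(prox_f_demo, prox_g_demo, grad_h_demo,
                       torch.zeros(1, 1), alpha=0.9)
print(xi_demo)  # expect approximately 1.0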
Cryptoasset Trade Utility¶
The differentiable function $h$ gives the trade utility, i.e.
$$h(\xi;d) \triangleq -U_\Theta(x,y;d) = \underbrace{\sum_{j=1}^m \left< A^j p, A^j(x^j-y^j) \right>}_{\text{net value lost}} + \underbrace{\dfrac{\psi_\Theta(d)}{2}\sum_{j=1}^m \left\|W^j A^j (x^j - y^j) \right\|^2}_{\text{risk term}}, $$
Here the learned scaling $\psi_\Theta(d)$ is the output of a small network whose final activation is the sigmoid
$$ \sigma(z) = \dfrac{1}{1+\exp(-z)}, \ \ \text{with} \ \ \sigma'(z) = \sigma(z)\cdot(1-\sigma(z)),$$
so $\psi_\Theta(d)\in(0,1)$ is bounded and $h(\cdot\,;d)$ is Lipschitz differentiable in $(x,y)$,
where $\Theta$ are weights consisting of $W$ and parameters for $\psi_\Theta:\mathbb{R}^n\rightarrow \mathbb{R}$. The function $\psi_\Theta$ is continuous, but is not (necessarily) convex in $d$. The oracle for utility takes the form of a gradient. To this end, set
$$ W \triangleq \text{diag}(W^1,\ldots, W^m), \ \ \ A \triangleq \text{diag}(A^1,\ldots,A^m),$$
and
$$ S \triangleq \left[\begin{array}{c} \mathrm{I} \\ \vdots \\ \mathrm{I} \end{array}\right] \in \mathbb{R}^{mn\times n}.$$
Then, written in full,
$$ -U(x,y) = p^\top S^\top A^\top A (x-y) + \dfrac{\psi_\Theta(d)}{2} \| W A(x-y)\|^2,$$
which implies
$$ -\nabla_{x} U(x,y) = A^\top ASp + \psi_\Theta(d) A^\top W^\top W A(x-y).$$
Noting the flipped signs between $x$ and $y$, we see
$$ \nabla_{y} U(x,y) = - \nabla_{x} U(x,y).$$
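As a sanity check of these formulas, the following sketch (random single-block data, illustrative only) compares the hand-derived gradient against autograd:
# Compare the closed-form gradient with autograd on one CFMM block
torch.manual_seed(0)
q_chk, n_chk = 4, 6   # local and global coordinate sizes (made-up)
A_chk = torch.bernoulli(0.5 * torch.ones(q_chk, n_chk))
W_chk = torch.rand(q_chk, q_chk)
p_chk = torch.rand(n_chk, 1)
psi_chk = 0.7
x_chk = torch.rand(n_chk, 1, requires_grad=True)
y_chk = torch.rand(n_chk, 1)
Axy = torch.mm(A_chk, x_chk - y_chk)
neg_U = torch.mm(torch.mm(A_chk, p_chk).permute(1,0), Axy) \
        + 0.5 * psi_chk * torch.sum(torch.mm(W_chk, Axy) ** 2)
neg_U.backward()
WA_chk = torch.mm(W_chk, A_chk)
grad_closed = torch.mm(A_chk.permute(1,0), torch.mm(A_chk, p_chk)) \
    + psi_chk * torch.mm(torch.mm(WA_chk.permute(1,0), WA_chk),
                         x_chk.detach() - y_chk)
print(torch.norm(x_chk.grad - grad_closed))  # expect ~0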
import torch.nn as nn
activation = nn.Sigmoid()
leaky_relu = nn.LeakyReLU(0.01)
def grad_neg_U(x, y, p, W, A, psi, num_cfmms):
''' Compute gradient of -U(x,y) with respect to x.
Preconditions:
- Each input is a tensor (other than 'num_cfmms')
- Tensor shapes match
Postconditions:
- Output grad shape matches x shape
Note: The gradient with respect to y is obtained by flipping signs
'''
for val in [x, y, p, W, A]:
assert torch.is_tensor(val)
assert x.shape[0] == num_cfmms * p.shape[0]
assert x.shape[0] == y.shape[0]
assert x.shape[0] == A.shape[1]
assert W.shape[0] == A.shape[0]
assert W.shape[0] == W.shape[1]
AA = torch.mm(A.permute(1,0), A).to(device)
WA = torch.mm(W, A).to(device)
AWWA = torch.mm(WA.permute(1,0), WA).to(device)
Sp = torch.cat([p for _ in range(num_cfmms)], dim=0).to(device)
cost_term = torch.mm(AA, Sp)
risk_term = torch.mm(AWWA, x - y) * psi
grad = cost_term + risk_term
return grad.to(device)
def grad_h(xi, p, W, A, psi, num_cfmms):
''' Compute gradient of h(xi) = - U(x,y) where xi = (v, x, y, z).
Precondition:
- Inputs are tensors and a scalar
Postcondition:
- Output grad shape matches input xi shape
Note: The x and y components of gradient are negatives of each other.
The indexing of x and y blocks in the tuple xi is done using
dummy variables 'q' and 'm'. Also, the v and z blocks of the
gradient are automatically zero since there is no dependence in h
on these blocks.
'''
for val in [xi, p, W, A]:
assert torch.is_tensor(val)
q = A.shape[0]
m = A.shape[1]
grad = xi.clone()
grad[0:q, :] = torch.zeros(grad[0:q, :].shape)
grad[-q:, :] = torch.zeros(grad[0:q, :].shape)
grad[q:q+m, :] = grad_neg_U(xi[q:q+m, :], xi[q+m:q+2*m, :],p, W, A,
psi, num_cfmms)
grad[q+m:q+2*m, :] = - grad[q:q+m, :]
valid_shape = grad.shape[0] == xi.shape[0] and grad.shape[1] == xi.shape[1]
assert valid_shape
return grad
Geometric Mean Constraints¶
Next we look at blockwise-projections onto the hyperplanes
$$ \mathcal{H}_j = \left\lbrace z^j : \left<w^j,z^j\right> = \ln(1+\delta_j) + \left<w^j, \ln(d^j)\right>\right\rbrace, \ \ \ \text{for all $j\in[m]$.}$$
The formula for the $j$-th hyperplane projection is
$$ P_{\mathcal{H}_j}(z^j) = z^j - \dfrac{\left<w^j,z^j\right>-b(d^j)}{\|w^j\|^2} w^j,$$
where
$$ b(d^j) = \ln(1+\delta_j) + \left<w^j, \ln(d^j)\right>.$$
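A quick numerical check of this formula (illustrative values):
# Verify <w, P(z)> = b for a single hyperplane (made-up w, d, and delta = 0.02)
w_j = torch.tensor([[0.3], [0.7]])
d_j = torch.tensor([[5.0], [8.0]])
b_j = torch.log(torch.tensor(1.02)) + torch.mm(w_j.permute(1,0), torch.log(d_j))
z_j = torch.randn(2, 1)
P_z = z_j - (torch.mm(w_j.permute(1,0), z_j) - b_j) / (torch.norm(w_j) ** 2) * w_j
print(torch.mm(w_j.permute(1,0), P_z) - b_j)  # expect ~0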
Generalizing to all blocks gives the constraint
$$ \mathcal{H} = \mathcal{H}_1 \times \mathcal{H}_2 \times \cdots \times \mathcal{H}_m .$$
The projection on $\mathcal{H}$ is given by
$$ P_\mathcal{H}(z) = z - N(z) w, $$
where
$$ N(z) \triangleq \text{diag}\left( \mu_1(z) \mathrm{I}, \ldots, \mu_m(z) \mathrm{I} \right),$$
$$\delta_{jg} \triangleq \begin{cases} 1 & \text{if the $j$-th constraint is geometric}, \\ 0 & \text{otherwise,}\end{cases}$$
and
$$ \mu(z) \triangleq \text{diag}\left(\delta_{1g}\|w^1\|^{-2},\ldots,\delta_{mg}\|w^m\|^{-2}\right)\left[\text{diag}\left( (w^1)^\top,\ldots, (w^m)^\top \right) \cdot (z-\ln(d)) - \ln(1+\delta)\right]. $$
Note the inclusion of $\delta_{jg}$ means $\mathcal{H}$ is not actually a product of just the $\mathcal{H}_j$ sets, since some factors are simply all of $\mathbb{R}^{n_j}$.
def prox_H(z, w, d, delta, geo_indices, cfmm_type):
''' Project z onto Cartesian product of hyperplanes.
Inputs:
z - vector to project, z = (z1 z2 ... zj ... zm)
w - vector for scalar product, w = (w1 w2 ... wj ... wm)
d - data vector, d = (d1 d2 ... dj ... dm)
delta - tolerances for projection
geo_indices - list of indices for each CFMM
cfmm_type - list of booleans identifying which CFMMs are geometric
Output:
prox - projection of z onto H, the product of hyperplanes
Preconditions:
- All inputs are tensors or lists of tensors
        - Tensors w, z, and d have the same number of rows
Postconditions:
- Output prox has same shape as input z
'''
assert torch.is_tensor(z)
assert torch.is_tensor(w)
assert torch.is_tensor(d)
assert torch.is_tensor(delta)
assert isinstance(geo_indices, list)
assert isinstance(cfmm_type, list)
assert all([torch.is_tensor(entry) for _, entry in enumerate(cfmm_type)])
    assert z.shape[0] == w.shape[0] == d.shape[0]
zd = z - torch.log(d)
wzd = torch.cat([torch.mm(w[idx, :].permute(1,0), zd[idx, :])
for idx in geo_indices], dim=0)
wzd_d = wzd - torch.log(1.0 + delta)
w_inv = torch.cat([cfmm_type[j] * (torch.norm(w[idx, :]) ** - 2.0)
for j, idx in enumerate(geo_indices)], dim=0)
mu = torch.mm(torch.diag(w_inv), wzd_d)
Nw = torch.cat([mu[j] * w[idx, :] for j, idx in enumerate(geo_indices)],
dim=0)
prox = z - Nw
    assert z.shape == prox.shape
return prox
Halfspace Constraints¶
Next we look at projections on the set
$$ \mathcal{A}_j = \left\lbrace v : \left<w^j, v\right> \geq \delta_j \left<w^j, d^j\right> \right\rbrace.$$
The formula is
$$ P_{\mathcal{A}_j}(v) = v - \dfrac{\left[\left<w^j,v\right> - \delta_j\left<w^j, d^j\right>\right]_-}{\|w^j\|^2} w^j.$$
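And analogously for the halfspace projection (illustrative values):
# After projecting, <w, P(v)> >= delta * <w, d> holds (made-up values)
w_j = torch.tensor([[0.5], [0.5]])
d_j = torch.tensor([[4.0], [6.0]])
b_j = 0.03 * torch.mm(w_j.permute(1,0), d_j)   # delta_j = 0.03
v_j = torch.randn(2, 1)
neg_part = torch.clamp(torch.mm(w_j.permute(1,0), v_j) - b_j, max=0.0)
P_v = v_j - neg_part / (torch.norm(w_j) ** 2) * w_j
print(torch.mm(w_j.permute(1,0), P_v) - b_j)  # expect >= 0 (up to round-off)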
Generalizing to all blocks gives the constraint
$$ \mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2 \times \cdots \times \mathcal{A}_m .$$
The projection on $\mathcal{A}$ is given by
$$ P_{\mathcal{A}}(v) = v - N(v) w, $$
where
$$ N(v) \triangleq \text{diag}\left( \mu_1(v) \mathrm{I}, \ldots, \mu_m(v) \mathrm{I} \right),$$
$$\delta_{ja} \triangleq \begin{cases} 1 & \text{if the $j$-th constraint is arithmetic}, \\ 0 & \text{otherwise,}\end{cases}$$
and
$$ \mu(v) \triangleq \left[ \text{diag}\left(\delta_{1a}\|w^1\|^{-2},\ldots,\delta_{ma}\|w^m\|^{-2}\right) \cdot\text{diag}\left( (w^1)^\top,\ldots, (w^m)^\top \right) \cdot (v-\delta d)\right]_-. $$
Note the inclusion of $\delta_{ja}$ means $\mathcal{A}$ is not actually a product of just the $\mathcal{A}_j$ sets, since some factors are simply all of $\mathbb{R}^{n_j}$.
def prox_A(v, w, d, delta, geo_indices, cfmm_type):
''' Project v onto Cartesian product of halfspaces.
Inputs:
v - vector to project, v = (v1 v2 ... vj ... vm)
w - vector for scalar product, w = (w1 w2 ... wj ... wm)
d - data vector, d = (d1 d2 ... dj ... dm)
delta - tolerances for projection
geo_indices - list of indices for each CFMM
cfmm_type - list of booleans identifying which CFMMs are geometric
Output:
prox - projection of v onto A, the product of halfspaces
Note:
Here cfmm_type must be negated (i.e. use 1 - type) to get delta_{ja}
        since we are interested in arithmetic CFMM constraints, *not*
geometric CFMM constraints.
'''
assert torch.is_tensor(v)
assert torch.is_tensor(w)
assert torch.is_tensor(d)
assert torch.is_tensor(delta)
assert isinstance(geo_indices, list)
assert isinstance(cfmm_type, list)
assert all([torch.is_tensor(entry) for _, entry in enumerate(cfmm_type)])
    # Expand the per-CFMM tolerances delta blockwise to match the rows of d
    dd = torch.cat([delta[j] * d[idx, :] for j, idx in enumerate(geo_indices)],
                   dim=0).to(device)
    vd = v - dd
wvd = torch.cat([torch.mm(w[idx, :].permute(1,0), vd[idx, :])
for idx in geo_indices], dim=0).to(device)
w_inv = torch.cat([(1.0 - cfmm_type[j]) * (torch.norm(w[idx, :]) ** - 2.0)
for j, idx in enumerate(geo_indices)],
dim=0).to(device)
mu = torch.clamp(torch.mm(torch.diag(w_inv), wvd), max=0.0).to(device)
Nw = torch.cat([mu[j] * w[idx, :] for j, idx in enumerate(geo_indices)],
dim=0).to(device)
prox = v - Nw
return prox
Linear System Constraints¶
Next we consider the linear system $$ \mathcal{R}_j \triangleq \{(v^j,x^j,y^j) : v^j = \gamma_j A^j x^j - A^jy^j\}, \ \ \text{for all $j\in[m]$.}$$
Generalizing to all blocks gives the constraint
$$ \mathcal{R} \triangleq \mathcal{R_1}\times\mathcal{R_2}\times\cdots\times \mathcal{R_m} = \{ (v,x,y) : v = \Gamma Ax - Ay \},$$
where
$$ A \triangleq \text{diag}(A^1,\ldots, A^m) \ \ \text{and} \ \ \Gamma \triangleq \text{diag}(\gamma_1 \mathrm{I}, \ldots, \gamma_m \mathrm{I}). $$
Setting
$$ N \triangleq [\Gamma A \ \ -A] \ \ \text{and} \ \ M \triangleq (\mathrm{I}+N^\top N)^{-1}$$
yields
$$ \left[P_{R}(v,x,y)\right]_{(x,y)} = M\left[\begin{array}{c} x+A^\top \Gamma v \\ y - A^\top v\end{array}\right], $$
from which the $v$ component of the projection is obtained directly by matrix multiplication.
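For reference, this is the classical least-squares projection onto the graph of $N$. Writing $u = (x,y)$ and letting $(v_0, u_0)$ be the point to project,
$$ \min_{u} \ \|u - u_0\|^2 + \|Nu - v_0\|^2 \quad\Longrightarrow\quad (\mathrm{I} + N^\top N)\, u = u_0 + N^\top v_0,$$
so $u = M(u_0 + N^\top v_0)$, and expanding $N^\top v_0 = (A^\top \Gamma v_0,\ -A^\top v_0)$ recovers the displayed formula.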
def prox_R(v, x, y, A, Gamma):
    ''' Project (v, x, y) onto the constraint set v = Gamma * A * x - A * y.
    '''
    for val in [v, x, y, A, Gamma]:
        assert torch.is_tensor(val)
    assert v.shape[0] == A.shape[0]
    assert x.shape[0] == y.shape[0] == A.shape[1]
    N = torch.cat((torch.mm(Gamma, A), -A), dim=1).to(device)
    NN = torch.mm(N.permute(1,0), N).to(device)
    I = torch.eye(NN.shape[0]).to(device)
    M = torch.linalg.inv(I + NN).to(device)
AGv = torch.mm(A.permute(1,0), torch.mm(Gamma, v)).to(device)
Av = torch.mm(A.permute(1,0), v).to(device)
xy = torch.cat((x + AGv, y - Av), dim=0).to(device)
prox_xy = torch.mm(M, xy).to(device)
prox_x = prox_xy[:x.shape[0], :]
prox_y = prox_xy[x.shape[0]:, :]
GAx = torch.mm(Gamma, torch.mm(A, prox_x)).to(device)
Ay = torch.mm(A, prox_y).to(device)
prox_v = GAx - Ay
return prox_v, prox_x, prox_y
Logarithmic Constraints¶
Lastly, we consider the set
$$ \mathcal{P} = \{(v,z) : \ln(v+d) \geq z \}.$$
The projection $(\overline{v},\overline{z})$ of $(v,z)$ onto $\mathcal{P}$ is the identity on feasible points; otherwise it lies on the boundary $\overline{z} = \ln(\overline{v}+d)$, where the first-order optimality condition of the boundary least-squares problem gives
$$ 0 = (\overline{v} + d) \odot (\overline{v} - {v}) + \ln(\overline{v}+d) - z.$$
Since the relations are element-wise and separable, we compute the projection $(\overline{v}, \overline{z})$ using Newton iteration. Treating the right-hand side as $f(\hat{v})$ and differentiating $f:\mathbb{R}\rightarrow\mathbb{R}$ gives
$$ f'(\hat{v}) = 2\hat{v} + (d-v) + \dfrac{1}{\hat{v}+d}.$$
Then Newton iteration takes the form
$$ \hat{v}^{k+1} = \hat{v}^k - \dfrac{f(\hat{v}^k)}{ f'(\hat{v}^k)}.$$
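A scalar sketch of this Newton solve (illustrative values, using Python's math module rather than torch):
import math

# Project (v, z) = (0.0, 1.0) with d = 1.0 onto { ln(v + d) >= z };
# the point violates the constraint since ln(1) = 0 < 1
v0, z0, d0 = 0.0, 1.0, 1.0
vk = max(v0, -d0 + 0.5 * abs(d0))   # initialize safely inside v + d > 0
for _ in range(20):
    fk = (vk + d0) * (vk - v0) + math.log(vk + d0) - z0
    fpk = 2.0 * vk + (d0 - v0) + 1.0 / (vk + d0)
    vk = vk - fk / fpk
print(vk, math.log(vk + d0))        # the projected pair (v_bar, z_bar)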
def prox_P(v, z, d, newton_steps=10):
    ''' Project (v, z) onto constraint set { (v,z) : ln(v + d) >= z }.
    Only the entries that violate the constraint are updated; feasible
    entries are returned unchanged.
    '''
    assert torch.is_tensor(v)
    assert torch.is_tensor(z)
    assert torch.is_tensor(d)
    with torch.no_grad():
        # Entries violating ln(v + d) >= z, i.e. v + d < exp(z)
        indices = v + d < torch.exp(z)
vv = v[indices]
dd = d[indices]
pv = torch.max(vv, -dd + 0.5 * torch.abs(dd))
zz = z[indices]
for _ in range(newton_steps):
fv = (pv + dd) * (pv - vv) + torch.log(pv + dd) - zz
fpv = 2 * pv + (dd - vv) + ((pv + dd) ** -1.0)
pv = torch.max(pv - fv / fpv, -dd + 1.0e-6 * torch.abs(dd))
prox_v = v.clone()
prox_z = z.clone()
prox_v[indices] = pv.clone()
prox_z[indices] = torch.log(pv + dd)
return prox_v, prox_z
Projection onto sum of constraints¶
Next we consider the set $\mathcal{M}$, which is most easily expressed by using the sum of indicator functions, i.e.
$$\delta_{\mathcal{M}}(\xi) \triangleq \sum_{j\in\mathcal{I}_1} \delta_{\mathcal{P}_j}(v^j, z^j) + \sum_{j\in\mathcal{I}_2} \delta_{\mathcal{A}_j}(v^j).$$
From here on we use $\xi = (v,x,y,z)$. For implementation, $\xi$ is a single stacked tensor whose rows are ordered as the blocks $(v, x, y, z)$, with one column per sample in the batch.
def prox_M(xi, d, w, delta, geo_indices, cfmm_types):
    ''' Project xi = (v, x, y, z) onto M by applying prox_P (logarithmic
        constraints) and then prox_A (halfspace constraints) to the
        relevant blocks.
        Note: relies on the globally defined block-diagonal matrix A for
        the block sizes.
    '''
    assert torch.is_tensor(xi)
    assert torch.is_tensor(d)
    assert torch.is_tensor(w)
    q = A.shape[0]
    m = A.shape[1]
v_idx = range(q)
x_idx = range(q, q+m)
y_idx = range(q+m,q+2*m)
z_idx = range(q+2*m, xi.shape[0])
prox = xi.clone()
prox_v, prox_z = prox_P(xi[v_idx, :], xi[z_idx, :], d)
prox_v = prox_A(prox_v, w, d, delta, geo_indices, cfmm_types)
prox[v_idx, :] = prox_v
prox[z_idx, :] = prox_z
return prox
Proximal of $f$¶
Moving to the highest level of abstraction, we obtain
$$f(\xi) \triangleq \delta_{\geq0}(x,y) + \delta_{\mathcal{M}}(v,z).$$
Because the two terms of $f$ depend on different components of $\xi$, we may apply the proximals sequentially.
def prox_f_model(xi, d, w, delta, geo_indices, cfmm_types):
    ''' Compute the proximal of f at xi = (v, x, y, z).
    Note: We first project onto M, then threshold the x and y values;
    this is valid since the two terms of f act on disjoint blocks.
    '''
q = A.shape[0]
m = A.shape[1]
v_idx = range(q)
x_idx = range(q, q+m)
y_idx = range(q+m,q+2*m)
z_idx = range(q+2*m, xi.shape[0])
prox = prox_M(xi, d, w, delta, geo_indices, cfmm_types)
prox[x_idx, :] = torch.clamp(prox[x_idx, :], min=0.0).to(device)
prox[y_idx, :] = torch.clamp(prox[y_idx, :], min=0.0).to(device)
return prox
Proximal of $g$¶
The final function to compute proximals for is
$$g(\xi) \triangleq \delta_{\mathcal{R}}(v,x,y) + \delta_{\mathcal{H}}(z).$$
Since the components used by each of the two terms do not overlap, we can apply the proximal for each successively.
def prox_g_model(xi, d, w, delta, geo_indices, cfmm_types, A, Gamma):
    ''' Compute the proximal of g at xi = (v, x, y, z) by projecting
        (v, x, y) onto R and z onto H.
    '''
q = A.shape[0]
m = A.shape[1]
v_idx = range(q)
x_idx = range(q, q+m)
y_idx = range(q+m,q+2*m)
z_idx = range(q+2*m, xi.shape[0])
prox = torch.zeros(xi.shape).to(device)
v, x, y = prox_R(xi[v_idx, :], xi[x_idx, :], xi[y_idx, :], A, Gamma)
prox[v_idx, :] = v
prox[x_idx, :] = x
prox[y_idx, :] = y
prox[z_idx, :] = prox_H(xi[z_idx, :], w, d, delta, geo_indices, cfmm_types)
return prox
PyTests¶
Various tests are used to check aspects of the above functions (mimicking what one usually encodes as pytests in a .py file).
def get_block_diag(mats):
''' Create block diagonal matrix from list of tensors.
'''
assert isinstance(mats, list)
assert all([torch.is_tensor(entry) for _, entry in enumerate(mats)])
mat = mats[0].to(device)
for j, val in enumerate(mats):
if j > 0:
zeros_top = torch.zeros(mat.shape[0], val.shape[1]).to(device)
zeros_bot = torch.zeros(val.shape[0], mat.shape[1]).to(device)
mat_top = torch.cat((mat, zeros_top), dim=1).to(device)
mat_bot = torch.cat((zeros_bot, val.to(device)), dim=1).to(device)
mat = torch.cat((mat_top, mat_bot), dim=0).to(device)
return mat
def get_cfmm_indices(cfmm_sizes):
    ''' Build a list of index ranges, one per CFMM, from the block sizes.
    '''
n_cfmms = len(cfmm_sizes)
idx_lo = 0
cfmm_indices = []
for j in range(n_cfmms):
idx_hi = idx_lo + cfmm_sizes[j]
cfmm_indices.append(range(idx_lo,idx_hi))
idx_lo = idx_hi
n_size = sum(cfmm_sizes)
assert idx_hi == n_size
return cfmm_indices
import random
import numpy as np
torch.manual_seed(42)
random.seed(42)
def test_apply_DYS():
''' Apply DYS to toy problem and compare output to true solution.
Here we consider the 1D problem with
f(xi) = indicator for [1, 10], g(xi) = -6 * x, h(xi) = x^2 + 9.
Combining these gives the overall problem
min (x - 3)^2 s.t. 1 <= x <= 10,
and xi is initialized to 15.
'''
tol_err = 1.0e-3
def prox_f_toy(xi, scalar):
return torch.clamp(xi, min=1.0, max=10.0)
def prox_g_toy(xi, scalar):
return xi + 6.0 * scalar
def grad_h_toy(xi):
return 2.0 * xi
L = 2.0
alpha = 0.5 / L
xi_init = 15.0 * torch.ones(1, 1)
xi_opt = 3.0 * torch.ones(1, 1)
xi_sol, _ = apply_DYS(prox_f_toy, prox_g_toy, grad_h_toy, xi_init, alpha)
err = torch.norm(xi_sol - xi_opt)
assert err <= tol_err
def test_prox_H():
    ''' Check that the projection of a batch of vectors onto the product of
    hyperplanes is feasible.
    The proximal is feasible if the linear equation is satisfied (to
    numerical precision) for CFMMs defined using weighted geometric means.
    For the remaining arithmetic-mean CFMMs, the proximal should act as
    the identity.
    '''
n_batches = 20
n_cfmms = 10
tol_feas = 1.0e-6
cfmm_sizes = [random.randint(2, 10) for _ in range(n_cfmms)]
cfmm_indices = get_cfmm_indices(cfmm_sizes)
n_size = sum(cfmm_sizes)
v = torch.randn(n_size, n_batches)
w = torch.rand(n_size, 1)
d = torch.rand(n_size, 1)
delta = 0.05 * torch.rand(n_cfmms, 1)
cfmm_types = [torch.randint(2, (1,)) for _ in range(n_cfmms)]
prox = prox_H(v, w, d, delta, cfmm_indices, cfmm_types)
feasible = []
for j, idx in enumerate(cfmm_indices):
cfmm_geometric = bool(cfmm_types[j].numpy())
if cfmm_geometric:
wt = w[idx, :].permute(1,0)
b = torch.log(1 + delta[j]) + torch.mm(wt, torch.log(d[idx, :]))
res = torch.mm(wt, prox[idx, :]) - b
rel_res = torch.mean(res / torch.norm(prox[idx, :], dim=0))
feasible.append(bool(rel_res <= tol_feas))
else:
rel_res = torch.mean(torch.norm(prox[idx, :] - v[idx, :], dim=0))
feasible.append(bool(rel_res <= tol_feas))
assert all(feasible)
def test_prox_A():
    ''' Check that the projection of a batch of vectors onto the product of
    halfspaces is feasible.
    The proximal is feasible if the linear inequality is satisfied (to
    numerical precision) for CFMMs defined using weighted arithmetic means.
    For the remaining geometric-mean CFMMs, the proximal should act as
    the identity.
    '''
n_batches = 20
n_cfmms = 10
tol_feas = 1.0e-6
cfmm_sizes = [random.randint(2, 10) for _ in range(n_cfmms)]
cfmm_indices = get_cfmm_indices(cfmm_sizes)
n_size = sum(cfmm_sizes)
v = torch.randn(n_size, n_batches).to(device)
w = torch.rand(n_size, 1).to(device)
d = torch.rand(n_size, 1).to(device)
delta = 0.05 * torch.rand(n_cfmms, 1).to(device)
cfmm_types = [torch.randint(2, (1,)).to(device) for _ in range(n_cfmms)]
prox = prox_A(v, w, d, delta, cfmm_indices, cfmm_types)
feasible = []
for j, idx in enumerate(cfmm_indices):
cfmm_arithmetic = not bool(cfmm_types[j].to('cpu').numpy())
if cfmm_arithmetic:
wt = w[idx, :].permute(1,0)
b = delta[j] * torch.mm(wt, d[idx, :])
res = torch.clamp(torch.mm(wt, prox[idx, :]) - b, max=0.0)
rel_res = torch.mean(res / torch.norm(prox[idx, :], dim=0))
feasible.append(bool(rel_res <= tol_feas))
else:
rel_res = torch.mean(torch.norm(prox[idx, :] - v[idx, :], dim=0))
feasible.append(bool(rel_res <= tol_feas))
assert all(feasible)
def test_prox_R():
    ''' Check if projection onto {v = Gamma * A * x - A * y} is feasible.
This test requires the creation of a few mock CFMMs, and so matrices
A from the global coordinates to the local CFMM coordinates are needed.
First the number of tokens in each CFMM is randomly chosen between 2 and
the global size. Having this, a binary matrix A is constructed using
Bernoulli distributions (violating actual A structure, but close).
'''
tol_feas = 1.0e-6
n_batches = 20
n_cfmms = 10
n_tokens = 10
cfmm_sizes = [random.randint(2, n_tokens) for _ in range(n_cfmms)]
cfmm_indices = get_cfmm_indices(cfmm_sizes)
n_size = sum(cfmm_sizes)
B = [torch.rand(cfmm_sizes[j], n_tokens) for j in range(n_cfmms)]
A = [torch.bernoulli(B[j]).to(device) for j in range(n_cfmms)]
A = get_block_diag(A).to(device)
Gamma = [0.97 * torch.eye(cfmm_sizes[j]) for j in range(n_cfmms)]
Gamma = get_block_diag(Gamma).to(device)
v = torch.randn(n_size, n_batches).to(device)
x = torch.randn(n_cfmms * n_tokens, n_batches).to(device)
y = torch.randn(n_cfmms * n_tokens, n_batches).to(device)
prox_v, prox_x, prox_y = prox_R(v, x, y, A, Gamma)
GAx = torch.mm(Gamma, torch.mm(A, prox_x))
Ay = torch.mm(A,prox_y)
res = torch.norm(prox_v - (GAx - Ay), dim=0)
rel_res = torch.mean(res / torch.norm(prox_v, dim=0))
feasible = bool(rel_res <= tol_feas)
assert feasible
def test_prox_P():
    ''' Check whether the projection satisfies the optimality condition.
    On entries where the constraint was violated, the projection should
    satisfy (to tolerance) 0 = (pv + d)(pv - v) + log(pv + d) - z.
'''
n_batches = 40
n_size = 7
tol_feas = 1.0e-2
v = torch.randn(n_size, n_batches)
z = torch.randn(n_size, n_batches)
d = torch.randn(n_size, n_batches)
prox_v, prox_z = prox_P(v, z, d)
indices = v + d < torch.exp(z)
pv = prox_v[indices]
vv = v[indices]
zz = z[indices]
dd = d[indices]
res = (pv + dd) * (pv - vv) + torch.log(pv + dd) - zz
res = torch.mean(torch.abs(res))
feasible = bool(res <= tol_feas)
assert feasible
test_apply_DYS()
test_prox_A()
test_prox_H()
test_prox_P()
test_prox_R()
print('All Pytests pass.')
Pytorch Model¶
import torch.nn as nn
class CfmmTradeModel(nn.Module):
''' Implicit L2O model for trading cryptoassets.
'''
    def __init__(self, w, A, p, cfmm_types, delta, Gamma, geo_indices):
super(CfmmTradeModel, self).__init__()
self.A = A.to(device)
self.AA = torch.mm(self.A.permute(1, 0), self.A)
self.W = nn.Parameter(torch.rand(A.shape[0], A.shape[0]).to(device))
self.mat1 = nn.Parameter(torch.randn(A.shape[0], A.shape[0]).to(device))
self.mat2 = nn.Parameter(torch.randn(A.shape[0], A.shape[0]).to(device))
self.bias1 = nn.Parameter(torch.randn(A.shape[0],1).to(device))
self.bias2 = nn.Parameter(torch.randn(A.shape[0],1).to(device))
self.leaky_relu = nn.LeakyReLU(0.1)
self.tanh = nn.Tanh()
self.relu = nn.ReLU()
self.w = w.to(device)
        # Learn delta through an exponential parameterization (keeps it positive)
        self.delta = nn.Parameter(-3 * torch.ones(delta.shape).to(device))
self.L = None
self.psi_mean = 0
self.cfmm_types = cfmm_types
self.Gamma = Gamma.to(device)
self.p = p.to(device)
self.geo_indices = geo_indices
self.local_size = self.A.shape[0]
self.global_size = self.A.shape[1]
    def device(self) -> torch.device:
        return next(self.parameters()).data.device
    def psi(self, d):
        ''' Give the weight for the risk term in the utility.
        Note: A two-layer network with a tanh hidden activation and a
        sigmoid output, so psi(d) takes values in (0, 1); the bounded
        output keeps the risk term well-scaled.
        '''
sigmoid = nn.Sigmoid()
risk_weight = torch.mm(self.mat1, d) + self.bias1
risk_weight = self.tanh(torch.mm(self.mat2, risk_weight) + self.bias2)
return sigmoid(risk_weight).to(device)
    def zero_weights(self):
        ''' Zero out entries of W outside the per-CFMM diagonal blocks. '''
W_zero = torch.zeros(self.W.data.shape).to(device)
idx_lo = 0
for j, n_tokens_loc in enumerate(self.geo_indices):
idx_hi = idx_lo + len(n_tokens_loc)
W_zero[idx_lo:idx_hi, idx_lo:idx_hi] = self.W.data[idx_lo:idx_hi,
idx_lo:idx_hi]
idx_lo = idx_hi
self.W.data = W_zero
def forward(self, d, fxd_pt_tol=1.0e-3, max_iters=int(2.0e3),
verbose=False, return_obj=False, analytic=False):
        ''' Apply Davis-Yin Splitting to minimize f + g + h.
        Convergence is measured by fixed point residual.
        Note: Assumes input d has shape [reserves_size, batch_size].
        The value self.L below is an upper bound (not the exact value)
        of the Lipschitz constant of grad h.
        '''
assert torch.is_tensor(d)
self.zero_weights()
# local coordinate sizes
v = torch.zeros(self.local_size, d.shape[1]).to(device)
z = torch.zeros(self.local_size, d.shape[1]).to(device)
# global coordinate sizes
x = torch.zeros(self.global_size, d.shape[1]).to(device)
y = torch.zeros(self.global_size, d.shape[1]).to(device)
zeta = torch.cat((v, x, y, z), dim=0)
# delta = torch.exp(self.delta_scale * self.delta)
delta = torch.exp(self.delta)
def prox_f_m(xi, alpha=None):
return prox_f_model(xi, d, self.w, delta, self.geo_indices,
self.cfmm_types)
def prox_g_m(xi, alpha=None):
return prox_g_model(xi, d, self.w, delta, self.geo_indices,
self.cfmm_types, self.A,
self.Gamma)
        # With analytic=True the risk term is disabled, recovering the
        # noise-free utility; the learned self.psi(d) could be used here instead.
        psi = torch.ones(1).to(device) if not analytic else torch.zeros(1).to(device)
        W = torch.abs(self.W)
def grad_h_m(xi):
return grad_h(xi, self.p, W, self.A,
psi,
len(self.cfmm_types))
WA = torch.mm(W, self.A).to(device)
AWWA = torch.mm(WA.permute(1,0), WA).to(device)
        # Upper bound on the Lipschitz constant of grad h
        self.L = 1.0e-1 + torch.norm(AWWA) * torch.max(psi)
        alpha = 1.99 / self.L
        with torch.no_grad():
            # Run DYS to (near) convergence without tracking gradients (JFB)
xi, zeta = apply_DYS(prox_f_m, prox_g_m, grad_h_m, zeta, alpha,
fxd_pt_tol=fxd_pt_tol, max_iters=max_iters,
verbose=verbose)
# Attach gradients for final iterations
xi, _ = apply_DYS(prox_f_m, prox_g_m, grad_h_m, zeta, alpha,
fxd_pt_tol=fxd_pt_tol, max_iters=1,
verbose=verbose)
        q = self.A.shape[0]
        m = self.A.shape[1]
x_idx = range(q, q+m)
y_idx = range(q+m,q+2*m)
x = xi[x_idx, :]
y = xi[y_idx, :]
# Post conditions
tender_nonnegative = (x >= 0).to('cpu').numpy().all()
receive_nonnegative = (y >= 0).to('cpu').numpy().all()
assert tender_nonnegative
assert receive_nonnegative
        check_violation(x, y, d, self.w, self.A, self.Gamma, self.geo_indices,
                        self.cfmm_types, verbose=verbose,
                        tol_err=1.0e-1)
return x.permute(1,0), y.permute(1,0)
Toy Crypto Trade Problem¶
Check trade violation¶
This function is used to check whether a given trade is "valid".
def check_violation(x, y, d, w, A, Gamma, geo_indices, cfmm_type,
                    tol_err=1.0e-2, verbose=False):
    ''' Assert that the trade (x, y) does not decrease any CFMM's constant
        function value by more than tol_err (relative).
    '''
    assert torch.is_tensor(x)
assert torch.is_tensor(y)
assert torch.is_tensor(d)
assert torch.is_tensor(w)
assert torch.is_tensor(A)
assert torch.is_tensor(Gamma)
violation = []
num_samples = x.shape[1]
for s in range(num_samples):
for i, j in enumerate(geo_indices):
geometric = cfmm_type[i]
G = Gamma[j[0]:(1+j[-1]), j[0]:(1+j[-1])]
if geometric:
v = torch.log(d[j, :] + torch.mm(G, torch.mm(A[j, :], x)) - torch.mm(A[j, :], y))
new_val = torch.mm(w[j, :].permute(1,0), v).to('cpu').detach().numpy()[0][s]
ref_val = torch.mm(w[j, :].permute(1,0), torch.log(d[j, :])).to('cpu').detach().numpy()[0][s]
else:
v = d[j] + torch.mm(G, torch.mm(A[j, :], x)) - torch.mm(A[j, :], y)
new_val = torch.mm(w[j, :].permute(1,0), v).to('cpu').detach().numpy()[0][s]
ref_val = torch.mm(w[j, :].permute(1,0), d[j]).to('cpu').detach().numpy()[0][s]
err = ref_val - new_val
if verbose:
msg = 'Rel error in phi[{:d}](trade) = {:0.4f}%'
print(msg.format(i, 100.0 * err / ref_val))
assert err <= tol_err * ref_val
(First we include a snippet of code for printing trades)
def print_trade(x, y):
for j, _ in enumerate(x):
print('Trades with CFMM', j)
for i in range(x[j].shape[0]):
val = float(x[j][i].numpy())
print('Tendered {:.2f} of token {:d}.'.format(val, i))
for i in range(y[j].shape[0]):
val = float(y[j][i].numpy())
print('Received {:.2f} of token {:d}.'.format(val, i))
Toy Problem¶
The following snippet is used to show the implementation of the model (without training and without noise).
# Tolerances for each CFMM
deltas = 1.0e-2 * torch.ones(5,1).to(device)
cfmm_sizes = [3, 2, 2, 2, 2]
geo_indices = get_cfmm_indices(cfmm_sizes)
# Trade fee parameters in each CFMM
gammas = [0.98, 0.99, 0.98, 0.99, 0.98]
Gamma = [gamma * torch.eye(cfmm_sizes[j]).to(device)
for j, gamma in enumerate(gammas)]
Gamma = get_block_diag(Gamma).to(device)
# Weights for the modified means in each CFMM
w1 = torch.tensor([[0.2], [0.3], [0.5]]).to(device)
w2 = torch.tensor([[0.4], [0.6]]).to(device)
w3 = torch.tensor([[0.5], [0.5]]).to(device)
w4 = torch.tensor([[0.6], [0.4]]).to(device)
w5 = torch.tensor([[0.6], [0.4]]).to(device)
w = torch.cat((w1, w2, w3, w4, w5), dim=0).to(device)
# Label for which CFMMs uses geometric means (otherwise use arithmetic)
cfmm_types = [True, True, True, True, False]
cfmm_types = [float(cfmm_type) * torch.ones(1).to(device) for cfmm_type in cfmm_types]
# Transformations from global to local coordinates of each CFMM
A1 = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]).to(device)
A2 = torch.tensor([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]).to(device)
A3 = torch.tensor([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]).to(device)
A4 = torch.tensor([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]).to(device)
A5 = torch.tensor([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]).to(device)
A = [A1, A2, A3, A4, A5]
A = get_block_diag(A).to(device)
print('A = ', A)
print('A.shape = ', A.shape)
# Define intrinsic value of assets
p = torch.tensor([[1.0], [1.0], [1.0]]).to(device)
# Trade model for CFMM network
the_model = CfmmTradeModel(w, A, p, cfmm_types, deltas, Gamma, geo_indices)
print('model loaded successfully')
# Sample data for reserves in CFMM network
d1 = torch.tensor([[5.0], [7.3], [12.5]]).to(device)
d2 = torch.tensor([[10.0], [4.7]]).to(device)
d3 = torch.tensor([[5.5], [7.4]]).to(device)
d4 = torch.tensor([[4.6], [14.9]]).to(device)
d5 = torch.tensor([[7.6], [5.4]]).to(device)
d = torch.cat((d1, d2, d3, d4, d5), dim=0).to(device)
# Optimal trade for provided data
x, y = the_model(d, verbose=True)
print('x = ', x)
print('y = ', y)
Price Generation¶
At a no-arbitrage state, the price ratio of two tokens matches the ratio of partial derivatives of the CFMM's constant function $\phi$ at the reserves:
$$ \dfrac{p_j}{p_k} = \dfrac{\nabla_j \phi(d)}{\nabla_k \phi(d)}.$$
For geometric CFMMs,
$$ \nabla_\ell \phi(d) = \dfrac{w_\ell}{d_\ell} \cdot \phi(d),$$
which implies
$$ \dfrac{p_j}{p_k} = \dfrac{w_j / d_j}{w_k / d_k} = \dfrac{w_j}{w_k}\cdot \dfrac{d_k}{d_j}.$$
Thus, for known prices $p_j$ and $p_k$, weights $w_j$ and $w_k$, and reserves $d_k$, we can find the amount of reserves for the $j$-th token; namely,
$$ d_j = \dfrac{w_j p_k d_k}{w_k p_j}.$$
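A small numeric instance of this formula (made-up prices and weights):
# Given p = (1.0, 2.0), w = (0.4, 0.6), and d_k = 10, solve for d_j
w_j, w_k = 0.4, 0.6
p_j, p_k = 1.0, 2.0
d_k = 10.0
d_j = (w_j * p_k * d_k) / (w_k * p_j)
print(d_j)                            # 13.33...
print((w_j / d_j) / (w_k / d_k))      # recovers p_j / p_k = 0.5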
import json
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.dataset import random_split
rel_token_vals_ave = torch.tensor([[1.0], [1.9], [0.5]])
res_ave = [50.0, 30.0, 23.5, 35.7, 46.9]
the_model.p = rel_token_vals_ave
def get_token_val_sample(rel_vals, var):
sample = torch.tensor([[(1.0 + var * torch.randn(1)) * val.numpy()]
for _, val in enumerate(rel_vals)])
return sample
def create_loader(A, n_tokens, cfmm_indices, train_size=5000, test_size=500,
train_batch_size=100, test_batch_size=500):
var_per = 1.0e-1
torch.manual_seed(2022)
d = []
batch_size = train_size + test_size
for j, idx in enumerate(cfmm_indices):
A_block = A[idx, j*n_tokens:(j+1)*n_tokens]
loc_rel_token_vals = torch.mm(A_block.to('cpu'), rel_token_vals_ave)
rel_token_vals_sample = loc_rel_token_vals
for k in range(batch_size):
new_sample = get_token_val_sample(loc_rel_token_vals, var_per)
if k > 0:
rel_token_vals_sample = torch.cat((rel_token_vals_sample, new_sample), dim=1)
else:
rel_token_vals_sample = new_sample
num_tokens = len(idx)
reserves = torch.zeros(num_tokens, batch_size)
for t in range(num_tokens):
res_samples = (1.0 + var_per * torch.randn(batch_size)) * res_ave[j]
reserves[t, :] = float(w[idx, :][t]) * rel_token_vals_sample[0, :]
reserves[t, :] *= res_samples
reserves[t, :] /= (float(w[idx, :][0]) * rel_token_vals_sample[t, :])
d.append(reserves)
d_ten = d[0]
for idx, _ in enumerate(d):
if idx > 0:
d_ten = torch.cat((d_ten, d[idx]), dim=0)
d_ten = d_ten.permute(1,0)
# --------------------------------------------------------------------------
# Analytic predictions with no noise
# --------------------------------------------------------------------------
# anal_model = CfmmTradeModel(w, A, p, cfmm_types, deltas, Gamma, geo_indices)
# anal_model.eval()
# with torch.no_grad():
# for j, val in enumerate(anal_model.W):
# anal_model.W[j][0]= torch.zeros(val[0].shape)
# x_true, y_true = anal_model(d_ten.permute(1,0).to(device),
# max_iters=int(1.0e4))
# x_true = x_true.detach()
# y_true = y_true.detach()
# --------------------------------------------------------------------------
# --------------------------------------------------------------------------
std = 1.0e-3 * torch.rand(d_ten.shape[0], 1)
std = std * torch.ones(d_ten.shape)
noise = torch.normal(mean=torch.zeros(d_ten.shape), std=std)
d_obs = (1.0 + noise) * d_ten
dataset = TensorDataset(d_obs.to(device), d_ten.to(device))
# dataset = TensorDataset(d_obs.to(device), d_ten.to(device), x_true.to(device), y_true.to(device))
train_data, test_data = random_split(dataset, [train_size,
test_size])
loader_train = DataLoader(dataset=train_data,
batch_size=train_batch_size, shuffle=True)
loader_test = DataLoader(dataset=test_data,
batch_size=test_batch_size, shuffle=False)
return loader_train, loader_test
def convert_to_tensor(x_list):
    x_ten = x_list[0]
for idx, val in enumerate(x_list):
if idx > 0:
x_ten = torch.cat((x_ten, val), dim=0)
x_ten = x_ten.permute(1,0)
return x_ten
Correct Trade Errors¶
Define
$$ \theta(\tau) \triangleq \phi(d + \Gamma A x - A(y - \tau p)) - \phi(d).$$
We seek $\tau$ with $\theta(\tau) = 0$ and apply Newton's method:
$$ \tau_{k+1} = \tau_k - \dfrac{ \theta(\tau_k)}{\theta'(\tau_k)}.$$
Let $z = d + \Gamma A x - A y$. For geometric CFMMs, we use
$$ \tilde{\phi}(v) \triangleq \left< w, \ln(v)\right>$$
so that
$$ \theta'(\tau) = \left< \nabla \tilde{\phi}(z + \tau A p),A p\right> = \left<\frac{w}{z+\tau A p}, A p\right>,$$
where the quotient is taken elementwise.
Meanwhile, for weighted arithmetic mean CFMMs,
$$ \tilde{\phi}(v) \triangleq \left< w, v\right>,$$
for which
$$ \theta'(\tau) = \left< \nabla \tilde{\phi}(z + \tau A p), Ap\right> = \left<w, Ap\right>.$$
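Since each $\theta$ is monotone increasing in $\tau$, a simple bisection is a robust alternative to the Newton updates below (a minimal sketch; `bisect_root` is a hypothetical helper, not from the original notebook):
def bisect_root(theta, tau_lo=0.0, tau_hi=1.0, iters=50):
    ''' Find tau with theta(tau) ~ 0 for monotone increasing theta,
        assuming theta(tau_lo) <= 0 <= theta(tau_hi).
    '''
    for _ in range(iters):
        tau_mid = 0.5 * (tau_lo + tau_hi)
        if theta(tau_mid) < 0.0:
            tau_lo = tau_mid
        else:
            tau_hi = tau_mid
    return 0.5 * (tau_lo + tau_hi)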
# TODO: replace the Newton updates below with the bisection method sketched above
def correct_trade(x, y, d, p, A, Gamma, ww, geo_indices, cfmm_type, n_tokens, newton_iters=200):
    y_correct = torch.zeros(y.shape).to(device)
for j, idx in enumerate(geo_indices):
        tau = 0.0
        for iter in range(newton_iters):
geometric = cfmm_type[j]
G = Gamma[idx[0]:(1+idx[-1]), idx[0]:(1+idx[-1])]
if geometric:
ytp = y[n_tokens*j:(j+1)*n_tokens, :] - tau * p
ytp = ytp.to(device)
Aytp = torch.mm(A[idx, n_tokens*j:(j+1)*n_tokens], ytp).to(device)
Ax = torch.mm(A[idx, :], x).to(device)
GAxAytp = (torch.mm(G, Ax) - Aytp).to(device)
d_block = d[idx, :].to(device)
v = torch.log(d_block + GAxAytp).to(device)
v = torch.clamp(v, min=1.0e-6).to(device)
phi_z = torch.mm(ww[idx, :].permute(1,0), v)[0]
phi_d = torch.mm(ww[idx, :].permute(1,0), torch.log(d[idx, :]))[0]
theta_z = phi_z - phi_d
dot_product = torch.mm(torch.mm(p.permute(1,0), A[idx, n_tokens*j:(j+1)*n_tokens].permute(1,0)), ww[idx, :] / v)
if iter < 20:
tau = tau - (0.05 / float(iter + 1)) * theta_z
else:
tau = tau - theta_z / (phi_z * dot_product)
else:
v = d[idx, :] + torch.mm(G, torch.mm(A[idx, :], x)) - torch.mm(A[idx, n_tokens*j:(j+1)*n_tokens], y[n_tokens*j:(j+1)*n_tokens, :] - tau * p)
theta_z = torch.mm(ww[idx, :].permute(1,0), v - d[idx, :])
if iter < 20:
tau = tau - (0.05 / float(iter + 1)) * theta_z
else:
tau = tau - theta_z / torch.mm(ww[idx, :].permute(1,0), torch.mm(A[idx, n_tokens*j:(j+1)*n_tokens], p))
y_block = y[n_tokens*j:(j+1)*n_tokens, :] - tau * p
y_correct[n_tokens*j:(j+1)*n_tokens, :] = y_block
    check_violation(x, y_correct, d, ww, A, Gamma, geo_indices, cfmm_type)
return x, y_correct
def compute_energy(d, ww, A, geo_indices, cfmm_type):
    ''' Compute the constant-function value of each CFMM at reserves d
        (one row per CFMM, one column per sample).
    '''
    energy = torch.zeros(len(geo_indices), d.shape[1]).to(device)
    for j, idx in enumerate(geo_indices):
        geometric = cfmm_type[j]
        if geometric:
            term = torch.mm(ww[idx, :].permute(1,0), torch.log(d[idx, :]))
            energy[j, :] += torch.exp(term)[0]
        else:
            energy[j, :] += torch.mm(ww[idx, :].permute(1,0), d[idx, :])[0]
    return energy
return energy
def get_violation(x, y, d, w, A, Gamma, geo_indices, cfmm_type):
relu = nn.ReLU()
energy_new = compute_energy(d + torch.mm(Gamma, torch.mm(A, x)) - torch.mm(A,y), w, A, geo_indices, cfmm_type)
energy_old = compute_energy(d, w, A, geo_indices, cfmm_type)
error = relu(energy_old - energy_new)
error_mean = torch.sum(torch.mean(error, dim=1))
error_rel = torch.sum(torch.mean(error, dim=1)) / torch.sum(torch.mean(energy_old))
return error_mean, error_rel
def get_list_from_ten(d, w):
d_list = []
idx_lo = 0
for j, _ in enumerate(w):
idx_hi = idx_lo + w[j].shape[0]
d_list.append(d[:, idx_lo:idx_hi].permute(1,0))
idx_lo = idx_hi
return d_list
def get_ten_from_list(d):
    d_ten = d[0]
    for j, val in enumerate(d):
        if j > 0:
            d_ten = torch.cat((d_ten, val), dim=0)
    return d_ten
def utility(x, y, p):
n_cfmms = int(x.shape[0] / p.shape[0])
obj = torch.zeros(1, x.shape[1]).to(device)
for j in range(n_cfmms):
yx = (y - x).to(device)
obj = obj + torch.mm(p.permute(1,0), yx[j*p.shape[0]:(j+1)*p.shape[0]])
return obj
Train Model¶
from prettytable import PrettyTable
n_tokens = 3
loader_train, loader_test = create_loader(A, n_tokens, geo_indices)
num_epochs = int(1.0e3)
learning_rate = 5.0e-3
optimizer = torch.optim.Adam(the_model.parameters(), lr=learning_rate)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5,
gamma=0.98)
training_msg = '[{:4d}] Constraint Error = {:6.2f}%'
training_msg += '| loss = {:8.1e} | pred util = {:8.1e} | fro-norm = {:2.2e} '
training_msg += '| psi = {:2.3e} | lr = {:2.2e}'
MSE = nn.MSELoss()
def model_params(net):
table = PrettyTable(["Network Component", "# Parameters"])
num_params = 0
for name, parameter in net.named_parameters():
if not parameter.requires_grad:
continue
table.add_row([name, parameter.numel()])
num_params += parameter.numel()
table.add_row(['TOTAL', num_params])
return table
print(model_params(the_model))
loss_ave = 0.0
obj_ave = 0.0
the_model.to(device)
for epoch in range(num_epochs):
for d_obs, d_true in loader_train:
optimizer.zero_grad()
the_model.train()
x, y = the_model(d_obs.permute(1,0).to(device), max_iters=int(1.0e3))
obj = torch.mean(utility(x.permute(1,0), y.permute(1,0), the_model.p.to(device)))
viol, rel_viol = get_violation(x.permute(1,0), y.permute(1,0),
d_true.permute(1,0), w, A, Gamma,
geo_indices, cfmm_types)
loss = 1.0e-1 * (viol ** 2) - obj
loss.backward()
optimizer.step()
loss_curr = loss.detach().item()
loss_ave *= 0.9
loss_ave += 0.1 * loss_curr if loss_ave != 0 else loss_curr
W = torch.abs(the_model.W.data.clone())
print(training_msg.format(epoch, 100 * rel_viol, loss_ave, obj,
torch.norm(W),
the_model.psi_mean,
optimizer.param_groups[0]['lr']
))
print('delta = ', the_model.delta)
lr_scheduler.step()
print(the_model.delta)
Print a heatmap of the learned weight matrix $W$.
from matplotlib import pyplot as plt
W = the_model.W.data.clone()
W = torch.abs(W)  # match the |W| used in the forward pass
plt.imshow(W.to('cpu').detach().numpy(), interpolation='nearest')
plt.colorbar()
plt.show()
print(the_model.W.data.shape)
The analytic model (no risk correction) is obtained by running the model with `analytic=True`, which zeros out the risk term; equivalently, one could load a copy of the model with the weights $W$ set to zero:
# anal_model = CfmmTradeModel(w, A, p, cfmm_types, deltas, Gamma, geo_indices)
# with torch.no_grad():
# for j, val in enumerate(anal_model.W):
# anal_model.W[j][0]= torch.zeros(val[0].shape)
Testing Results¶
for d_obs, d_true in loader_test:
the_model.eval()
x_l2o, y_l2o = the_model(d_obs.permute(1,0).to(device), max_iters=int(1e3))
x_anal, y_anal = the_model(d_obs.permute(1,0).to(device), max_iters=int(2e3),
analytic=True)
# x_anal, y_anal = anal_model(d_obs.permute(1,0).to(device), max_iters=int(1e3))
# d_true = get_list_from_ten(d_true, w)
# x_l2o_corr, y_l2o_corr = correct_trade(x_l2o.permute(1,0),
# y_l2o.permute(1,0),
# d_true.permute(1,0),
# the_model.p.to(device), A, Gamma,
# w, geo_indices, cfmm_types,
# n_tokens)
# x_anal_corr, y_anal_corr = correct_trade(x_anal.permute(1,0),
# y_anal.permute(1,0),
# d_true.permute(1,0),
# the_model.p.to(device), A, Gamma,
# w, geo_indices, cfmm_types,
# n_tokens)
Plot Data for Figure 10 in paper¶
def check_constraints(x, y, d, w, A, Gamma, geo_indices, cfmm_type):
    ''' Return per-CFMM, per-sample constraint violations (energy decrease). '''
    relu = nn.ReLU()
energy_new = compute_energy(d + torch.mm(Gamma, torch.mm(A, x)) - torch.mm(A,y), w, A, geo_indices, cfmm_type)
energy_old = compute_energy(d, w, A, geo_indices, cfmm_type)
constraint_viol = relu(energy_old - energy_new)
return constraint_viol
util_l2o = utility(x_l2o.to(device).permute(1,0), y_l2o.to(device).permute(1,0), the_model.p.to(device))
print('util_l2o = ', util_l2o)
constraint_viol = check_constraints(x_l2o.permute(1,0), y_l2o.permute(1,0),
d_true.permute(1,0), w, A, Gamma,
geo_indices, cfmm_types)
constraint_viol = torch.sum(constraint_viol, dim=0, keepdim=True)
util_l2o[constraint_viol > 0] = 0.0 * util_l2o[constraint_viol > 0]
print('util_l2o = ', torch.mean(util_l2o))
util_anal = utility(x_anal.permute(1,0), y_anal.permute(1,0), the_model.p.to(device))
constraint_viol = check_constraints(x_anal.permute(1,0), y_anal.permute(1,0),
d_true.permute(1,0), w, A, Gamma,
geo_indices, cfmm_types)
constraint_viol = torch.sum(constraint_viol, dim=0, keepdim=True)
print('pre_anal = ', util_anal)
util_anal[constraint_viol > 0] = 0.0 * util_anal[constraint_viol > 0]
print('util_anal = ', torch.mean(util_anal))