Latin Hypercube Sampling

We are considering the function \(f:\left[0,2\right] \rightarrow \mathbb{R}\) where \(f(q)=(6q^2+3)\sin(6q-4)\). We generate \(M=15\) random samples from a \(\mathcal{U}(0,2)\) distribution and 15 samples over the same interval using Latin Hypercube sampling (thus ensuring that there is a point in each interval \(\left[\frac{2k}{15}, \frac{2k+2}{15}\right]\) for \(k \in \left\{0, 1, \dots, 14\right\}\)).

The MATLAB code used to generate the random data and response is shown below.

%%%Question 1: Uniform Vs Latin Hypercube%%%
%%%1a. Set up%%%
rng(540);                                                  %to make reproducible%
n=15;                                                       %number of samples%
l_bound=0;                                              %where the samples come from%
u_bound=2;                                             %where the samples come from%


%%%1b. Get x values%%%
u=unifrnd(l_bound,u_bound,n,1);            %n samples from uniform (0,2)% 
lh=u_bound*lhsdesign(n,1);                     %n samples from latin hypercube%
x=linspace(l_bound, u_bound, 100);        %for high-fidelity response%


%%%1c. Get y data%%%
f = @(q) (6*(q.^2)+3).*sin(6.*q-4);          %given% 
y_u=f(u);                                                  %Uniform samples% 
y_lh=f(lh);                                                %Latin hypercube samples%
y=f(x);                                                      %True Line%

We can plot the data and response with the following:

%%%1d. Plot Data%%%
figure(1);
plot(x, y, ...
    'k-', 'LineWidth', 2,...                                           %'k-' is black line%
    'DisplayName', 'True Function')
hold on
plot(lh, y_lh,...
    'b.', 'MarkerSize', 25,...                                       %'b.' is blue filled circle%
    'DisplayName', 'Latin Hypercube Samples')
plot(u, y_u,...
    'g.', 'MarkerSize', 25,...                                        %'g.' is green filled circle%
    'DisplayName', 'Uniform Samples')
hold off
xlabel('Parameter $q$', 'Interpreter', 'latex', 'FontSize', 20)
ylabel('Response $f(q)$','Interpreter', 'latex','FontSize', 20)
title('$f(q) = (6q^2+3)\sin(6q-4)$','Interpreter', 'latex', 'FontSize', 20)
legend('location', 'northwest','FontSize', 20)

Notice that our uniform samples have a tendency to clump together at certain points (e.g. there is no point in the interval \((0,\frac{2}{15})\) or in the interval \((\frac{10}{15}, \frac{12}{15})\)). This is only natural– with independent draws, uniform sampling means any one point is no more likely to be selected than any other. This stands in contrast to Latin Hypercube sampling, where successive draws ensure that “blocks” of the space are sufficiently explored.

Next, we fit an \(8^{\text{th}}\) order polynomial to the 15 data points between \(0\) and \(2\) in a provided text file \texttt{qdata.txt}. The data and fit are plotted in the Figure below over the range \(\left[-0.5,2.5\right]\) . Notice how within the sample region, the surrogate function is fairly accurate, but out of sample, the surrogate varies considerably from the true function values. This issue is magnified by the (relatively) large value of \(K\) for the small number of \(M\) (i.e. we are likely over-fitting). While smaller order surrogates will degrade model accuracy, they may perform better in terms of metrics like AIC or BIC.

%%%1e. Load in LH data and create polynomial surrogate%%%
data=load('q_data.txt');              %provided%
response=f(data);                      %the response%
p=polyfit(data, response,8);      %ceoffs c_i for c_8x^8+\cdots+c_0%

x_vals=linspace(-0.5, 2.5);         %The range%
y_vals=polyval(p, x_vals);          %The 8th order polynomial%
y_true=f(x_vals)                         %The actual function%

figure(2);
plot(x_vals, y_vals,...
    'b--', 'LineWidth', 2,...             %'b--' is blue dashed line%
    'DisplayName', '8th Order Polynomial Fit')
hold on
plot(x_vals, y_true,...
    'k-', 'LineWidth', 2,...               %'k-' is black line%
    'DisplayName', 'True Function')
plot(data, response,...
    'r.', 'MarkerSize', 25,...             %'r.' is red circle%
    'DisplayName', 'Latin Hypercube Samples')
hold off
xlabel('Parameter $q$', 'Interpreter', 'latex', 'FontSize', 20)
ylabel('Response $f(q)$','Interpreter', 'latex','FontSize', 20)
title('$f(q) = (6q^2+3)\sin(6q-4)$','Interpreter', 'latex', 'FontSize', 20)
legend('location', 'northwest','FontSize', 20)