How to use an exponential transformation in Python

If you have a set of data points that look like they’re increasing rapidly, it might be useful to fit them with a smooth, exponentially increasing line in order to describe the general shape of the data.

The line that you need to fit in order to achieve this shape will be one that is described by an exponential function, that is, any function of the form:

\(y = AB^x + C\)

or

\(y = ae^{bx} + c\)

(these two are mathematically equivalent because \(AB^x = Ae^{x\ln(B)}\)). The important thing to realise is that an exponential function can be fully defined with three constants. We will use the second of these formulations, which can be written in Python as `a * np.exp(b * x) + c`, where `np.exp()` is the exponential function \(e^x\) from the Numpy package (renamed `np` in our examples).

For this tutorial, let’s create some fake data to use as an example. This should be a set of points that increase exponentially (or else our attempts to fit an exponential curve to them won’t work well!) with some random noise thrown in to mimic real-world data:

import numpy as np

# Set a seed for the random number generator so we get the same random numbers each time
np.random.seed(20210706)

# Create fake x-data
x = np.arange(10)
# Create fake y-data
a = 4.5
b = 0.5
c = 50
y = a * np.exp(b * x) + c  # Use the second formulation from above
y = y + np.random.normal(scale=np.sqrt(np.max(y)), size=len(x))  # Add noise

The random noise is being added with the `np.random.normal()` function from Numpy, which draws random samples from a normal (Gaussian) distribution. Let’s take a look at what this example data looks like on a scatter plot:

import matplotlib.pyplot as plt

# Formatting options for plots
A = 6  # Want figure to be A6
plt.rc('figure', figsize=[46.82 * .5**(.5 * A), 33.11 * .5**(.5 * A)])
plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.rc('text.latex', preamble=r'\usepackage{textgreek}')

# Create a plot
ax = plt.axes()
ax.scatter(x, y)
ax.set_title('Example Data')
ax.set_ylabel('y-Values')
ax.set_ylim(0, 500)
ax.set_xlabel('x-Values')

The first method uses `np.polyfit()` from Numpy, and it only works when \(c = 0\), ie when you want to fit a curve with equation \(y = ae^{bx}\) to your data (taking the logarithm of \(ae^{bx} + c\) does not give something linear in \(x\) unless \(c = 0\)). If you want to fit a curve with equation \(y = ae^{bx} + c\) with \(c \neq 0\) you will need to use the second method, `curve_fit()` from Scipy, which is covered further down.

The `np.polyfit()` command from Numpy is used to fit a polynomial function to data. This might seem a little strange: why are we trying to fit a polynomial function to the data when we want to fit an exponential function? The answer is that we can convert an exponential function into a polynomial one using the fact that:

\(y = ae^{bx} \implies \ln(y) = \ln(a) + bx\)

because we can take the natural logarithm of both sides. This creates a linear equation \(f(x) = mx + c\) where:

  • \(f(x) = \ln(y)\)
  • \(m = b\)
  • \(c = \ln(a)\)

So `np.polyfit()` can be used to fit \(\ln(y)\) against \(x\):

import numpy as np

# Set a seed for the random number generator so we get the same random numbers each time
np.random.seed(20210706)

# Create fake x-data
x = np.arange(10)
# Create fake y-data
a = 4.5
b = 0.5
c = 0
y = a * np.exp(b * x) + c  # Use the second formulation from above
y = y + np.random.normal(scale=np.sqrt(np.max(y)), size=len(x))  # Add noise

# Fit a polynomial of degree 1 (a linear function) to the data
p = np.polyfit(x, np.log(y), 1)

This polynomial can now be converted back into an exponential:

# Convert the polynomial back into an exponential
a = np.exp(p[1])
b = p[0]
x_fitted = np.linspace(np.min(x), np.max(x), 100)
y_fitted = a * np.exp(b * x_fitted)

Let’s take a look at the fit:

import matplotlib.pyplot as plt

ax = plt.axes()
ax.scatter(x, y, label='Raw data')
ax.plot(x_fitted, y_fitted, 'k', label='Fitted curve')
ax.set_title('Using polyfit() to fit an exponential function')
ax.set_ylabel('y-Values')
ax.set_ylim(0, 500)
ax.set_xlabel('x-Values')
ax.legend()

This method has the disadvantage of over-emphasising small values: points that have large values and which are relatively close to the linear line of best fit created by `np.polyfit()` become much further away from the line of best fit when the polynomial is converted back into an exponential. Transforming the fitted polynomial back into an exponential increases large values much more than small ones, and so it increases the distance to the fitted curve more for points with large values than for points with small values. This can be mitigated by adding a ‘weight’ proportional to \(y\): tell `np.polyfit()` to lend more importance to data points with a large y-value:

# Fit a weighted polynomial of degree 1 (a linear function) to the data
p = np.polyfit(x, np.log(y), 1, w=np.sqrt(y))

# Convert the polynomial back into an exponential
a = np.exp(p[1])
b = p[0]
x_fitted_weighted = np.linspace(np.min(x), np.max(x), 100)
y_fitted_weighted = a * np.exp(b * x_fitted_weighted)

# Plot
ax = plt.axes()
ax.scatter(x, y, label='Raw data')
ax.plot(x_fitted, y_fitted, 'k', label='Fitted curve, unweighted')
ax.plot(x_fitted_weighted, y_fitted_weighted, 'k--', label='Fitted curve, weighted')
ax.set_title('Using polyfit() to fit an exponential function')
ax.set_ylabel('y-Values')
ax.set_ylim(0, 500)
ax.set_xlabel('x-Values')
ax.legend()

Using a weight has improved the fit.

From the Scipy package we can get the `curve_fit()` function. This is more general than `np.polyfit()` (we can fit any type of function we like, exponential or not) but it’s more complicated in that we sometimes need to provide an initial guess as to what the constants could be in order for it to work.

Let’s use our original example data (with \(c \neq 0\)):

import numpy as np
import matplotlib.pyplot as plt

# Set a seed for the random number generator so we get the same random numbers each time
np.random.seed(20210706)

# Create fake x-data
x = np.arange(10)
# Create fake y-data
a = 4.5
b = 0.5
c = 50
y = a * np.exp(b * x) + c  # Use the second formulation from above
y = y + np.random.normal(scale=np.sqrt(np.max(y)), size=len(x))  # Add noise

# Create a plot
ax = plt.axes()
ax.scatter(x, y)
ax.set_title('Example Data')
ax.set_ylabel('y-Values')
ax.set_ylim(0, 500)
ax.set_xlabel('x-Values')

Now let’s fit the function \(y = ae^{bx} + c\). This is done by defining it as a lambda function (ie as an object rather than as a command) of a dummy variable \(t\) and using the `curve_fit()` function to fit this object to the x- and y-data. Note that the `curve_fit()` function needs to be imported from the `scipy.optimize` sub-package:

from scipy.optimize import curve_fit

# Fit the function a * np.exp(b * t) + c to x and y
popt, pcov = curve_fit(lambda t, a, b, c: a * np.exp(b * t) + c, x, y)

Note that we need to remove any values that are equal to zero from our y-data (and their corresponding x-values from the x-data) for this to work, although there aren’t any of these in this example data so it’s not relevant here.
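
In case your data does contain zeros, one quick way to drop them is with a Boolean mask (a sketch, not part of the original example; the `x_clean` and `y_clean` names are just illustrative):

# Keep only the points whose y-value is non-zero before fitting
mask = y != 0
x_clean = x[mask]
y_clean = y[mask]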

The first output, `popt`, is a list of the optimised values for the parameters which, in our case, are the constants \(a\), \(b\) and \(c\):

a = popt[0]
b = popt[1]
c = popt[2]
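
If you want to sanity-check these values against the constants used to create the data (4.5, 0.5 and 50) and get a rough idea of their uncertainty, the second output, `pcov`, is the covariance matrix of the fitted parameters; as an optional extra (not part of the original example), the square roots of its diagonal give their standard errors:

# Standard errors of the fitted parameters, from the diagonal of the covariance matrix
perr = np.sqrt(np.diag(pcov))
print(f'a = {a:.2f} ± {perr[0]:.2f} (true value 4.5)')
print(f'b = {b:.3f} ± {perr[1]:.3f} (true value 0.5)')
print(f'c = {c:.1f} ± {perr[2]:.1f} (true value 50)')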

Let’s see what this looks like:

# Create the fitted curve
x_fitted = np.linspace(np.min(x), np.max(x), 100)
y_fitted = a * np.exp(b * x_fitted) + c

# Plot
ax = plt.axes()
ax.scatter(x, y, label='Raw data')
ax.plot(x_fitted, y_fitted, 'k', label='Fitted curve')
ax.set_title(r'Using curve\_fit() to fit an exponential function')
ax.set_ylabel('y-Values')
ax.set_ylim(0, 500)
ax.set_xlabel('x-Values')
ax.legend()

This looks really good, and we didn’t need to provide an initial guess! This is because the example data we are using is close enough to exponential in nature that the optimisation algorithm behind `curve_fit()` could fit a curve without accidentally choosing the wrong local minimum. This won’t always be the case, so here’s how to do it with an initial guess provided:

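The starting values below are arbitrary assumptions chosen for illustration (they are not taken from the original); they are passed to `curve_fit()` via its `p0` argument:

# Pass an initial guess for a, b and c to curve_fit() via the p0 argument
p0 = [1, 0.1, 1]  # Rough starting values for the optimiser (illustrative only)
popt, pcov = curve_fit(lambda t, a, b, c: a * np.exp(b * t) + c, x, y, p0=p0)
a, b, c = popt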

Let’s plot all three methods (unweighted `np.polyfit()`, weighted `np.polyfit()` and `curve_fit()`) against one another using the same example data (\(c = 0\)) for each:

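As a minimal sketch (it re-creates the \(c = 0\) data with the same seed and redoes all three fits so everything is computed on the same points; the `y_fitted_curve_fit` name and the plot labels are new):

# Re-create the example data with c = 0 so all three methods see the same points
np.random.seed(20210706)
x = np.arange(10)
y = 4.5 * np.exp(0.5 * x)
y = y + np.random.normal(scale=np.sqrt(np.max(y)), size=len(x))

# Method 1: unweighted polyfit() on the log-transformed data
p = np.polyfit(x, np.log(y), 1)
x_fitted = np.linspace(np.min(x), np.max(x), 100)
y_fitted = np.exp(p[1]) * np.exp(p[0] * x_fitted)

# Method 2: weighted polyfit()
p_weighted = np.polyfit(x, np.log(y), 1, w=np.sqrt(y))
y_fitted_weighted = np.exp(p_weighted[1]) * np.exp(p_weighted[0] * x_fitted)

# Method 3: curve_fit()
popt, pcov = curve_fit(lambda t, a, b, c: a * np.exp(b * t) + c, x, y)
y_fitted_curve_fit = popt[0] * np.exp(popt[1] * x_fitted) + popt[2]

# Plot the raw data and the three fitted curves together
ax = plt.axes()
ax.scatter(x, y, label='Raw data')
ax.plot(x_fitted, y_fitted, 'k', label='polyfit(), unweighted')
ax.plot(x_fitted, y_fitted_weighted, 'k--', label='polyfit(), weighted')
ax.plot(x_fitted, y_fitted_curve_fit, 'k:', label=r'curve\_fit()')
ax.set_title('Fitting an exponential function: all three methods')
ax.set_ylabel('y-Values')
ax.set_ylim(0, 500)
ax.set_xlabel('x-Values')
ax.legend()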

As you can see, the `curve_fit()` method has given us the best approximation of the true underlying exponential behaviour.

We can use the fitted curve to estimate what our data would be for other values of \(x\) that are not in our raw dataset: what would the value be at \(x=11\) (which is outside our domain and thus requires us to forecast into the future) or \(x = 8.5\) (which is inside our domain and thus requires us to ‘fill in a gap’ in our data)? To answer these questions, we simply plug these x-values as numbers into the equation of the fitted curve:
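
For example, a short sketch using the constants `a`, `b` and `c` returned by `curve_fit()` above:

# Evaluate the fitted curve at x-values that are not in the raw data
for x_new in [11, 8.5]:
    y_new = a * np.exp(b * x_new) + c
    print(f'Estimated y-value at x = {x_new}: {y_new:.1f}')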