How to dynamically create DataFrame columns in Python

DataFrame.assign returns a new object with all original columns in addition to the new ones. Existing columns that are re-assigned will be overwritten.

Parameters

**kwargs : dict of {str: callable or Series}

The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change the input DataFrame (though pandas doesn’t check it). If the values are not callable (e.g. a Series, scalar, or array), they are simply assigned.

Returns

DataFrame

A new DataFrame with the new columns in addition to all the existing columns.

Notes

Assigning multiple columns within the same assign is possible. Later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order.

Examples

>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0

Where the value is a callable, it is evaluated on df:

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence:

>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

You can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign:
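A minimal sketch of that, following the dependent-columns example from the pandas documentation (temp_k is derived from temp_f, which is created in the same call):

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]},
                  index=['Portland', 'Berkeley'])

# temp_k refers to temp_f, which is defined earlier in the same assign;
# kwargs are computed and assigned in order
result = df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
                   temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
print(result)
```

Because keyword arguments are evaluated in order, temp_f already exists by the time the temp_k callable runs.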

Create a new DataFrame by dividing the columns element-wise, rename the result with DataFrame.add_suffix, and finally append it to the original with DataFrame.join:

import pandas as pd

# sample data matching the output below
df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [6, 7, 8, 9, 2],
                   'a1': [10, 11, 12, 13, 14],
                   'b1': [20, 21, 22, 23, 24]})

cols = ['a', 'b']
new = [f'{x}1' for x in cols]

df = df.join(df[cols].div(df[new].to_numpy()).mul(100).add_suffix('%'))
print(df)

   a  b  a1  b1         a%         b%
0  1  6  10  20  10.000000  30.000000
1  2  7  11  21  18.181818  33.333333
2  3  8  12  22  25.000000  36.363636
3  4  9  13  23  30.769231  39.130435
4  5  2  14  24  35.714286   8.333333
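The same percent columns can also be built with assign, tying this back to the docstring above. Here the new column names are generated dynamically and unpacked with ** (a sketch on the same toy data; the c=c default binds each loop variable at definition time):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [6, 7, 8, 9, 2],
                   'a1': [10, 11, 12, 13, 14],
                   'b1': [20, 21, 22, 23, 24]})

cols = ['a', 'b']

# build {'a%': callable, 'b%': callable} and unpack it into assign
df = df.assign(**{f'{c}%': (lambda d, c=c: d[c] / d[f'{c}1'] * 100)
                  for c in cols})
print(df)
```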

Ever thought you could build a real-time dashboard in Python without writing a single line of HTML, CSS, or JavaScript?

Yes, you can! In this post, you’ll learn:

  1. How to import the required libraries and read input data
  2. How to do a basic dashboard setup
  3. How to design a user interface
  4. How to refresh the dashboard for real-time or live data feed
  5. How to auto-update components

Can’t wait and want to jump right in? Here's the code repo and the video tutorial.

What’s a real-time live dashboard?

A real-time live dashboard is a web app used to display Key Performance Indicators (KPIs).

If you want to build a dashboard to monitor the stock market, IoT Sensor Data, AI Model Training, or anything else with streaming data, then this tutorial is for you.


1. How to import the required libraries and read input data

Here are the libraries that you’ll need for this dashboard:

  • Streamlit (st). As you might’ve guessed, you’ll be using Streamlit for building the web app/dashboard.
  • Time, NumPy (np). Because you don’t have a data source, you’ll need to simulate a live data feed. Use NumPy to generate data and make it live (looped) with the Time library (unless you already have a live data feed).
  • Pandas (pd). You’ll use pandas to read the input data source. In this case, you’ll use a Comma Separated Values (CSV) file.

Go ahead and import all the required libraries:

import time  # to simulate a real time data, time loop

import numpy as np  # np mean, np random
import pandas as pd  # read csv, df manipulation
import plotly.express as px  # interactive charts
import streamlit as st  # 🎈 data web app development

You can read your input data from a CSV by using pd.read_csv(). But remember, this data source could be streaming from an API, a JSON or an XML object, or even a CSV that gets updated at regular intervals.

Next, put the pd.read_csv() call inside a new function get_data() so that it gets properly cached.

What's caching? It's simple. Adding the decorator @st.experimental_memo will make the function get_data() run once. Then every time you rerun your app, the data will stay memoized! This way you can avoid downloading the dataset again and again. Read more about caching in the Streamlit docs.

dataset_url = "https://raw.githubusercontent.com/Lexie88rus/bank-marketing-analysis/master/bank.csv"

# read csv from a URL
@st.experimental_memo
def get_data() -> pd.DataFrame:
    return pd.read_csv(dataset_url)

df = get_data()
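To see what memoization buys you outside of Streamlit, Python's built-in functools.lru_cache behaves analogously (an analogy only; Streamlit's decorator additionally persists across script reruns, and the URL below is a stand-in):

```python
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=None)
def get_data_cached(url: str) -> str:
    # stand-in for an expensive pd.read_csv(url) download
    calls["n"] += 1
    return f"data from {url}"

get_data_cached("https://example.com/bank.csv")
get_data_cached("https://example.com/bank.csv")  # second call hits the cache
print(calls["n"])  # the function body ran only once
```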


2. How to do a basic dashboard setup

Now let’s set up a basic dashboard. Use st.set_page_config with parameters serving the following purpose:

  • The web app title, set via the page_title argument, goes into the HTML <title> tag and the browser tab
  • The favicon, set via the page_icon argument, also appears in the browser tab
  • The layout="wide" argument renders the web app/dashboard with a wide-screen layout
st.set_page_config(
    page_title="Real-Time Data Science Dashboard",
    page_icon="✅",
    layout="wide",
)

3. How to design a user interface

A typical dashboard contains the following basic UI design components:

  • A page title
  • A top-level filter
  • KPIs/summary cards
  • Interactive charts
  • A data table

Let’s drill into them in detail.

Page title

The title is rendered as the <h1> tag. To display the title, use st.title. It’ll take the string “Real-Time / Live Data Science Dashboard” and display it as the page title.

# dashboard title
st.title("Real-Time / Live Data Science Dashboard")

Top-level filter

First, create the filter by using st.selectbox. It’ll display a dropdown with a list of options. To generate it, take the unique elements of the job column from the dataframe df. The selected item is saved in an object named job_filter:

# top-level filters
job_filter = st.selectbox("Select the Job", pd.unique(df["job"]))

Now that your filter UI is ready, use job_filter to filter your dataframe df.

# dataframe filter
df = df[df["job"] == job_filter]
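The boolean-mask filter can be sanity-checked on a toy frame (toy data standing in for the bank dataset):

```python
import pandas as pd

df = pd.DataFrame({"job": ["admin", "technician", "admin"],
                   "age": [30, 45, 52]})
job_filter = "admin"  # what st.selectbox would have returned

# keep only the rows whose job matches the selected value
filtered = df[df["job"] == job_filter]
print(filtered)
```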

KPIs/summary cards

Before you can design your KPIs, divide your layout into three columns by using st.columns(3). The three columns are kpi1, kpi2, and kpi3. st.metric helps you create a KPI card. Use it to fill one KPI into each of those columns.

st.metric’s label argument displays the KPI title. The value argument shows the actual metric, and add-ons like delta compare the KPI value with the KPI goal.

# compute the KPI values first (age_new and balance_new are the simulated
# columns created in step 4; the count logic here is one illustrative choice)
avg_age = np.mean(df["age_new"])
count_married = int(df[df["marital"] == "married"]["marital"].count())
balance = np.mean(df["balance_new"])

# create three columns
kpi1, kpi2, kpi3 = st.columns(3)

# fill in those three columns with respective metrics or KPIs
kpi1.metric(
    label="Age ⏳",
    value=round(avg_age),
    delta=round(avg_age) - 10,
)

kpi2.metric(
    label="Married Count 💍",
    value=int(count_married),
    delta=-10 + count_married,
)

kpi3.metric(
    label="A/C Balance $",
    value=f"$ {round(balance, 2)} ",
    delta=-round(balance / count_married) * 100,
)

Interactive charts

Split your layout into two columns and fill them with charts. Unlike the metric above, use the with clause to fill the interactive charts into the respective columns:

  • A density heatmap in fig_col1
  • A histogram in fig_col2

# create two columns for charts
fig_col1, fig_col2 = st.columns(2)

with fig_col1:
    st.markdown("### First Chart")
    fig = px.density_heatmap(
        data_frame=df, y="age_new", x="marital"
    )
    st.write(fig)
   
with fig_col2:
    st.markdown("### Second Chart")
    fig2 = px.histogram(data_frame=df, x="age_new")
    st.write(fig2)

Data table

Use st.dataframe to display the data frame. Remember, your data frame gets filtered based on the filter option selected at the top:

st.markdown("### Detailed Data View")
st.dataframe(df)

4. How to refresh the dashboard for real-time or live data feed

Since you don’t have a real-time or live data feed yet, you’re going to simulate your existing data frame (unless you already have a live data feed or real-time data flowing in).

To simulate it, use a for loop from 0 to 200 seconds, with a one-second time.sleep pause on every iteration:

for seconds in range(200):

    df["age_new"] = df["age"] * np.random.choice(range(1, 5))
    df["balance_new"] = df["balance"] * np.random.choice(range(1, 5))
    time.sleep(1)

Inside the loop, use NumPy's np.random.choice(range(1, 5)) to generate a random integer between 1 and 4 (the upper bound of range is exclusive). Use it as a multiplier to randomize the values of the age and balance columns that you’ve used for your metrics and charts.
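The simulation step can be tried on its own, without Streamlit (a toy frame standing in for the bank data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [30, 40], "balance": [100.0, 200.0]})

# np.random.choice(range(1, 5)) draws one integer from {1, 2, 3, 4};
# every row is scaled by the same randomly chosen multiplier
df["age_new"] = df["age"] * np.random.choice(range(1, 5))
df["balance_new"] = df["balance"] * np.random.choice(range(1, 5))
print(df)
```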

5. How to auto-update components

Now you know how to build a Streamlit web app!

To display the live data feed with auto-updating KPIs/Metrics/Charts, put all these components inside a single-element container created with st.empty. Call it placeholder:

# creating a single-element container
placeholder = st.empty()

Put your components inside the placeholder by using a with placeholder.container(): clause. This way you’ll replace them on every iteration of the data update: inside the for loop, after the age_new and balance_new columns are refreshed, recreate the KPIs, the two charts, and the data table from the sections above within that container.

And... here is the full code! It is simply all of the snippets above assembled into a single script; you can also grab it from the code repo linked at the top of this post.

To run this dashboard on your local computer:

  1. Save the code as a single monolithic Python file, e.g. app.py.
  2. Open your Terminal or Command Prompt in the same path where that file is stored.
  3. Execute streamlit run app.py. The dashboard will start running on your localhost; the link will be displayed in your Terminal and also opened as a new tab in your default browser.

Wrapping up

Congratulations! You have learned how to build your own real-time live dashboard with Streamlit. I hope you had fun along the way.

If you have any questions, please leave them below in the comments or reach out to me at [email protected] or on LinkedIn.

What is a data frame?

A dataframe is a table, or tabular data, organized as a two-dimensional array of rows and columns. This structure is the most standard way to store data. Each column of a dataframe is a Series object, and each row consists of the elements of those Series.
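A quick check of both properties, the two-dimensional shape and each column being a Series:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# two dimensions: (rows, columns)
print(df.shape)           # (2, 2)

# each column is a pandas Series
print(type(df["a"]))      # <class 'pandas.core.series.Series'>
```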

What is pandas in Python?

In this context, the pandas library is an open-source library for the Python programming language that is widely used for processing data, from data cleaning and data manipulation through to data analysis.