• PySpark – PySpark MLlib

    Table of Contents: What Is PySpark MLlib? | Two APIs in MLlib | Why Use MLlib? | Key Features | Example ML Pipeline (End-to-End) | Commonly Used Classes | When to Use PySpark MLlib?

    (1) What Is PySpark MLlib? (2) Two APIs in MLlib (3) Why Use MLlib? (4) Key Features (5) What Is a PySpark Pipeline?

    Key calls: model = pipeline.fit(data), then model.transform(data).

        from pyspark.ml import Pipeline
        from pyspark.ml.feature import StringIndexer, VectorAssembler
        from pyspark.ml.classification import LogisticRegression

        # Step 1: Convert label to numeric
        indexer = StringIndexer(inputCol="purchased", outputCol="label")

        # Step 2: Assemble features
        assembler = VectorAssembler(inputCols=["age", "salary"], outputCol="features")

        # Step 3: Model
        lr = LogisticRegression(featuresCol="features",
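
The fit/transform pattern named in this excerpt can be sketched without Spark. What follows is a plain-Python analogy, not the pyspark.ml API: the classes ToyIndexer, ToyIndexerModel, ToyPipeline, and ToyPipelineModel and the two sample rows are hypothetical, chosen only to mirror the StringIndexer step and the pipeline.fit / model.transform calls.

```python
# Plain-Python analogy of the estimator/transformer pattern (NOT the pyspark.ml API).
# All class names and the sample rows below are hypothetical.


class ToyIndexerModel:
    """Fitted stage: adds a numeric column using the mapping learned in fit()."""

    def __init__(self, input_col, output_col, mapping):
        self.input_col = input_col
        self.output_col = output_col
        self.mapping = mapping

    def transform(self, rows):
        return [{**row, self.output_col: self.mapping[row[self.input_col]]} for row in rows]


class ToyIndexer:
    """Unfitted stage: learns a string-to-index mapping from the data."""

    def __init__(self, input_col, output_col):
        self.input_col = input_col
        self.output_col = output_col

    def fit(self, rows):
        # Simplification: indices follow alphabetical order of the labels.
        labels = sorted({row[self.input_col] for row in rows})
        mapping = {label: float(i) for i, label in enumerate(labels)}
        return ToyIndexerModel(self.input_col, self.output_col, mapping)


class ToyPipelineModel:
    """Result of fitting a pipeline: applies every fitted stage in order."""

    def __init__(self, fitted_stages):
        self.fitted_stages = fitted_stages

    def transform(self, rows):
        for stage in self.fitted_stages:
            rows = stage.transform(rows)
        return rows


class ToyPipeline:
    """Fits each stage on the output of the stages before it."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows)
            rows = model.transform(rows)
            fitted.append(model)
        return ToyPipelineModel(fitted)


data = [
    {"age": 25, "salary": 30000, "purchased": "no"},
    {"age": 40, "salary": 80000, "purchased": "yes"},
]
pipeline = ToyPipeline(stages=[ToyIndexer("purchased", "label")])
model = pipeline.fit(data)        # learn the mapping
result = model.transform(data)    # apply it
labels = [row["label"] for row in result]
```

The point of the split is that fit() learns state from the data once, and the returned model can then be applied to any rows with the same columns.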

  • PySpark – PySpark SQL

    Table of Contents: What Is PySpark SQL? | Why Use PySpark SQL? | Setting It Up (Step-by-Step) | SQL vs DataFrame APIs (Both Supported!) | Advanced Features in PySpark SQL | Input Data Formats | Performance Optimizations | Real-World Use Cases | Summary

    (1) What Is PySpark SQL? (2) Why Use PySpark SQL? (3) Setting It Up (Step-by-Step)

    Step 1: Create a SparkSession

        from pyspark.sql import SparkSession

        # Parentheses let the builder chain span several lines
        spark = (
            SparkSession.builder
            .appName("PySparkSQLDemo")
            .getOrCreate()
        )

    Step 2: Load Data into a DataFrame

        df = spark.read.csv("employees.csv", header=True, inferSchema=True)
        df.show()

    Step 3: Register DataFrame as SQL Table (Temp View)

        df.createOrReplaceTempView("employees")

    Step 4: Run SQL Queries!

        result = spark.sql(""" SELECT

  • PySpark – DataFrames

    Table of Contents: What Are PySpark DataFrames? | Why Use DataFrames in PySpark? | How to Create DataFrames in PySpark? | Common DataFrame Operations | Lazy Evaluation | Under the Hood: Catalyst & Tungsten

    (1) What Are PySpark DataFrames? (2) Why Use DataFrames in PySpark? (3) How to Create a DataFrame?

    From a List:

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("Example").getOrCreate()
        data = [("Alice", 30), ("Bob", 25), ("Charlie", 35)]
        columns = ["Name", "Age"]
        df = spark.createDataFrame(data, columns)
        df.show()

    From a CSV File:

        df = spark.read.csv("employees.csv", header=True, inferSchema=True)
        df.show()

    (4) Common DataFrame Operations

    Filtering Rows:

        df.filter(df.Age > 30).show()

    Selecting Columns:

        df.select("Name").show()

    Group and
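
The table of contents for this post lists Lazy Evaluation: transformations such as filter and select are only recorded, and nothing runs until an action such as show() is called. A minimal plain-Python analogy of that idea is sketched here. It is not the PySpark API; LazyRows and older_than_30 are hypothetical names, and the rows reuse the sample data from the excerpt.

```python
# Plain-Python analogy of lazy evaluation (NOT the PySpark API).
# LazyRows and older_than_30 are hypothetical names used for illustration.


class LazyRows:
    def __init__(self, rows, plan=()):
        self._rows = rows
        self._plan = plan  # recorded transformations, not yet executed

    def filter(self, predicate):
        # Transformation: only extends the plan and returns a new object.
        return LazyRows(self._rows, self._plan + (("filter", predicate),))

    def select(self, *columns):
        # Transformation: only extends the plan and returns a new object.
        return LazyRows(self._rows, self._plan + (("select", columns),))

    def collect(self):
        # Action: runs every recorded step, in order, and returns the rows.
        rows = self._rows
        for kind, arg in self._plan:
            if kind == "filter":
                rows = [row for row in rows if arg(row)]
            else:
                rows = [{col: row[col] for col in arg} for row in rows]
        return rows


calls = []  # records which rows the predicate has actually been run on


def older_than_30(row):
    calls.append(row["Name"])
    return row["Age"] > 30


data = [
    {"Name": "Alice", "Age": 30},
    {"Name": "Bob", "Age": 25},
    {"Name": "Charlie", "Age": 35},
]

plan = LazyRows(data).filter(older_than_30).select("Name")
calls_before_action = list(calls)  # still empty: no row has been examined yet
names = plan.collect()             # the action triggers the recorded steps
```

Because the whole plan is known before anything runs, an engine is free to reorder or combine steps first; this toy simply replays them in order.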

  • PySpark – Spark Application Lifecycle Overview

    Table of Contents: Spark Application Starts | Driver Program Is Launched | Cluster Manager Allocates Resources | Job Is Created on Action | DAG Scheduler Breaks Job into Stages | Tasks Are Sent to Executors | Results Returned to Driver | SparkContext Stops / Application Ends

    (1) Spark Application Starts

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("MyApp").getOrCreate()

    To enable distributed data processing with Apache Spark, you first initialize a Spark application. This is the entry point for using Spark.

    (2) Driver Program Is Launched

        from pyspark.sql import SparkSession
        # This runs on the
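
The step "DAG Scheduler Breaks Job into Stages" can be illustrated with a toy model. Spark starts a new stage wherever data must be shuffled between partitions. The sketch below is plain Python, not Spark's scheduler: the operation names, the WIDE set, and both helper functions are illustrative assumptions, and the task count assumes every stage has the same number of partitions, which real Spark does not guarantee.

```python
# Toy model of grouping a job's operations into stages (NOT Spark's scheduler).
# The operation names, the WIDE set, and both helpers are illustrative assumptions.

WIDE = {"groupBy", "join", "repartition"}  # operations assumed to need a shuffle


def split_into_stages(operations):
    """Close the current stage after every wide (shuffle) operation."""
    stages, current = [], []
    for op in operations:
        current.append(op)
        if op in WIDE:
            stages.append(current)
            current = []
    if current:
        stages.append(current)
    return stages


def count_tasks(stages, num_partitions):
    """One task per partition per stage (simplification: fixed partition count)."""
    return len(stages) * num_partitions


job = ["read", "filter", "groupBy", "agg", "collect"]
stages = split_into_stages(job)
total_tasks = count_tasks(stages, num_partitions=4)
```

With this job, the filter stays in the same stage as the read because it needs no data from other partitions, while everything after groupBy lands in a second stage.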

  • PySpark – Apache PySpark Ecosystem Overview.

    Table of Contents: SparkContext | RDD (Resilient Distributed Dataset) | DataFrame | Spark SQL | SparkSession | MLlib | Spark Streaming / Structured Streaming | GraphX / GraphFrames | Data Sources & Integration | Deployment & Cluster Management | PySpark Libraries

    (1) SparkContext

        from pyspark import SparkContext

        sc = SparkContext("local", "MyApp")

    (2) RDD (Resilient Distributed Dataset)

        rdd = sc.parallelize([1, 2, 3, 4])
        rdd2 = rdd.map(lambda x: x * 2)

    (3) DataFrame

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("App").getOrCreate()
        df = spark.read.csv("data.csv", header=True)

    (4) Spark SQL

        df.createOrReplaceTempView("people")
        spark.sql("SELECT * FROM people WHERE age > 30").show()

    (5) SparkSession

        spark = SparkSession.builder.appName("App").getOrCreate()

    (6) MLlib

        from pyspark.ml.classification

  • PySpark – PySpark Vs Pandas Vs Dask

    Table of Contents: PySpark Vs Pandas Vs Dask | Use Case-Based Comparison | Summary

    (1) PySpark Vs Pandas Vs Dask (2) Use Case-Based Comparison (3) Summary

  • PySpark – Why Use PySpark Over Python?

    Table of Contents: Why Use PySpark Over Python? | Distributed Computing | Big Data Support | Lazy Evaluation | Built-In Fault Tolerance | Support for SQL, ML, Streaming, and Graphs | Cluster Deployment | Optimized Engine

    (1) Why Use PySpark Over Python?

  • PySpark – What Is PySpark?

    Table of Contents: What Is PySpark? | What Is Distributed Computing? | What Happens If I Have a Single Computer, and How Will the Task Get Distributed? | How Does Spark Work on a Single-Core Device?

    (1) What Is PySpark? (2) What Is Distributed Computing? (3) What Happens If I Have a Single Computer, and How Will the Task Get Distributed? (4) How Does Spark Work on a Single-Core Device?

  • PySpark – Syllabus

  • NLP – BERT Architecture

    Table of Contents: Introduction to BERT | BERT Architecture | Input Representation | Pretraining Objectives | Fine-Tuning BERT | Variants of BERT | BERT Evaluation and Benchmarks | Advanced Concepts | Implementation with Libraries | Limitations and Challenges | Applications of BERT

    (1) Introduction to BERT

    (2) BERT – Questions

    What are BERT and the transformer, and why do I need to understand them? Models like BERT are already massively impacting academia and business, so we’ll outline some of the ways these models are used, and clarify some of the terminology around them.

    What did we do before these models? To understand these models, it’s important to look
