How to Get Other Columns When Using Spark DataFrame groupBy

Let's say the table has 4 columns, cust_id, f1, f2, f3, and I want to group by cust_id and then get avg(f1), avg(f2) and avg(f3), while still keeping access to the other columns of each row. Spark SQL follows the same pre-SQL:1999 convention as most of the major databases (PostgreSQL, Oracle, MS SQL Server), which doesn't allow additional columns in aggregation queries: every selected column must either appear in the GROUP BY clause or sit inside an aggregate function. That restriction is exactly why the question in the title comes up so often.

Some background before the solutions. In Spark, a DataFrame is a distributed collection of data organized into named columns, and Spark SQL supports operating on a variety of data sources through the DataFrame interface. A Spark DataFrame can serialize the data into off-heap storage (in memory) in a binary format and then perform many transformations directly on that off-heap memory, because Spark understands the schema. For aggregations you use the DataFrame as your source and call the groupBy() method; see GroupedData for all the available aggregate functions. If the functionality you need exists in the built-in functions, using these will perform better than DataFrame UDFs. Derived columns are added with the built-in functions and the withColumn() API. A pivot is a related aggregation in which one (or more, in the general case) of the grouping columns has its distinct values transposed into individual columns.

Joins matter here too, because one common fix is to join aggregated results back to the original table. PySpark DataFrames have a join method which takes three parameters: the DataFrame on the right side of the join, which fields are being joined on, and what type of join (inner, outer, left_outer, right_outer, leftsemi). A join condition (aka join expression) can be given as part of the join operator or afterwards with where/filter, and when you join on a column name the matching columns are coalesced so the key appears only once in the output. Pandas users hit the same issue, since a pandas DataFrame is a rectangular grid of rows and columns and its groupby drops the non-grouped columns in just the same way, so the pandas equivalents are shown alongside the Spark ones. Let's have some overview first, then we'll walk through examples in Scala, Java and Python.
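As a minimal sketch of the aggregation itself, assuming a hypothetical PySpark DataFrame df with the cust_id, f1, f2, f3 columns from the example above (the sample values are made up for illustration):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("groupby-example").getOrCreate()

    # hypothetical sample data matching the cust_id / f1 / f2 / f3 example
    df = spark.createDataFrame(
        [(1, 1.0, 2.0, 3.0), (1, 3.0, 4.0, 5.0), (2, 5.0, 6.0, 7.0)],
        ["cust_id", "f1", "f2", "f3"],
    )

    # group by cust_id and average the remaining feature columns
    agg_df = df.groupBy("cust_id").agg(
        F.avg("f1").alias("avg_f1"),
        F.avg("f2").alias("avg_f2"),
        F.avg("f3").alias("avg_f3"),
    )
    agg_df.show()

Note that agg_df now contains only cust_id and the three averages; every other column has been dropped, which is exactly the problem the rest of this article deals with.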
One housekeeping note first: for efficiency, once you are finished using a cached DataFrame you can optionally tell Spark to stop caching it in memory by calling the DataFrame's unpersist() method, so Spark knows you no longer need the cached data. Also remember that in Spark a DataFrame is actually a wrapper around RDDs, the basic data structure in Spark, and that the DataFrame API lets users perform relational operations on both external data sources and Spark's built-in distributed collections without spelling out how the processing should be done.

Back to the grouping problem. groupBy groups the DataFrame using the specified columns so that we can run aggregations on them, and the alias() method (or the col() function) is convenient for giving the aggregated columns readable names instead of defaults like avg(f1). Because of the pre-SQL:1999 convention described above, the aggregated result simply cannot carry the non-grouped columns along. The standard answer is therefore to generate another data frame holding the aggregates and then join it with the original data frame. Registering a DataFrame as a temporary view lets you express the same thing as a SQL query, and since Spark 1.6 you can also pivot data, creating pivot tables, directly from a DataFrame (with Scala, Java, or Python).

The same story plays out in pandas: DataFrames can be summarized using the groupby method, you can aggregate using multiple columns, combine multiple columns by passing groupby a dictionary of aggregations, or compute the mode of a data frame per column or per row (the statistics package provides mode for plain Python sequences). The open-source Koalas package even puts a pandas-style API on top of Spark, so most of what follows transfers directly between the two.
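Here is a sketch of that join-back pattern, reusing the hypothetical df and agg_df from the previous snippet:

    # join the aggregated frame back onto the original, so every row keeps
    # its original columns plus the group-level averages
    joined = df.join(agg_df, on="cust_id", how="inner")
    joined.show()

Because the join key is given as a column name rather than an expression, cust_id appears only once in the result.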
To restate the problem: the table has 4 columns, cust_id, f1, f2, f3, and I want to group by cust_id and then get avg(f1), avg(f2) and avg(f3). Using the groupby functionality we group the rows by key and then call agg with the operations we want; if you search around you will find that groupBy is almost always used together with agg, because at the moment all DataFrame grouping operations assume that you're grouping for the purposes of aggregating data. You may say that groupBy already solves the problem, but on its own it only lets you aggregate with a fairly limited set of options, and the non-grouped columns disappear from the result; in other words, I want a result that still shows the other columns. Note also that a Spark DataFrame doesn't have an index, so pandas tricks such as grouping on an index level with groupby(level="symbol") have no direct Spark equivalent. (One reader also reported that a Spark SQL subquery formulation did not appear to work in Spark Structured Streaming, which is another reason to prefer the join-based approach.)

A pivot is the closely related aggregation where one (or more, in the general case) of the grouping columns has its distinct values transposed into individual columns. Counting is another common case: after grouping a pandas DataFrame on one or more columns you can call size() on the resulting groupby object to get a Series of frequency counts, and the mode function gives the most repeated value of a column or row. These group-level summaries are the bread and butter of data profiling, i.e. examining the data available in an existing data source, and of exploratory data analysis; once grouped features are in place, steps such as one-hot encoding, a simple way to transform categorical features into vectors that are easy to deal with, follow naturally.
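A small pivot sketch, using a separate hypothetical sales DataFrame (the quarter and amount columns are invented for illustration; spark is the session created earlier):

    # pivot the distinct values of "quarter" into their own columns,
    # aggregating "amount" within each (cust_id, quarter) cell
    sales = spark.createDataFrame(
        [(1, "Q1", 100.0), (1, "Q2", 150.0), (2, "Q1", 200.0)],
        ["cust_id", "quarter", "amount"],
    )
    pivoted = sales.groupBy("cust_id").pivot("quarter").sum("amount")
    pivoted.show()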
One use of Spark SQL is to execute SQL queries, and this is often the clearest way to express the aggregate-then-join pattern: you run the query over a temporary view created from the DataFrame, and the result is itself a DataFrame. The original question ("How to get other columns when using Spark DataFrame groupBy?", asked by someone new to Spark 2.x) usually comes with the extra constraint that the answer should work within the context of groupBy / agg, so that it can be mixed with other PySpark aggregate functions; the join formulation satisfies that, and window functions (covered later) avoid the join entirely. For a quick per-group summary you can also use the describe() method, which returns a dataframe whose columns are the numerical columns of the original and whose rows are the statistical values; for the standard deviation of grouped data, see the Stack Overflow question "Calculate the standard deviation of grouped data in a Spark DataFrame". Counting works the same way: grouping on, say, continents and counting the number of countries within each continent is a one-liner, and you can keep only the frequent groups by following count() with filter($"count" >= 2) (or the string form filter("`count` >= 2")). Under the hood the DataFrame aggregation path is optimized for data locality, performing partial aggregation on each node (similar to reduce and aggregate operations) rather than doing a naive shuffle of every raw row.

On the pandas side, a plain groupby followed by several aggregations returns a dataframe with hierarchical columns, which are not very easy to work with (flattening them is covered at the end). Sometimes your notebook won't show you all the columns, or it truncates their contents; you can control this behavior by setting some display defaults of your own while importing pandas. And with recent Spark versions you can port pretty much any relevant piece of pandas DataFrame computation to Spark's parallel computation framework through the DataFrame API, or use Koalas as a compatibility layer.
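A sketch of the SQL route, assuming the same hypothetical df as before (the view name customers is arbitrary):

    # register the DataFrame as a temporary view and express the
    # aggregate-then-join pattern in plain SQL
    df.createOrReplaceTempView("customers")
    result = spark.sql("""
        SELECT c.*, a.avg_f1, a.avg_f2, a.avg_f3
        FROM customers c
        JOIN (
            SELECT cust_id,
                   AVG(f1) AS avg_f1,
                   AVG(f2) AS avg_f2,
                   AVG(f3) AS avg_f3
            FROM customers
            GROUP BY cust_id
        ) a
        ON c.cust_id = a.cust_id
    """)
    result.show()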
A few more details about the grouping API itself. The groupBy variant that takes column names can only group by existing columns (it cannot construct expressions), which is why, for something like grouping by percentiles, the usual suggestion is to define a new bucket column first, via a user-defined function (UDF) if necessary, and then groupBy on that column. Also note that it is usually not the case that we want to groupBy every column other than the one inside the aggregate function; that would just make almost every row its own group. Spark DataFrames behave much like R data frames or dplyr, but on large datasets, and the same grouping ideas apply whether the columns are IntegerType, StringType, LongType, ArrayType, MapType or StructType. Be careful with cross joins, which create a new row in DataFrame #1 per record in DataFrame #2, and if all you actually want is to remove repeated rows, dropDuplicates is the better tool. For readability it also helps to give columns names in a consistent format without spaces, and to tell the display not to truncate cell contents with an ellipsis when you need to read entire column values.

Long story short, in general you have to join aggregated results with the original table, and the same trick works in pandas: after grouping and taking, say, the minimum of a column, the way to display the entire matching row (including column b) is to merge the grouped result back onto the original frame. There is a second pattern worth knowing, the split-apply-combine recipe behind groupBy().apply()-style processing: split the data into groups, apply a function on each group, where the input data handed to the function contains all the rows and columns for that group, and combine the results into a new DataFrame. Because the per-group function sees every column, nothing gets lost; a sketch follows below.
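Here is a sketch of that recipe using grouped-map pandas UDFs on the hypothetical df from earlier. This assumes Spark 3.0 or later (where GroupedData.applyInPandas is available) and a working pyarrow installation; on Spark 2.3 and 2.4 the equivalent is a pandas_udf with PandasUDFType.GROUPED_MAP passed to groupBy().apply().

    import pandas as pd

    def add_group_avg(pdf: pd.DataFrame) -> pd.DataFrame:
        # pdf holds all rows and all columns for one cust_id group
        pdf = pdf.copy()
        pdf["avg_f1"] = pdf["f1"].mean()
        return pdf

    with_avg = df.groupBy("cust_id").applyInPandas(
        add_group_avg,
        schema="cust_id long, f1 double, f2 double, f3 double, avg_f1 double",
    )
    with_avg.show()

This keeps every original column alongside the group-level average, at the cost of shipping each group through a Python worker, so the join or window approaches are usually cheaper for simple aggregates.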
The reason max isn't working for your dataframe, by the way, is that it is trying to find the maximum of that column for every row in your dataframe rather than the maximum over the values inside the array; array columns need the dedicated array functions. DataFrame APIs generally support elaborate methods for slicing and dicing the data: you can read a table from parquet or csv, manipulate it with the domain-specific-language (DSL) functions defined on DataFrame and Column, compute the average for all numeric columns grouped by department, register it with createOrReplaceTempView("people") and query it with regular SQL, or inspect its dimensionality much as shape does in pandas. A call to groupBy without any aggregation just prepares the grouped data so that functions like mean can then be applied, and grouping by multiple columns works exactly the same way as grouping by one. If a cached DataFrame has more partitions than available memory, Spark will by default evict older partitions to make room for new ones. This all scales well because, as noted earlier, groupBy is optimized for data locality rather than a naive shuffle, which is a large part of why Spark has displaced hand-written Hadoop MapReduce jobs for this kind of work.

Two small utility tasks come up constantly alongside grouping. The first is renaming all columns in a DataFrame at once, which is handy for adding suffixes to aggregated columns before a join (an easy example follows below). The second is getting the unique values of a column, which in pandas is unique() or nunique() on the column and in Spark is select plus distinct. If you need the column datatypes, use dtypes, but don't assume the result comes back in the same order as the columns you listed. And when a DataFrame has a very large number of columns (up to tens of thousands), a useful recipe is to process each column independently with pandas, provided the content of each column is small enough to fit comfortably in memory.
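A sketch of renaming every column in one go, for instance to add a suffix to the hypothetical agg_df from earlier before joining (the suffix choice is arbitrary):

    # rename all columns of agg_df by appending a suffix, keeping the key as-is
    new_names = [c if c == "cust_id" else c + "_agg" for c in agg_df.columns]
    renamed = agg_df.toDF(*new_names)
    renamed.printSchema()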
On the Scala side, val rdd_json = df.toJSON gives you the JSON view mentioned earlier, and reading JSON the other way produces a DataFrame; one such file converted this way yields a frame with the columns id, author and tag_name. If your dataset is sorted or partitioned by time you can quickly select data for a particular day, perform time series joins, and so on, because the layout helps Spark optimize the execution plan for those queries. What if I want the DataFrame that simply has no duplicate rows? Use the dropDuplicates operation, which drops the duplicate rows and returns the de-duplicated DataFrame; and when a transformation should replace an existing column rather than add one, withColumnRenamed() (or reusing the same name in withColumn) does the trick. Remember that underneath it all an RDD is a fault-tolerant collection of elements that can be processed in parallel; Spark SQL is the part of the Apache Spark framework designed for processing structured and semi-structured data, and SparkR exposes the same DataFrames to R users in notebooks.

For a concrete use case, suppose you have a dataset of credit card transactions including the date of the transaction, the credit card number and the type of the expense, and the table will have many columns. Long story short, in general you have to join aggregated results (say, the average spend per card) with the original table to see the aggregates next to the raw transactions. In pandas there is a shortcut: once you have created the GroupBy object you can select a single column and do something different with each column, and using transform you can create a new column with the aggregated data and get your original dataframe back, with no join at all. The Spark analogue of transform is a window function, sketched below. To retrieve the column names, in both cases you can just inspect df.columns, and the dtypes method gives the datatype for each column.
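Here is the window-function version, a sketch on the same hypothetical df; it attaches the per-group averages to every row without any join:

    from pyspark.sql import Window

    # partition the window by the grouping key; no ordering is needed for avg
    w = Window.partitionBy("cust_id")

    with_window_avg = (
        df.withColumn("avg_f1", F.avg("f1").over(w))
          .withColumn("avg_f2", F.avg("f2").over(w))
          .withColumn("avg_f3", F.avg("f3").over(w))
    )
    with_window_avg.show()

Every original column survives, and the aggregate is repeated on each row of its group, which is usually exactly what "get the other columns" means in practice.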
Our data frame so far contains simple tabular data, and in code the same patterns keep reappearing. A DataFrame is equivalent to a relational table in Spark SQL and can be created with the various functions on the session (or, in older code, on SQLContext); SparkR is an R package that provides a light-weight frontend to use Apache Spark from R, and RDD stands for Resilient Distributed Dataset, the layer all of this sits on. New columns can be created only from literals or from expressions over existing columns (see the classic question "How to add a constant column in a Spark DataFrame?"); to append a genuinely computed column you use withColumn with a built-in function or a UDF, and only if you need to change the schema itself do you drop down to the underlying RDD and apply a new schema. The Scala foldLeft method can be used to iterate over a data structure, for example a list of column names, and perform multiple such operations on a Spark DataFrame in one pass. The related questions come up again and again: how to sort by a column in descending order, how to use orderBy with descending order in window functions, how to replace strings in a column, how to add a new column in PySpark, and, of course, how to get other columns when using groupBy. A related but distinct join-like operation is upserting data from a Spark DataFrame into a Delta table with the merge operation: suppose you have a DataFrame that contains new data for events keyed by eventId, merge updates the matching rows and inserts the new ones. In one recent project POC I used exactly the groupBy-then-join pattern described here, and it held up fine even though the table will have many columns.

The beginner version of the question is usually phrased like this: "I'm a beginner in Spark and I want to calculate the average of number per name." The answer is the one-line groupBy/agg shown at the start, optionally followed by the join or window step when the other columns are needed. The pandas version is just as short: group by the name of the student and take the mean of the Score column, as in the sketch below. One pandas-specific annoyance is that aggregating several columns produces a multi-index, which can be difficult to work with, so people typically have to rename columns after a groupby operation; that is handled in the last section.
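The pandas sketch, with a hypothetical students frame; mean() collapses to one row per name, while transform("mean") is the pandas counterpart of the Spark window function above and keeps every row:

    import pandas as pd

    students = pd.DataFrame(
        {"Name": ["Alice", "Alice", "Bob"], "Score": [80, 90, 70]}
    )

    # one row per name: the collapsed group means
    mean_per_name = students.groupby("Name")["Score"].mean()

    # same shape as the original frame: the group mean broadcast back to each row
    students["MeanScore"] = students.groupby("Name")["Score"].transform("mean")
    print(mean_per_name)
    print(students)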
One last refinement: in many cases I have the mean, but I would also like to know how many numbers were used to get these means. In Spark that just means adding a count aggregate next to the avg in the same agg call (the RelationalGroupedDataset class also defines shortcut methods such as sum() and count() that can be used to get the same result with less code). In pandas, asking for several aggregations on a single column is what creates the hierarchical indices produced by groupby, and you can flatten multiple aggregations on a single column with a simple procedure: join the two levels of each column name into one string, add a suffix where it helps, and call reset_index to turn the group key back into an ordinary column. Splitting a single column into multiple columns goes in the other direction and is done with withColumn plus the split function.

To wrap up: a DataFrame can be constructed from an array of different sources such as Hive tables, structured data files, external databases, or existing RDDs, and Spark has been centered on the DataFrame API since version 2.x; the DataFrame API is also likely to be more efficient than hand-rolled RDD code, because the optimizer can see the whole query. PySpark supports the same types of join operations as traditional, persistent database systems such as Oracle, IBM DB2, Postgres and MySQL, so the aggregate-then-join pattern should feel familiar, and df.columns always gives you the list representation of all the columns in a dataframe when you need to build those joins or renames programmatically. Hopefully this covers the basics well enough to pique your interest and help you get started with Spark. Let's close with one last example.
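A final pandas sketch of multiple aggregations plus flattening, reusing the hypothetical students frame from above (the joined names like Score_mean are an arbitrary convention):

    # several aggregations on one column produce hierarchical column labels
    summary = students.groupby("Name").agg({"Score": ["mean", "count"]})

    # flatten the two-level column index into plain names like "Score_mean",
    # then turn the group key back into an ordinary column
    summary.columns = ["_".join(col) for col in summary.columns]
    summary = summary.reset_index()
    print(summary)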