The data selected is a shoe data that was obtained from the web site named data world. The data was not pre-processed and was given in its raw form. We had to use regular expressions and scripts to pre-process the data.
Python and Tableau
The weights were given directly as strings with both value and its unit combined. As script was written to break the string into an integer(value) and string(unit). The values were all brought from different units like lbs, grams, pounds, kgs etc into a single unit called “kg”. There was another challenge where g and grams both represented grams. This was the case with other units too, they were grouped together to a single unit and then converted to kgs.
The categories were given in a format that was very inconvenient for data visualization. For example, “Athletic and outdoor shoes, hunting boots, shoes, Men’s shoes” as a category name and “Athletic and running shoes, Men’s shoes, Running shoes”. While both the categories are slightly different, they have a common name athletic. We choose to replace many of the sub-categories with common names. To achieve this custom split option with space delimiter was used.
Shoes were marked with different colors and that data has to be analyzed. Two unique challenges were encountered in this case. In the first one there were many sub colors for each color and was hard to be visualized. For example, black had sub colors like anthracite black, charcoal black, and so on. To overcome this problem, similar colors were grouped into a single entity and named with an alias containing a common name. The second challenge was each product contained more than one colors in its description. There was no mention of the majority color in that table. So we assume the first name as its major color and grouped with the same. One solution could be the use of Venn diagrams to represent the share of each color sales, but given the no of different colors intersecting with each other, that option is not feasible.
From the above bar chart, we can infer that multi-color shoes and black, white color shoes are very popular among customers. Whereas purple and other such colors are less popular. We can use this to increase or decrease the production of specific colors.
In this chart, size of the bubble indicates its earnings. It can be clearly seen that puma brand is the highest earner of all the brands.
There were different kinds of discounts given by different websites. The offers were named reduced 10$, reduced 50$, discount 30%, discount 40% and so on. Though the naming is different the offers are basically the same. We wanted to find out which naming convention was pulling more customer attraction.
We used a tree map to visualize the share of offers, we could have used a bar graph but one of the outputs is very high and would become difficult to compare results of other smaller outputs, so tree map was used instead.
It is observed that the offers named “reduced” are pulling more customers than any others. But this conclusion might be distorted by the fact that different websites are employing different naming conventions and it is the website popularity rather than the effect of naming convention on customers.
As we can observe from the data the sales are high early in the morning at 6’0 clock. We can understand that shopping rate is high in this period and increase ad postings during this time. There is sharp decline in sales at 7’0 clock and 8’0 clock as this is the time when users go to their work and shop less. The shopping rate is again at its peak at 1’0 clock at night. So there is no use in posting ads during this time, as it would irritate the users without yielding any results. Peak hours are observed during 1,6,9,11,14. But this doesn’t tell the whole story as users shopping behavior might be different during different during days, for example during weekends they might be waking up late and shopping might be less at 6’0 clock.
As we have guessed, the shopping patterns are different during weekends, i.e users are not shopping at 6’0 clock early in the morning but are shopping more at 11’0 clock. As users regularly wake up at 6’0 clock, the general graph shows us that are actively shopping immediately after they wake up. This also tells us that majority of users are waking up late during weekends.
Here, we can observe that the conclusion about users shopping at 6’0 clock has been distorted by the stats from Thursday. If we observe sales from all weekdays, only on Thursday the sales have been abnormal. So we can disqualify our previous conclusion that users shop abnormally high early in the morning. We come to conclusion that users shop abnormally high only on Thursday mornings. This can be due to working customers being highly active during mid- week and that reflects on customers shopping early in the morning.
As we can observe from the above visualization, customers are shopping heavily on Friday nights, especially at 1’0 clock. This can be attributed to the fact that users sleep late as they have holiday on Saturday and Sunday and hence shop late at night.
Here, we observe an interesting pattern in sales across week days. Generally, we assume that more sales are made on weekends like Saturday and Sunday. While it is correct with offline sales where users shop on weekends, contrarily online shoppers expect the order to delivered on weekend. They wouldn’t be available at home on weekdays generally and hence order a two days before weekend expecting the order to be delivered on weekend.
As we can infer from the above graph, online sales have rocketed between 2015 and 2016. We have omitted data from 2017 as only few months of data was available for that year.
These are the few points observed from the above data.
Generally, there are claims from customer complaining that discounts are bogus. It is assumed that online websites rise their prices and then give more discounts to attract people to buy products from their website. When we plotted a chart to see if it is true, we found some interesting results. Not only the prices have remained same in most case, the product prices also dropped simultaneous to discount offer. This might be due to the willingness of sellers to sell out the outdated stock. Nevertheless, there are few cases prices have been raised in case of discounts.
The above graph is week wise earnings of all websites combined. When forecasting was done on the above data, we observed a linear increase in earnings for few weeks.
From the above bar graph, we can observe that online customers are willing to buy if they are given full money back in case of returns. This may be due to no confidence in online shopping, especially regarding expensive items.
Reviews were selected and exported into csv files. This csv file was pre-processed to extract only reviews. Each review of every product was put into a different row and a maximum of seven review were taken. Sentiment analysis was done on each of these reviews and the polarity of sentiment was exported into csv file. This file was imported into tableau to visualize the reviews as heat map. Red values indicated negative average values and blue values represented positive reviews.
We decided to find which company was selling more products. This might help specific brands to become exclusive partners with online websites. For example, Oneplus has exclusive partnership with amazon.com. As we can clearly understand that walmart has been selling highest number of products in the online space.
In the above chart we have plotted bar charts for different companies and their respective maximum and minimum prices. Maximum price has been plotted in blue color and minimum price has been plotted in orange color. From this chart we can segregate Diamond wish and Brioni as premium brands. These kind of products can be advertised for expensive buyers.
From the above diagram we can observe the companies that are producing foot wear with different weights. Average weights of shoes produced by each companies are visualized in bar chart. This can be used segregate companies into heavy and light weight footwear companies. This data will help the user in selecting the brand by selecting his preference for weight.
As we can see from the above pie chart, though the no of sales have been distributed among four websites, the earnings are more distributed across only two websites Walmart and ebay.
From the above stacked bar graph, we can observe that