5 typical f***ups happening to data folks (which you can easily avoid) 

Two years ago I left academia to become a data scientist at a pretty small company, which is not so small anymore. As the company grew, so did my knowledge of best practices. In the short stories below, I share some annoying problems that happen often and can ruin your day (or days), but can be prevented.

 

  • At the end of the day push your changes to the remote git repository!

John is a new data trainee in the company. After studying computer science in Sheffield, he decided to become a data scientist, and a company in Berlin gave him the chance. John is on probation and naturally wants to impress his boss with his skills. He volunteered for an interesting but pretty tough project, sure that he would manage it. He worked on the implementation of a new feature for about two weeks. After testing it thoroughly on his local computer, he planned to deploy it to the production server the next morning. However, when John arrived at work and ran “git status” to make sure that all his latest commits were there, he found a bunch of corrupted objects… Nothing helped to repair them, and a frustrated John had no choice but to clone the remote repo and redo all his work. John’s boss obviously wasn’t amused, but John learned a valuable lesson the hard way: git push to the remote after every couple of hours of meaningful work.
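One way to turn this lesson into a habit is a small end-of-day check. The sketch below is only an illustration, not John’s actual setup: it assumes the current branch tracks an upstream branch, and the function names `commits_ahead` and `unpushed_commits` are made up for this example. It reads the `# branch.ab +<ahead> -<behind>` header that `git status --porcelain=v2 --branch` prints.

```python
import re
import subprocess
from typing import Optional


def commits_ahead(status_text: str) -> Optional[int]:
    """Return how many local commits are missing on the upstream branch.

    Expects the output of `git status --porcelain=v2 --branch`.
    Returns None when there is no `# branch.ab` header, which is the
    case when the branch has no upstream configured.
    """
    match = re.search(r"^# branch\.ab \+(\d+) -(\d+)$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def unpushed_commits(repo_path: str = ".") -> Optional[int]:
    """Ask git about the repository at repo_path and count unpushed commits."""
    result = subprocess.run(
        ["git", "status", "--porcelain=v2", "--branch"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError outside a git repository
    )
    return commits_ahead(result.stdout)
```

Calling `unpushed_commits()` inside a repository returns the number of commits that exist only on the local machine, or `None` if no upstream is set. Note that it counts commits only; uncommitted changes are not covered by this check.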

  • Never deploy on Fridays! (and not because of #fridaysforfuture)

Maria is responsible for helping a product team with A/B tests. The product team loves to start new tests on Friday afternoons, just before the weekend. As they explain, the reason is the huge weekend traffic on all company websites and the chance to see significant results as early as Monday. To help her colleagues out, this Friday Maria added some new columns to the A/B data mart, made a couple of changes in the corresponding workbook and ran home to prepare a birthday party for her twins. However, on Saturday morning, right after she woke up and was ready to give her kids their birthday presents, Maria got a phone call from her boss, who insisted that she check what was going on right away: the workbook was missing not only the new tests but all the data that had been there before. Not only was Maria’s weekend completely ruined, but the twins were very upset to have a totally absent-minded mum during the most important weekend of the year. Maria learned her lesson. On Monday she insisted on a new company rule: all new tests and deployments have to be done before Thursday afternoon.
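A rule like Maria’s is easier to keep when a script enforces it. Here is a minimal sketch of such a guard; the function name `deployment_allowed` is hypothetical, and the 12:00 cutoff is my assumption for where “Thursday afternoon” starts.

```python
from datetime import datetime

# Assumption for this sketch: "Thursday afternoon" starts at 12:00 local time.
THURSDAY = 3  # datetime.weekday(): Monday is 0, Sunday is 6
CUTOFF_HOUR = 12


def deployment_allowed(now: datetime) -> bool:
    """Return True from Monday 00:00 until Thursday 12:00, False otherwise."""
    weekday = now.weekday()
    if weekday < THURSDAY:
        return True
    return weekday == THURSDAY and now.hour < CUTOFF_HOUR
```

A deployment script could call `deployment_allowed(datetime.now())` as its first step and stop with a friendly message when the answer is `False`.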

  • Every now and then make a snapshot of your virtual machine!

Stephan is a young data team lead with several people and many projects to manage. At home he takes care of his wife, who is going through a difficult pregnancy. Nevertheless, Stephan likes to try out new stuff and improve the data pipelines. He is constantly installing new packages and updating his pipenv environment. Of course, things occasionally go wrong, and he has had to reinstall his virtual machine many times. Once it was especially painful. All the other members of the data team were either sick or on vacation when the owner of the company asked Stephan for a small favor. The company’s most important partner was visiting the office unexpectedly, and the boss wanted to show them some numbers the company had achieved thanks to their collaboration. The task could not have been easier. Stephan wanted to start his Linux virtual machine and quickly run a few queries in the database console as usual, but, true to Murphy’s law, the virtual machine was broken. Stephan had no time for a reinstallation, but he was smart enough to figure out another way to reach the warehouse from Windows. Still, a lot of stress could have been avoided if Stephan had been in the habit of taking a snapshot of the machine from time to time.

Screenshot of making a snapshot.
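Snapshots can also be taken from a script, which makes it easier to do them regularly. The sketch below assumes the virtual machine runs in VirtualBox, which the story does not actually say, and uses its `VBoxManage snapshot <vm> take <name>` command. The function names and the `auto-…` naming scheme are made up for this example; other hypervisors have their own tools.

```python
import subprocess
from datetime import datetime
from typing import List, Optional


def snapshot_command(vm_name: str, now: Optional[datetime] = None) -> List[str]:
    """Build the VBoxManage call that takes a timestamped snapshot of vm_name."""
    now = now or datetime.now()
    snapshot_name = f"auto-{now:%Y-%m-%d-%H%M}"  # hypothetical naming scheme
    return ["VBoxManage", "snapshot", vm_name, "take", snapshot_name]


def take_snapshot(vm_name: str) -> None:
    """Run the command; raises CalledProcessError if VBoxManage reports an error."""
    subprocess.run(snapshot_command(vm_name), check=True)
```

Running `take_snapshot("my-linux-vm")` before every bigger package update, with the VM name replaced by your own, would have given Stephan a working state to roll back to.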

  • Back up your workbooks (and other important stuff too, of course)!

Ola’s job is to prepare fancy but intuitive workbooks for the stakeholders. Usually he works on them locally and then publishes them to the online server. Some of the workbooks are saved on his desktop, but the most needed one unfortunately wasn’t. Ola had invested lots of time in it to work out all those complex calculations and graphs, and now … it is completely gone. It happened after a night when Ola didn’t get much sleep because he kept thinking about how to improve the performance of the data sources connected to his workbooks. He thought he had found a solution and wanted to try it out the next morning. However, instead of deleting the data source, he deleted his most important workbook, which had exactly the same name. There was no way to recover it. Ola learned to give things distinct names and to back up everything not only locally but also to several remote storage locations.
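A local backup can be as simple as a timestamped copy. The following is a minimal sketch using only the Python standard library; the function name `backup_file` and the naming pattern are assumptions for illustration, and the remote copy (cloud storage, another server) would be an extra step on top of it.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_file(source: Path, backup_dir: Path, now: Optional[datetime] = None) -> Path:
    """Copy source into backup_dir under a timestamped name and return the copy's path."""
    now = now or datetime.now()
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{source.stem}_{now:%Y%m%d-%H%M%S}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves modification times
    return target
```

Because every copy gets its own timestamp, older versions are never overwritten, so deleting or breaking the original still leaves the earlier states untouched.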

  • Automatically notify stakeholders about missing data!

Ella works as a data engineer in a data-driven company. She is proud that almost all of the company’s decisions are based on data, “her data”. Ella’s boss checks how the company is doing practically in real time on a dashboard specially optimized for his tablet. He needs to be aware of losses and profits in order to take action as soon as possible. The boss trusts the data team: the data is usually correct, and if something is temporarily missing for technical reasons, they tell him straight away. Not that time, however. Ella decided to spend a well-deserved weekend camping with some friends. She hadn’t realized that there would be no Internet connection and that even her mobile wouldn’t work there. She couldn’t change that anyway, so she relied on her colleagues in case something happened to the data. They, in turn, had their own reasons for not noticing what was going on with Ella’s cost importers. On Monday she had a serious talk with her boss, who had found out that the company had lost lots of money over the weekend: he had thought the new campaigns were doing great and had no idea that the costs were missing. After a sleepless night, the first thing Ella did on Tuesday morning was to set up a missing-data notification system that automatically sends emails to the relevant stakeholders.
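The core of such a notification system is small. The sketch below is not Ella’s actual system, just an illustration under a few assumptions: the importer can report the days for which it loaded data, and the names `missing_days` and `build_alert`, the subject format and the addresses are all hypothetical.

```python
from datetime import date, timedelta
from email.message import EmailMessage
from typing import Iterable, List


def missing_days(loaded_days: Iterable[date], start: date, end: date) -> List[date]:
    """Return every day in [start, end] for which no data was loaded."""
    loaded = set(loaded_days)
    all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return [day for day in all_days if day not in loaded]


def build_alert(missing: List[date], source_name: str,
                sender: str, recipients: List[str]) -> EmailMessage:
    """Compose (but do not send) a plain-text warning about the missing days."""
    message = EmailMessage()
    message["Subject"] = f"[data alert] {source_name}: {len(missing)} day(s) missing"
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    days = "\n".join(f"- {day.isoformat()}" for day in missing)
    message.set_content(
        f"No data was imported from {source_name} for:\n{days}\n"
        "Dashboards based on this source are incomplete until it is fixed."
    )
    return message
```

A scheduled job could call `missing_days` after every import and, if the list is not empty, send the message with `smtplib.SMTP(host).send_message(message)`, where `host` is your own mail server. The important part is that the stakeholders hear about the gap even when nobody from the data team is reachable.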

Sound familiar? What was the biggest f***up that happened to you on the job? Share it in the comments!