Quarterly Q – April 2018

Hello READER,

Hope everyone had a successful first quarter.  As Q2 kicks off, we wanted to share more insight on a newer capability we're hearing more and more about – Cloud Data Warehousing.  As we all know, infrastructure is moving into the cloud at a very fast rate, and Data Warehousing is becoming one of the components companies are homing in on.  We'd like to introduce our friend Frank Bell.  Frank runs IT Strategists, a consulting firm that helps teams deliver data solutions to organizations including Disney, Ticketmaster, Nissan, Toyota, the USAF, and Unilever.  He's very excited to share his experience and insight on this topic with all of us.

Snowflake – Cloud Data Warehousing Revolution

This is the age of disruption: companies must be agile and data-driven, or in due time they will almost certainly be disrupted and replaced.  That scenario is scary, but it's also an exciting time.

As most of us can attest, data problems and inefficiencies are prevalent within all of our organizations in some capacity.  While Data Warehousing and even Big Data solutions such as Teradata, Netezza, Vertica, Impala, Presto, Redshift, and Hadoop have been around for a while, these technologies are complex to integrate, still lead to scaling challenges and slow implementations, remain very costly, and offer little sustainable flexibility.

From our experience, most of the organizations we survey, assess, and work with have a ton of data, and managing it is getting more and more complex.  Companies are always on the prowl for faster access to data to increase the value of analysis and automation.  Slow-moving technology executives and teams are constantly getting side-stepped by marketing and operations teams, which move data into their own cloud silos.  Data complexity is growing rapidly and companies are experiencing many problems, including:

  • Data speed.  Data loading for many businesses is still batch driven and often takes hours and sometimes even days.  Modern businesses just cannot wait this long to analyze and drive automation.
  • Data concurrency problems.  Business users often cannot access the latest data fast enough or have to wait until loading is done.
  • Data sources are more numerous and varied.  (Not just traditional rows/columns but JSON, AVRO, Parquet, etc.)
  • Data is almost always in silos and cannot be cross-referenced.
  • Data access is too often complex.

In addition, data security is a huge concern as breaches continue to increase across the ecosystem.

One of the tools that we feel is imperative for combating these complexities is Snowflake.  First off, Snowflake is easy to use, very fast, and handles concurrency issues and limitations effortlessly.  Some of the other efficiencies Snowflake brings to the table include (a short sketch of a few of these follows the list):

  • SQL is the most common technical language used.  It’s relatively easy for even business users to pick up versus learning new syntax and languages.
  • Being able to easily load, query, and relate JSON, XML, Parquet, and other sources alongside relational data makes analysis much faster.
  • It allows the creation of entire clones of production in seconds.  No more waiting hours to duplicate content.  This is amazingly efficient for QA and Data Quality.
  • Security is now taken care of for you with security experts.
  • Time Travel eliminates the need for costly and complex backup operations.  You can even query your previous data table(s) down to the millisecond.
  • Separating Compute and Storage opens up major innovations not available before.
  • Since compute can now be separated, organizations can have isolated workloads for data loading, marketing, operations, data scientists, etc. etc.
  • Paying only for what you use. Now you can effectively size your costs for your workload when you need it. No longer do you have to buy hardware to scale for the maximum use cases.
  • Cutting database administration to roughly 1/10th of the cost is amazing for TCO.  You no longer have to pay for expertise in indexing, vacuuming, and the like; it comes as part of the service.
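
To make a few of these concrete, here is a minimal sketch using the snowflake-connector-python package.  The account, credentials, database, and table names are hypothetical, not taken from any real deployment.

import snowflake.connector

# Connect with hypothetical credentials; substitute your own account and warehouse.
conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="analyst",
    password="...",
    warehouse="ANALYTICS_WH",
)
cur = conn.cursor()

# Zero-copy clone: a full, writable copy of production in seconds, with no data duplicated.
cur.execute("CREATE DATABASE analytics_qa CLONE analytics")

# Time Travel: query a table as it existed one hour (3600 seconds) ago.
cur.execute("SELECT COUNT(*) FROM analytics.public.orders AT(OFFSET => -3600)")
print(cur.fetchone())

# Semi-structured data: treat a JSON VARIANT column like relational data.
cur.execute(
    "SELECT payload:customer.id::string AS customer_id, COUNT(*) "
    "FROM analytics.public.raw_events GROUP BY 1"
)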

We have seen very positive results from Snowflake implementations in a very short amount of time, including:

  • 78% cost savings replacing an on-prem data warehouse and Hadoop.
  • Implementation time dropping from months or years to weeks.
  • ETL runs going from days to hours or even minutes.

by Frank Bell
Big Data Principal | IT Strategists | www.ITStrategists.com

Elastic Beanstalk And Docker

Problem

At Lykuid we needed a mechanism to ingest customer data.  It had to provide high availability and complete isolation, so that customers would not be impacted by downtime, service upgrades, or bugs introduced by other components.  This called for an isolated service that would be simple and robust.

We also needed predictable response times and minimal resource constraints. The platform needed to support high concurrency without requiring a large thread or worker pool. In order to do this we needed an application where all I/O is asynchronous.

Solution

We chose Node.js because it provides concurrency without having to manage resource pools.  With Node.js we were able to implement our logic in a performant, high-level language without worrying about being blocked by outside services.

Elastic Beanstalk is an Amazon-managed service that provides monitoring and automatic provisioning.  It reduces our maintenance burden by handling upgrades and automatically expanding and shrinking capacity.  Elastic Beanstalk also provides log management, archival, and metric collection, and includes an Amazon-provided Docker platform, which allows us to run our application in a containerized environment.

Why Docker with Elastic Beanstalk?

Traditional Elastic Beanstalk deployments use Amazon Linux running Node.js, which in turn runs your application.  This ties you to Amazon's Node.js version and configuration.  By using Docker we are able to customize the Node.js environment and package it with our dependencies, giving us greater control over our application without the constraints of traditional Elastic Beanstalk environments.  This approach also gives us the flexibility to use any published Docker base image from Docker Hub or other registries.

For this use case, we selected Amazon's Elastic Beanstalk with the Docker Platform and Elastic Container Registry (ECR).  Elastic Beanstalk provides us with a cluster of ingestion nodes spread across multiple availability zones, with a managed platform capable of running standard Docker images.

Elastic Beanstalk provides us with deployment automation, health monitoring, log and metric collection and auto scaling.

Elastic Beanstalk / Docker Architecture


A developer writes a Dockerfile that describes how to package the application into a Docker image, builds the image with the docker build command, tags it with docker tag, and pushes it to ECR with docker push.  The image is now housed on Amazon's infrastructure and is ready to be deployed using Elastic Beanstalk.  With the image in ECR, the developer can launch a Docker Elastic Beanstalk environment and deploy the application by providing a Dockerrun.aws.json file.

Example Dockerfile

FROM node:boron

# Create the application directory
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app

# Install app dependencies
COPY package.json /usr/src/app/
RUN npm install

# Bundle the app source
COPY . /usr/src/app

# The app listens on port 3000
EXPOSE 3000
CMD [ "npm", "start" ]

Example Dockerrun.aws.json

{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "392939824843.dkr.ecr.us-east-1.amazonaws.com/myproject:0.1",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": "3000"
    }
  ]
}

Contribution by: Lykuid Blog

What Is In Store For 2017!

In 2017, targeting passive candidates will be a crucial aspect of the overall recruitment strategy.  Companies will face the challenge of landing talent and providing a great team experience for employees while compromising on value due to rising salaries and total compensation packages (driven by supply and demand, not necessarily by the caliber and skill level of the talent).

Our data shows that average base salary ranges (not including benefits, perks, bonuses, stock/equity or other factors that might influence total compensation) for the most common positions are:

  • Data Engineer: $110K – $120K (mid), $130K – $160K (Sr)
  • Data Scientist: $120K – $140K (mid), $150K – $180K (Sr)
  • Data Architect: $165K – $200K (Sr)
  • QA Automation/SDET: $110K – $125K (mid), $125K – $160K (Sr)
  • .Net Engineer: $90K – $110K (mid), $110K – $150K (Sr)
  • Java Engineer: $110K – $125K (mid), $125K – $170K (Sr)
  • Scala Engineer: $135K – $185K (Sr)
  • DevOps Engineer: $110K – $125K (mid), $130K – $170K (Sr)
  • Node.JS Engineer: $100K – $120K (mid), $120K – $150K (Sr)
  • PHP Dev: $90K – $110K (mid), $110K – $140K (Sr)
  • FE/Web Dev: $90K – $120K (mid), $125K – $180K (Sr)
  • Product Manager: $90K – $120K (mid), $120K – $150K (Sr)
  • UX Designer: $80K – $110K (mid), $110K – $160K (Sr)
  • Project Manager: $90K – $120K (mid), $120K – $140K (Sr)
  • iOS Developer: $110K – $130K (mid), $130K – $170K (Sr)
  • Android Developer: $120K – $180K (all levels)

As 2017 progresses, we anticipate that the most in-demand skill sets will include:

  • Hadoop
  • Spark
  • AWS
  • Swift
  • RESTful API
  • Scala
  • Go (Golang)
  • Java
  • .Net
  • Node.JS
  • JS Frameworks (React, Meteor, Angular, Ember, Backbone)
  • Python
  • Ruby on Rails
  • Docker
  • Jenkins (CI/CD)
  • Ansible/Chef

As forecasted, demand for traditional Systems Engineers (Windows and Linux), Network Engineers, manual QA testers, and PHP developers has fallen significantly.  By contrast, demand for DevOps, QA Automation/SDETs, and engineers focused on Big Data and Machine Learning/AI has spiked, and AWS and API experience has become a must-have on most engineering teams.  We've also seen Business Analyst and PM opportunities remain in high demand at enterprise-level organizations, though that skill set is being folded into the Product Manager/Engineer role, or into the Software Engineer role in the SMB market.  As SaaS platforms start to replace and run back-office functions, most engineering teams are targeting engineers with web application and scalability experience.
We’re looking forward to a fantastic year, full of great partnerships, new technologies and fruitful collaboration.

Cheers to a Successful Year Ahead!

Building a Successful App for Your Workforce or Your Team

The right app can be a force multiplier for your workforce. We all know that putting the right information and tools in the right hands (at the right time) can tremendously increase the productivity of your team members. So why is it so complex and costly to build a successful app for the workforce? And why do most workforce apps feel so clunky? Is there anything you can do about it?

The list below highlights the lessons I’ve learned over the past 10 years, building apps and mobile initiatives for small to large ($B) companies.

So let’s get started:

1) Aligning with a business goal and specific users
Successful mobile apps are concise, elegant, lean, and functional. They make it easier to achieve a business goal in a minimal number of steps, while on the go.

Start your mobile app by defining a clear (and measurable) business goal and knowing who your target audience/users will be. Make sure the audience is excited about the app's purpose and would describe the app as crucial for getting the goal accomplished. Your app should be a MUST, not a nice-to-have.

2) Designing a user experience and selecting a technology set
Whether it's BYOD or a company-issued device, your workforce is typically less fragmented than consumers at large when it comes to devices. This means you will need to design for fewer screen sizes and platforms than the average app developer. In addition, you may be able to take advantage of platforms such as Android for Work (AfW) or Apple Enterprise to rapidly develop and deploy apps to your workforce.
These decisions, along with the business goals, will help you define the front-end and back-end technologies to use. I recommend starting with a simple Android app wired to Google Cloud Platform (GCP) services. If your field app contains many screens and interactions, I would even consider building a lean app and using HTML5 pages in a WebView to iterate through the initial designs.

After a few iterations and feedback from users in the field, you will likely finalize the flow and the design of the app, and then it may be a good opportunity to switch from a web experience to a crisp native experience across the app screens.

3) Integrating with your existing systems and platforms
In the past, integration and security constraints squashed most attempts to build Enterprise apps internally. But today, vendors make it easier to tie your F5 or ADFS to a Mobile app, or to get your backend platforms and your apps talking JSON.

However, the complexity of a multi-screen, desktop-based enterprise app cannot simply be poured into a Mobile app. You will likely need to create a middle layer that bridges the backend system's data structures and flow with the Mobile flow (and data structures). Once you do, you will need to secure the communication between the Mobile app and the middle layer, using a service such as Google Cloud Endpoints.
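
As a rough illustration of that middle layer, here is a small Flask sketch. Flask and the endpoint, field, and URL names are my own hypothetical choices, not the author's stack; it simply pulls a full backend record and exposes only the fields the mobile flow needs. Securing it (for example with Cloud Endpoints or another gateway) would sit in front of this.

from flask import Flask, jsonify
import requests

app = Flask(__name__)

# Hypothetical backend API; in practice this would be your ERP/CRM/field-service system.
BACKEND_URL = "https://backend.example.com/api/workorders"

@app.route("/mobile/workorders/<user_id>")
def workorders(user_id):
    # Fetch the full backend payload, then trim it to the mobile flow's data structure.
    backend = requests.get(BACKEND_URL, params={"assignee": user_id}, timeout=5).json()
    trimmed = [
        {"id": w["id"], "title": w["title"], "due": w["due_date"], "status": w["status"]}
        for w in backend.get("items", [])
    ]
    return jsonify(trimmed)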

To summarize: with the recent development of tools and technologies, building an app for your workforce is easier and cheaper than it used to be. Make sure you align your app with a business goal, select the (Mobile) technology that achieves the goal quickly, and integrate deeply with existing platforms. When designed and deployed correctly, the right app can be a game changer for your workforce and help advance your business.

– Shuki Lehavi

 

Shuki is a hands-on Mobile technologist who spent the last ten years building Mobile companies and leading Mobile projects for Enterprise clients. His extensive resume includes being the co-founder and CEO of Gumiyo.com (acquired in 2013), a cloud-based Mobile development platform that hosted more than 300,000 mobile sites and apps for publishers, brands and Enterprise clients.

 

What Is Idempotence?

Configuration management programs such as CFEngine, Chef, Puppet, Ansible, and Salt talk about idempotency. What exactly does that mean? Let's look at the Merriam-Webster definition:

idempotent (adjective | idem·po·tent | \ˈī-dəm-ˌpō-tənt) relating to or being a mathematical quantity which when applied to itself under a given binary operation (as multiplication) equals itself; also relating to or being an operation under which a mathematical quantity is idempotent.

I'm not sure that helps us. Let's look at Wikipedia's definition:

Idempotence (/ˌaɪdᵻmˈpoʊtəns/ eye-dəm-poh-təns) is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application.

This definition is much closer. In terms of Configuration Management, idempotency is about converging on a desired state. Running a configuration management utility like Ansible will bring the system to that state. On a brand-new server, that means making every change necessary to produce a properly configured server. On an existing, running machine, idempotency is about detecting any drift and correcting only those changes. Let's give some examples using simple BASH commands.

Suppose you have a dev server with certain directories owned by the developer. Some of the developers have sudo capability, and every once in a while some of their files end up owned by root. Let's say this is a web project with the files located under /var/www/html. You could easily run sudo chown -R $(OWNER) /var/www/html/$(SITE), but this causes a few problems. Looking at these files with stat, we see that every single file has its change timestamp updated. I attribute the death of one server's SSD after only two months to all the writes from doing exactly this. The shotgun approach will fix the problem, but it's not idempotent: all files in the directory are being changed, not just the files with incorrect ownership. Not only do we have excessive, unneeded writes to the drive, but more importantly we have no logging or understanding of what went wrong and what we fixed… continue reading
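
For illustration, here is a minimal Python sketch of an idempotent version of that fix: it checks ownership first and changes (and logs) only the paths that are actually wrong. The owner name and site path are hypothetical.

import os
import pwd

def ensure_owner(root, owner):
    """Chown only the paths not already owned by `owner`; return what was changed."""
    uid = pwd.getpwnam(owner).pw_uid
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for path in [dirpath] + [os.path.join(dirpath, name) for name in filenames]:
            st = os.lstat(path)
            if st.st_uid != uid:
                os.lchown(path, uid, st.st_gid)  # fix the owner, leave the group alone
                changed.append(path)
    return changed

for path in ensure_owner("/var/www/html/mysite", "developer"):
    print("fixed ownership:", path)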

How We Stay Sane With A Large AWS Infrastructure

Jon Dokulil is the VP of Engineering at Hudl. He’s been with Hudl since the early days and has helped grow the company from one customer to over 50,000 today. Jon is passionate about the craft of building great software and is particularly interested in distributed systems, resilience, operations and scaling.

We’ve been running hudl.com in AWS since 2009 and have grown to running hundreds, at times even thousands, of servers. As our business grew, we developed a few standards that helped us to make sense of our large AWS infrastructure.

Names and Tags
We use three custom tags for our instances, EBS volumes, RDS and Redshift databases, and anything else that supports tagging. They are extremely useful for cost analysis and for filtering API calls like describeInstances (see the sketch after the list below).

  • Environment – we use a single AWS account for all of our environments, so this tag helps us differentiate resources. We only use four values: test, internal, stage, or prod.
  • Group – this is ad hoc, and typically denotes a single microservice, team, or project. With several projects ongoing at any given time, we discourage abbreviations to improve clarity. Examples at Hudl: monolith, cms, users, teamcity
  • Role – within a group, this denotes the role this instance plays. For example RoleNginx, RoleRedis or RoleRedshift.
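
As a quick sketch of how those tags pay off when querying the API, here is a boto3 call that filters on all three. The tag values are illustrative; adjust the names and casing to match your own tags.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Find every production instance in the monolith group playing the Redis role.
resp = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Environment", "Values": ["prod"]},
        {"Name": "tag:Group", "Values": ["monolith"]},
        {"Name": "tag:Role", "Values": ["RoleRedis"]},
    ]
)

for reservation in resp["Reservations"]:
    for instance in reservation["Instances"]:
        name = next((t["Value"] for t in instance.get("Tags", []) if t["Key"] == "Name"), "")
        print(instance["InstanceId"], name)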

We also name our instances to make it easier to talk about them, which helps when firefighting. We use Sumo Logic for log aggregation, and our _sourceName values match our AWS names, which makes comparing logs and CloudWatch metrics easier. We pack a lot of information into the name:

p-monolith-rabbitmq-use1d-01

At a glance I can tell this is a production instance that supports our monolith. It's a RabbitMQ server in the 'D' availability zone of the 'us-east-1' region. To account for multiple instances of the same type, we tack on the 'id' value; in this case it's the first of its kind. For servers provisioned via Auto Scaling groups, we use a six-digit hash instead of a two-digit number: short enough that humans can keep it in short-term memory, long enough to provide uniqueness.

Security Groups & IAM Roles
If you are familiar with Security Groups and IAM Roles, skip this paragraph. Security groups are simple firewalls for EC2 instances. We can open ports to specific IPs or IP ranges, or we can reference other security groups. For example, we might open port 22 to our office network. IAM Roles are how instances are granted permission to call other AWS web services, and they are useful in a number of ways. Our database instances all run regular backup scripts, and part of each script uploads the backups to S3. IAM Roles allow us to grant S3 upload ability scoped to just our backups bucket: the instances can only upload, not read or delete.

We have a few helper security groups like ‘management’ and ‘chef’. When new instances are provisioned we create a security group that matches the (environment)-(group)-(role) naming convention. This is how we keep our security groups minimally exposed. The naming makes it easier to reason about and audit. If we see an “s-*” security group referenced from a “p-*” security group, we know there’s a problem.
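
As a hedged sketch of that provisioning step (the VPC ID, CIDR, and group name below are made up, and boto3 is just one way to do it), creating a group that follows the (environment)-(group)-(role) convention and opening SSH only to an office range might look like:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Name the group per the (environment)-(group)-(role) convention.
sg = ec2.create_security_group(
    GroupName="p-monolith-rolenginx",
    Description="prod monolith nginx nodes",
    VpcId="vpc-0123456789abcdef0",
)

# Open SSH only to the (hypothetical) office network range.
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
    }],
)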

We keep the (environment)-(group)-(role) convention for our IAM Role names. Again, this lets us grant minimal AWS privileges to each instance but is easy for us humans to be sure we are viewing/editing the correct roles.

Wrap.It.Up
We've adopted these naming conventions and made them part of how folks provision AWS resources at Hudl. They make it easier to understand how our servers relate to each other, and who can communicate with whom and on what ports, and they let us filter precisely via the API or from the management console. For very small infrastructures, this level of detail is probably unnecessary. However, as you grow beyond ten servers, and definitely once you are past hundreds, standards like these will keep your engineering teams sane.
Jon can be reached at www.linkedin.com/in/jondokulil

Transactions in Redis?

Over the last few months, I’ve been thinking about and implementing transactions for Lua scripting in Redis. Not everyone understands why I’m doing this, so let me explain with a bit of history.

MySQL and Postgres
In 1998-2003, if you wanted to start a serious database-driven web site/service and didn't have money to pay Microsoft or Oracle for their databases, you picked either MySQL or Postgres. A lot of people chose MySQL because it was faster, and much of that speed came from the MyISAM storage engine trading away transaction capability for performance – speed is speed. Some people went with Postgres because, despite its measurably slower performance on the same hardware, they could rely on Postgres not to lose their data (to be fair, data loss with MySQL was relatively rare, but data loss is never fun).

A lot of time has passed since then: MySQL moved on from MyISAM to InnoDB as the default storage engine (InnoDB has been available for a long time now), gained full transaction support in the storage engine, and more. At the same time, Postgres got faster and added a continually expanding list of features to distinguish itself in the marketplace. Now the choice between MySQL and Postgres usually boils down to experience and preference, though occasionally business or regulatory needs dictate other choices.

TL;DR: data integrity
In a lot of ways, Redis up till now has been very similar to MySQL before InnoDB was an option. There is already a reasonable best effort to ensure data integrity (replication, AOF, etc.), and the introduction of Lua scripting in Redis 2.6 has helped Redis grow up considerably, both in its capabilities and in the overall simplification of writing software that uses Redis.

Comparatively, Lua scripting operates very much like stored procedures in other databases, but script execution itself has a few caveats. The most important caveat for this post is that once a Lua script has written to the database, it will execute until any one of the following occurs:

  1. The script exits naturally after finishing its work, all writes have been applied
  2. The script hits an error and exits in the middle, all writes that were done up to the error have occurred, but no more writes will be done from the script
  3. Redis is shut down without saving via SHUTDOWN NOSAVE
  4. You attach a debugger and “fix” your script to get it to do #1 or #2 (or some other heroic deed that allows you to not lose data)

To anyone who is writing software against a database, I would expect that you agree that only case #1 in that list is desirable. Cases #2, #3, and #4 are situations where you can end up with data corruption (cases #2 and #4) and/or data loss (cases #3 and #4). If you care about your data, you should be doing just about anything possible to prevent data corruption and loss. This is not philosophy, this is doing your job. Unfortunately, current Redis doesn’t offer a lot of help here. I want to change that.
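
To make case #2 concrete, here is a small redis-py sketch (the key names are made up) showing how, under current Redis behavior, a script that errors partway through still leaves its earlier writes in place:

import redis

r = redis.Redis()

# The script writes one key, then increments another; the INCR fails if the
# counter key holds a non-integer value, aborting the script partway through.
script = """
redis.call('SET', KEYS[1], ARGV[1])
return redis.call('INCR', KEYS[2])
"""

r.set("counter", "not-a-number")  # guarantee the second call errors

try:
    r.eval(script, 2, "greeting", "counter", "hello")
except redis.ResponseError:
    pass

print(r.get("greeting"))  # b'hello' -- the first write survived the failed script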

Transactions in Lua
I am seeking to eliminate cases #2, #3, and #4 above, replacing the entire list with:

  1. The script exits naturally after finishing its work, all writes have been applied
  2. The script exits with an error, no changes have been made (all writes were rolled back)

No data loss. Either everything is written, or nothing is written. This should be the expectation of any database, and I intend to add it to the expectations that we all have about Redis.

The current pull request is a proof of concept. It does what it says it does, removing the need to lose data as long as you either a) explicitly run your scripts using the transactional variants, or b) force all Lua script calls to have transactional semantics with a configuration option.

There are many ways the current patch can be made substantially better, and I hope for help from Salvatore (the author of Redis) and the rest of the community.

Currently VP of Technology at OpenMail, Josiah Carlson is focused on building great tech for startups in/around Los Angeles. He loves teaching and hopes to one day teach the next generation of programmers.

RECAP OF 2015, WHAT IS IN STORE FOR 2016?

Happy 2016!  As we kick off the new year, we wanted to close the loop on the prior one.  To summarize, 2015 was a year that seemed to further fortify an employee's market.  We've seen salary ranges increase $10K – $15K on average, new skill sets emerge as more in-demand than in prior years, and the talent market become increasingly competitive.

To land good talent, employers are competing against a variety of factors, including multiple offers, counter offers, and most of all speed and flexibility.  It often comes down to which company can make the fastest offer with the most flexible terms.  In an employee's market, the increased comp ranges are primarily driven by supply and demand.  Compensation doesn't depend only on the skill set and delivery a candidate brings to the table; it's also based on what other companies in the market are willing to pay, and currently do pay, for similar skill sets.

The employment market is encountering new philosophies and embracing the millennial talent pool, which is now the largest age-related demographic in today's workforce, and there are no defined rules of engagement.  The skill sets in high demand at the present time include Software Engineers (NodeJS, Python, Ruby, Java/Scala, .Net), Mobile Developers (iOS/Android), Front End Developers with specific frameworks (e.g., Angular, React, Meteor, Backbone), UX Designers, QA Automation Engineers, DevOps Engineers, and Data Engineers/Scientists.

Our data shows that average base salary ranges (not including benefits, perks, bonuses, stock/equity or other factors that might influence total compensation) for the most common positions are:

  • NodeJS Engineer: $90K – $110K (mid), $110K – $135K (Sr)
  • Python Engineer: $90K – $110K (mid), $110K – $135K (Sr)
  • Ruby on Rails Engineer: $100K – $120K (mid), $120K – $160K (Sr)
  • .Net Engineer: $90K – $110K (mid), $110K – $140K (Sr)
  • Java/Scala Engineer: $110K – $125K (mid), $125K – $150K (Sr)
  • PHP Engineer: $90K – $110K (mid), $110K – $135K (Sr)
  • Front End Engineer: $100K – $120K (mid), $125K – $150K (Sr)
  • DevOps/Linux Engineer: $110K – $125K (mid), $130K – $150K (Sr)
  • Data Engineer: $110K – $130K (mid), $130K – $150K (Sr)
  • Data Scientist: $115K – $130K (mid), $130K – $180K (Sr)
  • QA Automation/SDET: $110K – $125K (mid), $125K – $150K (Sr)
  • Product Manager: $90K – $120K (mid), $120K – $150K (Sr)
  • UX Designer: $80K – $110K (mid), $110K – $135K (Sr)
  • Project Manager: $90K – $110K (mid), $110K – $130K (Sr)
  • iOS Developer: $110K – $125K (mid), $125K – $165K (Sr)
  • Android Developer: $110K – $160K (mid), $160K – $180K (Sr)

As we look forward to 2016, here are some actionable strategies employers can deploy quickly to land new talent in this highly competitive landscape:

  • Improve internal workflow for interviewing candidates and making offers
  • Offer telecommuting options (if set up for that)
  • Target and hire junior-to-mid-level talent (if set up for that)
  • Offer free training and certification courses
  • Community evangelism
  • Sell your environment, highlight why it’s a great place to come to work every day, and how impactful the role is
