
Attendees

Former user (Deleted)

Former user (Deleted)

Former user (Deleted)

Former user (Deleted)

Bishoy youssef

Michael Hepfer

Former user (Deleted)

Former user (Deleted)

Former user (Deleted)

Leo Zhang



Agenda

  1. Ease of use of the Vagrant based demo
    1. Week of 8/7, comment on the Slack channel: "At this point I've given up on rackhd. If even the demo requires an old version of ubuntu to run an old version of virtualbox to get it working, I will stick with something simpler."
      1. Requirements agreed to by the team:
        1. The demo should be simple and convenient: easy to use, bring up, and debug.
        2. The environment should work across different versions, including the latest versions of components such as MongoDB, Docker, and VirtualBox.
        3. Host OS independent: can run on a Windows, Linux, or macOS host system.
        4. Has no impact on, or dependency on, the host network.
        5. Utilizes existing nightly RackHD images; does not require building or testing additional new images.
        6. Uses InfraSIM for vBMC nodes.
        7. Can run discovery workflows.
        8. Can run OS install workflows.
        9. Can run the FIT smoke test suite.
        10. Technically, the demo solution could support running in the cloud (e.g., IaaS, PaaS).
        11. Includes the SMI microservice containers.
      2. Review POC from Former user (Deleted)
      3. Ran 2 scripts (1 to get rackhd services up, 1 to get infrasim started)
      4. RackHD code up to date as of the prior sprint release, Infrasim version locked down.
      5. Look into ways of reusing the existing config.json file.
      6. CC team voted for the "docker-compose" POC effort: https://github.com/RackHD/RackHD/pull/889
        1. Review POC from Former user (Deleted)
          1. https://github.com/RackHD/RackHD/pull/857
          2. RackHD and Infrasim run from source inside a single docker container
        2. Review POC "Docker in Docker"
          1. Reference email
        3. Results of the voting: ?
          1. 7 votes
          2. New stories to be created and driven in the Veyron backlog. An epic will be created and sent via email for the team to review.
    2. How to add standalone services to the Master CI/CD pipeline (e.g., SMI microservices, UCS); right now Master CI covers only core RackHD.
      1. Status of on-network/on-topology and test/deployment options
        1. Requirements for testing on-taskgraph as a standalone service

          • The goal is to have automated tests that run against on-taskgraph as a standalone service and not require the other RackHD services. These tests would be part of both the PR merge gate and the post-merge testing for the on-taskgraph repo.
          • Taskgraph requires RabbitMQ and MongoDB when running as a standalone service, so both of these services will also need to be run with the tests.
          • Since on-http will not be included in the test environment, the tests will need to interface directly with the on-taskgraph HTTP and gRPC APIs. The tests may also need direct access to AMQP and MongoDB to verify test results.
          • The tests should exercise the basic taskgraph functionality (scheduling and running tasks), not the functionality of the various RackHD tasks in on-tasks (OS installs, pollers, etc.). Those will continue to be tested by the RackHD FIT tests.
          • We will need to choose a language and test framework for these tests. One option would be to reuse the Python FIT framework from RackHD. Other options could also be considered; however, Python and FIT are already well known by the developers. We still need to determine whether the complexities of FIT are needed or whether a more basic test runner such as nose would be sufficient.
          • We need to determine which code repository the tests will live in. One option is to add them to the existing on-taskgraph repo; since these tests are going to be specific to on-taskgraph, keeping them there would be the best choice.
          • The tests should run against the CI-generated on-taskgraph Docker image. This allows the use of MongoDB and RabbitMQ Docker images in the test environment. The tests themselves could run either natively on the test system or in a separate Docker container sharing a network with the on-taskgraph container.
          1. New services should follow the 12factor.net guidelines; a REST API should be available for IPC between services.
        2. SMI Service Integration to CI

          1. Former user (Deleted) has downloaded the iDRAC simulation tool; it is currently under evaluation.
            1. The tool supports only read operations.
            2. RackHD Epic to be created that introduces workflow testing to RackHD CI/CD. This will cover SMI service testing but does not cover "plugin" integration testing.
          2. The iDRAC simulation tool will be used for virtualized testing (PR quality gates, Master CI, and post-merge testing), and more 13G Dell physical hardware will be introduced to the Regression-Baremetal job for SMI workflow regression testing.

        3. Michael Hepfer and Former user (Deleted) to sync up offline to stand up a Concourse environment.
      2. Architectural discussion : single entry point for services
        1. Looking for one entry point to all of the microservices; discussed briefly how SMI services currently leverage Zuul. Should this also be considered for on-taskgraph and other future RackHD services? Discussion to continue.
        2. SMI services are currently also using the workflow engine as an entry point to those services. Looking to eliminate the SMI/workflow engine dependency and leverage a standard API gateway.
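The on-taskgraph standalone testing requirements above call for tests that talk directly to the on-taskgraph HTTP API and check basic scheduling and running of tasks. A minimal Python sketch of a polling helper such a test might use is shown here, assuming a graph exposes a status string and that "succeeded", "failed", "cancelled", and "timeout" are terminal states. The URL path and `status` field in `make_http_fetcher`, and every function name, are hypothetical placeholders, not the documented on-taskgraph API. The fetcher is injected so the helper can be exercised without a running RabbitMQ/MongoDB stack.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable

# Assumed terminal workflow states; confirm against on-taskgraph before relying on them.
TERMINAL_STATES = frozenset({"succeeded", "failed", "cancelled", "timeout"})


def make_http_fetcher(base_url: str) -> Callable[[str], str]:
    """Return a fetcher that reads a graph's status over HTTP.

    HYPOTHETICAL: the "/workflows/<id>" path and the "status" JSON field are
    placeholders, not the real on-taskgraph API.
    """
    def fetch(graph_id: str) -> str:
        with urllib.request.urlopen(f"{base_url}/workflows/{graph_id}", timeout=10) as resp:
            return json.load(resp)["status"]
    return fetch


def wait_for_graph(fetch_status: Callable[[str], str], graph_id: str,
                   timeout_s: float = 60.0, poll_s: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the graph reaches a terminal state; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(graph_id)
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"graph {graph_id} still '{status}' after {timeout_s}s")
        sleep(poll_s)


def scripted_fetcher(states: Iterable[str]) -> Callable[[str], str]:
    """Test double that replays a fixed sequence of states (no network needed)."""
    it = iter(states)
    return lambda _graph_id: next(it)


if __name__ == "__main__":
    fake = scripted_fetcher(["pending", "running", "succeeded"])
    print(wait_for_graph(fake, "graph-1", sleep=lambda _s: None))  # succeeded
```

Injecting `clock` and `sleep` keeps the timeout logic testable without real waiting, which works whether the eventual runner is nose or FIT.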


      Did not get to the items below:

      1. RackHD Release Cadence

        1. As we’re moving into continuous delivery for the Concourse-based CI (i.e., deployed packages per merged PR), does that change the need or frequency for weekly RackHD sprint releases? Email thread started on 9/20.

        2. If we are releasing Debian packages and Docker containers AND the demo is moved to a Docker image, do we still need to provide a script in the new CI env that allows users to generate a Vagrant-based RackHD image?
      2. RackHD Tooling Updates
        1. Ubuntu to be upgraded to 16.04
          1. What has been developed to date for the Concourse env includes the 16.04 migration; should the Jenkins-based env also be upgraded?
          2. OVA scripts will need to be updated (passing a parameter) to move to 16.04 (covered by Felouka: RAC-5987).
        2. Node v6 is the currently available version; RackHD is running v4.
          1. RackHD Epic to be created to migrate from v4 → v8.
            1. Needs to be assigned. Former user (Deleted)/Maglev team to help create the epic.
        3. RackHD story tracking testing of the latest MongoDB version in CI (MongoDB recommends using 3.x+ versions only and is not supporting anything in the 2.x version family).
          1. Do we want to support this in Jenkins, Concourse, or both? It will be part of the Concourse env.
            1. Concourse env tracking story: RAC-5991
            2. On the Jenkins side, this seems to be a trivial effort to support. It may surface issues in RackHD and show whether previous issues have since been resolved.
              1. Plan to try testing with the latest version and see what the issues are. If the set of issues is trivial, move to the latest; if many issues are encountered, hold off.
              2. Maglev team to create the story and target next sprint. - any update? 
      3. Process change for Master CI failures - how long can a developer work on a fix for a Master CI Failure before requiring to back out the change and get back to green?
        1. Developers have 1 working day to resolve the issue (i.e., up to the next MasterCI run at 6:31pm EST).
        2. If not resolved, the code is backed out and the MasterCI job is re-run to ensure that the pipeline is returned to green.
        3. If thought to be resolved, MasterCI is re-run to ensure the pipeline returns to green.
      4. Review slides from Former user (Deleted) on CI security, moving CI to containers, and moving CI to the cloud.
        1. AI: CC team to review the slides, come back with feedback/answers to the questions posed in the slide deck.
      5. Racadm→WSMAN tooling conversion
        1. Agreement at OLT that we will be going fully wsman-based and eliminate racadm from workflow support.  
      6. BareMetal Regression Pipeline now created/monitored. 
        1. Plan is to monitor for a few weeks; should it then be a gate?
        2. BareMetal OS install on real hardware currently runs every 2 hours on the nightly Docker images. Will need to kick off BareMetal at the same time as CI.
        3. Do we then continue to run BareMetal every 2 hours?
        4. Should this be part of the Master CI pipeline? If so, we would need to modify the Merge Freeze tool to freeze on failure of BareMetal regression tests.
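The Master CI failure process above reduces to one deadline: the next MasterCI run at 6:31pm EST after a failure, at which point an unfixed change is backed out. A small Python sketch of that rule follows, assuming "EST" means a fixed UTC-5 offset (daylight saving ignored) and that MasterCI runs every day, so weekends and holidays are not treated specially. The function names are hypothetical and not part of any existing CI tooling.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Assumption: the notes' "6:31pm EST" is read as a fixed UTC-5 offset.
EST = timezone(timedelta(hours=-5), "EST")
RUN_HOUR, RUN_MINUTE = 18, 31  # MasterCI run time taken from the notes


def fix_deadline(failed_at: datetime) -> datetime:
    """Return the next MasterCI run strictly after the failure (hypothetical helper).

    Assumes a daily run; weekends and holidays are not skipped.
    """
    local = failed_at.astimezone(EST)
    run = local.replace(hour=RUN_HOUR, minute=RUN_MINUTE, second=0, microsecond=0)
    if run <= local:
        run += timedelta(days=1)  # today's run already happened; use tomorrow's
    return run


def actions_at_deadline(fix_merged: bool) -> List[str]:
    """Steps the process prescribes once the deadline is reached."""
    if fix_merged:
        return ["re-run MasterCI to confirm the pipeline is green"]
    return ["back out the change", "re-run MasterCI to confirm the pipeline is green"]


if __name__ == "__main__":
    failure = datetime(2017, 9, 21, 18, 45, tzinfo=EST)
    print(fix_deadline(failure))  # 2017-09-22 18:31:00-05:00
    print(actions_at_deadline(fix_merged=False))
```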

      Being addressed via email:

      1. All SMI services have been published.
        1. What documentation is needed, what kind of communication is needed for the open source community?


      Next meeting will be Thursday September 28.